Re: Incorrect Maven Artifact Names

2015-01-14 Thread Marcelo Vanzin
Hi RJ, I think I remember noticing in the past that some Guava metadata ends up overwriting maven-generated metadata in the assembly's manifest. That's probably something we should fix if that still affects the build. That being said, this is probably happening because you're using install-file

Re: Incorrect Maven Artifact Names

2015-01-14 Thread Sean Owen
Guava is shaded, although one class is left in its original package. This shouldn't have anything to do with Spark's package or namespace though. What are you saying is in com/google/guava? You can un-skip the install plugin with -Dmaven.install.skip=false On Wed, Jan 14, 2015 at 7:26 PM, RJ

Re: K-Means And Class Tags

2015-01-14 Thread Joseph Bradley
(After asking around,) retag() is private[spark] in Scala, but Java ignores the private[X], making retag (unintentionally) public in Java. Currently, your solution of retagging from Java is the best hack I can think of. It may take a bit of engineering to create a proper fix for the long-term.

Re: Incorrect Maven Artifact Names

2015-01-14 Thread RJ Nowling
Hi Sean, I confirmed that if I take the Spark 1.2.0 release (a428c446), undo the guava PR [1], and use -Dmaven.install.skip=false with the workflow above, the problem is fixed. RJ [1]

Re: Incorrect Maven Artifact Names

2015-01-14 Thread RJ Nowling
Thanks, Marcelo! I'll look into install vs install-file. What is the difference between pom and jar packaging? One of the challenges is that I have to satisfy Fedora / Red Hat packaging guidelines, which makes life a little more interesting. :) (e.g., RPMs should resolve against other RPMs

Re: Incorrect Maven Artifact Names

2015-01-14 Thread RJ Nowling
Thanks, Sean. Yes, Spark is incorrectly copying the spark assembly jar to com/google/guava in the maven repository. This is for the 1.2.0 release, just to clarify. I reverted the patches that shade Guava and removed the parts disabling the install plugin and it seemed to fix the issue. It

Re: Incorrect Maven Artifact Names

2015-01-14 Thread Marcelo Vanzin
On Wed, Jan 14, 2015 at 1:40 PM, RJ Nowling rnowl...@gmail.com wrote: What is the difference between pom and jar packaging? If you do an install on a pom packaging module, it will only install the module's pom file in the target repository. -- Marcelo
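To illustrate the pom-versus-jar distinction described above, here is a minimal sketch that lists which files an install would place in a repository under the standard Maven layout (groupId as directories, then artifactId, then version). The function name `installed_files` and the `org.example` coordinates are hypothetical, chosen only for illustration. The sketch models just the `pom` and `jar` cases and ignores the metadata files Maven also writes.

```python
from pathlib import PurePosixPath


def installed_files(group_id, artifact_id, version, packaging):
    """Return repository-relative paths an install would write.

    Hypothetical helper: models only 'pom' and 'jar' packaging and
    ignores repository metadata files.
    """
    if packaging not in ("pom", "jar"):
        raise ValueError("this sketch only models 'pom' and 'jar' packaging")

    # Standard layout: group/id/as/dirs / artifactId / version / files
    base = PurePosixPath(*group_id.split("."), artifact_id, version)
    stem = f"{artifact_id}-{version}"

    # The module's pom file is installed for every packaging type.
    files = [base / f"{stem}.pom"]

    # A jar-packaged module additionally installs its main artifact.
    if packaging == "jar":
        files.append(base / f"{stem}.jar")

    return [str(path) for path in files]


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the Spark build.
    for packaging in ("pom", "jar"):
        print(packaging, installed_files("org.example", "demo-module", "1.0", packaging))
```

Under this model, a pom-packaged module produces only the `.pom` entry, which matches the behaviour Marcelo describes, while a jar-packaged module produces both the `.pom` and the `.jar`.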

Re: DBSCAN for MLlib

2015-01-14 Thread Xiangrui Meng
Please find my comments on the JIRA page. -Xiangrui On Tue, Jan 13, 2015 at 1:49 PM, Muhammad Ali A'råby angelland...@yahoo.com.invalid wrote: I have to say, I have created a Jira task for it: [SPARK-5226] Add DBSCAN Clustering Algorithm to MLlib - ASF JIRA

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread Reynold Xin
Chris, This is really cool. Congratulations and thanks for sharing the news. On Wed, Jan 14, 2015 at 6:08 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: Hi Spark Devs, Just wanted to FYI that I was funded on a 2 year NASA proposal to build out the concept of a

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread RJ Nowling
Congratulations, Chris! I created a JIRA for dimensional RDDs that might be relevant: https://issues.apache.org/jira/browse/SPARK-4727 Jeremy Freeman pointed me to his lab's work for neuroscience, which has some related functionality: http://thefreemanlab.com/thunder/ On Wed, Jan 14, 2015 at

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread Matei Zaharia
Yeah, very cool! You may also want to check out https://issues.apache.org/jira/browse/SPARK-5097 as something to build upon for these operations. Matei On Jan 14, 2015, at 6:18 PM, Reynold Xin r...@databricks.com wrote: Chris, This is really cool. Congratulations and thanks for sharing

SciSpark: NASA AIST14 proposal

2015-01-14 Thread Mattmann, Chris A (3980)
Hi Spark Devs, Just wanted to FYI that I was funded on a 2 year NASA proposal to build out the concept of a scientific RDD (created by space/time, and other operations) for use in some neat climate related NASA use cases. http://esto.nasa.gov/files/solicitations/AIST_14/ROSES2014_AIST_A41_awards.

Spark SQL API changes and stabilization

2015-01-14 Thread Reynold Xin
Hi Spark devs, Given the growing number of developers that are building on Spark SQL, we would like to stabilize the API in 1.3 so users and developers can be confident to build on it. This also gives us a chance to improve the API. In particular, we are proposing the following major changes.

Re: SciSpark: NASA AIST14 proposal

2015-01-14 Thread Aniket
Hi Chris This is super cool. I was wondering if this would be an open source project so that people can contribute or reuse? Thanks, Aniket On Thu Jan 15 2015 at 07:39:29 Mattmann, Chris A (3980) [via Apache Spark Developers List] ml-node+s1001551n10115...@n3.nabble.com wrote: Hi Spark Devs,

Spark-perf terasort WIP branch

2015-01-14 Thread Ewan Higgs
Hi all, I'm trying to build the Spark-perf WIP code but there are some errors to do with Hadoop APIs. I presume this is because there is some Hadoop version set and it's referring to that. But I can't seem to find it. The errors are as follows: [info] Compiling 15 Scala sources and 2 Java