Need advice for Spark newbie

2015-02-25 Thread Vikram Kone
Hi, I'm a newbie when it comes to Spark and the Hadoop ecosystem in general. Our team has been predominantly a Microsoft shop that uses the MS stack for most of its BI needs. So we are talking SQL Server for storing relational data and SQL Server Analysis Services for building MOLAP cubes for sub-secon

Re: Google Summer of Code - ideas

2015-02-25 Thread Manoj Kumar
Hi, I think that would be really good. Are there any specific issues that are to be implemented as per priority?

Re: Using CUDA within Spark / boosting linear algebra

2015-02-25 Thread Joseph Bradley
Better documentation for linking would be very helpful! Here's a JIRA: https://issues.apache.org/jira/browse/SPARK-6019 On Wed, Feb 25, 2015 at 2:53 PM, Evan R. Sparks wrote: > Thanks for compiling all the data and running these benchmarks, Alex. The > big takeaways here can be seen with this

Re: Some praise and comments on Spark

2015-02-25 Thread Nicholas Chammas
Thanks for sharing the feedback about what works well for you! It's nice to get that; as we all probably know, people generally reach out only when they have problems. On Wed, Feb 25, 2015 at 5:38 PM Reynold Xin wrote: > Thanks for the email and encouragement, Devl. Responses to the 3 requests:

Re: Using CUDA within Spark / boosting linear algebra

2015-02-25 Thread Evan R. Sparks
Thanks for compiling all the data and running these benchmarks, Alex. The big takeaways here can be seen with this chart: https://docs.google.com/spreadsheets/d/1aRm2IADRfXQV7G2vrcVh4StF50uZHl6kmAJeaZZggr0/pubchart?oid=1899767119&format=interactive 1) A properly configured GPU matrix multiply impl

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-25 Thread Patrick Wendell
Hey All, Just a quick update on this thread. Issues have continued to trickle in. Not all of them are blocker level but enough to warrant another RC: I've been keeping the JIRA dashboard up and running with the latest status (sorry, long link): https://issues.apache.org/jira/issues/?jql=project%

Re: Some praise and comments on Spark

2015-02-25 Thread Reynold Xin
Thanks for the email and encouragement, Devl. Responses to the 3 requests: -tonnes of configuration properties and "go faster" type flags. For example Hadoop and Hbase users will know that there are a whole catalogue of properties for regions, caches, network properties, block sizes, etc etc. Plea

Some praise and comments on Spark

2015-02-25 Thread Devl Devel
Hi Spark Developers, First, apologies if this doesn't belong on this list but the comments/praise are relevant to all developers. This is just a small note about what we really like about Spark, I/we don't mean to start a whole long discussion thread in this forum but just share our positive exper

Re: Have Friedman's glmnet algo running in Spark

2015-02-25 Thread Joseph Bradley
Some of this discussion seems valuable enough to preserve on the JIRA; can we move it there (and copy any relevant discussions from previous emails as needed)? On Wed, Feb 25, 2015 at 10:35 AM, wrote: > Hi Debasish, > Any method that generates point solutions to the minimization problem > could

RE: PySpark SPARK_CLASSPATH doesn't distribute jars to executors

2015-02-25 Thread Michael Nazario
Neither of those helped. I'm still getting a ClassNotFoundException on the workers. From: Denny Lee [denny.g@gmail.com] Sent: Tuesday, February 24, 2015 7:21 PM To: Michael Nazario; dev@spark.apache.org Subject: Re: PySpark SPARK_CLASSPATH doesn't distribute ja

Re: [jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 -> 2.7

2015-02-25 Thread shane knapp
i'm going to punt on this until after the next spark 1.3 release (2-3 weeks?). since i'll be installing a bunch of other packages (including mongodb), i'd rather wait and be safe. :) the full install list is forthcoming, and i'll update the spark infra wiki w/what's installed on the workers. sh

Re: Have Friedman's glmnet algo running in Spark

2015-02-25 Thread mike
Hi Debasish, Any method that generates point solutions to the minimization problem could simply be run a number of times to generate the coefficient paths as a function of the penalty parameter. I think the only issues are how easy the method is to use and how much training and developer time i

Re: Help vote for Spark talks at the Hadoop Summit

2015-02-25 Thread Xiangrui Meng
Made 3 votes to each of the talks. Looking forward to seeing them at Hadoop Summit :) -Xiangrui On Tue, Feb 24, 2015 at 9:54 PM, Reynold Xin wrote: > Hi all, > > The Hadoop Summit uses community choice voting to decide which talks to > feature. It would be great if the community could help vote for S

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Cody Koeninger
I'm building with mvn -Phadoop-2.4 -DskipTests install Yeah, commenting out the unused dependency in the root pom.xml resolves it. I'm a little surprised that it cropped up now as well, I had built against multiple different snapshots of 1.3 over the past couple weeks with no problems. Is it

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
This has been around for multiple versions of Spark, so I am a bit surprised to see it not working in your build. - Patrick On Wed, Feb 25, 2015 at 9:41 AM, Patrick Wendell wrote: > Hey Cody, > > What build command are you using? In any case, we can actually comment > out the "unused" thing now

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
Hey Cody, What build command are you using? In any case, we can actually comment out the "unused" thing now in the root pom.xml. It existed just to ensure that at least one dependency was listed in the shade plugin configuration (otherwise, some work we do that requires the shade plugin does not h

Re: Have Friedman's glmnet algo running in Spark

2015-02-25 Thread Debasish Das
Any reason why the regularization path cannot be implemented using the current owlqn PR? We can change owlqn in breeze to fit your needs... On Feb 24, 2015 3:27 PM, "Joseph Bradley" wrote: > Hi Mike, > > I'm not aware of a "standard" big dataset, but there are a number > available: > * The YearPre

UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Cody Koeninger
So when building 1.3.0-rc1 I see the following warning: [WARNING] spark-streaming-kafka_2.10-1.3.0.jar, unused-1.0.0.jar define 1 overlappping classes: [WARNING] - org.apache.spark.unused.UnusedStubClass and when trying to build an assembly of a project that was previously using 1.3 snapshots