Re: Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Sea
Thanks, Yin Huai, I worked it out. I used JDK 1.7 to build Spark 1.4.0, but my YARN cluster runs on JDK 1.6. But java.version in pom.xml is 1.6, and the exception confused me. -- Original message -- From: "Yin Huai"; Sent: 2015-06-18 11:19; To: "Se
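This thread does not settle why a JDK 1.7 build fails on a JDK 1.6 cluster even with java.version set to 1.6. One possible factor, offered here as an assumption rather than the confirmed cause: java.version governs the bytecode target, not the archive format, and a jar holding more than 65,535 entries needs ZIP64 records, which, to our understanding, JDK 1.6 cannot read. The stdlib-only Python sketch below (inspect_jar is a hypothetical helper, not part of Spark) counts a jar's entries so that possibility can be checked against the actual assembly jar.

```python
import sys
import zipfile

# The classic ZIP end-of-central-directory record stores the entry count in
# 16 bits, so archives with more entries than this need ZIP64 records.
CLASSIC_ZIP_MAX_ENTRIES = 0xFFFF  # 65,535


def inspect_jar(path):
    """Return (entry_count, needs_zip64) for the jar (zip) file at path."""
    with zipfile.ZipFile(path) as jar:
        count = len(jar.infolist())
    return count, count > CLASSIC_ZIP_MAX_ENTRIES


if __name__ == "__main__":
    # Hypothetical usage: python inspect_jar.py path/to/spark-assembly.jar
    for jar_path in sys.argv[1:]:
        n, zip64 = inspect_jar(jar_path)
        print(f"{jar_path}: {n} entries, ZIP64 required: {zip64}")
```

If the count is above the limit, rebuilding with JDK 1.6 or running the cluster on JDK 1.7 would be the obvious things to try; if it is below, the cause lies elsewhere.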

Re: Increase partition count (repartition) without shuffle

2015-06-18 Thread Mridul Muralidharan
If you can scan the input twice, you can of course do a per-partition count and build a custom RDD that can repartition without a shuffle. But there is nothing off the shelf, as Sandy mentioned. Regards Mridul On Thursday, June 18, 2015, Sandy Ryza wrote: > Hi Alexander, > > There is currently no way to create an
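Mridul's two-pass idea can be modeled outside Spark. The plain-Python sketch below uses hypothetical names (plan_children, read_child) and assumes the first pass has already produced a row count for each parent partition. It then cuts every parent into contiguous slices, so each child reads from exactly one parent (a narrow dependency, hence no shuffle). In Spark itself this plan would back a custom RDD, typically written in Scala, whose getPartitions returns the slices and whose compute re-reads one parent partition.

```python
from itertools import islice
from typing import Iterable, Iterator, List, NamedTuple


class ChildSplit(NamedTuple):
    parent: int  # index of the single parent partition this child reads
    start: int   # first row (inclusive) within that parent
    end: int     # row bound (exclusive) within that parent


def plan_children(counts: List[int], rows_per_child: int) -> List[ChildSplit]:
    """Cut each parent partition into slices of at most rows_per_child rows."""
    if rows_per_child <= 0:
        raise ValueError("rows_per_child must be positive")
    plan: List[ChildSplit] = []
    for parent, n in enumerate(counts):
        for start in range(0, n, rows_per_child):
            plan.append(ChildSplit(parent, start, min(start + rows_per_child, n)))
    return plan


def read_child(parent_rows: Iterable, split: ChildSplit) -> Iterator:
    """Second pass: scan the parent partition and keep only this child's slice."""
    return islice(parent_rows, split.start, split.end)


parents = [list(range(5)), list(range(100, 103))]
counts = [len(rows) for rows in parents]  # first pass: per-partition counts
plan = plan_children(counts, rows_per_child=2)
children = [list(read_child(parents[s.parent], s)) for s in plan]
print(children)  # → [[0, 1], [2, 3], [4], [100, 101], [102]]
```

The trade-off is I/O: in this model each child re-reads its parent from the beginning up to its own slice, so a parent split into k children is partially scanned k times, on top of the counting pass.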

Re: Increase partition count (repartition) without shuffle

2015-06-18 Thread Sandy Ryza
Hi Alexander, There is currently no way to create an RDD with more partitions than its parent RDD without causing a shuffle. However, if the files are splittable, you can set the Hadoop configurations that control split size to something smaller so that the HadoopRDD ends up with more partitions.
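Sandy's suggestion works because Hadoop's new-API FileInputFormat derives a split size as max(minSize, min(maxSize, blockSize)) and, for splittable files, cuts each file into pieces of that size, letting the last piece run up to 10% larger. The sketch below mirrors that logic in plain Python as we understand it; it is a model, not Hadoop code. In Hadoop 2 the maximum is set by mapreduce.input.fileinputformat.split.maxsize for new-API input formats; old-API formats size splits somewhat differently, so the exact key depends on how the files are read.

```python
# Simplified model of how a new-API FileInputFormat turns one file into
# input splits (each split becomes one partition of the HadoopRDD).
SPLIT_SLOP = 1.1      # the last split may be up to 10% larger than split_size
LONG_MAX = 2**63 - 1  # Hadoop's default maximum split size


def compute_split_size(block_size, min_size, max_size):
    """Split size = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_length, block_size, min_size=1, max_size=LONG_MAX,
                  splittable=True):
    """Return the byte length of each split created for one file."""
    if file_length == 0 or not splittable:
        # Empty or non-splittable files become a single split.
        return [file_length]
    split_size = compute_split_size(block_size, min_size, max_size)
    lengths = []
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining:
        lengths.append(remaining)
    return lengths


MB = 1024 * 1024
# One 1 GiB file on 128 MiB blocks gives 8 splits by default...
print(len(split_lengths(1024 * MB, 128 * MB)))                    # → 8
# ...and 16 once the maximum split size is lowered to 64 MiB.
print(len(split_lengths(1024 * MB, 128 * MB, max_size=64 * MB)))  # → 16
```

Under this model, lowering the maximum split size raises the partition count with no shuffle, but only if the InputFormat used to read the files treats them as splittable; a non-splittable file always stays one partition.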

Increase partition count (repartition) without shuffle

2015-06-18 Thread Ulanov, Alexander
Hi, Is there a way to increase the number of partitions of an RDD without causing a shuffle? I've found the JIRA issue https://issues.apache.org/jira/browse/SPARK-5997, however, there is no implementation yet. Just in case, I am reading data from ~300 big binary files, which results in 300 partitions, the

Re: Random Forest driver memory

2015-06-18 Thread Joseph Bradley
Hi Isca, Could you please give more details? Data size, model parameters, stack traces / logs, etc. to help get a better picture? Thanks, Joseph On Wed, Jun 17, 2015 at 9:56 AM, Isca Harmatz wrote: > hello, > > does anyone have any help on the issue? > > > Isca > > On Tue, Jun 16, 2015 at 7:45

Re: [mllib] Refactoring some spark.mllib model classes in Python not inheriting JavaModelWrapper

2015-06-18 Thread Xiangrui Meng
Hi Yu, Reducing the code complexity on the Python side is certainly what we want to see:) We didn't call Java directly in Python models because Java methods don't work inside RDD closures, e.g., rdd.map(lambda x: model.predict(x[1])) But I agree that for model save/load the implementation should
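Xiangrui's point is that a closure such as rdd.map(lambda x: model.predict(x[1])) is serialized on the driver and shipped to Python workers, so everything it captures must survive serialization; a Python model that only wraps a JVM object carries a handle that makes sense in the driver alone. The sketch below illustrates the principle with plain pickle and hypothetical classes. PySpark uses its own closure pickler, and the threading.Lock standing in for the driver-only handle is an assumption for illustration, not how py4j objects actually behave.

```python
import pickle
import threading


class DriverOnlyHandle:
    """Hypothetical stand-in for a JVM-backed object: it holds a lock,
    which, like a live gateway connection, cannot be pickled."""

    def __init__(self):
        self._lock = threading.Lock()

    def predict(self, x):
        return 1.0 if x > 0 else 0.0


class JavaWrappedModel:
    """Delegates every call to the driver-only handle."""

    def __init__(self):
        self._java_model = DriverOnlyHandle()

    def predict(self, x):
        return self._java_model.predict(x)


class PurePythonModel:
    """Keeps only plain Python state, so it can travel inside a closure."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def predict(self, x):
        return 1.0 if x > self.threshold else 0.0


def can_ship(model):
    """True if the model survives the round trip a closure would need."""
    try:
        pickle.loads(pickle.dumps(model))
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(can_ship(PurePythonModel()), can_ship(JavaWrappedModel()))  # → True False
```

The same reasoning explains why save/load, which runs only on the driver, is a natural place to call into Java directly.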

Latency between the RDD in Streaming

2015-06-18 Thread anshu shukla
Is there any fixed way to find the latency between RDDs in stream processing systems in a distributed set-up? -- Thanks & Regards, Anshu Shukla

Re: Sidebar: issues targeted for 1.4.0

2015-06-18 Thread Nicholas Chammas
> Given fixed time, adding more TODOs generally means other stuff has to be taken out for the release. If not, then it happens de facto anyway, which is worse than managing it on purpose. +1 to this. I wouldn't mind helping go through open issues on JIRA targeted for the next release around RC ti

Re: Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Yin Huai
Is it the full stack trace? On Thu, Jun 18, 2015 at 6:39 AM, Sea <261810...@qq.com> wrote: > Hi, all: > > I want to run Spark SQL on YARN (yarn-client), but ... I already set > "spark.yarn.jar" and "spark.jars" in conf/spark-defaults.conf. > > ./bin/spark-sql -f game.sql --executor-memory 2g --nu

Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Sea
Hi, all: I want to run Spark SQL on YARN (yarn-client), but ... I already set "spark.yarn.jar" and "spark.jars" in conf/spark-defaults.conf. ./bin/spark-sql -f game.sql --executor-memory 2g --num-executors 100 > game.txt Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spar

Re: Approximate rank-based statistics (median, 95-th percentile, etc.) for Spark

2015-06-18 Thread Nick Pentreath
If it's going into the DataFrame API (which it probably should rather than in RDD itself) - then it could become a UDT (similar to HyperLogLogUDT) which would mean it doesn't have to implement Serializable, as it appears that serialization is taken care of in the UDT def (e.g. https://github.com/ap

Re: Sidebar: issues targeted for 1.4.0

2015-06-18 Thread Sean Owen
I also like using Target Version meaningfully. It might be a little much to require no Target Version = X before starting an RC. I do think it's reasonable to not start the RC with Blockers open. And here we started the RC with almost 100 TODOs for 1.4.0, most of which did not get done. Not the en