Thanks, Yin Huai.
I worked it out.
I used JDK 1.7 to build Spark 1.4.0, but my YARN cluster runs on JDK 1.6.
However, java.version in pom.xml is 1.6, which, together with the exception, confused me.
------------------ Original ------------------
From: "Yin Huai"
Date: Thu, Jun 18, 2015, 11:19
To: "Se
If you can scan the input twice, you can of course do a per-partition count and
build a custom RDD which can repartition without a shuffle.
But there is nothing off the shelf, as Sandy mentioned.
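The two-pass idea can be sketched in plain Python (this is an illustration of the logic, not actual Spark API; `split_partitions` is a hypothetical name): pass 1 counts records per partition, pass 2 splits each partition locally into smaller chunks. Because every output chunk derives from exactly one parent partition, a real custom RDD built this way would have only narrow dependencies, i.e. no shuffle.

```python
def split_partitions(partitions, target_size):
    """Split each partition (a list) into chunks of at most target_size.

    Plain-Python stand-in for the two-pass custom-RDD idea: each output
    chunk comes from exactly one parent partition (narrow dependency).
    """
    # Pass 1: per-partition counts (a cheap extra scan of the input).
    counts = [len(p) for p in partitions]

    # Pass 2: split each partition locally, preserving record order.
    new_partitions = []
    for part, n in zip(partitions, counts):
        for start in range(0, max(n, 1), target_size):
            chunk = part[start:start + target_size]
            if chunk:
                new_partitions.append(chunk)
    return new_partitions

# Example: 2 large partitions become 6 smaller ones, order preserved.
parts = [list(range(6)), list(range(6, 12))]
smaller = split_partitions(parts, target_size=2)
print(len(smaller))   # 6
print(smaller[0])     # [0, 1]
```

The first pass is only needed if the split points must be balanced across partitions; with a fixed chunk size, as above, one scan suffices.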
Regards
Mridul
On Thursday, June 18, 2015, Sandy Ryza wrote:
> Hi Alexander,
>
> There is currently no way to create an
Hi Alexander,
There is currently no way to create an RDD with more partitions than its
parent RDD without causing a shuffle.
However, if the files are splittable, you can set the Hadoop configurations
that control split size to something smaller so that the HadoopRDD ends up
with more partitions.
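A hedged sketch of what such split-size settings could look like in conf/spark-defaults.conf. The property names are the standard Hadoop ones (the old-API and new-API spellings respectively); the `spark.hadoop.*` prefix forwards a value into the job's Hadoop `Configuration`. The 32 MB value is purely illustrative:

```
spark.hadoop.mapred.max.split.size                          33554432
spark.hadoop.mapreduce.input.fileinputformat.split.maxsize  33554432
```

With a smaller maximum split size, each splittable input file yields more splits, and the HadoopRDD ends up with correspondingly more partitions.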
Hi,
Is there a way to increase the number of partitions of an RDD without causing
a shuffle? I've found JIRA issue https://issues.apache.org/jira/browse/SPARK-5997,
but there is no implementation yet.
Just in case, I am reading data from ~300 big binary files, which results in
300 partitions, the
Hi Isca,
Could you please give more details? Data size, model parameters, stack
traces / logs, etc. to help get a better picture?
Thanks,
Joseph
On Wed, Jun 17, 2015 at 9:56 AM, Isca Harmatz wrote:
> hello,
>
> does anyone have any help on this issue?
>
>
> Isca
>
> On Tue, Jun 16, 2015 at 7:45
Hi Yu,
Reducing the code complexity on the Python side is certainly what we
want to see. :) We didn't call Java directly in the Python models because
Java methods don't work inside RDD closures, e.g.,
rdd.map(lambda x: model.predict(x[1]))
But I agree that for model save/load the implementation should
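The closure problem can be illustrated in plain Python (class names here are hypothetical stand-ins, not the actual PySpark internals): RDD closures are pickled before being shipped to executors, and a JVM-backed wrapper holds a live connection object that cannot be pickled, whereas a model whose logic is re-implemented in Python pickles cleanly.

```python
import pickle
import threading

class JavaBackedModel:
    """Stand-in for a Py4J-backed wrapper: holds an unpicklable handle."""
    def __init__(self):
        self._gateway = threading.Lock()  # stands in for a live JVM socket
    def predict(self, x):
        return x * 2

class PurePythonModel:
    """Model logic re-implemented in Python, so the object pickles cleanly."""
    def __init__(self, weight):
        self.weight = weight
    def predict(self, x):
        return self.weight * x

# What rdd.map(lambda x: model.predict(x)) would attempt under the hood:
try:
    pickle.dumps(JavaBackedModel())
    shippable = True
except TypeError:
    shippable = False
print(shippable)  # False: the JVM-backed wrapper can't enter a closure

model = PurePythonModel(weight=2)
data = [1, 2, 3]  # stands in for one RDD partition
print([model.predict(x) for x in data])  # [2, 4, 6]
```

This is why the prediction path stays in Python even when save/load could delegate to Java.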
Is there any fixed way to find among RDDs in stream processing systems,
in the distributed set-up?
--
Thanks & Regards,
Anshu Shukla
> Given fixed time, adding more TODOs generally means other stuff has to be
> taken out for the release. If not, then it happens de facto anyway, which is
> worse than managing it on purpose.
+1 to this.
I wouldn't mind helping go through open issues on JIRA targeted for the
next release around RC ti
Is it the full stack trace?
On Thu, Jun 18, 2015 at 6:39 AM, Sea <261810...@qq.com> wrote:
> Hi, all:
>
> I want to run spark sql on yarn(yarn-client), but ... I already set
> "spark.yarn.jar" and "spark.jars" in conf/spark-defaults.conf.
>
> ./bin/spark-sql -f game.sql --executor-memory 2g --nu
Hi, all:
I want to run spark sql on yarn(yarn-client), but ... I already set
"spark.yarn.jar" and "spark.jars" in conf/spark-defaults.conf.
./bin/spark-sql -f game.sql --executor-memory 2g --num-executors 100 > game.txt
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark
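For reference, a sketch of the two properties mentioned above as they might appear in conf/spark-defaults.conf for yarn-client mode. The paths are placeholders, not the poster's actual values: `spark.yarn.jar` should point at the Spark assembly jar built for the cluster's Hadoop and JDK versions, and `spark.jars` lists any extra application jars.

```
spark.yarn.jar   hdfs:///apps/spark/spark-assembly-1.4.0-hadoop2.4.0.jar
spark.jars       /path/to/extra-app.jar
```

A NoClassDefFoundError at launch often indicates the assembly jar on the cluster does not match the client build (for example, built with a different JDK), which fits the JDK 1.6 vs 1.7 mismatch discussed earlier in this thread.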
If it's going into the DataFrame API (which it probably should, rather than
into RDD itself), then it could become a UDT (similar to HyperLogLogUDT),
which would mean it doesn't have to implement Serializable, as it appears
that serialization is taken care of in the UDT definition (e.g.
https://github.com/ap
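The core of the UDT approach is that the type definition itself specifies how values map to and from a plain storable representation, so the value class never needs to be (Java-)serializable on its own. A language-neutral sketch of that idea in plain Python (the class names are hypothetical; this is not the actual Spark UDT API):

```python
class Point:
    """An arbitrary value class with no serialization support of its own."""
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointUDT:
    """Defines the storage format for Point: a plain (x, y) tuple.

    All (de)serialization logic lives here, in the type definition,
    rather than in the value class.
    """
    def serialize(self, p):
        return (p.x, p.y)
    def deserialize(self, datum):
        return Point(datum[0], datum[1])

udt = PointUDT()
stored = udt.serialize(Point(1.0, 2.0))
restored = udt.deserialize(stored)
print(stored)                  # (1.0, 2.0)
print(restored.x, restored.y)  # 1.0 2.0
```

In Spark's actual UDT mechanism the storable representation is a Catalyst SQL type, but the division of responsibility is the same.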
I also like using Target Version meaningfully. It might be a little
much to require no Target Version = X before starting an RC. I do
think it's reasonable to not start the RC with Blockers open.
And here we started the RC with almost 100 TODOs for 1.4.0, most of
which did not get done. Not the en