We experienced a similar problem when implementing LDA on Spark. Now we call
RDD.checkpoint every 10 iterations to cut the lineage DAG. Note that
checkpointing hurts performance, since it submits a job to write to HDFS.
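The loop described above can be sketched as follows. This is a schematic, not the actual LDA job: the helper computes which iterations trigger a checkpoint so the pattern is runnable without a cluster, and the real Spark calls are shown in comments. All names here are illustrative.

```scala
// Sketch of the periodic-checkpoint pattern described above.
// `checkpointedIterations` computes which iterations would trigger a
// checkpoint; the commented lines show where the real Spark calls
// (sc.setCheckpointDir, rdd.checkpoint, an action) would go.
object CheckpointLoop {
  val CheckpointInterval = 10 // every 10 iterations, as in the LDA job above

  // Iterations on which a checkpoint would be written.
  def checkpointedIterations(numIterations: Int): Seq[Int] =
    (1 to numIterations).filter(_ % CheckpointInterval == 0)

  def main(args: Array[String]): Unit = {
    // In real Spark code, assuming sc is an initialized SparkContext:
    //   sc.setCheckpointDir("hdfs:///tmp/checkpoints")  // example path
    //   var model = initialModel
    //   for (i <- 1 to numIterations) {
    //     model = iterate(model).cache()
    //     if (i % CheckpointInterval == 0) {
    //       model.checkpoint()  // cuts the lineage DAG
    //       model.count()       // an action forces the checkpoint write (the HDFS job)
    //     }
    //   }
    println(checkpointedIterations(30).mkString(", "))
  }
}
```

The cache-before-checkpoint step matters: without it the RDD is recomputed once for the checkpoint write on top of the normal iteration.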
On Tue, Jan 28, 2014 at 5:15 PM, Qiuzhuang Lian qiuzhuang.l...@gmail.com wrote:
Hi Patrick,
The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1006/
I was going to import this rc5 release into our internal Maven repo to
try it out, but noticed that the version doesn't have rc5 in it.
This means that,
Hi Tathagata,
Is this code that you sent me Scala code?
yourDStream.foreachRDD(rdd => {
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN.mkString(", "))
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")
})
Yes, it was my intention to write Scala code, but I may have failed to
write code that compiles. Apologies.
Also, something to keep in mind: this is the dev mailing list for Spark
developers. Questions related to using Spark should be sent to
u...@spark.incubator.apache.org
TD
2014/1/28
I'll add my own +1.
On Tue, Jan 28, 2014 at 12:45 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey Stephen,
Yes this runs afoul of good practice in Maven where a given version
shouldn't be re-used. As far as I understand though, it is required by
the way the Apache release process works.
Okay, that makes sense. I was assuming that was the restriction.
I was thinking, as a workaround, that maybe we could publish a second
set of staging artifacts that are versioned with -rcX for people
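To illustrate that workaround: a consumer's dependency on such an -rcX staging artifact would look like an ordinary Maven dependency. The version string below is hypothetical, since (as noted earlier in the thread) the actual staged artifacts do not carry the rc suffix:

```xml
<!-- Hypothetical coordinates for an rc-suffixed staging artifact -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>0.9.0-incubating-rc5</version>
</dependency>
```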
Hi Tathagata, don't worry. I am looking for a way, in the source code of
JavaNetworkWordCount, to print to the console the total count of words in a
file, rather than one word per line.
Thanks
On 28 January 2014 at 22:36, Tathagata Das
tathagata.das1...@gmail.com wrote:
Something like this, maybe. From this example -
https://github.com/tdas/incubator-spark/blob/recoverable-example-fix/examples/src/main/java/org/apache/spark/streaming/examples/JavaRecoverableWordCount.java
wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {
  // ... (see the full example at the link above)
});
By the way, is there any plan to make a pluggable backend for
checkpointing? We might be interested in writing, for example, a
Cassandra backend.
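A pluggable backend along those lines might look like the trait below. These names and signatures are invented for illustration and are not Spark's actual checkpoint API; a Cassandra implementation would replace the in-memory map with writes to a table keyed by checkpoint ID.

```scala
// Hypothetical shape for a pluggable checkpoint backend (illustrative only).
// Spark's checkpoint writer would call write(); recovery would call read().
trait CheckpointBackend {
  def write(checkpointId: String, data: Array[Byte]): Unit
  def read(checkpointId: String): Option[Array[Byte]]
}

// In-memory stand-in. A Cassandra backend would issue an INSERT against a
// table keyed by checkpointId instead of updating a local map.
class InMemoryCheckpointBackend extends CheckpointBackend {
  private val store = scala.collection.mutable.Map.empty[String, Array[Byte]]
  override def write(checkpointId: String, data: Array[Byte]): Unit =
    store(checkpointId) = data
  override def read(checkpointId: String): Option[Array[Byte]] =
    store.get(checkpointId)
}
```

Keying on an opaque checkpoint ID keeps the interface storage-agnostic, which is the property a pluggable design would need.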
On Sat, Jan 25, 2014 at 9:49 PM, Xia, Junluan junluan@intel.com wrote:
Hi all
The description of this bug submitted by Matei is as
That would be great to add. Right now it would be easy to change it to use
another Hadoop FileSystem implementation at the very least (I think you can
just pass the URL for that), but for Cassandra you’d have to use a different
InputFormat or some direct Cassandra access API.
Matei
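A minimal sketch of the first suggestion above, assuming ssc is an initialized StreamingContext: the checkpoint location is just a Hadoop FileSystem URL, so pointing it at a different storage system is a one-line change. The bucket and path below are placeholders.

```scala
// Any URL whose scheme has a Hadoop FileSystem implementation on the
// classpath works here, e.g. S3 instead of HDFS. Placeholder bucket/path.
ssc.checkpoint("s3n://my-bucket/streaming-checkpoints")
```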