Re: Any suggestion about JIRA 1006 MLlib ALS gets stack overflow with too many iterations?

2014-01-28 Thread Cheng Lian
We experienced a similar problem when implementing LDA on Spark. We now call RDD.checkpoint every 10 iterations to cut the lineage DAG. Note that checkpointing hurts performance, since it submits a job to write to HDFS. On Tue, Jan 28, 2014 at 5:15 PM, Qiuzhuang Lian qiuzhuang.l...@gmail.com wrote:
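The effect of checkpointing every N iterations can be illustrated with a toy model. This is only a sketch in plain Java with no Spark dependency: the `Node` class and `maxLineageDepth` method are hypothetical names invented for illustration, and "cutting" the chain merely stands in for what a checkpoint does to an RDD's lineage. It does not reproduce Spark's implementation or its I/O cost.

```java
// Toy model (assumption: not Spark code). Each iteration derives a new
// dataset from the previous one, so the parent chain grows by one link.
final class LineageSketch {

    /** Hypothetical stand-in for an RDD: it only remembers its parent. */
    static final class Node {
        final Node parent; // null means "no lineage to walk back through"

        Node(Node parent) {
            this.parent = parent;
        }
    }

    /** Number of parent links reachable from the given node. */
    static int depth(Node node) {
        int d = 0;
        for (Node cur = node; cur.parent != null; cur = cur.parent) {
            d++;
        }
        return d;
    }

    /**
     * Runs the given number of iterations and returns the deepest lineage seen.
     * A checkpointInterval of 0 or less disables the simulated checkpoint.
     */
    static int maxLineageDepth(int iterations, int checkpointInterval) {
        Node current = new Node(null);
        int max = 0;
        for (int i = 1; i <= iterations; i++) {
            current = new Node(current); // one more transformation
            max = Math.max(max, depth(current));
            if (checkpointInterval > 0 && i % checkpointInterval == 0) {
                // Simulated checkpoint: keep the data, forget the parents.
                current = new Node(null);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxLineageDepth(100, 0));  // prints 100
        System.out.println(maxLineageDepth(100, 10)); // prints 10
    }
}
```

In this model the chain depth is bounded by the checkpoint interval instead of growing with the iteration count, which is why periodic checkpointing avoids the deep recursion behind the stack overflow.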

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-28 Thread Stephen Haberman
Hi Patrick, The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1006/ I was going to import this rc5 release into our internal Maven repo to try it out, but noticed that the version doesn't have rc5 in it. This means that,

Re: Print in JavaNetworkWordCount

2014-01-28 Thread Eduardo Costa Alfaia
Hi Tathagata, Is the code you sent me Scala code?
yourDStream.foreachRDD(rdd => {
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN)
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() +

Re: Print in JavaNetworkWordCount

2014-01-28 Thread Tathagata Das
Yes, I intended to write Scala code, but I may have failed to write code that compiles. Apologies. Also, something to keep in mind: this is the dev mailing list for Spark developers. Questions about using Spark should be sent to u...@spark.incubator.apache.org TD 2014/1/28

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-28 Thread Patrick Wendell
I'll add my own +1. On Tue, Jan 28, 2014 at 12:45 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Stephen, Yes this runs afoul of good practice in Maven where a given version shouldn't be re-used. As far as I understand though, it is required by the way the Apache release process works.

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-28 Thread Stephen Haberman
As far as I understand though, it is required by the way the Apache release process works. Okay, that makes sense. I was assuming that was the restriction. I was thinking that, as a workaround, maybe we could publish a second set of staging artifacts that are versioned with -rcX for people

Re: Print in JavaNetworkWordCount

2014-01-28 Thread Eduardo Costa Alfaia
Hi Tathagata, Don't worry. I am looking for a way, in the source code of JavaNetworkWordCount, to print to the console the total number of words in a file, rather than one word per line. Thanks On 28 January 2014 at 22:36, Tathagata Das tathagata.das1...@gmail.com wrote: Yes, it was my
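The difference between per-word counts and a single total can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the JavaNetworkWordCount source: the class and method names (`WordTotalSketch`, `countWords`, `totalWords`) are hypothetical, the input is an in-memory list rather than a stream, and words are assumed to be separated by whitespace.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (assumption: no Spark involved).
final class WordTotalSketch {

    /** Counts occurrences of each whitespace-separated word. */
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Collapses the per-word counts into one grand total. */
    static int totalWords(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        Map<String, Integer> counts = countWords(lines);
        // One entry per distinct word, as a word-count example would print.
        System.out.println(counts);
        // A single number for the whole input.
        System.out.println("Total words: " + totalWords(counts)); // prints Total words: 10
    }
}
```

In a streaming job the same reduction would presumably be applied to each batch's word counts before printing, so that one line is printed per batch instead of one line per word.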

Re: Print in JavaNetworkWordCount

2014-01-28 Thread Tathagata Das
Something like this, maybe. From this example - https://github.com/tdas/incubator-spark/blob/recoverable-example-fix/examples/src/main/java/org/apache/spark/streaming/examples/JavaRecoverableWordCount.java wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {

Re: Any suggestion about JIRA 1006 MLlib ALS gets stack overflow with too many iterations?

2014-01-28 Thread Evan Chan
By the way, is there any plan to make a pluggable backend for checkpointing? We might be interested in writing, for example, a Cassandra backend. On Sat, Jan 25, 2014 at 9:49 PM, Xia, Junluan junluan@intel.com wrote: Hi all The description about this Bug submitted by Matei is as

Re: Any suggestion about JIRA 1006 MLlib ALS gets stack overflow with too many iterations?

2014-01-28 Thread Matei Zaharia
That would be great to add. Right now it would be easy to change it to use another Hadoop FileSystem implementation at the very least (I think you can just pass the URL for that), but for Cassandra you’d have to use a different InputFormat or some direct Cassandra access API. Matei On Jan 28,