Re: two calls of saveAsTextFile() have different results on the same RDD

2014-04-23 Thread randylu
i got it, thanks very much :)
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/two-calls-of-saveAsTextFile-have-different-results-on-the-same-RDD-tp4578p4655.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: two calls of saveAsTextFile() have different results on the same RDD

2014-04-23 Thread Cheng Lian
ROR")
val r2 = text.map(_ split " ")
val r3 = (r1 ++ r2).collect()
Here the input file will be scanned twice unless you call .cache() on text. So if your computation involves nondeterminism (e.g. random number), you may get different
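The recomputation described above can be reproduced with a small sketch. This is not code from the thread: the object name `RecomputeDemo`, the local master setting, and the toy data are assumptions made for illustration. It runs two actions on an RDD whose lineage draws random numbers, first without and then with `.cache()`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

// Hypothetical demo object; names and settings are illustrative only.
object RecomputeDemo {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for a quick local run.
    val conf = new SparkConf().setAppName("recompute-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // A nondeterministic transformation: every evaluation draws new random numbers.
      val noisy = sc.parallelize(1 to 5, 2).map(i => (i, Random.nextInt(1000)))

      // Each action re-runs the lineage, so the two results will usually differ.
      val first = noisy.collect().toSeq
      val second = noisy.collect().toSeq
      println(s"uncached, equal? ${first == second}")

      // cache() asks Spark to keep the computed partitions in memory,
      // so later actions can reuse them instead of recomputing.
      val cached = sc.parallelize(1 to 5, 2).map(i => (i, Random.nextInt(1000))).cache()
      val third = cached.collect().toSeq
      val fourth = cached.collect().toSeq
      println(s"cached, equal? ${third == fourth}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that `cache()` is best effort: if cached partitions are evicted under memory pressure, Spark recomputes them from the lineage, so a nondeterministic computation can still change. Making the computation itself deterministic (for example, by fixing a seed per partition) is the more robust option.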

Re: two calls of saveAsTextFile() have different results on the same RDD

2014-04-23 Thread Cheng Lian
>> On Tue, Apr 22, 2014 at 11:30 AM, randylu wrote:
>>> it's ok when i call doc_topic_dist.cache() firstly.

Re: two calls of saveAsTextFile() have different results on the same RDD

2014-04-23 Thread Mayur Rustagi
ache() firstly.

Re: two calls of saveAsTextFile() have different results on the same RDD

2014-04-23 Thread Cheng Lian

Re: two calls of saveAsTextFile() have different results on the same RDD

2014-04-21 Thread randylu
it's ok when i call doc_topic_dist.cache() firstly.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/two-calls-of-saveAsTextFile-have-different-results-on-the-same-RDD-tp4578p4580.html

two calls of saveAsTextFile() have different results on the same RDD

2014-04-21 Thread randylu
(save_path)
doc_topic_dist.coalesce(1, true).saveAsTextFile(save_path + "2")
...
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/two-calls-of-saveAsTextFile-have-different-results-on-the-same-RDD-tp4578.html