I am getting strange behavior with RDDs.

All I want is to persist the RDD contents in a single file. 

saveAsTextFile() writes one text file per partition, so the output ends up
split across multiple files. I then tried rdd.coalesce(1, true).saveAsTextFile(),
which fails with the exception:

org.apache.spark.SparkException: Job aborted: Task 75.0:0 failed 1 times
(most recent failure: Exception failure: java.lang.IllegalStateException:
unread block data) 
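
For context, here is a minimal version of what I am running. The paths and
app name below are placeholders, not my real job:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("single-file-save"))
val rdd = sc.textFile("hdfs:///tmp/input")
// coalesce(1, true) shuffles everything into a single partition, so
// saveAsTextFile should produce one part-00000 file instead of many.
rdd.coalesce(1, true).saveAsTextFile("hdfs:///tmp/output")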

Then I tried collecting the RDD contents into an array and writing the array
to a file manually. That fails too: collect() returns empty arrays, even
though the data is there.

/** The below saves the data in multiple text files, so the data is definitely
there. **/
rdd.saveAsTextFile(resultDirectory)

/** The below prints size 0 for every RDD in the stream. Why?! **/
val arr = rdd.collect()
println("SIZE of RDD " + rdd.id + " " + arr.size)

Kindly help! I am clueless about how to proceed.



