Hi, I have a 2-node Spark cluster and I am trying to read data from a Cassandra cluster and save the data as a CSV file.
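For context, cachedRdd is a cached JavaRDD&lt;CassandraRow&gt; built with the spark-cassandra-connector Java API. It is created roughly like the sketch below; the app name, connection host, keyspace, and table name are placeholders, not my real values:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import com.datastax.spark.connector.japi.CassandraRow;

public class CassandraToCsv {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("cassandra-to-csv")                       // placeholder app name
                .set("spark.cassandra.connection.host", "127.0.0.1"); // placeholder host
        JavaSparkContext context = new JavaSparkContext(conf);

        // "my_keyspace" and "my_table" are placeholders for the real names
        JavaRDD<CassandraRow> cachedRdd = javaFunctions(context)
                .cassandraTable("my_keyspace", "my_table")
                .cache();

        // ... the map/save code shown below goes here ...
    }
}

Here is the part that maps the rows to CSV lines and saves the file: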
JavaRDD<String> mapPair = cachedRdd.map(new Function<CassandraRow, String>() {

    private static final long serialVersionUID = 1L;

    @Override
    public String call(CassandraRow v1) throws Exception {
        StringBuilder sb = new StringBuilder();
        sb.append(v1.getString(0));
        sb.append(",");
        sb.append(v1.getBytes(1));
        sb.append(",");
        sb.append(v1.getString(2));
        sb.append(",");
        sb.append(v1.getString(3));
        sb.append(",");
        sb.append(v1.getString(4));
        sb.append(",");
        sb.append(v1.getString(5));
        return sb.toString();
    }
});

JavaRDD<String> cachedRdd1 = mapPair.cache();
JavaRDD<String> coalescedRdd = cachedRdd1.coalesce(1);
coalescedRdd.saveAsTextFile("file:///home/echidew/cassandra/test-100.txt");
context.stop();

The problem is that the part-00000 file, containing all the records, is created inside the _temporary/task-UUID folder. As I have read and understood it, this file should end up at my output path and the temporary directory should be deleted. Is there anything I need to change in my code or environment? What could be the reason for this? Any help is appreciated.

P.S.: I am posting only the relevant code.

Thanks,
Chirag
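EDIT: To make the problem more concrete, here is roughly what I expected saveAsTextFile to leave behind after a successful job commit, versus what I actually see (the task UUID is abbreviated; note that test-100.txt is a directory, since saveAsTextFile writes one part file per partition):

# expected
/home/echidew/cassandra/test-100.txt/part-00000
/home/echidew/cassandra/test-100.txt/_SUCCESS

# actual
/home/echidew/cassandra/test-100.txt/_temporary/task-<UUID>/part-00000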