Hi, I have a 2-node Spark cluster and I am trying to read data from a Cassandra cluster and save the data as a CSV file.
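For context, cachedRdd is a cached JavaRDD&lt;CassandraRow&gt; built with the spark-cassandra-connector Java API. It is created roughly like the sketch below; the app name, connection host, keyspace, and table name are placeholders, not my real values:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import com.datastax.spark.connector.japi.CassandraRow;

public class CassandraToCsv {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("cassandra-to-csv")                       // placeholder app name
                .set("spark.cassandra.connection.host", "127.0.0.1"); // placeholder host
        JavaSparkContext context = new JavaSparkContext(conf);

        // "my_keyspace" and "my_table" are placeholders for the real names
        JavaRDD<CassandraRow> cachedRdd = javaFunctions(context)
                .cassandraTable("my_keyspace", "my_table")
                .cache();

        // ... the map/save code shown below goes here ...
    }
}

Here is the part that maps the rows to CSV lines and saves the file: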
JavaRDD<String> mapPair = cachedRdd.map(new Function<CassandraRow, String>() {

    private static final long serialVersionUID = 1L;

    @Override
    public String call(CassandraRow v1) throws Exception {
        StringBuilder sb = new StringBuilder();
        sb.append(v1.getString(0));
        sb.append(",");
        sb.append(v1.getBytes(1));
        sb.append(",");
        sb.append(v1.getString(2));
        sb.append(",");
        sb.append(v1.getString(3));
        sb.append(",");
        sb.append(v1.getString(4));
        sb.append(",");
        sb.append(v1.getString(5));
        return sb.toString();
    }
});

JavaRDD<String> cachedRdd1 = mapPair.cache();
JavaRDD<String> coalescedRdd = cachedRdd1.coalesce(1);
coalescedRdd.saveAsTextFile("file:///home/echidew/cassandra/test-100.txt");
context.stop();

The problem is that the part-00000 file, containing all the records, is created inside the _temporary/task-UUID folder. As I have read and understood it, this file should end up at my output path and the temporary directory should be deleted. Is there anything I need to change in my code or environment? What could be the reason for this? Any help is appreciated.

P.S.: I am posting only the relevant code.

Thanks,
Chirag
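EDIT: To make the problem more concrete, here is roughly what I expected saveAsTextFile to leave behind after a successful job commit, versus what I actually see (the task UUID is abbreviated; note that test-100.txt is a directory, since saveAsTextFile writes one part file per partition):

# expected
/home/echidew/cassandra/test-100.txt/part-00000
/home/echidew/cassandra/test-100.txt/_SUCCESS

# actual
/home/echidew/cassandra/test-100.txt/_temporary/task-<UUID>/part-00000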