Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-11 Thread Sathish Kumaran Vairavelu
I think you can collect the results in the driver through the toLocalIterator method of RDD and save the result in the driver program, rather than writing it to a file on the local disk and collecting it separately. If your data is small enough and if you have enough cores/memory try processing
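
A minimal sketch of the toLocalIterator approach suggested above. The DataFrame name (resultDF) and the local output path are illustrative assumptions, and the CSV formatting is naive (no quoting or escaping):

    import java.io.PrintWriter

    // rdd.toLocalIterator streams one partition at a time to the driver,
    // so only a single partition has to fit in driver memory at once.
    val writer = new PrintWriter("/tmp/result.csv")   // local file on the driver
    try {
      resultDF.rdd.toLocalIterator.foreach { row =>
        writer.println(row.mkString(","))             // naive CSV line
      }
    } finally {
      writer.close()
    }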

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-11 Thread Steve Loughran
On 10 Aug 2017, at 09:51, Hemanth Gudela wrote: Yeah, installing HDFS in our environment is unfortunately going to take a lot of time (approvals/planning etc). I will have to live with the local FS for now. The other option I had

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Yeah, installing HDFS in our environment is unfortunately going to take a lot of time (approvals/planning etc). I will have to live with the local FS for now. The other option I had already tried is collect(), sending everything to the driver node. But my data volume is too large for the driver node to handle

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Femi Anthony
Also, why are you trying to write results locally if you're not using a distributed file system? Spark is geared towards writing to a distributed file system. I would suggest trying to collect() so the data is sent to the master, and then do a write if the result set isn't too big, or
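
A minimal sketch of the collect()-then-write approach suggested above, assuming the result set fits comfortably in driver memory. The DataFrame name (resultDF) and the output path are illustrative, and the CSV formatting is naive (no quoting or escaping):

    import java.nio.charset.StandardCharsets
    import java.nio.file.{Files, Paths}

    val rows = resultDF.collect()                        // pulls every row to the driver
    val csv  = rows.map(_.mkString(",")).mkString("\n")  // naive CSV
    Files.write(Paths.get("/tmp/result.csv"), csv.getBytes(StandardCharsets.UTF_8))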

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Yes, I have tried with file:/// and the full path, as well as just the full path without the file:/// prefix. The Spark session has been closed, no luck though ☹ Regards, Hemanth From: Femi Anthony Date: Thursday, 10 August 2017 at 11.06 To: Hemanth Gudela

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Femi Anthony
Is your filePath prefaced with file:/// and the full path, or is it relative? You might also try calling close() on the Spark context or session at the end of the program execution to try and ensure that cleanup is completed. Sent from my iPhone > On Aug 10, 2017, at 3:58 AM, Hemanth Gudela
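
A minimal sketch of the two suggestions above: write with a fully qualified file:/// URI and shut the session down at the end so the committer can finish its cleanup. The DataFrame name (myDataFrame) and the path are illustrative; SparkSession.stop() is used here, which close() delegates to:

    // absolute local path with an explicit file:/// scheme
    myDataFrame.write
      .mode("overwrite")
      .csv("file:///data/output/myFilePath")

    // stop the session at the end of the program
    spark.stop()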

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Thanks for the reply, Femi! I’m writing the file like this --> myDataFrame.write.mode("overwrite").csv("myFilePath") There are absolutely no errors/warnings after the write. A _SUCCESS file is created on the master node, but the _temporary problem is noticed only on the worker nodes. I know

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Femi Anthony
Normally the _temporary directory gets deleted as part of the cleanup when the write is complete and a _SUCCESS file is created. I suspect that the writes are not being properly completed. How are you specifying the write? Any error messages in the logs? On Thu, Aug 10, 2017 at 3:17 AM, Hemanth
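
A small sketch of checking for that marker after a job finishes, to confirm whether the committer actually completed the write; the output path is illustrative:

    import java.nio.file.{Files, Paths}

    // The committer creates a _SUCCESS marker in the output directory once the
    // job commits; if it is missing, the task output under _temporary was never moved.
    val committed = Files.exists(Paths.get("/data/output/myFilePath/_SUCCESS"))
    println(s"write committed: $committed")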

spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Hi, I’m running Spark in cluster mode on 4 nodes, and trying to write CSV files to a node’s local path (not HDFS). I’m using spark.write.csv to write the CSV files. On the master node, spark.write.csv creates a folder with the CSV file name and writes many files with a part-r-000n suffix. This is okay for
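
A minimal sketch of the write being described, with illustrative names and a placeholder DataFrame. Without a filesystem shared by all nodes (HDFS, NFS, S3, ...), each executor commits its task output to its own local disk, which is why only the node running the driver ends up with a complete output directory:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("csv-local-write")
      .getOrCreate()

    val myDataFrame = spark.range(100).toDF("id")   // placeholder data

    // Produces a directory of part-* files plus a _SUCCESS marker on the driver's disk;
    // with only a node-local path, worker output stays on each worker's own disk.
    myDataFrame.write
      .mode("overwrite")
      .csv("file:///data/output/myFilePath")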