Normally the *_temporary* directory gets deleted as part of the cleanup when the write is complete and a _SUCCESS file is created. I suspect that the writes are not being properly completed. How are you specifying the write? Are there any error messages in the logs?
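One way to check whether a write actually committed is to look for exactly the two markers described above: a _SUCCESS file should exist and the _temporary directory should be gone. A minimal stdlib-only sketch (the helper name `write_completed` is illustrative, not a Spark API):

```python
import os

def write_completed(output_dir: str) -> bool:
    """Heuristic check that a Spark file-based write finished its commit.

    On a successful commit the output committer writes a _SUCCESS marker
    into the output directory and removes the _temporary working directory.
    If _temporary is still present, the job (or that node's tasks) never
    reached the commit phase.
    """
    has_success = os.path.isfile(os.path.join(output_dir, "_SUCCESS"))
    has_temporary = os.path.isdir(os.path.join(output_dir, "_temporary"))
    return has_success and not has_temporary
```

Running this against the output folder on each node would tell you which nodes only got as far as `_temporary/0/task_xxx` and therefore never committed, which is consistent with writing to a node-local path instead of a shared filesystem.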
On Thu, Aug 10, 2017 at 3:17 AM, Hemanth Gudela <hemanth.gud...@qvantel.com> wrote:
> Hi,
>
> I’m running Spark in cluster mode with 4 nodes, and I am trying to write CSV files to each node’s local path (*not HDFS*).
>
> I’m using spark.write.csv to write the CSV files.
>
> *On the master node*:
> spark.write.csv creates a folder with the CSV file name and writes many files with a part-r-000n suffix. This is okay for me; I can merge them later.
>
> *But on worker nodes*:
> spark.write.csv creates a folder with the CSV file name and writes many folders and files under _temporary/0/. This is not okay for me.
>
> Could someone please suggest what could be going wrong in my settings, or how I can write CSV files to the specified folder rather than to subfolders (_temporary/0/task_xxx) on the worker machines?
>
> Thank you,
> Hemanth

--
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre minds." - Albert Einstein.