Normally the *_temporary* directory gets deleted as part of the cleanup
when the write is complete and a _SUCCESS file is created. I suspect that
the writes are not being completed properly. How are you specifying the write?
Are there any error messages in the logs?
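
To check whether a write was committed, you can look in the output folder
for the _SUCCESS marker and for a leftover _temporary directory. Below is a
minimal sketch in plain Python, assuming the output folder is on the local
filesystem of the node where you run it. The function name
inspect_output_dir and the keys it returns are hypothetical, chosen for
illustration; this is not part of Spark's API.

```python
import os
import tempfile


def inspect_output_dir(path):
    """Summarize what a Spark-style output folder contains.

    Hypothetical helper: it only lists the folder on the local filesystem
    and does not talk to Spark.
    """
    entries = set(os.listdir(path))
    return {
        # Marker file written once the job has committed its output.
        "has_success_marker": "_SUCCESS" in entries,
        # Staging directory that is normally removed during cleanup.
        "has_temporary_dir": os.path.isdir(os.path.join(path, "_temporary")),
        # Committed part files sitting directly in the output folder.
        "part_files": sorted(e for e in entries if e.startswith("part-")),
    }


# Demo on a throwaway folder that mimics an uncommitted write.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "_temporary", "0"))
print(inspect_output_dir(demo_dir))
```

If has_temporary_dir is true and has_success_marker is false on a node, the
output there was most likely never committed, which would match what you
describe on the workers.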

On Thu, Aug 10, 2017 at 3:17 AM, Hemanth Gudela <hemanth.gud...@qvantel.com>
wrote:

> Hi,
>
>
>
> I’m running Spark in cluster mode on 4 nodes, and trying to write
> CSV files to each node’s local path (*not HDFS*).
>
> I’m using spark.write.csv to write CSV files.
>
>
>
> *On master node*:
>
> spark.write.csv creates a folder with csv file name and writes many files
> with part-r-000n suffix. This is okay for me, I can merge them later.
>
> *But on worker nodes*:
>
> spark.write.csv creates a folder with csv file name and
> writes many folders and files under _temporary/0/. This is not okay for me.
>
> Could someone please suggest what might be wrong in my settings, or how
> to write CSV files to the specified folder and not to subfolders
> (_temporary/0/task_xxx) on the worker machines?
>
>
>
> Thank you,
>
> Hemanth
>
>
>



-- 
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre
minds." - Albert Einstein.
