Yes, I have tried file:/// with the full path, as well as just the full path 
without the file:/// prefix.
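That is, I tried variants along these lines (the paths below are just placeholders, not my real ones):

  myDataFrame.write.mode("overwrite").csv("file:///home/user/output")   // file:/// + full path
  myDataFrame.write.mode("overwrite").csv("/home/user/output")          // plain full path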
The Spark session has been closed as well, but still no luck ☹

Regards,
Hemanth

From: Femi Anthony <femib...@gmail.com>
Date: Thursday, 10 August 2017 at 11.06
To: Hemanth Gudela <hemanth.gud...@qvantel.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: spark.write.csv is not able to write files to specified path, but is 
writing to unintended subfolder _temporary/0/task_xxx on worker nodes

Is your filePath prefaced with file:/// and the full path, or is it relative?

You might also try calling close() on the Spark context or session at the end of 
the program execution, to try to ensure that cleanup is completed.
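For example, something like this at the very end of the job (a minimal sketch, assuming a SparkSession named spark):

  spark.stop()   // stops the session and the underlying SparkContext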

Sent from my iPhone

On Aug 10, 2017, at 3:58 AM, Hemanth Gudela <hemanth.gud...@qvantel.com> wrote:
Thanks for the reply, Femi!

I’m writing the file like this:
  myDataFrame.write.mode("overwrite").csv("myFilePath")
There are absolutely no errors/warnings after the write.

The _SUCCESS file is created on the master node, but the leftover _temporary 
folders show up only on the worker nodes.

I know spark.write.csv works best with HDFS, but with the current setup in my 
environment I have to make Spark write to the node’s local file system and not 
to HDFS.
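As a fallback I’m considering collecting the data to the driver and writing a single local CSV there. A rough sketch (untested; it assumes the data fits in driver memory, and the output path is a placeholder):

  import java.io.PrintWriter

  val rows = myDataFrame.collect()                   // pull all rows to the driver
  val out = new PrintWriter("/tmp/myFile.csv")       // placeholder driver-local path
  try {
    out.println(myDataFrame.columns.mkString(","))   // header row
    rows.foreach(r => out.println(r.mkString(",")))  // naive CSV: no quoting/escaping
  } finally {
    out.close()
  }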

Regards,
Hemanth

From: Femi Anthony <femib...@gmail.com>
Date: Thursday, 10 August 2017 at 10.38
To: Hemanth Gudela <hemanth.gud...@qvantel.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: spark.write.csv is not able to write files to specified path, but is 
writing to unintended subfolder _temporary/0/task_xxx on worker nodes

Normally the _temporary directory gets deleted as part of the cleanup when the 
write is complete and a _SUCCESS file is created. I suspect that the writes are 
not completing properly. How are you specifying the write? Any error messages 
in the logs?
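For comparison, a write to the local filesystem would typically look something like this (the path is just an example):

  df.write.mode("overwrite").csv("file:///some/local/path")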

On Thu, Aug 10, 2017 at 3:17 AM, Hemanth Gudela <hemanth.gud...@qvantel.com> wrote:
Hi,

I’m running Spark in cluster mode on 4 nodes, and trying to write CSV files to 
each node’s local path (not HDFS).
I’m using spark.write.csv to write the CSV files.

On the master node:
  spark.write.csv creates a folder with the csv file name and writes many files 
with a part-r-000n suffix. This is okay for me; I can merge them later.
But on the worker nodes:
  spark.write.csv creates a folder with the csv file name and writes many 
folders and files under _temporary/0/. This is not okay for me.
Could someone please suggest what might be going wrong in my settings, and how 
I can write csv files to the specified folder rather than to subfolders 
(_temporary/0/task_xxx) on the worker machines?

Thank you,
Hemanth




--
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre minds." 
- Albert Einstein.
