What version of Spark are you using, and how many output files does the
job write out?
By default, Spark versions before 1.6.0 write Parquet summary files when
committing the job. This process reads the footers of all Parquet files
in the destination directory and merges them together. It can be
particularly costly if you are appending a small amount of data to a
large existing Parquet dataset.
If that's the case, you can disable Parquet summary files by setting the
Hadoop configuration "parquet.enable.summary-metadata" to false.
We've disabled it by default since 1.6.0.
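From the Spark shell, the setting above can be applied through the SparkContext's Hadoop configuration before writing (a minimal sketch; `sc` is the shell's default SparkContext, and `df` and the output path are placeholders for your own DataFrame and destination):

```scala
// Disable Parquet summary files (_metadata / _common_metadata) so that
// the job-commit phase skips reading and merging footers from every
// Parquet file already in the destination directory.
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

// Subsequent Parquet writes then commit without the footer-merging step.
df.write.mode("append").parquet("hdfs:///path/to/existing/dataset")
```

Note this only skips writing the summary files; readers that relied on them will fall back to reading individual file footers instead.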
Cheng
On 10/21/16 1:47 PM, Chetan Khatri wrote:
Hello Spark Users,
I am writing around 10 GB of processed data to Parquet on a Google Cloud
machine with 1 TB of HDD, 102 GB of RAM, and 16 vCores.
Every time I write to Parquet, the Spark UI shows the stages as
succeeded, but the Spark shell holds the context in wait mode for almost
10 minutes; then it clears the broadcast and accumulator shared
variables.
Can we speed this up?
Thanks.
--
Yours Aye,
Chetan Khatri.
M.+91 76666 80574
Data Science Researcher
INDIA