Hey Gavin,

Could you please provide a snippet of your code showing how you disabled "parquet.enable.summary-metadata" and wrote the files? In particular, you mentioned that you saw "3000 jobs" fail. Were you writing each Parquet file with an individual job? (Usually people use write.partitionBy(...).parquet(...) to write multiple Parquet files.)
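That is, assuming a DataFrame df and a made-up output path, something like:

    // a single write job; output is laid out as one directory per distinct "id" value
    df.write.partitionBy("id").parquet("/path/to/output")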

Cheng

On 1/10/16 10:12 PM, Gavin Yue wrote:
Hey,

I am trying to convert a bunch of JSON files into Parquet, which would output over 7000 Parquet files. That is too many files, so I want to repartition by id down to 3000.

But I ran into a GC problem like the one described here: https://mail-archives.apache.org/mod_mbox/spark-user/201512.mbox/%3CCAB4bC7_LR2rpHceQw3vyJ=l6xq9+9sjl3wgiispzyfh2xmt...@mail.gmail.com%3E#archives

So I set parquet.enable.summary-metadata to false. But when I call write.parquet, I still see the 3000 jobs run after the Parquet write, and they fail due to GC.
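In case it helps, the pattern I am using is roughly this (simplified, run in spark-shell where sc and sqlContext are predefined; paths are placeholders, and the repartition-by-column overload assumes Spark 1.6):

    // disable the Parquet summary files (_metadata / _common_metadata)
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

    // read the JSON input (placeholder path)
    val df = sqlContext.read.json("/path/to/json")

    // repartition by id down to 3000 partitions, then write Parquet
    df.repartition(3000, df("id")).write.parquet("/path/to/parquet")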

Basically, repartition has never succeeded for me. Are there any other settings that could be optimized?

Thanks,
Gavin

