You can try this for G1GC:
.../spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
-XX:+UseCompressedOops -XX:-UseGCOverheadLimit" ...
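
If you do try G1, it also helps to turn on GC logging on the executors so you can see what the collector is actually doing (these are standard HotSpot flags for the Java 7/8 JVMs Spark 1.4 typically runs on; adjust for your own JVM and cluster):

.../spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" ...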

However, I would suggest first making sure the job itself is properly tuned.
If a task is spending 60% of its time in GC, the collector itself is probably
not the problem; more likely something in the job is generating enough garbage
to make GC thrash. Yes, G1GC can give better GC performance, but it may not
address the root cause of your job, and there is no magic fix-all
configuration. Based on the info you've outlined, I would start by looking at
the Spark UI after the combineByKey step and first confirming that the data is
still well balanced across all partitions.
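
As a rough illustration only (run in spark-shell where sc already exists; the (String, Long) pair RDD and the combiner functions are placeholders I'm assuming, not your actual case-class types), this is the shape of a combineByKey that pre-aggregates map-side, followed by a quick per-partition record count to spot skew:

// Placeholder data -- substitute the pair RDD you build from the ORC DataFrame.
val pairs = sc.parallelize(Seq(("a", 1L), ("b", 2L), ("a", 3L), ("c", 4L)), 4)

// combineByKey aggregates map-side before the shuffle, so each partition
// emits one combiner per key instead of one object per record.
val combined = pairs.combineByKey(
  (v: Long) => v,                   // createCombiner
  (acc: Long, v: Long) => acc + v,  // mergeValue
  (a: Long, b: Long) => a + b)      // mergeCombiners

// Rough skew check: count the records in each partition after the shuffle.
// If one or two partitions hold most of the data, those tasks are the ones
// that end up burning their time in GC.
combined
  .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
  .collect()
  .sortBy(_._2)
  .foreach { case (idx, n) => println(s"partition $idx -> $n records") }

If those counts are very lopsided, repartitioning or rethinking the key will help far more than any GC flag.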

HTH,
Duc



On Sat, Nov 14, 2015 at 11:03 PM, Renu Yadav <yren...@gmail.com> wrote:

> I have tried with G1 GC. If anyone can share their GC settings, please do.
> At the code level I am:
> 1. reading an ORC table using a DataFrame
> 2. mapping the DataFrame to an RDD of my case class
> 3. converting that RDD to a paired RDD
> 4. applying combineByKey
> 5. saving the result to an ORC file
>
> Please suggest
>
> Regards,
> Renu Yadav
>
> On Fri, Nov 13, 2015 at 8:01 PM, Renu Yadav <yren...@gmail.com> wrote:
>
>> I am using Spark 1.4 and my application is spending a lot of time in GC,
>> around 60-70% of the time for each task.
>>
>> I am using the parallel GC.
>> Could somebody please help as soon as possible?
>>
>> Thanks,
>> Renu
>>
>
>
