You're right, I think spark.memory.storageFraction is somewhat better for
controlling this, although some of what's 'counted' in spark.memory.fraction
will also be long-lived and end up in the OldGen.
You can also increase the OldGen size if you're pretty sure that's the
issue - long-lived ('old') objects being promoted out of the YoungGen.

I'm not sure how much these will affect performance with modern JVMs; this
advice is 5-9 years old.
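
To make that concrete, here is a rough sketch of what I mean - the values
are placeholders, not recommendations (the defaults are
spark.memory.fraction=0.6 and spark.memory.storageFraction=0.5):

    # spark-defaults.conf (or pass each as --conf to spark-submit)
    # Shrink the storage share of the unified region; execution can still
    # borrow from (and evict) storage, so execution memory isn't reduced.
    spark.memory.storageFraction     0.3
    # And/or grow the OldGen: NewRatio=3 means OldGen is ~3/4 of the heap
    # (equivalently, lower -Xmn if you're setting it explicitly).
    spark.executor.extraJavaOptions  -XX:NewRatio=3

Note that pinning the young generation size (-Xmn or NewRatio) limits the
adaptive sizing that G1, the default collector on recent JVMs, would
otherwise do - which is part of why I'd take the old advice with a grain
of salt.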

On Tue, Jul 20, 2021 at 5:39 PM Kuznetsov, Oleksandr
<olkuznet...@deloitte.com.invalid> wrote:

> Hello,
>
>
>
> I was reading the Garbage Collection Tuning guide here: Tuning - Spark
> 3.1.2 Documentation
> <https://spark.apache.org/docs/3.1.2/tuning.html#garbage-collection-tuning>,
> specifically the section on “Advanced GC Tuning”. It states that if the
> OldGen region is getting full, it is recommended to lower
> *spark.memory.fraction*. I am wondering if this would lower the overall
> amount of memory available for both storage and execution, slowing down
> execution. Isn’t it better to lower *spark.memory.storageFraction*
> instead? In that case there is less memory available for caching objects,
> while execution is not affected. Please see below a copy of the passage I
> am referring to:
>
>
>
> “In the GC stats that are printed, if the OldGen is close to
> being full, reduce the amount of memory used for caching by lowering
> spark.memory.fraction; it is better to cache fewer objects than to slow
> down task execution. Alternatively, consider decreasing the size of the
> Young generation. This means lowering -Xmn if you’ve set it as above. If
> not, try changing the value of the JVM’s NewRatio parameter. Many JVMs
> default this to 2, meaning that the Old generation occupies 2/3 of the
> heap. It should be large enough such that this fraction exceeds
> spark.memory.fraction.”
>
> I would greatly appreciate if you could clarify it for me.