You're right, I think spark.memory.storageFraction is the better knob for this, although some of what is 'counted' in spark.memory.fraction will also be long-lived and end up in the OldGen. You can also increase the OldGen size if you're pretty sure the issue is 'old' objects sitting in the YoungGen.
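For what it's worth, a minimal sketch (assuming a plain SparkSession; the 0.3 value is only illustrative, not a recommendation) of turning down spark.memory.storageFraction while leaving spark.memory.fraction alone, so caching gets squeezed but execution memory is not reduced:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only. spark.memory.fraction (default 0.6) sizes the unified
// execution + storage pool; spark.memory.storageFraction (default 0.5) caps
// the part of that pool protected for cached blocks. Lowering the latter
// shrinks caching without shrinking the memory available to execution.
val spark = SparkSession.builder()
  .appName("unified-memory-tuning-sketch")
  .config("spark.memory.fraction", "0.6")          // keep the pool size unchanged
  .config("spark.memory.storageFraction", "0.3")   // smaller protected share for caching
  .getOrCreate()
```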
I'm not sure how much these will affect performance with modern JVMs; this advice is 5-9 years old.

On Tue, Jul 20, 2021 at 5:39 PM Kuznetsov, Oleksandr <olkuznet...@deloitte.com.invalid> wrote:

> Hello,
>
> I was reading the Garbage Collection Tuning guide here: Tuning - Spark 3.1.2 Documentation (apache.org)
> <https://spark.apache.org/docs/3.1.2/tuning.html#garbage-collection-tuning>,
> specifically the section on “Advanced GC Tuning”. It is stated that if the OldGen
> region is getting full, it is recommended to lower *spark.memory.fraction*.
> I am wondering if this would lower the overall amount of memory available
> for both storage and execution, slowing down execution. Isn’t it better to
> lower *spark.memory.storageFraction* instead? In this case there is less
> memory available for caching objects, while execution is not affected.
> Please see below the copy of the passage I am referring to:
>
> “In the GC stats that are printed, if the OldGen is close to
> being full, reduce the amount of memory used for caching by lowering
> spark.memory.fraction; it is better to cache fewer objects than to slow
> down task execution. Alternatively, consider decreasing the size of the
> Young generation. This means lowering -Xmn if you’ve set it as above. If
> not, try changing the value of the JVM’s NewRatio parameter. Many JVMs
> default this to 2, meaning that the Old generation occupies 2/3 of the
> heap. It should be large enough such that this fraction exceeds
> spark.memory.fraction.”
>
> I would greatly appreciate it if you could clarify this for me.
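For completeness, a hedged sketch of how the JVM flags the guide mentions might be passed to executors. It uses spark.executor.extraJavaOptions (a standard Spark config, though in cluster deployments it is often passed with spark-submit --conf instead); the NewRatio value is purely illustrative. Note that the heap size itself must be set via spark.executor.memory, since -Xmx is not allowed in extraJavaOptions.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: shrink the Young generation on executors so the Old generation
// gets a larger share of the heap. NewRatio=3 means OldGen takes 3/4 of the heap
// (default is 2, i.e. 2/3); -Xmn could be used instead for an absolute YoungGen
// size. -verbose:gc turns on the GC logging the guide's "GC stats" refer to.
val spark = SparkSession.builder()
  .appName("gc-tuning-sketch")
  .config("spark.executor.memory", "8g")  // heap size; cannot be set via extraJavaOptions
  .config("spark.executor.extraJavaOptions", "-XX:NewRatio=3 -verbose:gc")
  .getOrCreate()
```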