Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/14732#discussion_r75722287
--- Diff: docs/tuning.md ---
@@ -217,14 +204,22 @@ temporary objects created during task execution. Some steps which may be useful
* Check if there are too many garbage collections by collecting GC stats. If a full GC is invoked multiple times for
before a task completes, it means that there isn't enough memory available for executing tasks.
-* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of
- memory used for caching by lowering `spark.memory.storageFraction`; it is better to cache fewer
- objects than to slow down task execution!
-
* If there are too many minor collections but not many major GCs, allocating more memory for Eden would help. You
can set the size of the Eden to be an over-estimate of how much memory each task will need. If the size of Eden
is determined to be `E`, then you can set the size of the Young generation using the option `-Xmn=4/3*E`. (The scaling
up by 4/3 is to account for space used by survivor regions as well.)
+
+* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of
+ memory used for caching by lowering `spark.memory.fraction`; it is better to cache fewer
+ objects than to slow down task execution. Alternatively, consider decreasing the size of
+ the Young generation. This means lowering `-Xmn` if you've set it as above. If not, try changing the
+ value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, meaning that the Old generation
+ occupies 2/3 of the heap. It should be large enough such that this fraction exceeds `spark.memory.fraction`.
--- End diff --
Do we need to keep the following paragraph?
```
So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. A value of
0.6 for `spark.memory.fraction` keeps storage and execution memory within the old generation with
room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then `NewRatio` may have to
increase to 6 or more.
```
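For reference, the arithmetic behind that paragraph can be sketched as below. This is only an illustration, not part of the PR: the helper names `old_gen_fraction` and `min_new_ratio` are hypothetical, and the sketch assumes the old generation takes `NewRatio / (NewRatio + 1)` of the heap and compares that directly against `spark.memory.fraction`, ignoring the memory Spark reserves before applying the fraction.

```python
def old_gen_fraction(new_ratio: int) -> float:
    """Share of the heap given to the old generation for a given -XX:NewRatio.

    NewRatio is the old:young size ratio, so old / (old + young) = r / (r + 1).
    """
    if new_ratio < 1:
        raise ValueError("new_ratio must be a positive integer")
    return new_ratio / (new_ratio + 1)


def min_new_ratio(memory_fraction: float) -> int:
    """Smallest integer NewRatio whose old-gen share strictly exceeds memory_fraction.

    Hypothetical helper for illustration; it leaves no headroom beyond the
    strict inequality, so a larger value may be preferable in practice.
    """
    if not 0 < memory_fraction < 1:
        raise ValueError("memory_fraction must be strictly between 0 and 1")
    ratio = 1
    while old_gen_fraction(ratio) <= memory_fraction:
        ratio += 1
    return ratio


if __name__ == "__main__":
    for fraction in (0.6, 0.8):
        r = min_new_ratio(fraction)
        print(f"spark.memory.fraction={fraction}: NewRatio >= {r} "
              f"(old gen share {old_gen_fraction(r):.3f})")
```

Under these assumptions the default `NewRatio` of 2 already covers a fraction of 0.6, while 0.8 needs at least 5 to exceed it strictly; the 6 mentioned in the paragraph gives an old-generation share of about 0.857, which leaves some room to spare.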