Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14732#discussion_r75727927
  
    --- Diff: docs/tuning.md ---
    @@ -217,14 +204,22 @@ temporary objects created during task execution. Some steps which may be useful
     * Check if there are too many garbage collections by collecting GC stats. If a full GC is invoked multiple times
       before a task completes, it means that there isn't enough memory available for executing tasks.
     
    -* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of
    -  memory used for caching by lowering `spark.memory.storageFraction`; it is better to cache fewer
    -  objects than to slow down task execution!
    -
     * If there are too many minor collections but not many major GCs, allocating more memory for Eden would help. You
       can set the size of the Eden to be an over-estimate of how much memory each task will need. If the size of Eden
       is determined to be `E`, then you can set the size of the Young generation using the option `-Xmn=4/3*E`. (The scaling
       up by 4/3 is to account for space used by survivor regions as well.)
    +  
    +* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of
    +  memory used for caching by lowering `spark.memory.fraction`; it is better to cache fewer
    +  objects than to slow down task execution. Alternatively, consider decreasing the size of
    +  the Young generation. This means lowering `-Xmn` if you've set it as above. If not, try changing the
    +  value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, meaning that the Old generation
    +  occupies 2/3 of the heap. It should be large enough such that this fraction exceeds `spark.memory.fraction`.
    --- End diff --
    
    I tried to retain all of those ideas but reworded them, because the section
    I moved this to already contains some of the same discussion. I believe the
    current text still captures the main point: an old generation nearly full of
    cached data indicates that `spark.memory.fraction` (not just the fraction
    reserved for storage) could be reduced. This section also discusses `-Xmn`,
    which does something similar to `NewRatio`, so I tried to weave them into
    one coherent paragraph.
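    
    As a concrete illustration (a minimal sketch, not part of the PR, using
    hypothetical sizes that assume each task's Eden requirement `E` has been
    estimated at about 3g), these knobs could be set together through `SparkConf`:
    
    ```scala
    import org.apache.spark.SparkConf
    
    // Hypothetical sizing: assume each task needs roughly E = 3g of Eden, so the
    // Young generation is set to 4/3 * E = 4g. With the JVM default NewRatio=2,
    // the Old generation occupies 2/3 of the heap, which stays above
    // spark.memory.fraction (0.6 by default), as the doc text recommends.
    val conf = new SparkConf()
      .set("spark.executor.memory", "12g")               // total executor heap
      .set("spark.memory.fraction", "0.6")               // execution + storage share of heap
      .set("spark.executor.extraJavaOptions", "-Xmn4g")  // Young generation = 4/3 * E
    ```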

