Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14732#discussion_r75722253
  
    --- Diff: docs/tuning.md ---
    @@ -217,14 +204,22 @@ temporary objects created during task execution. Some steps which may be useful
     * Check if there are too many garbage collections by collecting GC stats. If a full GC is invoked multiple times
       before a task completes, it means that there isn't enough memory available for executing tasks.
     
    -* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of
    -  memory used for caching by lowering `spark.memory.storageFraction`; it is better to cache fewer
    -  objects than to slow down task execution!
    -
     * If there are too many minor collections but not many major GCs, allocating more memory for Eden would help. You
       can set the size of the Eden to be an over-estimate of how much memory each task will need. If the size of Eden
       is determined to be `E`, then you can set the size of the Young generation using the option `-Xmn=4/3*E`. (The scaling
       up by 4/3 is to account for space used by survivor regions as well.)
    +
    +* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of
    +  memory used for caching by lowering `spark.memory.fraction`; it is better to cache fewer
    +  objects than to slow down task execution. Alternatively, consider decreasing the size of
    +  the Young generation. This means lowering `-Xmn` if you've set it as above. If not, try changing the
    +  value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, meaning that the Old generation
    +  occupies 2/3 of the heap. It should be large enough such that this fraction exceeds `spark.memory.fraction`.
    +
    +* Try the G1GC garbage collector with `-XX:+UseG1GC`. It can improve performance in some situations where
    +  garbage collection is a bottleneck. Note that with large executor heap sizes, it may be important to
    +  increase the [G1 region size](https://blogs.oracle.com/g1gc/entry/g1_gc_tuning_a_case)
    +  with `-XX:G1HeapRegionSize`.
    --- End diff --
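
    The `-Xmn` recommendation in the diff above is plain arithmetic; a quick sketch (the function name and the 3000 MB figure are illustrative, not from the docs):

    ```python
    # Young-generation sizing per the rule above: -Xmn = 4/3 * E, where E is an
    # over-estimate of the Eden space each task needs. The extra 1/3 accounts
    # for the survivor spaces that sit alongside Eden in the young generation.

    def young_gen_size_mb(eden_estimate_mb: int) -> int:
        """Return the -Xmn value (in MB) implied by the 4/3 * E rule."""
        return eden_estimate_mb * 4 // 3

    # For an estimated 3000 MB of Eden per task:
    print(f"-Xmn={young_gen_size_mb(3000)}m")  # -Xmn=4000m
    ```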
    
    Do we need to keep the following paragraph?
    ```
    So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. A value of
    0.6 for `spark.memory.fraction` keeps storage and execution memory within the old generation with
    room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then `NewRatio` may have to
    increase to 6 or more.
    ```
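
    The paragraph's numbers do check out: with `-XX:NewRatio=N`, the old generation gets N/(N+1) of the heap, so one can compute the smallest `NewRatio` whose old-generation share exceeds a given `spark.memory.fraction`. A sketch (helper names are mine, not Spark's or the JVM's):

    ```python
    # NewRatio arithmetic: with -XX:NewRatio=N, old:young = N:1, so the old
    # generation occupies N / (N + 1) of the heap.

    def old_gen_fraction(new_ratio: int) -> float:
        return new_ratio / (new_ratio + 1)

    def min_new_ratio(spark_memory_fraction: float) -> int:
        """Smallest integer NewRatio whose old-gen share strictly exceeds
        spark.memory.fraction."""
        n = 1
        while old_gen_fraction(n) <= spark_memory_fraction:
            n += 1
        return n

    print(old_gen_fraction(2))  # 0.666... -- comfortably above the 0.6 default
    print(min_new_ratio(0.8))   # 5 is the strict minimum; the text suggests 6+ for headroom
    ```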

