GitHub user sryza opened a pull request:

    https://github.com/apache/spark/pull/2504

    SPARK-3172 and SPARK-3577

    The posted patch addresses both SPARK-3172 and SPARK-3577.  It renames 
ShuffleWriteMetrics to WriteMetrics and uses it for tracking all three of 
shuffle write, spilling on the fetch side, and spilling on the write side 
(which only occurs during sort-based shuffle).
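
    To make the proposed structure concrete, here is a minimal sketch, not 
the code in the patch.  Only the names WriteMetrics and shuffleReadSpillMetrics 
come from this description; the fields bytesWritten and writeTime, the 
shuffleWriteSpillMetrics name, and the TaskMetricsSketch container are 
assumptions made for illustration.

```scala
// Hypothetical sketch of the restructuring; field names and the container
// class are illustrative assumptions, not copied from the patch.
class WriteMetrics extends Serializable {
  @volatile var bytesWritten: Long = 0L // bytes written to disk
  @volatile var writeTime: Long = 0L    // time spent blocked on writes (ns)
}

class TaskMetricsSketch extends Serializable {
  // One WriteMetrics instance per kind of disk write a task can perform.
  var shuffleWriteMetrics: Option[WriteMetrics] = None      // regular shuffle output
  var shuffleReadSpillMetrics: Option[WriteMetrics] = None  // spills on the fetch side
  var shuffleWriteSpillMetrics: Option[WriteMetrics] = None // spills on the write side (sort-based shuffle)

  // Total bytes written across all three categories that are present.
  def totalBytesWritten: Long =
    Seq(shuffleWriteMetrics, shuffleReadSpillMetrics, shuffleWriteSpillMetrics)
      .flatten
      .map(_.bytesWritten)
      .sum
}
```

    Sharing one class across the three cases is the point of the rename: each 
kind of write reports the same two quantities, so the UI can treat them 
uniformly.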
    
    I'll fix and add tests if people think restructuring the metrics in this 
way makes sense.
    
    I'm a little unsure about the name shuffleReadSpillMetrics, as spilling 
happens during aggregation, not read, but I had trouble coming up with 
something better.
    
    I'm also unsure on what the most useful columns would be to display in the 
UI - I remember some pushback on adding new columns.  Ultimately these metrics 
will be most helpful if they can inform users whether and how much they need to 
increase the number of partitions / increase spark.shuffle.memoryFraction.  
Reporting spill time informs users whether spilling is significantly impacting 
performance.  Reporting memory size can help with understanding how much needs 
to be done to avoid spilling.
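
    As a rough illustration of how a user might read these two numbers, here 
is a sketch under stated assumptions.  The SpillHeuristics object and both of 
its functions are hypothetical and not part of Spark, and the partition 
estimate assumes data is spread evenly across partitions, which real jobs may 
not satisfy.

```scala
// Hypothetical helpers for interpreting spill metrics; not part of Spark.
object SpillHeuristics {
  /** Fraction of a task's run time spent spilling, clamped to [0, 1]. */
  def spillTimeFraction(spillTimeMs: Long, taskTimeMs: Long): Double =
    if (taskTimeMs <= 0L) 0.0
    else math.max(0.0, math.min(1.0, spillTimeMs.toDouble / taskTimeMs))

  /**
   * Rough partition count at which each task's in-memory data would fit in
   * its memory budget. Assumes data is evenly distributed across partitions.
   */
  def suggestedPartitions(
      currentPartitions: Int,
      inMemoryBytesPerTask: Long,
      memoryBudgetBytesPerTask: Long): Int = {
    require(memoryBudgetBytesPerTask > 0L, "memory budget must be positive")
    val scale = inMemoryBytesPerTask.toDouble / memoryBudgetBytesPerTask
    // Never suggest fewer partitions than the job already uses.
    math.max(currentPartitions, math.ceil(currentPartitions * scale).toInt)
  }
}
```

    A high spill time fraction suggests that spilling is worth fixing at all; 
the ratio of in-memory size to the available budget suggests by roughly how 
much the partition count (or spark.shuffle.memoryFraction) would need to grow.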
    
    @pwendell any thoughts on this?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sryza/spark sandy-spark-3172

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2504.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2504
    
----
commit c854514d81b4830ce1f1109662a713c51e6c8023
Author: Sandy Ryza <[email protected]>
Date:   2014-09-23T05:58:18Z

    SPARK-3172 and SPARK-3577

----


