Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/2504#issuecomment-56870531
Hey @sryza, I haven't looked at the changes in detail, but I have a few
high-level comments. We only ever spill during a shuffle read or a shuffle
write, so I think it makes sense to add a field to each of
`ShuffleWriteMetrics` and `ShuffleReadMetrics` reporting the number of bytes
spilled ("Bytes Spilled") and the time spent spilling ("Spill Time" or
something similar). The existing code lumps the bytes spilled during both
shuffle reads and shuffle writes into `ShuffleWriteMetrics`, which I think is
slightly incorrect.
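To make the idea concrete, here is a minimal sketch of what per-side spill tracking could look like. This is illustrative only, written in Java rather than Spark's Scala, and the names (`ShuffleSpillMetrics`, `bytesSpilled`, `spillTimeNanos`, `incSpill`) are hypothetical, not Spark's actual API:

```java
// Hypothetical sketch: each of the read- and write-side metrics objects
// carries its own spill counters, instead of lumping both into one.
class ShuffleSpillMetrics {
    long bytesSpilled;    // total bytes written to disk during spills
    long spillTimeNanos;  // total time spent spilling

    void incSpill(long bytes, long nanos) {
        bytesSpilled += bytes;
        spillTimeNanos += nanos;
    }
}

public class SpillMetricsSketch {
    public static void main(String[] args) {
        // Each side of the shuffle tracks its own spills separately.
        ShuffleSpillMetrics readMetrics = new ShuffleSpillMetrics();
        ShuffleSpillMetrics writeMetrics = new ShuffleSpillMetrics();

        readMetrics.incSpill(1024, 5_000_000L);   // spill during shuffle read
        writeMetrics.incSpill(2048, 7_000_000L);  // spill during shuffle write

        System.out.println(readMetrics.bytesSpilled + " " + writeMetrics.bytesSpilled);
        // → 1024 2048
    }
}
```

With separate fields like these, the UI can attribute spills to the correct phase instead of reporting them all under the write metrics.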
As for the columns, I think it's OK to add more for these particular
metrics. Eventually we'll have a mechanism for users to toggle which columns
they're interested in (through a series of checkboxes or something), and the
default set could be a subset of all the columns. If you're worried about
having too many columns, we could group the corresponding columns from
`ShuffleReadMetrics` and `ShuffleWriteMetrics` together and separate them with
a slash or something.