Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/2504#issuecomment-56870531
Hey @sryza, I haven't looked at the changes in detail, but I have a few
high-level comments. We only ever spill during a shuffle read or a shuffle
write, so I think it makes sense to add a field to each of
`ShuffleWriteMetrics` and `ShuffleReadMetrics` reporting the number of bytes
spilled ("Bytes Spilled") and the time spent spilling ("Spill Time" or
something similar). The existing code lumps the bytes spilled during both
shuffle reads and shuffle writes into `ShuffleWriteMetrics`, which I think is
slightly incorrect.
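To make the idea concrete, here is a minimal sketch of what per-side spill tracking could look like. This is illustrative only, written in Java rather than Spark's Scala, and the names (`ShuffleSpillMetrics`, `bytesSpilled`, `spillTimeNanos`, `incSpill`) are hypothetical, not Spark's actual API:

```java
// Hypothetical sketch: each of the read- and write-side metrics objects
// carries its own spill counters, instead of lumping both into one.
class ShuffleSpillMetrics {
    long bytesSpilled;    // total bytes written to disk during spills
    long spillTimeNanos;  // total time spent spilling

    void incSpill(long bytes, long nanos) {
        bytesSpilled += bytes;
        spillTimeNanos += nanos;
    }
}

public class SpillMetricsSketch {
    public static void main(String[] args) {
        // Each side of the shuffle tracks its own spills separately.
        ShuffleSpillMetrics readMetrics = new ShuffleSpillMetrics();
        ShuffleSpillMetrics writeMetrics = new ShuffleSpillMetrics();

        readMetrics.incSpill(1024, 5_000_000L);   // spill during shuffle read
        writeMetrics.incSpill(2048, 7_000_000L);  // spill during shuffle write

        System.out.println(readMetrics.bytesSpilled + " " + writeMetrics.bytesSpilled);
        // → 1024 2048
    }
}
```

With separate fields like these, the UI can attribute spills to the correct phase instead of reporting them all under the write metrics.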
As for the columns, I think it's OK to add more for these particular
metrics. Eventually we'll have a mechanism for users to toggle which columns
they're interested in (through a series of checkboxes or something), and the
default set could be a subset of all the columns. If you're worried about
having too many columns, we could group the corresponding columns from
`ShuffleReadMetrics` and `ShuffleWriteMetrics` together and separate them with
a slash or something.