GitHub user squito commented on the pull request:
https://github.com/apache/spark/pull/9051#issuecomment-148808343
Also jumping in late, but I agree with @andrewor14: I think we should just
change duration to (1); that would be the most useful. My vote is for (last
task end) - (first task start). I see the argument for sum(task time) as well,
and I'm not strongly opposed to it, but in that case it would definitely need
to be renamed from "duration", maybe to "total cpu time"?
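To make the difference between the two candidate definitions concrete, here is a minimal sketch. The `Task` record and its `launch_time`/`finish_time` fields (epoch milliseconds) are a hypothetical simplification for illustration, not Spark's actual API. Note that the summed metric adds up each task's wall-clock run time, which is what "total cpu time" would approximate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Task:
    # Hypothetical simplified task record; times are epoch milliseconds.
    launch_time: int
    finish_time: int


def wall_clock_duration(tasks: Sequence[Task]) -> int:
    """Proposed "duration": (last task end) - (first task start)."""
    if not tasks:
        return 0
    return max(t.finish_time for t in tasks) - min(t.launch_time for t in tasks)


def total_task_time(tasks: Sequence[Task]) -> int:
    """Alternative metric: sum of per-task run times (a "total cpu time")."""
    return sum(t.finish_time - t.launch_time for t in tasks)


# Two tasks run in parallel from 0 to 100 ms; a third runs from 50 to 400 ms.
tasks = [Task(0, 100), Task(0, 100), Task(50, 400)]
print(wall_clock_duration(tasks))  # 400
print(total_task_time(tasks))      # 100 + 100 + 350 = 550
```

With parallel tasks the summed value exceeds the elapsed time, which is why it should not be labeled "duration".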
I do see the case for having something to help diagnose skew, but I'm not
sure "max task time" alone really helps much. I don't think there is one
metric that is going to capture that plus the overall duration that's been
discussed. If we only want one metric on the page, I'd vote for the new
"duration" over max task time. I don't think max task time is really that
useful in isolation. It's useful on the stage page because you've also got the
task time distribution. It seems like you really want something like (max task
time - 90th percentile task time) / (90th percentile task time). But we can
probably spend all day arguing about our favorite skew metric... makes me
wonder if this really belongs in the standard UI or not.
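A minimal sketch of the skew ratio floated above, (max - p90) / p90. The nearest-rank percentile used here is an assumption for illustration; the stage page's own percentile calculation may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile (assumed method): the ceil(p/100 * n)-th smallest value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def skew_ratio(task_times: Sequence[float]) -> float:
    """(max task time - p90 task time) / p90 task time; 0.0 means no outlier above p90."""
    if not task_times:
        raise ValueError("need at least one task time")
    p90 = percentile_nearest_rank(task_times, 90)
    if p90 == 0:
        raise ValueError("90th percentile task time is zero")
    return (max(task_times) - p90) / p90


# Nine tasks take 10 s and one straggler takes 50 s: max is 4x above p90.
print(skew_ratio([10] * 9 + [50]))  # 4.0
print(skew_ratio([10] * 10))        # 0.0
```

Normalizing by p90 makes the ratio independent of absolute task length, so a single straggler stands out whether tasks take seconds or minutes.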