[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-149215414 https://issues.apache.org/jira/browse/SPARK-11185 Note I do believe adding something like max task time would be very useful for debugging. I can do this same thing on MapReduce very easily and people coming from that world moving to Spark expect something very similar. I'm fine with any number of different metrics as stated in a previous post. For instance showing both median and max would be better and if they are hidable columns shouldn't make the page much busier. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user squito commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148809195 (incidentally, I just realized this info is totally missing from the json, filed https://issues.apache.org/jira/browse/SPARK-11155)
Github user squito commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148808343 Also jumping in late, but I agree with @andrewor14, I think we should just change duration to (1); that would be the most useful. My vote is for (last task end) - (first task start). I see the argument for sum(task time) as well, and am not strongly opposed to it, but in that case it would definitely need to be renamed from duration, maybe "total cpu time"? I do see the case for having something to help diagnose skew, but I'm not sure "max task time" alone really helps much. I don't think there is one metric which is going to capture that plus the overall duration that's been discussed. If we only want one metric on the page, I'd vote for the new "duration" over max task time. I don't think max task time is really that useful in isolation. It's useful on the stage page because you've also got the distribution. It seems like you really want something like (max task time - 90% task time) / (90% task time). But we can probably spend all day arguing about our favorite skew metric... it makes me wonder if this really belongs in the standard UI or not.
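The per-stage skew metric sketched in that comment can be written down concretely. Below is a minimal, hypothetical sketch (not code from this PR): it assumes task durations are available as a plain `Seq[Long]` of milliseconds and uses a simple nearest-rank 90th percentile.

```scala
// Hypothetical sketch of the skew metric discussed above:
// (max task time - 90th-percentile task time) / (90th-percentile task time).
// Nearest-rank percentile; durations assumed to be milliseconds.
def percentile(durations: Seq[Long], p: Double): Long = {
  val sorted = durations.sorted
  val idx = math.max(0, math.min(sorted.length - 1, math.ceil(p * sorted.length).toInt - 1))
  sorted(idx)
}

def skewRatio(durations: Seq[Long]): Double = {
  val p90 = percentile(durations, 0.90)
  if (p90 == 0) 0.0 else (durations.max - p90).toDouble / p90
}
```

With nine 10 ms tasks and one 100 ms straggler this gives (100 - 10) / 10 = 9, i.e. the straggler ran 9x longer than the 90th-percentile task, which is the kind of signal the distribution on the stage page already conveys.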
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148843534 I see... is the JIRA title / description outdated then?
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148842285 It seems that there are two separate issues here: (1) the stage duration is misleading, and (2) there's no way to see from the "All Stages Page" which stages took longer than others. So can't we just fix the stage duration semantics for (1) and add that as a column in the "All Stages Page" to fix (2)? I still don't fully see why we need to introduce a completely new metric (e.g. max task time). Is it only to avoid changing the semantics of the existing stage duration? > It seems like as long as its labelled properly it shouldn't be more confusing. My concern is not so much that it will confuse the user as that it will make the UI less user friendly. If we keep the existing stage `Duration` and add this `Max task time`, then we might end up with two disagreeing metrics the user has to reconcile. We can add tooltips, but these don't make the UI any less cluttered.
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148842974 @andrewor14 "Duration" already is a column in the "All Stages Page". The problem @tgravescs is pointing out is that it's not the right metric for his use case ((1) is a separate problem from this)
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148844132 Yeah @tgravescs would you mind filing a new JIRA for the second issue you brought up (that you want a new metric in the all stages page), so we can keep that separate from the duration-is-misleading issue?
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148715800 I'm also a bit hesitant to change Duration, but if everyone agrees it's a bug we can change it. The question is: is it supposed to be the time taken across all tasks, or the time from first task launch to last task end? Generally when I think of duration I think of the time from first task launch to last task end. We should also perhaps add tooltips to define it. @andrewor14 it's not making the duration field less confusing; it's providing more information so you can easily find stages that exhibit certain things. I think both of these can be useful pieces of information for different reasons. It seems like as long as it's labelled properly it shouldn't be more confusing. If you click through to the stage page you have a max task time, and you also already have a Tasks: Succeeded/Total column, so there are already task-related things on the all stages page. Perhaps label it "Max Task Duration" or "Tasks: Max Duration". We could use Task Time, but Duration keeps the wording consistent. Then add a tooltip to describe it if needed. I'm also fine with pulling out a few others like average and making them hideable. I think anything at this level makes it easier to debug things across stages, but like @kayousterhout said we also don't want to clutter it up. Personally I don't like having to click on hundreds of Stage pages to get that information.
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148576976 @andrewor14 did you see @tgravescs 's comment saying that (1) and (2) don't really help his use case? He wants an easy way to spot stages that took a lot of resources, and stage service time isn't a good measure of that -- since service time can be small if there was a lot of parallelism, or large if there wasn't much parallelism (although I do agree that the current duration is misleading, but fixing that won't solve this issue). I'd slightly prefer total task time (i.e., sum of times across all tasks) or an option where we show average and max task time, because just showing the max seems a little weird to me.
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148552339 Chiming in a little late here. I actually like (1) in @pwendell's [comment](https://github.com/apache/spark/pull/9051#issuecomment-147493632), where we fix the semantics of Duration. IMO this is a bug; most people understand "duration" of a stage to be the time it took to actually run the stage, excluding the waiting time. My concern with (2) is that any term we choose adds more confusion; what does "service time" or "active time" mean? Why is there a "service time" AND a duration? I also don't see how introducing a max task time makes the stage duration any less confusing. I actually think it's even worse since now the user has to reconcile two metrics that don't agree with each other.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148471371 @d2r can we get a screenshot here to help visualize the change? It'd be great to do that for all UI changes.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-148084696 Ping... thoughts on this?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147439789 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43568/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147439696 [Test build #43568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43568/console) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147439788 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147394979 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147394955 Merged build triggered.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147394925 Jenkins, test this please
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147396129 [Test build #43567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43567/consoleFull) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147401247 [Test build #43567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43567/console) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147401311 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43567/ Test FAILed.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147402862 Jenkins, test this please
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147401309 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147403468 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147403438 Merged build triggered.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147404975 [Test build #43568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43568/consoleFull) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e).
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147475887 Jenkins, this is okay to test
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/9051#discussion_r41787102

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala ---
@@ -150,6 +152,10 @@ private[ui] class StageTableBase(
     }
     val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown")
+    val maxTaskDuration = stageData.taskData.values.toSeq
+      .map(d => d.taskInfo.duration).max
--- End diff --

This needs to check if the task is finished first.
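For illustration only, the finished-task check asked for in this review comment could look like the sketch below. `TaskLike` and its field names are stand-ins invented for this example, not Spark's actual `TaskInfo`; the point is just to filter out unfinished tasks (whose duration is not meaningful yet) and to handle a stage where no task has finished.

```scala
// Stand-in for TaskInfo: `finished` and `duration` are illustrative names only.
case class TaskLike(finished: Boolean, duration: Long)

// Only finished tasks contribute to the max; None when nothing has completed
// yet, avoiding both a misleading duration and a crash on an empty max.
def maxTaskDuration(tasks: Seq[TaskLike]): Option[Long] = {
  val done = tasks.filter(_.finished).map(_.duration)
  if (done.isEmpty) None else Some(done.max)
}
```

The `Option` return also gives the UI a natural place to render "Unknown", mirroring how the diff above handles a missing stage duration.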
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147489092 @vanzin @JoshRosen just wanted to get your input on whether this would be useful to you, or if you had other ideas. I basically want some other indicator that shows me the slow tasks/stages.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147493632 I see the underlying problem posed in the JIRA - it's difficult to assess duration since it currently includes the time spent waiting on dependent stages. However, this patch doesn't seem like the obvious way to fix that. I think there are some alternatives that would make more sense:

1. Re-define duration so that it's only defined starting when the first task in a stage launches (some concerns here about changing semantics, though).
2. Add a new field that represents the time spent servicing the stage: "service time" (?)
3. Add a new field that represents the time spent queuing before any tasks launched: "queue time" (?)

Those all seem better ways to address the issue in the JIRA. Showing the max task time this way seems indirect, and also not always helpful, since max task time doesn't have a simple relationship with "duration" as desired here... for instance, the max task could be pretty short while the duration is still really long for the stage. /cc @rxin @kayousterhout for any thoughts.
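Options (2) and (3) amount to splitting the current wall-clock span at the first task launch. A rough sketch of that arithmetic, where the names and the split come only from the proposal in this thread (not any existing Spark API) and all timestamps are assumed to be epoch milliseconds:

```scala
// "Queue time": gap between stage submission and the first task launching.
def queueTime(stageSubmitted: Long, firstTaskLaunched: Long): Long =
  math.max(0L, firstTaskLaunched - stageSubmitted)

// "Service time": span during which the stage actually had tasks running.
def serviceTime(firstTaskLaunched: Long, lastTaskFinished: Long): Long =
  math.max(0L, lastTaskFinished - firstTaskLaunched)
```

By construction, queue time plus service time equals the stage's current end-to-end duration, which is why (1), (2), and (3) are three views of the same span.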
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147495618 I like the idea of adding 2 ("active time"?). Then users can infer 3 from this. I agree that the metric added here is misleading.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147498736 Or perhaps we need multiple columns added to cover both; if you have better ideas on what to show, I'm definitely open to suggestions.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147509387 So the stages page already has the # of tasks per stage and the Duration, so personally if I saw a large duration with many tasks and a small max task duration I wouldn't look at it for a certain set of issues. It might make me look at it for changing parallelism or # of executors. The "active time" would be more useful here too, where "active time" is first task start time to last task end time. The sum of all task time could be useful too, but it still doesn't tell me if some set of tasks took a long time and others took very little. You could easily have 90% of tasks finish in a few seconds and the rest take a long time. Or it could mean that all tasks took a long time, which might be interesting too but answers a different question than I wanted answered at the time. A couple of the reasons I want to see long task times are if a particular host is bad or if perhaps it's getting skewed data.
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147514208 Ok, that makes sense to me. I'm still a little concerned about cluttering up the main page with something that's not necessarily broadly useful... what about hiding this under an "additional metrics" drop-down, similar to what we do on the task page?
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147498589 So one thing I want from this is: if I have 100's of stages, I want to be able to quickly look (without clicking on each individual stage) and see which ones "took the longest", to see if there were problems or optimizations that could be made for those stages. "Took the longest" can obviously have different meanings, and different things could be useful. I don't want the wait time, but the active time might not be what I want either: if a stage is only running a few tasks out of thousands at a time, the "active time" might be huge even when each task took very little time. That is why we were talking about the max task time, because it could be an indicator that a certain node or task was having issues. I think the "active time" is better than duration and can tell you certain things, but I don't think it tells me what I'm looking for here. How is the max task time here misleading? It's basically the same thing you get if you click on the stage and see the min/25th/med/75th/max, just in a form that's easier to view across stages. Note I don't think we should redefine duration, since that's more of a backwards-compatibility thing.
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147499886 I was thinking it's misleading in the sense that a stage could have a small max task duration, but a huge number of tasks, such that its total time is still long. Is max task duration more useful for you than the total time taken across all tasks?
Github user d2r commented on a diff in the pull request: https://github.com/apache/spark/pull/9051#discussion_r41787721

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala ---
@@ -150,6 +152,10 @@ private[ui] class StageTableBase(
     }
     val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown")
+    val maxTaskDuration = stageData.taskData.values.toSeq
+      .map(d => d.taskInfo.duration).max
--- End diff --

OK, I'll report the maximum of finished and unfinished tasks using logic similar to that of the stage duration calculation.
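A minimal sketch of what d2r describes, taking the max over both finished and still-running tasks, where a running task is measured against the current time. The `TaskLike` type is illustrative, not the real `TaskUIData` API:

```scala
// Hypothetical per-task record: finishTime is None while the task is running.
case class TaskLike(launchTime: Long, finishTime: Option[Long])

// Max task duration over finished AND unfinished tasks, falling back to
// "now" for tasks that have not finished yet, similar in spirit to how
// the stage duration calculation handles a still-running stage.
def maxTaskDuration(tasks: Seq[TaskLike], now: Long): Option[Long] = {
  if (tasks.isEmpty) None
  else Some(tasks.map(t => t.finishTime.getOrElse(now) - t.launchTime).max)
}
```

The empty-seq guard also avoids the `max`-on-empty-collection exception that the raw diff above would hit for a stage with no tasks yet.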
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147523442 I definitely agree with not cluttering up the page. I'm fine with additional metrics or something. So do we want to add 2 columns, one for the "active time" and one for max task time? @pwendell, based on the above, do you think max task time would be ok, or do you still have concerns? Note there are probably other ways of showing this too, like showing median and variance, or some sort of ratio of max over median. So feel free to suggest other ideas like that.
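For reference, the max-over-median ratio floated here as a skew indicator could be sketched like this (function and type names are illustrative only, not part of any Spark API):

```scala
// Median of a sequence of task durations (even length averages the two
// middle values).
def median(xs: Seq[Long]): Double = {
  val s = xs.sorted
  val n = s.length
  if (n % 2 == 1) s(n / 2).toDouble
  else (s(n / 2 - 1) + s(n / 2)) / 2.0
}

// Ratio of max task duration to median task duration; a value well above
// 1.0 suggests a small number of stragglers, i.e. skew.
def skewRatio(durations: Seq[Long]): Option[Double] =
  if (durations.isEmpty || median(durations) == 0.0) None
  else Some(durations.max / median(durations))
```

A ratio-style metric like this is self-normalizing across stages of very different sizes, which is one argument for it over a raw max task time column.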
GitHub user d2r opened a pull request: https://github.com/apache/spark/pull/9051 [SPARK-10930] Adds max task duration to all stages page You can merge this pull request into a Git repository by running: $ git pull https://github.com/d2r/spark spark-10930-ui-max-task-dur Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9051.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9051 commit 5588157c2ef149bb8017ebe52d3fc695c8e4200e Author: Derek Dagit Date: 2015-10-09T21:48:31Z Adds max task duration to all stages page
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-146994822 Can one of the admins verify this patch?