[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-19 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-149215414
  
https://issues.apache.org/jira/browse/SPARK-11185

Note I do believe adding something like max task time would be very useful 
for debugging.  I can do this same thing on MapReduce very easily, and people 
coming from that world to Spark expect something very similar.  I'm fine 
with any number of different metrics, as stated in a previous post.  For 
instance, showing both median and max would be better, and if they are hidable 
columns they shouldn't make the page much busier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




2015-10-16 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148809195
  
(incidentally, I just realized this info is totally missing from the json, 
filed https://issues.apache.org/jira/browse/SPARK-11155)



2015-10-16 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148808343
  
Also jumping in late, but I agree with @andrewor14: I think we should just 
change duration to (1), as that would be the most useful.  My vote is for (last 
task end) - (first task start).  I see the argument for sum(task time) as well, 
and I'm not strongly opposed to it, but in that case it would definitely need 
to be renamed from duration, maybe "total cpu time"?

I do see the case for having something to help diagnose skew, but I'm not 
sure "max task time" alone really helps much.  I don't think there is one 
metric that is going to capture that plus the overall duration that's been 
discussed.  If we only want one metric on the page, I'd vote for the new 
"duration" over max task time.  I don't think max task time is really that 
useful in isolation.  It's useful on the stage page because you've also got the 
distribution.  It seems like you really want something like (max task time - 
90% task time) / (90% task time).  But we can probably spend all day arguing 
about our favorite skew metric... makes me wonder whether this really belongs 
in the standard UI or not.
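
The skew metric suggested above — (max task time - 90% task time) / (90% task 
time) — can be sketched in a few lines of Scala.  This is an illustrative 
sketch over a plain sequence of task durations in milliseconds; the object and 
method names are made up for the example, not Spark code:

```scala
object SkewMetric {
  // p-th percentile via the nearest-rank method on an already-sorted sequence
  def percentile(sorted: Seq[Long], p: Double): Long = {
    val idx = math.max(0, math.min(sorted.length - 1, math.ceil(p * sorted.length).toInt - 1))
    sorted(idx)
  }

  // (max - p90) / p90; None when there are no tasks or p90 is zero
  def skew(durationsMs: Seq[Long]): Option[Double] = {
    if (durationsMs.isEmpty) None
    else {
      val sorted = durationsMs.sorted
      val p90 = percentile(sorted, 0.90).toDouble
      if (p90 == 0.0) None else Some((sorted.last - p90) / p90)
    }
  }
}
```

For ten tasks where nine take 1ms and one takes 10ms, p90 is 1ms and the skew 
comes out as 9.0 — a single straggler dominates the metric, which is exactly 
the property being debated here.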



2015-10-16 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148843534
  
I see... is the JIRA title / description outdated then?



2015-10-16 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148842285
  
It seems that there are two separate issues here: (1) the stage duration is 
misleading, and (2) there's no way to see from the "All Stages Page" which 
stages took longer than others. So can't we just fix the stage duration 
semantics for (1) and add that as a column in the "All Stages Page" to fix (2)? 
I still don't fully see why we need to introduce a completely new metric (e.g. 
max task time). Is it only to avoid changing the semantics of the existing 
stage duration?

> It seems like as long as its labelled properly it shouldn't be more 
confusing.

My concern is not so much that it will confuse the user as that it will make 
the UI less user friendly. If we keep the existing stage `Duration` and add 
this `Max task time`, then we might end up with two disagreeing metrics the 
user has to reconcile. We can add tooltips, but those don't make the UI any 
less cluttered.



2015-10-16 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148842974
  
@andrewor14 "Duration" already is a column in the "All Stages Page".  The 
problem @tgravescs is pointing out is that it's not the right metric for his 
use case ((1) is a separate problem from this).



2015-10-16 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148844132
  
Yeah @tgravescs would you mind filing a new JIRA for the second issue you 
brought up (that you want a new metric in the all stages page), so we can keep 
that separate from the duration-is-misleading issue?



2015-10-16 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148715800
  
I'm also a bit hesitant to change Duration, but if everyone agrees it's a bug 
we can change it.  The question is whether it's supposed to be the total time 
taken across all tasks, or the time from first task launch to last task end.  
Generally when I think of duration, I think of first task launch time to last 
task end time.

We should also perhaps add tooltips to define it.

@andrewor14  it's not about making the duration field less confusing; it's 
about providing more information so you can easily find stages that exhibit 
certain things.  I think both of these can be useful pieces of information for 
different reasons.
It seems like as long as it's labelled properly it shouldn't be more 
confusing.  If you click through to the stage page you have a max task time.  
You also already have "Tasks: Succeeded/Total", so there are already 
task-related things on that page.  Perhaps label it "Max Task Duration" or 
"Tasks: Max Duration".  We could use "Task Time", but "Duration" keeps the 
wording consistent.  Then add a tooltip to describe it if needed.  I'm also 
fine with pulling out a few others, like average, and making them hideable.  I 
think anything at this level makes it easier to debug things across stages, 
but like @kayousterhout said, we also don't want to clutter it up.  Personally 
I don't like having to click on hundreds of Stage pages to get that 
information.



2015-10-15 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148576976
  
@andrewor14 did you see @tgravescs 's comment saying that (1) and (2) don't 
really help his use case?  He wants an easy way to spot stages that took a lot 
of resources, and stage service time isn't a good measure of that -- since 
service time can be small if there was a lot of parallelism, or large if there 
wasn't much parallelism (although I do agree that the current duration is 
misleading, but fixing that won't solve this issue).

I'd slightly prefer total task time (i.e., sum of times across all tasks) 
or an option where we show average and max task time, because just showing the 
max seems a little weird to me.



2015-10-15 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148552339
  
Chiming in a little late here. I actually like (1) in @pwendell's 
[comment](https://github.com/apache/spark/pull/9051#issuecomment-147493632), 
where we fix the semantics of Duration. IMO this is a bug; most people 
understand "duration" of a stage to be the time it took to actually run the 
stage, excluding the waiting time. My concern with (2) is that any term we 
choose adds more confusion; what does "service time" or "active time" mean? Why 
is there a "service time" AND a duration?

I also don't see how introducing a max task time makes the stage duration 
any less confusing. I actually think it's even worse since now the user has to 
reconcile two metrics that don't agree with each other.



2015-10-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148471371
  
@d2r  can we get a screenshot here to help visualize the change? It'd be 
great to do that for all UI changes.




2015-10-14 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-148084696
  
Ping... thoughts on this?



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147439789
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43568/
Test FAILed.



2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147439696
  
  [Test build #43568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43568/console) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147439788
  
Merged build finished. Test FAILed.



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147394979
  
Merged build started.



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147394955
  
 Merged build triggered.



2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147394925
  
Jenkins, test this please



2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147396129
  
  [Test build #43567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43567/consoleFull) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e).



2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147401247
  
  [Test build #43567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43567/console) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147401311
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43567/
Test FAILed.



2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147402862
  
Jenkins, test this please



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147401309
  
Merged build finished. Test FAILed.



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147403468
  
Merged build started.



2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147403438
  
 Merged build triggered.



2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147404975
  
  [Test build #43568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43568/consoleFull) for PR 9051 at commit [`5588157`](https://github.com/apache/spark/commit/5588157c2ef149bb8017ebe52d3fc695c8e4200e).



2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147475887
  
Jenkins, this is okay to test 



2015-10-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/9051#discussion_r41787102
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala ---
@@ -150,6 +152,10 @@ private[ui] class StageTableBase(
 }
 val formattedDuration = duration.map(d => 
UIUtils.formatDuration(d)).getOrElse("Unknown")
 
+val maxTaskDuration = stageData.taskData.values.toSeq
+.map(d => d.taskInfo.duration).max
--- End diff --

this needs to check whether the task has finished first.
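
The fix being asked for can be sketched as follows.  This is an illustrative 
sketch with stand-in classes, not the actual Spark `TaskInfo` API: only 
finished tasks contribute a duration, and an empty result yields None rather 
than calling .max on an empty sequence:

```scala
// Stand-ins for the real classes (field and method names assumed for the example):
final case class TaskInfo(launchTime: Long, finishTime: Long) {
  def finished: Boolean = finishTime != 0      // a running task has no finish time yet
  def duration: Long = finishTime - launchTime // only meaningful once finished
}

// Max duration over finished tasks only; None if nothing has finished yet,
// so we never read a bogus duration from a still-running task.
def maxTaskDuration(tasks: Seq[TaskInfo]): Option[Long] = {
  val finishedDurations = tasks.collect { case t if t.finished => t.duration }
  if (finishedDurations.isEmpty) None else Some(finishedDurations.max)
}
```

With a running task (finishTime 0) in the mix, it is skipped and the max is 
taken over the completed tasks only.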



2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147489092
  
@vanzin @JoshRosen  just wanted to get your input on whether this would be 
useful to you, or whether you have other ideas.  I basically want some other 
indicator that shows me the slow tasks/stages.



2015-10-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147493632
  
I see the underlying problem posed in the JIRA - it's difficult to assess 
duration since it currently includes the time spent waiting on dependent 
stages. However, this patch doesn't seem like the obvious way to fix that. I 
think there are some alternatives that would make more sense:

1. Re-define duration so that it only starts when the first task in a stage 
launches (some concerns here about changing semantics, though).
2. Add a new field that represents the time spent servicing the stage: 
"service time" (?)
3. Add a new field that represents the time spent queuing before any tasks 
launched: "queue time" (?)

Those all seem like better ways to address the issue in the JIRA. Showing the 
max task time seems indirect, and not always helpful, since max task time 
doesn't have a simple relationship with the "duration" desired here... for 
instance, the max task could be pretty short while the stage duration is still 
really long.

/cc @rxin @kayousterhout for any thoughts.
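
The three quantities in the list above can be made concrete with a small 
sketch.  The inputs and names here are hypothetical (a stage submission time 
plus per-task launch/finish times in epoch millis), not Spark's actual API:

```scala
// Hypothetical per-stage timing data (all epoch milliseconds):
final case class StageTimes(submissionTime: Long, launchTimes: Seq[Long], finishTimes: Seq[Long])

// (1)/(2) "service time": first task launch to last task end
def serviceTime(s: StageTimes): Long = s.finishTimes.max - s.launchTimes.min

// (3) "queue time": stage submission until the first task launches
def queueTime(s: StageTimes): Long = s.launchTimes.min - s.submissionTime
```

For a stage submitted at t=100 whose tasks launch at 150/160 and finish at 
300/400, the service time is 250ms and the queue time is 50ms; the current 
"duration" would lump both together as 300ms.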



2015-10-12 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147495618
  
I like the idea of adding (2) ("active time"?). Then users can infer (3) from 
it. I agree that the metric added here is misleading.



2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147498736
  
Or perhaps we need multiple columns added to cover both; if you have better 
ideas on what to show, I'm definitely open to suggestions.



2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147509387
  
So the stages page already has the # of tasks per stage and the Duration, so personally, if I saw a large duration with many tasks and a small max task duration, I wouldn't look at it for a certain set of issues. It might make me look at it for changing parallelism or the # of executors. The "active time" would be more useful here too, where "active time" is the span from the first task's start time to the last task's end time.
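As an illustration of the "active time" definition above (a minimal sketch with made-up task tuples, not the Spark UI's actual data structures):

```python
def active_time(tasks):
    """Wall-clock span from the first task launch to the last task finish.

    Each task is a (launch_time_ms, finish_time_ms) pair. This is a
    hypothetical representation for illustration only.
    """
    if not tasks:
        return 0
    first_launch = min(launch for launch, _ in tasks)
    last_finish = max(finish for _, finish in tasks)
    return last_finish - first_launch

# Three tasks launched at 0s, 1s, and 4s: the active time spans 0s..9s
# (9000 ms), even though the summed task durations are 5s + 1s + 5s.
tasks = [(0, 5000), (1000, 2000), (4000, 9000)]
```

Note this is distinct from both the wall-clock stage duration (which also includes any wait before the first task launches) and the sum of individual task durations.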

The summary of all task time could be useful too, but it still doesn't tell me whether some set of tasks took a long time and others took very little. You could easily have 90% of tasks finish in a few seconds and the rest take a long time. Or it could mean that all tasks took a long time, which might be interesting too but answers a different question than the one I wanted answered at the time.
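To put numbers on that point (made-up durations, not real stage data):

```python
# Two hypothetical stages, each with 100 tasks and the same total task
# time of 1000s, yet with very different behavior (durations in seconds):
skewed = [1] * 90 + [91] * 10   # 90% finish in 1s, ten stragglers take 91s
uniform = [10] * 100            # every task takes 10s

# The summed task time cannot tell these two stages apart...
assert sum(skewed) == sum(uniform) == 1000
# ...but the max task duration separates them immediately.
assert max(skewed) == 91
assert max(uniform) == 10
```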

A couple of the reasons I want to see long task times are to tell whether a particular host is bad or whether it's getting skewed data.







[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-12 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147514208
  
Ok, that makes sense to me. I'm still a little concerned about cluttering up the main page with something that's not necessarily broadly useful... what about hiding this under an "additional metrics" drop-down, similar to what we do on the task page?





[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147498589
  
So one thing I want from this: if I have 100's of stages, I want to be able to look quickly (without clicking on each individual stage) and see which ones "took the longest", to see if there were problems or optimizations that could be made for those stages. "Took the longest" can obviously have different meanings, and different things could be useful.

I don't want the wait time, but the active time might not be what I want either: if a stage is only running a few tasks out of thousands at a time, the "active time" might be huge even when each task took very little time. That is why we were talking about the max task time; it could be an indicator that a certain node or task was having issues. I think the "active time" is better than duration and can tell you certain things, but I don't think it tells me what I'm looking for here.

How is the max task time here misleading? It's basically the same thing you get if you click on the stage and see the min/25th/med/75th/max, just easier to view across stages.
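For reference, the per-stage five-number summary mentioned here can be sketched as follows (simplified nearest-rank percentiles; an illustration, not Spark's actual implementation):

```python
def five_number_summary(durations):
    """min / 25th percentile / median / 75th percentile / max.

    Simplified nearest-rank percentile over sorted task durations; a
    sketch of the kind of summary the per-stage page shows.
    """
    s = sorted(durations)

    def pct(p):
        # Round to the nearest index in the sorted list.
        return s[round(p * (len(s) - 1))]

    return (s[0], pct(0.25), pct(0.50), pct(0.75), s[-1])
```

The proposal here is just to surface the last of those five values on the all-stages page, so it can be scanned without a click per stage.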

Note I don't think we should redefine duration, since that's more of a backwards-compatibility thing.





[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-12 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147499886
  
I was thinking it's misleading in the sense that a stage could have a small max task duration but a huge number of tasks, such that its total time is still long. Is max task duration more useful for you than the total time taken across all tasks?





[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-12 Thread d2r
Github user d2r commented on a diff in the pull request:

https://github.com/apache/spark/pull/9051#discussion_r41787721
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala ---
@@ -150,6 +152,10 @@ private[ui] class StageTableBase(
 }
 val formattedDuration = duration.map(d => 
UIUtils.formatDuration(d)).getOrElse("Unknown")
 
+val maxTaskDuration = stageData.taskData.values.toSeq
+.map(d => d.taskInfo.duration).max
--- End diff --

OK, I'll report the maximum of finished and unfinished tasks using logic 
similar to that of the stage duration calculation.
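A sketch of that approach (hypothetical field names, not the actual StageTable logic): for a still-running task, the duration-so-far is measured against the current time, mirroring how a stage's own duration keeps growing until it completes.

```python
import time

def max_task_duration_ms(tasks, now_ms=None):
    """Max duration across finished and still-running tasks.

    Each task is a dict with 'launch_time' (ms since epoch) and, once it
    completes, 'finish_time'. Hypothetical shape for illustration; the
    real patch works on the UI's TaskInfo objects.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Running tasks have no finish_time yet, so use "now" for them.
    durations = [t.get('finish_time', now_ms) - t['launch_time'] for t in tasks]
    return max(durations, default=0)
```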





[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-147523442
  
I definitely agree with not cluttering up the page. I'm fine with additional metrics or something.

So do we want to add 2 columns, 1 for the "active time" and one for max task time? @pwendell, based on the above, do you think max task time would be ok, or do you still have concerns?

Note there are probably other ways of showing this too, like showing median and variance or some sort of ratio of max over median, so I'm open to other ideas like that.
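One of those alternatives, the ratio of max over median, could look like this (a sketch of an idea floated in the discussion, not an implemented Spark column):

```python
from statistics import median

def skew_ratio(durations):
    """Ratio of the longest task duration to the median.

    A value near 1 suggests uniform tasks; a large value suggests a
    straggler (a bad host, or a task with skewed data).
    """
    if not durations:
        return 0.0
    return max(durations) / median(durations)
```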





[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-09 Thread d2r
GitHub user d2r opened a pull request:

https://github.com/apache/spark/pull/9051

[SPARK-10930] Adds max task duration to all stages page



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/d2r/spark spark-10930-ui-max-task-dur

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9051


commit 5588157c2ef149bb8017ebe52d3fc695c8e4200e
Author: Derek Dagit 
Date:   2015-10-09T21:48:31Z

Adds max task duration to all stages page







[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...

2015-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9051#issuecomment-146994822
  
Can one of the admins verify this patch?

