[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2017-03-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947678#comment-15947678
 ] 

Apache Spark commented on SPARK-3577:
-

User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/17471

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).
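
The description above asks for the time spent writing spills to be surfaced as a task metric. As a rough illustration only, the sketch below accumulates spill write time in a per-task counter. `SpillMetrics` and `timed_spill` are hypothetical names invented for this example and are not part of Spark's API; the real change would live in Spark's Scala task metrics.

```python
import time
from contextlib import contextmanager


class SpillMetrics:
    """Hypothetical per-task accumulator for spill timing (not a Spark class)."""

    def __init__(self):
        self.spill_time_ns = 0  # total time spent inside spill writes
        self.spill_count = 0    # number of spills recorded

    @contextmanager
    def timed_spill(self):
        # Time the enclosed block and add the elapsed time to the task total,
        # even if the spill write raises.
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            self.spill_time_ns += time.perf_counter_ns() - start
            self.spill_count += 1


metrics = SpillMetrics()
with metrics.timed_spill():
    time.sleep(0.001)  # stand-in for writing a spill file to disk
```

Reporting the accumulated value next to shuffle write time would make the spill cost visible instead of discarding it.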



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2017-03-29 Thread Sital Kedia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947669#comment-15947669
 ] 

Sital Kedia commented on SPARK-3577:


I am making a change to report the correct spill data size on disk.




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564472#comment-15564472
 ] 

Reynold Xin commented on SPARK-3577:


[~dreamworks007] can you take a look at the problem here? 
https://github.com/apache/spark/pull/15347





[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563038#comment-15563038
 ] 

Gaoxiang Liu commented on SPARK-3577:
-

I found that the spill size metric has already been added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
 and I have confirmed it in the UI.

Also, we noticed that, oddly, the spill size is not reported in the reducer, 
but it is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it, provided there are no objections.




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562945#comment-15562945
 ] 

Reynold Xin commented on SPARK-3577:


Actually, instead of tracking spill time, it's more important to first report 
the spill data size.





[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562938#comment-15562938
 ] 

Reynold Xin commented on SPARK-3577:


This is still relevant.




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562925#comment-15562925
 ] 

Gaoxiang Liu commented on SPARK-3577:
-

Hi [~kayousterhout], 

I just want to confirm that this JIRA is still relevant. Have there been any 
changes to the requirements?

I am currently working on this one, so I want to make sure.

Thanks !




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-08-11 Thread Kay Ousterhout (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417929#comment-15417929
 ] 

Kay Ousterhout commented on SPARK-3577:
---

I believe spill time will currently be displayed as part of the task runtime, 
but not as part of scheduler delay.

The scheduler delay is calculated by looking at the difference between two 
values:

(1) The time that the task was running on the executor
(2) The time from when the scheduler sent information about the task to the 
executor (so the executor could run the task) until the scheduler received a 
message that the task completed.

Scheduler delay is (2) - (1).  Usually when it's high, it's because of queueing 
delays in the scheduler that are either delaying the task getting sent to the 
executor (e.g., because the scheduler has a long queue of other tasks that need 
to be launched, or because tasks are large so take a while to send over the 
network) or that are delaying the task completion message getting back to the 
scheduler (which can happen when the rate of task launch is high -- greater 
than 1K or so task launches / second).
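
The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not Spark's implementation: `TaskTimeline`, its field names, and the clamp to zero are hypothetical choices made for this example. It shows why spill time, which is spent while the task runs on the executor, cancels out of (2) - (1).

```python
from dataclasses import dataclass


@dataclass
class TaskTimeline:
    """Hypothetical record of one task's timings, in milliseconds."""
    launch_time_ms: int        # scheduler sends the task to the executor
    finish_time_ms: int        # scheduler receives the completion message
    executor_run_time_ms: int  # (1) time the task ran on the executor, spills included


def scheduler_delay_ms(t: TaskTimeline) -> int:
    # (2): time from sending the task until hearing that it completed.
    round_trip_ms = t.finish_time_ms - t.launch_time_ms
    # Scheduler delay is (2) - (1); clamping at zero is an assumption of this sketch.
    return max(0, round_trip_ms - t.executor_run_time_ms)


# The same task with and without a 300 ms spill: both (1) and (2) grow by 300 ms,
# so the computed scheduler delay is unchanged.
no_spill = TaskTimeline(launch_time_ms=0, finish_time_ms=1200, executor_run_time_ms=1000)
with_spill = TaskTimeline(launch_time_ms=0, finish_time_ms=1500, executor_run_time_ms=1300)
print(scheduler_delay_ms(no_spill), scheduler_delay_ms(with_spill))
```

Under these assumptions a long spill lengthens the task runtime rather than the scheduler delay, which matches the explanation above.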




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-08-11 Thread Tzach Zohar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417916#comment-15417916
 ] 

Tzach Zohar commented on SPARK-3577:


Does this mean that currently, spill time will be displayed as part of the 
*Scheduler Delay*? 
Scheduler Delay is calculated pretty much as "everything that isn't 
specifically measured" (see 
[StagePage.getSchedulerDelay|https://github.com/apache/spark/blob/v2.0.0/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L770]),
 so I'm wondering if indeed it might include  spill time if it's not included 
anywhere else. 

If so - this might explain long Scheduler Delay values which would be hard to 
make sense of otherwise (which I think is what I'm seeing...).

Thanks




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2015-07-15 Thread Tom Hubregtsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628646#comment-14628646
 ] 

Tom Hubregtsen commented on SPARK-3577:
---

I could also use this metric

> Add task metric to report spill time
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 1.1.0
> Reporter: Kay Ousterhout
> Assignee: Sandy Ryza
> Priority: Minor
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).






[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2015-06-08 Thread Ming Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578240#comment-14578240
 ] 

Ming Chen commented on SPARK-3577:
--

Why has the metric not been added? I think this is rather important, since it 
may affect the results of the research work on this: 
https://kayousterhout.github.io/trace-analysis/




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2014-09-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144412#comment-14144412
 ] 

Apache Spark commented on SPARK-3577:
-

User 'sryza' has created a pull request for this issue:
https://github.com/apache/spark/pull/2504




[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2014-09-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142524#comment-14142524
 ] 

Sandy Ryza commented on SPARK-3577:
---

No problem.  Yeah, I agree that a spill time metric would be useful.
