[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013120#comment-15013120
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2555 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2555/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 
23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012130#comment-15012130
 ] 

Wangda Tan commented on YARN-4287:
--

Committed to branch-2.7 and updated CHANGES.txt of branch-2/trunk. Thanks 
[~nroberts] and review from [~jlowe].

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012318#comment-15012318
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2622 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2622/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 
23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012412#comment-15012412
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #693 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/693/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 
23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012166#comment-15012166
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8823 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8823/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 
23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012215#comment-15012215
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #681 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/681/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 
23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012380#comment-15012380
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1420 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1420/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 
23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012820#comment-15012820
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #617 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/617/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 
23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007129#comment-15007129
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks for update [~nroberts], I tried this patch on 2.7, all CS tests passed 
with this patch. I will commit this to branch-2.7 today if no opposite opinions.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003002#comment-15003002
 ] 

Hudson commented on YARN-4287:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #675 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/675/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts 
(wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003029#comment-15003029
 ] 

Hudson commented on YARN-4287:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #662 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/662/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts 
(wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002979#comment-15002979
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1399 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1399/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts 
(wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003123#comment-15003123
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2603 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2603/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts 
(wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003187#comment-15003187
 ] 

Nathan Roberts commented on YARN-4287:
--

I will put up a 2.7 version tomorrow morning.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003146#comment-15003146
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #601 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/601/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts 
(wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003173#comment-15003173
 ] 

Jason Lowe commented on YARN-4287:
--

bq. do you think if it should be committed to 2.7 also?

+1 for committing this to 2.7.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002681#comment-15002681
 ] 

Wangda Tan commented on YARN-4287:
--

And [~nroberts], do you think if it should be committed to 2.7 also?

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002722#comment-15002722
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8798 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8798/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts 
(wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003175#comment-15003175
 ] 

Wangda Tan commented on YARN-4287:
--

[~jlowe], Thanks for comment,
Added 2.7.3 to target version.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003309#comment-15003309
 ] 

Hudson commented on YARN-4287:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2539 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2539/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts 
(wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000770#comment-15000770
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks [~nroberts]
Patch looks good +1, will commit in a few days if no opposite opinions.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999391#comment-14999391
 ] 

Hadoop QA commented on YARN-4287:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 198, now 202). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 56s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
37s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 25s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-10 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771599/YARN-4287-minimal-v4.patch
 |
| JIRA Issue | YARN-4287 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  

[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997650#comment-14997650
 ] 

Hadoop QA commented on YARN-4287:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 198, now 202). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 151m 16s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.0 Server=1.7.0 
Image:test-patch-base-hadoop-date2015-11-09 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771428/YARN-4287-minimal-v3.patch
 |
| JIRA Issue | YARN-4287 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux 3182d018451a 

[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997573#comment-14997573
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks for update, [~nroberts].

Patch generally looks good, few comments:
- Could you add a comment at
{code}
  return (Math.min(rmContext.getScheduler().getNumClusterNodes(), 
  (requiredContainers * localityWaitFactor)) < missedOpportunities);
{code}
People read the code can get better understanding that why missedOpportunity 
need to be capped by numClusterNodes
- I would suggest to add tests for missedOpportunity capped by numClusterNodes 
and resetSchedulingOpportunity for rack request.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-29 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980503#comment-14980503
 ] 

Nathan Roberts commented on YARN-4287:
--

+1 on percentages. My only concern is that node-locality-delay is already in 
there and is not a percentage. I can deprecate the existing node-locality-delay 
and add the percentage based configs. 

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980886#comment-14980886
 ] 

MENG DING commented on YARN-4287:
-

Looking at this issue, I have to admit that I had been frustrated with the 
existing {{getLocalityWaitFactor}}, and had the same question as [~nroberts]:
bq. This made no sense to me - Accept OFF-SWITCH without delay, yet don't 
accept RACK-LOCAL??

IMHO, although it makes sense to introduce a configurable rack-locality delay, 
it doesn't help when the cluster is really busy as described in YARN-4189 and 
YARN-3309. As an interim solution, I am in favor of the 
YARN-4287-minimal.patch, but I think the default configuration of 
DEFAULT_RACK_LOCALITY_FULL_RESET should be set to true to be backward 
compatible. 

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980816#comment-14980816
 ] 

Wangda Tan commented on YARN-4287:
--

I think maybe it's better not deprecate original option, we can support both in 
the same option. Just like html set element size, you can set either px or 
percentage of parent's width/height.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981181#comment-14981181
 ] 

Wangda Tan commented on YARN-4287:
--

I'm fine with either direction, but for the 4287-minimal.patch, I suggest to 
cap the rack-local-delay to cluster size to avoid off-switch requests wait for 
too long when the request needs lots of containers.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981232#comment-14981232
 ] 

MENG DING commented on YARN-4287:
-

Agreed.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979560#comment-14979560
 ] 

Wangda Tan commented on YARN-4287:
--

Hi [~nroberts],
bq. One argument for sticking with the scaling approach is the fact that we 
basically do it today in a simpler fashion. If you specify node-locality-delay 
of 5000 on a 3000 node cluster, it gets automatically scaled down to 3000 
without informing the user. So I'd say scale it but don't try to explain it in 
user documentation.
I still think scaling down is not a straightforward way to support the problem 
you mentioned (user isn't clear about size of cluster). Instead, I think we can 
use percentage. User can say, I want node locality delay to 300 OR 10% of 
cluster size. And same to rack locality delay. Scheduler will compute what's 
the actual delay at runtime. With this, I think we can safely cap delay by 
cluster size. Does this make sense to you?

Thanks,

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-27 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976386#comment-14976386
 ] 

Nathan Roberts commented on YARN-4287:
--

Thanks [~leftnoteasy] for the quick responses. 

{quote}
I think instead of scaling, I suggest to simply cap rack/offswitch delay by the 
cluster size, so:

rack-delay = min(offswitch, node-locality-delay, cluserSize)
offswitch-delay = min(offswitch, clusterSize)
The scaling behavior could be hard to explain to end users.
{quote}

I agree that it's not as easy to describe. BUT, the problem I have is that I 
don't know how to deal with the common case of someone wanting 
node-locality-delay to be based on the size of the cluster. What we do is set 
node-locality-delay to something guaranteed to be larger than the cluster, 
knowing  the scheduler will automatically lower it to the size of the cluster. 
This works great for a single delay on any size cluster. However, it's 
impossible to describe two different delays using this same approach. For 
example, I might always want node-locality-delay to be 10% less than 
rack-locality-delay.  Maybe we should specify rack-locality-delay as a 
percentage above node-locality-delay ( 10%)? Still a little hard to describe 
though. 

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287-v3.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975410#comment-14975410
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks for sharing your thoughts, [~nroberts]!

bq. iiuc, we didn't get good locality before the patch either. i.e. canAssign() 
would return true for NODE-LOCAL and OFF-SWITCH without delay. 
Yes, you're correct, I think we can safely use min(computed-offswitch, 
configured-offswitch) as final offswitch/rack delay.

bq. 1) I need to change the way rackLocalityDelay is specified because it 
doesn't handle the case where the configuration value is larger than the 
cluster size. I was thinking of just scaling it. Let's say 
node-locality-delay=5000, rack-locality-delay=5100, cluster_size is 3000. In 
the existing code, node-locality-delay would automatically get lowered to 3000. 
Instead, it will lower rack-locality-delay to 3000, and node-locality-delay 
will be proportionally adjusted (5000 * 3000 / 5100) = 2941. 
I think instead of scaling, I suggest to simply cap rack/offswitch delay by the 
cluster size, so:
- rack-delay = min(offswitch, node-locality-delay, cluserSize)
- offswitch-delay = min(offswitch, clusterSize)
The scaling behavior could be hard to explain to end users.

bq. 2) Add a configurable boolean that controls whether a rack-local assignment 
resets missed_opportunities to 0 (old behavior), OR node-locality-delay (new 
behavior). Default of new behavior.
This is fine to me since this is a configurable item and you have done tests 
for this change already.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287-v3.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-26 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974512#comment-14974512
 ] 

Nathan Roberts commented on YARN-4287:
--

Thanks [~leftnoteasy] for the comments. 

{quote}
2. node-delay = min(rack-delay, node-delay).
If a cluster has 40 nodes, user requests 3 containers on node1:

Assume the configured-rack-delay=50, 
rack-delay = min(3 (#requested-container) * 1 (#requested-resource-name)  / 40, 
50) = 0.
So:
node-delay = min(rack-delay, 40) = 0

In above example, no matter how rack-delay specified/computed, if we can keep 
the node-delay to 40, we have better chance to get node-local containers 
allocated.
{quote}
It is true that we won't get good locality in this example. iiuc, we didn't get 
good locality before the patch either. i.e. canAssign() would return true for 
NODE-LOCAL and OFF-SWITCH without delay. With the patch, canAssign() will 
return true for NODE-LOCAL, RACK-LOCAL, and OFF-SWITCH without delay. I believe 
the original intent of using localityWaitFactor was to avoid delaying small 
resource asks (could be a small job, or could be the tail of a large job). 
Unfortunately the algorithm still delayed RACK-LOCAL assignments. This made no 
sense to me - Accept OFF-SWITCH without delay, yet don't accept RACK-LOCAL?? I 
agree that we could change things here to get better locality for small 
requests, but to me this could have significant impact on small job latency so 
it would make me nervous to do so as part of this jira. 

{quote}
3. Don't restore missed-opportunity if rack-local container allocated.
The benefit of this change is obvious - we can get faster rack-local container 
allocation. But I feel this can also affect node-local container allocation (If 
the application asks only a small subset of nodes in a rack), may lead to some 
performance regression for locality I/O sensitive applications.
{quote}
You're correct that it can affect node local container allocation. I will make 
this behavior configurable. The reason I didn't in the first place was that I 
felt the circumstances where we lose out are rare (not currently getting 
NODE-LOCAL assignments because otherwise missedOpportunities resets, AND not 
getting OFF-SWITCH assignments because missedOpportunities doesn't reset for 
OFF-SWITCH so it will quickly allocated everything to OFF-SWITCH as soon as it 
hits that threshold). On the other hand, the effects of not doing it are 
dramatic. We have been having cases where 5% of NMs are down for maintenance 
and some jobs take about an order of magnitude longer to run than normal. 

So, here are the changes I propose:
1) I need to change the way rackLocalityDelay is specified because it doesn't 
handle the case where the configuration value is larger than the cluster size. 
I was thinking of just scaling it. Let's say node-locality-delay=5000, 
rack-locality-delay=5100, cluster_size is 3000. In the existing code, 
node-locality-delay would automatically get lowered to 3000. Instead, it will 
lower rack-locality-delay to 3000, and node-locality-delay will be 
proportionally adjusted (5000 * 3000 / 5100) = 2941. 
2) Add a configurable boolean that controls whether a rack-local assignment 
resets missed_opportunities to 0 (old behavior), OR node-locality-delay (new 
behavior). Default of new behavior. 

Let me know what you think of that approach.


> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287-v3.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-23 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971176#comment-14971176
 ] 

Nathan Roberts commented on YARN-4287:
--

Thanks for the comments. You're right that the logic can be simplified in that 
area. Let me do that and post a followup patch.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972233#comment-14972233
 ] 

Wangda Tan commented on YARN-4287:
--

[~nroberts], 
Thanks for updating, some thinkings regarding to your comments:
bq. This is a behavior change, but I can't think of any good cases where 
someone would prefer the old behavior to the new. Let me know if you can think 
of some.
Agree with you, most of your changes are good, I prefer to enable it to get 
better performance. But I can still think some edge cases, and I'd prefer to 
keep old one to avoid some magic things happen :). Let me explain more:

There're several behavior changes in your patch,
1. rack-delay = min (computed-offswitch-delay, configured-rack-delay)
When large configured-rack-delay specified, it uses old behavior. So this is 
safe to me. And I think what you mentioned before:
bq. I didn't separate them in this version of the patch because I still want to 
be able to specify rack-locality-delay BUT have the computed delay take effect 
when an application is not asking for locality OR is really small.
Makes sense to me, I just feel current way to compute offswitch delay need to 
be improved, I will add an example below.

2. node-delay = min(rack-delay, node-delay).
If a cluster has 40 nodes, user requests 3 containers on node1:
{code}
Assume the configured-rack-delay=50, 
rack-delay = min(3 (#requested-container) * 1 (#requested-resource-name)  / 40, 
50) = 0.
So:
node-delay = min(rack-delay, 40) = 0
{code}
In above example, no matter how rack-delay specified/computed, if we can keep 
the node-delay to 40, we have better chance to get node-local containers 
allocated.

3. Don't restore missed-opportunity if rack-local container allocated.
The benefit of this change is obvious - we can get faster rack-local container 
allocation. But I feel this can also affect node-local container allocation (If 
the application asks only a small subset of nodes in a rack), may lead to some 
performance regression for locality I/O sensitive applications.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287-v3.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968665#comment-14968665
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks [~nroberts], +1 to have an interim solution, the proposal looks good 
also, will review patch soon.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970026#comment-14970026
 ] 

Wangda Tan commented on YARN-4287:
--

Some suggestions
1) RACK_LOCALITY_EXTRA_DELAY -> RACK_LOCALITY_DELAY, same as configuration 
property name (rack-locality-delay)

2) Do you think if is it a good idea to separate old rack-locality-delay 
computation (using getLocalityWaitFactor) and new rack-locality-delay config? 
Now rack-locality-delay = min(old-computed-delay, new-specified-delay), since 
the getLocalityWaitFactor has some flaws, I think we can make this configurable 
so user can choose to use specified or computed.

Pseudo code may look like:
{code}
if type is OFF_SWITCH:
  if rack-locality-delay specified:
delay = rack-locality-delay
  else:
delay = computed-locality-delay
else if type is RACK_LOCAL:
delay = min(node-locality-delay, computed-or-specified-rack-locality-delay)
{code}

3)
bq. When we're getting rackLocal assignments, subsequent rackLocal assignments 
should not be delayed
+1 to the fix, since this is a behavior change, do you think if we need to make 
this configurable? This change could lead to #node-local container allocation 
decreasing in some cases.


Thanks,
Wangda

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970159#comment-14970159
 ] 

Hadoop QA commented on YARN-4287:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  9s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 26s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  4 
new checkstyle issues (total was 257, now 259). |
| {color:red}-1{color} | whitespace |   0m 12s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m 19s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  99m  3s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768138/YARN-4287-v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9533/console |


This message was automatically generated.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-v2.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968300#comment-14968300
 ] 

Hadoop QA commented on YARN-4287:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 24s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |  10m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  15m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 48s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 16s | The applied patch generated  
13 new checkstyle issues (total was 257, now 268). |
| {color:red}-1{color} | whitespace |   0m 10s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   3m 13s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 53s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 17s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  60m 42s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 115m 33s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRM |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767900/YARN-4287.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d1cdce7 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9516/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9516/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9516/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9516/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9516/console |


This message was automatically generated.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)