[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013120#comment-15013120 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2555 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2555/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt

> Capacity Scheduler: Rack Locality improvement
> ---------------------------------------------
>
>                 Key: YARN-4287
>                 URL: https://issues.apache.org/jira/browse/YARN-4287
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>             Fix For: 2.8.0
>
>         Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch,
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch,
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch,
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay
> scheduling algorithms within the capacity scheduler. The design proposal also
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have
> significant impact on things like CombineFileInputFormat which targets very
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim
> patch might make sense. The basic idea is simple:
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go
> straight to YARN-4189.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
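The two-part idea in the issue description above (separate delays for rackLocal and OffSwitch, and no renewed delay once rackLocal assignments are flowing) can be illustrated with a small sketch. This is not the YARN-4287 patch: the class `LocalityDelaySketch`, its methods, and the counting of "missed opportunities" are hypothetical names and assumptions chosen only to make the idea concrete, and the choice to keep the counter at the rack-local threshold after a rack-local assignment is one possible reading of point 2.

```java
/**
 * Hypothetical sketch of per-locality scheduling delays.
 * Not the YARN-4287 patch: all names and thresholds are illustrative assumptions.
 */
class LocalityDelaySketch {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    private final int rackLocalDelay;  // missed opportunities before rack-local is allowed
    private final int offSwitchDelay;  // missed opportunities before off-switch is allowed
    private int missedOpportunities = 0;

    LocalityDelaySketch(int rackLocalDelay, int offSwitchDelay) {
        if (rackLocalDelay < 0 || offSwitchDelay < rackLocalDelay) {
            throw new IllegalArgumentException("expected 0 <= rackLocalDelay <= offSwitchDelay");
        }
        this.rackLocalDelay = rackLocalDelay;
        this.offSwitchDelay = offSwitchDelay;
    }

    /** Called when a node was offered but nothing was assigned on it. */
    void recordMissedOpportunity() {
        missedOpportunities++;
    }

    /** Point 1: each locality level has its own threshold. */
    boolean canAssign(Locality locality) {
        switch (locality) {
            case NODE_LOCAL:
                return true;
            case RACK_LOCAL:
                return missedOpportunities >= rackLocalDelay;
            default:
                return missedOpportunities >= offSwitchDelay;
        }
    }

    /** Point 2: a rack-local assignment does not restart the rack-local wait. */
    void recordAssignment(Locality locality) {
        if (locality == Locality.NODE_LOCAL) {
            missedOpportunities = 0;               // full reset
        } else if (locality == Locality.RACK_LOCAL) {
            missedOpportunities = rackLocalDelay;  // stay at the rack-local threshold
        }
        // OFF_SWITCH: counter left unchanged in this sketch
    }

    public static void main(String[] args) {
        LocalityDelaySketch sketch = new LocalityDelaySketch(2, 5);
        sketch.recordMissedOpportunity();
        sketch.recordMissedOpportunity();
        sketch.recordAssignment(Locality.RACK_LOCAL);
        System.out.println("rack-local still allowed: " + sketch.canAssign(Locality.RACK_LOCAL));
        System.out.println("off-switch allowed: " + sketch.canAssign(Locality.OFF_SWITCH));
    }
}
```

With a single shared delay that is reset on every non-off-switch assignment, each rack-local container would have to wait out the full delay again, which matches the "trickle" described in the issue. Keeping the counter at the rack-local threshold avoids that while still making off-switch requests wait longer.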
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012130#comment-15012130 ]

Wangda Tan commented on YARN-4287:
----------------------------------

Committed to branch-2.7 and updated CHANGES.txt of branch-2/trunk. Thanks [~nroberts], and thanks [~jlowe] for the review.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012318#comment-15012318 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2622 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2622/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012412#comment-15012412 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #693 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/693/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012166#comment-15012166 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #8823 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8823/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012215#comment-15012215 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #681 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/681/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012380#comment-15012380 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #1420 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1420/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012820#comment-15012820 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #617 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/617/])
move fix version of YARN-4287 from 2.8.0 to 2.7.3 (wangda: rev 23a130abd7f26ca95d7e94988c7bc45c6d419d0f)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007129#comment-15007129 ]

Wangda Tan commented on YARN-4287:
----------------------------------

Thanks for the update, [~nroberts]. I tried this patch on 2.7, and all CS tests passed with it. I will commit this to branch-2.7 today if there are no objections.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003002#comment-15003002 ]

Hudson commented on YARN-4287:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #675 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/675/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts (wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003029#comment-15003029 ]

Hudson commented on YARN-4287:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/662/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts (wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002979#comment-15002979 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #1399 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1399/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts (wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003123#comment-15003123 ]

Hudson commented on YARN-4287:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2603 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2603/])
YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts (wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003187#comment-15003187 ] Nathan Roberts commented on YARN-4287: -- I will put up a 2.7 version tomorrow morning.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003146#comment-15003146 ] Hudson commented on YARN-4287: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #601 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/601/]) YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts (wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003173#comment-15003173 ] Jason Lowe commented on YARN-4287: -- bq. do you think if it should be committed to 2.7 also? +1 for committing this to 2.7.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002681#comment-15002681 ] Wangda Tan commented on YARN-4287: -- And [~nroberts], do you think it should be committed to 2.7 also?
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002722#comment-15002722 ] Hudson commented on YARN-4287: -- FAILURE: Integrated in Hadoop-trunk-Commit #8798 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8798/]) YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts (wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003175#comment-15003175 ] Wangda Tan commented on YARN-4287: -- [~jlowe], thanks for the comment. Added 2.7.3 to the target version.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003309#comment-15003309 ] Hudson commented on YARN-4287: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2539 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2539/]) YARN-4287. Capacity Scheduler: Rack Locality improvement (Nathan Roberts (wangda: rev 796638d9bc86235b9f3e5d1a3a9a25bbf5c04d1c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestQueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AbstractContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000770#comment-15000770 ] Wangda Tan commented on YARN-4287: -- Thanks [~nroberts]. Patch looks good, +1. I will commit in a few days if there are no objections.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999391#comment-14999391 ] Hadoop QA commented on YARN-4287: -

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 9s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 3m 56s | trunk passed |
| +1 | compile | 0m 30s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 0m 29s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvneclipse | 0m 18s | trunk passed |
| +1 | findbugs | 1m 27s | trunk passed |
| +1 | javadoc | 0m 32s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 33s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | compile | 0m 29s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 29s | the patch passed |
| -1 | checkstyle | 0m 14s | Patch generated 4 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 198, now 202). |
| +1 | mvneclipse | 0m 18s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 1m 38s | the patch passed |
| +1 | javadoc | 0m 31s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 33s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 64m 39s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. |
| -1 | unit | 65m 56s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 37s | Patch does not generate ASF License warnings. |
| | | 145m 25s | |

|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestClientRMService |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-10 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771599/YARN-4287-minimal-v4.patch |
| JIRA Issue | YARN-4287 |
| Optional Tests | asflicense javac javadoc mvninstall unit
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997650#comment-14997650 ] Hadoop QA commented on YARN-4287: -

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 3m 48s | trunk passed |
| +1 | compile | 0m 31s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 28s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 15s | trunk passed |
| +1 | mvneclipse | 0m 17s | trunk passed |
| +1 | findbugs | 1m 25s | trunk passed |
| +1 | javadoc | 0m 31s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 34s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | compile | 0m 29s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 29s | the patch passed |
| -1 | checkstyle | 0m 15s | Patch generated 4 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 198, now 202). |
| +1 | mvneclipse | 0m 17s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 1m 36s | the patch passed |
| +1 | javadoc | 0m 32s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 32s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 68m 10s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 68m 41s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 29s | Patch does not generate ASF License warnings. |
| | | 151m 16s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Client=1.7.0 Server=1.7.0 Image:test-patch-base-hadoop-date2015-11-09 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771428/YARN-4287-minimal-v3.patch |
| JIRA Issue | YARN-4287 |
| Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile |
| uname | Linux 3182d018451a
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997573#comment-14997573 ] Wangda Tan commented on YARN-4287: --
Thanks for the update, [~nroberts]. The patch generally looks good; a few comments:
- Could you add a comment at
{code}
return (Math.min(rmContext.getScheduler().getNumClusterNodes(), (requiredContainers * localityWaitFactor)) < missedOpportunities);
{code}
so that people reading the code can better understand why missedOpportunities needs to be capped by numClusterNodes?
- I would suggest adding tests for missedOpportunities capped by numClusterNodes and for resetSchedulingOpportunity for rack requests.
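The check quoted in the review comment above can be read as follows. This is only a minimal sketch, not the patch itself: the class name `OffSwitchDelaySketch` and the method `canAssignOffSwitch` are hypothetical, and the scheduler lookup `rmContext.getScheduler().getNumClusterNodes()` is replaced by a plain `numClusterNodes` parameter. The comment on the cap reflects one plausible reading (an application that has missed as many opportunities as there are nodes has, roughly, been offered every node once), which the thread itself does not spell out.

```java
final class OffSwitchDelaySketch {
    private OffSwitchDelaySketch() {}

    /**
     * Hypothetical stand-in for the quoted expression: returns true once the
     * application has missed more scheduling opportunities than the threshold.
     */
    public static boolean canAssignOffSwitch(int numClusterNodes,
                                             int requiredContainers,
                                             float localityWaitFactor,
                                             long missedOpportunities) {
        // Uncapped threshold grows with the size of the outstanding request.
        float uncapped = requiredContainers * localityWaitFactor;
        // Assumed rationale for the cap: waiting for more missed opportunities
        // than there are nodes in the cluster is unlikely to improve locality.
        float threshold = Math.min(numClusterNodes, uncapped);
        return threshold < missedOpportunities;
    }

    public static void main(String[] args) {
        // Large request on a small cluster: the threshold is capped at 10 nodes.
        System.out.println(canAssignOffSwitch(10, 1000, 1.0f, 11));
        // Small request on a large cluster: the threshold is 5 * 1.0.
        System.out.println(canAssignOffSwitch(100, 5, 1.0f, 5));
    }
}
```

A test along the lines Wangda Tan suggests would exercise both sides of the cap, as the two calls in `main` do.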
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980503#comment-14980503 ] Nathan Roberts commented on YARN-4287: -- +1 on percentages. My only concern is that node-locality-delay is already in there and is not a percentage. I can deprecate the existing node-locality-delay and add the percentage based configs.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980886#comment-14980886 ]

MENG DING commented on YARN-4287:

Looking at this issue, I have to admit that I had been frustrated with the existing {{getLocalityWaitFactor}}, and had the same question as [~nroberts]:

bq. This made no sense to me - Accept OFF-SWITCH without delay, yet don't accept RACK-LOCAL??

IMHO, although it makes sense to introduce a configurable rack-locality delay, it doesn't help when the cluster is really busy, as described in YARN-4189 and YARN-3309.

As an interim solution, I am in favor of YARN-4287-minimal.patch, but I think the default, DEFAULT_RACK_LOCALITY_FULL_RESET, should be set to true to be backward compatible.
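The reset behavior that the flag controls can be sketched as follows. This is a hedged illustration, not the patch: the class and method names are hypothetical, and the mapping "full reset = true means reset to 0, the old behavior" is an assumption inferred from the backward-compatibility remark above.

```java
// Hypothetical sketch of the two reset behaviors discussed in this thread.
class RackLocalResetSketch {

    // Value of missedOpportunities right after a rack-local assignment.
    // Assumption: fullReset == true is the old behavior (reset to 0);
    // fullReset == false resets to nodeLocalityDelay instead.
    static long missedOpportunitiesAfterRackLocal(boolean fullReset, long nodeLocalityDelay) {
        return fullReset ? 0L : nodeLocalityDelay;
    }

    // Further missed opportunities needed to climb back up to nodeLocalityDelay.
    static long missesUntilDelayReached(boolean fullReset, long nodeLocalityDelay) {
        return nodeLocalityDelay - missedOpportunitiesAfterRackLocal(fullReset, nodeLocalityDelay);
    }

    public static void main(String[] args) {
        long nodeLocalityDelay = 40;
        // Old behavior: every rack-local assignment pays the whole delay again.
        System.out.println(missesUntilDelayReached(true, nodeLocalityDelay));  // 40
        // New behavior: the counter stays at the delay, so nothing is re-paid.
        System.out.println(missesUntilDelayReached(false, nodeLocalityDelay)); // 0
    }
}
```

The first case is the "trickle" described in the issue: each rack-local container has to wait out the node-locality delay again.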
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980816#comment-14980816 ]

Wangda Tan commented on YARN-4287:

I think it may be better not to deprecate the original option; we can support both forms in the same option. It is just like setting an element's size in HTML: you can give either px or a percentage of the parent's width/height.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981181#comment-14981181 ]

Wangda Tan commented on YARN-4287:

I'm fine with either direction, but for the 4287-minimal.patch I suggest capping the rack-local-delay at the cluster size, so that off-switch requests do not wait too long when the request needs lots of containers.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981232#comment-14981232 ]

MENG DING commented on YARN-4287:

Agreed.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979560#comment-14979560 ]

Wangda Tan commented on YARN-4287:

Hi [~nroberts],

bq. One argument for sticking with the scaling approach is the fact that we basically do it today in a simpler fashion. If you specify node-locality-delay of 5000 on a 3000 node cluster, it gets automatically scaled down to 3000 without informing the user. So I'd say scale it but don't try to explain it in user documentation.

I still think scaling down is not a straightforward way to address the problem you mentioned (the user isn't clear about the size of the cluster). Instead, I think we can use a percentage. The user can say: I want the node locality delay to be 300 OR 10% of the cluster size, and the same for the rack locality delay. The scheduler will compute the actual delay at runtime. With this, I think we can safely cap the delay by the cluster size.

Does this make sense to you?

Thanks,
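The "300 OR 10% of cluster size" idea can be sketched as a small resolver. This is only an illustration under stated assumptions: the class name, the method name and the trailing-`%` syntax are hypothetical and are not taken from any patch on this issue.

```java
// Hypothetical sketch: resolve a locality delay given either as an
// absolute count ("300") or as a percentage of the cluster size ("10%").
class LocalityDelayConfigSketch {

    static int resolveDelay(String configured, int clusterSize) {
        String value = configured.trim();
        long delay;
        if (value.endsWith("%")) {
            // Percentage form: computed against the current cluster size at runtime.
            double percent = Double.parseDouble(value.substring(0, value.length() - 1).trim());
            delay = Math.round(clusterSize * percent / 100.0);
        } else {
            // Absolute form: a plain number of missed scheduling opportunities.
            delay = Long.parseLong(value);
        }
        // Either way the result is capped by the cluster size and never negative.
        return (int) Math.max(0, Math.min(delay, clusterSize));
    }

    public static void main(String[] args) {
        System.out.println(resolveDelay("300", 3000));  // 300
        System.out.println(resolveDelay("10%", 3000));  // 300
        System.out.println(resolveDelay("5000", 3000)); // 3000 (capped)
        System.out.println(resolveDelay("10%", 40));    // 4
    }
}
```

Because the percentage is resolved against the live cluster size, the same configuration keeps its meaning when the cluster grows or shrinks.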
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976386#comment-14976386 ]

Nathan Roberts commented on YARN-4287:

Thanks [~leftnoteasy] for the quick responses.

{quote}
I think instead of scaling, I suggest to simply cap rack/offswitch delay by the cluster size, so:
rack-delay = min(offswitch, node-locality-delay, clusterSize)
offswitch-delay = min(offswitch, clusterSize)
The scaling behavior could be hard to explain to end users.
{quote}

I agree that it's not as easy to describe. BUT, the problem I have is that I don't know how to deal with the common case of someone wanting node-locality-delay to be based on the size of the cluster. What we do is set node-locality-delay to something guaranteed to be larger than the cluster, knowing the scheduler will automatically lower it to the size of the cluster. This works great for a single delay on any size cluster. However, it's impossible to describe two different delays using this same approach. For example, I might always want node-locality-delay to be 10% less than rack-locality-delay.

Maybe we should specify rack-locality-delay as a percentage above node-locality-delay (10%)? Still a little hard to describe though.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975410#comment-14975410 ]

Wangda Tan commented on YARN-4287:

Thanks for sharing your thoughts, [~nroberts]!

bq. iiuc, we didn't get good locality before the patch either. i.e. canAssign() would return true for NODE-LOCAL and OFF-SWITCH without delay.

Yes, you're correct. I think we can safely use min(computed-offswitch, configured-offswitch) as the final offswitch/rack delay.

bq. 1) I need to change the way rackLocalityDelay is specified because it doesn't handle the case where the configuration value is larger than the cluster size. I was thinking of just scaling it. Let's say node-locality-delay=5000, rack-locality-delay=5100, cluster_size is 3000. In the existing code, node-locality-delay would automatically get lowered to 3000. Instead, it will lower rack-locality-delay to 3000, and node-locality-delay will be proportionally adjusted (5000 * 3000 / 5100) = 2941.

Instead of scaling, I suggest simply capping the rack/offswitch delay by the cluster size, so:
- rack-delay = min(offswitch, node-locality-delay, clusterSize)
- offswitch-delay = min(offswitch, clusterSize)

The scaling behavior could be hard to explain to end users.

bq. 2) Add a configurable boolean that controls whether a rack-local assignment resets missed_opportunities to 0 (old behavior), OR node-locality-delay (new behavior). Default of new behavior.

This is fine with me, since it is a configurable item and you have already tested this change.
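To make the capping suggestion concrete, here is a minimal sketch that caps each configured delay independently by the cluster size. This is a simplified reading of the suggestion, not the exact min(...) formulas written in the comment, and the class and method names are hypothetical.

```java
// Hypothetical sketch: cap a configured delay by the cluster size.
class CapDelaySketch {

    static int cap(int configuredDelay, int clusterSize) {
        return Math.min(configuredDelay, clusterSize);
    }

    public static void main(String[] args) {
        int clusterSize = 3000;
        // Numbers from the quoted example: 5000 and 5100 on a 3000-node cluster.
        System.out.println(cap(5000, clusterSize)); // 3000
        System.out.println(cap(5100, clusterSize)); // 3000
        // A delay already below the cluster size is left unchanged.
        System.out.println(cap(300, clusterSize));  // 300
    }
}
```

With these numbers both delays end up at 3000, so capping is easy to explain but does not preserve the gap between two delays that were both configured above the cluster size.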
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974512#comment-14974512 ]

Nathan Roberts commented on YARN-4287:

Thanks [~leftnoteasy] for the comments.

{quote}
2. node-delay = min(rack-delay, node-delay).
If a cluster has 40 nodes, user requests 3 containers on node1:
Assume the configured-rack-delay=50,
rack-delay = min(3 (#requested-container) * 1 (#requested-resource-name) / 40, 50) = 0.
So:
node-delay = min(rack-delay, 40) = 0
In above example, no matter how rack-delay specified/computed, if we can keep the node-delay to 40, we have better chance to get node-local containers allocated.
{quote}

It is true that we won't get good locality in this example. iiuc, we didn't get good locality before the patch either, i.e. canAssign() would return true for NODE-LOCAL and OFF-SWITCH without delay. With the patch, canAssign() will return true for NODE-LOCAL, RACK-LOCAL, and OFF-SWITCH without delay.

I believe the original intent of using localityWaitFactor was to avoid delaying small resource asks (could be a small job, or could be the tail of a large job). Unfortunately the algorithm still delayed RACK-LOCAL assignments. This made no sense to me - Accept OFF-SWITCH without delay, yet don't accept RACK-LOCAL??

I agree that we could change things here to get better locality for small requests, but to me this could have a significant impact on small-job latency, so it would make me nervous to do so as part of this jira.

{quote}
3. Don't restore missed-opportunity if rack-local container allocated.
The benefit of this change is obvious - we can get faster rack-local container allocation. But I feel this can also affect node-local container allocation (If the application asks only a small subset of nodes in a rack), may lead to some performance regression for locality I/O sensitive applications.
{quote}

You're correct that it can affect node-local container allocation. I will make this behavior configurable. The reason I didn't in the first place was that I felt the circumstances where we lose out are rare (not currently getting NODE-LOCAL assignments, because otherwise missedOpportunities resets, AND not getting OFF-SWITCH assignments, because missedOpportunities doesn't reset for OFF-SWITCH, so it will quickly allocate everything to OFF-SWITCH as soon as it hits that threshold). On the other hand, the effects of not doing it are dramatic. We have been having cases where 5% of NMs are down for maintenance and some jobs take about an order of magnitude longer to run than normal.

So, here are the changes I propose:
1) I need to change the way rackLocalityDelay is specified because it doesn't handle the case where the configuration value is larger than the cluster size. I was thinking of just scaling it. Let's say node-locality-delay=5000, rack-locality-delay=5100, cluster_size is 3000. In the existing code, node-locality-delay would automatically get lowered to 3000. Instead, it will lower rack-locality-delay to 3000, and node-locality-delay will be proportionally adjusted (5000 * 3000 / 5100) = 2941.
2) Add a configurable boolean that controls whether a rack-local assignment resets missed_opportunities to 0 (old behavior), OR node-locality-delay (new behavior). Default of new behavior.

Let me know what you think of that approach.
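The proportional scaling proposed in item 1 can be worked through with a short sketch. The class and method names are hypothetical, and the code assumes node-locality-delay is not larger than rack-locality-delay, as in the example.

```java
// Hypothetical sketch of the proposed proportional scaling.
class ScaledDelaySketch {

    // Returns {scaledNodeDelay, scaledRackDelay}.
    // Assumption: nodeDelay <= rackDelay, as in the example in the comment.
    static int[] scale(int nodeDelay, int rackDelay, int clusterSize) {
        if (rackDelay <= clusterSize) {
            // Nothing exceeds the cluster size, so no scaling is needed.
            return new int[] {nodeDelay, rackDelay};
        }
        // The rack delay is lowered to the cluster size and the node delay
        // shrinks by the same ratio (long arithmetic avoids int overflow).
        int scaledRack = clusterSize;
        int scaledNode = (int) ((long) nodeDelay * clusterSize / rackDelay);
        return new int[] {scaledNode, scaledRack};
    }

    public static void main(String[] args) {
        int[] scaled = scale(5000, 5100, 3000);
        System.out.println(scaled[0]); // 2941
        System.out.println(scaled[1]); // 3000
    }
}
```

Unlike a plain cap, this keeps the node delay slightly below the rack delay, at the cost of being harder to describe to users.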
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971176#comment-14971176 ]

Nathan Roberts commented on YARN-4287:

Thanks for the comments. You're right that the logic can be simplified in that area. Let me do that and post a follow-up patch.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972233#comment-14972233 ]

Wangda Tan commented on YARN-4287:

[~nroberts], thanks for updating. Some thoughts regarding your comments:

bq. This is a behavior change, but I can't think of any good cases where someone would prefer the old behavior to the new. Let me know if you can think of some.

I agree with you: most of your changes are good, and I would prefer to enable them to get better performance. But I can still think of some edge cases, and I'd prefer to keep the old behavior available so that nothing magic happens :). Let me explain more.

There are several behavior changes in your patch:

1. rack-delay = min(computed-offswitch-delay, configured-rack-delay)
When a large configured-rack-delay is specified, it uses the old behavior, so this is safe to me. And what you mentioned before:

bq. I didn't separate them in this version of the patch because I still want to be able to specify rack-locality-delay BUT have the computed delay take effect when an application is not asking for locality OR is really small.

makes sense to me. I just feel the current way of computing the off-switch delay needs to be improved; I will add an example below.

2. node-delay = min(rack-delay, node-delay).
If a cluster has 40 nodes and a user requests 3 containers on node1:
{code}
Assume the configured-rack-delay=50,
rack-delay = min(3 (#requested-container) * 1 (#requested-resource-name) / 40, 50) = 0.
So:
node-delay = min(rack-delay, 40) = 0
{code}
In the above example, no matter how rack-delay is specified or computed, if we can keep the node-delay at 40, we have a better chance of getting node-local containers allocated.

3. Don't restore missed-opportunity if a rack-local container is allocated.
The benefit of this change is obvious: we get faster rack-local container allocation. But I feel it can also affect node-local container allocation (if the application asks for only a small subset of the nodes in a rack), and may lead to a performance regression for applications that are sensitive to I/O locality.
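The arithmetic in the 40-node example can be checked with a short sketch. The names are hypothetical; the computation follows the numbers in the comment, and clamping the wait factor to 1.0 is an assumption about how getLocalityWaitFactor behaves.

```java
// Hypothetical sketch of the small-request example: 40 nodes, 3 containers on one node.
class SmallRequestDelaySketch {

    static int rackDelay(int requestedContainers, int requestedResourceNames,
                         int clusterNodes, int configuredRackDelay) {
        // Assumed shape of the wait factor: resource names / cluster nodes, at most 1.
        float localityWaitFactor = Math.min((float) requestedResourceNames / clusterNodes, 1.0f);
        // 3 * (1 / 40) = 0.075, which truncates to 0 as an integer delay.
        int computedDelay = (int) (requestedContainers * localityWaitFactor);
        return Math.min(computedDelay, configuredRackDelay);
    }

    static int nodeDelay(int rackDelay, int configuredNodeDelay) {
        // Behavior change 2 in the comment: the node delay is bounded by the rack delay.
        return Math.min(rackDelay, configuredNodeDelay);
    }

    public static void main(String[] args) {
        int rack = rackDelay(3, 1, 40, 50);
        System.out.println(rack);                // 0
        System.out.println(nodeDelay(rack, 40)); // 0
    }
}
```

Both delays collapse to 0 for this request, which is the loss of node locality the comment points out.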
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968665#comment-14968665 ]

Wangda Tan commented on YARN-4287:

Thanks [~nroberts], +1 to having an interim solution. The proposal also looks good; I will review the patch soon.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970026#comment-14970026 ]

Wangda Tan commented on YARN-4287:

Some suggestions:

1) RACK_LOCALITY_EXTRA_DELAY -> RACK_LOCALITY_DELAY, the same as the configuration property name (rack-locality-delay).

2) Do you think it is a good idea to separate the old rack-locality-delay computation (using getLocalityWaitFactor) from the new rack-locality-delay config? Now rack-locality-delay = min(old-computed-delay, new-specified-delay). Since getLocalityWaitFactor has some flaws, I think we can make this configurable so the user can choose between the specified and the computed delay. Pseudo code may look like:
{code}
if type is OFF_SWITCH:
  if rack-locality-delay specified:
    delay = rack-locality-delay
  else:
    delay = computed-locality-delay
else if type is RACK_LOCAL:
  delay = min(node-locality-delay, computed-or-specified-rack-locality-delay)
{code}

3)
bq. When we're getting rackLocal assignments, subsequent rackLocal assignments should not be delayed

+1 to the fix. Since this is a behavior change, do you think we need to make it configurable? This change could decrease the number of node-local container allocations in some cases.

Thanks,
Wangda
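The pseudo code in suggestion 2 translates fairly directly into Java. The sketch below is a hedged rendering, not scheduler code: the enum, class and parameter names are hypothetical, and an empty OptionalInt stands for "rack-locality-delay not specified".

```java
import java.util.OptionalInt;

// Hypothetical Java rendering of the pseudo code in suggestion 2.
class DelaySelectionSketch {

    enum NodeType { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    static int delayFor(NodeType type, int nodeLocalityDelay,
                        OptionalInt rackLocalityDelay, int computedLocalityDelay) {
        // Use the specified rack-locality-delay when present, else the computed one.
        int rackOrComputed = rackLocalityDelay.orElse(computedLocalityDelay);
        switch (type) {
            case OFF_SWITCH:
                return rackOrComputed;
            case RACK_LOCAL:
                return Math.min(nodeLocalityDelay, rackOrComputed);
            default:
                // NODE_LOCAL is not covered by the pseudo code; assumed to be undelayed.
                return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(delayFor(NodeType.OFF_SWITCH, 40, OptionalInt.of(50), 7));  // 50
        System.out.println(delayFor(NodeType.OFF_SWITCH, 40, OptionalInt.empty(), 7)); // 7
        System.out.println(delayFor(NodeType.RACK_LOCAL, 40, OptionalInt.of(50), 7));  // 40
        System.out.println(delayFor(NodeType.RACK_LOCAL, 40, OptionalInt.empty(), 7)); // 7
    }
}
```

Keeping the specified and computed values in one variable makes the choice explicit: the user's setting wins whenever it is present.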
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970159#comment-14970159 ]

Hadoop QA commented on YARN-4287:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 9s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 26s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 50s | The applied patch generated 4 new checkstyle issues (total was 257, now 259). |
| {color:red}-1{color} | whitespace | 0m 12s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 58m 19s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 99m 3s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768138/YARN-4287-v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9533/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9533/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9533/console |

This message was automatically generated.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968300#comment-14968300 ]

Hadoop QA commented on YARN-4287:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 20m 24s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 10m 42s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 15m 3s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 48s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 16s | The applied patch generated 13 new checkstyle issues (total was 257, now 268). |
| {color:red}-1{color} | whitespace | 0m 10s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 3m 13s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 53s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 60m 42s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 115m 33s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRM |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12767900/YARN-4287.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d1cdce7 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9516/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9516/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9516/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9516/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9516/console |

This message was automatically generated.

> Capacity Scheduler: Rack Locality improvement
> ---------------------------------------------
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Affects Versions: 2.7.1
> Reporter: Nathan Roberts
> Assignee: Nathan Roberts
> Attachments: YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay
> scheduling algorithms within the capacity scheduler. The design proposal also
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been
> experiencing on a regular basis:
> - rackLocal assignments trickle out due to nodeLocalityDelay. This can have
> significant impact on things like CombineFileInputFormat which targets very
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim
> patch might make sense. The basic idea is simple:
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go
> straight to YARN-4189.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)