[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523861#comment-14523861 ]

Jian He commented on YARN-1572:
-------------------------------

Hi [~gujilangzi], the patch you uploaded is a branch-2 patch. Could you please work on a trunk patch?

Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
------------------------------------------------------------------

Key: YARN-1572
URL: https://issues.apache.org/jira/browse/YARN-1572
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.3.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Attachments: YARN-1572-branch-2.3.0.001.patch, YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz

We have a low chance of hitting an NPE in allocateNodeLocal when running the benchmark (4 hits in 20 runs).

{code}
2014-07-31 04:18:19,653 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_1406794589275_0001_01_21 of capacity memory:1024, vCores:1 on host datanode10:57281, which has 6 containers, memory:6144, vCores:6 used and memory:2048, vCores:2 available after allocation
2014-07-31 04:18:19,654 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:268)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:136)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:683)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:602)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:560)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:488)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:729)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:774)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:101)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
    at java.lang.Thread.run(Thread.java:662)
2014-07-31 04:18:19,655 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391976#comment-14391976 ]

Hadoop QA commented on YARN-1572:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12708878/YARN-1572-branch-2.3.0.001.patch
against trunk revision f383fd9.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7197//console

This message is automatically generated.

Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
------------------------------------------------------------------

Key: YARN-1572
URL: https://issues.apache.org/jira/browse/YARN-1572
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.3.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Attachments: YARN-1572-branch-2.3.0.001.patch, YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz

We have a low chance of hitting an NPE in allocateNodeLocal when running the benchmark (4 hits in 20 runs).
[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391710#comment-14391710 ]

Hadoop QA commented on YARN-1572:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12708801/0001-Fix-for-YARN-1572.patch
against trunk revision 3c7adaa.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7195//console

This message is automatically generated.

Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
------------------------------------------------------------------

Key: YARN-1572
URL: https://issues.apache.org/jira/browse/YARN-1572
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.3.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Attachments: 0001-Fix-for-YARN-1572.patch, YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz

We have a low chance of hitting an NPE in allocateNodeLocal when running the benchmark (4 hits in 20 runs).
[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391406#comment-14391406 ]

Wangda Tan commented on YARN-1572:
----------------------------------

This is a bad bug. Adding a null check seems enough to me. It could be caused by a user who uses ApplicationMasterProtocol directly instead of AMRMClient and adds a node-label request without adding a rack-local request.

[~gujilangzi], are you still working on this?

Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
------------------------------------------------------------------

Key: YARN-1572
URL: https://issues.apache.org/jira/browse/YARN-1572
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Attachments: YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz

We have a low chance of hitting an NPE in allocateNodeLocal when running the benchmark (4 hits in 20 runs).
[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391363#comment-14391363 ]

Kareem El Gebaly commented on YARN-1572:
----------------------------------------

I agree with the solution. The bug affects version 2.3.0 as well. Has any patch or solution been found yet?

Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
------------------------------------------------------------------

Key: YARN-1572
URL: https://issues.apache.org/jira/browse/YARN-1572
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
Attachments: YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz

We have a low chance of hitting an NPE in allocateNodeLocal when running the benchmark (4 hits in 20 runs).
[jira] [Commented] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080540#comment-14080540 ]

Wenwu Peng commented on YARN-1572:
----------------------------------

I am not sure that rackLocalRequest is the cause of the NPE, but it would be better to check whether rackLocalRequest is null before calling rackLocalRequest.setNumContainers:

{code}
ResourceRequest rackLocalRequest = requests.get(priority).get(node.getRackName());
rackLocalRequest.setNumContainers(rackLocalRequest.getNumContainers() - 1);
{code}

Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
------------------------------------------------------------------

Key: YARN-1572
URL: https://issues.apache.org/jira/browse/YARN-1572
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Reporter: Wenwu Peng
Assignee: Junping Du
Attachments: conf.tar.gz, log.tar.gz

We have a low chance of hitting an NPE in allocateNodeLocal when running the benchmark (4 hits in 20 runs).

Steps:
1. Set up a Hadoop 2.2.0 environment.
2. Run:
   for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 10;done

2014-01-08 03:56:14,082 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
    at java.lang.Thread.run(Thread.java:662)

I will attach the log and configuration files later.

Note: My topology file:
10.111.89.230 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.231 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.232 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.239 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.233 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.234 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.240 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.236 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.241 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.238 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
10.111.89.242 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com

--
This message was sent by Atlassian JIRA (v6.2#6252)
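
The null check discussed in this thread could look roughly like the sketch below. It is not the committed patch and does not use the real YARN classes: Request is a hypothetical stand-in for ResourceRequest, decrementIfPresent is an invented helper name, and the host and rack names are illustrative only. It just shows that a lookup for a rack with no registered request returns null from the map, and that guarding the decrement avoids the NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

public class RackLocalNullCheckSketch {

    /** Hypothetical stand-in for YARN's ResourceRequest; only the container count is modeled. */
    public static final class Request {
        private int numContainers;

        public Request(int numContainers) {
            this.numContainers = numContainers;
        }

        public int getNumContainers() {
            return numContainers;
        }

        public void setNumContainers(int numContainers) {
            this.numContainers = numContainers;
        }
    }

    /**
     * Decrements the outstanding container count registered under resourceName
     * (a host or a rack). Returns false, instead of throwing, when no request
     * was registered under that name.
     */
    public static boolean decrementIfPresent(Map<String, Request> requestsAtPriority,
                                             String resourceName) {
        Request request = requestsAtPriority.get(resourceName);
        if (request == null) {
            // Without this guard the setNumContainers call below would throw an NPE.
            return false;
        }
        request.setNumContainers(request.getNumContainers() - 1);
        return true;
    }

    public static void main(String[] args) {
        // A node-level request was registered, but no rack-level request (assumed scenario).
        Map<String, Request> requestsAtPriority = new HashMap<>();
        requestsAtPriority.put("datanode10", new Request(2));

        System.out.println(decrementIfPresent(requestsAtPriority, "datanode10")); // true
        System.out.println(decrementIfPresent(requestsAtPriority, "/rack1"));     // false
        System.out.println(requestsAtPriority.get("datanode10").getNumContainers()); // 1
    }
}
```

Whether silently skipping the missing rack-level request is the right behaviour for the scheduler, as opposed to logging or rejecting the inconsistent request earlier, is a design question this sketch does not settle.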