[jira] [Commented] (YARN-7747) YARN UI is broken in the minicluster
[ https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729517#comment-17729517 ]

Jon Bringhurst commented on YARN-7747:
--------------------------------------

No problem [~jira.shegalov]! Thank you for providing so much background info here!

> YARN UI is broken in the minicluster
> ------------------------------------
>
>                 Key: YARN-7747
>                 URL: https://issues.apache.org/jira/browse/YARN-7747
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Gera Shegalov
>            Priority: Major
>         Attachments: YARN-7747.001.patch, YARN-7747.002.patch
>
> YARN web apps use non-injected instances of GuiceFilter, i.e. instances
> created by Jetty as opposed to instances created by Guice itself. This
> triggers the [call path|https://github.com/google/guice/blob/master/extensions/servlet/src/com/google/inject/servlet/GuiceFilter.java#L251]
> where the static field {{pipeline}} is used instead of the instance field
> {{injectedPipeline}}. However, besides the GuiceFilter instances created by
> Jetty, each Guice module generates them as well. On the injection call path
> this static variable is updated by each instance. Thus, if there are
> multiple modules, as happens to be the case in the minicluster, the one
> loaded last ends up defining the filter pipeline for all Jetty instances.
> In the minicluster case this is the nodemanager UI.
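To make the mechanism above concrete, here is a minimal, self-contained sketch of the failure mode. The class names are hypothetical stand-ins, not the actual Guice or YARN classes: a static field shared by all filter instances means whichever module is injected last silently defines the pipeline seen by every non-injected (Jetty-created) instance.

{code:java}
/** Hypothetical stand-in for GuiceFilter: many instances, one static pipeline. */
class WebAppFilter {
    private static String pipeline = "<default>"; // shared across ALL instances
    private final String injectedPipeline;        // per-instance, set via injection

    WebAppFilter(String injectedPipeline) { this.injectedPipeline = injectedPipeline; }

    /** Mimics the injection call path: every module's instance overwrites the static. */
    void inject() { pipeline = injectedPipeline; }

    /** Mimics dispatch: a Jetty-created (non-injected) instance falls back to the static. */
    String effectivePipeline() {
        return injectedPipeline != null ? injectedPipeline : pipeline;
    }
}

public class MiniClusterPipelineDemo {
    public static void main(String[] args) {
        // The RM and NM web apps each install their own module...
        new WebAppFilter("RM pipeline").inject();
        new WebAppFilter("NM pipeline").inject(); // loaded last, wins globally

        // ...but Jetty instantiates its own, non-injected filter for the RM server.
        WebAppFilter jettyCreated = new WebAppFilter(null);
        System.out.println(jettyCreated.effectivePipeline()); // prints "NM pipeline"
    }
}
{code}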
[jira] [Comment Edited] (YARN-7747) YARN UI is broken in the minicluster
[ https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729465#comment-17729465 ]

Jon Bringhurst edited comment on YARN-7747 at 6/5/23 8:36 PM:
--------------------------------------------------------------

I think I just ran into this issue on 3.3.5. After launching a MiniCluster RM and NM, the URL to the RM shows the NM UI. I'm going to try to patch locally with the patch here (it mostly still applies) to see if it resolves things.

was (Author: jonbringhurst):
I think I just ran into this issue on 3.3.5. After launching a MiniCluster RM and NM, the URL to the RM shows the NM UI.
[jira] [Commented] (YARN-7747) YARN UI is broken in the minicluster
[ https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729465#comment-17729465 ]

Jon Bringhurst commented on YARN-7747:
--------------------------------------

I think I just ran into this issue on 3.3.5. After launching a MiniCluster RM and NM, the URL to the RM shows the NM UI.
[jira] [Updated] (YARN-10044) ResourceManager NPE - Error in handling event type NODE_UPDATE to the Event Dispatcher
[ https://issues.apache.org/jira/browse/YARN-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-10044:
----------------------------------
    Issue Type: Bug  (was: Improvement)

> ResourceManager NPE - Error in handling event type NODE_UPDATE to the Event Dispatcher
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-10044
>                 URL: https://issues.apache.org/jira/browse/YARN-10044
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.2
>            Reporter: Jon Bringhurst
>            Priority: Major
[jira] [Created] (YARN-10044) ResourceManager NPE - Error in handling event type NODE_UPDATE to the Event Dispatcher
Jon Bringhurst created YARN-10044:
----------------------------------

             Summary: ResourceManager NPE - Error in handling event type NODE_UPDATE to the Event Dispatcher
                 Key: YARN-10044
                 URL: https://issues.apache.org/jira/browse/YARN-10044
             Project: Hadoop YARN
          Issue Type: Improvement
    Affects Versions: 2.9.2
            Reporter: Jon Bringhurst

{noformat}
2019-12-18 00:46:42,577 [INFO] [IPC Server handler 48 on 8030] resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - USER=vapp5003 IP=10.186.103.102 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1575937033226_0426 CONTAINERID=container_e18_1575937033226_0426_01_000389 RESOURCE=
2019-12-18 00:46:42,577 [INFO] [IPC Server handler 48 on 8030] rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - container_e18_1575937033226_0426_01_000392 Container Transitioned from ACQUIRED to RELEASED
2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - USER=vapp5003 IP=10.186.103.102 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1575937033226_0426 CONTAINERID=container_e18_1575937033226_0426_01_000392 RESOURCE=
2019-12-18 00:46:42,578 [INFO] [SchedulerEventDispatcher:Event Processor] allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(AbstractContainerAllocator.java:126) - assignedContainer application attempt=appattempt_1575937033226_0426_01 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@73179037 clusterResource= type=OFF_SWITCH requestedPartition=concourse
2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - container_e18_1575937033226_0426_01_000393 Container Transitioned from ACQUIRED to RELEASED
2019-12-18 00:46:42,578 [INFO] [SchedulerEventDispatcher:Event Processor] capacity.ParentQueue.assignContainers(ParentQueue.java:616) - assignedContainer queue=root usedCapacity=0.68548673 absoluteUsedCapacity=0.68548673 used= cluster=
2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - USER=vapp5003 IP=10.186.103.102 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1575937033226_0426 CONTAINERID=container_e18_1575937033226_0426_01_000393 RESOURCE=
2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - container_e18_1575937033226_0426_01_000394 Container Transitioned from ACQUIRED to RELEASED
2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - USER=vapp5003 IP=10.186.103.102 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1575937033226_0426 CONTAINERID=container_e18_1575937033226_0426_01_000394 RESOURCE=
2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:250) - checking for deactivate of application :application_1575937033226_0426
2019-12-18 00:46:42,580 [FATAL] [SchedulerEventDispatcher:Event Processor] event.EventDispatcher$EventProcessor.run(EventDispatcher.java:75) - Error in handling event type NODE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:533)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2563)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2429)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1359)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1348)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1437)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1208)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1070)
        at
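The interleaving in the log above, AM container releases handled on IPC threads while the scheduler's event dispatcher commits an allocation for the same application (right after "checking for deactivate"), suggests a lost update between threads. The following is a hypothetical reduction of that shape, purely for illustration; it is not the Hadoop code, and the actual root cause may differ:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical reduction of the suspected interleaving; not the Hadoop code. */
public class CommitRaceDemo {
    static final Map<String, int[]> pendingAsks = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        pendingAsks.put("application_0426", new int[]{4});

        // Dispatcher thread: builds an allocation proposal against current state...
        int[] proposal = pendingAsks.get("application_0426");
        System.out.println("proposed containers: " + proposal[0]);

        // ...meanwhile an IPC handler thread processes the AM's releases and
        // deactivates the application, clearing its pending asks.
        pendingAsks.remove("application_0426");

        // The commit path later re-reads the shared state and trusts it is populated.
        int[] committed = pendingAsks.get("application_0426");
        System.out.println(committed[0]); // NullPointerException, as in the log
    }
}
{code}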
[jira] [Commented] (YARN-2223) NPE on ResourceManager recover
[ https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523853#comment-14523853 ]

Jon Bringhurst commented on YARN-2223:
--------------------------------------

Hey [~jianhe], that sounds good to me -- I haven't seen this problem in a long time. We're running 2.6.0 now.

NPE on ResourceManager recover
------------------------------

                Key: YARN-2223
                URL: https://issues.apache.org/jira/browse/YARN-2223
            Project: Hadoop YARN
         Issue Type: Bug
   Affects Versions: 2.4.1
        Environment: JDK 8u5
           Reporter: Jon Bringhurst
[jira] [Created] (YARN-3344) procfs stat file is not in the expected format warning
Jon Bringhurst created YARN-3344:
---------------------------------

             Summary: procfs stat file is not in the expected format warning
                 Key: YARN-3344
                 URL: https://issues.apache.org/jira/browse/YARN-3344
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Jon Bringhurst

Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
{
  source_host: asdf,
  method: constructProcessInfo,
  level: WARN,
  message: Unexpected: procfs stat file is not in the expected format for process with pid 6953
  file: ProcfsBasedProcessTree.java,
  line_number: 514,
  class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
},
}
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep -i 6953
root      6953  0.0  0.0 200484 23424 ?   S    21:44   0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0   S+   22:13   0:00 grep -i 6953
[asdf ~]$
{noformat}
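One detail worth noting in the stat line above: the second field, the command name in parentheses, is "python2.6 /expo" and contains a space, which a whitespace-delimited regex will not expect. Below is a sketch of a parse that tolerates whitespace in the comm field by bracketing on the outermost parentheses, per proc(5). This is illustrative only, not Hadoop's ProcfsBasedProcessTree:

{code:java}
public class StatParseDemo {
    public static void main(String[] args) {
        // The exact line from the report (truncated); comm contains a space.
        String stat = "6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 "
                + "9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856";

        int open = stat.indexOf('(');
        int close = stat.lastIndexOf(')'); // comm itself may contain ')' or spaces

        String pid = stat.substring(0, open).trim();
        String comm = stat.substring(open + 1, close);
        String[] rest = stat.substring(close + 1).trim().split("\\s+");

        // rest[0] is the state, rest[1] the ppid, and so on, per proc(5).
        System.out.println(pid + " comm=[" + comm + "] state=" + rest[0] + " ppid=" + rest[1]);
    }
}
{code}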
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-3344:
---------------------------------
    Description: 
Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files.

Here's the error I'm seeing:

{noformat}
source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root      6953  0.0  0.0 200484 23424 ?   S    21:44   0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0   S+   22:13   0:00 grep -i 6953
[asdf ~]$
{noformat}

  was:
Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root      6953  0.0  0.0 200484 23424 ?   S    21:44   0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0   S+   22:13   0:00 grep -i 6953
[asdf ~]$
{noformat}

procfs stat file is not in the expected format warning
-------------------------------------------------------

                Key: YARN-3344
                URL: https://issues.apache.org/jira/browse/YARN-3344
            Project: Hadoop YARN
         Issue Type: Bug
   Affects Versions: 2.6.0
           Reporter: Jon Bringhurst
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-3344:
---------------------------------
    Description: 
Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root      6953  0.0  0.0 200484 23424 ?   S    21:44   0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0   S+   22:13   0:00 grep -i 6953
[asdf ~]$
{noformat}

  was:
Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
{
  source_host: asdf,
  method: constructProcessInfo,
  level: WARN,
  message: Unexpected: procfs stat file is not in the expected format for process with pid 6953
  file: ProcfsBasedProcessTree.java,
  line_number: 514,
  class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
},
}
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep -i 6953
root      6953  0.0  0.0 200484 23424 ?   S    21:44   0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0   S+   22:13   0:00 grep -i 6953
[asdf ~]$
{noformat}

procfs stat file is not in the expected format warning
-------------------------------------------------------

                Key: YARN-3344
                URL: https://issues.apache.org/jira/browse/YARN-3344
            Project: Hadoop YARN
         Issue Type: Bug
   Affects Versions: 2.6.0
           Reporter: Jon Bringhurst
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-3344:
---------------------------------
    Description: 
Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files.

Here's the error I'm seeing:

{noformat}
source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root      6953  0.0  0.0 200484 23424 ?   S    21:44   0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0   S+   22:13   0:00 grep -i 6953
[asdf ~]$
{noformat}

This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5.

  was:
Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files.

Here's the error I'm seeing:

{noformat}
source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root      6953  0.0  0.0 200484 23424 ?   S    21:44   0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0   S+   22:13   0:00 grep -i 6953
[asdf ~]$
{noformat}

procfs stat file is not in the expected format warning
-------------------------------------------------------

                Key: YARN-3344
                URL: https://issues.apache.org/jira/browse/YARN-3344
            Project: Hadoop YARN
         Issue Type: Bug
   Affects Versions: 2.6.0
           Reporter: Jon Bringhurst
[jira] [Created] (YARN-2223) NPE on ResourceManager recover
Jon Bringhurst created YARN-2223:
---------------------------------

             Summary: NPE on ResourceManager recover
                 Key: YARN-2223
                 URL: https://issues.apache.org/jira/browse/YARN-2223
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.4.1
            Reporter: Jon Bringhurst

I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461). Both clusters have the same config (other than hostnames). Both are running on JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the following error on the RM:

{noformat}
18:33:45,463 WARN RMAppImpl:331 - The specific max attempts: 0 for application: 1 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead.
18:33:45,465 INFO RMAppImpl:651 - Recovering app: application_1398450350082_0001 with 8 attempts and final state = KILLED
18:33:45,468 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_01 with final state: KILLED
18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_02 with final state: FAILED
18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_03 with final state: FAILED
18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_04 with final state: FAILED
18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_05 with final state: FAILED
18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_06 with final state: FAILED
18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_07 with final state: FAILED
18:33:45,481 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_08 with final state: FAILED
18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 State change from NEW to KILLED
18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 State change from NEW to FAILED
18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 State change from NEW to FAILED
18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 State change from NEW to FAILED
18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 State change from NEW to FAILED
18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 State change from NEW to FAILED
18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 State change from NEW to FAILED
18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 State change from NEW to FAILED
18:33:45,485 INFO RMAppImpl:639 - application_1398450350082_0001 State change from NEW to KILLED
18:33:45,485 WARN RMAppImpl:331 - The specific max attempts: 0 for application: 2 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead.
18:33:45,485 INFO RMAppImpl:651 - Recovering app: application_1398450350082_0002 with 8 attempts and final state = KILLED
18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_01 with final state: KILLED
18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_02 with final state: FAILED
18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_03 with final state: FAILED
18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_04 with final state: FAILED
18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_05 with final state: FAILED
18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_06 with final state: FAILED
18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_07 with final state: FAILED
18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_08 with final state: FAILED
18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 State change from NEW to KILLED
18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 State change from NEW to FAILED
18:33:45,490 INFO
[jira] [Updated] (YARN-2223) NPE on ResourceManager recover
[ https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-2223:
---------------------------------
    Description: 
I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461). Both clusters have the same config (other than hostnames). Both are running on JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the following error on the RM:
[jira] [Commented] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
[ https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042835#comment-14042835 ]

Jon Bringhurst commented on YARN-2093:
--------------------------------------

When upgrading a second time from 2.2.0 to 2.4.1-rc1, this didn't happen. So, this is not reproducible as far as I can tell.

Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
----------------------------------------------------------------------------

                Key: YARN-2093
                URL: https://issues.apache.org/jira/browse/YARN-2093
            Project: Hadoop YARN
         Issue Type: Bug
   Affects Versions: 2.4.1
           Reporter: Jon Bringhurst
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041255#comment-14041255 ]

Jon Bringhurst commented on YARN-2194:
--------------------------------------

It might also be useful to have a SystemdNspawnContainerExecutor for yarn.nodemanager.container-executor.class. I don't know how many people would be interested in using it, however.

Add Cgroup support for RedHat 7
-------------------------------

                Key: YARN-2194
                URL: https://issues.apache.org/jira/browse/YARN-2194
            Project: Hadoop YARN
         Issue Type: Improvement
           Reporter: Wei Yan
           Assignee: Wei Yan

In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, the libcgroup package is deprecated and it is not recommended to use it, since it can easily create conflicts with the default cgroup hierarchy. systemd is provided and recommended for cgroup management. We need to add support for this.
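As context for the description above: on RHEL 7 the recommended way to place a process in a cgroup is to delegate to systemd rather than libcgroup. The sketch below is illustrative only. It shells out to systemd-run (a real tool; --scope, --slice, and -p are real flags), but the slice name and resource properties here are made-up examples, and this is not YARN's container executor:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hedged sketch: run a command inside a systemd-managed transient cgroup scope. */
public class SystemdScopeLauncher {
    public static Process launch(List<String> containerCmd) throws IOException {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "systemd-run", "--scope",          // transient scope unit for this process
                "--slice=hadoop-yarn.slice",       // hypothetical slice name
                "-p", "MemoryLimit=2G",            // cgroup memory cap on the scope
                "-p", "CPUShares=1024"));          // relative CPU weight
        cmd.addAll(containerCmd);
        // Typically requires systemd on the host and sufficient privileges.
        return new ProcessBuilder(cmd).inheritIO().start();
    }

    public static void main(String[] args) throws Exception {
        launch(Arrays.asList("sleep", "5")).waitFor();
    }
}
{code}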
[jira] [Created] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
Jon Bringhurst created YARN-2093:
---------------------------------

             Summary: Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
                 Key: YARN-2093
                 URL: https://issues.apache.org/jira/browse/YARN-2093
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.4.1
            Reporter: Jon Bringhurst

After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
21:19:34,309 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
21:19:34,310 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
21:19:34,310 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
21:19:34,317 INFO FairScheduler:673 - Added Application Attempt appattempt_1400092144371_0004_09 to scheduler from user: samza-perf-playground
21:19:34,318 INFO FairScheduler:673 - Added Application Attempt appattempt_1400092144371_0004_10 to scheduler from user: samza-perf-playground
21:19:34,318 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
21:19:34,318 INFO FairScheduler:733 - Application appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
21:19:34,319 INFO AppSchedulingInfo:108 - Application application_1400092144371_0003 requests cleared
21:19:34,319 INFO FairScheduler:673 - Added Application Attempt appattempt_1400092144371_0004_11 to scheduler from user: samza-perf-playground
21:19:34,320 INFO FairScheduler:733 - Application appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320 INFO AppSchedulingInfo:108 - Application application_1400092144371_0003 requests cleared
21:19:34,320 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d does not exist in queue [root.samza-perf-playground, demand=<memory:0, vCores:0>, running=<memory:0, vCores:0>, share=<memory:368640, vCores:0>, w=<memory weight=1.0, cpu weight=1.0>]
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
        at java.lang.Thread.run(Thread.java:744)
21:19:34,330 INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335 INFO log:67 - Stopped SelectChannelConnector@eat1-app587.stg.linkedin.com:8088
21:19:34,437 INFO Server:2398 - Stopping server on 8033
21:19:34,438 INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}

Last commit message is (branch-2.4 on github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal <a...@apache.org>
Date:   Tue May 20 20:18:46 2014 +

    HADOOP-10562. Fix CHANGES.txt entry again

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 13f79535-47bb-0310-9956-ffa450edef68
{noformat}
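The stack trace shows FSLeafQueue.removeApp rejecting a removal for an attempt the queue no longer tracks. Below is a hypothetical reduction of how a duplicate or replayed APP_ATTEMPT_REMOVED event (for instance, during recovery) could surface as exactly this kind of fatal; it is illustrative only, not the Hadoop code:

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reduction of the removeApp contract seen in the trace. */
public class DoubleRemoveDemo {
    static final List<String> runnableApps = new ArrayList<>();

    static void removeApp(String app) {
        // Removing an app that is not (or no longer) in the queue is illegal state.
        if (!runnableApps.remove(app)) {
            throw new IllegalStateException(
                    "Given app to remove " + app + " does not exist in queue");
        }
    }

    public static void main(String[] args) {
        runnableApps.add("appattempt_0003_05");
        removeApp("appattempt_0003_05"); // first APP_ATTEMPT_REMOVED: fine
        removeApp("appattempt_0003_05"); // replayed event: fatal, dispatcher exits
    }
}
{code}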
[jira] [Updated] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
[ https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-2093:
---------------------------------
    Description: 
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

Last commit message is (branch-2.4 on github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal <a...@apache.org>
Date:   Tue May 20 20:18:46 2014 +

    HADOOP-10562. Fix CHANGES.txt entry again

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 13f79535-47bb-0310-9956-ffa450edef68
{noformat}
[jira] [Updated] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
[ https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-2093:
---------------------------------
Description: After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
{noformat}
21:19:34,308 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
21:19:34,309 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
21:19:34,310 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
21:19:34,310 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
21:19:34,317 INFO FairScheduler:673 - Added Application Attempt appattempt_1400092144371_0004_09 to scheduler from user: samza-perf-playground
21:19:34,318 INFO FairScheduler:673 - Added Application Attempt appattempt_1400092144371_0004_10 to scheduler from user: samza-perf-playground
21:19:34,318 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
21:19:34,318 INFO FairScheduler:733 - Application appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
21:19:34,319 INFO AppSchedulingInfo:108 - Application application_1400092144371_0003 requests cleared
21:19:34,319 INFO FairScheduler:673 - Added Application Attempt appattempt_1400092144371_0004_11 to scheduler from user: samza-perf-playground
21:19:34,320 INFO FairScheduler:733 - Application appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320 INFO AppSchedulingInfo:108 - Application application_1400092144371_0003 requests cleared
21:19:34,320 INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d does not exist in queue [root.samza-perf-playground, demand=<memory:0, vCores:0>, running=<memory:0, vCores:0>, share=<memory:368640, vCores:0>, w=<memory weight=1.0, cpu weight=1.0>]
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
	at java.lang.Thread.run(Thread.java:744)
21:19:34,330 INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335 INFO log:67 - Stopped SelectChannelConnector@:8088
21:19:34,437 INFO Server:2398 - Stopping server on 8033
21:19:34,438 INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}
Last commit message for this build is (branch-2.4 on github.com/apache/hadoop-common):
{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal <a...@apache.org>
Date: Tue May 20 20:18:46 2014 +0000

    HADOOP-10562. Fix CHANGES.txt entry again

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 13f79535-47bb-0310-9956-ffa450edef68
{noformat}
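A note on the trace above: FSLeafQueue.removeApp (the first frame of the stack) raises the IllegalStateException when the attempt it is asked to remove is not in the queue's application list, and the log shows that application_1400092144371_0003's requests had already been cleared by the time the APP_ATTEMPT_REMOVED event was processed. The sketch below is illustrative only, with made-up class names rather than the actual Hadoop source; it contrasts the strict membership check implied by the trace with an idempotent variant that would tolerate a replayed remove event during recovery:
{code:java}
// Illustrative sketch only -- LeafQueue/SchedulerApp are made-up stand-ins,
// not the actual Hadoop classes. It mirrors the strict membership check that
// the stack trace attributes to FSLeafQueue.removeApp(FSLeafQueue.java:93).
import java.util.ArrayList;
import java.util.Collection;

class SchedulerApp {
  final String attemptId;
  SchedulerApp(String attemptId) { this.attemptId = attemptId; }
  @Override public String toString() { return attemptId; }
}

class LeafQueue {
  private final Collection<SchedulerApp> apps = new ArrayList<>();

  void addApp(SchedulerApp app) { apps.add(app); }

  // Strict removal: throws if the attempt is absent, which is the failure
  // mode seen on the first post-upgrade start above.
  void removeAppStrict(SchedulerApp app) {
    if (!apps.remove(app)) {
      throw new IllegalStateException(
          "Given app to remove " + app + " does not exist in queue " + this);
    }
  }

  // Idempotent removal: a duplicate or stale APP_ATTEMPT_REMOVED becomes a
  // no-op instead of a fatal error during recovery.
  boolean removeAppIfPresent(SchedulerApp app) {
    return apps.remove(app);
  }
}
{code}
Because the scheduler event dispatcher exits the process on any escaped exception (see the sketch after the comments below), the strict variant turns one stale event into a full RM shutdown.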
[jira] [Commented] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
[ https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005407#comment-14005407 ]

Jon Bringhurst commented on YARN-2093:
--------------------------------------
RM-HA is enabled. This only happened on the first start after upgrading from 2.2.0; starting the RM again after the first start works without error. I haven't tried to do an upgrade again, so I'm not sure whether it's reproducible.

[ https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005397#comment-14005397 ]

Sandy Ryza commented on YARN-2093:
----------------------------------
Thanks for reporting this, Jon. Did this occur in an RM-HA setup? Is it reproducible?

> Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
> ----------------------------------------------------------------------------
>
>                 Key: YARN-2093
>                 URL: https://issues.apache.org/jira/browse/YARN-2093
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Jon Bringhurst
>

--
This message was sent by Atlassian JIRA (v6.2#6252)
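For readers wondering why a single bad event terminates the whole ResourceManager rather than failing just one request: the FATAL line comes from the RM's scheduler event dispatcher, and the trace shows the exception escaping FairScheduler.handle into the EventProcessor run loop. Below is a minimal sketch of such a dispatch loop, assuming the shape suggested by the stack trace rather than quoting the actual Hadoop code:
{code:java}
// Assumed shape, not the actual Hadoop source: a dispatcher thread that
// treats any Throwable escaping handle() as unrecoverable and exits the
// process, producing the "Exiting, bbye.." line seen in both logs here.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

interface SchedulerEvent {}

interface Scheduler {
  void handle(SchedulerEvent event);
}

class EventProcessor implements Runnable {
  private final BlockingQueue<SchedulerEvent> queue = new LinkedBlockingQueue<>();
  private final Scheduler scheduler;

  EventProcessor(Scheduler scheduler) { this.scheduler = scheduler; }

  void dispatch(SchedulerEvent event) { queue.add(event); }

  @Override public void run() {
    while (true) {
      SchedulerEvent event;
      try {
        event = queue.take();
      } catch (InterruptedException e) {
        return; // normal shutdown path
      }
      try {
        scheduler.handle(event); // IllegalStateException / NPE escapes here
      } catch (Throwable t) {
        // Matches the observed behavior: log FATAL, then bring the RM down.
        System.err.println("FATAL: Error in handling event to the scheduler: " + t);
        System.exit(-1);
      }
    }
  }
}
{code}
Under that design, the IllegalStateException above and the NPE in YARN-1986 below both reach the same catch-all and end in "Exiting, bbye..".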
[jira] [Updated] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Bringhurst updated YARN-1986:
---------------------------------
Description: After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.-
{noformat}
19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
	at java.lang.Thread.run(Thread.java:744)
19:11:13,443 INFO ResourceManager:604 - Exiting, bbye..
{noformat}

was: After upgrade from 2.2.0 to 2.4.0, NPE on first job start. After RM was restarted, the job runs without a problem.

> After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
> ------------------------------------------------------------
>
>                 Key: YARN-1986
>                 URL: https://issues.apache.org/jira/browse/YARN-1986
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Jon Bringhurst
>            Assignee: Hong Zhiguo
>         Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch
>

--
This message was sent by Atlassian JIRA (v6.2#6252)
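The YARN-1986 trace points at FifoScheduler.assignContainers during a NODE_UPDATE, which is consistent with a node heartbeat referencing an application the scheduler no longer tracks after the first post-upgrade start. Below is a hedged sketch of that failure shape, with illustrative names rather than the actual 2.4.0 fields; the real fix is in the YARN-1986 patches attached above:
{code:java}
// Hedged sketch of the NPE shape in FifoScheduler.assignContainers on
// NODE_UPDATE. FifoLikeScheduler and its fields are illustrative stand-ins,
// not the actual Hadoop 2.4.0 code.
import java.util.HashMap;
import java.util.Map;

class FifoLikeScheduler {
  /** Applications currently known to the scheduler, keyed by attempt id. */
  private final Map<String, Object> applications = new HashMap<>();

  void assignContainers(String attemptId) {
    Object application = applications.get(attemptId);
    // Without this guard, dereferencing a null application on a heartbeat is
    // exactly the kind of failure reported at assignContainers(...:462).
    if (application == null) {
      return; // defensive: skip attempts that are no longer tracked
    }
    // ... container assignment for the tracked application would go here ...
  }
}
{code}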
[jira] [Created] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
Jon Bringhurst created YARN-1986:
------------------------------------
             Summary: After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
                 Key: YARN-1986
                 URL: https://issues.apache.org/jira/browse/YARN-1986
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.4.0
            Reporter: Jon Bringhurst

After upgrade from 2.2.0 to 2.4.0, NPE on first job start. After RM was restarted, the job runs without a problem.

--
This message was sent by Atlassian JIRA (v6.2#6252)