[jira] [Commented] (YARN-7747) YARN UI is broken in the minicluster

2023-06-05 Thread Jon Bringhurst (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729517#comment-17729517
 ] 

Jon Bringhurst commented on YARN-7747:
--

No problem [~jira.shegalov]! Thank you for providing so much background info 
here!

> YARN UI is broken in the minicluster 
> -
>
> Key: YARN-7747
> URL: https://issues.apache.org/jira/browse/YARN-7747
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: YARN-7747.001.patch, YARN-7747.002.patch
>
>
> YARN web apps use non-injected instances of GuiceFilter, i.e. instances 
> created by Jetty as opposed to instances created by Guice itself. This 
> triggers the [call 
> path|https://github.com/google/guice/blob/master/extensions/servlet/src/com/google/inject/servlet/GuiceFilter.java#L251]
>  where the static field {{pipeline}} is used instead of the instance field 
> {{injectedPipeline}}. However, besides the GuiceFilter instances created by 
> Jetty, each Guice module generates one as well. On the injection call path 
> this static variable is updated by each instance. Thus, if there are multiple 
> modules, as happens to be the case in the minicluster, the one loaded last 
> ends up defining the filter pipeline for all Jetty instances. In the 
> minicluster case this is the NodeManager UI.
>  
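To make the failure mode concrete, here is a minimal, self-contained sketch of 
the pattern described above. It is a deliberate simplification, not Guice's 
actual GuiceFilter code, and the class, field, and pipeline names are 
illustrative only:

{noformat}
// Illustrative simplification of the GuiceFilter static-vs-instance pipeline
// problem. Not real Guice code.
public class SimplifiedGuiceFilter {
  // Shared by every instance in the JVM; overwritten each time a module runs.
  private static volatile String staticPipeline = "default";
  // Set only on instances created (injected) by Guice itself.
  private final String injectedPipeline;

  SimplifiedGuiceFilter() { this.injectedPipeline = null; }                    // created by Jetty
  SimplifiedGuiceFilter(String pipeline) { this.injectedPipeline = pipeline; } // created by Guice

  static void moduleConfigured(String pipeline) { staticPipeline = pipeline; }

  String effectivePipeline() {
    // Jetty-created instances have no injected pipeline, so they fall back
    // to the static field.
    return injectedPipeline != null ? injectedPipeline : staticPipeline;
  }

  public static void main(String[] args) {
    SimplifiedGuiceFilter rmFilter = new SimplifiedGuiceFilter(); // Jetty's filter for the RM webapp
    SimplifiedGuiceFilter nmFilter = new SimplifiedGuiceFilter(); // Jetty's filter for the NM webapp
    moduleConfigured("RM pipeline"); // RM's Guice module initializes
    moduleConfigured("NM pipeline"); // NM's module initializes last in the minicluster
    System.out.println(rmFilter.effectivePipeline()); // "NM pipeline"
    System.out.println(nmFilter.effectivePipeline()); // "NM pipeline"
  }
}
{noformat}

With injection, each filter would read its own injectedPipeline; without it, 
whichever module configured the static field last wins, which is why the RM 
URL ends up serving the NM webapp in the minicluster.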



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Comment Edited] (YARN-7747) YARN UI is broken in the minicluster

2023-06-05 Thread Jon Bringhurst (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729465#comment-17729465
 ] 

Jon Bringhurst edited comment on YARN-7747 at 6/5/23 8:36 PM:
--

I think I just ran into this issue on 3.3.5. After launching a MiniCluster RM 
and NM, the URL to the RM shows the NM UI.

I'm going to try applying the patch here locally (it mostly still applies) to 
see if it resolves things.


was (Author: jonbringhurst):
I think I just ran into this issue on 3.3.5. After launching a MiniCluster RM 
and NM, the URL to the RM shows the NM UI.

> YARN UI is broken in the minicluster 
> -
>
> Key: YARN-7747
> URL: https://issues.apache.org/jira/browse/YARN-7747
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Major
> Attachments: YARN-7747.001.patch, YARN-7747.002.patch
>
>
> YARN web apps use non-injected instances of GuiceFilter, i.e. instances 
> created by Jetty as opposed to instances created by Guice itself. This 
> triggers the [call 
> path|https://github.com/google/guice/blob/master/extensions/servlet/src/com/google/inject/servlet/GuiceFilter.java#L251]
>  where the static field {{pipeline}} is used instead of the instance field 
> {{injectedPipeline}}. However, besides the GuiceFilter instances created by 
> Jetty, each Guice module generates one as well. On the injection call path 
> this static variable is updated by each instance. Thus, if there are multiple 
> modules, as happens to be the case in the minicluster, the one loaded last 
> ends up defining the filter pipeline for all Jetty instances. In the 
> minicluster case this is the NodeManager UI.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (YARN-7747) YARN UI is broken in the minicluster

2023-06-05 Thread Jon Bringhurst (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729465#comment-17729465
 ] 

Jon Bringhurst commented on YARN-7747:
--

I think I just ran into this issue on 3.3.5. After launching a MiniCluster RM 
and NM, the URL to the RM shows the NM UI.
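For anyone who wants to reproduce this, a rough sketch along these lines should 
work. Assumptions: the MiniYARNCluster constructor and lifecycle calls shown 
below, and that the mini cluster publishes the bound RM webapp address back 
into its configuration; adjust as needed.

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterUiRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // One RM plus one NM in a single JVM -- the setup where the shared
    // static GuiceFilter pipeline becomes visible.
    MiniYARNCluster cluster = new MiniYARNCluster("repro", 1, 1, 1);
    cluster.init(conf);
    cluster.start();

    // Assumes the mini cluster writes the bound RM webapp address back into
    // its config; otherwise pull the address from the RM startup logs.
    String rmWeb = cluster.getConfig().get(YarnConfiguration.RM_WEBAPP_ADDRESS);
    System.out.println("RM UI (expected): http://" + rmWeb);

    // On an affected build, browsing the RM address above renders the NM UI.
    Thread.sleep(60_000L); // keep the cluster up long enough to check in a browser
    cluster.stop();
  }
}
{noformat}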

> YARN UI is broken in the minicluster 
> -
>
> Key: YARN-7747
> URL: https://issues.apache.org/jira/browse/YARN-7747
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Major
> Attachments: YARN-7747.001.patch, YARN-7747.002.patch
>
>
> YARN web apps use non-injected instances of GuiceFilter, i.e. instances 
> created by Jetty as opposed to instances created by Guice itself. This 
> triggers the [call 
> path|https://github.com/google/guice/blob/master/extensions/servlet/src/com/google/inject/servlet/GuiceFilter.java#L251]
>  where the static field {{pipeline}} is used instead of the instance field 
> {{injectedPipeline}}. However, besides the GuiceFilter instances created by 
> Jetty, each Guice module generates one as well. On the injection call path 
> this static variable is updated by each instance. Thus, if there are multiple 
> modules, as happens to be the case in the minicluster, the one loaded last 
> ends up defining the filter pipeline for all Jetty instances. In the 
> minicluster case this is the NodeManager UI.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (YARN-10044) ResourceManager NPE - Error in handling event type NODE_UPDATE to the Event Dispatcher

2019-12-18 Thread Jon Bringhurst (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-10044:
--
Issue Type: Bug  (was: Improvement)

> ResourceManager NPE - Error in handling event type NODE_UPDATE to the Event 
> Dispatcher
> --
>
> Key: YARN-10044
> URL: https://issues.apache.org/jira/browse/YARN-10044
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Jon Bringhurst
>Priority: Major
>
> {noformat}
> 2019-12-18 00:46:42,577 [INFO] [IPC Server handler 48 on 8030] 
> resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
> USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
> TARGET=SchedulerApp RESULT=SUCCESS  APPID=
> application_1575937033226_0426
> CONTAINERID=container_e18_1575937033226_0426_01_000389  
> RESOURCE=
> 2019-12-18 00:46:42,577 [INFO] [IPC Server handler 48 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - 
> container_e18_1575937033226_0426_01_000392 Container Transitioned from 
> ACQUIRED to RELEASED
> 2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] 
> resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
> USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
> TARGET=SchedulerApp RESULT=SUCCESS  APPID=
> application_1575937033226_0426
> CONTAINERID=container_e18_1575937033226_0426_01_000392  
> RESOURCE=
> 2019-12-18 00:46:42,578 [INFO] [SchedulerEventDispatcher:Event Processor] 
> allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(AbstractContainerAllocator.java:126)
>  - assignedContainer application attempt=appattempt_1575937033226
> _0426_01 container=null 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@73179037
>  clusterResource= type=OFF_SWITCH 
> requestedPartition=concourse
> 2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - 
> container_e18_1575937033226_0426_01_000393 Container Transitioned from 
> ACQUIRED to RELEASED
> 2019-12-18 00:46:42,578 [INFO] [SchedulerEventDispatcher:Event Processor] 
> capacity.ParentQueue.assignContainers(ParentQueue.java:616) - 
> assignedContainer queue=root usedCapacity=0.68548673 
> absoluteUsedCapacity=0.68548673 used= ores:11062> cluster=
> 2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] 
> resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
> USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
> TARGET=SchedulerApp RESULT=SUCCESS  APPID=
> application_1575937033226_0426
> CONTAINERID=container_e18_1575937033226_0426_01_000393  
> RESOURCE=
> 2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - 
> container_e18_1575937033226_0426_01_000394 Container Transitioned from 
> ACQUIRED to RELEASED
> 2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] 
> resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
> USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
> TARGET=SchedulerApp RESULT=SUCCESS  APPID=
> application_1575937033226_0426
> CONTAINERID=container_e18_1575937033226_0426_01_000394  
> RESOURCE=
> 2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] 
> scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:250)
>  - checking for deactivate of application :application_1575937033226_0426
> 2019-12-18 00:46:42,580 [FATAL] [SchedulerEventDispatcher:Event Processor] 
> event.EventDispatcher$EventProcessor.run(EventDispatcher.java:75) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:533)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2563)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2429)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1359)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1348)
> at 
> 

[jira] [Created] (YARN-10044) ResourceManager NPE - Error in handling event type NODE_UPDATE to the Event Dispatcher

2019-12-18 Thread Jon Bringhurst (Jira)
Jon Bringhurst created YARN-10044:
-

 Summary: ResourceManager NPE - Error in handling event type 
NODE_UPDATE to the Event Dispatcher
 Key: YARN-10044
 URL: https://issues.apache.org/jira/browse/YARN-10044
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.9.2
Reporter: Jon Bringhurst


{noformat}
2019-12-18 00:46:42,577 [INFO] [IPC Server handler 48 on 8030] 
resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
TARGET=SchedulerApp RESULT=SUCCESS  APPID=
application_1575937033226_0426
CONTAINERID=container_e18_1575937033226_0426_01_000389  RESOURCE=
2019-12-18 00:46:42,577 [INFO] [IPC Server handler 48 on 8030] 
rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - 
container_e18_1575937033226_0426_01_000392 Container Transitioned from ACQUIRED 
to RELEASED
2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] 
resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
TARGET=SchedulerApp RESULT=SUCCESS  APPID=
application_1575937033226_0426
CONTAINERID=container_e18_1575937033226_0426_01_000392  RESOURCE=
2019-12-18 00:46:42,578 [INFO] [SchedulerEventDispatcher:Event Processor] 
allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(AbstractContainerAllocator.java:126)
 - assignedContainer application attempt=appattempt_1575937033226
_0426_01 container=null 
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@73179037
 clusterResource= type=OFF_SWITCH 
requestedPartition=concourse
2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] 
rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - 
container_e18_1575937033226_0426_01_000393 Container Transitioned from ACQUIRED 
to RELEASED
2019-12-18 00:46:42,578 [INFO] [SchedulerEventDispatcher:Event Processor] 
capacity.ParentQueue.assignContainers(ParentQueue.java:616) - assignedContainer 
queue=root usedCapacity=0.68548673 absoluteUsedCapacity=0.68548673 
used= cluster=
2019-12-18 00:46:42,578 [INFO] [IPC Server handler 48 on 8030] 
resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
TARGET=SchedulerApp RESULT=SUCCESS  APPID=
application_1575937033226_0426
CONTAINERID=container_e18_1575937033226_0426_01_000393  RESOURCE=
2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] 
rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:486) - 
container_e18_1575937033226_0426_01_000394 Container Transitioned from ACQUIRED 
to RELEASED
2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] 
resourcemanager.RMAuditLogger.logSuccess(RMAuditLogger.java:200) - 
USER=vapp5003 IP=10.186.103.102   OPERATION=AM Released Container 
TARGET=SchedulerApp RESULT=SUCCESS  APPID=
application_1575937033226_0426
CONTAINERID=container_e18_1575937033226_0426_01_000394  RESOURCE=
2019-12-18 00:46:42,579 [INFO] [IPC Server handler 48 on 8030] 
scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:250) 
- checking for deactivate of application :application_1575937033226_0426
2019-12-18 00:46:42,580 [FATAL] [SchedulerEventDispatcher:Event Processor] 
event.EventDispatcher$EventProcessor.run(EventDispatcher.java:75) - Error in 
handling event type NODE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:533)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2563)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2429)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1359)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1348)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1437)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1208)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1070)
at 

[jira] [Commented] (YARN-2223) NPE on ResourceManager recover

2015-05-01 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523853#comment-14523853
 ] 

Jon Bringhurst commented on YARN-2223:
--

Hey [~jianhe], that sounds good to me -- I haven't seen this problem in a long 
time. We're running 2.6.0 now.

 NPE on ResourceManager recover
 --

 Key: YARN-2223
 URL: https://issues.apache.org/jira/browse/YARN-2223
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
 Environment: JDK 8u5
Reporter: Jon Bringhurst

 I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
 https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).
 Both clusters have the same config (other than hostnames). Both are running 
 on JDK8u5 (I'm not sure if this is a factor here).
 One cluster started up without any errors. The other started up with the 
 following error on the RM:
 {noformat}
 18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
 application: 1 is invalid, because it is out of the range [1, 50]. Use the 
 global max attempts instead.
 18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
 application_1398450350082_0001 with 8 attempts and final state = KILLED
 18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_01 with final state: KILLED
 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_02 with final state: FAILED
 18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_03 with final state: FAILED
 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_04 with final state: FAILED
 18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_05 with final state: FAILED
 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_06 with final state: FAILED
 18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_07 with final state: FAILED
 18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0001_08 with final state: FAILED
 18:33:45,482  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_01 State change from NEW to KILLED
 18:33:45,482  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_02 State change from NEW to FAILED
 18:33:45,482  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_03 State change from NEW to FAILED
 18:33:45,482  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_04 State change from NEW to FAILED
 18:33:45,483  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_05 State change from NEW to FAILED
 18:33:45,483  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_06 State change from NEW to FAILED
 18:33:45,483  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_07 State change from NEW to FAILED
 18:33:45,483  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0001_08 State change from NEW to FAILED
 18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State 
 change from NEW to KILLED
 18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
 application: 2 is invalid, because it is out of the range [1, 50]. Use the 
 global max attempts instead.
 18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
 application_1398450350082_0002 with 8 attempts and final state = KILLED
 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_01 with final state: KILLED
 18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_02 with final state: FAILED
 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_03 with final state: FAILED
 18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_04 with final state: FAILED
 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_05 with final state: FAILED
 18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_06 with final state: FAILED
 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_07 with final state: FAILED
 18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
 appattempt_1398450350082_0002_08 with final state: FAILED
 18:33:45,490  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0002_01 State change from NEW to KILLED
 18:33:45,490  INFO RMAppAttemptImpl:659 - 
 appattempt_1398450350082_0002_02 State change from NEW to FAILED
 18:33:45,490  INFO 

[jira] [Created] (YARN-3344) procfs stat file is not in the expected format warning

2015-03-12 Thread Jon Bringhurst (JIRA)
Jon Bringhurst created YARN-3344:


 Summary: procfs stat file is not in the expected format warning
 Key: YARN-3344
 URL: https://issues.apache.org/jira/browse/YARN-3344
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jon Bringhurst


Although this doesn't appear to be causing any functional issues, it is 
spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
/proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
{
source_host: asdf,
method: constructProcessInfo,
level: WARN,
message: Unexpected: procfs stat file is not in the expected format for 
process with pid 6953
file: ProcfsBasedProcessTree.java,
line_number: 514,
class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
  },
}
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 
0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 
18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep -i 6953
root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
/export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
[asdf ~]$ 
{noformat}
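For reference, the command field in the stat line above, (python2.6 /expo), 
contains a space, which is exactly the case a purely whitespace-delimited regex 
mishandles. A tolerant parser can anchor on the last closing parenthesis 
instead; here is a minimal sketch (illustrative only, not the 
ProcfsBasedProcessTree implementation):

{noformat}
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcStatSketch {
  public static void main(String[] args) throws Exception {
    // /proc/<pid>/stat looks like "pid (comm) state ppid ..." where comm may
    // contain spaces and even ')' characters, so split on the LAST ')'.
    String stat = new String(Files.readAllBytes(Paths.get("/proc", args[0], "stat")));
    int open = stat.indexOf('(');
    int close = stat.lastIndexOf(')');
    String pid = stat.substring(0, open).trim();
    String comm = stat.substring(open + 1, close);          // e.g. "python2.6 /expo"
    String[] rest = stat.substring(close + 1).trim().split("\\s+");
    System.out.printf("pid=%s comm=%s state=%s ppid=%s%n", pid, comm, rest[0], rest[1]);
  }
}
{noformat}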





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning

2015-03-12 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-3344:
-
Description: 
Although this doesn't appear to be causing any functional issues, it is 
spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
/proc/pid/stat files.

Here's the error I'm seeing:

{noformat}
source_host: asdf,
method: constructProcessInfo,
level: WARN,
message: Unexpected: procfs stat file is not in the expected format for 
process with pid 6953
file: ProcfsBasedProcessTree.java,
line_number: 514,
class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 
0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 
18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
/export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
[asdf ~]$ 
{noformat}



  was:
Although this doesn't appear to be causing any functional issues, it is 
spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
/proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
source_host: asdf,
method: constructProcessInfo,
level: WARN,
message: Unexpected: procfs stat file is not in the expected format for 
process with pid 6953
file: ProcfsBasedProcessTree.java,
line_number: 514,
class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 
0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 
18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
/export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
[asdf ~]$ 
{noformat}




 procfs stat file is not in the expected format warning
 --

 Key: YARN-3344
 URL: https://issues.apache.org/jira/browse/YARN-3344
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jon Bringhurst

 Although this doesn't appear to be causing any functional issues, it is 
 spamming our log files quite a bit. :)
 It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
 /proc/pid/stat files.
 Here's the error I'm seeing:
 {noformat}
 source_host: asdf,
 method: constructProcessInfo,
 level: WARN,
 message: Unexpected: procfs stat file is not in the expected format 
 for process with pid 6953
 file: ProcfsBasedProcessTree.java,
 line_number: 514,
 class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
 {noformat}
 And here's the basic info on process with pid 6953:
 {noformat}
 [asdf ~]$ cat /proc/6953/stat
 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 
 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 
 2 18446744073709551615 0 0 17 13 0 0 0 0 0
 [asdf ~]$ ps aux|grep 6953
 root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
 /export/apps/salt/minion-scripts/module-sync.py
 jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
 [asdf ~]$ 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning

2015-03-12 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-3344:
-
Description: 
Although this doesn't appear to be causing any functional issues, it is 
spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
/proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
source_host: asdf,
method: constructProcessInfo,
level: WARN,
message: Unexpected: procfs stat file is not in the expected format for 
process with pid 6953
file: ProcfsBasedProcessTree.java,
line_number: 514,
class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 
0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 
18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
/export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
[asdf ~]$ 
{noformat}



  was:
Although this doesn't appear to be causing any functional issues, it is 
spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
/proc/pid/stat files.

Here's the error I'm seeing (sorry about the janky json formatted log message):

{noformat}
{
source_host: asdf,
method: constructProcessInfo,
level: WARN,
message: Unexpected: procfs stat file is not in the expected format for 
process with pid 6953
file: ProcfsBasedProcessTree.java,
line_number: 514,
class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
  },
}
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 
0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 
18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep -i 6953
root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
/export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
[asdf ~]$ 
{noformat}




 procfs stat file is not in the expected format warning
 --

 Key: YARN-3344
 URL: https://issues.apache.org/jira/browse/YARN-3344
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jon Bringhurst

 Although this doesn't appear to be causing any functional issues, it is 
 spamming our log files quite a bit. :)
 It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
 /proc/pid/stat files.
 Here's the error I'm seeing (sorry about the janky json formatted log 
 message):
 {noformat}
 source_host: asdf,
 method: constructProcessInfo,
 level: WARN,
 message: Unexpected: procfs stat file is not in the expected format 
 for process with pid 6953
 file: ProcfsBasedProcessTree.java,
 line_number: 514,
 class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
 {noformat}
 And here's the basic info on process with pid 6953:
 {noformat}
 [asdf ~]$ cat /proc/6953/stat
 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 
 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 
 2 18446744073709551615 0 0 17 13 0 0 0 0 0
 [asdf ~]$ ps aux|grep 6953
 root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
 /export/apps/salt/minion-scripts/module-sync.py
 jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
 [asdf ~]$ 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning

2015-03-12 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-3344:
-
Description: 
Although this doesn't appear to be causing any functional issues, it is 
spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
/proc/pid/stat files.

Here's the error I'm seeing:

{noformat}
source_host: asdf,
method: constructProcessInfo,
level: WARN,
message: Unexpected: procfs stat file is not in the expected format for 
process with pid 6953
file: ProcfsBasedProcessTree.java,
line_number: 514,
class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 
0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 
18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
/export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
[asdf ~]$ 
{noformat}

This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5.

  was:
Although this doesn't appear to be causing any functional issues, it is 
spamming our log files quite a bit. :)

It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
/proc/pid/stat files.

Here's the error I'm seeing:

{noformat}
source_host: asdf,
method: constructProcessInfo,
level: WARN,
message: Unexpected: procfs stat file is not in the expected format for 
process with pid 6953
file: ProcfsBasedProcessTree.java,
line_number: 514,
class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
{noformat}

And here's the basic info on process with pid 6953:

{noformat}
[asdf ~]$ cat /proc/6953/stat
6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 
0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 
18446744073709551615 0 0 17 13 0 0 0 0 0
[asdf ~]$ ps aux|grep 6953
root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
/export/apps/salt/minion-scripts/module-sync.py
jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
[asdf ~]$ 
{noformat}




 procfs stat file is not in the expected format warning
 --

 Key: YARN-3344
 URL: https://issues.apache.org/jira/browse/YARN-3344
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jon Bringhurst

 Although this doesn't appear to be causing any functional issues, it is 
 spamming our log files quite a bit. :)
 It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
 /proc/pid/stat files.
 Here's the error I'm seeing:
 {noformat}
 source_host: asdf,
 method: constructProcessInfo,
 level: WARN,
 message: Unexpected: procfs stat file is not in the expected format 
 for process with pid 6953
 file: ProcfsBasedProcessTree.java,
 line_number: 514,
 class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree,
 {noformat}
 And here's the basic info on process with pid 6953:
 {noformat}
 [asdf ~]$ cat /proc/6953/stat
 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 
 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 
 2 18446744073709551615 0 0 17 13 0 0 0 0 0
 [asdf ~]$ ps aux|grep 6953
 root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
 /export/apps/salt/minion-scripts/module-sync.py
 jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
 [asdf ~]$ 
 {noformat}
 This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2223) NPE on ResourceManager recover

2014-06-27 Thread Jon Bringhurst (JIRA)
Jon Bringhurst created YARN-2223:


 Summary: NPE on ResourceManager recover
 Key: YARN-2223
 URL: https://issues.apache.org/jira/browse/YARN-2223
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Jon Bringhurst


I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).

Both clusters have the same config (other than hostnames). Both are running on 
JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the 
following error on the RM:

{noformat}
18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 1 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0001 with 8 attempts and final state = KILLED
18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_01 with final state: KILLED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_02 with final state: FAILED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_03 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_04 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_05 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_06 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_07 with final state: FAILED
18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_08 with final state: FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 
State change from NEW to KILLED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 
State change from NEW to FAILED
18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State change 
from NEW to KILLED
18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 2 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0002 with 8 attempts and final state = KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_01 with final state: KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_02 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_03 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_04 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_05 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_06 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_07 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_08 with final state: FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 
State change from NEW to KILLED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 
State 

[jira] [Updated] (YARN-2223) NPE on ResourceManager recover

2014-06-27 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-2223:
-

Description: 
I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).

Both clusters have the same config (other than hostnames). Both are running on 
JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the 
following error on the RM:

{noformat}
18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 1 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0001 with 8 attempts and final state = KILLED
18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_01 with final state: KILLED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_02 with final state: FAILED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_03 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_04 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_05 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_06 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_07 with final state: FAILED
18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_08 with final state: FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 
State change from NEW to KILLED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 
State change from NEW to FAILED
18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State change 
from NEW to KILLED
18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 2 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0002 with 8 attempts and final state = KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_01 with final state: KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_02 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_03 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_04 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_05 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_06 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_07 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_08 with final state: FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 
State change from NEW to KILLED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_07 
State change from NEW to FAILED

[jira] [Commented] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-06-24 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042835#comment-14042835
 ] 

Jon Bringhurst commented on YARN-2093:
--

When upgrading a second time from 2.2.0 to 2.4.1-rc1, this didn't happen. So, 
this is not reproducible as far as I can tell.

 Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
 ---

 Key: YARN-2093
 URL: https://issues.apache.org/jira/browse/YARN-2093
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Jon Bringhurst

 After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
 {noformat}
 21:19:34,308  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
 21:19:34,309  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
 21:19:34,310  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
 21:19:34,310  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
 21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
 appattempt_1400092144371_0004_09 to scheduler from user: 
 samza-perf-playground
 21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
 appattempt_1400092144371_0004_10 to scheduler from user: 
 samza-perf-playground
 21:19:34,318  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
 21:19:34,318  INFO FairScheduler:733 - Application 
 appattempt_1400092144371_0003_05 is done. finalState=FAILED
 21:19:34,319  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
 21:19:34,319  INFO AppSchedulingInfo:108 - Application 
 application_1400092144371_0003 requests cleared
 21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
 appattempt_1400092144371_0004_11 to scheduler from user: 
 samza-perf-playground
 21:19:34,320  INFO FairScheduler:733 - Application 
 appattempt_1400092144371_0003_06 is done. finalState=FAILED
 21:19:34,320  INFO AppSchedulingInfo:108 - Application 
 application_1400092144371_0003 requests cleared
 21:19:34,320  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
 21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
 APP_ATTEMPT_REMOVED to the scheduler
 java.lang.IllegalStateException: Given app to remove 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
  does not exist in queue [root.samza-perf-playground, demand=memory:0, 
 vCores:0, running=memory:0, vCores:0, share=memory:368640, vCores:0, 
 w=memory weight=1.0, cpu weight=1.0]
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
   at java.lang.Thread.run(Thread.java:744)
 21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
 21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
 21:19:34,437  INFO Server:2398 - Stopping server on 8033
 21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
 {noformat}
 Last commit message for this build is (branch-2.4 on 
 github.com/apache/hadoop-common):
 {noformat}
 commit 09e24d5519187c0db67aacc1992be5d43829aa1e
 Author: Arpit Agarwal a...@apache.org
 Date:   Tue May 20 20:18:46 2014 +
 HADOOP-10562. Fix CHANGES.txt entry again
 
 git-svn-id: 
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
 13f79535-47bb-0310-9956-ffa450edef68
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-06-23 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041255#comment-14041255
 ] 

Jon Bringhurst commented on YARN-2194:
--

It might also be useful to have a SystemdNspawnContainerExecutor for 
yarn.nodemanager.container-executor.class. I don't know how many people would 
be interested in using it, however.
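For context, the executor is pluggable through configuration, so a hypothetical 
nspawn-based executor would be selected the same way the existing ones are. A 
minimal sketch follows; the executor class named below does not exist and only 
stands in for the idea in this comment.

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ExecutorConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Equivalent to setting yarn.nodemanager.container-executor.class in
    // yarn-site.xml. The class below is hypothetical.
    conf.set(YarnConfiguration.NM_CONTAINER_EXECUTOR,
        "org.apache.hadoop.yarn.server.nodemanager.SystemdNspawnContainerExecutor");
    System.out.println(conf.get(YarnConfiguration.NM_CONTAINER_EXECUTOR));
  }
}
{noformat}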

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan

 In previous versions of Red Hat, we could build custom cgroup hierarchies 
 using the cgconfig command from the libcgroup package. In Red Hat 7, the 
 libcgroup package is deprecated and its use is not recommended, since it can 
 easily create conflicts with the default cgroup hierarchy. systemd is 
 provided and recommended for cgroup management. We need to add support for 
 this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)
Jon Bringhurst created YARN-2093:


 Summary: Fair Scheduler IllegalStateException after upgrade from 
2.2.0 to 2.4.1-SNAP
 Key: YARN-2093
 URL: https://issues.apache.org/jira/browse/YARN-2093
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Jon Bringhurst


After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_11 to scheduler from user: 
samza-perf-playground
21:19:34,320  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,320  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 
State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
 does not exist in queue [root.samza-perf-playground, demand=memory:0, 
vCores:0, running=memory:0, vCores:0, share=memory:368640, vCores:0, 
w=memory weight=1.0, cpu weight=1.0]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335  INFO log:67 - Stopped 
selectchannelconnec...@eat1-app587.stg.linkedin.com:8088
21:19:34,437  INFO Server:2398 - Stopping server on 8033
21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}

Last commit message is (branch-2.4 on github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal a...@apache.org
Date:   Tue May 20 20:18:46 2014 +

HADOOP-10562. Fix CHANGES.txt entry again

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-2093:
-

Description: 
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_11 to scheduler from user: 
samza-perf-playground
21:19:34,320  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,320  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 
State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
 does not exist in queue [root.samza-perf-playground, demand=memory:0, 
vCores:0, running=memory:0, vCores:0, share=memory:368640, vCores:0, 
w=memory weight=1.0, cpu weight=1.0]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
21:19:34,437  INFO Server:2398 - Stopping server on 8033
21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}

Last commit message is (branch-2.4 on github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal a...@apache.org
Date:   Tue May 20 20:18:46 2014 +

HADOOP-10562. Fix CHANGES.txt entry again

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}

  was:
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - 

[jira] [Updated] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-2093:
-

Description: 
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_10 
State change from SUBMITTED to SCHEDULED
21:19:34,319  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_11 to scheduler from user: 
samza-perf-playground
21:19:34,320  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_06 is done. finalState=FAILED
21:19:34,320  INFO AppSchedulingInfo:108 - Application 
application_1400092144371_0003 requests cleared
21:19:34,320  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_11 
State change from SUBMITTED to SCHEDULED
21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
APP_ATTEMPT_REMOVED to the scheduler
java.lang.IllegalStateException: Given app to remove 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
 does not exist in queue [root.samza-perf-playground, demand=<memory:0, 
vCores:0>, running=<memory:0, vCores:0>, share=<memory:368640, vCores:0>, 
w=<memory weight=1.0, cpu weight=1.0>]
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
21:19:34,437  INFO Server:2398 - Stopping server on 8033
21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
{noformat}

Last commit message for this build is (branch-2.4 on 
github.com/apache/hadoop-common):

{noformat}
commit 09e24d5519187c0db67aacc1992be5d43829aa1e
Author: Arpit Agarwal a...@apache.org
Date:   Tue May 20 20:18:46 2014 +

HADOOP-10562. Fix CHANGES.txt entry again

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}

  was:
After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:

{noformat}
21:19:34,308  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_09 
State change from SUBMITTED to SCHEDULED
21:19:34,309  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_08 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_10 
State change from SUBMITTED to SCHEDULED
21:19:34,310  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0003_11 
State change from SUBMITTED to SCHEDULED
21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_09 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
appattempt_1400092144371_0004_10 to scheduler from user: 
samza-perf-playground
21:19:34,318  INFO RMAppAttemptImpl:659 - appattempt_1400092144371_0004_09 
State change from SUBMITTED to SCHEDULED
21:19:34,318  INFO FairScheduler:733 - Application 
appattempt_1400092144371_0003_05 is done. finalState=FAILED
21:19:34,319  INFO RMAppAttemptImpl:659 - 

[jira] [Commented] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2014-05-21 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005407#comment-14005407
 ] 

Jon Bringhurst commented on YARN-2093:
--

RM-HA is enabled. This only happened on the first start after upgrading from 
2.2.0. Starting the RM again after the first start works without error. I 
haven't tried to do an upgrade again, so I'm not sure if it's reproducible.

[ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005397#comment-14005397
 ] 

Sandy Ryza commented on YARN-2093:
--

Thanks for reporting this Jon.

Did this occur in an RM-HA setup?

Is it reproducible?

--
This message was sent by Atlassian JIRA
(v6.2#6252)

 Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
 ---

 Key: YARN-2093
 URL: https://issues.apache.org/jira/browse/YARN-2093
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Jon Bringhurst

 After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
 {noformat}
 21:19:34,308  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
 21:19:34,309  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
 21:19:34,310  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
 21:19:34,310  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
 21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
 appattempt_1400092144371_0004_09 to scheduler from user: 
 samza-perf-playground
 21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
 appattempt_1400092144371_0004_10 to scheduler from user: 
 samza-perf-playground
 21:19:34,318  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
 21:19:34,318  INFO FairScheduler:733 - Application 
 appattempt_1400092144371_0003_05 is done. finalState=FAILED
 21:19:34,319  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
 21:19:34,319  INFO AppSchedulingInfo:108 - Application 
 application_1400092144371_0003 requests cleared
 21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
 appattempt_1400092144371_0004_11 to scheduler from user: 
 samza-perf-playground
 21:19:34,320  INFO FairScheduler:733 - Application 
 appattempt_1400092144371_0003_06 is done. finalState=FAILED
 21:19:34,320  INFO AppSchedulingInfo:108 - Application 
 application_1400092144371_0003 requests cleared
 21:19:34,320  INFO RMAppAttemptImpl:659 - 
 appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
 21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
 APP_ATTEMPT_REMOVED to the scheduler
 java.lang.IllegalStateException: Given app to remove 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
  does not exist in queue [root.samza-perf-playground, demand=<memory:0, 
 vCores:0>, running=<memory:0, vCores:0>, share=<memory:368640, vCores:0>, 
 w=<memory weight=1.0, cpu weight=1.0>]
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
   at java.lang.Thread.run(Thread.java:744)
 21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
 21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
 21:19:34,437  INFO Server:2398 - Stopping server on 8033
 21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
 {noformat}
 Last commit message for this build is (branch-2.4 on 
 github.com/apache/hadoop-common):
 {noformat}
 commit 09e24d5519187c0db67aacc1992be5d43829aa1e
 Author: Arpit Agarwal a...@apache.org
 Date:   Tue May 20 20:18:46 2014 +
 HADOOP-10562. Fix CHANGES.txt entry again
 
 git-svn-id: 
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
 13f79535-47bb-0310-9956-ffa450edef68
 {noformat}


[jira] [Updated] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.

2014-05-11 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-1986:
-

Description: 
After upgrade from 2.2.0 to 2.4.0, NPE on first job start.

-After RM was restarted, the job runs without a problem.-

{noformat}
19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
{noformat}
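
For context, the NullPointerException surfaces in 
FifoScheduler.assignContainers() while handling a NODE_UPDATE, which points at 
the scheduler dereferencing an application record that is missing, plausibly 
because state carried across the 2.2.0 to 2.4.0 upgrade no longer matches what 
the scheduler expects. Below is a minimal, hypothetical Java sketch (not 
Hadoop's actual code or its eventual fix) of the defensive pattern for that 
situation, assuming a lookup map that can return null:

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; names are made up for this note.
class NodeUpdateSketch {
  private final Map<String, String> applications = new ConcurrentHashMap<>();

  void assignContainers(String appId) {
    String app = applications.get(appId);
    if (app == null) {
      // Without this guard, using "app" below would throw the same kind of
      // NullPointerException seen in the log.
      System.out.println("Skipping node update for unknown application " + appId);
      return;
    }
    System.out.println("Assigning containers for " + app);
  }

  public static void main(String[] args) {
    NodeUpdateSketch scheduler = new NodeUpdateSketch();
    scheduler.assignContainers("application_0000000000000_0001"); // unknown id, handled safely
  }
}
{noformat}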

  was:
After upgrade from 2.2.0 to 2.4.0, NPE on first job start.

After RM was restarted, the job runs without a problem.

{noformat}
19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
{noformat}


 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 --

 Key: YARN-1986
 URL: https://issues.apache.org/jira/browse/YARN-1986
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jon Bringhurst
Assignee: Hong Zhiguo
 Attachments: YARN-1986-2.patch, YARN-1986-3.patch, 
 YARN-1986-testcase.patch, YARN-1986.patch


 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 -After RM was restarted, the job runs without a problem.-
 {noformat}
 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
 NODE_UPDATE to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
   at java.lang.Thread.run(Thread.java:744)
 19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.

2014-04-25 Thread Jon Bringhurst (JIRA)
Jon Bringhurst created YARN-1986:


 Summary: After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 Key: YARN-1986
 URL: https://issues.apache.org/jira/browse/YARN-1986
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jon Bringhurst


After upgrade from 2.2.0 to 2.4.0, NPE on first job start.

After RM was restarted, the job runs without a problem.

{noformat}
19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)