Hi, Yarn Developers,

I am writing to report a strange case I meet here.
I was using libyarn(
https://github.com/apache/incubator-hawq/tree/master/depends/libyarn), a
kind of C++ client library to communicate with Hadoop Yarn. It acts as a
"Unmanaged Application Master", doing things such as register application,
request to allocate/release containers from Yarn, activate containers,
etc(sample app is in /libyarn/sample/c_client_main.c).

What is strange to me is, when configure timeline server and
yarn.resourcemanager.system-metrics-publisher.enabled is enable in
yarn-site.xml. Something strange happens.

If the property "yarn.resourcemanager.system-metrics-publisher.enabled" is
not enable and no timeline server is configured, things goes well.
Once Yarn RM gets a heartbeat from my application, the progress becomes
50%, 50% is hard coded in allocate request.

[gpadmin@master ~]$ yarn application -list

15/12/09 08:12:54 INFO impl.TimelineClientImpl: Timeline service address:
http://master:8188/ws/v1/timeline/

Total number of applications (application-types: [] and states: [SUBMITTED,
ACCEPTED, RUNNING]):1

                Application-Id     Application-Name     Application-Type
    User      Queue              State        Final-State        Progress
                    Tracking-URL

application_1449630430655_0007              libyarn                 YARN
gpadmin    default            RUNNING          UNDEFINED             50%
                              url

However if timeline server is configured, and
 "yarn.resourcemanager.system-metrics-publisher.enabled" is enable,
Even my application can allocate/activate/release containers, the progress
of my application is ALWAYS 0%!

I've check the log of Yarn resource manager, I find
a NullPointerException in log.
I've google this exception with such callstack. there is a question, which
is similar to mine.
http://zh.hortonworks.com/community/forums/topic/error-in-handling-event-type-registered-for-applicationattempt/

I can see below information in log in log file:
2015-12-10 02:52:19,025 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Storing attempt: AppId: application_1449744734026_0001 AttemptId:
appattempt_1449744734026_0001_000001 MasterContainer: null
...
2015-12-10 02:52:19,946 ERROR
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
handling event type REGISTERED for applicationAttempt
application_1449744734026_0001
java.lang.NullPointerException
        at
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:145)

Below is related Yarn codes:
  private void storeAttempt() {
    // store attempt data in a non-blocking manner to prevent dispatcher
    // thread starvation and wait for state to be saved
    LOG.info("Storing attempt: AppId: " +
              getAppAttemptId().getApplicationId()
              + " AttemptId: " +
              getAppAttemptId()
              + " MasterContainer: " + masterContainer);
    rmContext.getStateStore().storeNewApplicationAttempt(this);
  }

  public void appAttemptRegistered(RMAppAttempt appAttempt,
      long registeredTime) {
    if (publishSystemMetrics) {
      dispatcher.getEventHandler().handle(
          new AppAttemptRegisteredEvent(
              appAttempt.getAppAttemptId(),
              appAttempt.getHost(),
              appAttempt.getRpcPort(),
              appAttempt.getTrackingUrl(),
              appAttempt.getOriginalTrackingUrl(),
              appAttempt.getMasterContainer().getId(),
              registeredTime));
    }
  }
I guess since libyarn is a "unmanaged" application master, there is no
container allocated for it, so MasterContainer is null. But when Yarn
register this application into SystemMetricsPublisher, it requires a
container and its id. That's why this null exception happens.
In a word, if a unmanaged AM tries to register in Yarn, when timeline
server is configured and
 "yarn.resourcemanager.system-metrics-publisher.enabled" is enable, a java
NullPointerException occurs in Yarn.
I am not sure if it's my app's fault or Yarn's bug. Would you please help
me take a look at it?

Thanks,
Wen

2015-12-10 02:52:18,842 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
new applicationId: 1
2015-12-10 02:52:18,873 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
with id 1 submitted by user gpadmin
2015-12-10 02:52:18,874 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing
application with id application_1449744734026_0001
2015-12-10 02:52:18,874 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=gpadmin
 IP=10.32.34.218 OPERATION=Submit Application Request
 TARGET=ClientRMService  RESULT=SUCCESS
 APPID=application_1449744734026_0001
2015-12-10 02:52:18,883 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1449744734026_0001 State change from NEW to NEW_SAVING
2015-12-10 02:52:18,884 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
Storing info for app: application_1449744734026_0001
2015-12-10 02:52:18,908 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1449744734026_0001 State change from NEW_SAVING to SUBMITTED
2015-12-10 02:52:18,910 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Application added - appId: application_1449744734026_0001 user: gpadmin
leaf-queue of parent: root #applications: 1
2015-12-10 02:52:18,911 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Accepted application application_1449744734026_0001 from user: gpadmin, in
queue: default
2015-12-10 02:52:18,969 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1449744734026_0001 State change from SUBMITTED to ACCEPTED
2015-12-10 02:52:19,004 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering app attempt : appattempt_1449744734026_0001_000001
2015-12-10 02:52:19,006 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1449744734026_0001_000001 State change from NEW to SUBMITTED
2015-12-10 02:52:19,023 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
maximum-am-resource-percent is insufficient to start a single application
in queue, it is likely set too low. skipping enforcement to allow at least
one application to start
2015-12-10 02:52:19,024 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
maximum-am-resource-percent is insufficient to start a single application
in queue for user, it is likely set too low. skipping enforcement to allow
at least one application to start
2015-12-10 02:52:19,024 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application application_1449744734026_0001 from user: gpadmin activated in
queue: default
2015-12-10 02:52:19,024 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application added - appId: application_1449744734026_0001 user:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@6a7edb15,
leaf-queue: default #user-pending-applications: 0
#user-active-applications: 1 #queue-pending-applications: 0
#queue-active-applications: 1
2015-12-10 02:52:19,024 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Added Application Attempt appattempt_1449744734026_0001_000001 to scheduler
from user gpadmin in queue default
2015-12-10 02:52:19,025 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Storing attempt: AppId: application_1449744734026_0001 AttemptId:
appattempt_1449744734026_0001_000001 MasterContainer: null
2015-12-10 02:52:19,039 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1449744734026_0001_000001 State change from SUBMITTED to
LAUNCHED_UNMANAGED_SAVING
2015-12-10 02:52:19,039 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
Create AMRMToken for ApplicationAttempt:
appattempt_1449744734026_0001_000001
2015-12-10 02:52:19,042 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
Creating password for appattempt_1449744734026_0001_000001
2015-12-10 02:52:19,046 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1449744734026_0001_000001 State change from
LAUNCHED_UNMANAGED_SAVING to LAUNCHED
2015-12-10 02:52:19,936 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
Auth successful for appattempt_1449744734026_0001_000001 (auth:SIMPLE)
2015-12-10 02:52:19,944 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
registration appattempt_1449744734026_0001_000001
2015-12-10 02:52:19,945 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=gpadmin
 IP=10.32.34.218 OPERATION=Register App Master
TARGET=ApplicationMasterService RESULT=SUCCESS
 APPID=application_1449744734026_0001
 APPATTEMPTID=appattempt_1449744734026_0001_000001
2015-12-10 02:52:19,946 ERROR
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
handling event type REGISTERED for applicationAttempt
application_1449744734026_0001
java.lang.NullPointerException
        at
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:145)
        at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1405)
        at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1380)
        at
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
        at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
        at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:850)
        at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:831)
        at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
        at
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
        at java.lang.Thread.run(Thread.java:745)
2015-12-10 02:52:19,947 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1449744734026_0001 State change from ACCEPTED to RUNNING
2015-12-10 02:52:19,963 ERROR
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
STATUS_UPDATE at LAUNCHED
        at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
        at
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
        at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:850)
        at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:831)
        at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
        at
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
        at java.lang.Thread.run(Thread.java:745)
2015-12-10 02:52:38,864 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node
not found resyncing test2:9983
2015-12-10 02:52:39,884 INFO org.apache.hadoop.yarn.util.RackResolver:
Resolved test2 to /default-rack
2015-12-10 02:52:39,886 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
NodeManager from node test2(cmPort: 9983 httpPort: 8042) registered with
capability: <memory:12288, vCores:6>, assigned nodeId test2:9983
2015-12-10 02:52:39,886 WARN
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Cannot get
RMApp by appId=application_1449655982302_0004, just added it to
finishedApplications list for cleanup
2015-12-10 02:52:39,886 WARN
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Cannot get
RMApp by appId=application_1449743958580_0001, just added it to
finishedApplications list for cleanup
2015-12-10 02:52:39,889 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: test2:9983
Node Transitioned from NEW to RUNNING

Reply via email to