-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44712/
-----------------------------------------------------------

(Updated March 23, 2016, 7:04 p.m.)


Review request for Ambari and Andrew Onischuk.


Changes
-------

Increased the timeout further to see whether that fixes the issue.


Bugs: AMBARI-15389
    https://issues.apache.org/jira/browse/AMBARI-15389


Repository: ambari


Description
-------

Build: Ambari 2.2.1.1, #63

The YARN service check reported a failure in a couple of recent EU runs:

a. In one test, the EU ran from HDP 2.3.4.0 to 2.4.0.0 and the YARN service 
check failed during the EU itself; retrying the operation made the service 
check pass.

b. In another test, the YARN service check failed when run after the EU; 
running it a second time succeeded.

Since a retry succeeded in both cases, the failure looks intermittent and is 
probably triggered by some corner case.
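
Both observations share one pattern: the check fails once and passes when it is run again. A minimal sketch of that retry pattern follows. It is not the Ambari implementation: `run_with_retries` is a hypothetical helper, the parameter names `tries` and `try_sleep` are borrowed from the keyword arguments visible in the traceback below, `timeout` and all default values are assumptions, and it uses Python 3's `subprocess` although the agent in the traceback runs Python 2.6.

```python
import subprocess
import time


def run_with_retries(command, tries=3, try_sleep=5, timeout=300):
    """Run a shell command, retrying on a non-zero exit code or a timeout.

    Hypothetical helper for illustration only. Returns (exit_code, output)
    for the first successful attempt and raises RuntimeError when every
    attempt has failed.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            completed = subprocess.run(
                command,
                shell=True,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # merge stderr into the captured output
                universal_newlines=True,
                timeout=timeout,  # seconds; only the shell itself is killed on expiry
            )
            if completed.returncode == 0:
                return completed.returncode, completed.stdout
            last_error = "attempt %d returned %d" % (attempt, completed.returncode)
        except subprocess.TimeoutExpired:
            last_error = "attempt %d timed out after %s s" % (attempt, timeout)
        if attempt < tries:
            # Give the cluster a moment to settle before the next attempt.
            time.sleep(try_sleep)
    raise RuntimeError("Execution of %r failed: %s" % (command, last_error))
```

A retry of this kind hides an intermittent failure rather than explaining it, so it is best read as a mitigation while the underlying corner case is investigated.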

{code}
stderr:   /var/lib/ambari-agent/data/errors-822.txt

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 142, in <module>
    ServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 104, in service_check
    user=params.smokeuser,
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari...@example.com; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar' returned 2. ######## Hortonworks #############
This is MOTD message, added for testing in qe infra
16/03/03 02:33:51 INFO impl.TimelineClientImpl: Timeline service address: 
http://host:8188/ws/v1/timeline/
16/03/03 02:33:51 INFO distributedshell.Client: Initializing Client
16/03/03 02:33:51 INFO distributedshell.Client: Running Client
16/03/03 02:33:51 INFO client.RMProxy: Connecting to ResourceManager at 
host-9-5.test/127.0.0.254:8050
16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster metric info from 
ASM, numNodeManagers=3
16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster node info from ASM
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, 
nodeId=host:25454, nodeAddresshost:8042, nodeRackName/default-rack, 
nodeNumContainers1
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, 
nodeId=host-9-5.test:25454, nodeAddresshost-9-5.test:8042, 
nodeRackName/default-rack, nodeNumContainers0
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, 
nodeId=host-9-1.test:25454, nodeAddresshost-9-1.test:8042, 
nodeRackName/default-rack, nodeNumContainers0
16/03/03 02:33:53 INFO distributedshell.Client: Queue info, queueName=default, 
queueCurrentCapacity=0.083333336, queueMaxCapacity=1.0, 
queueApplicationCount=0, queueChildQueueCount=0
16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, 
queueName=root, userAcl=SUBMIT_APPLICATIONS
16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, 
queueName=default, userAcl=SUBMIT_APPLICATIONS
16/03/03 02:33:53 INFO distributedshell.Client: Max mem capabililty of 
resources in this cluster 10240
16/03/03 02:33:53 INFO distributedshell.Client: Max virtual cores capabililty 
of resources in this cluster 1
16/03/03 02:33:53 INFO distributedshell.Client: Copy App Master jar from local 
filesystem and add to local environment
16/03/03 02:33:53 INFO distributedshell.Client: Set the environment for the 
application master
16/03/03 02:33:53 INFO distributedshell.Client: Setting up app master command
16/03/03 02:33:53 INFO distributedshell.Client: Completed setting up app master 
command {{JAVA_HOME}}/bin/java -Xmx10m 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
--container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 
1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
16/03/03 02:33:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 290 
for ambari-qa on 127.0.0.235:8020
16/03/03 02:33:53 INFO distributedshell.Client: Got dt for 
hdfs://host-9-1.test:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 
127.0.0.235:8020, Ident: (HDFS_DELEGATION_TOKEN token 290 for ambari-qa)
16/03/03 02:33:53 INFO distributedshell.Client: Submitting application to ASM
16/03/03 02:33:54 INFO impl.YarnClientImpl: Submitted application 
application_1456970141888_0011
16/03/03 02:33:55 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:56 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:57 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:58 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:33:59 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:00 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:01 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:02 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:03 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1456972434150, yarnAppState=ACCEPTED, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:04 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:05 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:06 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:07 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:08 INFO distributedshell.Client: Got application report from ASM 
for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, 
appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=FINISHED, 
distributedFinalState=FAILED, 
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, 
appUser=ambari-qa
16/03/03 02:34:08 INFO distributedshell.Client: Application did finished 
unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
loop
16/03/03 02:34:08 ERROR distributedshell.Client: Application failed to complete 
successfully
stdout:   /var/lib/ambari-agent/data/output-822.txt

2016-03-03 02:33:47,974 - Using hadoop conf dir: 
/usr/hdp/current/hadoop-client/conf
2016-03-03 02:33:48,013 - Using hadoop conf dir: 
/usr/hdp/current/hadoop-client/conf
2016-03-03 02:33:48,018 - checked_call['/usr/bin/kinit -kt 
/etc/security/keytabs/smokeuser.headless.keytab ambari...@example.com; yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls 
-num_containers 1 -jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar']
 {'path': '/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin', 'user': 'ambari-qa'}
{code}
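
The client polls the application about once per second, so most of the log above repeats the same report. To read it more easily, the sketch below collapses the reports into the points where the state changes. It is only an illustration: `state_transitions` is a hypothetical helper, and the regular expressions assume the timestamp prefix and the `yarnAppState=` / `distributedFinalState=` fields exactly as they appear in this log.

```python
import re

# A record starts at a line beginning with a "yy/mm/dd hh:mm:ss" timestamp;
# wrapped continuation lines belong to the record before them.
RECORD_START = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ", re.MULTILINE)
YARN_STATE = re.compile(r"yarnAppState=(\w+)")
FINAL_STATE = re.compile(r"distributedFinalState=(\w+)")


def state_transitions(log_text):
    """Return (timestamp, yarn_state, final_state) for each change of state."""
    starts = list(RECORD_START.finditer(log_text))
    transitions = []
    previous = None
    for index, match in enumerate(starts):
        if index + 1 < len(starts):
            end = starts[index + 1].start()
        else:
            end = len(log_text)
        record = log_text[match.start():end]
        state = YARN_STATE.search(record)
        if state is None:
            continue  # not an application report
        final = FINAL_STATE.search(record)
        current = (state.group(1), final.group(1) if final else None)
        if current != previous:
            transitions.append((match.group(1),) + current)
            previous = current
    return transitions
```

Applied to the stderr above, this gives ACCEPTED from 02:33:55, RUNNING at 02:34:04 and FINISHED with final state FAILED at 02:34:08: the application waited about nine seconds for its application master and failed roughly four seconds after it started running.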


Diffs (updated)
-----

  
ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py 244d5d7 

Diff: https://reviews.apache.org/r/44712/diff/


Testing
-------

mvn clean test


Thanks,

Dmitro Lisnichenko
