-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57604/
-----------------------------------------------------------
Review request for Ambari, Jonathan Hurley, Nate Cole, and Vinod Kumar
Vavilapalli.
Bugs: AMBARI-20447
https://issues.apache.org/jira/browse/AMBARI-20447
Repository: ambari
Description
-------
The YARN service check fails during a rolling upgrade from HDP-2.4 to HDP-2.6
(with YARN HA turned on). The sequence is:
# After the "core master restart" step, the YARN client uses the new (HDP-2.6)
config and fails with "Class
org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found".
Forcing the YARN client to use the old (HDP-2.4) config until the client binary
is updated helps here.
# After the "core slave restart" step, using the old YARN client config with
the old YARN client binary no longer helps, because the NM/RM classpath points
to HDP-2.6. The application gets scheduled, but then fails with this log:
{code}
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
    at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
    at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
    at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.serviceStart(AMRMClientAsyncImpl.java:96)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:559)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:299)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2208)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2232)
    ... 9 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2114)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2206)
    ... 10 more
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.async.AMRMClientAsync failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
    at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
    at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
    at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
    at
{code}
# After yarn client is updated to a new binary, service check works fine.
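The three observations above can be read as a mismatch between the config that
names the failover proxy provider and the binary that has to contain that
class. The toy model below is a sketch of that reading, not Hadoop code. Its
tables are assumptions for illustration: only the class name
RequestHedgingRMFailoverProxyProvider and its absence from the HDP-2.4 binary
come from this review; that the HDP-2.4 config names
ConfiguredRMFailoverProxyProvider is assumed.

```python
HEDGING = "org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider"
CONFIGURED = "org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider"

# Assumption: provider classes shipped by each client binary version.
CLASSES_IN_BINARY = {
    "2.4": {CONFIGURED},
    "2.6": {CONFIGURED, HEDGING},
}

# Assumption: provider class named by each version's YARN config.
PROVIDER_IN_CONFIG = {
    "2.4": CONFIGURED,
    "2.6": HEDGING,
}


def resolve_provider(config_version, binary_version):
    """Return the provider class name, or raise if the binary lacks it."""
    wanted = PROVIDER_IN_CONFIG[config_version]
    if wanted not in CLASSES_IN_BINARY[binary_version]:
        raise LookupError("Class %s not found" % wanted)
    return wanted
```

In this model, a 2.6 config with a 2.4 binary is the only failing combination.
Step 1 hits it directly in the client. In step 2 the ApplicationMaster appears
to hit the same combination, since it would pick up the HDP-2.6 config from the
NodeManager while still loading the old client jars.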
----
Bottom line: this is a known problem with DistributedShell. It was never fixed
to stop relying on the cluster's configuration, so client configuration changes
like this one can break DistributedShell apps across upgrades. Unfortunately,
nothing we do now can fix this broken upgrade for DistributedShell; a proper
fix would have had to ship in the earlier releases.
We have to do two things:
# Disable the DistributedShell-based service check when upgrading from 2.4 to
2.6. RequestHedgingRMFailoverProxyProvider was added in 2.5, so 2.5 to 2.6 is
fine.
# Fix yarn-site.xml starting with 2.6 with the following change, to avoid this
in the future. The change replaces $HADOOP_CONF_DIR, which is inherited from
the NodeManager, with /etc/hadoop/conf/, which is always tied to the client
version.
{code}
<property>
<name>yarn.application.classpath</name>
<value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
{code}
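The two changes above can be sketched in Python as a version gate and a
classpath check. This is an illustration under stated assumptions, not the
code in the diff: the function names are hypothetical, and treating every
source stack older than 2.5 the same way as 2.4 is an assumption.

```python
import xml.etree.ElementTree as ET

# Per this review, RequestHedgingRMFailoverProxyProvider first ships in 2.5.
HEDGING_PROVIDER_SINCE = (2, 5)


def parse_version(text):
    """Turn 'HDP-2.4' or '2.4.0.0' into a (major, minor) tuple."""
    digits = text.split("-")[-1]
    return tuple(int(part) for part in digits.split(".")[:2])


def skip_distributed_shell_check(source_stack):
    """True when the source stack predates the hedging provider (assumed rule)."""
    return parse_version(source_stack) < HEDGING_PROVIDER_SINCE


def env_dependent_entries(property_xml):
    """Return classpath entries that reference an environment variable."""
    value = ET.fromstring(property_xml).findtext("value") or ""
    return [entry for entry in value.split(",") if "$" in entry]
```

Run against the property shown above, env_dependent_entries returns an empty
list, because every entry is an absolute path. A value that still starts with
$HADOOP_CONF_DIR would be reported.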
Diffs
-----
ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.6.xml
c27b634efd
ambari-server/src/main/resources/stacks/HDP/2.4/upgrades/upgrade-2.6.xml
dc92c2b46f
ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml
ab6b2398b6
ambari-server/src/main/resources/stacks/HDP/2.6/services/YARN/configuration/yarn-site.xml
4b97148278
Diff: https://reviews.apache.org/r/57604/diff/1/
Testing
-------
Checked that the 2.4->2.6 upgrade passes.
My first thought was that there was no need to skip the YARN service check
after the slave restart (since the YARN 2.6 configuration is expected to be
correct). That is not the case, so I excluded the YARN service check on this
step.
mvn clean test
Thanks,
Dmitro Lisnichenko