-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57604/
-----------------------------------------------------------
Review request for Ambari, Jonathan Hurley, Nate Cole, and Vinod Kumar Vavilapalli.


Bugs: AMBARI-20447
    https://issues.apache.org/jira/browse/AMBARI-20447


Repository: ambari


Description
-------

The YARN service check fails during a rolling upgrade from HDP-2.4 to HDP-2.6 (with YARN HA turned on) for the following reasons:

# After the "core master restart" step, the YARN client uses the new (HDP-2.6) config and fails with "Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found". Forcing the YARN client to use the old (HDP-2.4) config until the client binary is updated helps here.
# After the "core slave restart" step, using the old YARN client config with the old YARN client binary does not help, because the NM/RM classpath already points to HDP-2.6. The app job gets scheduled, but then fails with this log:
{code}
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
	at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
	at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
	at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.serviceStart(AMRMClientAsyncImpl.java:96)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:559)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:299)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2208)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2232)
	... 9 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2114)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2206)
	... 10 more
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.async.AMRMClientAsync failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
	at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
	at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
	at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
	at ...
{code}
# After the YARN client is updated to the new binary, the service check works fine.

----

Bottom line: this is a known problem with DistributedShell, which was never fixed to stop relying on the cluster's configuration. As a result, client configuration changes like this one can break DistributedShell apps across upgrades.
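Put differently, an upgrade path is exposed when the target stack's config names a failover proxy provider that the source stack's client binary does not ship. The Python below is only a hypothetical model of that decision, not Ambari code (the actual change lives in the upgrade packs listed under Diffs). It assumes the HDP-2.6 yarn-site.xml names the provider in `yarn.client.failover-proxy-provider`, and it takes from this review the fact that RequestHedgingRMFailoverProxyProvider first ships in HDP 2.5. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the target yarn-site.xml names the RM failover proxy provider here.
PROVIDER_PROPERTY = "yarn.client.failover-proxy-provider"
HEDGING_PROVIDER = "org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider"
# Per this review, the hedging provider was added in HDP 2.5.
FIRST_STACK_WITH_HEDGING = (2, 5)


def read_property(yarn_site_xml, name):
    """Return the <value> of the named <property> in a yarn-site.xml string, or None."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def stack_major_minor(version):
    """Map a stack version such as '2.4' or '2.6.0.3' to (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def should_skip_yarn_service_check(source_version, target_yarn_site):
    """Hypothetical rule: skip the DistributedShell-based check when the target
    config names the hedging provider but the source-stack client predates it."""
    provider = read_property(target_yarn_site, PROVIDER_PROPERTY)
    if provider != HEDGING_PROVIDER:
        return False
    return stack_major_minor(source_version) < FIRST_STACK_WITH_HEDGING


# Illustrative target config fragment (assumed, not copied from HDP-2.6).
TARGET_SITE = """<configuration>
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider</value>
  </property>
</configuration>"""

print(should_skip_yarn_service_check("2.4", TARGET_SITE))  # True
print(should_skip_yarn_service_check("2.5", TARGET_SITE))  # False
```

Under these assumptions, 2.4 -> 2.6 skips the check while 2.5 -> 2.6 keeps it, which matches the split described in this review.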
Unfortunately, nothing we do now can fix this upgrade path for DistributedShell; a proper fix would have required changes in the older releases. We have to do two things:

# Disable the DistributedShell-based service check when upgrading from 2.4 to 2.6. RequestHedgingRMFailoverProxyProvider was added in 2.5, so 2.5 -> 2.6 is fine.
# Starting with 2.6, change yarn-site.xml as follows to avoid this problem in the future. The change replaces $HADOOP_CONF_DIR, which is inherited from the NodeManager, with /etc/hadoop/conf/, which is always tied to the client version.
{code}
<property>
  <name>yarn.application.classpath</name>
  <value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
{code}


Diffs
-----

  ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.6.xml c27b634efd
  ambari-server/src/main/resources/stacks/HDP/2.4/upgrades/upgrade-2.6.xml dc92c2b46f
  ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml ab6b2398b6
  ambari-server/src/main/resources/stacks/HDP/2.6/services/YARN/configuration/yarn-site.xml 4b97148278

Diff: https://reviews.apache.org/r/57604/diff/1/


Testing
-------

Checked that the 2.4 -> 2.6 upgrade passes. At first I thought there was no need to skip the YARN service check after the slave restart (since the YARN 2.6 configuration is expected to be correct), but that is not the case, so I excluded the YARN service check from this step.

mvn clean test


Thanks,

Dmitro Lisnichenko