Re: Review Request 64579: Node Managers fail to start after Spark2 is patched due to CNF YarnShuffleService

2017-12-13 Thread Dmytro Grinenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64579/#review193721
---


Ship it!

- Dmytro Grinenko


On Dec. 13, 2017, 5:12 p.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64579/
> ---
> 
> (Updated Dec. 13, 2017, 5:12 p.m.)
> 
> 
> Review request for Ambari, Dmytro Grinenko, Dmitro Lisnichenko, and Nate Cole.
> 
> 
> Bugs: AMBARI-22644
> https://issues.apache.org/jira/browse/AMBARI-22644
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> *STR*
> # Deploy HDP-2.6.4.0 cluster with Ambari-2.6.1.0-114
> # Apply HBase patch Upgrade on the cluster (this step is optional)
> # Then apply Spark2 patch Upgrade on the cluster
> # Restart Node Managers
> 
> *Result*
> NM restart fails with below error:
> ```
> 2017-12-10 07:17:02,559 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - NodeManager metrics system shutdown complete.
> 2017-12-10 07:17:02,559 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:197)
> at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:165)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:131)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 8 more
> 2017-12-10 07:17:02,562 INFO  nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
> ```
> 
> The spark properties are correctly being written out as per AMBARI-22525.
> 
> Initially, we had defined Spark properties for ATS like this:
> ```xml
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
>   <value>{{stack_root}}/${hdp.version}/spark/aux/*</value>
> </property>
> ```
> 
> When YARN upgrades without Spark, we run into AMBARI-22525. It seems that the 
> shuffle classes are installed as part of the RPM dependencies, but 
> SparkATSPlugin is not.
> 
> So:
> - If we use YARN's version for the Spark classes, then ATS can't find 
> SparkATSPlugin since that is not part of YARN.
> - If we use Spark's version for the classes, then Spark can never upgrade 
> without YARN since NodeManager can't find the new Spark classes. 
> 
> However, it seems like shuffle and ATS use different properties. We changed 
> all 3 properties in AMBARI-22525:
> 
> ```
> yarn.nodemanager.aux-services.spark2_shuffle.classpath
> yarn.nodemanager.aux-services.spark_shuffle.classpath
> yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath
> ```
> 
> It seems like what we need to do is change the two Spark shuffle classpath 
> properties back to ${hdp.version}, but leave the ATS plugin classpath on the 
> new version, since we're guaranteed to have Spark installed on the ATS machine.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py 6b5559cf91 
>   ambari-server/src/main/resources/stacks/HDP/2.5/services/YARN/configuration/yarn-site.xml 29833fbe03 
>   ambari-server/src/main/resources/stacks/HDP/2.6/upgrades/config-upgrade.xml ea0e2cd46b 
> 
> 
> Diff: https://reviews.apache.org/r/64579/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing on a patched cluster with YARN/Spark
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>



Re: Review Request 64579: Node Managers fail to start after Spark2 is patched due to CNF YarnShuffleService

2017-12-13 Thread Nate Cole

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64579/#review193703
---


Ship it!

- Nate Cole





Review Request 64579: Node Managers fail to start after Spark2 is patched due to CNF YarnShuffleService

2017-12-13 Thread Jonathan Hurley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64579/
---

Review request for Ambari, Dmytro Grinenko, Dmitro Lisnichenko, and Nate Cole.


Bugs: AMBARI-22644
https://issues.apache.org/jira/browse/AMBARI-22644


Repository: ambari


Description
---

*STR*
# Deploy HDP-2.6.4.0 cluster with Ambari-2.6.1.0-114
# Apply HBase patch Upgrade on the cluster (this step is optional)
# Then apply Spark2 patch Upgrade on the cluster
# Restart Node Managers

*Result*
NM restart fails with below error:
```
2017-12-10 07:17:02,559 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - NodeManager metrics system shutdown complete.
2017-12-10 07:17:02,559 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:197)
at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:165)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:131)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 8 more
2017-12-10 07:17:02,562 INFO  nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
```
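
One way to confirm this kind of failure on a NodeManager host is to check whether any jar matched by the configured aux-service classpath actually contains the missing class. The following is a minimal sketch, not part of Ambari: the helper name `find_class_in_classpath` is hypothetical, and it assumes the classpath value is a directory wildcard ending in `/*`.

```python
import glob
import zipfile


def find_class_in_classpath(classpath_glob, class_name):
    """Return the jars matched by classpath_glob that contain class_name.

    Hypothetical diagnostic helper: classpath_glob is assumed to be a
    directory wildcard such as '<some dir>/*'.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in sorted(glob.glob(classpath_glob)):
        # Only jar files are considered; skip anything else in the directory
        if not path.lower().endswith(".jar") or not zipfile.is_zipfile(path):
            continue
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

Calling it with the resolved value of the shuffle classpath property and `org.apache.spark.network.yarn.YarnShuffleService` returns the jars that provide the class. An empty list would be consistent with the `ClassNotFoundException` above.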

The spark properties are correctly being written out as per AMBARI-22525.

Initially, we had defined Spark properties for ATS like this:
```xml
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
  <value>{{stack_root}}/${hdp.version}/spark/aux/*</value>
</property>
```

When YARN upgrades without Spark, we run into AMBARI-22525. It seems that the 
shuffle classes are installed as part of the RPM dependencies, but 
SparkATSPlugin is not.

So:
- If we use YARN's version for the Spark classes, then ATS can't find 
SparkATSPlugin since that is not part of YARN.
- If we use Spark's version for the classes, then Spark can never upgrade 
without YARN since NodeManager can't find the new Spark classes. 

However, it seems like shuffle and ATS use different properties. We changed all 
3 properties in AMBARI-22525:

```
yarn.nodemanager.aux-services.spark2_shuffle.classpath
yarn.nodemanager.aux-services.spark_shuffle.classpath
yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath
```

It seems like what we need to do is change the two Spark shuffle classpath 
properties back to ${hdp.version}, but leave the ATS plugin classpath on the 
new version, since we're guaranteed to have Spark installed on the ATS machine.
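
The proposed split can be sketched roughly as follows. This only illustrates the idea and is not the code in the diff: the function name `resolve_classpaths`, the `spark_version` argument, and the `spark2/aux` and `spark/hdpLib` directory names are assumptions made for the example. Only the `spark/aux/*` layout appears in the description above.

```python
def resolve_classpaths(stack_root, spark_version):
    """Build values for the three properties changed in AMBARI-22525.

    Illustrative only. The shuffle entries keep the literal ${hdp.version}
    placeholder, so they follow the version the NodeManager itself runs.
    The ATS plugin entry is pinned to the installed Spark version.
    """
    return {
        # From the description: resolved against the NodeManager's version
        "yarn.nodemanager.aux-services.spark_shuffle.classpath":
            "%s/${hdp.version}/spark/aux/*" % stack_root,
        # Assumed to mirror the Spark 1 layout under a spark2 directory
        "yarn.nodemanager.aux-services.spark2_shuffle.classpath":
            "%s/${hdp.version}/spark2/aux/*" % stack_root,
        # Assumed plugin directory; pinned to Spark's own version
        "yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath":
            "%s/%s/spark/hdpLib/*" % (stack_root, spark_version),
    }
```

With this shape, patching Spark alone changes only the ATS plugin path, while the shuffle paths keep resolving against the version installed alongside the NodeManager.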


Diffs
---

  ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase.py ac71ce4b36 


Diff: https://reviews.apache.org/r/64579/diff/1/


Testing
---

Manual testing on a patched cluster with YARN/Spark


Thanks,

Jonathan Hurley