RE: The auxService:spark_shuffle does not exist
Hi Andrew,

Thanks for the advice. I didn't see those log lines in the NodeManager log, so apparently something was wrong with the yarn-site.xml configuration. After digging in more, I realized it was a user error. I'm sharing it here so others know what mistake I made.

When I reviewed the configurations, I noticed that the property "yarn.nodemanager.aux-services" was also set in mapred-site.xml. It turns out that the setting in mapred-site.xml overrides the property "yarn.nodemanager.aux-services" in yarn-site.xml; because of this, the spark_shuffle service was never enabled. :( err..

After deleting the redundant, invalid properties in mapred-site.xml, it started working. I now see the following lines in the NodeManager log:

2015-07-21 21:24:44,046 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing YARN shuffle service for Spark
2015-07-21 21:24:44,046 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service spark_shuffle, "spark_shuffle"
2015-07-21 21:24:44,264 INFO org.apache.spark.network.yarn.YarnShuffleService: Started YARN shuffle service for Spark on port 7337. Authentication is not enabled.

I appreciate all the help and the pointers on where to look. Thanks, problem solved.

Date: Tue, 21 Jul 2015 09:31:50 -0700
Subject: Re: The auxService:spark_shuffle does not exist
From: and...@databricks.com
To: alee...@hotmail.com
CC: zjf...@gmail.com; rp...@njit.edu; user@spark.apache.org

Hi Andrew,

Based on your driver logs, it seems the issue is that the shuffle service is actually not running on the NodeManagers, but your application is trying to provide a "spark_shuffle" secret anyway. One way to verify whether the shuffle service is actually started is to look at the NodeManager logs for the following lines:

Initializing YARN shuffle service for Spark
Started YARN shuffle service for Spark on port X

These should be logged under the INFO level. Also, could you verify whether all the executors have this problem, or just a subset? If even one of the NMs doesn't have the shuffle service, you'll see the stack trace that you ran into. It would be good to confirm whether the yarn-site.xml change is actually reflected on all NMs if the log statements above are missing.

Let me know if you can get it working. I've run the shuffle service myself on the master branch (which will become Spark 1.5.0) recently following the instructions and have not encountered any problems.

-Andrew
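
For anyone hitting the same mapred-site.xml override: a minimal sketch, not taken from this thread, that lists every *-site.xml file in a configuration directory that sets a given property. The function name find_property and the directory path in the usage note are assumptions for illustration only. It only reports where the property is defined; it does not try to reproduce how Hadoop decides which definition wins.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_property(conf_dir, name):
    """Return {file name: value} for every *-site.xml in conf_dir that sets `name`.

    Hypothetical helper: it assumes the usual Hadoop layout of
    <configuration><property><name>..</name><value>..</value></property>.
    """
    hits = {}
    for path in sorted(Path(conf_dir).glob("*-site.xml")):
        root = ET.parse(path).getroot()
        for prop in root.iter("property"):
            if prop.findtext("name", "").strip() == name:
                hits[path.name] = prop.findtext("value", "").strip()
    return hits
```

Calling find_property("/etc/hadoop/conf", "yarn.nodemanager.aux-services") (the path is an assumed example) and getting back more than one file name would point to the kind of duplicate definition described in this message.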
Re: The auxService:spark_shuffle does not exist
Hi Andrew,

Based on your driver logs, it seems the issue is that the shuffle service is actually not running on the NodeManagers, but your application is trying to provide a "spark_shuffle" secret anyway. One way to verify whether the shuffle service is actually started is to look at the NodeManager logs for the following lines:

*Initializing YARN shuffle service for Spark*
*Started YARN shuffle service for Spark on port X*

These should be logged under the INFO level. Also, could you verify whether *all* the executors have this problem, or just a subset? If even one of the NMs doesn't have the shuffle service, you'll see the stack trace that you ran into. It would be good to confirm whether the yarn-site.xml change is actually reflected on all NMs if the log statements above are missing.

Let me know if you can get it working. I've run the shuffle service myself on the master branch (which will become Spark 1.5.0) recently following the instructions and have not encountered any problems.

-Andrew
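
The log check suggested above could be scripted roughly as follows. This is only a sketch: the function name shuffle_service_status is made up for illustration, and it assumes the NodeManager log has already been read into a string and that the two messages appear with exactly the wording quoted in this message.

```python
import re

# Message texts as quoted in this thread; the port number varies per setup.
INIT_MSG = "Initializing YARN shuffle service for Spark"
STARTED_RE = re.compile(r"Started YARN shuffle service for Spark on port (\d+)")


def shuffle_service_status(log_text):
    """Return (initialized, port); port is None if no 'Started' line is found."""
    initialized = INIT_MSG in log_text
    match = STARTED_RE.search(log_text)
    port = int(match.group(1)) if match else None
    return initialized, port
```

Running this against the log of every NodeManager, not just one, matches the point above that a single NM without the service is enough to trigger the exception.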
RE: The auxService:spark_shuffle does not exist
Hi Andrew Or,

Yes, the NodeManager was restarted. I also checked the logs to confirm that the JARs appear in the CLASSPATH. I have also downloaded the binary distribution and used the JAR "spark-1.4.1-bin-hadoop2.4/lib/spark-1.4.1-yarn-shuffle.jar", without success.

Has anyone successfully enabled spark_shuffle by following the documentation at https://spark.apache.org/docs/1.4.1/job-scheduling.html ? I'm testing it on Hadoop 2.4.1.

Any feedback or suggestions are appreciated, thanks.

Date: Fri, 17 Jul 2015 15:35:29 -0700
Subject: Re: The auxService:spark_shuffle does not exist
From: and...@databricks.com
To: alee...@hotmail.com
CC: zjf...@gmail.com; rp...@njit.edu; user@spark.apache.org

Hi all,

Did you forget to restart the node managers after editing yarn-site.xml by any chance?

-Andrew
Re: The auxService:spark_shuffle does not exist
Hi all,

Did you forget to restart the node managers after editing yarn-site.xml by any chance?

-Andrew

2015-07-17 8:32 GMT-07:00 Andrew Lee:

> I have encountered the same problem after following the document.
>
> Here's my spark-defaults.conf:
>
> spark.shuffle.service.enabled true
> spark.dynamicAllocation.enabled true
> spark.dynamicAllocation.executorIdleTimeout 60
> spark.dynamicAllocation.cachedExecutorIdleTimeout 120
> spark.dynamicAllocation.initialExecutors 2
> spark.dynamicAllocation.maxExecutors 8
> spark.dynamicAllocation.minExecutors 1
> spark.dynamicAllocation.schedulerBacklogTimeout 10
>
> and yarn-site.xml configured:
>
> <property>
>   <name>yarn.nodemanager.aux-services</name>
>   <value>spark_shuffle,mapreduce_shuffle</value>
> </property>
> ...
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>   <value>org.apache.spark.network.yarn.YarnShuffleService</value>
> </property>
>
> I also deployed the 2 JARs to the NodeManager's classpath,
> /opt/hadoop/share/hadoop/mapreduce/ (I checked the NodeManager log
> and the JARs appear in the classpath). I noticed that the JAR location is
> not the same as in the 1.4 documentation. I found them under
> network/yarn/target and network/shuffle/target/ after building with
> "-Phadoop-2.4 -Psparkr -Pyarn -Phive -Phive-thriftserver" in Maven:
>
> spark-network-yarn_2.10-1.4.1.jar
> spark-network-shuffle_2.10-1.4.1.jar
>
> I am still getting the following exception:
>
> Exception in thread "ContainerLauncher #0" java.lang.Error: org.apache.spark.SparkException: Exception while starting container container_1437141440985_0003_01_02 on host alee-ci-2058-slave-2.test.altiscale.com
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.spark.SparkException: Exception while starting container container_1437141440985_0003_01_02 on host alee-ci-2058-slave-2.test.altiscale.com
>     at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:116)
>     at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:67)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     ... 2 more
> Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>     at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>
> Not sure what else I am missing or doing wrong here.
>
> Appreciate any insights or feedback, thanks.
RE: The auxService:spark_shuffle does not exist
I have encountered the same problem after following the document.

Here's my spark-defaults.conf:

spark.shuffle.service.enabled true
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 60
spark.dynamicAllocation.cachedExecutorIdleTimeout 120
spark.dynamicAllocation.initialExecutors 2
spark.dynamicAllocation.maxExecutors 8
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.schedulerBacklogTimeout 10

and yarn-site.xml configured:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,mapreduce_shuffle</value>
</property>
...
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

I also deployed the 2 JARs to the NodeManager's classpath, /opt/hadoop/share/hadoop/mapreduce/ (I checked the NodeManager log and the JARs appear in the classpath). I noticed that the JAR location is not the same as in the 1.4 documentation. I found them under network/yarn/target and network/shuffle/target/ after building with "-Phadoop-2.4 -Psparkr -Pyarn -Phive -Phive-thriftserver" in Maven:

spark-network-yarn_2.10-1.4.1.jar
spark-network-shuffle_2.10-1.4.1.jar

I am still getting the following exception:

Exception in thread "ContainerLauncher #0" java.lang.Error: org.apache.spark.SparkException: Exception while starting container container_1437141440985_0003_01_02 on host alee-ci-2058-slave-2.test.altiscale.com
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Exception while starting container container_1437141440985_0003_01_02 on host alee-ci-2058-slave-2.test.altiscale.com
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:116)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:67)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    ... 2 more
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

Not sure what else I am missing or doing wrong here.

Appreciate any insights or feedback, thanks.

Date: Wed, 8 Jul 2015 09:25:39 +0800
Subject: Re: The auxService:spark_shuffle does not exist
From: zjf...@gmail.com
To: rp...@njit.edu
CC: user@spark.apache.org

Did you enable the dynamic resource allocation? You can refer to this page for how to configure the Spark shuffle service for YARN:

https://spark.apache.org/docs/1.4.0/job-scheduling.html

--
Best Regards

Jeff Zhang
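
As a side note on the spark-defaults.conf settings listed in this message: the job-scheduling page linked in this thread asks for spark.shuffle.service.enabled to be true whenever dynamic allocation is turned on. A minimal sketch of checking that pairing follows. The function names are hypothetical, and the parser assumes whitespace-separated "key value" lines only, which is the form used in this message.

```python
def parse_spark_defaults(text):
    """Parse whitespace-separated 'key value' lines, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def shuffle_service_missing(conf):
    """True if dynamic allocation is enabled but the shuffle service flag is not."""
    dynamic = conf.get("spark.dynamicAllocation.enabled", "false").lower() == "true"
    shuffle = conf.get("spark.shuffle.service.enabled", "false").lower() == "true"
    return dynamic and not shuffle
```

With the configuration shown in this message both flags are true, so this check passes; it covers only the Spark side and says nothing about whether the NodeManagers actually loaded the service.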
Re: The auxService:spark_shuffle does not exist
Did you enable the dynamic resource allocation? You can refer to this page for how to configure the Spark shuffle service for YARN:

https://spark.apache.org/docs/1.4.0/job-scheduling.html

On Tue, Jul 7, 2015 at 10:55 PM, roy wrote:

> we tried "--master yarn-client" with no different result.

--
Best Regards

Jeff Zhang
Re: The auxService:spark_shuffle does not exist
We tried "--master yarn-client", with no difference in the result.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-auxService-spark-shuffle-does-not-exist-tp23662p23689.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org