RE: The auxService:spark_shuffle does not exist

2015-07-21 Thread Andrew Lee
Hi Andrew,
Thanks for the advice. I didn't see the log in the NodeManager, so apparently, 
something was wrong with the yarn-site.xml configuration.
After digging in more, I realize it was an user error. I'm sharing this with 
other people so others may know what mistake I have made.
When I review the configurations, I notice that there was another property 
setting "yarn.nodemanager.aux-services" in mapred-site.xml. It turns out that 
mapred-site.xml will override the property "yarn.nodemanager.aux-services" in 
yarn-site.xml, because of this, spark_shuffle service was never enabled.  :(  
err.. 
















After deleting the redundant invalid properties in mapred-site.xml, it starts 
working. I see the following logs from the NodeManager.









2015-07-21 21:24:44,046 INFO org.apache.spark.network.yarn.YarnShuffleService: 
Initializing YARN shuffle service for Spark
2015-07-21 21:24:44,046 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding 
auxiliary service spark_shuffle, "spark_shuffle"
2015-07-21 21:24:44,264 INFO org.apache.spark.network.yarn.YarnShuffleService: 
Started YARN shuffle service for Spark on port 7337. Authentication is not 
enabled.

Appreciate all and the pointers where to look at. Thanks, problem solved.



Date: Tue, 21 Jul 2015 09:31:50 -0700
Subject: Re: The auxService:spark_shuffle does not exist
From: and...@databricks.com
To: alee...@hotmail.com
CC: zjf...@gmail.com; rp...@njit.edu; user@spark.apache.org

Hi Andrew,
Based on your driver logs, it seems the issue is that the shuffle service is 
actually not running on the NodeManagers, but your application is trying to 
provide a "spark_shuffle" secret anyway. One way to verify whether the shuffle 
service is actually started is to look at the NodeManager logs for the 
following lines:
Initializing YARN shuffle service for Spark
Started YARN shuffle service for Spark on port X

These should be logged under the INFO level. Also, could you verify whether all 
the executors have this problem, or just a subset? If even one of the NM 
doesn't have the shuffle service, you'll see the stack trace that you ran into. 
It would be good to confirm whether the yarn-site.xml change is actually 
reflected on all NMs if the log statements above are missing.

Let me know if you can get it working. I've run the shuffle service myself on 
the master branch (which will become Spark 1.5.0) recently following the 
instructions and have not encountered any problems.
-Andrew   

Re: The auxService:spark_shuffle does not exist

2015-07-21 Thread Andrew Or
Hi Andrew,

Based on your driver logs, it seems the issue is that the shuffle service
is actually not running on the NodeManagers, but your application is trying
to provide a "spark_shuffle" secret anyway. One way to verify whether the
shuffle service is actually started is to look at the NodeManager logs for
the following lines:

*Initializing YARN shuffle service for Spark*
*Started YARN shuffle service for Spark on port X*

These should be logged under the INFO level. Also, could you verify whether
*all* the executors have this problem, or just a subset? If even one of the
NM doesn't have the shuffle service, you'll see the stack trace that you
ran into. It would be good to confirm whether the yarn-site.xml change is
actually reflected on all NMs if the log statements above are missing.

Let me know if you can get it working. I've run the shuffle service myself
on the master branch (which will become Spark 1.5.0) recently following the
instructions and have not encountered any problems.

-Andrew


RE: The auxService:spark_shuffle does not exist

2015-07-21 Thread Andrew Lee
Hi Andrew Or,
Yes, NodeManager was restarted, I also checked the logs to see if the JARs 
appear in the CLASSPATH.
I have also downloaded the binary distribution and use the JAR 
"spark-1.4.1-bin-hadoop2.4/lib/spark-1.4.1-yarn-shuffle.jar" without success.
Has anyone successfully enabled the spark_shuffle via the documentation 
https://spark.apache.org/docs/1.4.1/job-scheduling.html ??
I'm testing it on Hadoop 2.4.1.
Any feedback or suggestion are appreciated, thanks.

Date: Fri, 17 Jul 2015 15:35:29 -0700
Subject: Re: The auxService:spark_shuffle does not exist
From: and...@databricks.com
To: alee...@hotmail.com
CC: zjf...@gmail.com; rp...@njit.edu; user@spark.apache.org

Hi all,
Did you forget to restart the node managers after editing yarn-site.xml by any 
chance?
-Andrew
2015-07-17 8:32 GMT-07:00 Andrew Lee :



I have encountered the same problem after following the document.
Here's my spark-defaults.confspark.shuffle.service.enabled true
spark.dynamicAllocation.enabled  true
spark.dynamicAllocation.executorIdleTimeout 60
spark.dynamicAllocation.cachedExecutorIdleTimeout 120
spark.dynamicAllocation.initialExecutors 2
spark.dynamicAllocation.maxExecutors 8
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.schedulerBacklogTimeout 10

and yarn-site.xml configured.

yarn.nodemanager.aux-services
spark_shuffle,mapreduce_shuffle

...

yarn.nodemanager.aux-services.spark_shuffle.class
org.apache.spark.network.yarn.YarnShuffleService

and deployed the 2 JARs to NodeManager's classpath 
/opt/hadoop/share/hadoop/mapreduce/. (I also checked the NodeManager log and 
the JARs appear in the classpath). I notice that the JAR location is not the 
same as the document in 1.4. I found them under network/yarn/target and 
network/shuffle/target/ after building it with "-Phadoop-2.4 -Psparkr -Pyarn 
-Phive -Phive-thriftserver" in maven.

















spark-network-yarn_2.10-1.4.1.jar
spark-network-shuffle_2.10-1.4.1.jar


and still getting the following exception.
Exception in thread "ContainerLauncher #0" java.lang.Error: 
org.apache.spark.SparkException: Exception while starting container 
container_1437141440985_0003_01_02 on host alee-ci-2058-slave-2.test.foo.com
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Exception while starting container 
container_1437141440985_0003_01_02 on host alee-ci-2058-slave-2.test.foo.com
at 
org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:116)
at 
org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:67)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
... 2 more
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
auxService:spark_shuffle does not exist
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
at 
org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
Not sure what else am I missing here or doing wrong?
Appreciate any insights or feedback, thanks.

Date: Wed, 8 Jul 2015 09:25:39 +0800
Subject: Re: The auxService:spark_shuffle does not exist
From: zjf...@gmail.com
To: rp...@njit.edu
CC: user@spark.apache.org

Did you enable the dynamic resource allocation ? You can refer to this page for 
how to configure spark shuffle service for yarn.
https://spark.apache.org/docs/1.4.0/job-scheduling.html 
On Tue, Jul 7, 2015 at 10:55 PM, roy  wrote:
we tried "--master yarn-client" with no different result.







--

View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/The-auxService-spark-shuffle-does-not-exist-tp23662p23689.html

Sent from the Apache Spark User List mailing list archive at Nabble.com.



-

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

For additional commands, e-mail: user-h...@spark.apache.org





-- 
Best Regards

Jeff Zhang
  

  

Re: The auxService:spark_shuffle does not exist

2015-07-17 Thread Andrew Or
Hi all,

Did you forget to restart the node managers after editing yarn-site.xml by
any chance?

-Andrew

2015-07-17 8:32 GMT-07:00 Andrew Lee :

> I have encountered the same problem after following the document.
>
> Here's my spark-defaults.conf
>
> spark.shuffle.service.enabled true
> spark.dynamicAllocation.enabled  true
> spark.dynamicAllocation.executorIdleTimeout 60
> spark.dynamicAllocation.cachedExecutorIdleTimeout 120
> spark.dynamicAllocation.initialExecutors 2
> spark.dynamicAllocation.maxExecutors 8
> spark.dynamicAllocation.minExecutors 1
> spark.dynamicAllocation.schedulerBacklogTimeout 10
>
>
>
> and yarn-site.xml configured.
>
> 
> yarn.nodemanager.aux-services
> spark_shuffle,mapreduce_shuffle
> 
> ...
> 
> yarn.nodemanager.aux-services.spark_shuffle.class
> org.apache.spark.network.yarn.YarnShuffleService
> 
>
>
> and deployed the 2 JARs to NodeManager's classpath
> /opt/hadoop/share/hadoop/mapreduce/. (I also checked the NodeManager log
> and the JARs appear in the classpath). I notice that the JAR location is
> not the same as the document in 1.4. I found them under network/yarn/target
> and network/shuffle/target/ after building it with "-Phadoop-2.4 -Psparkr
> -Pyarn -Phive -Phive-thriftserver" in maven.
>
>
> spark-network-yarn_2.10-1.4.1.jar
>
> spark-network-shuffle_2.10-1.4.1.jar
>
>
> and still getting the following exception.
>
> Exception in thread "ContainerLauncher #0" java.lang.Error: 
> org.apache.spark.SparkException: Exception while starting container 
> container_1437141440985_0003_01_02 on host 
> alee-ci-2058-slave-2.test.altiscale.com
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.spark.SparkException: Exception while starting 
> container container_1437141440985_0003_01_02 on host 
> alee-ci-2058-slave-2.test.altiscale.com
>   at 
> org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:116)
>   at 
> org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:67)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   ... 2 more
> Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
> auxService:spark_shuffle does not exist
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>
>
> Not sure what else am I missing here or doing wrong?
>
> Appreciate any insights or feedback, thanks.
>
>
> --
> Date: Wed, 8 Jul 2015 09:25:39 +0800
> Subject: Re: The auxService:spark_shuffle does not exist
> From: zjf...@gmail.com
> To: rp...@njit.edu
> CC: user@spark.apache.org
>
>
> Did you enable the dynamic resource allocation ? You can refer to this
> page for how to configure spark shuffle service for yarn.
>
> https://spark.apache.org/docs/1.4.0/job-scheduling.html
>
>
> On Tue, Jul 7, 2015 at 10:55 PM, roy  wrote:
>
> we tried "--master yarn-client" with no different result.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/The-auxService-spark-shuffle-does-not-exist-tp23662p23689.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


RE: The auxService:spark_shuffle does not exist

2015-07-17 Thread Andrew Lee
I have encountered the same problem after following the document.
Here's my spark-defaults.confspark.shuffle.service.enabled true
spark.dynamicAllocation.enabled  true
spark.dynamicAllocation.executorIdleTimeout 60
spark.dynamicAllocation.cachedExecutorIdleTimeout 120
spark.dynamicAllocation.initialExecutors 2
spark.dynamicAllocation.maxExecutors 8
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.schedulerBacklogTimeout 10

and yarn-site.xml configured.

yarn.nodemanager.aux-services
spark_shuffle,mapreduce_shuffle

...

yarn.nodemanager.aux-services.spark_shuffle.class
org.apache.spark.network.yarn.YarnShuffleService

and deployed the 2 JARs to NodeManager's classpath 
/opt/hadoop/share/hadoop/mapreduce/. (I also checked the NodeManager log and 
the JARs appear in the classpath). I notice that the JAR location is not the 
same as the document in 1.4. I found them under network/yarn/target and 
network/shuffle/target/ after building it with "-Phadoop-2.4 -Psparkr -Pyarn 
-Phive -Phive-thriftserver" in maven.
















spark-network-yarn_2.10-1.4.1.jarspark-network-shuffle_2.10-1.4.1.jar

and still getting the following exception.
Exception in thread "ContainerLauncher #0" java.lang.Error: 
org.apache.spark.SparkException: Exception while starting container 
container_1437141440985_0003_01_02 on host 
alee-ci-2058-slave-2.test.altiscale.com
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Exception while starting container 
container_1437141440985_0003_01_02 on host 
alee-ci-2058-slave-2.test.altiscale.com
at 
org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:116)
at 
org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:67)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
... 2 more
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
auxService:spark_shuffle does not exist
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
at 
org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
Not sure what else am I missing here or doing wrong?
Appreciate any insights or feedback, thanks.

Date: Wed, 8 Jul 2015 09:25:39 +0800
Subject: Re: The auxService:spark_shuffle does not exist
From: zjf...@gmail.com
To: rp...@njit.edu
CC: user@spark.apache.org

Did you enable the dynamic resource allocation ? You can refer to this page for 
how to configure spark shuffle service for yarn.
https://spark.apache.org/docs/1.4.0/job-scheduling.html 
On Tue, Jul 7, 2015 at 10:55 PM, roy  wrote:
we tried "--master yarn-client" with no different result.







--

View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/The-auxService-spark-shuffle-does-not-exist-tp23662p23689.html

Sent from the Apache Spark User List mailing list archive at Nabble.com.



-

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

For additional commands, e-mail: user-h...@spark.apache.org





-- 
Best Regards

Jeff Zhang
  

Re: The auxService:spark_shuffle does not exist

2015-07-07 Thread Jeff Zhang
Did you enable the dynamic resource allocation ? You can refer to this page
for how to configure spark shuffle service for yarn.

https://spark.apache.org/docs/1.4.0/job-scheduling.html


On Tue, Jul 7, 2015 at 10:55 PM, roy  wrote:

> we tried "--master yarn-client" with no different result.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/The-auxService-spark-shuffle-does-not-exist-tp23662p23689.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
Best Regards

Jeff Zhang


Re: The auxService:spark_shuffle does not exist

2015-07-07 Thread roy
we tried "--master yarn-client" with no different result.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/The-auxService-spark-shuffle-does-not-exist-tp23662p23689.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org