Trying to run Spark on Yarn

2014-05-23 Thread zsterone
I'm running into an authentication issue when running against YARN.  I build
the assembly JAR with my own method, so most likely I am missing something.
This setup used to work, but I recently started hitting this problem.  Here is
the error from the YARN side:

14/05/23 19:03:02 INFO yarn.WorkerLauncher: ApplicationAttemptId:
appattempt_1400198337128_0012_01
14/05/23 19:03:02 INFO yarn.WorkerLauncher: Registering the
ApplicationMaster
14/05/23 19:03:02 ERROR security.UserGroupInformation:
PriviledgedActionException as:zsterone (auth:SIMPLE)
cause:org.apache.hadoop.security.AccessControlException: Client cannot
authenticate via:[TOKEN]
14/05/23 19:03:02 WARN ipc.Client: Exception encountered while connecting to
the server : org.apache.hadoop.security.AccessControlException: Client
cannot authenticate via:[TOKEN]


What are the obvious culprit dependencies for this issue?  I am running
Spark 0.9.1 with Hadoop 2.2.
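
For reference, the stock YARN-enabled assembly for this combination is normally
built like this (a sketch of what I believe is the documented Spark 0.9.x sbt
invocation; my own packaging method may well differ from it):

SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly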

thanks






Re: YARN issues with resourcemanager.scheduler.address

2014-05-02 Thread zsterone
OK, we figured it out.  It is a bit weird, but for some reason the
YARN_CONF_DIR and HADOOP_CONF_DIR settings did not propagate out.  We do see
them on the build classpath, but the remote machines don't seem to get them.
So we added:
export SPARK_YARN_USER_ENV=CLASSPATH=/hadoop/var/hadoop/conf/

and it seems to have worked.  We also made it work by adding this:
export SPARK_YARN_DIST_FILES=$(ls $HADOOP_CONF_DIR* | sed 's#^#file://#g' | tr '\n' ',')

which distributed the conf dir to all machines.
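
For anyone who hits the same thing, this is roughly the full set of exports we
ended up with before launching the shell (a sketch, assuming the same conf dir
as above plus the SPARK_JAR / MASTER=yarn-client setup from the original post):

export HADOOP_CONF_DIR=/hadoop/var/hadoop/conf/
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export SPARK_YARN_USER_ENV=CLASSPATH=$HADOOP_CONF_DIR
export SPARK_YARN_DIST_FILES=$(ls $HADOOP_CONF_DIR* | sed 's#^#file://#g' | tr '\n' ',')
./bin/spark-shell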





YARN issues with resourcemanager.scheduler.address

2014-05-01 Thread zsterone
Hi,

I'm trying to connect to a YARN cluster by running these commands:
export HADOOP_CONF_DIR=/hadoop/var/hadoop/conf/
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export SPARK_YARN_MODE=true
export SPARK_JAR=./assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar
export SPARK_YARN_APP_JAR=examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar
export MASTER=yarn-client

./bin/spark-shell
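
A quick sanity check that the exported conf dir actually contains the YARN
settings (a minimal sketch using standard shell tools; paths as above):

ls $HADOOP_CONF_DIR/yarn-site.xml
grep resourcemanager $HADOOP_CONF_DIR/yarn-site.xml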

This is what I have in my yarn-site.xml.  I have not set
yarn.resourcemanager.scheduler.address, per the defaults
(https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml):
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>my-machine</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:51176</value>
  </property>
  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>${yarn.nodemanager.hostname}:1183</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>/apollo/env/ForecastPipelineHadoopCluster/lib/*</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>500</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>5.1</value>
    <description>we use a lot of jars which consumes a ton of vmem</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>24500</value>
  </property>
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>10</value>
  </property>
  <property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/apollo/env/ForecastPipelineHadoopCluster/var/hadoop/conf/exclude/resourcemanager.exclude</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>11000</value>
    <description>This is the maximum amount of ram that any job can ask for.
    Any more and the job will be denied. 11000 is currently the largest amount
    of ram any job uses. If a new job needs more ram than this, the team adding
    the job needs to ask the Forecasting Platform team for permission to change
    this number.</description>
  </property>
  <property>
    <name>yarn.nodemanager.user-home-dir</name>
    <value>/apollo/env/ForecastPipelineHadoopCluster/var/hadoop/tmp/</value>
    <description>I'm not particularly fond of this but matlab writes to the
    user's home directory. Without this variable matlab will always
    segfault.</description>
  </property>
</configuration>


When I go to my-machine:8088/conf
I get the expected output:
<property><name>yarn.resourcemanager.scheduler.address</name><value>my-machine:8030</value><source>programatically</source></property>
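
Since the ResourceManager itself resolves the scheduler address correctly,
another option would be to pin it explicitly in yarn-site.xml instead of
relying on the default (a sketch; 8030 is the stock default port from
yarn-default.xml):

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>${yarn.resourcemanager.hostname}:8030</value>
</property>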

However, when I try running spark-shell, my application gets stuck at this
phase:

14/05/02 00:41:35 INFO yarn.Client: Submitting application to ASM
14/05/02 00:41:35 INFO impl.YarnClientImpl: Submitted application
application_1397083384516_6571 to ResourceManager at my-machine/my-ip:51176
14/05/02 00:41:35 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM: 
 appMasterRpcPort: 0
 appStartTime: 1398991295872
 yarnAppState: ACCEPTED

14/05/02 00:41:36 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM: 
 appMasterRpcPort: 0
 appStartTime: 1398991295872
 yarnAppState: ACCEPTED


and it keeps going.  When I look at the log on the resource manager UI, I
get this:
2014-05-02 02:57:31,862 INFO  [sparkYarnAM-akka.actor.default-dispatcher-2]
slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2014-05-02 02:57:31,917 INFO  [sparkYarnAM-akka.actor.default-dispatcher-5]
Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2014-05-02 02:57:32,104 INFO  [sparkYarnAM-akka.actor.default-dispatcher-2]
Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening
on addresses :[akka.tcp://sparkYarnAM@another-machine:37400]
2014-05-02 02:57:32,105 INFO  [sparkYarnAM-akka.actor.default-dispatcher-2]
Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting now listens on
addresses: [akka.tcp://sparkYarnAM@another-machine:37400]
2014-05-02 02:57:33,217 INFO  [main] client.RMProxy
(RMProxy.java:createRMProxy(56)) - *Connecting to ResourceManager at
0.0.0.0/0.0.0.0:8030*
2014-05-02 02:57:33,293 INFO  [main] yarn.WorkerLauncher
(Logging.scala:logInfo(50)) - ApplicationAttemptId:
appattempt_1397083384516_6859_01
2014-05-02 02:57:33,294 INFO  [main] yarn.WorkerLauncher
(Logging.scala:logInfo(50)) - Registering the ApplicationMaster
2014-05-02 02:57:34,330 INFO  [main] ipc.Client
(Client.java:handleConnectionFailure(783)) - Retrying connect to server: