Hi all, I'm investigating Spark for a new project, and I'm trying to use spark-jobserver because I need to reuse and share RDDs, and from what I read on the forum that's the "standard" way :D
Turns out that spark-jobserver doesn't seem to work on YARN, or at least it doesn't on 1.1.1. My config: Spark 1.1.1 (moving to 1.2.0 soon) and Hadoop 2.6 (which seems compatible with the Spark build for Hadoop 2.4, at least from Spark's point of view; I was able to run spark-submit and shell tasks in both yarn-client and yarn-cluster modes).

Going back to my original point: I made some changes in spark-jobserver and in how I submit a job, but I get:

[2014-12-30 18:20:19,769] INFO e.spark.deploy.yarn.Client [] [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample] - Max mem capabililty of a single resource in this cluster 15000
[2014-12-30 18:20:19,770] INFO e.spark.deploy.yarn.Client [] [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample] - Preparing Local resources
[2014-12-30 18:20:20,041] INFO e.spark.deploy.yarn.Client [] [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample] - Prepared Local resources Map(__spark__.jar -> resource { scheme: "file" port: -1 file: "/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar" } size: 343226 timestamp: 1416429031000 type: FILE visibility: PRIVATE)
[...]
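For reference, the relevant part of my jobserver config looks something like this (a sketch; key names are from the jobserver config template, exact values in my setup may differ):

```hocon
# spark-jobserver local.conf (sketch, values illustrative)
spark {
  # switched from local[*] to yarn-client to run on the cluster
  master = "yarn-client"

  jobserver {
    port = 8090
  }

  # defaults applied to contexts created via the REST API
  context-settings {
    num-cpu-cores = 2
    memory-per-node = 512m
  }
}
```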
[2014-12-30 18:20:20,139] INFO e.spark.deploy.yarn.Client [] [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample] - Yarn AM launch context:
[2014-12-30 18:20:20,140] INFO e.spark.deploy.yarn.Client [] [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample] - class: org.apache.spark.deploy.yarn.ExecutorLauncher
[2014-12-30 18:20:20,140] INFO e.spark.deploy.yarn.Client [] [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample] - env: Map(CLASSPATH -> $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/*, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 343226, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1419963137232_0001/, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE, SPARK_USER -> ec2-user, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1416429031000, SPARK_YARN_CACHE_FILES -> file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar#__spark__.jar)
[...]
[2014-12-30 18:03:04,474] INFO YarnClientSchedulerBackend [] [akka://JobServer/user/context-supervisor/ebac0153-spark.jobserver.WordCountExample] - Application report from ASM: appMasterRpcPort: -1 appStartTime: 1419962580444 yarnAppState: FAILED
[2014-12-30 18:03:04,475] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/ebac0153-spark.jobserver.WordCountExample] - Failed to create context ebac0153-spark.jobserver.WordCountExample, shutting down actor
org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:117)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:93)

In the Hadoop console I can see the detailed issue:

Diagnostics: File file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar does not exist
java.io.FileNotFoundException: File file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar does not exist

Now... it seems like Spark is telling the other nodes to use a local file that only exists on the machine I launched the task from. Can anyone point me in the right direction as to where that path might be set?
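My current working theory (please correct me if I'm wrong): in Spark 1.x the YARN client locates the Spark jar via the SPARK_JAR environment variable or the spark.yarn.jar setting, and if neither is set it falls back to the jar its own Client class was loaded from, which would explain the ivy cache path in the logs above. So maybe pointing it at a location visible to all nodes would help; a sketch of what I'm thinking of trying (the HDFS path is just an example, not my actual layout):

```shell
# Put the Spark assembly somewhere every YARN node can read it (example path)
hadoop fs -put /usr/local/spark/lib/spark-assembly-1.1.1-hadoop2.4.0.jar \
    hdfs:///user/ec2-user/spark-assembly-1.1.1-hadoop2.4.0.jar

# Older env-var style, read by the YARN client:
export SPARK_JAR=hdfs:///user/ec2-user/spark-assembly-1.1.1-hadoop2.4.0.jar

# Or the equivalent config property in spark-defaults.conf:
#   spark.yarn.jar  hdfs:///user/ec2-user/spark-assembly-1.1.1-hadoop2.4.0.jar
```

No idea yet whether jobserver passes either of these through to its contexts, though.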