Dear all, I would like to run a simple Spark job on EMR with YARN.
My job is the following:

public void EMRRun() {
    SparkConf sparkConf = new SparkConf().setAppName("RunEMR").setMaster("yarn-cluster");
    sparkConf.set("spark.executor.memory", "13000m");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    System.out.println(ctx.appName());
    List<Integer> list = new LinkedList<Integer>();
    for (int i = 0; i < 10000; i++) {
        list.add(i);
    }
    JavaRDD<Integer> listRDD = ctx.parallelize(list);
    List<Integer> results = listRDD.collect();
    for (Integer i : results) {
        System.out.println(i);
    }
    ctx.stop();
}

public static void main(String[] args) {
    SparkTest sp = new SparkTest();
    sp.EMRRun();
}

On EMR I submit the job with spark-submit as follows:

./spark-submit --class com.collokia.ml.stackoverflow.usertags.browserhistory.sparkTestJava.SparkTest --master yarn-cluster --executor-memory 512m --num-executors 10 /home/hadoop/MLyBigData.jar
After the job finished I tried to view the YARN logs, but I got this:

yarn logs -applicationId application_1418123020170_0032
14/12/09 20:29:26 INFO client.RMProxy: Connecting to ResourceManager at /172.31.3.155:9022
Logs not available at /tmp/logs/hadoop/logs/application_1418123020170_0032
Log aggregation has not completed or is not enabled.

However, I did enable log aggregation in yarn-site.xml:

<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value></property>
<property><name>yarn.log-aggregation.retain-check-interval-seconds</name><value>30</value></property>

I use AMI version 3.2.3, with Spark 1.1.0 on Hadoop 2.4. Any suggestions on how I can see the YARN logs?

Thanks,
Istvan
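For reference, here is how I have been checking for the aggregated logs. The path in the error message matches YARN's default aggregated-log layout (remote-app-log-dir /tmp/logs, app owner hadoop); this is a sketch based on those defaults, and the owner and directory are assumptions about my cluster rather than anything confirmed:

```shell
# Aggregated logs (if aggregation completes) should land at
# <remote-app-log-dir>/<app-owner>/logs/<application-id> by default.
APP_ID="application_1418123020170_0032"
APP_OWNER="hadoop"                      # assumed: user that ran spark-submit
LOG_DIR="/tmp/logs/${APP_OWNER}/logs/${APP_ID}"
echo "$LOG_DIR"

# On the cluster I would then check whether anything was written there:
# hadoop fs -ls "$LOG_DIR"
# yarn logs -applicationId "$APP_ID" -appOwner "$APP_OWNER"
```

The hadoop fs and yarn commands are commented out above since they only make sense on the cluster itself.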