Dear all,

I would like to run a simple spark job on EMR with yarn.

My job is as follows:

public void EMRRun() {
    SparkConf sparkConf = new SparkConf().setAppName("RunEMR").setMaster("yarn-cluster");
    sparkConf.set("spark.executor.memory", "13000m");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    System.out.println(ctx.appName());

    List<Integer> list = new LinkedList<Integer>();
    for (int i = 0; i < 10000; i++) {
        list.add(i);
    }

    JavaRDD<Integer> listRDD = ctx.parallelize(list);
    List<Integer> results = listRDD.collect();

    for (Integer i : results) {
        System.out.println(i);
    }

    ctx.stop();
}

public static void main(String[] args) {
    SparkTest sp = new SparkTest();
    sp.EMRRun();
}


On EMR I run the job with spark-submit as follows:

./spark-submit --class com.collokia.ml.stackoverflow.usertags.browserhistory.sparkTestJava.SparkTest --master yarn-cluster --executor-memory 512m --num-executors 10 /home/hadoop/MLyBigData.jar

After it finished I tried to view the YARN logs, but I got this:

yarn logs -applicationId application_1418123020170_0032
14/12/09 20:29:26 INFO client.RMProxy: Connecting to ResourceManager at /172.31.3.155:9022
Logs not available at /tmp/logs/hadoop/logs/application_1418123020170_0032
Log aggregation has not completed or is not enabled.

However, I had already modified yarn-site.xml as follows:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>30</value>
</property>

I use AMI version 3.2.3 with Spark 1.1.0 on Hadoop 2.4.

Any suggestions on how I can see the YARN logs?
Thanks,
Istvan

