I am trying to run Spark on YARN. I have a Hadoop 2.2 cluster (YARN + HDFS) on EC2, and I compiled Spark with Maven using the hadoop-2.2 profile. Now I am trying to run the example Spark job (in yarn-cluster mode).
From my *local machine* (I have set up the HADOOP_CONF_DIR environment variable correctly):

➜ spark git:(master) ✗ /bin/bash -c "./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/target/scala-2.10/spark-examples_*.jar 10"

14/06/19 14:59:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/19 14:59:39 INFO client.RMProxy: Connecting to ResourceManager at ec2-54-242-244-250.compute-1.amazonaws.com/54.242.244.250:8050
14/06/19 14:59:41 INFO yarn.Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 1
14/06/19 14:59:41 INFO yarn.Client: Queue info ... queueName: default, queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/06/19 14:59:41 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 12288
14/06/19 14:59:41 INFO yarn.Client: Preparing Local resources
14/06/19 14:59:42 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/06/19 14:59:43 INFO yarn.Client: Uploading file:/home/rgupta/awesome/spark/examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar to hdfs://ec2-54-242-244-250.compute-1.amazonaws.com:8020/user/rgupta/.sparkStaging/application_1403176373037_0009/spark-examples_2.10-1.0.0-SNAPSHOT.jar
14/06/19 15:00:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.180.150.66:50010]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
14/06/19 15:00:45 INFO hdfs.DFSClient: Abandoning BP-1714253233-10.180.215.105-1403176367942:blk_1073741833_1009
14/06/19 15:00:46 INFO hdfs.DFSClient: Excluding datanode 10.180.150.66:50010
14/06/19 15:00:46 WARN hdfs.DFSClient: DataStreamer Exception

So it is able to talk to the ResourceManager. It then starts uploading the example jar to HDFS, and fails while trying to write the block to the datanode. I verified that port 50010 is accessible from my local machine. Any idea what the issue is here?

One thing that is suspicious is */10.180.150.66:50010* - it looks like it is trying to connect to the datanode using its private IP. If so, how can I make it use the public IP instead?

Thanks
Praveen
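P.P.S. While searching, I came across the HDFS client setting dfs.client.use.datanode.hostname. This is just a guess on my part (I have not tested it) - would adding something like this to the hdfs-site.xml on my client make it connect to datanodes by hostname instead of the private IP the namenode reports?

```xml
<!-- untested guess: client-side hdfs-site.xml snippet -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

If I understand correctly, the datanodes would also need hostnames that resolve to their public IPs from my local machine for this to help.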
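P.S. For anyone who wants to reproduce the reachability check: something along these lines works (a minimal sketch, mirroring the plain TCP connect that DFSClient attempts; the IP below is the datanode's private address from the log):

```python
import socket

def is_reachable(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # Same kind of socket connect the HDFS client performs before the
        # block write; any OSError (refused, timed out, unroutable) means
        # the datanode port is not reachable from this machine.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example against the datanode from the log (will block up to `timeout`
# if the address is unroutable, like the 60000 ms timeout in the trace):
# is_reachable("10.180.150.66", 50010)
```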