I am trying to run Spark on YARN. I have a Hadoop 2.2 cluster (YARN + HDFS) on EC2, and I compiled Spark with Maven using the hadoop-2.2 profile. Now I am trying to run the example Spark job (in yarn-cluster mode).
From my *local machine* (I have set up the HADOOP_CONF_DIR environment variable correctly):

➜ spark git:(master) ✗ /bin/bash -c "./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/target/scala-2.10/spark-examples_*.jar 10"

14/06/19 14:59:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/19 14:59:39 INFO client.RMProxy: Connecting to ResourceManager at ec2-54-242-244-250.compute-1.amazonaws.com/54.242.244.250:8050
14/06/19 14:59:41 INFO yarn.Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 1
14/06/19 14:59:41 INFO yarn.Client: Queue info ... queueName: default, queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/06/19 14:59:41 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 12288
14/06/19 14:59:41 INFO yarn.Client: Preparing Local resources
14/06/19 14:59:42 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/06/19 14:59:43 INFO yarn.Client: Uploading file:/home/rgupta/awesome/spark/examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar to hdfs://ec2-54-242-244-250.compute-1.amazonaws.com:8020/user/rgupta/.sparkStaging/application_1403176373037_0009/spark-examples_2.10-1.0.0-SNAPSHOT.jar
14/06/19 15:00:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.180.150.66:50010]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
14/06/19 15:00:45 INFO hdfs.DFSClient: Abandoning BP-1714253233-10.180.215.105-1403176367942:blk_1073741833_1009
14/06/19 15:00:46 INFO hdfs.DFSClient: Excluding datanode 10.180.150.66:50010
14/06/19 15:00:46 WARN hdfs.DFSClient: DataStreamer Exception

So it is able to talk to the ResourceManager. It then starts uploading the example jar to HDFS, and fails while trying to write the block to the datanode. I verified that port 50010 is accessible from my local machine. Any idea what the issue is here?

One thing that is suspicious is */10.180.150.66:50010* - it looks like it is trying to connect to the datanode using its private IP. If so, how can I make it use the public IP instead?

Thanks
Praveen
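P.P.S. While searching, I came across the HDFS client setting dfs.client.use.datanode.hostname. This is just a guess on my part (I have not tested it) - would adding something like this to the hdfs-site.xml on my client make it connect to datanodes by hostname instead of the private IP the namenode reports?

```xml
<!-- untested guess: client-side hdfs-site.xml snippet -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

If I understand correctly, the datanodes would also need hostnames that resolve to their public IPs from my local machine for this to help.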
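P.S. For anyone who wants to reproduce the reachability check: something along these lines works (a minimal sketch, mirroring the plain TCP connect that DFSClient attempts; the IP below is the datanode's private address from the log):

```python
import socket

def is_reachable(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # Same kind of socket connect the HDFS client performs before the
        # block write; any OSError (refused, timed out, unroutable) means
        # the datanode port is not reachable from this machine.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example against the datanode from the log (will block up to `timeout`
# if the address is unroutable, like the 60000 ms timeout in the trace):
# is_reachable("10.180.150.66", 50010)
```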