I have an Apache Mesos 0.22.1 cluster (3 masters and 5 slaves) running Cloudera HDFS (2.5.0-cdh5.3.1) in an HA configuration, plus the Spark 1.5.1 framework.
When I spark-submit the compiled HdfsTest.scala example app (from the Spark 1.5.1 sources), it fails with a "java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs" error in the executor logs. The error only appears when I pass the HDFS HA path as the argument ("hdfs://hdfs/"); when I pass "hdfs://namenode1.hdfs.mesos:50071/testfile", everything works fine. After enabling TRACE logging I found that the Spark driver resolves the hdfs://hdfs URL correctly, but the Spark executors do not.

My Scala app code:

package com.cisco.hdfs

import org.apache.spark._

object HdfsTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HdfsTest")
    val sc = new SparkContext(sparkConf)
    val file = sc.textFile(args(0))
    val mapped = file.map(s => s.length).cache()
    for (iter <- 1 to 10) {
      val start = System.currentTimeMillis()
      for (x <- mapped) { x + 2 }
      val end = System.currentTimeMillis()
      println("Iteration " + iter + " took " + (end - start) + " ms")
    }
    sc.stop()
  }
}

I compile this code and submit the jar file to Spark in cluster mode:

/opt/spark/bin/spark-submit --deploy-mode cluster --class com.cisco.hdfs.HdfsTest http://1.2.3.4/HdfsTest-0.0.1.jar hdfs://hdfs/testfile

My spark-defaults.conf file:

spark.master            spark://1.2.3.4:7077
spark.eventLog.enabled  true
spark.driver.memory     1g

My spark-env.sh file:

export HADOOP_HOME=/opt/spark
export HADOOP_CONF_DIR=/opt/spark/conf

Spark is deployed on each slave in the /opt/spark directory. I can access HDFS with "hdfs dfs -ls hdfs://hdfs/" from the console, without needing to specify the active namenode address and port.

core-site.xml:
----------------------------------------------------------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hdfs</value>
  </property>
</configuration>

hdfs-site.xml:
----------------------------------------------------------------------
<configuration>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.nameservice.id</name>
    <value>hdfs</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs.nn1</name>
    <value>namenode1.hdfs.mesos:50071</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfs.nn1</name>
    <value>namenode1.hdfs.mesos:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs.nn2</name>
    <value>namenode2.hdfs.mesos:50071</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfs.nn2</name>
    <value>namenode2.hdfs.mesos:50070</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://journalnode1.hdfs.mesos:8485;journalnode2.hdfs.mesos:8485;journalnode3.hdfs.mesos:8485/hdfs</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master.mesos:2181</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/var/lib/hdfs/data/jn</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/lib/hdfs/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///var/lib/hdfs/data/data</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10485760</value>
  </property>
  <property>
    <name>dfs.datanode.balance.bandwidthPerSec</name>
    <value>41943040</value>
  </property>
  <property>
    <name>dfs.namenode.safemode.threshold-pct</name>
    <value>0.90</value>
  </property>
  <property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>60000</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>20</value>
  </property>
  <property>
    <name>dfs.image.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.image.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
    <value>0.35f</value>
  </property>
  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>4</value>
  </property>
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.streams.cache.size</name>
    <value>1000</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.streams.cache.size.expiry.ms</name>
    <value>1000</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
  </property>
</configuration>
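For reference, this is the standalone check I intend to run directly on a slave node to see whether the logical nameservice can be resolved at all from that machine. It is only a sketch: the object name HaResolveCheck is made up, the property values are copied from the hdfs-site.xml above, and it assumes the Hadoop client jars (2.5.0-cdh5.3.1) are on the classpath.

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HaResolveCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Same HA client settings as in hdfs-site.xml above, set programmatically
    // so the check does not depend on HADOOP_CONF_DIR being picked up.
    conf.set("fs.defaultFS", "hdfs://hdfs")
    conf.set("dfs.nameservices", "hdfs")
    conf.set("dfs.ha.namenodes.hdfs", "nn1,nn2")
    conf.set("dfs.namenode.rpc-address.hdfs.nn1", "namenode1.hdfs.mesos:50071")
    conf.set("dfs.namenode.rpc-address.hdfs.nn2", "namenode2.hdfs.mesos:50071")
    conf.set("dfs.client.failover.proxy.provider.hdfs",
      "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

    // If the logical nameservice "hdfs" resolves, this lists the HDFS root;
    // otherwise it should fail with the same UnknownHostException.
    val fs = FileSystem.get(new URI("hdfs://hdfs"), conf)
    fs.listStatus(new Path("/")).foreach(status => println(status.getPath))
  }
}

My thinking is that if this works on a slave but the executor still throws UnknownHostException, then the executor JVM is simply not seeing the HA properties from /opt/spark/conf, rather than there being a DNS problem.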