Re: UnknownHostException with Mesos and custom Jar
That's strange; for some reason your Hadoop configuration is not being picked up by Spark.

Thanks
Best Regards

On Wed, Sep 30, 2015 at 9:11 PM, Stephen Hankinson wrote:

> When I use hdfs://affinio/tmp/Input it gives the same error about
> UnknownHostException affinio.
>
> However, from the command line I can run hdfs dfs -ls /tmp/Input or
> hdfs dfs -ls hdfs://affinio/tmp/Input and they work correctly.
>
> See more details here:
> http://stackoverflow.com/questions/32833860/unknownhostexception-with-mesos-spark-and-custom-jar
>
> Stephen Hankinson, P. Eng.
> CTO
> Affinio Inc.
> 301 - 211 Horseshoe Lake Dr.
> Halifax, Nova Scotia, Canada
> B3S 0B9
>
> http://www.affinio.com
>
> On Wed, Sep 30, 2015 at 4:21 AM, Akhil Das wrote:
>
>> Can you try replacing your code with the HDFS URI? Like:
>>
>> sc.textFile("hdfs://...").collect().foreach(println)
>>
>> Thanks
>> Best Regards
>>
>> On Tue, Sep 29, 2015 at 1:45 AM, Stephen Hankinson wrote:
>>
>>> [...]
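For readers hitting the same symptom: an UnknownHostException on the bare name "affinio" when the path is hdfs://affinio/... usually means the process never loaded the hdfs-site.xml that defines the HA nameservice, so the HDFS client falls back to resolving "affinio" as a DNS hostname. As a rough sketch of what that config must contain (the namenode IDs nn1/nn2 and the nn*.example.com hostnames below are placeholders, not values from this thread):

```xml
<!-- hdfs-site.xml: HA nameservice definition that must be visible to
     every process resolving hdfs://affinio/... paths.
     nn1/nn2 and the example.com hostnames are placeholders. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>affinio</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.affinio</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.affinio.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.affinio.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.affinio</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

If spark-shell and the hdfs CLI resolve the path but the custom jar does not, the likely difference is that the jar's JVM is being launched without HADOOP_CONF_DIR (or an equivalent classpath entry) in its environment.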
Re: UnknownHostException with Mesos and custom Jar
Can you try replacing your code with the hdfs uri? Like:

sc.textFile("hdfs://...").collect().foreach(println)

Thanks
Best Regards

On Tue, Sep 29, 2015 at 1:45 AM, Stephen Hankinson wrote:

> Hi,
>
> Wondering if anyone can help me with the issue I am having.
>
> I am receiving an UnknownHostException when running a custom jar with
> Spark on Mesos. The issue does not happen when running spark-shell.
>
> My spark-env.sh contains the following:
>
> export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
> export HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/
>
> My spark-defaults.conf contains the following:
>
> spark.master mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
> spark.mesos.executor.home /spark-1.5.0-bin-hadoop2.6/
>
> [...]
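A related point for the Mesos setup in this thread: spark-env.sh only sets HADOOP_CONF_DIR for the driver; Mesos executors do not automatically inherit it. Spark's generic per-executor environment setting can propagate it. A minimal sketch for spark-defaults.conf, reusing the Hadoop config path already quoted in this thread:

```properties
# spark-defaults.conf: propagate the Hadoop client config location to
# executors via Spark's spark.executorEnv.[EnvironmentVariableName]
# mechanism. Assumes the same config path exists on every Mesos agent.
spark.executorEnv.HADOOP_CONF_DIR  /hadoop-2.7.1/etc/hadoop/
```

This only helps if that directory (with core-site.xml and hdfs-site.xml) is actually present on each agent host; otherwise the configs need to be shipped with the executor distribution.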
UnknownHostException with Mesos and custom Jar
Hi,

Wondering if anyone can help me with the issue I am having.

I am receiving an UnknownHostException when running a custom jar with Spark on Mesos. The issue does not happen when running spark-shell.

My spark-env.sh contains the following:

export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/

My spark-defaults.conf contains the following:

spark.master mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
spark.mesos.executor.home /spark-1.5.0-bin-hadoop2.6/

Starting spark-shell as follows and running the following line works correctly:

/spark-1.5.0-bin-hadoop2.6/bin/spark-shell

sc.textFile("/tmp/Input").collect.foreach(println)

15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(88528) called with curMem=0, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 86.5 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(20236) called with curMem=88528, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.8 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.21.104:49048 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:49 INFO spark.SparkContext: Created broadcast 0 from textFile at :22
15/09/28 20:04:49 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/28 20:04:49 INFO spark.SparkContext: Starting job: collect at :22
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Got job 0 (collect at :22) with 3 output partitions
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (collect at :22)
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Missing parents: List()
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at textFile at :22), which has no missing parents
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(3120) called with curMem=108764, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(1784) called with curMem=111884, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1784.0 B, free 530.2 MB)
15/09/28 20:04:49 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.21.104:49048 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:49 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at textFile at :22)
15/09/28 20:04:49 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-172-31-37-82.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-172-31-21-104.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, ip-172-31-4-4.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-4-4.us-west-2.compute.internal:50648 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S2, ip-172-31-4-4.us-west-2.compute.internal, 50648)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-37-82.us-west-2.compute.internal:52624 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S1, ip-172-31-37-82.us-west-2.compute.internal, 52624)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-21-104.us-west-2.compute.internal:56628 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S0, ip-172-31-21-104.us-west-2.compute.internal, 56628)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-37-82.us-west-2.compute.internal:52624 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-21-104.us-west-2.compute.internal:56628 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-4-4.us-west-2.compute.internal:50648 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-37-82.us-west-2.compute.internal:52624 (size: 19.8 KB, free: 530.3 MB)
15/09/28 [...]
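Since the working spark-shell run above and the failing jar differ only in how they are launched, a quick sanity check before submitting the jar is whether the shell that launches it can actually see an hdfs-site.xml defining the "affinio" nameservice. A minimal sketch (the fallback path is the one from this thread; everything else is illustrative):

```shell
#!/bin/sh
# Sketch: check whether the Hadoop client config defining the "affinio"
# nameservice is visible in this environment. Falls back to the config
# path quoted earlier in this thread if HADOOP_CONF_DIR is unset.
CONF_DIR="${HADOOP_CONF_DIR:-/hadoop-2.7.1/etc/hadoop}"

# grep -qs: quiet, and ignore a missing file instead of erroring
if grep -qs "affinio" "$CONF_DIR/hdfs-site.xml"; then
    echo "nameservice 'affinio' is defined in $CONF_DIR/hdfs-site.xml"
else
    echo "nameservice 'affinio' NOT found under $CONF_DIR --" \
         "the jar will try to resolve it as a DNS hostname and fail"
fi
```

If this reports the nameservice as missing in the exact environment that runs spark-submit (e.g. under a different user or init script), that would explain why the shell session works while the jar fails.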