Has this been resolved? Forgive me if I missed the follow-up, but I've been having the exact same problem.
- Horia

On Fri, Nov 22, 2013 at 5:38 AM, Maxime Lemaire <[email protected]> wrote:

> Hi all,
> When I build Spark with Hadoop 2.2.0 support, workers can't connect to the
> Spark master anymore.
> The network is up and the hostnames are correct. Tcpdump clearly shows
> workers trying to connect (tcpdump output at the end).
>
> The same setup with a Spark build without SPARK_HADOOP_VERSION (or with
> SPARK_HADOOP_VERSION=2.0.5-alpha) works fine!
>
> Some details:
>
> pmtx-master01 : master
> pmtx-master02 : slave
>
> (The behavior is the same if I launch both master and slave on the same box.)
>
> Building with Hadoop 2.2.0 support:
>
> Fresh install on pmtx-master01:
> # SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
> ... build successful
> #
>
> Fresh install on pmtx-master02:
> # SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
> ... build successful
> #
>
> On pmtx-master01:
> # ./bin/start-master.sh
> starting org.apache.spark.deploy.master.Master, logging to
> /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
> # netstat -an | grep 7077
> tcp6       0      0 10.90.XX.XX:7077      :::*      LISTEN
> #
>
> On pmtx-master02:
> # nc -v pmtx-master01 7077
> pmtx-master01 [10.90.XX.XX] 7077 (?) open
> # ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
> 13/11/22 10:57:50 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 10:57:50 INFO Worker: Starting Spark worker pmtx-master02:42271 with 8 cores, 22.6 GB RAM
> 13/11/22 10:57:50 INFO Worker: Spark home: /cluster/bin/spark
> 13/11/22 10:57:50 INFO WorkerWebUI: Started Worker web UI at http://pmtx-master02:8081
> 13/11/22 10:57:50 INFO Worker: Connecting to master spark://pmtx-master01:7077
> 13/11/22 10:57:50 ERROR Worker: Connection to master failed! Shutting down.
> #
>
> With spark-shell on pmtx-master02:
> # MASTER=spark://pmtx-master01:7077 ./spark-shell
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
>       /_/
>
> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31)
> Initializing interpreter...
> Creating SparkContext...
> 13/11/22 11:19:29 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 11:19:29 INFO SparkEnv: Registering BlockManagerMaster
> 13/11/22 11:19:29 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
> 13/11/22 11:19:29 INFO DiskStore: Created local directory at /tmp/spark-local-20131122111929-3e3c
> 13/11/22 11:19:29 INFO ConnectionManager: Bound socket to port 42249 with id = ConnectionManagerId(pmtx-master02,42249)
> 13/11/22 11:19:29 INFO BlockManagerMaster: Trying to register BlockManager
> 13/11/22 11:19:29 INFO BlockManagerMaster: Registered BlockManager
> 13/11/22 11:19:29 INFO HttpBroadcast: Broadcast server started at http://10.90.66.67:52531
> 13/11/22 11:19:29 INFO SparkEnv: Registering MapOutputTracker
> 13/11/22 11:19:29 INFO HttpFileServer: HTTP File server directory is /tmp/spark-40525f81-f883-45d5-92ad-bbff44ecf435
> 13/11/22 11:19:29 INFO SparkUI: Started Spark Web UI at http://pmtx-master02:4040
> 13/11/22 11:19:29 INFO Client$ClientActor: Connecting to master spark://pmtx-master01:7077
> 13/11/22 11:19:30 ERROR Client$ClientActor: Connection to master failed; stopping client
> 13/11/22 11:19:30 ERROR SparkDeploySchedulerBackend: Disconnected from Spark cluster!
> 13/11/22 11:19:30 ERROR ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
>
> ---- snip ----
>
> WORKING: Building with Hadoop 2.0.5-alpha support
>
> On pmtx-master01, now building against Hadoop 2.0.5-alpha:
> # sbt/sbt clean
> ...
> # SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
> ...
> # ./bin/start-master.sh
> starting org.apache.spark.deploy.master.Master, logging to
> /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
>
> Same build on pmtx-master02:
> # sbt/sbt clean
> ... build successful ...
> # SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
> ... build successful ...
> # ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
> 13/11/22 11:25:34 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 11:25:34 INFO Worker: Starting Spark worker pmtx-master02:33768 with 8 cores, 22.6 GB RAM
> 13/11/22 11:25:34 INFO Worker: Spark home: /cluster/bin/spark
> 13/11/22 11:25:34 INFO WorkerWebUI: Started Worker web UI at http://pmtx-master02:8081
> 13/11/22 11:25:34 INFO Worker: Connecting to master spark://pmtx-master01:7077
> 13/11/22 11:25:34 INFO Worker: Successfully registered with master
> #
>
> With spark-shell on pmtx-master02:
> # MASTER=spark://pmtx-master01:7077 ./spark-shell
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
>       /_/
>
> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31)
> Initializing interpreter...
> Creating SparkContext...
> 13/11/22 11:23:12 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 11:23:12 INFO SparkEnv: Registering BlockManagerMaster
> 13/11/22 11:23:12 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
> 13/11/22 11:23:12 INFO DiskStore: Created local directory at /tmp/spark-local-20131122112312-3d8b
> 13/11/22 11:23:12 INFO ConnectionManager: Bound socket to port 58826 with id = ConnectionManagerId(pmtx-master02,58826)
> 13/11/22 11:23:12 INFO BlockManagerMaster: Trying to register BlockManager
> 13/11/22 11:23:12 INFO BlockManagerMaster: Registered BlockManager
> 13/11/22 11:23:12 INFO HttpBroadcast: Broadcast server started at http://10.90.66.67:39067
> 13/11/22 11:23:12 INFO SparkEnv: Registering MapOutputTracker
> 13/11/22 11:23:12 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ded7bcc1-bacf-4158-b20f-5b2fa6936e8b
> 13/11/22 11:23:12 INFO SparkUI: Started Spark Web UI at http://pmtx-master02:4040
> 13/11/22 11:23:12 INFO Client$ClientActor: Connecting to master spark://pmtx-master01:7077
> Spark context available as sc.
> 13/11/22 11:23:12 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20131122112312-0000
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala>
> #
>
> Please be aware that I don't really know the Spark communication protocol,
> so forgive me if I'm misunderstanding something; I'll make some assumptions
> about what's happening.
> As you can see in the tcpdump output, when the connection fails, the slave
> keeps sending empty data packets (TCP header only, without the P flag and
> with length 0) when it should be starting the conversation by saying
> "hello, I am sparkWorker pmtx-master02" (4th packet, line 19).
>
> Tcpdump output:
> Connection failed (Hadoop 2.2.0): http://pastebin.com/6N8tEgUf
> Connection successful (Hadoop 2.0.5-alpha): http://pastebin.com/CegYAjMj
>
> Also, I'm not familiar with log4j, so if you have any tips to get more log
> information I'll try them (I'm using the default properties in
> log4j.properties).
>
> Hadoop 2.2.0 is great, Spark 0.8 is awesome, so please help me make them
> work together! :-)
>
> Thanks,
> maxx
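On the log4j question in the quoted mail: a hedged sketch of how to get more detail, assuming the Spark 0.8-era layout where `conf/log4j.properties.template` ships with the root logger at INFO. Copying it to `conf/log4j.properties` and raising the level to DEBUG should surface the remoting traffic around worker registration. (The exact property names below follow standard log4j 1.2 conventions and Spark's template; verify against your checkout.)

```properties
# conf/log4j.properties — raise the root level from INFO to DEBUG
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

DEBUG output is very verbose, so it's worth capturing it to a file and reverting to INFO once the failed registration has been traced.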
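One quick sanity check when comparing the two builds above (an illustrative sketch, not something from the thread): the sbt assembly jar name encodes the Hadoop version it was built against, so listing it on both boxes is an easy way to confirm master and worker are running identical builds. The exact path below assumes Spark 0.8.0-incubating with Scala 2.9.3; adjust for your tree.

```shell
# Derive the assembly jar name expected for a given SPARK_HADOOP_VERSION
# (hypothetical helper; the version/path are assumptions for 0.8.0-incubating).
SPARK_HADOOP_VERSION=2.2.0
EXPECTED_JAR="spark-assembly-0.8.0-incubating-hadoop${SPARK_HADOOP_VERSION}.jar"

# Run `ls assembly/target/scala-2.9.3/` on both pmtx-master01 and
# pmtx-master02 and compare against this name.
echo "Expect on both boxes: assembly/target/scala-2.9.3/${EXPECTED_JAR}"
```

If the names differ between the two machines, the worker and master are speaking with different builds, which is enough on its own to make registration fail.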
