RE: Worker failed to connect when built with SPARK_HADOOP_VERSION=2.2.0

2013-12-02 Thread Liu, Raymond
What version of the code are you using?

2.2.0 support has not yet been merged into trunk. Check out 
https://github.com/apache/incubator-spark/pull/199
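
For anyone who wants to try that patch before it is merged, one way to build it locally is to fetch the pull request branch and run the assembly against Hadoop 2.2.0 (a sketch, assuming GitHub's standard pull/<id>/head refs; the local branch name pr-199 is just a placeholder):

# git clone https://github.com/apache/incubator-spark.git
# cd incubator-spark
# git fetch origin pull/199/head:pr-199
# git checkout pr-199
# SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly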

Best Regards,
Raymond Liu

From: horia@gmail.com [mailto:horia@gmail.com] On Behalf Of Horia
Sent: Monday, December 02, 2013 3:00 PM
To: user@spark.incubator.apache.org
Subject: Re: Worker failed to connect when built with SPARK_HADOOP_VERSION=2.2.0

Has this been resolved?

Forgive me if I missed the follow-up but I've been having the exact same 
problem.

- Horia


On Fri, Nov 22, 2013 at 5:38 AM, Maxime Lemaire digital@gmail.com wrote:
Hi all,
When I build Spark with Hadoop 2.2.0 support, workers can't connect to the Spark 
master anymore.
The network is up and hostnames are correct. tcpdump clearly shows workers trying 
to connect (tcpdump output at the end).

The same setup with a Spark build without SPARK_HADOOP_VERSION (or with 
SPARK_HADOOP_VERSION=2.0.5-alpha) works fine!

Some details:

pmtx-master01 : master
pmtx-master02 : slave

(the behavior is the same if I launch both master and slave from the same box)

Building Hadoop 2.2.0 support:

Fresh install on pmtx-master01:
# SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
... build successful
#

Fresh install on pmtx-master02:
# SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
... build successful
#

On pmtx-master01:
# ./bin/start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to 
/cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
# netstat -an | grep 7077
tcp6       0      0 10.90.XX.XX:7077        :::*                    LISTEN 
#

On pmtx-master02:
# nc -v pmtx-master01 7077
pmtx-master01 [10.90.XX.XX] 7077 (?) open
# ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
13/11/22 10:57:50 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 10:57:50 INFO Worker: Starting Spark worker pmtx-master02:42271 with 8 
cores, 22.6 GB RAM
13/11/22 10:57:50 INFO Worker: Spark home: /cluster/bin/spark
13/11/22 10:57:50 INFO WorkerWebUI: Started Worker web UI at 
http://pmtx-master02:8081
13/11/22 10:57:50 INFO Worker: Connecting to master spark://pmtx-master01:7077
13/11/22 10:57:50 ERROR Worker: Connection to master failed! Shutting down.
#
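
Since nc shows port 7077 is reachable yet the worker is still rejected, the master-side log (the file printed by start-master.sh above) may show why the connection was dropped; for example, on pmtx-master01:

# tail -n 50 /cluster/bin/spark-0.8.0-incubating/logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out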

With spark-shell on pmtx-master02:
# MASTER=spark://pmtx-master01:7077 ./spark-shell 
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
      /_/

Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31)
Initializing interpreter...
Creating SparkContext...
13/11/22 11:19:29 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 11:19:29 INFO SparkEnv: Registering BlockManagerMaster
13/11/22 11:19:29 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
13/11/22 11:19:29 INFO DiskStore: Created local directory at 
/tmp/spark-local-20131122111929-3e3c
13/11/22 11:19:29 INFO ConnectionManager: Bound socket to port 42249 with id = 
ConnectionManagerId(pmtx-master02,42249)
13/11/22 11:19:29 INFO BlockManagerMaster: Trying to register BlockManager
13/11/22 11:19:29 INFO BlockManagerMaster: Registered BlockManager
13/11/22 11:19:29 INFO HttpBroadcast: Broadcast server started at 
http://10.90.66.67:52531
13/11/22 11:19:29 INFO SparkEnv: Registering MapOutputTracker
13/11/22 11:19:29 INFO HttpFileServer: HTTP File server directory is 
/tmp/spark-40525f81-f883-45d5-92ad-bbff44ecf435
13/11/22 11:19:29 INFO SparkUI: Started Spark Web UI at 
http://pmtx-master02:4040
13/11/22 11:19:29 INFO Client$ClientActor: Connecting to master 
spark://pmtx-master01:7077
13/11/22 11:19:30 ERROR Client$ClientActor: Connection to master failed; 
stopping client
13/11/22 11:19:30 ERROR SparkDeploySchedulerBackend: Disconnected from Spark 
cluster!
13/11/22 11:19:30 ERROR ClusterScheduler: Exiting due to error from cluster 
scheduler: Disconnected from Spark cluster

[snip]

WORKING: Building Hadoop 2.0.5-alpha support

On pmtx-master01, I'm now building Hadoop 2.0.5-alpha:
# sbt/sbt clean
...
# SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
...
# ./bin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to 
/cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out

Same build on pmtx-master02:
# sbt/sbt clean
... build successful ...
# SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
... build successful ...
# ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
13/11/22 11:25:34 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 11:25:34 INFO Worker: Starting Spark worker pmtx-master02:33768 with 8 
cores, 22.6 GB RAM
13/11/22 11:25:34 INFO Worker: Spark home: /cluster/bin/spark
13/11/22 11:25:34 INFO WorkerWebUI: Started Worker web UI at 
http://pmtx-master02:8081
13/11/22 11:25:34 INFO Worker: Connecting to master spark://pmtx-master01:7077
13/11/22 11:25:34 INFO Worker: Successfully registered with master
#

With spark-shell on pmtx-master02:
# MASTER=spark://pmtx-master01:7077 ./spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
      /_/

Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31)
[snip]

Re: Worker failed to connect when built with SPARK_HADOOP_VERSION=2.2.0

2013-12-02 Thread Maxime Lemaire
Horia,
if you don't need YARN support, you can get it to work by setting SPARK_YARN to
false:

SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=false sbt/sbt assembly
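
One way to double-check which Hadoop version an assembly actually targets is the assembly jar name, which encodes it (a sketch, assuming the default sbt output location for 0.8.0; the exact path may vary):

# ls assembly/target/scala-2.9.3/spark-assembly-*-hadoop*.jar

The jar name should end in -hadoop2.2.0.jar for a 2.2.0 build.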

Raymond,
OK, thank you, so that's why; I'm using the latest release, 0.8.0 (September
25, 2013).




2013/12/2 Liu, Raymond raymond@intel.com

 What version of the code are you using?

 2.2.0 support has not yet been merged into trunk. Check out
 https://github.com/apache/incubator-spark/pull/199

 Best Regards,
 Raymond Liu

Re: Worker failed to connect when built with SPARK_HADOOP_VERSION=2.2.0

2013-12-01 Thread Horia
Has this been resolved?

Forgive me if I missed the follow-up but I've been having the exact same
problem.

- Horia


