Hi all,

We are facing a similar problem to the one described in http://comments.gmane.org/gmane.comp.lang.scala.spark.user/2505 . The Mesos environment looks healthy, with one master and four slaves, according to the web UI at http://master-ip:5050 .
In that discussion, the environment variables SPARK_HOME and SCALA_HOME were identified as a possible cause. So we exported them in sbin/mesos-daemon.sh like this:

    export SPARK_HOME=/data/hadoop/spark/spark-0.8.0-incubating
    export SCALA_HOME=/data/hadoop/scala/scala-2.9.3

The modified sbin/mesos-daemon.sh was then replicated to all the Mesos slaves. We hoped that this way the mesos-slave process would be aware of the environment variables (sbin/mesos-daemon.sh is used to start the mesos-slave service, according to sbin/mesos-start-slaves.sh). But launching the worker on Mesos still fails, and the output reported on stdout is shown below. Could anyone suggest the proper way to set up these environment variables? Are there any other logs we should investigate?

13/11/01 15:53:31 WARN util.Utils: Your hostname, zyz-1 resolves to a loopback address: 127.0.0.1; using 10.4.1.140 instead (on interface eth0)
13/11/01 15:53:31 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
13/11/01 15:53:33 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/11/01 15:53:33 INFO spark.SparkEnv: Registering BlockManagerMaster
13/11/01 15:53:33 INFO storage.MemoryStore: MemoryStore started with capacity 562.0 MB.
13/11/01 15:53:34 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20131101155334-87b1
13/11/01 15:53:34 INFO network.ConnectionManager: Bound socket to port 54842 with id = ConnectionManagerId(proxy.optaim.com,54842)
13/11/01 15:53:34 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/11/01 15:53:34 INFO storage.BlockManagerMaster: Registered BlockManager
13/11/01 15:53:34 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/11/01 15:53:34 INFO server.AbstractConnector: Started [email protected]:47914
13/11/01 15:53:34 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.4.1.140:47914
13/11/01 15:53:34 INFO spark.SparkEnv: Registering MapOutputTracker
13/11/01 15:53:34 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-0c5d5a09-d719-4d3b-b68a-3694cd80d11e
13/11/01 15:53:34 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/11/01 15:53:34 INFO server.AbstractConnector: Started [email protected]:57023
13/11/01 15:53:34 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
13/11/01 15:53:34 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
13/11/01 15:53:34 INFO server.AbstractConnector: Started [email protected]:4040
13/11/01 15:53:34 INFO ui.SparkUI: Started Spark Web UI at http://proxy.optaim.com:4040
13/11/01 15:53:34 INFO spark.SparkContext: Added JAR /home/jianminwu/dev/spark/spark-0.8.0-incubating/trial/counter/target/simple-project-1.0.jar at http://10.4.1.140:57023/jars/simple-project-1.0.jar with timestamp 1383292414868
13/11/01 15:53:35 INFO mesos.MesosSchedulerBackend: Registered as framework ID 201310312305-2348876810-5050-3555-0001
13/11/01 15:53:36 INFO storage.MemoryStore: ensureFreeSpace(61352) called with curMem=0, maxMem=589332480
13/11/01 15:53:36 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 59.9 KB, free 562.0 MB)
13/11/01 15:53:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/11/01 15:53:36 WARN snappy.LoadSnappy: Snappy native library not loaded
13/11/01 15:53:36 INFO mapred.FileInputFormat: Total input paths to process : 3
13/11/01 15:53:37 INFO spark.SparkContext: Starting job: collect at JavaHdfsWordCount.java:54
13/11/01 15:53:37 INFO scheduler.DAGScheduler: Registering RDD 4 (reduceByKey at JavaHdfsWordCount.java:41)
13/11/01 15:53:37 INFO scheduler.DAGScheduler: Got job 0 (collect at JavaHdfsWordCount.java:54) with 4 output partitions (allowLocal=false)
13/11/01 15:53:37 INFO scheduler.DAGScheduler: Final stage: Stage 0 (collect at JavaHdfsWordCount.java:54)
13/11/01 15:53:37 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
13/11/01 15:53:37 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
13/11/01 15:53:37 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[4] at reduceByKey at JavaHdfsWordCount.java:41), which has no missing parents
13/11/01 15:53:37 INFO scheduler.DAGScheduler: Submitting 4 missing tasks from Stage 1 (MapPartitionsRDD[4] at reduceByKey at JavaHdfsWordCount.java:41)
13/11/01 15:53:37 INFO cluster.ClusterScheduler: Adding task set 1.0 with 4 tasks
13/11/01 15:53:37 INFO cluster.ClusterTaskSetManager: Starting task 1.0:0 as TID 0 on executor 201310312305-2348876810-5050-3555-7: datanode-05 (PROCESS_LOCAL)
13/11/01 15:53:37 INFO cluster.ClusterTaskSetManager: Serialized task 1.0:0 as 2212 bytes in 10 ms
13/11/01 15:53:37 INFO cluster.ClusterTaskSetManager: Starting task 1.0:1 as TID 1 on executor 201310312305-2348876810-5050-3555-5: datanode-02 (PROCESS_LOCAL)

Thanks,
Jianmin
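P.S. For completeness, here is a sketch of the lines we appended to sbin/mesos-daemon.sh (the paths are specific to our cluster layout, so adjust them to yours):

```shell
# Appended to sbin/mesos-daemon.sh so that the mesos-slave process
# (and any Spark executors it launches) inherits these variables.
# Paths below are from our cluster; substitute your own install dirs.
export SPARK_HOME=/data/hadoop/spark/spark-0.8.0-incubating
export SCALA_HOME=/data/hadoop/scala/scala-2.9.3
```

After restarting the slaves we confirmed the edit was in place on each machine, but the worker launch still fails as shown in the log above.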
