Can you see the Spark web UI? Is it running? (It would be at masterurl:8080.) If so, what is the master URL shown there?

    MASTER=spark://<URL>:<PORT> ./bin/spark-shell

should work.
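For example, with the cluster described below (and assuming the UI at http://node1:8080 reports spark://node1:7077 as its master URL, as the driver logs suggest), that would be:

    MASTER=spark://node1:7077 ./bin/spark-shell

If the shell gets executors, the cluster itself is reachable and the problem is more likely in the job's configuration.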
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Thu, Mar 6, 2014 at 2:22 PM, Christian <chri...@gmail.com> wrote:

> Hello, has anyone run into this problem before? I am sorry to insist, but I
> cannot guess what is happening. Should I ask on the dev mailing list? Many
> thanks in advance.
>
> On 05/03/2014 23:57, "Christian" <chri...@gmail.com> wrote:
>
>> I have deployed a Spark cluster in standalone mode with 3 machines:
>>
>> node1/192.168.1.2 -> master
>> node2/192.168.1.3 -> worker, 20 cores, 12g
>> node3/192.168.1.4 -> worker, 20 cores, 12g
>>
>> The web interface shows the workers correctly.
>>
>> When I launch the Scala job (which only requires 256m of memory), these
>> are the logs:
>>
>> 14/03/05 23:24:06 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 55 tasks
>> 14/03/05 23:24:21 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>> 14/03/05 23:24:23 INFO client.AppClient$ClientActor: Connecting to master spark://node1:7077...
>> 14/03/05 23:24:36 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>> 14/03/05 23:24:43 INFO client.AppClient$ClientActor: Connecting to master spark://node1:7077...
>> 14/03/05 23:24:51 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>> 14/03/05 23:25:03 ERROR client.AppClient$ClientActor: All masters are unresponsive! Giving up.
>> 14/03/05 23:25:03 ERROR cluster.SparkDeploySchedulerBackend: Spark cluster looks dead, giving up.
>> 14/03/05 23:25:03 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 0.0 from pool
>> 14/03/05 23:25:03 INFO scheduler.DAGScheduler: Failed to run saveAsNewAPIHadoopFile at CondelCalc.scala:146
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Spark cluster looks down
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>>         ...
>>
>> The logs generated by the master and the 2 workers are attached, but I
>> found something weird in the master logs:
>>
>> 14/03/05 23:37:43 INFO master.Master: Registering worker *node1:57297* with 20 cores, 12.0 GB RAM
>> 14/03/05 23:37:43 INFO master.Master: Registering worker *node1:34188* with 20 cores, 12.0 GB RAM
>>
>> It reports that the two workers are node1:57297 and node1:34188 instead
>> of node3 and node2, respectively.
>>
>> $ cat /etc/hosts
>> ...
>> 192.168.1.2 node1
>> 192.168.1.3 node2
>> 192.168.1.4 node3
>> ...
>>
>> $ nslookup node2
>> Server:   192.168.1.1
>> Address:  192.168.1.1#53
>>
>> Name:     node2.cluster.local
>> Address:  192.168.1.3
>>
>> $ nslookup node3
>> Server:   192.168.1.1
>> Address:  192.168.1.1#53
>>
>> Name:     node3.cluster.local
>> Address:  192.168.1.4
>>
>> $ ssh node1 "ps aux | grep spark"
>> cperez 17023 1.4 0.1 4691944 154532 pts/3 Sl 23:37 0:15 /data/users/cperez/opt/jdk/bin/java -cp :/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/conf:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar:/data/users/cperez/opt/hadoop-2.2.0/etc/hadoop -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip node1 --port 7077 --webui-port 8080
>>
>> $ ssh node2 "ps aux | grep spark"
>> cperez 17511 2.7 0.1 4625248 156304 ? Sl 23:37 0:07 /data/users/cperez/opt/jdk/bin/java -cp :/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/conf:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar:/data/users/cperez/opt/hadoop-2.2.0/etc/hadoop -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://node1:7077
>>
>> $ ssh node2 "netstat -lptun | grep 17511"
>> tcp   0   0 :::8081                    :::*   LISTEN   17511/java
>> tcp   0   0 ::ffff:192.168.1.3:34188   :::*   LISTEN   17511/java
>>
>> $ ssh node3 "ps aux | grep spark"
>> cperez 7543 1.9 0.1 4625248 158600 ? Sl 23:37 0:09 /data/users/cperez/opt/jdk/bin/java -cp :/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/conf:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar:/data/users/cperez/opt/hadoop-2.2.0/etc/hadoop -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://node1:7077
>>
>> $ ssh node3 "netstat -lptun | grep 7543"
>> tcp   0   0 :::8081                    :::*   LISTEN   7543/java
>> tcp   0   0 ::ffff:192.168.1.4:57297   :::*   LISTEN   7543/java
>>
>> I am completely blocked at this, any help would be very helpful to me.
>> Many thanks in advance.
>> Christian
>>
>
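One thing that may be worth checking, given those registration lines (a sketch of a possible knob, not a confirmed fix): in standalone mode the address a worker binds to and registers under can be pinned explicitly in conf/spark-env.sh on each worker, instead of letting it be resolved from the hostname, e.g.

    # conf/spark-env.sh on node2 (use 192.168.1.4 on node3) -- only relevant if the
    # node1:57297 / node1:34188 registrations come from the workers resolving their
    # own address incorrectly; restart the workers afterwards so they re-register
    SPARK_LOCAL_IP=192.168.1.3

This is only a guess based on the logs above; the web UI master URL check is the first thing to try.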