Re: Deploying a python code on a spark EC2 cluster
This is the error from stderr:

    Spark Executor Command: java -cp :/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar -Djava.library.path=/root/ephemeral-hdfs/lib/native/ -Dspark.local.dir=/mnt/spark -Dspark.local.dir=/mnt/spark -Dspark.local.dir=/mnt/spark -Dspark.local.dir=/mnt/spark -Xms2048M -Xmx2048M org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@192.168.122.1:44577/user/CoarseGrainedScheduler 1 ip-10-84-7-178.eu-west-1.compute.internal 1 akka.tcp://sparkwor...@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker app-20140425133749-
    14/04/25 13:39:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
    14/04/25 13:39:38 INFO Remoting: Starting remoting
    14/04/25 13:39:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkexecu...@ip-10-84-7-178.eu-west-1.compute.internal:36800]
    14/04/25 13:39:38 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkexecu...@ip-10-84-7-178.eu-west-1.compute.internal:36800]
    14/04/25 13:39:38 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkwor...@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker
    14/04/25 13:39:38 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@192.168.122.1:44577/user/CoarseGrainedScheduler
    14/04/25 13:39:39 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkwor...@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker
    14/04/25 13:41:19 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkexecu...@ip-10-84-7-178.eu-west-1.compute.internal:36800] -> [akka.tcp://spark@192.168.122.1:44577] disassociated! Shutting down.
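Worth noting: the executor here is told to connect back to a driver at 192.168.122.1, a private NAT-style address that EC2 instances generally cannot route to, which fits the eventual disassociation. A minimal PySpark sketch of pinning the driver to a reachable address (the master URL and driver IP below are placeholders; spark.driver.host is a standard Spark setting):

    from pyspark import SparkConf, SparkContext

    # Placeholder addresses: the driver must advertise an address that the
    # EC2 workers can actually route back to, not a local/NAT address
    # like 192.168.122.1.
    conf = (SparkConf()
            .setMaster("spark://ec2-master-public-dns:7077")
            .setAppName("my_app")
            .set("spark.driver.host", "driver-public-ip"))
    sc = SparkContext(conf=conf)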
Re: Deploying a python code on a spark EC2 cluster
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated: app-20140425160713-0002/7 is now RUNNING
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated: app-20140425160713-0002/7 is now FAILED (class java.io.IOException: Cannot run program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory)
    14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Executor app-20140425160713-0002/7 removed: class java.io.IOException: Cannot run program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor added: app-20140425160713-0002/8 on worker-20140425133348-ip-10-84-7-178.eu-west-1.compute.internal-57839 (ip-10-84-7-178.eu-west-1.compute.internal:57839) with 1 cores
    14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140425160713-0002/8 on hostPort ip-10-84-7-178.eu-west-1.compute.internal:57839 with 1 cores, 512.0 MB RAM
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated: app-20140425160713-0002/8 is now RUNNING
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated: app-20140425160713-0002/8 is now FAILED (class java.io.IOException: Cannot run program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory)
    14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Executor app-20140425160713-0002/8 removed: class java.io.IOException: Cannot run program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor added: app-20140425160713-0002/9 on worker-20140425133348-ip-10-84-7-178.eu-west-1.compute.internal-57839 (ip-10-84-7-178.eu-west-1.compute.internal:57839) with 1 cores
    14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140425160713-0002/9 on hostPort ip-10-84-7-178.eu-west-1.compute.internal:57839 with 1 cores, 512.0 MB RAM
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated: app-20140425160713-0002/9 is now RUNNING
    14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated: app-20140425160713-0002/9 is now FAILED (class java.io.IOException: Cannot run program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory)
    14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Executor app-20140425160713-0002/9 removed: class java.io.IOException: Cannot run program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
    14/04/25 17:07:13 ERROR AppClient$ClientActor: Master removed our application: FAILED; stopping client
    14/04/25 17:07:13 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
    14/04/25 17:07:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/04/25 17:07:43 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/04/25 17:07:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/04/25 17:08:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/04/25 17:08:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
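The IOException above means every worker is being told to run Spark out of the driver's local path (/mnt/work/spark), which does not exist on the workers; on clusters launched with spark-ec2, Spark lives under /root/spark. A minimal sketch of pointing the context at the workers' installation (assuming the spark-ec2 layout; the master URL is a placeholder):

    from pyspark import SparkContext

    # sparkHome must match the Spark installation path on the workers
    # (/root/spark on spark-ec2 clusters), not the driver's local
    # checkout (/mnt/work/spark in the log above).
    sc = SparkContext("spark://ec2-master-public-dns:7077", "my_app",
                      sparkHome="/root/spark")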
Re: Deploying a python code on a spark EC2 cluster
Moreover, it seems all the workers are registered and have sufficient memory (2.7 GB, whereas I have asked for only 512 MB). The UI also shows the jobs running on the slaves, but the terminal still reports the same error:

    Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Please see the screenshot: http://apache-spark-user-list.1001560.n3.nabble.com/file/n4761/33.png

Thanks
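When this warning is genuine (rather than a symptom of executors failing to start, as in the logs earlier in the thread), it usually means the application requests more memory or cores per executor than any worker offers. A hedged sketch of keeping the request within the 2.7 GB the workers advertise (placeholder master URL; spark.executor.memory is a standard setting):

    from pyspark import SparkConf, SparkContext

    # Request less memory than each worker advertises (2.7 GB here);
    # otherwise the master can never schedule an executor for the job.
    conf = (SparkConf()
            .setMaster("spark://ec2-master-public-dns:7077")
            .setAppName("my_app")
            .set("spark.executor.memory", "512m"))
    sc = SparkContext(conf=conf)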
Re: Deploying a python code on a spark EC2 cluster
Did you launch this using our EC2 scripts (http://spark.apache.org/docs/latest/ec2-scripts.html) or did you manually set up the daemons? My guess is that their hostnames are not being resolved properly on all nodes, so executor processes can't connect back to your driver app. This error message indicates that:

    14/04/24 09:00:49 WARN util.Utils: Your hostname, spark-node resolves to a loopback address: 127.0.0.1; using 10.74.149.251 instead (on interface eth0)
    14/04/24 09:00:49 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address

If you launch with our EC2 scripts, or don't manually change the hostnames, this should not happen.

Matei

On Apr 24, 2014, at 11:36 AM, John King <usedforprinting...@gmail.com> wrote:

> Same problem.
>
> On Thu, Apr 24, 2014 at 10:54 AM, Shubhabrata <mail2shu...@gmail.com> wrote:
>> Moreover, it seems all the workers are registered and have sufficient memory (2.7 GB, whereas I have asked for only 512 MB). The UI also shows the jobs running on the slaves, but the terminal still reports the same error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory.
>> Please see the screenshot: http://apache-spark-user-list.1001560.n3.nabble.com/file/n4761/33.png
>> Thanks
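A sketch of the SPARK_LOCAL_IP fix Matei mentions, done from the driver script; setting it in the environment before the SparkContext starts is equivalent to exporting it in conf/spark-env.sh (the address is the one reported in the warning and is illustrative only):

    import os

    # Must be set before SparkContext launches the JVM gateway, which
    # inherits this environment; use the driver's non-loopback address.
    os.environ["SPARK_LOCAL_IP"] = "10.74.149.251"

    from pyspark import SparkContext
    sc = SparkContext("spark://ec2-master-public-dns:7077", "my_app")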
Re: Deploying a python code on a spark EC2 cluster
This happens to me when using the EC2 scripts for the recent v1.0.0-rc2 release. The Master connects and then disconnects immediately, eventually saying "Master disconnected from cluster."

On Thu, Apr 24, 2014 at 4:01 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Did you launch this using our EC2 scripts (http://spark.apache.org/docs/latest/ec2-scripts.html) or did you manually set up the daemons? My guess is that their hostnames are not being resolved properly on all nodes, so executor processes can't connect back to your driver app. This error message indicates that:
>
>     14/04/24 09:00:49 WARN util.Utils: Your hostname, spark-node resolves to a loopback address: 127.0.0.1; using 10.74.149.251 instead (on interface eth0)
>     14/04/24 09:00:49 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>
> If you launch with our EC2 scripts, or don't manually change the hostnames, this should not happen.
>
> Matei