Can you list the other configs that you are setting? It looks like the executor can't communicate back to the driver. I'm actually not sure it's a good idea to set spark.driver.host here; you want to let Spark set that automatically.
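As a rough sketch of what I mean (the port numbers below are arbitrary picks of mine, nothing special, and you'd still need to open them in the EC2 security group so the workers can reach your laptop), I'd drop spark.driver.host entirely and instead pin the driver-side callback ports, which are Spark 1.x properties:

    # Let Spark resolve spark.driver.host itself; only pin the ports the
    # executors use to connect back to the driver, so they can be opened
    # in the security group.
    ./bin/spark-submit \
      --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
      --conf spark.driver.port=7078 \
      --conf spark.fileserver.port=7079 \
      --conf spark.blockManager.port=7080 \
      --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar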
- Patrick

On Mon, Feb 23, 2015 at 12:48 AM, Oleg Shirokikh <o...@solver.com> wrote:
> Dear Patrick,
>
> Thanks a lot for your quick response. Indeed, following your advice I've
> uploaded the jar onto S3; the FileNotFoundException is gone now and the
> job is submitted in "cluster" deploy mode.
>
> However, now both modes (client and cluster) fail with the following
> errors in the executors (they keep exiting/getting killed, as I see in
> the UI):
>
> 15/02/23 08:42:46 ERROR security.UserGroupInformation:
> PriviledgedActionException as:oleg
> cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>
> The full log is:
>
> 15/02/23 01:59:11 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
> 15/02/23 01:59:12 INFO spark.SecurityManager: Changing view acls to: root,oleg
> 15/02/23 01:59:12 INFO spark.SecurityManager: Changing modify acls to: root,oleg
> 15/02/23 01:59:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, oleg); users with modify permissions: Set(root, oleg)
> 15/02/23 01:59:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/02/23 01:59:12 INFO Remoting: Starting remoting
> 15/02/23 01:59:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverpropsfetc...@ip-172-31-33-194.us-west-2.compute.internal:39379]
> 15/02/23 01:59:13 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 39379.
> 15/02/23 01:59:43 ERROR security.UserGroupInformation: PriviledgedActionException as:oleg cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         ... 4 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
>         ... 7 more
>
>
> -----Original Message-----
> From: Patrick Wendell [mailto:pwend...@gmail.com]
> Sent: Monday, February 23, 2015 12:17 AM
> To: Oleg Shirokikh
> Subject: Re: Submitting jobs to Spark EC2 cluster remotely
>
> The reason is that the file needs to be in a globally visible filesystem
> from which the master node can download it. So it needs to be on S3, for
> instance, rather than on your local filesystem.
>
> - Patrick
>
> On Sun, Feb 22, 2015 at 11:55 PM, olegshirokikh <o...@solver.com> wrote:
>> I've set up the EC2 cluster with Spark. Everything works; all
>> master/slaves are up and running.
>>
>> I'm trying to submit a sample job (SparkPi). When I ssh to the cluster
>> and submit it from there, everything works fine. However, when the
>> driver is created on a remote host (my laptop), it doesn't work. I've
>> tried both modes for `--deploy-mode`:
>>
>> **`--deploy-mode=client`:**
>>
>> From my laptop:
>>
>>     ./bin/spark-submit --master
>>     spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077
>>     --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>
>> This results in the following indefinite warnings/errors:
>>
>>> WARN TaskSchedulerImpl: Initial job has not accepted any resources;
>>> check your cluster UI to ensure that workers are registered and have
>>> sufficient memory
>>
>>> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove
>>> non-existent executor 0
>>
>>> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove
>>> non-existent executor 1
>>
>> ...and failed drivers appear in the Spark Web UI under "Completed
>> Drivers" with "State=ERROR".
>>
>> I've tried to pass limits for cores and memory to the submit script,
>> but it didn't help...
>>
>> **`--deploy-mode=cluster`:**
>>
>> From my laptop:
>>
>>     ./bin/spark-submit --master
>>     spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077
>>     --deploy-mode cluster --class SparkPi
>>     ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>
>> The result is:
>>
>>> .... Driver successfully submitted as driver-20150223023734-0007
>>> ... waiting before polling master for driver state
>>> ... polling master for driver state
>>> State of driver-20150223023734-0007 is ERROR
>>> Exception from cluster was: java.io.FileNotFoundException: File
>>> file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>> does not exist.
>>> java.io.FileNotFoundException: File
>>> file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>> does not exist.
>>>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
>>>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
>>>         at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
>>>         at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:75)
>>
>> So, I'd appreciate any pointers on what is going wrong and some
>> guidance on how to deploy jobs from a remote client. Thanks.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-jobs-to-Spark-EC2-cluster-remotely-tp21762.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
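P.S. For completeness, once the jar is uploaded to S3 the cluster-mode submit discussed above would look roughly like this (the bucket name is made up, and the s3n:// scheme assumes the S3 client that comes with the Hadoop build on the EC2 AMI):

    # Point spark-submit at a globally visible copy of the jar instead of
    # a path on the laptop's local filesystem.
    ./bin/spark-submit \
      --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
      --deploy-mode cluster \
      --class SparkPi \
      s3n://your-bucket/ec2test_2.10-0.0.1.jar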
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org