Can you list the other configs that you are setting? It looks like the executor can't communicate back to the driver. I'm actually not sure it's a good idea to set spark.driver.host here; you want to let Spark set that automatically.
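As a rough sketch of what I mean (the port numbers below are arbitrary picks of mine, nothing special, and you'd still need to open them in the EC2 security group so the workers can reach your laptop), I'd drop spark.driver.host entirely and instead pin the driver-side callback ports, which are Spark 1.x properties:

    # Let Spark resolve spark.driver.host itself; only pin the ports the
    # executors use to connect back to the driver, so they can be opened
    # in the security group.
    ./bin/spark-submit \
      --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
      --conf spark.driver.port=7078 \
      --conf spark.fileserver.port=7079 \
      --conf spark.blockManager.port=7080 \
      --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar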
- Patrick

On Mon, Feb 23, 2015 at 12:48 AM, Oleg Shirokikh <o...@solver.com> wrote:
> Dear Patrick,
>
> Thanks a lot for your quick response. Indeed, following your advice I've
> uploaded the jar onto S3; the FileNotFoundException is gone now and the
> job is submitted in "cluster" deploy mode.
>
> However, now both modes (client and cluster) fail with the following
> errors in the executors (they keep exiting/getting killed, as I see in
> the UI):
>
> 15/02/23 08:42:46 ERROR security.UserGroupInformation:
> PriviledgedActionException as:oleg
> cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>
> The full log is:
>
> 15/02/23 01:59:11 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
> 15/02/23 01:59:12 INFO spark.SecurityManager: Changing view acls to: root,oleg
> 15/02/23 01:59:12 INFO spark.SecurityManager: Changing modify acls to: root,oleg
> 15/02/23 01:59:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, oleg); users with modify permissions: Set(root, oleg)
> 15/02/23 01:59:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/02/23 01:59:12 INFO Remoting: Starting remoting
> 15/02/23 01:59:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverpropsfetc...@ip-172-31-33-194.us-west-2.compute.internal:39379]
> 15/02/23 01:59:13 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 39379.
> 15/02/23 01:59:43 ERROR security.UserGroupInformation: PriviledgedActionException as:oleg cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         ... 4 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
>         ... 7 more
>
>
> -----Original Message-----
> From: Patrick Wendell [mailto:pwend...@gmail.com]
> Sent: Monday, February 23, 2015 12:17 AM
> To: Oleg Shirokikh
> Subject: Re: Submitting jobs to Spark EC2 cluster remotely
>
> The reason is that the file needs to be in a globally visible filesystem
> from which the master node can download it. So it needs to be on S3, for
> instance, rather than on your local filesystem.
>
> - Patrick
>
> On Sun, Feb 22, 2015 at 11:55 PM, olegshirokikh <o...@solver.com> wrote:
>> I've set up the EC2 cluster with Spark. Everything works; all
>> master/slaves are up and running.
>>
>> I'm trying to submit a sample job (SparkPi). When I ssh to the cluster
>> and submit it from there, everything works fine. However, when the
>> driver is created on a remote host (my laptop), it doesn't work. I've
>> tried both modes for `--deploy-mode`:
>>
>> **`--deploy-mode=client`:**
>>
>> From my laptop:
>>
>>     ./bin/spark-submit --master
>>     spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077
>>     --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>
>> This results in the following indefinite warnings/errors:
>>
>>> WARN TaskSchedulerImpl: Initial job has not accepted any resources;
>>> check your cluster UI to ensure that workers are registered and have
>>> sufficient memory
>>
>>> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove
>>> non-existent executor 0
>>
>>> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove
>>> non-existent executor 1
>>
>> ...and failed drivers appear in the Spark Web UI under "Completed
>> Drivers" with "State=ERROR".
>>
>> I've tried to pass limits for cores and memory to the submit script,
>> but it didn't help...
>>
>> **`--deploy-mode=cluster`:**
>>
>> From my laptop:
>>
>>     ./bin/spark-submit --master
>>     spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077
>>     --deploy-mode cluster --class SparkPi
>>     ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>
>> The result is:
>>
>>> .... Driver successfully submitted as driver-20150223023734-0007
>>> ... waiting before polling master for driver state
>>> ... polling master for driver state
>>> State of driver-20150223023734-0007 is ERROR
>>> Exception from cluster was: java.io.FileNotFoundException: File
>>> file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>> does not exist.
>>> java.io.FileNotFoundException: File
>>> file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>> does not exist.
>>>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
>>>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
>>>         at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
>>>         at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:75)
>>
>> So, I'd appreciate any pointers on what is going wrong and some
>> guidance on how to deploy jobs from a remote client. Thanks.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-jobs-to-Spark-EC2-cluster-remotely-tp21762.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
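P.S. For completeness, once the jar is uploaded to S3 the cluster-mode submit discussed above would look roughly like this (the bucket name is made up, and the s3n:// scheme assumes the S3 client that comes with the Hadoop build on the EC2 AMI):

    # Point spark-submit at a globally visible copy of the jar instead of
    # a path on the laptop's local filesystem.
    ./bin/spark-submit \
      --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
      --deploy-mode cluster \
      --class SparkPi \
      s3n://your-bucket/ec2test_2.10-0.0.1.jar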
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org