Hi Spark users,

I've been attempting to get flambo <https://github.com/yieldbot/flambo/blob/develop/README.md>, a Clojure library for Spark, working with my codebase. After getting things to build with this very simple interface:
    (ns sharknado.core
      (:require [flambo.conf :as conf]
                [flambo.api :as spark]))

    (defn configure [master-url app-name]
      (-> (conf/spark-conf)
          (conf/master master-url)
          (conf/app-name app-name)))

    (defn get-context [master-url app-name]
      (spark/spark-context (configure master-url app-name)))

I run the following in the lein REPL:

    (use 'sharknado.core)
    (def cx (get-context "spark://MASTER-URL.compute-1.amazonaws.com:7077" "flambo-test"))

This connects to the master and successfully creates an app; however, the app's workers all die after several seconds.

It looks like user Saiph Kappa had similar problems about a month ago. Someone suggested that the cluster and the submitted Spark application might be using different versions of Spark; that's definitely not the case here. I've tried with both 1.1.0 and 1.1.1 on both ends. With Spark 1.1.0, after all workers die, the application exits. With Spark 1.1.1, after each worker dies, another is automatically created; at the moment the app detail screen in the UI is showing 150 exited and 5 running workers.

Anyone have any ideas? An example trace from a worker is below.
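In case it's relevant, flambo's conf namespace also exposes a generic `set`, so extra Spark properties can be threaded into the same builder. The property name and value below are illustrative only (taken from the Spark 1.1 configuration docs), sketching where a longer Akka timeout would go; I haven't confirmed it changes the behavior:

```clojure
(ns sharknado.core
  (:require [flambo.conf :as conf]
            [flambo.api :as spark]))

(defn configure [master-url app-name]
  (-> (conf/spark-conf)
      (conf/master master-url)
      (conf/app-name app-name)
      ;; Illustrative: raise the Akka ask timeout (seconds) beyond the
      ;; default 30s that shows up in the trace below. Property name is
      ;; from the Spark 1.1 configuration reference, not verified here.
      (conf/set "spark.akka.timeout" "100")))
```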
Thanks,
Jeff

14/12/10 01:22:09 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
14/12/10 01:22:10 INFO spark.SecurityManager: Changing view acls to: root,Jeff
14/12/10 01:22:10 INFO spark.SecurityManager: Changing modify acls to: root,Jeff
14/12/10 01:22:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, Jeff); users with modify permissions: Set(root, Jeff)
14/12/10 01:22:10 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/12/10 01:22:10 INFO Remoting: Starting remoting
14/12/10 01:22:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@ip-address.ec2.internal:49050]
14/12/10 01:22:10 INFO Remoting: Remoting now listens on addresses: [akka.tcp://driverPropsFetcher@ip-address.ec2.internal:49050]
14/12/10 01:22:10 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 49050.
14/12/10 01:22:40 ERROR security.UserGroupInformation: PriviledgedActionException as:Jeff cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        ... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
        ... 7 more