Initial job has not accepted any resources
Hi, I am running Spark jobs with the standalone resource manager and I am gathering several performance metrics from my cluster nodes, among them disk I/O metrics. Because many of my jobs use the same dataset, I am trying to prevent the operating system from caching the dataset in memory on every node, so that I gather correct metrics for every job. Therefore, before I submit my jobs to Spark, I clear the caches with:

    sync; echo 3 > /proc/sys/vm/drop_caches

The problem is that when I do so, I see this warning at the beginning of the job:

    WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Ultimately the job runs successfully in most cases, but I suspect this warning has a significant effect on the overall execution time of the job, which I am trying to avoid. I am also fairly confident that nothing is wrong with my configuration, because when I run jobs without clearing the nodes' caches the warning does not come up. I would really appreciate it if anyone could help me with this. Thanks.
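For context, here is a minimal sketch (in Scala, using scala.sys.process) of the pre-submission step; the worker host names below are placeholders, and the ssh/sudo invocation stands in for however one reaches each node:

    import scala.sys.process._

    // Hypothetical worker host names; substitute the real cluster nodes.
    val workers = Seq("node1", "node2", "node3")

    // Flush dirty pages and drop the OS page cache on each node, so every
    // job starts from a cold cache and the disk I/O metrics are comparable.
    workers.foreach { host =>
      Seq("ssh", host, "sync && echo 3 | sudo tee /proc/sys/vm/drop_caches").!
    }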
PriviledgedActionException - Executor error
Hi, I am running several jobs in standalone mode and I notice this error in the log files on some of my nodes at the start of my jobs:

INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
INFO spark.SecurityManager: Changing view acls to: root
INFO spark.SecurityManager: Changing modify acls to: root
INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions:
INFO slf4j.Slf4jLogger: Slf4jLogger started
INFO Remoting: Starting remoting
ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        ... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at akka.remote.Remoting.start(Remoting.scala:180)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
        at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
        at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
        at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1676)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1667)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:122)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
        ... 7 more
INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
These errors result in executor losses at the beginning, and I have been trying to find a way to solve this with no success, so if anyone has a clue please let me know. Thank you.
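The only workaround I have experimented with so far is raising the remoting timeout when the context is built, on the theory that the executors are just slow to register after startup; a minimal sketch, assuming the Spark 1.x configuration key spark.akka.timeout (in seconds), and without any claim that it addresses the root cause:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: raise the Akka remoting timeout in case the executors
    // are simply slow to come up; whether this helps here is an open question.
    val conf = new SparkConf()
      .setAppName("my-job")              // hypothetical application name
      .set("spark.akka.timeout", "300")  // seconds
    val sc = new SparkContext(conf)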
KMeans takeSample jobs and cached RDDs
Hi, I am running the k-means algorithm with the initialization mode set to random, over various dataset sizes and numbers of clusters, and I have a question regarding the takeSample job of the algorithm. More specifically, I notice that in every application there are two sampling jobs. The first one consumes the most time compared to all the others, while the second one is much quicker, and that sparked my interest to investigate what is actually happening. To explain it, I checked the source code of the takeSample operation and saw that there is a count action involved, followed by the computation of a PartitionwiseSampledRDD with a PoissonSampler. So my question is whether that count action corresponds to the first takeSample job and the second takeSample job is the one doing the actual sampling (my reading of the code is sketched at the end of this message).

I also have a question about the RDDs that are created for k-means. In the middle of the execution, under the storage tab of the web UI, I can see 3 RDDs with their partitions cached in memory across all nodes, which is very helpful for monitoring purposes. The problem is that after completion I can only see one of them and the portion of cache memory it used, and I would like to ask why the web UI doesn't display all the RDDs involved in the computation. Thank you
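For reference, here is the simplified reading of RDD.takeSample mentioned above, written as a Scala sketch rather than the actual Spark source; the helper name takeSampleSketch is mine, and the real implementation also handles sampling with replacement and retries when the sample comes up short:

    import org.apache.spark.rdd.RDD

    // Simplified reading of RDD.takeSample in Spark 1.x, without replacement.
    def takeSampleSketch[T](rdd: RDD[T], num: Int): Array[T] = {
      val total = rdd.count()                    // job 1: the slow counting pass
      val fraction = math.min(1.0, num.toDouble / total)
      // job 2: a PartitionwiseSampledRDD is computed and the result collected
      rdd.sample(withReplacement = false, fraction = fraction).take(num)
    }

If that reading is right, the first job is the count and the second is the actual sampling, which would match the timings I see.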
Executor memory in web UI
Hi, I am a bit confused by the --executor-memory option. I am running applications with the standalone cluster manager, with 8 workers having 4 GB of memory and 2 cores each, and when I submit my application with spark-submit I use --executor-memory 1g. In the web UI, in the completed applications table, I see that my application was correctly submitted with 1 GB of memory per node as expected, but when I check the executors tab of the application I see that every executor launched with 530 MB, which is about half the configured memory. I would really appreciate an explanation if anyone has one. Thanks
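While waiting for an answer I did the following back-of-envelope calculation, purely as guesswork: it assumes the executors tab reports the memory available for caching blocks rather than the full heap, together with the Spark 1.x defaults spark.storage.memoryFraction = 0.6 and spark.storage.safetyFraction = 0.9:

    // Guesswork, not a confirmed explanation of what the UI column means.
    val maxHeapMb = 981.0                    // Runtime.getRuntime.maxMemory for a
                                             // 1g heap, minus one survivor space
    val storageMb = maxHeapMb * 0.6 * 0.9    // memoryFraction * safetyFraction
    println(f"storage memory ≈ $storageMb%.0f MB")  // ≈ 530 MB

The numbers line up with the 530 MB I see, but I would still appreciate confirmation that this is what the column actually reports.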
MLlib KMeans #iterations
Hello, I am running the KMeans algorithm from MLlib in cluster mode, and I was wondering if I could run the algorithm with a fixed number of iterations in some way. Thanks
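For reference, this is the kind of invocation I mean; a sketch assuming MLlib's KMeans builder API, with sc already in scope, and with the guess that setEpsilon(0.0), if it is available in this version, disables the early-convergence check so that exactly maxIterations iterations run:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Tiny placeholder input just to make the sketch self-contained.
    val data = sc.parallelize(Seq(Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0)))

    val model = new KMeans()
      .setK(2)
      .setMaxIterations(20)                  // normally an upper bound only
      .setInitializationMode(KMeans.RANDOM)
      .setEpsilon(0.0)                       // assumption: with epsilon 0 the
                                             // convergence test never fires, so
                                             // all 20 iterations should run
      .run(data)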