Initial job has not accepted any resources

2015-05-20 Thread podioss
Hi,
I am running Spark jobs with the standalone resource manager, and I am gathering
several performance metrics from my cluster nodes, including disk I/O metrics.
Because many of my jobs use the same dataset, I am trying to prevent the
operating system from caching the dataset in memory on every node, so that I
gather correct metrics for every job. Therefore, before I submit my jobs to
Spark, I clear the caches with:

sync; echo 3 > /proc/sys/vm/drop_caches

The problem is that when I do so, I see this warning at the beginning of the
job:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
memory

Ultimately the job runs successfully in most cases, but I feel that this
warning has a significant effect on the overall execution time of the job,
which I am trying to avoid.
I am also fairly confident that there is nothing wrong with my configuration,
because when I run jobs without clearing the nodes' caches, the warning does
not appear.
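
Could the warning simply mean that the executors are slow to re-register after
the caches are dropped? If so, would tuning the scheduler's registration wait
help? A minimal sketch of what I have in mind (assuming the standalone-mode
properties spark.scheduler.minRegisteredResourcesRatio and
spark.scheduler.maxRegisteredResourcesWaitingTime are the right knobs here;
the values are guesses, not tested recommendations):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("cache-benchmark")  // hypothetical app name
    // Wait until all expected executor resources have registered
    // before scheduling the first tasks.
    .set("spark.scheduler.minRegisteredResourcesRatio", "1.0")
    // Give slow-to-register workers more time than the default
    // before scheduling anyway (milliseconds in 1.x).
    .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "60000")
  val sc = new SparkContext(conf)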
I would really appreciate it if anyone could help me with this.

Thanks.   



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Initial-job-has-not-accepted-any-resources-tp22955.html



PriviledgedActionException- Executor error

2015-05-03 Thread podioss
Hi,
I am running several jobs in standalone mode, and I notice this error in the
log files on some of my nodes at the start of my jobs:

INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
INFO spark.SecurityManager: Changing view acls to: root
INFO spark.SecurityManager: Changing modify acls to: root
INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
INFO slf4j.Slf4jLogger: Slf4jLogger started
INFO Remoting: Starting remoting
ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    ... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at akka.remote.Remoting.start(Remoting.scala:180)
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
    at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1676)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1667)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:122)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
    ... 7 more
INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

These errors result in executor losses at the beginning of the job. I have
been trying to find a way to solve this with no success, so if anyone has a
clue, please let me know.
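
The timeout seems to happen while the executor backend is starting its Akka
remoting, so I have been wondering whether raising the Akka-related timeouts
would help. A rough sketch of what I mean (assuming the 1.x properties
spark.akka.timeout and spark.port.maxRetries are the relevant knobs; the
values are guesses):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("timeout-experiment")  // hypothetical app name
    // Allow more time for Akka futures such as the remoting startup
    // (seconds in Spark 1.x; default is 100).
    .set("spark.akka.timeout", "300")
    // Retry on more ports if the executor's service port is contested.
    .set("spark.port.maxRetries", "32")
  val sc = new SparkContext(conf)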

Thank you   



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/PriviledgedActionException-Executor-error-tp22745.html



KMeans takeSample jobs and RDD cached

2015-04-25 Thread podioss
Hi,
I am running the k-means algorithm with the initialization mode set to random,
over various dataset sizes and numbers of clusters, and I have a question
regarding the takeSample job of the algorithm.
More specifically, I notice that every application runs two sampling jobs. The
first one consumes the most time compared to all the others, while the second
one is much quicker, and that sparked my interest to investigate what is
actually happening.
To understand it, I checked the source code of the takeSample operation and
saw that there is a count action involved, followed by the computation of a
PartitionwiseSampledRDD with a PoissonSampler.
So my question is whether the count action corresponds to the first takeSample
job, and whether the second takeSample job is the one doing the actual
sampling.
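
For reference, my reading of the code boils down to something like the sketch
below (a paraphrase of the structure I saw, not the actual source; the
oversampling factor in particular is a simplification of what Spark really
computes):

  import org.apache.spark.rdd.RDD

  // Paraphrased structure of RDD.takeSample without replacement.
  def takeSampleSketch[T](rdd: RDD[T], num: Int, seed: Long): Array[T] = {
    val total = rdd.count()  // first job: the count action
    val fraction = math.min(1.0, (num * 2.0) / total)  // oversample a little
    // second job: the PartitionwiseSampledRDD computation, then collect
    val sampled = rdd.sample(withReplacement = false, fraction, seed).collect()
    sampled.take(num)  // trim the oversample down to the requested size
  }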

I also have a question about the RDDs that are created for k-means. In the
middle of the execution, under the Storage tab of the web UI, I can see 3 RDDs
with their partitions cached in memory across all nodes, which is very helpful
for monitoring. The problem is that after completion I can only see one of
them, along with the portion of cache memory it used, and I would like to ask
why the web UI doesn't display all the RDDs involved in the computation.

Thank you



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-takeSample-jobs-and-RDD-cached-tp22656.html



Executor memory in web UI

2015-04-17 Thread podioss
Hi,
I am a bit confused by the executor-memory option. I am running applications
with the standalone cluster manager on 8 workers with 4 GB of memory and 2
cores each, and when I submit my application with spark-submit I use
--executor-memory 1g.
In the web UI, in the completed applications table, I see that my application
was correctly submitted with 1 GB of memory per node as expected, but when I
check the Executors tab of the application I see that every executor launched
with 530 MB, which is about half the memory of the configuration.
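
My best guess so far is that the Executors tab shows only the storage memory
pool rather than the full heap. Under what I understand to be the Spark 1.x
defaults, the arithmetic would look roughly like this (the fractions and the
JVM overhead factor are assumptions on my part):

  // Back-of-the-envelope for the ~530 MB figure (Spark 1.x defaults assumed).
  val requestedMb      = 1024.0  // --executor-memory 1g
  val jvmUsableFactor  = 0.93    // Runtime.maxMemory is below -Xmx (JVM-dependent)
  val storageFraction  = 0.6     // spark.storage.memoryFraction default
  val safetyFraction   = 0.9     // spark.storage.safetyFraction default
  val storagePoolMb = requestedMb * jvmUsableFactor * storageFraction * safetyFraction
  // ~514 MB, i.e. roughly the 530 MB shown in the Executors tab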
I would really appreciate it if anyone could confirm or correct this.

Thanks  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Executor-memory-in-web-UI-tp22538.html



Mllib kmeans #iteration

2015-04-02 Thread podioss
Hello,
I am running the KMeans algorithm from MLlib in cluster mode, and I was
wondering whether there is a way to make the algorithm run for a fixed number
of iterations.
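
The closest I have found so far is something like the sketch below (assuming
setEpsilon is public in the MLlib version at hand; if so, setting it to 0
should prevent early convergence, so the algorithm runs the full
setMaxIterations). Is that the intended way?

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  // Hypothetical toy data; sc is an existing SparkContext (e.g. spark-shell).
  val data = sc.parallelize(Seq(
    Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0), Vectors.dense(9.0, 8.0)))

  val model = new KMeans()
    .setK(2)
    .setInitializationMode(KMeans.RANDOM)
    .setMaxIterations(20)  // upper bound on the number of iterations
    .setEpsilon(0.0)       // assumption: disables the convergence cutoff
    .run(data)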

Thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Mllib-kmeans-iteration-tp22353.html