Hi,

Thank you. I am posting the detailed log of that executor error below; please take a look.
Here is the detailed log:

[Stage 103:==>(3 + 1) / 4][Stage 105:==>(3 + 1) / 4][Stage 107:==>(3 + 1) / 4]
[WARN] [HeartbeatReceiver] Removing executor driver with no recent heartbeats: 143771 ms exceeds timeout 120000 ms
[WARN] [tcp] RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=54740] throws
[ERROR] [TaskSchedulerImpl] Lost executor driver on localhost: Executor heartbeat timed out after 143771 ms
[WARN] [tcp] RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=54740] throws
[WARN] [TaskSetManager] Lost task 3.0 in stage 103.0 (TID 237, localhost): ExecutorLostFailure (executor driver lost)
[ERROR] [TaskSetManager] Task 3 in stage 103.0 failed 1 times; aborting job
[WARN] [TaskSetManager] Lost task 3.0 in stage 105.0 (TID 245, localhost): ExecutorLostFailure (executor driver lost)
[ERROR] [TaskSetManager] Task 3 in stage 105.0 failed 1 times; aborting job
[WARN] [TaskSetManager] Lost task 3.0 in stage 107.0 (TID 253, localhost): ExecutorLostFailure (executor driver lost)
[ERROR] [TaskSetManager] Task 3 in stage 107.0 failed 1 times; aborting job
[WARN] [TaskSetManager] Lost task 3.0 in stage 110.0 (TID 261, localhost): ExecutorLostFailure (executor driver lost)
[ERROR] [TaskSetManager] Task 3 in stage 110.0 failed 1 times; aborting job
[Stage 103:=>(3 + -3) / 4][Stage 105:==>(3 + 1) / 4][Stage 107:==>(3 + 1) / 4]
[WARN] [SparkContext] Killing executors is only supported in coarse-grained mode
Exception in thread "main" scala.collection.parallel.CompositeThrowable: Multiple exceptions thrown during a parallel computation:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 110.0 failed 1 times, most recent failure: Lost task 3.0 in stage 110.0 (TID 261, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
scala.Option.foreach(Option.scala:236)
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
...
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 103.0 failed 1 times, most recent failure: Lost task 3.0 in stage 103.0 (TID 237, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
scala.Option.foreach(Option.scala:236)
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
...
at scala.collection.parallel.package$$anon$1.alongWith(package.scala:88)
at scala.collection.parallel.Task$class.mergeThrowables(Tasks.scala:86)
at scala.collection.parallel.ParIterableLike$Map.mergeThrowables(ParIterableLike.scala:1054)
at scala.collection.parallel.Task$class.tryMerge(Tasks.scala:72)
at scala.collection.parallel.ParIterableLike$Map.tryMerge(ParIterableLike.scala:1054)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:190)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:514)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:162)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:444)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:514)
at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:492)
at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:64)
at scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:961)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:956)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[Stage 116:> (0 + 0) / 4][Stage 117:> (0 + 0) / 4][Stage 118:> (0 + 0) / 4]
[ERROR] [BlockManager] Failed to report rdd_123_0 to master; giving up.
[ERROR] [Executor] Exception in task 3.0 in stage 103.0 (TID 237)
[ERROR] [Executor] Exception in task 3.0 in stage 110.0 (TID 261)
[ERROR] [Executor] Exception in task 3.0 in stage 107.0 (TID 253)
[ERROR] [Executor] Exception in task 3.0 in stage 105.0 (TID 245)
[ERROR] [AkkaRpcEnv] Ignore error: Task org.apache.spark.executor.Executor$TaskRunner@187b1861 rejected from java.util.concurrent.ThreadPoolExecutor@15da1dcd[Shutting down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 262]
[ERROR] [AkkaRpcEnv] Ignore error: Task org.apache.spark.executor.Executor$TaskRunner@50f59ea3 rejected from java.util.concurrent.ThreadPoolExecutor@15da1dcd[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 265]
[ERROR] [AkkaRpcEnv] Ignore error: Task org.apache.spark.executor.Executor$TaskRunner@379764eb rejected from java.util.concurrent.ThreadPoolExecutor@15da1dcd[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 265]
[ERROR] [AkkaRpcEnv] Ignore error: Task org.apache.spark.executor.Executor$TaskRunner@2014e4fb rejected from java.util.concurrent.ThreadPoolExecutor@15da1dcd[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 265]

Please suggest how to solve this, or what approaches could resolve this error.

Thank you.
Regards,
Bansari Shah

On Fri, Jan 20, 2017 at 2:54 AM, Felipe Oliveira <[email protected]> wrote:

> I would investigate GC in the JVMs of your executors; it is very common to see
> that error during large GC pauses.
>
> On Thu, Jan 19, 2017 at 9:44 AM, Donald Szeto <[email protected]> wrote:
>
>> Do you have more detailed logs from the Spark executors?
>>
>> Regards,
>> Donald
>>
>> On Wed, Dec 14, 2016 at 3:26 AM Bansari Shah <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to use the Spark environment in the predict function of an ML
>>> engine for text analysis. The algorithm extends P2LAlgorithm. The system
>>> runs on a standalone cluster.
>>>
>>> The predict function for a new query is as below:
>>>
>>> override def predict(model: NaiveBayesModel, query: Query): PredictedResult = {
>>>   // Reuse the existing SparkContext/SQLContext and build a one-row DataFrame from the query
>>>   val sc_new = SparkContext.getOrCreate()
>>>   val sqlContext = SQLContext.getOrCreate(sc_new)
>>>   val phraseDataframe = sqlContext.createDataFrame(Seq(query)).toDF("text")
>>>   val dpObj = new DataPreparator
>>>   val tf = dpObj.processPhrase(phraseDataframe)
>>>
>>>   tf.show()
>>>
>>>   // Extract the feature vectors and score them with the trained model
>>>   val labeledpoints = tf.map(row => row.getAs[Vector]("rowFeatures"))
>>>   val predictedResult = model.predict(labeledpoints)
>>>   return predictedResult
>>> }
>>>
>>> It trains properly with pio train, and when deployed it also predicts
>>> results correctly for a single query.
>>>
>>> But in the case of pio eval, when I try to check the accuracy of the model,
>>> it runs properly up to tf.show(); however, when the statement that forms the
>>> labeled points is reached, it gets stuck, and after a long wait it reports
>>> that the Spark executor was lost and no heartbeat was received.
>>> Here is the error log:
>>>
>>> WARN org.apache.spark.HeartbeatReceiver [sparkDriver-akka.actor.default-dispatcher-14] - Removing executor driver with no recent heartbeats: 686328 ms exceeds timeout 120000 ms
>>>
>>> ERROR org.apache.spark.scheduler.TaskSchedulerImpl [sparkDriver-akka.actor.default-dispatcher-14] - Lost executor driver on localhost: Executor heartbeat timed out after 686328 ms
>>>
>>> WARN org.apache.spark.scheduler.TaskSetManager [sparkDriver-akka.actor.default-dispatcher-14] - Lost task 3.0 in stage 103.0 (TID 237, localhost): ExecutorLostFailure (executor driver lost)
>>>
>>> ERROR org.apache.spark.scheduler.TaskSetManager [sparkDriver-akka.actor.default-dispatcher-14] - Task 3 in stage 103.0 failed 1 times; aborting job
>>> ......
>>> org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
>>>
>>> Please suggest how to solve this issue.
>>>
>>> Thank you.
>>> Regards,
>>> Bansari Shah
>>>
>
>
> --
> Thank you,
> Felipe
>
> http://geeks.aretotally.in
> http://twitter.com/_felipera
>
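As a concrete starting point for the GC and timeout investigation suggested above, here is a minimal sketch of the settings usually involved. It assumes your PredictionIO version forwards the arguments after "--" to spark-submit (as pio train does); the evaluation class names are placeholders for your own classes, and the values are illustrative rather than recommendations:

    # org.example.* below are hypothetical placeholders; substitute your own Evaluation
    # and EngineParamsGenerator classes.
    pio eval org.example.AccuracyEvaluation org.example.EngineParamsList -- \
      --driver-memory 4G \
      --driver-java-options "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
      --conf spark.network.timeout=600s \
      --conf spark.executor.heartbeatInterval=30s

    # --driver-memory: the executor id "driver" in the log suggests the executor runs inside the
    #   driver JVM (local mode), so extra driver heap directly reduces its GC pressure.
    # --driver-java-options: prints GC activity for that same JVM so long pauses can be matched
    #   against the lost-heartbeat timestamps (for separate executor JVMs, the equivalent would
    #   be spark.executor.extraJavaOptions).
    # spark.network.timeout: raises the 120000 ms heartbeat timeout reported in the log.
    # spark.executor.heartbeatInterval: how often heartbeats are sent; keep it well below the timeout.

If the GC output shows long full-GC pauses right before the heartbeats stop, raising the timeout only buys time; giving the driver more memory or processing less data per task is the more durable fix.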
