I would investigate GC in the JVM of your executors; it is very common to see that error during large GC pauses.
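One way to check is to enable GC logging on the executors and raise the heartbeat/network timeouts so a long pause does not get the executor removed. A sketch (the evaluation and params class names are placeholders for your own; the `--conf` keys are standard Spark settings passed through after `--`):

```shell
# Enable executor GC logging and loosen heartbeat timeouts for pio eval.
# com.example.AccuracyEvaluation / com.example.EngineParamsList are
# hypothetical -- substitute your engine's classes.
pio eval com.example.AccuracyEvaluation com.example.EngineParamsList -- \
  --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.executor.heartbeatInterval=30s \
  --conf spark.network.timeout=600s \
  --executor-memory 4g
```

If the GC log shows long full-GC pauses right before the lost heartbeat, more executor memory (or a smaller per-task working set) is the usual fix; the timeout bump only buys headroom.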
On Thu, Jan 19, 2017 at 9:44 AM, Donald Szeto <[email protected]> wrote:

> Do you have more detailed logs from the Spark executors?
>
> Regards,
> Donald
>
> On Wed, Dec 14, 2016 at 3:26 AM Bansari Shah <[email protected]> wrote:
>
>> Hi all,
>>
>> I am trying to use the Spark environment in the predict function of an ML
>> engine for text analysis. It extends the P2LAlgorithm algorithm. The
>> system runs on a standalone cluster.
>>
>> The predict function for a new query is as below:
>>
>> override def predict(model: NaiveBayesModel, query: Query): PredictedResult = {
>>   val sc_new = SparkContext.getOrCreate()
>>   val sqlContext = SQLContext.getOrCreate(sc_new)
>>   val phraseDataframe = sqlContext.createDataFrame(Seq(query)).toDF("text")
>>   val dpObj = new DataPreparator
>>   val tf = dpObj.processPhrase(phraseDataframe)
>>
>>   tf.show()
>>
>>   val labeledpoints = tf.map(row => row.getAs[Vector]("rowFeatures"))
>>   val predictedResult = model.predict(labeledpoints)
>>   predictedResult
>> }
>>
>> It trains properly with pio train, and when deployed it also predicts
>> results properly for a single query.
>>
>> But with pio eval, when I try to check the accuracy of the model, it runs
>> up to tf.show() properly; when it reaches the statement forming the
>> labeled points, it gets stuck, and after a long wait it reports an error
>> that the Spark executor was lost and no heartbeat was received.
>> Here is the error log:
>>
>> WARN org.apache.spark.HeartbeatReceiver [sparkDriver-akka.actor.default-dispatcher-14]
>> - Removing executor driver with no recent heartbeats: 686328 ms exceeds timeout 120000 ms
>>
>> ERROR org.apache.spark.scheduler.TaskSchedulerImpl [sparkDriver-akka.actor.default-dispatcher-14]
>> - Lost executor driver on localhost: Executor heartbeat timed out after 686328 ms
>>
>> WARN org.apache.spark.scheduler.TaskSetManager [sparkDriver-akka.actor.default-dispatcher-14]
>> - Lost task 3.0 in stage 103.0 (TID 237, localhost): ExecutorLostFailure (executor driver lost)
>>
>> ERROR org.apache.spark.scheduler.TaskSetManager [sparkDriver-akka.actor.default-dispatcher-14]
>> - Task 3 in stage 103.0 failed 1 times; aborting job
>> ......
>> org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
>>
>> Please suggest how I can solve this issue.
>>
>> Thank you.
>> Regards,
>> Bansari Shah

--
Thank you,
Felipe
http://geeks.aretotally.in
http://twitter.com/_felipera
