Hello,

I am experiencing the following problem with Spark.

My application runs properly for very small datasets (6 MB), but fails for
datasets larger than 12 MB.

With those larger datasets, the main log shows the following errors for
all of my executors. The application (launched with sbt) hangs until I
terminate it with Ctrl-C, but the Web UI shows a FAILED state, with only
4 executors (it started with 5), whose states are shown as KILLED. These
messages and failures appear almost immediately after launching my
application.

  INFO cluster.SparkDeploySchedulerBackend: Executor 1 disconnected, so removing it
  ERROR cluster.ClusterScheduler: Lost executor 1 on OFW4: remote Akka client shutdown
  ...
  WARN cluster.ClusterTaskSetManager: Lost TID 0 (task 2.0:0)
  INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 0)

The same warnings and errors get logged a few times, then all processing
stops. The executor stderr only shows the following error, which does not
explain why my executors keep disconnecting with larger datasets.

  INFO server.AbstractConnector: Started [email protected]:59749
  ERROR executor.CoarseGrainedExecutorBackend: Driver terminated or disconnected! Shutting down.

After closer inspection, I pinpointed the problem to a partitionBy
transformation.

My current application looks like this:

  input dataset (HDFS) -> sample
    -> map
      -> union (other RDD coming from same file with similar lineage) 
        -> partitionBy
          -> first (for debugging)
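
In code, the pipeline above corresponds roughly to the following simplified
sketch. It is not my actual code: the object name, the toPair key
extraction, the sampling fraction and seeds, the partition count, and the
default master URL and input path are all placeholders chosen for
illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older Spark releases

object PartitionByRepro {
  // Hypothetical key extraction: everything before the first comma is the key.
  def toPair(line: String): (String, String) = (line.takeWhile(_ != ','), line)

  def main(args: Array[String]): Unit = {
    // Placeholder master URL and input path; pass the real ones as arguments.
    val master    = if (args.length > 0) args(0) else "local[2]"
    val inputPath = if (args.length > 1) args(1) else "hdfs:///path/to/input"

    val sc = new SparkContext(master, "PartitionByRepro")
    try {
      val lines = sc.textFile(inputPath)

      // Two branches derived from the same file, with similar lineage.
      val a = lines.sample(false, 0.1, 42).map(toPair)
      val b = lines.sample(false, 0.1, 43).map(toPair)

      // partitionBy needs an RDD of key/value pairs and an explicit partitioner.
      val partitioned = a.union(b).partitionBy(new HashPartitioner(4))

      // first() forces evaluation (debugging only); it throws on an empty RDD.
      println(partitioned.first())
    } finally {
      sc.stop()
    }
  }
}
```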

Note that, for debugging purposes, I am also loading the entire contents of
my input file into application memory using Scala's Source.fromFile API.
Could this have anything to do with the above failures?
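
For reference, the debugging read is along these lines. Again, this is a
sketch rather than my exact code: the object and method names are
placeholders, and the UTF-8 encoding is an assumption. It runs entirely in
the driver JVM, outside Spark, and holds the whole file on the heap as a
single String.

```scala
import scala.io.Source

object LoadWholeFile {
  // Reads the complete file into one String on the local JVM heap.
  def readAll(path: String): String = {
    val src = Source.fromFile(path, "UTF-8") // encoding is an assumption
    try src.mkString   // materializes every character of the file
    finally src.close() // release the file handle even if reading fails
  }

  def main(args: Array[String]): Unit = {
    // Expects the file path as the first argument.
    val contents = readAll(args(0))
    println("Loaded " + contents.length + " characters into memory")
  }
}
```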

If not, any idea what could be causing the executor disconnections?
How could I get more detailed debugging information to help me investigate
this issue further?

Any help would be greatly appreciated,
emeric

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-errors-Executor-X-disconnected-so-removing-it-tp1162.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.