[jira] [Updated] (SPARK-25775) Race between end-of-task and completion iterator read lock release

2018-10-18 Thread Ablimit A. Keskin (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ablimit A. Keskin updated SPARK-25775:
--
Description: 
The following issue comes from a production Spark job in which executors die
due to uncaught exceptions during block release. When the job is run with a
specific configuration for "--executor-cores" and "--total-executor-cores"
(e.g. 4 & 8 or 1 & 8), it consistently fails at the same code segment. The
logs below are from our run:

 
{code:java}
18/10/18 23:06:18 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 27 (PythonRDD[94] at RDD at PythonRDD.scala:49) (first 15 tasks are for partitions Vector(0))
18/10/18 23:06:18 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
18/10/18 23:06:18 INFO TaskSetManager: Starting task 0.0 in stage 27.0 (TID 112, 10.248.7.2, executor 0, partition 0, PROCESS_LOCAL, 5585 bytes)
18/10/18 23:06:18 INFO BlockManagerInfo: Added broadcast_37_piece0 in memory on 10.248.7.2:44871 (size: 9.1 KB, free: 13.0 GB)
18/10/18 23:06:24 WARN TaskSetManager: Lost task 0.0 in stage 27.0 (TID 112, 10.248.7.2, executor 0): java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:156)
    at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
    at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
    at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

18/10/18 23:06:24 INFO TaskSetManager: Starting task 0.1 in stage 27.0 (TID 113, 10.248.7.2, executor 0, partition 0, PROCESS_LOCAL, 5585 bytes)
18/10/18 23:06:31 INFO TaskSetManager: Lost task 0.1 in stage 27.0 (TID 113) on 10.248.7.2, executor 0: java.lang.AssertionError (assertion failed) [duplicate 1]
18/10/18 23:06:31 INFO TaskSetManager: Starting task 0.2 in stage 27.0 (TID 114, 10.248.7.2, executor 0, partition 0, PROCESS_LOCAL, 5585 bytes)
18/10/18 23:06:31 ERROR TaskSchedulerImpl: Lost executor 0 on 10.248.7.2: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
18/10/18 23:06:31 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20181018230546-0040/0 is now EXITED (Command exited with code 50)
18/10/18 23:06:31 INFO StandaloneSchedulerBackend: Executor app-20181018230546-0040/0 removed: Command exited with code 50
18/10/18 23:06:31 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20181018230546-0040/2 on worker-20181018173742-10.248.110.2-40787 (10.248.110.2:40787) with 4 cores
18/10/18 23:06:31 INFO StandaloneSchedulerBackend: Granted executor ID app-20181018230546-0040/2 on hostPort 10.248.110.2:40787 with 4 cores, 25.0 GB RAM
18/10/18 23:06:31 WARN TaskSetManager: Lost task 0.2 in stage 27.0 (TID 114, 10.248.7.2, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
18/10/18 23:06:31 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20181018230546-0040/2 is now RUNNING
18/10/18 23:06:31 INFO DAGScheduler: Executor lost: 0 (epoch 11)
18/10/18 23:06:31 INFO BlockManagerMaster: Removal of executor 0 requested
18/10/18 23:06:31 INFO BlockManagerMasterEndpoint: Trying to remove executor 0 from BlockManagerMaster.
18/10/18 23:06:31 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 0
18/10/18 23:06:31 INFO TaskSetManager: Starting task 0.3 in stage 27.0 (TID 115, 10.248.21.2, executor 1, partition 0, ANY, 5585 bytes)
18/10/18 23:06:31 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_27_2 !
18/10/18 23:06:31 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_44_7 !
18/10/18 23:06:31 WARN
{code}
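
For context on the failure above: the assertion at BlockInfoManager.scala:84 guards per-block lock bookkeeping, including a reader count that must never go negative, and the race named in the summary suggests the same read lock can be released twice, once by the completion iterator's cleanup callback and once by the end-of-task releaseAllLocksForTask sweep visible in the stack trace. The snippet below is a minimal, self-contained sketch of that double-release pattern, not Spark's code; ToyBlockInfo and its methods are invented names used only for illustration.

{code:scala}
object ReadLockRaceSketch {

  // Toy stand-in for per-block lock bookkeeping; not Spark's BlockInfo.
  final class ToyBlockInfo {
    private var readerCount = 0

    // Same shape as the invariant that fails at BlockInfoManager.scala:84:
    // a block's reader count must never become negative.
    private def checkInvariants(): Unit =
      assert(readerCount >= 0, "reader count went negative")

    def acquireRead(): Unit = synchronized {
      readerCount += 1
      checkInvariants()
    }

    def releaseRead(): Unit = synchronized {
      readerCount -= 1
      checkInvariants() // throws java.lang.AssertionError on a double release
    }
  }

  def main(args: Array[String]): Unit = {
    val block = new ToyBlockInfo
    block.acquireRead() // the task takes a read lock on an RDD block

    // Release path 1: the completion iterator's cleanup callback fires once
    // the block's values have been fully consumed.
    block.releaseRead()

    // Release path 2: end-of-task cleanup (analogous to the
    // releaseAllLocksForTask call in the stack trace) releases every lock it
    // still has recorded for the task. If its bookkeeping still lists the
    // lock released above, the same block is released a second time.
    try block.releaseRead()
    catch {
      case e: AssertionError =>
        println(s"double release tripped the invariant: ${e.getMessage}")
    }
  }
}
{code}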

[jira] [Updated] (SPARK-25775) Race between end-of-task and completion iterator read lock release

2018-10-18 Thread Ablimit A. Keskin (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-25775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ablimit A. Keskin updated SPARK-25775:
--
Affects Version/s: 2.2.2 (was: 2.2.0)

> Race between end-of-task and completion iterator read lock release
> --
>
> Key: SPARK-25775
> URL: https://issues.apache.org/jira/browse/SPARK-25775
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.2.2
>Reporter: Ablimit A. Keskin
>Priority: Major


[jira] [Created] (SPARK-25775) Race between end-of-task and completion iterator read lock release

2018-10-18 Thread Ablimit A. Keskin (JIRA)
Ablimit A. Keskin created SPARK-25775:
-

 Summary: Race between end-of-task and completion iterator read lock release
 Key: SPARK-25775
 URL: https://issues.apache.org/jira/browse/SPARK-25775
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Spark Core
Affects Versions: 2.2.0
Reporter: Ablimit A. Keskin
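
For readers unfamiliar with the "completion iterator" named in the summary: Spark wraps block-reading iterators so that a cleanup callback, which releases the block's read lock, runs as soon as the iterator is fully consumed, while a separate end-of-task path releases whatever locks the task still holds. The sketch below shows only that wrapper pattern; ToyCompletionIterator and the demo names are invented for illustration and are not Spark's CompletionIterator implementation.

{code:scala}
// A wrapper that runs a cleanup callback exactly once, when the underlying
// iterator is exhausted. Illustrative only; not Spark's CompletionIterator.
final class ToyCompletionIterator[A](sub: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {

  private var completed = false

  override def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) {
      completed = true
      onComplete() // e.g. release the block's read lock here
    }
    more
  }

  override def next(): A = sub.next()
}

object ToyCompletionIteratorDemo {
  def main(args: Array[String]): Unit = {
    val rows = new ToyCompletionIterator(Iterator(1, 2, 3),
      () => println("read lock released by the completion callback"))
    rows.foreach(println) // the callback fires once the data is drained

    // If an end-of-task sweep later releases the same lock again because its
    // bookkeeping was not updated in time, the reader count is decremented
    // twice, producing the assertion failure shown in the stack trace.
  }
}
{code}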

