attilapiros commented on a change in pull request #24499: [SPARK-25888][Core] Serve local disk persisted blocks by the external service after releasing executor by dynamic allocation
URL: https://github.com/apache/spark/pull/24499#discussion_r280070633
 
 

 ##########
 File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
 ##########
 @@ -179,6 +179,18 @@ public ManagedBuffer getBlockData(
     return getSortBasedShuffleBlockData(executor, shuffleId, mapId, reduceId);
   }
 
+  public ManagedBuffer getBlockData(
 
 Review comment:
   I have tried to reproduce the problem, and in the meantime I have come to think you are absolutely right: since non-shuffle files are deleted, a recomputation must be triggered. But in my test the blocks are somehow still found.
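To make the expectation concrete, here is a minimal, hypothetical Java model of a "delete non-shuffle files" cleanup. It assumes that shuffle output is recognised only by the `.data`/`.index` suffixes and that persisted RDD blocks are plain files named like `rdd_1_0` under the executor's local dirs. The class, method and directory names are illustrative, and this is not the actual code of `ExternalShuffleBlockResolver`. Under these assumptions a persisted RDD block file cannot survive the cleanup, which is why a recomputation would be expected:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of the cleanup, not Spark's implementation.
final class NonShuffleCleanupSketch {

  // Assumption: shuffle output is recognised purely by the ".data"/".index" suffixes.
  static boolean isShuffleFile(String name) {
    return name.endsWith(".data") || name.endsWith(".index");
  }

  // Deletes every regular file under localDir that is not a shuffle file
  // (directories are kept) and returns the names of the deleted files.
  static List<String> deleteNonShuffleFiles(Path localDir) throws IOException {
    List<Path> files;
    try (Stream<Path> walk = Files.walk(localDir)) {
      files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    List<String> deleted = new ArrayList<>();
    for (Path file : files) {
      String name = file.getFileName().toString();
      if (!isShuffleFile(name)) {
        Files.delete(file);
        deleted.add(name);
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical layout mimicking a block manager dir with one hashed sub-directory.
    Path localDir = Files.createTempDirectory("blockmgr-sketch");
    Path subDir = Files.createDirectories(localDir.resolve("0c"));
    Files.createFile(subDir.resolve("rdd_1_0"));             // persisted RDD block
    Files.createFile(subDir.resolve("shuffle_0_0_0.data"));  // shuffle data file
    Files.createFile(subDir.resolve("shuffle_0_0_0.index")); // shuffle index file

    System.out.println("deleted: " + deleteNonShuffleFiles(localDir));
    System.out.println("rdd_1_0 still present: " + Files.exists(subDir.resolve("rdd_1_0")));
  }
}
```

Running `main` should report that only `rdd_1_0` was deleted while both shuffle files remain, so in this model the RDD block can no longer be served from that executor's directories.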
   
   Here are parts of the logs (with 7 partitions, as I have 8 cores on this machine). These are the cleanups in the worker log:
   
   ```
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Cleaning up non-shuffle files in executor AppExecId{appId=app-20190501150515-0001, execId=5}'s 1 local dirs
   19/05/01 15:06:11 INFO Worker: Executor app-20190501150515-0001/4 finished with state KILLED exitStatus 143
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 4
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Cleaning up non-shuffle files in executor AppExecId{appId=app-20190501150515-0001, execId=4}'s 1 local dirs
   19/05/01 15:06:11 INFO Worker: Executor app-20190501150515-0001/6 finished with state KILLED exitStatus 143
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 6
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Cleaning up non-shuffle files in executor AppExecId{appId=app-20190501150515-0001, execId=6}'s 1 local dirs
   19/05/01 15:06:11 INFO Worker: Executor app-20190501150515-0001/3 finished with state KILLED exitStatus 143
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 3
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Cleaning up non-shuffle files in executor AppExecId{appId=app-20190501150515-0001, execId=3}'s 1 local dirs
   19/05/01 15:06:11 INFO Worker: Executor app-20190501150515-0001/7 finished with state KILLED exitStatus 143
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 7
   19/05/01 15:06:11 INFO ExternalShuffleBlockResolver: Cleaning up non-shuffle files in executor AppExecId{appId=app-20190501150515-0001, execId=7}'s 1 local dirs
   19/05/01 15:06:14 INFO Worker: Asked to kill executor app-20190501150515-0001/0
   19/05/01 15:06:14 INFO ExecutorRunner: Runner thread for executor app-20190501150515-0001/0 interrupted
   19/05/01 15:06:14 INFO ExecutorRunner: Killing process!
   19/05/01 15:06:14 INFO Worker: Executor app-20190501150515-0001/0 finished with state KILLED exitStatus 143
   19/05/01 15:06:14 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 0
   19/05/01 15:06:14 INFO ExternalShuffleBlockResolver: Cleaning up non-shuffle files in executor AppExecId{appId=app-20190501150515-0001, execId=0}'s 1 local dirs
   19/05/01 15:06:18 INFO Worker: Asked to kill executor app-20190501150515-0001/2
   19/05/01 15:06:18 INFO ExecutorRunner: Runner thread for executor app-20190501150515-0001/2 interrupted
   19/05/01 15:06:18 INFO ExecutorRunner: Killing process!
   19/05/01 15:06:18 INFO Worker: Executor app-20190501150515-0001/2 finished with state KILLED exitStatus 143
   19/05/01 15:06:18 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 2
   19/05/01 15:06:18 INFO ExternalShuffleBlockResolver: Cleaning up non-shuffle files in executor AppExecId{appId=app-20190501150515-0001, execId=2}'s 1 local dirs
   ```
   
   This part is from the end of my test, where the persisted RDD is referenced again. A new executor is started to execute these tasks, and this is from the stderr of that executor:
   
   ```
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_0 remotely
   19/05/01 15:06:52 INFO Executor: Finished task 0.0 in stage 2.0 (TID 11). 697 bytes result sent to driver
   19/05/01 15:06:52 INFO CoarseGrainedExecutorBackend: Got assigned task 12
   19/05/01 15:06:52 INFO Executor: Running task 1.0 in stage 2.0 (TID 12)
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_1 remotely
   19/05/01 15:06:52 INFO Executor: Finished task 1.0 in stage 2.0 (TID 12). 697 bytes result sent to driver
   19/05/01 15:06:52 INFO CoarseGrainedExecutorBackend: Got assigned task 13
   19/05/01 15:06:52 INFO Executor: Running task 2.0 in stage 2.0 (TID 13)
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_2 remotely
   19/05/01 15:06:52 INFO Executor: Finished task 2.0 in stage 2.0 (TID 13). 697 bytes result sent to driver
   19/05/01 15:06:52 INFO CoarseGrainedExecutorBackend: Got assigned task 14
   19/05/01 15:06:52 INFO Executor: Running task 3.0 in stage 2.0 (TID 14)
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_3 remotely
   19/05/01 15:06:52 INFO Executor: Finished task 3.0 in stage 2.0 (TID 14). 697 bytes result sent to driver
   19/05/01 15:06:52 INFO CoarseGrainedExecutorBackend: Got assigned task 15
   19/05/01 15:06:52 INFO Executor: Running task 4.0 in stage 2.0 (TID 15)
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_4 remotely
   19/05/01 15:06:52 INFO Executor: Finished task 4.0 in stage 2.0 (TID 15). 697 bytes result sent to driver
   19/05/01 15:06:52 INFO CoarseGrainedExecutorBackend: Got assigned task 16
   19/05/01 15:06:52 INFO Executor: Running task 5.0 in stage 2.0 (TID 16)
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_5 remotely
   19/05/01 15:06:52 INFO Executor: Finished task 5.0 in stage 2.0 (TID 16). 697 bytes result sent to driver
   19/05/01 15:06:52 INFO CoarseGrainedExecutorBackend: Got assigned task 17
   19/05/01 15:06:52 INFO Executor: Running task 6.0 in stage 2.0 (TID 17)
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_6 remotely
   19/05/01 15:06:52 INFO Executor: Finished task 6.0 in stage 2.0 (TID 17). 697 bytes result sent to driver
   19/05/01 15:06:52 INFO CoarseGrainedExecutorBackend: Got assigned task 18
   19/05/01 15:06:52 INFO Executor: Running task 7.0 in stage 2.0 (TID 18)
   19/05/01 15:06:52 INFO BlockManager: Found block rdd_1_7 remotely
   ```
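The `Found block rdd_1_N remotely` lines indicate that the new executor did not have these blocks locally and obtained each one from another location instead of recomputing it. A simplified, hypothetical Java model of that lookup order (local store, then remote, then recompute and cache locally) is sketched below. It assumes this order roughly mirrors Spark's `BlockManager.getOrElseUpdate`; all class, field and method names here are illustrative stand-ins, not Spark's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model of a block lookup, not Spark's BlockManager.
final class BlockLookupSketch {
  // Stand-ins for the executor's own store and for stores reachable over the
  // network (other executors or an external shuffle service).
  final Map<String, String> localStore = new HashMap<>();
  final Map<String, String> remoteStores = new HashMap<>();
  int recomputations = 0;

  Optional<String> getLocal(String blockId) {
    return Optional.ofNullable(localStore.get(blockId));
  }

  Optional<String> getRemote(String blockId) {
    return Optional.ofNullable(remoteStores.get(blockId));
  }

  // Local first, then remote; only if both miss is the block recomputed and cached locally.
  String getOrCompute(String blockId, Supplier<String> compute) {
    Optional<String> local = getLocal(blockId);
    if (local.isPresent()) {
      return local.get();
    }
    Optional<String> remote = getRemote(blockId);
    if (remote.isPresent()) {
      System.out.println("Found block " + blockId + " remotely");
      return remote.get();
    }
    recomputations++;
    String value = compute.get();
    localStore.put(blockId, value);
    return value;
  }

  public static void main(String[] args) {
    BlockLookupSketch bm = new BlockLookupSketch();
    bm.remoteStores.put("rdd_1_0", "partition 0"); // a copy survives somewhere else
    bm.getOrCompute("rdd_1_0", () -> "recomputed partition 0");
    bm.getOrCompute("rdd_1_1", () -> "recomputed partition 1");
    System.out.println("recomputations: " + bm.recomputations);
  }
}
```

In `main`, `rdd_1_0` is served from the remote map and only `rdd_1_1` takes the recompute branch, so the printed count is 1. In this model, if no remote copy of a block survived the cleanup, every partition would be recomputed instead of producing a "Found block ... remotely" line.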
    
   I will come back to this tomorrow, as I would like to understand exactly how this works. Otherwise, a possible fix (switching off the cleanup) would duplicate these blocks.
   
   
   
