Ngone51 commented on pull request #31298:
URL: https://github.com/apache/spark/pull/31298#issuecomment-766596881


   ```
   [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in stage 3.0 (TID 8, 192.168.1.57, executor 1): java.io.IOException: org.apache.spark.storage.BlockSavedOnDecommissionedBlockManagerException: Block broadcast_2_piece0 cannot be saved on decommissioned executor
   [info]   at
   ...
   ```
   
   Usually, a task that fails the maximum number of times fails for the same reason each time. But in the case of `BlockSavedOnDecommissionedBlockManagerException`, I would not expect all 4 failures to be caused by `BlockSavedOnDecommissionedBlockManagerException`, since we don't schedule tasks on a decommissioning executor. (Unless every one of the task failures happened before the decommission notice arrived at the driver, but that seems very unlikely.)
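   To make the suspected race concrete, here is a minimal sketch (not Spark's actual scheduler code; all names are hypothetical) of why a task could still land on a decommissioning executor: the driver can only exclude executors whose decommission notice it has already received.

   ```scala
   // Hypothetical sketch of the scheduling race, not Spark internals.
   object DecommissionRaceSketch {
     // Executor state as seen by the driver; the flag only flips once the
     // decommission notice has actually arrived at the driver.
     final case class ExecutorState(id: String, decommissioning: Boolean)

     // The driver skips only executors it *knows* are decommissioning.
     def schedulableExecutors(executors: Seq[ExecutorState]): Seq[String] =
       executors.filterNot(_.decommissioning).map(_.id)

     def main(args: Array[String]): Unit = {
       // Before the notice arrives, executor 1 still looks healthy, so a task
       // can be scheduled on it and then fail with
       // BlockSavedOnDecommissionedBlockManagerException.
       val beforeNotice = Seq(ExecutorState("1", decommissioning = false),
                              ExecutorState("2", decommissioning = false))
       println(schedulableExecutors(beforeNotice)) // List(1, 2)

       // After the notice, executor 1 is excluded, so a retry of the same
       // task should not hit the same exception again.
       val afterNotice = Seq(ExecutorState("1", decommissioning = true),
                             ExecutorState("2", decommissioning = false))
       println(schedulableExecutors(afterNotice)) // List(2)
     }
   }
   ```

   For all 4 attempts to fail with this exception, the "before notice" window would have to cover every retry, which is why the failure reasons of the earlier attempts matter.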
   
   So, to verify that we really don't schedule tasks on the decommissioning executor (or to find out whether there is a race condition, or this is just a coincidence), I'd like to know about the previous failures for this task (TID = 8): e.g., what were the failure reasons for the previous attempts, and on which executors were they scheduled?
   
   BTW, the change itself lgtm.

