gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks 
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-519802295
 
 
   @squito Yes, it saves us a lot. In a TPC-DS 1T benchmark, 30% of queries get 
a 1.1x+ performance boost and 13% get a 1.2x+ performance boost. There is still 
a remote read, but only once (if index files are not swapped out because of 
insufficient cache space), and this feature can take advantage of the internal 
network bandwidth inside the computing cluster, relieving the compute-storage 
network, which may be the bottleneck of the workload.
   
   By 'reasonable balance', did you mean not considering complex conditions? I 
think it would be beneficial to make that clear through discussion. Making 
this work in the long term is also fine by me. My point was that the current 
solution of avoiding map task resubmission by modifying `MapStatus` is not 
enough, because it only considers what Executors tell the Driver about the 
location of the map outputs (tasks). We should also give the Driver the option 
(for example, by adding a config like this PR did) to not resubmit map tasks 
even if it knows (from a non-empty `MapStatus` location) which ones to resubmit.
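
   To make the distinction concrete, here is a rough, language-neutral sketch 
of the two decisions. It is not Spark code: `MapOutput`, `maps_to_resubmit` and 
the `resubmit_on_executor_loss` flag are hypothetical names made up for 
illustration, and the real scheduler logic is more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MapOutput:
    """Hypothetical stand-in for what the Driver tracks per map task."""
    map_id: int
    # Executor that reported the output, or None when the reported status
    # carries no executor location (the MapStatus-based approach).
    location: Optional[str]


def maps_to_resubmit(outputs: List[MapOutput],
                     lost_executor: str,
                     resubmit_on_executor_loss: bool = True) -> List[int]:
    """Return the ids of map tasks the Driver would resubmit.

    resubmit_on_executor_loss is an assumed config flag, standing in for
    the kind of switch this PR adds. It is not a real Spark setting name.
    """
    if not resubmit_on_executor_loss:
        # Driver-side override: never resubmit, even when the location
        # tells us exactly which outputs lived on the lost executor.
        return []
    # Location-only approach: resubmit whatever was reported on that executor.
    return [o.map_id for o in outputs if o.location == lost_executor]


outputs = [MapOutput(0, "exec-1"), MapOutput(1, "exec-2"), MapOutput(2, None)]
print(maps_to_resubmit(outputs, "exec-1"))
print(maps_to_resubmit(outputs, "exec-1", resubmit_on_executor_loss=False))
```

   With only the location check, map 0 is resubmitted because its reported 
location matches the lost executor. With the flag turned off, the Driver skips 
resubmission regardless of what the location says, which is the extra control 
I am arguing for.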
    

