maheshk114 commented on PR #45228:
URL: https://github.com/apache/spark/pull/45228#issuecomment-2031012182

   > > > I am also concerned about the performance.
   > > > I think the best would be if the migration of shuffle data to external 
storage would only kick in when the scale down is aggressive. This can be 
decided by checking the ratio of the number of available peers 
(non-decommissioning executors) and the number of decommissioning executors. In 
that case the parameter would not be a single boolean flag but a threshold for 
the ratio.
   > > > @maheshk114 WDYT?
   > > 
   > > 
   > > It's not only about performance; it is also useful when the nodes are not very 
reliable. So I think we should also have a boolean flag to allow the user to choose 
to migrate the shuffle directly to external storage.
   > 
   > With a threshold you can force the migration to external storage 
every time (or even never), so the flag won't be needed.
   
   @attilapiros This will work for our use case, as we will be setting it 
either to 0% or 100%. In normal scenarios, it should be set to more than 50%: 
when more than 50% of the executors are decommissioning, migration to external 
storage would be enabled. But as of now, Spark decommissions an executor as and 
when it is found idle for longer than the idle timeout, or when the cluster 
manager decides to decommission the node for some other reason. So there is no 
way to know in advance how many executors will be decommissioned in total. With 
this logic, we would start migrating to peers and, only later, once the count 
reaches the threshold, switch to migrating to external storage. This would result 
in extra movement of shuffle data, which is exactly what we want to avoid in the 
first place.
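
   To make the trade-off concrete, here is a minimal sketch (not the actual PR 
implementation; the function and threshold parameter are hypothetical) of the 
ratio-based decision being discussed. Note that because Spark only learns about 
decommissioning executors incrementally, this decision can flip from "peer" to 
"external" mid-way, which is the extra shuffle movement described above.

   ```python
   # Hypothetical sketch of the proposed ratio-threshold decision:
   # choose where to migrate shuffle blocks based on the fraction of
   # executors currently decommissioning. Not Spark's actual API.

   def choose_migration_target(num_decommissioning: int,
                               num_total: int,
                               threshold: float = 0.5) -> str:
       """Return 'external' when the decommissioning ratio meets the
       threshold, else 'peer'. threshold=0.0 forces external storage
       every time; a threshold above 1.0 effectively disables it."""
       if num_total == 0:
           return "external"  # no peers left to receive blocks
       ratio = num_decommissioning / num_total
       return "external" if ratio >= threshold else "peer"

   # With the default 50% threshold, an aggressive scale-down (6 of 10
   # executors leaving) goes to external storage, while a mild one
   # (2 of 10) still uses peer migration.
   print(choose_migration_target(6, 10))   # external
   print(choose_migration_target(2, 10))   # peer
   ```

   Setting the threshold to 0.0 reproduces the "always external" behavior the 
boolean flag was meant to provide, which is why a single threshold could subsume 
the flag.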


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

