attilapiros commented on PR #45228: URL: https://github.com/apache/spark/pull/45228#issuecomment-2029191559
> > I am also concerned about the performance. > > I think the best would be if the migration of shuffle data to external storage would only kick in when the scale down is aggressive. This can be decided by checking the ratio of the number of available peers (non-decommissioning executors) and the number of decommissioning executors. In that case the parameter would not be a single boolean flag but a threshold for the ratio. > > @maheshk114 WDYT? > > Its not only performance but also useful when the nodes are not very reliable. So I think we should have a Boolean flag also to allow user to chose to migrate the shuffle directly to external storage. With a threshold you can force to do the migration to the external storage every time (or even never) so the flag won't be needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
