maropu commented on pull request #29655: URL: https://github.com/apache/spark/pull/29655#issuecomment-692409151
> We added #19054 in our internal fork and don't see many OOM issues. Even so, I think removing shuffles in the middle of stages (e.g., in many join cases) can, in theory, raise the probability of OOM when the data is skewed. Since we can control input distributions to some extent, e.g., with the bucketing technique, I think it might be worth trying the restrictive approach that @imback82 suggested above.
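To make the skew concern concrete, here is a toy simulation. It is not Spark code, and everything in it is hypothetical: the row counts, the 8 partitions, and the simple polynomial hash that stands in for Spark's Murmur3-based hash partitioning. It compares two cases. In the first, a child that is already partitioned on a subset of the join keys (`a` only) is reused for a join on `(a, b)` without a shuffle. In the second, the child is shuffled on the full join keys. When 90% of rows share one value of `a`, reusing the subset partitioning leaves them all in a single partition, while a full shuffle spreads them by `b`.

```python
from collections import Counter
from typing import Callable, List, Tuple

NUM_PARTITIONS = 8  # hypothetical number of shuffle partitions

Row = Tuple[int, int]  # (a, b), both hypothetical join key columns


def partition_of(key: Tuple[int, ...], num_partitions: int) -> int:
    """Toy stand-in for hash partitioning (Spark itself uses Murmur3)."""
    h = 0
    for part in key:
        h = h * 31 + part
    return h % num_partitions


def make_skewed_rows() -> List[Row]:
    """Hypothetical data: 9,000 of 10,000 rows share the hot value a = 0."""
    rows = []
    for i in range(10_000):
        a = 0 if i < 9_000 else 1 + i % 500  # a = 0 is the skewed key
        b = i % 100                          # b is evenly spread
        rows.append((a, b))
    return rows


def max_partition_size(rows: List[Row],
                       key_fn: Callable[[Row], Tuple[int, ...]],
                       num_partitions: int = NUM_PARTITIONS) -> int:
    """Size of the largest partition, i.e. the task most likely to OOM."""
    counts = Counter(partition_of(key_fn(r), num_partitions) for r in rows)
    return max(counts.values())


rows = make_skewed_rows()
# Child already partitioned on `a` only; the (a, b) join reuses it, no shuffle.
reuse_subset = max_partition_size(rows, lambda r: (r[0],))
# Shuffle on the full join keys (a, b); the hot key is split up by b.
full_shuffle = max_partition_size(rows, lambda r: (r[0], r[1]))
print("largest partition, no shuffle:", reuse_subset)
print("largest partition, full shuffle:", full_shuffle)
```

In this toy setup, the largest partition without the shuffle holds every hot-key row. With the shuffle, the largest partition is several times smaller. Removing the shuffle trades less data movement for a larger worst-case task. Controlling the input distribution, for example through bucketing, is one way to keep that trade-off safe.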
