rdblue commented on pull request #29066: URL: https://github.com/apache/spark/pull/29066#issuecomment-735942977
> I am interested in what other devs think and whether we are OK breaking the existing API.

Since the other API is targeted at the read path, I would have no problem adding this one in parallel under a `write` package. I think that we should deprecate the read-side distribution because it doesn't really help with bucketed joins. I'm also fine changing the existing API, but I'd rather just deprecate it and remove it when we have a replacement for bucketed joins and other read-side optimizations.

> Probably worth to raise a discussion in dev@ mailing list?

Yes. But if we want to get this into 3.1.0, we should start moving on everything in parallel. We should start getting the addition of `Write` done because it needs to carry the `RequiresDistributionAndSort` interface no matter what we decide about `Distribution`. And we can at least get a WIP PR up to add the new distribution interfaces.
