rdblue commented on pull request #28026: URL: https://github.com/apache/spark/pull/28026#issuecomment-624767861
First, I should address the two new concerns.

SKEWED BY: The difference between external and skewed by is that external is partially supported, while skewed by is not supported at all; it always fails, and we keep that behavior. Support can be added incrementally later. In contrast, external is partially supported: Spark makes an arbitrary choice to disallow something that is valid to a catalog and should consequently be delegated to the catalog.

PARTITION BY: The other concern is the partition by syntax, but I think we all agree that what is done here is the reasonable way to handle it: a superset of the syntax for both the Hive and Spark flavors.

I don't see what there is to disagree with on those two or how they aren't safe.

> Now I start to feel this may be too risky for 3.0 and we may need more time for things like voting.

What, exactly, is the risk here? I mean: what is a plausible problem that this could cause?

We agree that unified syntax is the right path forward. I don't think anyone is suggesting a different solution to the skewed-by and partitioned-by changes. As far as I can tell, all we disagree on is whether Spark should pass external to a catalog.

That's not to say that I don't support a vote or discussion on the dev list. By all means, go for it to ensure the community is fine with the 3.0 changes. Maybe we should focus on master here and leave the backport to 3.0 until after we reach consensus on the dev list.
