cloud-fan commented on issue #24117: [SPARK-27181][SQL]: Add public transform API
URL: https://github.com/apache/spark/pull/24117#issuecomment-568891483

The partitioning expressions need to be public because they are used in DS v2, which is why we created a public `Expression` interface. It's kind of a copy of the internal catalyst expressions, but for now there are only a few public expressions, and we plan to add more in the future. Adding new public expressions is backward compatible.

But I do agree with the concern from @HyukjinKwon about how we are going to extend this in the future. For now the parser is pretty strict about the partitioning expression: it can only be a column name or a function call on a column name. I think that's good enough; it looks weird to me to support "partitioned by a + b".

However, I'm a little worried about `ApplyTransform`, which just passes arbitrary function names specified by end users to the data source, without well-defined semantics. Imagine we add a new transform called `Second`, whose function name is "second". Then in the new version a data source would get `Second`, while in the old version it got `ApplyTransform`. This is not backward compatible.

@rdblue what do you think?
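To make the `ApplyTransform` concern concrete, here is a minimal sketch in Java. All types and methods in it (`Transform`, `ApplyTransform`, `Second`, `parseOld`, `parseNew`, `oldSourceAccepts`) are hypothetical stand-ins written for illustration, not the classes in this PR. It assumes a data source that recognizes "second" by checking for the generic `ApplyTransform` class, and shows how that check would stop matching once a dedicated `Second` transform is introduced.

```java
final class TransformCompatSketch {
    // Hypothetical stand-in for the public transform interface (not Spark's actual API).
    interface Transform {
        String name();
        String column();
    }

    // Generic fallback: carries whatever function name the end user wrote.
    static final class ApplyTransform implements Transform {
        private final String name;
        private final String column;
        ApplyTransform(String name, String column) { this.name = name; this.column = column; }
        public String name() { return name; }
        public String column() { return column; }
    }

    // Dedicated transform that a later release might add for the "second" function.
    static final class Second implements Transform {
        private final String column;
        Second(String column) { this.column = column; }
        public String name() { return "second"; }
        public String column() { return column; }
    }

    // "Old" behavior (assumed): every function call becomes an ApplyTransform.
    static Transform parseOld(String fn, String col) {
        return new ApplyTransform(fn, col);
    }

    // "New" behavior (assumed): "second" now maps to the dedicated Second class.
    static Transform parseNew(String fn, String col) {
        return fn.equals("second") ? new Second(col) : new ApplyTransform(fn, col);
    }

    // A data source written against the old version, matching on the concrete class.
    static boolean oldSourceAccepts(Transform t) {
        return t instanceof ApplyTransform && t.name().equals("second");
    }

    public static void main(String[] args) {
        System.out.println(oldSourceAccepts(parseOld("second", "ts"))); // prints true
        System.out.println(oldSourceAccepts(parseNew("second", "ts"))); // prints false
    }
}
```

The same user-written expression, `second(ts)`, reaches the data source as a different class depending on the version, so a source that matches on the class silently stops recognizing it.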
