cloud-fan commented on issue #24117: [SPARK-27181][SQL]: Add public transform 
API
URL: https://github.com/apache/spark/pull/24117#issuecomment-568891483
 
 
   The partitioning expressions need to be public because they are used in DS 
v2; that's why we created a public `Expression` interface. It is more or less 
a copy of the internal catalyst expressions, but for now there are only a few 
public expressions, and we plan to add more in the future. Adding new public 
expressions is backward compatible.
   
   But I do agree with the concern from @HyukjinKwon about how we are going to 
extend this in the future. For now the parser is pretty strict about the 
partitioning expression: it can only be a column name or a function call on a 
column name. I think that's good enough; it looks weird to me to support 
"partitioned by a + b". However, I'm a little worried about `ApplyTransform`, 
which just passes arbitrary function names specified by end users to the data 
source, without well-defined semantics. Imagine we add a new transform called 
`Second`, whose function name is "second". In the new version the data source 
would get `Second`, while in the old version it got `ApplyTransform` for the 
same user input. This is not backward compatible.
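   
   To make the concern concrete, here is a minimal Java sketch of what could 
go wrong. None of this is the actual Spark API: `Transform`, `ApplyTransform` 
and `Second` below are simplified, hypothetical stand-ins, and `resolveOld` / 
`resolveNew` only imitate how two Spark versions might map the same user input 
to a transform object.

```java
import java.util.Collections;
import java.util.List;

public class TransformCompatSketch {
    // Hypothetical, simplified stand-in for a public transform interface.
    interface Transform {
        String name();
        List<String> columns();
    }

    // Generic transform: carries whatever function name the user typed.
    static final class ApplyTransform implements Transform {
        private final String name;
        private final List<String> columns;
        ApplyTransform(String name, List<String> columns) {
            this.name = name;
            this.columns = columns;
        }
        public String name() { return name; }
        public List<String> columns() { return columns; }
    }

    // Dedicated transform that a later version might introduce (assumption).
    static final class Second implements Transform {
        private final List<String> columns;
        Second(String column) { this.columns = Collections.singletonList(column); }
        public String name() { return "second"; }
        public List<String> columns() { return columns; }
    }

    // "Old" version: every function call becomes an ApplyTransform.
    static Transform resolveOld(String fn, String column) {
        return new ApplyTransform(fn, Collections.singletonList(column));
    }

    // "New" version: "second" now resolves to the dedicated class.
    static Transform resolveNew(String fn, String column) {
        if (fn.equals("second")) {
            return new Second(column);
        }
        return new ApplyTransform(fn, Collections.singletonList(column));
    }

    // A data source written against the old version, dispatching on the class.
    static String sourceByClass(Transform t) {
        if (t instanceof ApplyTransform && t.name().equals("second")) {
            return "second(" + t.columns().get(0) + ")";
        }
        return "unsupported";
    }

    // A more defensive data source that dispatches on the function name only.
    static String sourceByName(Transform t) {
        if (t.name().equals("second")) {
            return "second(" + t.columns().get(0) + ")";
        }
        return "unsupported";
    }

    public static void main(String[] args) {
        System.out.println(sourceByClass(resolveOld("second", "ts"))); // second(ts)
        System.out.println(sourceByClass(resolveNew("second", "ts"))); // unsupported
        System.out.println(sourceByName(resolveNew("second", "ts")));  // second(ts)
    }
}
```

   In this sketch the data source that matches on the `ApplyTransform` class 
stops recognizing "second" once the dedicated class exists, while the one that 
matches on the function name keeps working. Whether real data sources would be 
written one way or the other is exactly the part that is not well defined.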
   
   @rdblue what do you think?

