rednaxelafx edited a comment on pull request #31509:
URL: https://github.com/apache/spark/pull/31509#issuecomment-775610985


   Just some random thoughts:
   
   This kind of functionality is certain useful for debugging and could come in 
handy for writing shorthands, but I have mixed feelings about exposing it on 
the SQL language extension level.
   i.e. I'd love to see this if we're designing a new language from scratch, 
but I'm not so sure about adding it to SQL.
   
   - In terms of the IR (intermediate representation) node, it's very common in 
programming languages or their compilers to feature this kind of expression 
type. Some example names are `BlockExpression` (e.g. in 
[System.Linq.Expressions.BlockExpression](https://docs.microsoft.com/en-us/dotnet/api/system.linq.expressions.blockexpression)),
 or C's [Comma expression](https://en.wikipedia.org/wiki/Comma_operator), or 
just Seq / Sequence expression etc. I haven't seen the term "delegate 
expression" used this way, though.
   - While this comes in really handy, it encourages two things:
     - side effects: yes you do want the side effects for debugging, but for 
regular SQL queries you should probably avoid side effects to have less 
surprises
     - duplicate notion of projection: the `DelegateExpression` being proposed 
here is really just a 2-level nested projection embedded on the expression 
level: 1 level for evaluating all the children expressions, and another level 
to discard all but the last expression's result. This duplication is somewhat 
annoying, as is the case for some of Spark's higher-ordered functions -- they 
duplicate some semantics from the plan level to the expression level, but the 
optimizations were not updated to recognized them efficiently.
   - If we really want to make the most out of the opportunity of adding a new 
expression infrastructure, we might want to also allow this kind of "block 
expression" to support "local alias", or more commonly known as just "local 
variables" or "local bindings" in a programming language. Again, this is 
something that already exists in the `Project` plan level operator, but it's 
just clumsy to use in this kind of context.
   
   If we're talking about making some enhancements in the internals of Catalyst 
and adding a more flexible, more powerful way of representing projections, both 
on the plan level and on the expression level, that'd be super awesome and I'd 
help push it forward.
   Otherwise adding a one-off expression as proposed in this PR here doesn't 
feel very clean. At least I don't feel like the proposal as-is should be a 
regular built-in function, but maybe just something that's on the side for 
aiding debugging.
   
   Just my two cents


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to