[GitHub] [spark] rednaxelafx edited a comment on pull request #31509: [SPARK-34396][SQL] Add a new build-in function delegate

GitBox Tue, 09 Feb 2021 11:51:41 -0800


rednaxelafx edited a comment on pull request #31509:
URL: https://github.com/apache/spark/pull/31509#issuecomment-775610985

Just some random thoughts:

This kind of functionality is certain useful for debugging and could come in
handy for writing shorthands, but I have mixed feelings about exposing it on
the SQL language extension level.
i.e. I'd love to see this if we're designing a new language from scratch,
but I'm not so sure about adding it to SQL.

- In terms of the IR (intermediate representation) node, it's very common in
programming languages or their compilers to feature this kind of expression
type. Some example names are `BlockExpression` (e.g. in
[System.Linq.Expressions.BlockExpression](https://docs.microsoft.com/en-us/dotnet/api/system.linq.expressions.blockexpression)),
or C's [Comma expression](https://en.wikipedia.org/wiki/Comma_operator), or
just Seq / Sequence expression etc. I haven't seen the term "delegate
expression" used this way, though.
- While this comes in really handy, it encourages two things:
- side effects: yes you do want the side effects for debugging, but for
regular SQL queries you should probably avoid side effects to have less
surprises
- duplicate notion of projection: the `DelegateExpression` being proposed
here is really just a 2-level nested projection embedded on the expression
level: 1 level for evaluating all the children expressions, and another level
to discard all but the last expression's result. This duplication is somewhat
annoying, as is the case for some of Spark's higher-ordered functions -- they
duplicate some semantics from the plan level to the expression level, but the
optimizations were not updated to recognized them efficiently.
- If we really want to make the most out of the opportunity of adding a new
expression infrastructure, we might want to also allow this kind of "block
expression" to support "local alias", or more commonly known as just "local
variables" or "local bindings" in a programming language. Again, this is
something that already exists in the `Project` plan level operator, but it's
just clumsy to use in this kind of context.

If we're talking about making some enhancements in the internals of Catalyst
and adding a more flexible, more powerful way of representing projections, both
on the plan level and on the expression level, that'd be super awesome and I'd
help push it forward.
Otherwise adding a one-off expression as proposed in this PR here doesn't
feel very clean. At least I don't feel like the proposal as-is should be a
regular built-in function, but maybe just something that's on the side for
aiding debugging.

Just my two cents

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] rednaxelafx edited a comment on pull request #31509: [SPARK-34396][SQL] Add a new build-in function delegate

Reply via email to