cloud-fan commented on a change in pull request #35534:
URL: https://github.com/apache/spark/pull/35534#discussion_r810761848
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##########
@@ -324,7 +326,36 @@ object FunctionRegistry {
 val FUNC_ALIAS = TreeNodeTag[String]("functionAliasName")
- // Note: Whenever we add a new entry here, make sure we also update ExpressionToSQLSuite
+ // ==============================================================================================
+ // The guideline for adding SQL functions
+ // ==============================================================================================
+ // To add a SQL function, we usually need to create a new `Expression` for the function, and
+ // implement the function logic in both the interpretation code path and codegen code path of the
+ // `Expression`. We also need to define the type coercion behavior for the function inputs, by
+ // extending `ImplicitCastInputTypes` or updating type coercion rules directly.
+ //
+ // It's much simpler if the SQL function can be implemented with existing expression(s). There are
+ // a few cases:
+ // - The function is simply an alias of another function. We can just register the same
+ // expression with a different function name, e.g. `expression[Rand]("random", true)`.
+ // - The function is mostly the same with another function, but has a different parameter list.
+ // We can use `RuntimeReplaceable` to create a new expression, which can customize the
+ // parameter list and analysis behavior (type coercion). The `RuntimeReplaceable` expression
+ // will be replaced by the actual expression at the end of analysis. See `Left` as an example.
+ // - The function can be implemented by combining some existing expressions. We can use
Review comment:
Yea I also hit this issue with `MapContainsKey`. Its type coercion
behavior is very similar to that of `ArrayContains` but slightly different, so I
just copy-pasted the code and modified it for `MapContainsKey`.
In the long term, I think we should remove `InheritAnalysisRules` and ask
every (runtime replaceable) expression to define its own analysis behavior. We
can provide traits for common analysis behaviors to reuse code, and some
traits can have a few flags to tune parts of the behavior.
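
A minimal sketch of what such a reusable trait with a tuning flag might look like. It uses toy types rather than Spark's real `Expression` and `DataType` classes, and every name in it (`ContainsLikeAnalysis`, `allowNumericWidening`, `searchedType`, `ArrayContainsLike`, `MapContainsKeyLike`) is hypothetical. The widening difference between the two objects is only there to illustrate the flag; it is not a claim about how `ArrayContains` and `MapContainsKey` actually differ.

```scala
// Toy stand-ins for data types; these are NOT Spark's real classes.
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

// Hypothetical shared analysis behavior for "does this collection contain X".
trait ContainsLikeAnalysis {
  // Flag that tunes part of the behavior: may Int and Long be unified to Long?
  protected def allowNumericWidening: Boolean = true

  // Each expression states which type inside the collection is searched.
  protected def searchedType(collection: DataType): Option[DataType]

  // Returns the common type to compare in, or an error message.
  final def checkInputTypes(collection: DataType, needle: DataType): Either[String, DataType] =
    searchedType(collection) match {
      case None => Left(s"unsupported collection type: $collection")
      case Some(t) if t == needle => Right(t)
      case Some(LongType) if needle == IntType && allowNumericWidening => Right(LongType)
      case Some(IntType) if needle == LongType && allowNumericWidening => Right(LongType)
      case Some(t) => Left(s"cannot search $t for $needle")
    }
}

// Searches the element type of an array; keeps the default flag.
object ArrayContainsLike extends ContainsLikeAnalysis {
  protected def searchedType(collection: DataType): Option[DataType] = collection match {
    case ArrayType(element) => Some(element)
    case _ => None
  }
}

// Searches the key type of a map; flips the flag (illustrative only).
object MapContainsKeyLike extends ContainsLikeAnalysis {
  override protected def allowNumericWidening: Boolean = false
  protected def searchedType(collection: DataType): Option[DataType] = collection match {
    case MapType(key, _) => Some(key)
    case _ => None
  }
}
```

With this shape, each expression only supplies the part that differs (which type is searched, and the flag value), and the shared check lives in one place instead of being copy-pasted.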
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]