amaliujia commented on a change in pull request #35534:
URL: https://github.com/apache/spark/pull/35534#discussion_r810312715
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##########
@@ -324,7 +326,36 @@ object FunctionRegistry {
val FUNC_ALIAS = TreeNodeTag[String]("functionAliasName")
- // Note: Whenever we add a new entry here, make sure we also update ExpressionToSQLSuite
+ // ==============================================================================================
+ // The guideline for adding SQL functions
+ // ==============================================================================================
+ // To add a SQL function, we usually need to create a new `Expression` for the function, and
+ // implement the function logic in both the interpretation code path and codegen code path of the
+ // `Expression`. We also need to define the type coercion behavior for the function inputs, by
+ // extending `ImplicitCastInputTypes` or updating type coercion rules directly.
+ //
+ // It's much simpler if the SQL function can be implemented with existing expression(s). There are
+ // a few cases:
+ // - The function is simply an alias of another function. We can just register the same
+ // expression with a different function name, e.g. `expression[Rand]("random", true)`.
+ // - The function is mostly the same with another function, but has a different parameter list.
+ // We can use `RuntimeReplaceable` to create a new expression, which can customize the
+ // parameter list and analysis behavior (type coercion). The `RuntimeReplaceable` expression
+ // will be replaced by the actual expression at the end of analysis. See `Left` as an example.
+ // - The function can be implemented by combining some existing expressions. We can use
Review comment:
I found one scenario when reusing existing expressions: an existing
expression can satisfy most of the specification but behaves differently on
perhaps one edge case. I am not sure what the best way to handle it is.
One option is to introduce a flag on the existing expression to control that
specific edge case. The other is to copy the code and change it slightly
because of that minor difference, which still introduces a new Expression, but
with duplicated code.
I am just bringing up what I saw when applying this idea. In the future, I
hope to see more examples of how to handle this scenario properly.
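To make the flag option concrete, here is a minimal toy sketch. It does not
use Catalyst: `Expr`, `StrLength`, `emptyAsNull`, and both function names are
hypothetical, and the edge case (what an empty input evaluates to) is made up
for illustration only.

```scala
// Toy model only: every name below is hypothetical; none of it is Spark's Catalyst API.
object EdgeCaseFlagSketch {
  trait Expr { def eval(input: String): Option[Int] }

  // One shared implementation. `emptyAsNull` toggles the single edge case
  // (a made-up one: whether an empty input yields 0 or no value).
  final case class StrLength(emptyAsNull: Boolean = false) extends Expr {
    def eval(input: String): Option[Int] =
      if (input.isEmpty && emptyAsNull) None else Some(input.length)
  }

  // Two function names backed by the same expression class,
  // differing only in the flag value.
  val registry: Map[String, Expr] = Map(
    "str_length" -> StrLength(),
    "str_length_or_null" -> StrLength(emptyAsNull = true)
  )
}
```

The trade-off in this sketch is that the shared class gains a branch for each
edge case, while the copy-and-modify option would keep each class simple at
the cost of duplicated logic.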
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]