[GitHub] [spark] cloud-fan commented on a change in pull request #35534: [SPARK-38240][SQL] Improve RuntimeReplaceable and add a guideline for adding new functions

GitBox Sun, 20 Feb 2022 19:59:54 -0800


cloud-fan commented on a change in pull request #35534:
URL: https://github.com/apache/spark/pull/35534#discussion_r810761848




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##########
@@ -324,7 +326,36 @@ object FunctionRegistry {
 
   val FUNC_ALIAS = TreeNodeTag[String]("functionAliasName")
 
-  // Note: Whenever we add a new entry here, make sure we also update ExpressionToSQLSuite
+  // ==============================================================================================
+  //                          The guideline for adding SQL functions
+  // ==============================================================================================
+  // To add a SQL function, we usually need to create a new `Expression` for the function, and
+  // implement the function logic in both the interpretation code path and codegen code path of the
+  // `Expression`. We also need to define the type coercion behavior for the function inputs, by
+  // extending `ImplicitCastInputTypes` or updating type coercion rules directly.
+  //
+  // It's much simpler if the SQL function can be implemented with existing expression(s). There are
+  // a few cases:
+  //   - The function is simply an alias of another function. We can just register the same
+  //     expression with a different function name, e.g. `expression[Rand]("random", true)`.
+  //   - The function is mostly the same with another function, but has a different parameter list.
+  //     We can use `RuntimeReplaceable` to create a new expression, which can customize the
+  //     parameter list and analysis behavior (type coercion). The `RuntimeReplaceable` expression
+  //     will be replaced by the actual expression at the end of analysis. See `Left` as an example.
+  //   - The function can be implemented by combining some existing expressions. We can use

Review comment:
   Yeah, I also hit this issue with `MapContainsKey`. Its type coercion behavior is very similar to that of `ArrayContains`, but slightly different, so I just copy-pasted the code and modified it for `MapContainsKey`.
   
   In the long term, I think we should remove `InheritAnalysisRules` and require every (runtime replaceable) expression to define its own analysis behavior. To avoid duplicating code, we can provide traits that implement the common analysis behaviors, and some of those traits can expose a few flags to tune parts of the behavior.
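
   To make the "traits with flags" idea concrete, here is a minimal, self-contained sketch. It is a toy model and an assumption on my side, not Spark code: `DataType`, `Expr`, `Lit`, `Cast`, `BinaryCommonTypeCoercion`, `StrictContains` and `LenientContains` are all hypothetical names invented for illustration, and the coercion rules are deliberately simplified.

```scala
// Toy model only: none of these types are Spark's real classes.
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case object StringType extends DataType
case object BoolType extends DataType

sealed trait Expr { def dataType: DataType }
case class Lit(value: Any, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr

// Hypothetical reusable trait: coerce two inputs to a common type.
trait BinaryCommonTypeCoercion {
  def left: Expr
  def right: Expr

  // Flag that tunes part of the shared behavior. Expressions override it
  // instead of copy-pasting the whole coercion logic.
  protected def allowStringPromotion: Boolean = false

  private def commonType(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (IntType, LongType) | (LongType, IntType) => Some(LongType)
    case (StringType, _) | (_, StringType) if allowStringPromotion => Some(StringType)
    case _ => None
  }

  private def castIfNeeded(e: Expr, t: DataType): Expr =
    if (e.dataType == t) e else Cast(e, t)

  // Returns the inputs with casts inserted, or None if no common type exists.
  def coercedInputs: Option[(Expr, Expr)] =
    commonType(left.dataType, right.dataType).map { t =>
      (castIfNeeded(left, t), castIfNeeded(right, t))
    }
}

// Two hypothetical expressions that share the trait and differ only in the flag.
case class StrictContains(left: Expr, right: Expr)
    extends Expr with BinaryCommonTypeCoercion {
  val dataType: DataType = BoolType
}

case class LenientContains(left: Expr, right: Expr)
    extends Expr with BinaryCommonTypeCoercion {
  val dataType: DataType = BoolType
  override protected def allowStringPromotion: Boolean = true
}
```

   In this sketch, `StrictContains(Lit(1, IntType), Lit("a", StringType)).coercedInputs` finds no common type, while the same inputs under `LenientContains` are both coerced to `StringType`. The two expressions share one implementation and differ only in a one-line override, which is the kind of reuse I have in mind.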




