amaliujia commented on a change in pull request #35534:
URL: https://github.com/apache/spark/pull/35534#discussion_r810312715



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##########
@@ -324,7 +326,36 @@ object FunctionRegistry {
 
   val FUNC_ALIAS = TreeNodeTag[String]("functionAliasName")
 
-  // Note: Whenever we add a new entry here, make sure we also update ExpressionToSQLSuite
+  // ==============================================================================================
+  //                          The guideline for adding SQL functions
+  // ==============================================================================================
+  // To add a SQL function, we usually need to create a new `Expression` for the function, and
+  // implement the function logic in both the interpretation code path and codegen code path of the
+  // `Expression`. We also need to define the type coercion behavior for the function inputs, by
+  // extending `ImplicitCastInputTypes` or updating type coercion rules directly.
+  //
+  // It's much simpler if the SQL function can be implemented with existing expression(s). There are
+  // a few cases:
+  //   - The function is simply an alias of another function. We can just register the same
+  //     expression with a different function name, e.g. `expression[Rand]("random", true)`.
+  //   - The function is mostly the same with another function, but has a different parameter list.
+  //     We can use `RuntimeReplaceable` to create a new expression, which can customize the
+  //     parameter list and analysis behavior (type coercion). The `RuntimeReplaceable` expression
+  //     will be replaced by the actual expression at the end of analysis. See `Left` as an example.
+  //   - The function can be implemented by combining some existing expressions. We can use

Review comment:
       I ran into one scenario when reusing existing expressions: an existing 
expression can satisfy most of the specification but behaves differently on 
one edge case. I am not sure what the best way to handle it is. 
   
   One option is to introduce a flag on the existing expression that controls 
that specific edge case. The other is to copy the code and change it slightly 
because of that minor difference (which still introduces a new Expression, but 
with duplicated code). 
   
   I am just bringing up what I saw when applying this idea. In the future, I 
hope to see more examples of how to deal with this scenario properly.
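
   To make the two options concrete, here is a minimal sketch. It uses a toy 
`Expr` trait as a stand-in for Catalyst's `Expression`, not the real Spark 
API, and every name in it (`Expr`, `Lit`, `Repeat`, `RepeatOrNull`, 
`nullOnNegative`) is hypothetical. The assumed edge case is a negative repeat 
count: the existing expression returns an empty string, while the new function 
is supposed to return null (modeled here as `None`).

```scala
// Toy stand-in for Catalyst's Expression; this is NOT the real Spark API.
sealed trait Expr { def eval(): Option[String] }

final case class Lit(value: String) extends Expr {
  def eval(): Option[String] = Some(value)
}

// Option 1: keep one expression and add a flag that switches only the edge case.
// `nullOnNegative` is a hypothetical flag name; the default keeps the old behavior.
final case class Repeat(child: Expr, times: Int, nullOnNegative: Boolean = false)
    extends Expr {
  def eval(): Option[String] =
    if (times < 0 && nullOnNegative) None
    else child.eval().map(s => s * math.max(times, 0))
}

// Option 2: a separate expression that duplicates the shared logic and
// differs only in how it treats a negative count.
final case class RepeatOrNull(child: Expr, times: Int) extends Expr {
  def eval(): Option[String] =
    if (times < 0) None
    else child.eval().map(s => s * times)
}

object EdgeCaseDemo {
  def main(args: Array[String]): Unit = {
    val ab = Lit("ab")
    println(Repeat(ab, -1).eval())                        // old behavior
    println(Repeat(ab, -1, nullOnNegative = true).eval()) // option 1
    println(RepeatOrNull(ab, -1).eval())                  // option 2
  }
}
```

   In this sketch the flag keeps a single copy of the logic but widens the 
existing expression's constructor, while the copy leaves the existing 
expression untouched at the cost of duplicated code. Which trade-off is 
preferable in Spark itself is exactly the open question above.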





