Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19821#discussion_r153095567
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -785,13 +785,36 @@ class CodegenContext {
    * @param expressions the codes to evaluate expressions.
    */
   def splitExpressions(row: String, expressions: Seq[String]): String = {
-    if (row == null || currentVars != null) {
+    if (INPUT_ROW == null || currentVars != null) {
       // Cannot split these expressions because they are not created from a row object.
       return expressions.mkString("\n")
     }
     splitExpressions(expressions, funcName = "apply", arguments = ("InternalRow", row) :: Nil)
   }
+  /**
+   * Splits the generated code of expressions into multiple functions, because function has
+   * 64kb code size limit in JVM. This version takes care of INPUT_ROW and currentVars
+   *
+   * @param expressions the codes to evaluate expressions.
+   * @param funcName the split function name base.
+   * @param argumentsExceptRow the list of (type, name) of the arguments of the split function
+   *                           except for ctx.INPUT_ROW
+   */
+  def splitExpressions(
+      expressions: Seq[String],
+      funcName: String,
+      argumentsExceptRow: Seq[(String, String)]): String = {
--- End diff ---
I agree that it is good from the viewpoint of consistency. I have one question in mind.
If we use the same argument name `arguments`, can a developer still distinguish this `splitExpressions` from the (rich) `splitExpressions` below when they want to pass only the three arguments `expressions`, `funcName`, and `arguments`?
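
To make the question concrete, here is a minimal, self-contained sketch of the two overload shapes I have in mind. It is not the actual `CodegenContext` code: the class name, the extra parameters of the rich overload (`returnType`, `makeSplitFunction`, `foldFunctions`), and their defaults are just placeholders modeled on the existing method.

```scala
// Minimal sketch only; SketchContext and the rich overload's extra parameters
// are hypothetical stand-ins, not the real CodegenContext API.
class SketchContext {

  // Narrow overload, shaped like the one added in this diff, but with the
  // parameter renamed to `arguments` as suggested.
  def splitExpressions(
      expressions: Seq[String],
      funcName: String,
      arguments: Seq[(String, String)]): String =
    s"narrow overload: ${expressions.size} exprs, $funcName(${arguments.map(_._2).mkString(", ")})"

  // Rich overload below it, assumed to carry extra parameters with defaults.
  def splitExpressions(
      expressions: Seq[String],
      funcName: String,
      arguments: Seq[(String, String)],
      returnType: String = "void",
      makeSplitFunction: String => String = identity,
      foldFunctions: Seq[String] => String = _.mkString("\n")): String =
    s"rich overload: returnType=$returnType"
}

object OverloadSketch {
  def main(args: Array[String]): Unit = {
    val ctx   = new SketchContext
    val exprs = Seq("a = b + c;", "d = e * f;")
    val cArgs = ("InternalRow", "row") :: Nil

    // The three-argument call in question. As far as I understand Scala 2
    // overload resolution, alternatives that need no default arguments are
    // preferred, so this should bind to the narrow overload rather than
    // being ambiguous -- but the reader of the call site has to know that.
    println(ctx.splitExpressions(exprs, "apply", cArgs))

    // Supplying a fourth argument makes the rich overload unambiguous.
    println(ctx.splitExpressions(exprs, "apply", cArgs, /* returnType = */ "int"))
  }
}
```

If that reading is right, the call sites still compile, but whether the narrow or the rich `splitExpressions` is taken is no longer obvious from the argument names alone, which is the readability point behind my question.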
---