Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19821#discussion_r153262121
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
---
@@ -785,13 +785,36 @@ class CodegenContext {
* @param expressions the codes to evaluate expressions.
*/
def splitExpressions(row: String, expressions: Seq[String]): String = {
- if (row == null || currentVars != null) {
+ if (INPUT_ROW == null || currentVars != null) {
// Cannot split these expressions because they are not created from a row object.
return expressions.mkString("\n")
}
splitExpressions(expressions, funcName = "apply", arguments = ("InternalRow", row) :: Nil)
}
+ /**
+ * Splits the generated code of expressions into multiple functions, because function has
+ * 64kb code size limit in JVM. This version takes care of INPUT_ROW and currentVars
+ *
+ * @param expressions the codes to evaluate expressions.
+ * @param funcName the split function name base.
+ * @param argumentsExceptRow the list of (type, name) of the arguments of the split function
+ * except for ctx.INPUT_ROW
+ */
+ def splitExpressions(
+ expressions: Seq[String],
+ funcName: String,
+ argumentsExceptRow: Seq[(String, String)]): String = {
--- End diff --
I confirmed that some cases do not need to check `INPUT_ROW` and
`currentVars`:
1. accessing fields in a struct
2. `UnsafeJoiner`
3. comparison for ordering

I will try to merge cases `1` and `2`. If a different name is required, I
will use `splitExpressionsOnInputRow`.
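
A rough sketch of how the two entry points could relate, assuming a simplified stand-in for `CodegenContext`. The class `SketchContext`, the `addedFunctions` buffer, and the one-function-per-expression splitting are hypothetical and only for illustration; the real code groups expressions by code size and registers functions differently.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for CodegenContext (not the Spark class).
class SketchContext(val INPUT_ROW: String, val currentVars: Seq[String]) {

  // Collects the generated helper functions (illustrative only).
  val addedFunctions = ArrayBuffer.empty[String]

  // Unconditional variant: for callers that never depend on INPUT_ROW or
  // currentVars, such as the three cases listed above.
  // Assumption: each expression becomes its own function; size-based
  // grouping is elided.
  def splitExpressions(
      expressions: Seq[String],
      funcName: String,
      arguments: Seq[(String, String)]): String = {
    val params = arguments.map { case (tpe, name) => s"$tpe $name" }.mkString(", ")
    val args = arguments.map(_._2).mkString(", ")
    expressions.zipWithIndex.map { case (body, i) =>
      val name = s"${funcName}_$i"
      addedFunctions += s"private void $name($params) {\n$body\n}"
      s"$name($args);"
    }.mkString("\n")
  }

  // Guarded variant: splits only when the expressions read from INPUT_ROW,
  // and prepends the row to the caller-supplied arguments.
  def splitExpressionsOnInputRow(
      expressions: Seq[String],
      funcName: String = "apply",
      argumentsExceptRow: Seq[(String, String)] = Nil): String = {
    if (INPUT_ROW == null || currentVars != null) {
      // Cannot split: the expressions are not created from a row object.
      expressions.mkString("\n")
    } else {
      splitExpressions(
        expressions, funcName, ("InternalRow", INPUT_ROW) +: argumentsExceptRow)
    }
  }
}
```

With this shape, callers that need the guard go through `splitExpressionsOnInputRow`, while the struct-field, joiner, and ordering cases can keep calling the unconditional overload directly.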