Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19767#discussion_r152417333
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
---
@@ -105,6 +105,36 @@ abstract class Expression extends TreeNode[Expression] {
val isNull = ctx.freshName("isNull")
val value = ctx.freshName("value")
val ve = doGenCode(ctx, ExprCode("", isNull, value))
+
+ // TODO: support whole stage codegen too
+ if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") {
+ val globalIsNull = ctx.freshName("globalIsNull")
+ ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = false;")
+ val localIsNull = ve.isNull
+ ve.isNull = globalIsNull
+ s"$globalIsNull = $localIsNull;"
+ } else {
+ ""
+ }
+
+ val javaType = ctx.javaType(dataType)
+ val newValue = ctx.freshName("value")
+
+ val funcName = ctx.freshName(nodeName)
+ val funcFullName = ctx.addNewFunction(funcName,
+ s"""
+ |private $javaType $funcName(InternalRow ${ctx.INPUT_ROW}) {
--- End diff --
To continue the discussion in
https://github.com/apache/spark/pull/19767#discussion_r151631456
I think more global variables can be eliminated by leveraging the method
return value. However, in some cases we use global variables to avoid
creating an object for each iteration, so we face a trade-off between GC
overhead and global variable overhead. It would be great if Java had
something like a C struct and could allocate objects on the method stack...
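
The trade-off can be illustrated with a minimal sketch in plain Java. This is not Spark's generated code: the class `CodegenStateSketch` and all of its methods are hypothetical names made up for illustration, and the "expression" being evaluated (input + 1, null-propagating) is an arbitrary stand-in.

```java
// Hypothetical sketch; none of these names come from Spark or from this PR.
final class CodegenStateSketch {

    // Variant A: both outputs are written to mutable fields
    // (the "global variables" of a generated class).
    private boolean isNullA;
    private int valueA;

    private void evalIntoFields(Integer input) {
        isNullA = (input == null);
        valueA = isNullA ? -1 : input + 1;
    }

    long sumWithFields(Integer[] rows) {
        long sum = 0;
        for (Integer row : rows) {
            evalIntoFields(row);
            if (!isNullA) sum += valueA;
        }
        return sum;
    }

    // Variant B: the value travels through the method return value,
    // so only the null flag still needs a field.
    private boolean isNullB;

    private int evalReturningValue(Integer input) {
        isNullB = (input == null);
        return isNullB ? -1 : input + 1;
    }

    long sumWithReturnValue(Integer[] rows) {
        long sum = 0;
        for (Integer row : rows) {
            int value = evalReturningValue(row);
            if (!isNullB) sum += value;
        }
        return sum;
    }

    // Variant C: no fields at all, but one small object is allocated
    // per row, which is the GC overhead mentioned above.
    static final class Result {
        final boolean isNull;
        final int value;

        Result(boolean isNull, int value) {
            this.isNull = isNull;
            this.value = value;
        }
    }

    private static Result evalAllocating(Integer input) {
        boolean isNull = (input == null);
        return new Result(isNull, isNull ? -1 : input + 1);
    }

    static long sumWithAllocation(Integer[] rows) {
        long sum = 0;
        for (Integer row : rows) {
            Result r = evalAllocating(row);
            if (!r.isNull) sum += r.value;
        }
        return sum;
    }
}
```

All three variants compute the same sum over the same rows; they differ only in where the (isNull, value) pair lives between the callee and the caller. Variant B removes one field per expression, while Variant C removes both at the cost of a per-row allocation.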
---