Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19811#discussion_r153647477
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
---
@@ -177,11 +190,67 @@ class CodegenContext {
* the list of default imports available.
* Also, generic type arguments are accepted but ignored.
* @param variableName Name of the field.
- * @param initCode The statement(s) to put into the init() method to
initialize this field.
+ * @param codeFunctions Function includes statement(s) to put into the
init() method to
+ * initialize this field. An argument is the name of the
mutable state variable
* If left blank, the field will be default-initialized.
+ * @param inline whether the declaration and initialization code may be
inlined rather than
+ * compacted. If true, the name is not changed
+ * @return the name of the mutable state variable, which is either the
original name if the
+ * variable is inlined to the outer class, or an array access if
the variable is to be
+ * stored in an array of variables of the same type and
initialization.
+ * primitive type variables will be inlined into outer class
when the total number of
+ * mutable variables is less than
`CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD`
+ * the max size of an array for compaction is given by
+ * `CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT`.
*/
- def addMutableState(javaType: String, variableName: String, initCode:
String = ""): Unit = {
- mutableStates += ((javaType, variableName, initCode))
+ def addMutableState(
+ javaType: String,
+ variableName: String,
+ codeFunctions: String => String = _ => "",
+ inline: Boolean = false): String = {
+ val varName = if (!inline) freshName(variableName) else variableName
+ val initCode = codeFunctions(varName)
+
+ if (inline ||
+ // want to put a primitive type variable at outerClass for
performance
+ isPrimitiveType(javaType) &&
+ (mutableStates.length <
CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD) ||
+ // identify multi-dimensional array or no simply-assigned object
+ !isPrimitiveType(javaType) &&
+ (javaType.contains("[][]") ||
+ !initCode.matches("(^[\\w_]+\\d+\\s*=\\s*null;|"
--- End diff --
@bdrillard I mean that in
https://github.com/apache/spark/pull/19811/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR248
we are using the init code to initialize together the variables with the same
code. If there are init codes which use previously defined variables, their
init code would differ from all the other (unless the same previously defined
variable is used), thus I don't see the problem.
Instead I do see the problem that in this way we might change the
initialization order and this could be a problem. But I think that this problem
can be present also in the current implementation, since we are actually
changing the order in which things are inited, isn't it?
So, I am thinking, why aren't we initing the arrays as all the other
variables so far (ie. as they weren't arrays, as before this PR, one piece of
code after the other, without any for loop) and splitting the init code to
avoid it to grow beyond the 64 KB limit?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]