[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

mgaido91 Tue, 28 Nov 2017 14:50:32 -0800

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19811#discussion_r153647477
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
    @@ -177,11 +190,67 @@ class CodegenContext {
        *                 the list of default imports available.
        *                 Also, generic type arguments are accepted but ignored.
        * @param variableName Name of the field.
    -   * @param initCode The statement(s) to put into the init() method to 
initialize this field.
    +   * @param codeFunctions Function includes statement(s) to put into the 
init() method to
    +   *                 initialize this field. An argument is the name of the 
mutable state variable
        *                 If left blank, the field will be default-initialized.
    +   * @param inline whether the declaration and initialization code may be 
inlined rather than
    +   *               compacted. If true, the name is not changed
    +   * @return the name of the mutable state variable, which is either the 
original name if the
    +   *         variable is inlined to the outer class, or an array access if 
the variable is to be
    +   *         stored in an array of variables of the same type and 
initialization.
    +   *         primitive type variables will be inlined into outer class 
when the total number of
    +   *         mutable variables is less than 
`CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD`
    +   *         the max size of an array for compaction is given by
    +   *         `CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT`.
        */
    -  def addMutableState(javaType: String, variableName: String, initCode: 
String = ""): Unit = {
    -    mutableStates += ((javaType, variableName, initCode))
    +  def addMutableState(
    +      javaType: String,
    +      variableName: String,
    +      codeFunctions: String => String = _ => "",
    +      inline: Boolean = false): String = {
    +    val varName = if (!inline) freshName(variableName) else variableName
    +    val initCode = codeFunctions(varName)
    +
    +    if (inline ||
    +        // want to put a primitive type variable at outerClass for 
performance
    +        isPrimitiveType(javaType) &&
    +          (mutableStates.length < 
CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD) ||
    +        // identify multi-dimensional array or no simply-assigned object
    +        !isPrimitiveType(javaType) &&
    +          (javaType.contains("[][]") ||
    +           !initCode.matches("(^[\\w_]+\\d+\\s*=\\s*null;|"
    --- End diff --
    
    @bdrillard I mean that in 
https://github.com/apache/spark/pull/19811/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR248
 we are using the init code to initialize together the variables with the same 
code. If there are init codes which use previously defined variables, their 
init code would differ from all the other (unless the same previously defined 
variable is used), thus I don't see the problem.
    
    Instead I do see the problem that in this way we might change the 
initialization order and this could be a problem. But I think that this problem 
can be present also in the current implementation, since we are actually 
changing the order in which things are inited, isn't it?
    So, I am thinking, why aren't we initing the arrays as all the other 
variables so far (ie. as they weren't arrays, as before this PR, one piece of 
code after the other, without any for loop) and  splitting the init code to 
avoid it to grow beyond the 64 KB limit?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

Reply via email to