Github user bdrillard commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16648#discussion_r117842789
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
    @@ -145,11 +145,85 @@ class CodegenContext {
        *
        * They will be kept as member variables in generated classes like 
`SpecificProjection`.
        */
    -  val mutableStates: mutable.ArrayBuffer[(String, String, String)] =
    -    mutable.ArrayBuffer.empty[(String, String, String)]
    +  val mutableState: mutable.ListBuffer[(String, String, String)] =
    +    mutable.ListBuffer.empty[(String, String, String)]
     
    -  def addMutableState(javaType: String, variableName: String, initCode: 
String): Unit = {
    -    mutableStates += ((javaType, variableName, initCode))
    +  // An array keyed by the tuple of mutable states' types and 
initialization codes, holds the
    +  // current max index of the array
    +  var mutableStateArrayIdx: mutable.Map[(String, String), Int] =
    +    mutable.Map.empty[(String, String), Int]
    +
    +  // An array keyed by the tuple of mutable states' types and 
initialization codes, holds the name
    +  // of the mutableStateArray into which state of the given key will be 
compacted
    +  var mutableStateArrayNames: mutable.Map[(String, String), String] =
    +    mutable.Map.empty[(String, String), String]
    +
    +  // An array keyed by the tuple of mutable states' types and 
initialization codes, holds the code
    +  // that will initialize the mutableStateArray when initialized in loops
    +  var mutableStateArrayInitCodes: mutable.Map[(String, String), String] =
    +    mutable.Map.empty[(String, String), String]
    +
    +  /**
    +   * Adds an instance of globally-accessible mutable state. Mutable state 
may either be inlined
    +   * as a private member variable to the class, or it may be compacted 
into arrays of the same
    +   * type and initialization in order to avoid Constant Pool limit errors 
for both state declaration
    +   * and initialization.
    +   *
    +   * We compact state into arrays when we can anticipate variables of the 
same type and initCode
    +   * may appear numerous times. Variable names with integer suffixes (as 
given by the `freshName`
    +   * function), that are either simply assigned (null or no 
initialization) or are primitive are
    +   * good candidates for array compaction, as these variables types are 
likely to appear numerous
    +   * times, and can be easily initialized in loops.
    +   *
    +   * @param javaType the javaType
    +   * @param variableName the variable name
    +   * @param initCode the initialization code for the variable
    +   * @return the name of the mutable state variable, which is either the 
original name if the
    +   *         variable is inlined to the class, or an array access if the 
variable is to be stored
    +   *         in an array of variables of the same type and initialization.
    +   */
    +  def addMutableState(
    --- End diff --
    
    @kiszk That's a good question. My initial reaction would be to say that any 
differences would be negligible, since Java will store both member variables of 
an Object and values of an array in heap, and one usually assumes O(1) access 
time for an array item (naturally we'd assume O(1) access for a member 
variable). 
    
    But I've tried to take time and do analysis of the bytecode generated under 
both scenarios, where all data is inlined as member variables, and where data 
is compacted into arrays. The conservative conclusion is that the difference in 
performance is negligible. The results and the testing classes can be seen 
[here](https://github.com/bdrillard/bytecode-poc). Some folks with more 
knowledge of JVM architecture may catch a flaw in my experiment design, this is 
my shot at an informative benchmark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to