Github user bdrillard commented on a diff in the pull request: https://github.com/apache/spark/pull/16648#discussion_r117842789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -145,11 +145,85 @@ class CodegenContext { * * They will be kept as member variables in generated classes like `SpecificProjection`. */ - val mutableStates: mutable.ArrayBuffer[(String, String, String)] = - mutable.ArrayBuffer.empty[(String, String, String)] + val mutableState: mutable.ListBuffer[(String, String, String)] = + mutable.ListBuffer.empty[(String, String, String)] - def addMutableState(javaType: String, variableName: String, initCode: String): Unit = { - mutableStates += ((javaType, variableName, initCode)) + // An array keyed by the tuple of mutable states' types and initialization codes, holds the + // current max index of the array + var mutableStateArrayIdx: mutable.Map[(String, String), Int] = + mutable.Map.empty[(String, String), Int] + + // An array keyed by the tuple of mutable states' types and initialization codes, holds the name + // of the mutableStateArray into which state of the given key will be compacted + var mutableStateArrayNames: mutable.Map[(String, String), String] = + mutable.Map.empty[(String, String), String] + + // An array keyed by the tuple of mutable states' types and initialization codes, holds the code + // that will initialize the mutableStateArray when initialized in loops + var mutableStateArrayInitCodes: mutable.Map[(String, String), String] = + mutable.Map.empty[(String, String), String] + + /** + * Adds an instance of globally-accessible mutable state. Mutable state may either be inlined + * as a private member variable to the class, or it may be compacted into arrays of the same + * type and initialization in order to avoid Constant Pool limit errors for both state declaration + * and initialization. + * + * We compact state into arrays when we can anticipate variables of the same type and initCode + * may appear numerous times. Variable names with integer suffixes (as given by the `freshName` + * function), that are either simply assigned (null or no initialization) or are primitive are + * good candidates for array compaction, as these variables types are likely to appear numerous + * times, and can be easily initialized in loops. + * + * @param javaType the javaType + * @param variableName the variable name + * @param initCode the initialization code for the variable + * @return the name of the mutable state variable, which is either the original name if the + * variable is inlined to the class, or an array access if the variable is to be stored + * in an array of variables of the same type and initialization. + */ + def addMutableState( --- End diff -- @kiszk That's a good question. My initial reaction would be to say that any differences would be negligible, since Java will store both member variables of an Object and values of an array in heap, and one usually assumes O(1) access time for an array item (naturally we'd assume O(1) access for a member variable). But I've tried to take time and do analysis of the bytecode generated under both scenarios, where all data is inlined as member variables, and where data is compacted into arrays. The conservative conclusion is that the difference in performance is negligible. The results and the testing classes can be seen [here](https://github.com/bdrillard/bytecode-poc). Some folks with more knowledge of JVM architecture may catch a flaw in my experiment design, this is my shot at an informative benchmark.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org