Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/19811
As some context, I had initially found array initializations necessary because, even after compacting the data into arrays, the number of `init` methods created to do line-by-line var initializations for large test cases was still triggering constant pool errors. A loop reduced the number of expressions needed to initialize that array state, but to ensure that a single loop could initialize a whole group of variables, it became necessary to add additional state holding the matching init code and the length of the array.
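To illustrate the pattern being described (a hypothetical sketch, not Spark's actual generated code; the class and field names are invented): when many variables share identical init code, compacting them into one array lets a single loop replace N separate assignment statements in the generated `init` methods.

```java
// Hypothetical sketch of the compaction pattern: one array field plus one
// loop stands in for thousands of individual fields and assignments.
public class ArrayStateSketch {
    // Before compaction: many fields like `private boolean isNull_0;`, each
    // contributing constant pool entries and its own init statement.
    // After compaction: a single array declaration.
    private final boolean[] isNull = new boolean[1000];

    public void init() {
        // One loop initializes the whole group of variables, since every
        // element shares the same init code.
        for (int i = 0; i < isNull.length; i++) {
            isNull[i] = true;
        }
    }

    public boolean isNull(int ordinal) {
        return isNull[ordinal];
    }
}
```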
I think @mgaido91's work in SPARK-22226 obviates that original issue with
the way it would re-distribute the init method calls.
Another possible benefit: removing the requirement that state be initialized in loops would let us compact more complicated state than could previously be handled this way, like the [`UnsafeRowWriter`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L78), which can appear as many times as struct columns appear in the dataset. Since each writer's initialization depends on varying arguments, no single loop could initialize all of them, but inline statements could, allowing us to potentially compact them (or any other prevalent object type that is not simply assigned).
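A sketch of why such state resists loop initialization (hypothetical; `Writer` is an invented stand-in for `UnsafeRowWriter`, and the field counts are made up): when each slot's constructor arguments differ, a loop with a single shared body cannot produce them, but inline statements can still fill a compacted array.

```java
// Hypothetical sketch: state with per-slot constructor arguments, like one
// row writer per struct column, each with its own field count.
public class VaryingInitSketch {
    static class Writer {            // stand-in for UnsafeRowWriter
        final int numFields;
        Writer(int numFields) { this.numFields = numFields; }
    }

    // The declarations still compact into one array field.
    final Writer[] writers = new Writer[3];

    void init() {
        // A loop would need an identical body for every element; these
        // arguments vary per slot, so init is emitted as inline statements.
        writers[0] = new Writer(2);
        writers[1] = new Writer(5);
        writers[2] = new Writer(1);
    }
}
```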