Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/19811
As some context, I had initially found array initializations necessary because, even after compacting the data into arrays, the number of `init` methods created to do line-by-line var initializations for large test cases was still triggering constant pool errors. A loop reduced the number of expressions needed to initialize that array state, but to ensure that a single loop could initialize a whole group of variables, it became necessary to add additional state holding the matching init code and the length of the array.
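To illustrate the pattern being described (a hypothetical sketch, not Spark's actual generated code; the class and field names are invented): when many variables share identical init code, compacting them into one array lets a single loop replace N separate assignment statements in the generated `init` methods.

```java
// Hypothetical sketch of the compaction pattern: one array field plus one
// loop stands in for thousands of individual fields and assignments.
public class ArrayStateSketch {
    // Before compaction: many fields like `private boolean isNull_0;`, each
    // contributing constant pool entries and its own init statement.
    // After compaction: a single array declaration.
    private final boolean[] isNull = new boolean[1000];

    public void init() {
        // One loop initializes the whole group of variables, since every
        // element shares the same init code.
        for (int i = 0; i < isNull.length; i++) {
            isNull[i] = true;
        }
    }

    public boolean isNull(int ordinal) {
        return isNull[ordinal];
    }
}
```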
I think @mgaido91's work in SPARK-22226 obviates that original issue with
the way it would re-distribute the init method calls.
Another possible benefit: removing the requirement that state be initialized in loops would let us compact more complicated state than could previously be handled this way, like the [`UnsafeRowWriter`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L78), which can appear as many times as struct columns appear in the dataset. Since each writer's initialization depends on varying arguments, no single loop could initialize all of them, but inline statements could, allowing us to potentially compact them (or any other prevalent object type that is not simply assigned).
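A sketch of why such state resists loop initialization (hypothetical; `Writer` is an invented stand-in for `UnsafeRowWriter`, and the field counts are made up): when each slot's constructor arguments differ, a loop with a single shared body cannot produce them, but inline statements can still fill a compacted array.

```java
// Hypothetical sketch: state with per-slot constructor arguments, like one
// row writer per struct column, each with its own field count.
public class VaryingInitSketch {
    static class Writer {            // stand-in for UnsafeRowWriter
        final int numFields;
        Writer(int numFields) { this.numFields = numFields; }
    }

    // The declarations still compact into one array field.
    final Writer[] writers = new Writer[3];

    void init() {
        // A loop would need an identical body for every element; these
        // arguments vary per slot, so init is emitted as inline statements.
        writers[0] = new Writer(2);
        writers[1] = new Writer(5);
        writers[2] = new Writer(1);
    }
}
```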