Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
Thanks for that other test case. I would say the one you provide falls in
the same class of error; however, this patch is still capable of addressing
some others that exist. While class-splitting is capable of handling more
_complex_ schemas (ones that rely on object creation, as for JavaBeans
and Avro), there are still instances where the sheer number of variables can
blow the constant pool limit, in particular when an enormous amount of
mutable state is kicked up into the outer class. In Spark 2.0.x, global
mutable state was used more sparingly; however (as is more directly the case
for your test case), there are instances where conditional expressions
produce an enormous amount of mutable state (see
[SPARK-18091](https://github.com/apache/spark/commit/e463678b194e08be4a8bc9d1d45461d6c77a15ee)
for a recent change that can produce a great deal of mutable state). In your
test case, the sheer amount of mutable state generated for conditional
null-checks is already over 65,536.
One strategy might be to create a cache of excess mutable state only when
the volume of mutable state threatens to breach the constant pool limit. Some
type of cache (perhaps even just a simple array in the outer class) still
accessible to the outer and nested classes would allow us to both keep code
between classes within limits, and also keep the amount of mutable state in the
outer class manageable. I thought such a caching scheme was a bit out of the
scope of this look into class-splitting.
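To make the array idea concrete, here is a minimal sketch in plain Java. It is not Spark's generated code: the class `MutableStateCacheSketch`, the field `isNullCache`, the inner class `NestedChunk`, and the alternating null pattern are all hypothetical names and placeholders chosen for illustration. It only shows that a single array field in the outer class can hold many pieces of state and still be read and written from a nested class.

```java
// Hypothetical sketch: class and member names are illustrative, not Spark's generated code.
class MutableStateCacheSketch {
    // A single array field stands in for what would otherwise be one
    // declared field per piece of mutable state in the outer class.
    private final boolean[] isNullCache;

    MutableStateCacheSketch(int slots) {
        this.isNullCache = new boolean[slots];
    }

    // Inner (non-static) class, standing in for a class produced by
    // class-splitting; it reads and writes the outer class's array directly.
    private final class NestedChunk {
        void evaluate(int from, int to) {
            for (int i = from; i < to; i++) {
                // Placeholder for a generated null-check result.
                isNullCache[i] = (i % 2 == 0);
            }
        }
    }

    // Runs the nested chunk over the whole cache in slices, then counts the
    // slots marked null from the outer class, showing both see the same state.
    int fillAndCount(int chunkSize) {
        NestedChunk chunk = new NestedChunk();
        for (int from = 0; from < isNullCache.length; from += chunkSize) {
            chunk.evaluate(from, Math.min(from + chunkSize, isNullCache.length));
        }
        int count = 0;
        for (boolean isNull : isNullCache) {
            if (isNull) count++;
        }
        return count;
    }
}
```

Calling `new MutableStateCacheSketch(100000).fillAndCount(10000)` exercises 100,000 slots of state through one declared field. Whether this trade-off (array indexing instead of direct field access) is acceptable for generated code performance is an open question this sketch does not address.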
Other strategies may exist for addressing the constant pool limit for the
existing code-generation scheme, but I don't see how they fit quite as well
given Catalyst's proclivity for generating a single large class for each of its
operations.
Thanks again for the case. I'm glad to hear thoughts on this issue.