Github user bdrillard commented on the issue:

    https://github.com/apache/spark/pull/16648
  
    Thanks for that other test case. I would say the one you provide falls in
the same class of error; however, this patch is still capable of addressing
some others that still exist. While class-splitting can handle more
_complex_ schemas (ones that rely on object creation, as with JavaBeans
and Avro), there are still instances where the sheer number of variables can
blow the constant pool limit, in particular when an enormous amount of
mutable state is kicked up into the outer class. In Spark 2.0.x, global mutable
state was used more sparingly; however (as is more directly the case for your
test case), there are instances where conditional expressions produce an
enormous amount of mutable state (see
[SPARK-18091](https://github.com/apache/spark/commit/e463678b194e08be4a8bc9d1d45461d6c77a15ee)
for a recent change that can produce a great degree of mutable state). In your
test case, the sheer amount of mutable state generated for conditional
null-checks is already over 65,536.
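
As a rough, back-of-envelope illustration of why the field count alone can exhaust the pool: the class file's `constant_pool_count` is an unsigned 16-bit value, and if we assume each referenced field costs about three entries (a `CONSTANT_Utf8` for its name, a `CONSTANT_NameAndType`, and a `CONSTANT_Fieldref`, with the type descriptor and owning class shared), only around 21,500 fields fit in a single class. The class name, the three-entries-per-field cost, and the 1,000 reserved shared entries are assumptions for illustration, not measurements of Spark's generated code.

```java
// Back-of-envelope estimate only: the per-field cost is an assumption based on
// how a field that is read or written is typically encoded in a class file.
public final class ConstantPoolEstimate {
    // constant_pool_count is a u2; valid indices run from 1 to count - 1.
    static final int MAX_CONSTANT_POOL_COUNT = 65_535;

    // Assumed cost of one referenced field: a Utf8 name, a NameAndType and a
    // Fieldref. The descriptor Utf8 (e.g. "Z" for boolean) and the owning
    // CONSTANT_Class entry are shared across fields of the same class.
    static final int ENTRIES_PER_FIELD = 3;

    // How many fields fit after reserving sharedEntries for everything else.
    static int fieldsThatFit(int sharedEntries) {
        return (MAX_CONSTANT_POOL_COUNT - 1 - sharedEntries) / ENTRIES_PER_FIELD;
    }

    public static void main(String[] args) {
        // Reserve a hypothetical 1,000 entries for methods, strings and classes.
        System.out.println(fieldsThatFit(1_000));
    }
}
```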
    
    One strategy might be to create a cache for excess mutable state only when
the volume of mutable state threatens to breach the constant pool limit. Some
type of cache (perhaps even just a simple array in the outer class) that is
still accessible to both the outer and nested classes would allow us both to
keep the code in each class within limits and to keep the amount of mutable
state in the outer class manageable. I thought such a caching scheme was a bit
outside the scope of this look into class-splitting.
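
A minimal sketch of the array-backed cache idea, written in Java since that is what Catalyst generates. `OuterGenerated`, `NestedHelper`, and `excessIsNull` are hypothetical names, not Spark's code-generation API. The point is that many boolean slots share one array field, so the field itself costs one set of constant pool entries regardless of how many slots the array holds, while a nested helper class can still read and write the shared state through the outer instance.

```java
// Hypothetical sketch: class, field and method names are illustrative only,
// not Spark's actual code-generation API.
public final class OuterGenerated {
    // Excess boolean state packed into a single array field instead of one
    // field per null-check; the array field is declared (and pooled) once.
    final boolean[] excessIsNull;

    public OuterGenerated(int excessSlots) {
        this.excessIsNull = new boolean[excessSlots];
    }

    // Stands in for a split-off helper class that still needs the outer state.
    final class NestedHelper {
        // Records whether the input for a given slot was null.
        void evaluate(int slot, Object input) {
            excessIsNull[slot] = (input == null);
        }
    }

    boolean isNullAt(int slot) {
        return excessIsNull[slot];
    }

    NestedHelper helper() {
        return new NestedHelper();
    }

    public static void main(String[] args) {
        OuterGenerated outer = new OuterGenerated(4);
        NestedHelper h = outer.helper();
        h.evaluate(0, null);
        h.evaluate(1, "x");
        System.out.println(outer.isNullAt(0) + " " + outer.isNullAt(1));
    }
}
```

The trade-off in this sketch is that each access becomes an array load or store instead of a direct field access, which is why it would make sense to fall back to the array only once the ordinary fields approach the limit.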
    
    Other strategies may exist for addressing the constant pool limit in the
existing code-generation scheme, but I don't see how they fit quite as well,
given Catalyst's proclivity for generating a single large class for each of its
operations.
    
    Thanks again for the case, I'm glad to hear thoughts on this issue. 

