Github user bdrillard commented on the issue:

    https://github.com/apache/spark/pull/16648
  
    I've made some changes to this PR to address @mkiedys's comments, and I'm 
using his test case, as it sets a higher bar for both class splitting and 
management of mutable state. Mutable state and its initialization seem to 
significantly limit the size of schemas that can be marshaled to datasets. 
Not only can the number of private variables required by mutable state itself 
grow beyond 2^16, but the initialization functions, which include references 
to that state, also put significant strain on the Constant Pool limit when 
inlined into the main outer class. The strategy I attempt to implement, 
alongside the class splitting already mentioned above, is to 'compact' 
mutable state of primitives and simply-assigned objects into bounded arrays 
that can be initialized with simple loops rather than large init functions. 
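
As a rough illustration of the compaction idea (this is not the code the PR 
generates; the class, field, and method names below are made up for the 
example), the difference between per-field state and array-backed state 
looks roughly like this:

```java
// Illustrative sketch only: all names are hypothetical, not Spark's generated code.
final class MutableStateSketch {

    // Style 1: one private field per piece of state. Every field needs its
    // own constant pool entries, and init() needs one assignment per field,
    // so both grow with the width of the schema.
    static final class PerFieldState {
        private boolean isNull0, isNull1, isNull2;
        private long value0, value1, value2;

        void init() {
            isNull0 = true; isNull1 = true; isNull2 = true;
            value0 = -1L; value1 = -1L; value2 = -1L;
        }

        long sum() { return value0 + value1 + value2; }
    }

    // Style 2: state of the same type shares one array. The class keeps a
    // fixed number of fields, and init() is a loop whose code size does not
    // depend on how many slots there are.
    static final class CompactedState {
        private final boolean[] isNull;
        private final long[] value;

        CompactedState(int slots) {
            isNull = new boolean[slots];
            value = new long[slots];
        }

        void init() {
            for (int i = 0; i < value.length; i++) {
                isNull[i] = true;
                value[i] = -1L;
            }
        }

        boolean isNullAt(int i) { return isNull[i]; }

        long sum() {
            long total = 0L;
            for (long v : value) {
                total += v;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        PerFieldState perField = new PerFieldState();
        perField.init();

        CompactedState compacted = new CompactedState(3);
        compacted.init();

        // Both layouts hold the same values after initialization.
        System.out.println(perField.sum() + " " + compacted.sum());
    }
}
```

In the second layout, going from 3 slots to 100,000 changes only the array 
length passed to the constructor, whereas the first layout would need a new 
field and a new assignment for every slot.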


