Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/16648

I've made some changes to this PR to address @mkiedys's comments, and I'm using his test case, as it sets a higher bar for both class splitting and management of mutable state.

Mutable state and its initialization seem to significantly limit the size of schemas that can be marshaled to datasets. Not only can the number of private variables required by mutable state itself grow beyond 2^16, but the initialization functions, which include references to that state, also put significant strain on the Constant Pool limit when inlined into the main outer class.

The strategy I attempt to implement, alongside the class splitting already mentioned above, is to 'compact' mutable state of primitives and simply-assigned objects into bounded arrays that can be initialized with simple loops rather than large init functions.
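
To illustrate the compaction idea, here is a minimal Java sketch. It is not the code this PR generates: the class names, the array names (mutableStateInt, mutableStateBoolean), the slot count, and the initial values are all hypothetical, chosen only to contrast one-field-per-variable state with array-backed state initialized by a loop.

```java
// Hypothetical sketch only: names and sizes are illustrative assumptions,
// not the output of Spark's code generator.
public class CompactedStateSketch {

    // Before: every piece of mutable state is its own private field, and the
    // init function needs one assignment per field. Each field and each
    // assignment adds entries to the class's constant pool.
    public static final class PerFieldState {
        private int value_0;
        private int value_1;
        private boolean isNull_0;
        private boolean isNull_1;
        // ... in a wide schema, thousands more fields would follow

        public void init() {
            value_0 = -1;
            value_1 = -1;
            isNull_0 = true;
            isNull_1 = true;
            // ... and thousands more assignments
        }

        public int value0() { return value_0; }
        public boolean isNull0() { return isNull_0; }
    }

    // After: state of the same type and same initial value shares one array.
    // The class needs one field per array, and init is a single loop whose
    // size does not grow with the number of slots.
    public static final class CompactedState {
        private final int[] mutableStateInt;
        private final boolean[] mutableStateBoolean;

        public CompactedState(int slots) {
            mutableStateInt = new int[slots];
            mutableStateBoolean = new boolean[slots];
        }

        public void init() {
            for (int i = 0; i < mutableStateInt.length; i++) {
                mutableStateInt[i] = -1;
                mutableStateBoolean[i] = true;
            }
        }

        // Generated code would refer to a slot by index instead of by field name.
        public int intAt(int slot) { return mutableStateInt[slot]; }
        public boolean isNullAt(int slot) { return mutableStateBoolean[slot]; }
        public void setInt(int slot, int v) {
            mutableStateInt[slot] = v;
            mutableStateBoolean[slot] = false;
        }
    }

    public static void main(String[] args) {
        CompactedState state = new CompactedState(100_000);
        state.init();
        state.setInt(42, 7);
        System.out.println(state.intAt(42) + " " + state.isNullAt(42));
    }
}
```

The trade-off in this sketch is an array index on each access in exchange for a field count and init size that stay constant as the schema widens.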