Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-49682053

Hey @andrewor14, one question here just to make sure I understand: if the data is supposed to be stored as MEMORY_ONLY_SER, will this code still unroll it in deserialized form before testing whether it can be put? I guess this is okay, but it would be better to write directly to a serialized stream in that case. We would then also have to track whether the serialized stream becomes too big to store. Also, it seems that in this case, even if the array of deserialized elements fits in memory, we allocate some extra space as we write the objects to a byte stream. Not horrible, but it's another reason to try serializing directly.
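The "write directly to a serialized stream while tracking size" idea could look roughly like the sketch below. This is a hypothetical illustration using plain Java serialization and a made-up `tryUnrollSerialized` helper, not Spark's actual BlockManager or SerializerInstance APIs: it serializes elements one at a time and checks the accumulated byte count after each write, bailing out before the block exceeds the memory limit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Iterator;
import java.util.List;

public class SerializedUnroll {

    /**
     * Hypothetical sketch: serialize values straight into a byte stream,
     * checking the accumulated size after each element so we can give up
     * before exceeding maxBytes. Returns the serialized bytes if the
     * whole iterator fit, or null if it grew too large.
     */
    static byte[] tryUnrollSerialized(Iterator<? extends Serializable> values,
                                      long maxBytes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        while (values.hasNext()) {
            oos.writeObject(values.next());
            oos.flush();
            if (bos.size() > maxBytes) {
                return null; // too big to cache serialized in memory
            }
        }
        oos.close();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A few small strings easily fit under a 1 MB limit.
        byte[] small = tryUnrollSerialized(
                List.of("a", "b", "c").iterator(), 1 << 20);
        // A 100k-int array blows past a 16-byte limit immediately.
        byte[] big = tryUnrollSerialized(
                List.of(new int[100000]).iterator(), 16);
        System.out.println((small != null) + " " + (big == null));
    }
}
```

The size check only happens between elements, so a single huge element can still overshoot the limit transiently; a real implementation would likely also account for that, and for the temporary buffer space the point above mentions.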