Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/8995#issuecomment-145927459
By the way, in addition to the changes here, we need to update code
elsewhere to benefit from the concatenation of serialized streams. For
the Tungsten shuffle write path, the right line to change is
https://github.com/apache/spark/blob/27ecfe61f07c8413a7b8b9fbdf36ed99cf05227d/core/src/main/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriter.java#L269
Rather than changing this here, though, I'd prefer to do something similar
to what I did for Serializer, defining a private API to let instances express
whether they have this fast-merging property:
https://github.com/apache/spark/blob/27ecfe61f07c8413a7b8b9fbdf36ed99cf05227d/core/src/main/scala/org/apache/spark/serializer/Serializer.scala#L99
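
To illustrate the idea, here is a minimal sketch of a capability flag of this kind. It is not Spark's actual API: the names `StreamFormat`, `supportsConcatenation`, `MergeStrategy`, and `chooseMergeStrategy` are all hypothetical and exist only for this example. The sketch assumes that a format reporting `true` produces streams whose byte-for-byte concatenation is itself a valid stream, so a writer can merge spill files without decoding them.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Spark.
class FastMergeSketch {

    /** Hypothetical capability interface that lets an instance advertise fast merging. */
    interface StreamFormat {
        /** True if concatenating independently written streams yields a valid stream. */
        default boolean supportsConcatenation() {
            return false; // conservative default: assume streams cannot be concatenated
        }
    }

    /** A format that opts in to the fast path. */
    static final class ConcatenableFormat implements StreamFormat {
        @Override
        public boolean supportsConcatenation() {
            return true;
        }
    }

    /** A format that keeps the conservative default. */
    static final class OpaqueFormat implements StreamFormat {}

    enum MergeStrategy { FAST_CONCATENATE, SLOW_DECODE_AND_REENCODE }

    /** The writer asks the instance instead of hard-coding knowledge of specific classes. */
    static MergeStrategy chooseMergeStrategy(StreamFormat format) {
        return format.supportsConcatenation()
            ? MergeStrategy.FAST_CONCATENATE
            : MergeStrategy.SLOW_DECODE_AND_REENCODE;
    }

    /** Fast path: append the raw bytes of each part without interpreting them. */
    static byte[] concatenate(List<byte[]> parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(chooseMergeStrategy(new ConcatenableFormat()));
        System.out.println(chooseMergeStrategy(new OpaqueFormat()));
        byte[] merged = concatenate(Arrays.asList(new byte[] {1, 2}, new byte[] {3}));
        System.out.println(Arrays.toString(merged));
    }
}
```

The point of the flag is that the merge code stays unchanged when a new implementation gains the property; only that implementation's override changes.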