GitHub user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/19788
Is there an implicit assumption here that contiguous partitions' data can
be decompressed / deserialized in a single stream? If the shuffled data is
written with a non-relocatable serializer (Java serialization) or
non-concatenatable compression format then I'm not sure that you'll actually
be able to successfully deserialize a multi-reducer range of the map output
using a single decompression / deserialization stream.
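A small sketch of the hazard, in Python rather than Spark code. It writes two "partitions" back to back, each compressed or serialized independently, and then tries to read the whole range with one stream. zlib and gzip stand in for a non-concatenatable and a concatenatable compression format. The two-byte stream header serializer is a hypothetical toy, loosely modelled on Java serialization's per-stream header. PARTITIONS, serialize and deserialize_single_stream are illustrative names, not Spark APIs.

```python
import gzip
import zlib

# Hypothetical map output for two reducers. Each partition is written as its
# own independently compressed segment and the segments are stored back to back.
PARTITIONS = [b"records for reducer 0;", b"records for reducer 1;"]


def zlib_single_stream(blob: bytes) -> tuple[bytes, bytes]:
    """Decode a multi-partition range with ONE zlib decompression stream.

    Returns (decoded bytes, trailing bytes the stream never consumed).
    """
    d = zlib.decompressobj()
    out = d.decompress(blob)
    # The decompressor stops at the first segment's end-of-stream marker;
    # the later partitions' compressed bytes are left in unused_data.
    return out, d.unused_data


# zlib segments are not concatenatable: one stream recovers only reducer 0.
zlib_blob = b"".join(zlib.compress(p) for p in PARTITIONS)
zlib_out, zlib_rest = zlib_single_stream(zlib_blob)

# gzip members are concatenatable: gzip.decompress reads every member.
gzip_blob = b"".join(gzip.compress(p) for p in PARTITIONS)
gzip_out = gzip.decompress(gzip_blob)

# Toy serializer (hypothetical, for illustration only): every serializer
# instance writes a stream header once, then length-prefixed records.
MAGIC = b"\xac\xed"


def serialize(records: list[bytes]) -> bytes:
    out = bytearray(MAGIC)
    for r in records:
        out += len(r).to_bytes(4, "big") + r  # 4-byte big-endian length prefix
    return bytes(out)


def deserialize_single_stream(blob: bytes) -> list[bytes]:
    """Read records, expecting the header only at the start of the stream."""
    if not blob.startswith(MAGIC):
        raise ValueError("missing stream header")
    pos, records = len(MAGIC), []
    while pos < len(blob):
        n = int.from_bytes(blob[pos:pos + 4], "big")
        if pos + 4 + n > len(blob):
            # A second header in mid-stream gets misread as a record length.
            raise ValueError(f"corrupt stream at offset {pos}")
        records.append(blob[pos + 4:pos + 4 + n])
        pos += 4 + n
    return records


# One serializer per partition, then a single reader over the whole range.
ser_blob = b"".join(serialize([p]) for p in PARTITIONS)
try:
    deserialize_single_stream(ser_blob)
    ser_ok = True
except ValueError:
    ser_ok = False
```

Here the single zlib stream silently returns only reducer 0's data and the toy reader fails at the second header, while gzip happens to tolerate concatenation. So whether a multi-reducer range can be fetched and read as one stream depends on properties of both the codec and the serializer, not on the byte range alone.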