[ https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-7041.
------------------------------
    Resolution: Duplicate
    Target Version/s:   (was: 1.6.0)

I'll close this in favor of the more recent issue since there is an active PR and discussion there.

> Avoid writing empty files in BypassMergeSortShuffleWriter
> ---------------------------------------------------------
>
>                 Key: SPARK-7041
>                 URL: https://issues.apache.org/jira/browse/SPARK-7041
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> In BypassMergeSortShuffleWriter, we may end up opening disk writer files for
> empty partitions. This occurs because we manually call {{open()}} after
> creating the writer, causing serialization and compression streams to
> be created; these streams may write headers to the output stream, resulting
> in non-zero-length files being created for partitions that contain no
> records. This is unnecessary, though, since the disk object writer will
> automatically open itself when the first write is performed. Removing this
> eager {{open()}} call and rewriting the consumers to cope with the
> non-existence of empty files results in a large performance benefit for
> certain sparse workloads when using sort-based shuffle.
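
The effect described in the issue can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's code: LazyPartitionWriter, write_partitions, partition_length and demo are hypothetical names, and gzip stands in for whatever serialization and compression streams the real writer wraps. It compares calling open() eagerly on every per-partition writer against letting each writer open itself on its first write, and shows a consumer that treats a missing file as an empty partition.

```python
import gzip
import os
import tempfile


class LazyPartitionWriter:
    """Hypothetical stand-in for a per-partition disk writer (not Spark's API)."""

    def __init__(self, path):
        self.path = path
        self._stream = None  # no file is created until open() runs

    def open(self):
        # Creating the gzip stream emits a header even if no record follows,
        # so an eagerly opened writer always leaves a non-empty file behind.
        if self._stream is None:
            self._stream = gzip.open(self.path, "wb")
        return self

    def write(self, record):
        self.open()  # the writer opens itself on the first write
        self._stream.write(record)

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None


def partition_length(path):
    """Consumer side: a missing file means an empty partition."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def write_partitions(out_dir, partitions, eager_open):
    """Write one file per partition and return the on-disk length of each."""
    writers = [
        LazyPartitionWriter(os.path.join(out_dir, f"part-{pid}"))
        for pid in range(len(partitions))
    ]
    if eager_open:
        for writer in writers:
            writer.open()  # the eager call the issue proposes removing
    for writer, records in zip(writers, partitions):
        for record in records:
            writer.write(record)
    for writer in writers:
        writer.close()
    return [partition_length(writer.path) for writer in writers]


def demo():
    # A sparse workload: two of the four partitions receive no records.
    partitions = [[b"a", b"b"], [], [], [b"c"]]
    with tempfile.TemporaryDirectory() as eager_dir:
        eager = write_partitions(eager_dir, partitions, eager_open=True)
    with tempfile.TemporaryDirectory() as lazy_dir:
        lazy = write_partitions(lazy_dir, partitions, eager_open=False)
    return eager, lazy
```

With eager opening, every partition ends up with a file containing at least the compression header and trailer. With lazy opening, the two empty partitions produce no file at all, and partition_length reports them as zero bytes. Whether this translates into the performance benefit mentioned in the issue depends on how many partitions are empty in a given workload.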