mccheah commented on a change in pull request #28618:
URL: https://github.com/apache/spark/pull/28618#discussion_r443905043
##########
File path:
core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java
##########
@@ -273,19 +280,24 @@ void forceSorterToSpill() throws IOException {
     if (maybeSingleFileWriter.isPresent()) {
       // Here, we don't need to perform any metrics updates because the bytes written to this
       // output file would have already been counted as shuffle bytes written.
-      partitionLengths = spills[0].partitionLengths;
-      maybeSingleFileWriter.get().transferMapSpillFile(spills[0].file, partitionLengths);
+      Optional<MapOutputMetadata> maybeMetadata =
+          maybeSingleFileWriter.get().transferMapSpillFile(
+              spills[0].file, spills[0].partitionLengths);
+      mapOutputCommitMessage = maybeMetadata.map(
+          metadata -> MapOutputCommitMessage.of(spills[0].partitionLengths, metadata))
+          .orElse(MapOutputCommitMessage.of(spills[0].partitionLengths));
Review comment:
Hm, I think this was originally designed this way because we didn't want
the single spill writer to report a list of partition lengths different from
the one passed into the writer's transfer function. But maybe we can wrap this
with a preconditions check to ensure that the state remains consistent, and
that, together with a Javadoc note, is good enough.
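
A minimal sketch of the kind of preconditions check suggested above, assuming we
compare the lengths the writer reports against the ones passed into
`transferMapSpillFile`. The class and method names (`PartitionLengthCheck`,
`checkPartitionLengths`) are illustrative, not from the PR; Spark itself would
likely use Guava's `Preconditions.checkState` instead of a hand-rolled throw.

```java
import java.util.Arrays;

// Hypothetical illustration: fail fast if the single-spill writer's reported
// partition lengths diverge from the lengths the caller handed it.
public class PartitionLengthCheck {

  static void checkPartitionLengths(long[] passedIn, long[] reported) {
    // Equivalent in spirit to Preconditions.checkState(...) in the real code.
    if (!Arrays.equals(passedIn, reported)) {
      throw new IllegalStateException(
          "Single-spill writer must not report partition lengths different from "
              + "those passed to transferMapSpillFile: expected "
              + Arrays.toString(passedIn) + " but got " + Arrays.toString(reported));
    }
  }

  public static void main(String[] args) {
    long[] passedIn = {10L, 0L, 42L};
    // Consistent state: check passes silently.
    checkPartitionLengths(passedIn, new long[] {10L, 0L, 42L});
    System.out.println("lengths consistent");
  }
}
```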
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]