[GitHub] spark pull request: [SPARK-8029] Robust shuffle writer

squito Tue, 10 Nov 2015 18:59:06 -0800

Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9610#discussion_r44497139
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java ---
    @@ -155,9 +156,20 @@ public void write(Iterator<Product2<K, V>> records) throws IOException {
           writer.commitAndClose();
         }
     
    -    partitionLengths =
    -      writePartitionedFile(shuffleBlockResolver.getDataFile(shuffleId, mapId));
    -    shuffleBlockResolver.writeIndexFile(shuffleId, mapId, partitionLengths);
    +    File output = shuffleBlockResolver.getDataFile(shuffleId, mapId);
    +    final File tmp = new File(output.getAbsolutePath() + "." + UUID.randomUUID());
    +    partitionLengths = writePartitionedFile(tmp);
    +    if (!output.exists()) {
    --- End diff --
    
    I don't think you can do this and still support SPARK-4085 (regenerating
    the output if one of the shuffle files goes completely missing). If the
    index file goes missing while the data file is still there, this logic will
    never regenerate the shuffle output. But maybe SPARK-4085 is not worth it ...
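A minimal sketch of the concern, assuming the `if (!output.exists())` branch installs the freshly written output only when the data file is absent (the diff is cut off at that line, so this is an assumption). The class name, method names, and file names below are hypothetical and for illustration only; the point is that a guard which looks only at the data file cannot detect a lost index file, while a guard that requires both files can.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ShuffleOutputCheck {

  // Hypothetical: mirrors the diff's guard, which consults only the data file
  // (the index argument is ignored). Returns true if new output would be installed.
  static boolean regeneratesDataOnly(File data, File index) {
    return !data.exists();
  }

  // Hypothetical alternative: treat the map output as complete only when
  // BOTH the data file and the index file exist, so losing either one
  // (the SPARK-4085 case) still leads to regeneration.
  static boolean regeneratesBoth(File data, File index) {
    return !(data.exists() && index.exists());
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("shuffle-sketch").toFile();
    dir.deleteOnExit();
    // Illustrative file names only.
    File data = new File(dir, "shuffle_0_0_0.data");
    File index = new File(dir, "shuffle_0_0_0.index");

    // Simulate the failure mode: the data file survives, the index file is lost.
    if (!data.createNewFile()) {
      throw new IOException("could not create " + data);
    }
    data.deleteOnExit();

    System.out.println("data-only guard regenerates: " + regeneratesDataOnly(data, index));
    System.out.println("both-files guard regenerates: " + regeneratesBoth(data, index));
  }
}
```

With the data file present and the index file missing, the data-only guard reports no regeneration, while the two-file guard does. Checking both files is one possible way to keep SPARK-4085 working; whether that is worth it is the open question in the comment.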
