GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/6397#issuecomment-106537393
  
    Ah, I found the problem: we're closing the partition writers in the wrong
place.  We should be calling `commitAndClose()` at the end of `insertAll()`.
The current usage of DiskBlockObjectWriter isn't legal because we call
`fileSegment()` before we've called `commitAndClose()`.
    
    According to BlockObjectWriter's API contract:
    
    ```scala
      /**
       * Returns the file segment of committed data that this Writer has written.
       * This is only valid after commitAndClose() has been called.
       */
      def fileSegment(): FileSegment
    ```
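    
    Concretely, the corrected ordering would look roughly like this (a sketch
    with hypothetical names; the real bookkeeping in `insertAll()` is more
    involved, and `write`'s exact signature may differ):
    
    ```scala
    import org.apache.spark.Partitioner
    import org.apache.spark.storage.{DiskBlockObjectWriter, FileSegment}
    
    // Hypothetical helper: write every record first, commit each writer,
    // and only then ask for its file segment.
    def writeAndCommit(
        records: Iterator[Product2[Any, Any]],
        partitioner: Partitioner,
        writers: Array[DiskBlockObjectWriter]): Array[FileSegment] = {
      records.foreach { record =>
        // Route each record to its partition's writer.
        writers(partitioner.getPartition(record._1)).write(record)
      }
      writers.map { writer =>
        writer.commitAndClose()  // finalize the file first...
        writer.fileSegment()     // ...so that fileSegment() is legal
      }
    }
    ```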
    
    However, DiskBlockObjectWriter doesn't have assertions to catch calls to
`fileSegment()` before `commitAndClose()`.  I'll add those assertions, then
fix my new writer code.
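    
    For illustration, the guard could look something like this (a standalone
    simplification, not Spark's actual class; the `committed` flag and the
    error message are placeholders):
    
    ```scala
    import java.io.File
    
    class GuardedWriter(file: File) {
      private var committed = false
    
      def commitAndClose(): Unit = {
        // ... flush and close the underlying streams here ...
        committed = true
      }
    
      // Fail fast instead of silently returning a segment for
      // uncommitted data.
      def fileSegment(): (File, Long) = {
        require(committed, "fileSegment() called before commitAndClose()")
        (file, file.length())
      }
    }
    ```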
    
    I also noticed that InputOutputMetricsSuite is one of the few places with
an end-to-end test of the shuffle records written metric.  That test probably
belongs in ShuffleSuite, since the metric should be exercised for each shuffle
manager.  We should also add a test covering these metrics for shuffles that
perform aggregation, since they may use a different code path.
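    
    Something along these lines could go into ShuffleSuite (a rough sketch;
    the listener wiring and metric names are written from memory, so treat
    them as assumptions to verify against the suite's existing helpers):
    
    ```scala
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    
    test("shuffle records written metric (aggregating shuffle)") {
      sc = new SparkContext("local", "test", conf)
      var recordsWritten = 0L
      sc.addSparkListener(new SparkListener {
        override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
          for (metrics <- Option(taskEnd.taskMetrics);
               shuffleWrite <- metrics.shuffleWriteMetrics) {
            recordsWritten += shuffleWrite.shuffleRecordsWritten
          }
        }
      })
      // reduceByKey performs map-side aggregation, which may take a
      // different write path than a non-aggregating shuffle.
      val pairs = sc.parallelize(1 to 100, 4).map(i => (i % 10, i))
      pairs.reduceByKey(_ + _).count()
      sc.listenerBus.waitUntilEmpty(500)
      // With map-side combine, each of the 4 map partitions should emit
      // one record per distinct key (10), so 40 records in total.
      assert(recordsWritten === 40)
    }
    ```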

