Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7948#discussion_r36272364
  
    --- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillMerger.java
 ---
    @@ -47,11 +47,19 @@ public int compare(UnsafeSorterIterator left, 
UnsafeSorterIterator right) {
         priorityQueue = new PriorityQueue<UnsafeSorterIterator>(numSpills, 
comparator);
       }
     
    -  public void addSpill(UnsafeSorterIterator spillReader) throws 
IOException {
    +  /**
    +   * Add an UnsafeSorterIterator to this merger
    +   */
    +  public void addSpillIfNotEmpty(UnsafeSorterIterator spillReader) throws 
IOException {
         if (spillReader.hasNext()) {
    +      // We only add the spillReader to the priorityQueue if it is not 
empty. We do this to
    --- End diff --
    
    Yep, makes sense. Putting empty spill writers violates an invariant that's 
maintained by the `loadNext()` loop: if a spill reader is in the priority 
queue, then `getBaseObject()`, `getBaseOffset()`, etc. point to a row that has 
not been returned yet.  We covered the maintenance of that invariant but didn't 
establish it properly when there were empty spills. This change fixes that, 
though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to