GitHub user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12736#discussion_r61321394
  
    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java ---
    @@ -291,7 +291,7 @@ public void testSetOperation() {
           unioned.collectAsList());
     
         Dataset<String> subtracted = ds.except(ds2);
    -    Assert.assertEquals(Arrays.asList("abc", "abc"), subtracted.collectAsList());
    +    Assert.assertEquals(Arrays.asList("abc"), subtracted.collectAsList());
    --- End diff --
    
    The current implementation (before this PR) sits somewhere between EXCEPT and EXCEPT ALL: it removes all rows for which it finds a match (essentially eliminating duplicates), but it does not remove duplicates where there is no match. Let's follow the principle of least surprise and create a correct EXCEPT (one that removes duplicates).
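
    To make the difference concrete, here is a minimal sketch of the three behaviors on plain Java lists. It does not use Spark; the class name `ExceptSemantics`, its method names, and the sample data are hypothetical and chosen only for illustration, and `hybridExcept` is just my reading of the pre-PR behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of Spark.
class ExceptSemantics {

    // Pre-PR behavior as described above (assumption): every left row that has
    // a match on the right is dropped, but unmatched duplicates are kept.
    static List<String> hybridExcept(List<String> left, List<String> right) {
        Set<String> rightSet = new HashSet<>(right);
        List<String> out = new ArrayList<>();
        for (String row : left) {
            if (!rightSet.contains(row)) {
                out.add(row);
            }
        }
        return out;
    }

    // EXCEPT (distinct): set difference, so duplicates never survive.
    static List<String> exceptDistinct(List<String> left, List<String> right) {
        Set<String> out = new LinkedHashSet<>(left); // de-duplicates, keeps order
        out.removeAll(new HashSet<>(right));
        return new ArrayList<>(out);
    }

    // EXCEPT ALL: multiset difference, each right row cancels one left row.
    static List<String> exceptAll(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        for (String row : right) {
            out.remove(row); // remove(Object) drops only the first occurrence
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data is made up for this sketch.
        List<String> left = Arrays.asList("abc", "abc", "xyz", "xyz");
        List<String> right = Arrays.asList("xyz", "foo");

        System.out.println(hybridExcept(left, right));   // [abc, abc]
        System.out.println(exceptDistinct(left, right)); // [abc]
        System.out.println(exceptAll(left, right));      // [abc, abc, xyz]
    }
}
```

    Note that the hybrid result matches neither EXCEPT nor EXCEPT ALL on this input, which is the surprise the test change above is meant to remove.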

