[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

viirya Sat, 04 Jun 2016 02:32:55 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/13483
  
    @gatorsmile Automatically deduplicating the group by columns would cause
confusion. Using your example:
    
        df.groupBy("col1").agg(count("*"))
    
    When users try the above API call, they will find that `groupBy` retains
the group by columns. Then, when they try:
    
        df.groupBy("col1").agg($"col1", count("*"))  
    
    they will find that adding the group by columns explicitly produces duplicate columns.
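A minimal sketch of the current behavior described above, assuming a local `SparkSession`, the default `spark.sql.retainGroupColumns=true`, and a hypothetical two-column DataFrame (`col1`, `col2`; only `col1` comes from the example). The object and method names (`GroupByColumnsDemo`, `outputColumns`) are illustrative. It only inspects the `col1` columns, since the generated name of the count column may differ across Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

object GroupByColumnsDemo {
  // Returns the output column names of the two calls from the comment:
  //   df.groupBy("col1").agg(count("*"))
  //   df.groupBy("col1").agg($"col1", count("*"))
  def outputColumns(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._

    // Hypothetical sample data; only the column name "col1" comes from the thread
    val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("col1", "col2")

    // Grouping columns are kept in the output (spark.sql.retainGroupColumns defaults to true)
    val implicitKey = df.groupBy("col1").agg(count("*")).columns.toSeq

    // Naming the grouping column again inside agg() adds a second "col1" column
    val explicitKey = df.groupBy("col1").agg($"col1", count("*")).columns.toSeq

    (implicitKey, explicitKey)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("groupBy-columns-demo")
      .getOrCreate()
    try {
      val (implicitKey, explicitKey) = outputColumns(spark)
      println(implicitKey.mkString(", "))
      println(explicitKey.mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

With the current behavior the second result simply lists `col1` twice, which is the predictable outcome the comment argues for keeping.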
    
    However, if we deduplicate them automatically, users first learn that
`groupBy` includes the group by columns. Then, when they add the group by columns
explicitly, no duplicate columns show up. But as they add more, the group by
columns show up again...
    
    Personally, I think the behavior above is inconsistent.
