[GitHub] [spark] EnricoMi commented on pull request #37551: [SPARK-38591][SQL] Add sortWithinGroups to KeyValueGroupedDataset

GitBox Wed, 18 Jan 2023 00:06:41 -0800


EnricoMi commented on PR #37551:
URL: https://github.com/apache/spark/pull/37551#issuecomment-1386641052


   The nice thing about this approach is that you can take a sorted grouped 
dataframe and use it with the existing `cogroup` or `flatMapGroups` methods, 
without introducing new methods. The downside is that you might mistakenly 
associate that sorting with aggregation functions. I thought the documentation 
on `sortWithinGroups` would be sufficient to make that clear:
   
https://github.com/apache/spark/pull/37551/files#diff-3437bb4bcaf2e18c305978985e474daab11e397dc5f4666c13c8e11da0d7180b
   
       Returns a new [[KeyValueGroupedDataset]] with each group sorted by the given expressions.
       Operations that provide an iterator that contains all of the elements in a group will
       then provide a sorted iterator (flatMapGroups, mapGroups, cogroup).
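
For illustration only, the quoted semantics can be modelled with plain Scala collections instead of Spark. `SortedGroupsSketch`, `Grouped`, `SortedGroups`, `Event` and `firstTsPerUser` are hypothetical names, and the signatures are assumptions for this sketch, not the API in the PR. The point it tries to show is that the sort is a separate step and the existing-style `flatMapGroups` then simply sees a sorted iterator.

```scala
// Plain-collections model of the quoted semantics; this is not Spark code.
object SortedGroupsSketch {
  final case class Event(user: String, ts: Int)

  // Hypothetical stand-in for a grouped dataset whose groups are already sorted.
  final class SortedGroups[K, V](groups: Map[K, Seq[V]]) {
    // Same shape as flatMapGroups: called once per key with an iterator over that group.
    def flatMapGroups[U](f: (K, Iterator[V]) => IterableOnce[U]): Seq[U] =
      groups.toSeq.flatMap { case (k, vs) => f(k, vs.iterator) }
  }

  // Hypothetical stand-in for an unsorted grouped dataset.
  final class Grouped[K, V](groups: Map[K, Seq[V]]) {
    // Models sortWithinGroups: returns a new grouped value with each group sorted.
    def sortWithinGroups[S: Ordering](by: V => S): SortedGroups[K, V] =
      new SortedGroups(groups.view.mapValues(_.sortBy(by)).toMap)
  }

  // Earliest timestamp per user, relying on the iterator being sorted by ts.
  def firstTsPerUser(events: Seq[Event]): Map[String, Int] =
    new Grouped(events.groupBy(_.user))
      .sortWithinGroups(_.ts)
      .flatMapGroups((user, it) => it.take(1).map(e => user -> e.ts))
      .toMap

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 3), Event("b", 2), Event("a", 1), Event("b", 5), Event("a", 2))
    println(firstTsPerUser(events))
  }
}
```

Nothing in this model touches aggregation: only operations that hand out the per-group iterator observe the order.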
   
   The alternative is to add `flatMapSortedGroups` and `cogroupSorted` methods 
that take extra sorting parameters.
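
As a rough sketch of what that alternative shape could look like, again modelled on plain Scala collections rather than Spark: the method name `flatMapSortedGroups` comes from the comment, but its signature here, along with `SortedParamSketch`, `Grouped`, `Event` and `lastTsPerUser`, is a hypothetical assumption for illustration.

```scala
// Plain-collections model of the alternative design; this is not Spark code.
object SortedParamSketch {
  final case class Event(user: String, ts: Int)

  // Hypothetical stand-in for a grouped dataset.
  final class Grouped[K, V](groups: Map[K, Seq[V]]) {
    // Existing-style method: order within a group is whatever the input had.
    def flatMapGroups[U](f: (K, Iterator[V]) => IterableOnce[U]): Seq[U] =
      groups.toSeq.flatMap { case (k, vs) => f(k, vs.iterator) }

    // Alternative: the sort key is an extra parameter of the operation itself.
    def flatMapSortedGroups[S: Ordering, U](key: V => S)(
        f: (K, Iterator[V]) => IterableOnce[U]): Seq[U] =
      groups.toSeq.flatMap { case (k, vs) => f(k, vs.sortBy(key).iterator) }
  }

  // Latest timestamp per user: sort descending by negating ts, then take the head.
  def lastTsPerUser(events: Seq[Event]): Map[String, Int] =
    new Grouped(events.groupBy(_.user))
      .flatMapSortedGroups(e => -e.ts)((user, it) => it.take(1).map(e => user -> e.ts))
      .toMap

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 3), Event("b", 2), Event("a", 1), Event("b", 5), Event("a", 2))
    println(lastTsPerUser(events))
  }
}
```

In this shape the sorting cannot be mistaken for something that affects other operations, at the cost of one extra method per operation that needs sorted input.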


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

