EnricoMi commented on PR #37551: URL: https://github.com/apache/spark/pull/37551#issuecomment-1386641052
The nice thing about this approach is that you can take a sorted grouped dataframe and use it with the existing `cogroup` or `flatMapGroups` methods, without introducing new methods. The downside is that you might confuse that sorting with aggregation functions. I thought the documentation on `sortWithinGroups` would be sufficient to make that clear: https://github.com/apache/spark/pull/37551/files#diff-3437bb4bcaf2e18c305978985e474daab11e397dc5f4666c13c8e11da0d7180b

> Returns a new [[KeyValueGroupedDataset]] with each group sorted by the given expressions. Operations that provide an iterator that contains all of the elements in a group will then provide a sorted iterator (flatMapGroups, mapGroups, cogroup).

The alternative is to add `flatMapSortedGroups` and `cogroupSorted` with extra sorting parameters.
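To make the two API shapes concrete, here is a minimal sketch in plain Python that models the semantics only. It does not use Spark. Every function name in it (`group_by_key`, `sort_within_groups`, `flat_map_groups`, `flat_map_sorted_groups`) is a hypothetical stand-in for the method discussed in the comment, not a real Spark signature.

```python
from operator import itemgetter

# Plain-Python model of the semantics under discussion. These helpers are
# hypothetical stand-ins, not Spark APIs.

def group_by_key(rows, key):
    """Group rows by key, keeping arrival order inside each group."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return groups

def sort_within_groups(groups, order):
    """Option 1: return a new grouping whose groups are each sorted."""
    return {k: sorted(rows, key=order) for k, rows in groups.items()}

def flat_map_groups(groups, func):
    """Call func(key, iterator) per group and concatenate the results."""
    out = []
    for k, rows in groups.items():
        out.extend(func(k, iter(rows)))
    return out

def flat_map_sorted_groups(groups, order, func):
    """Option 2: a dedicated method that takes the sort order as a parameter."""
    return flat_map_groups(sort_within_groups(groups, order), func)

def first_and_last(key, rows):
    """Order-sensitive function: emits the first and last value it sees."""
    values = [value for _, value in rows]
    return [(key, values[0], values[-1])]

events = [("a", 3), ("b", 1), ("a", 1), ("b", 2), ("a", 2)]
grouped = group_by_key(events, key=itemgetter(0))

# Without sorting, the result depends on arrival order.
unsorted_result = flat_map_groups(grouped, first_and_last)

# Option 1: sort first, then reuse the existing method unchanged.
option1 = flat_map_groups(sort_within_groups(grouped, itemgetter(1)), first_and_last)

# Option 2: one call with an extra sorting parameter.
option2 = flat_map_sorted_groups(grouped, itemgetter(1), first_and_last)
```

In this model both options produce the same result. The difference is only in where the sort order is declared: on the grouped object itself, or as an argument to the operation that consumes the iterator.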
