[GitHub] [spark] EnricoMi commented on a diff in pull request #37551: [SPARK-38591][SQL] Add sortWithinGroups to KeyValueGroupedDataset

GitBox Wed, 18 Jan 2023 00:19:56 -0800


EnricoMi commented on code in PR #37551:
URL: https://github.com/apache/spark/pull/37551#discussion_r1073217390



##########
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala:
##########
@@ -119,13 +130,66 @@ class KeyValueGroupedDataset[K, V] private[sql](
         Project(groupingAttributes, logicalPlan)))
   }
 
+  /**
+   * Returns a new [[KeyValueGroupedDataset]] with each group sorted by the given expressions.
+   * Operations that provide an iterator that contains all of the elements in a group will
+   * then provide a sorted iterator (flatMapGroups, mapGroups, cogroup).
+   *
+   * This is not supported for streaming Datasets (mapGroupsWithState, flatMapGroupsWithState).
+   *
+   * @tparam S The type of the sort value. Must be encodable to Spark SQL types.
+   * @param sortBy A function that provides a sort value for each row.
+   * @param direction The sort direction.
+   *
+   * @since 3.4.0
+   */
+  def sortWithinGroups[S: Encoder](
+      sortBy: V => S, direction: SortDirection = Ascending): KeyValueGroupedDataset[K, V] = {

Review Comment:
   This was inspired by `groupByKey`. You are right, equality (`groupByKey`)
and order are two different things. Since there is no equivalent
`Dataset.sortBy` method, I'll remove this.



