[GitHub] [spark] cloud-fan commented on a diff in pull request #37551: [SPARK-38591][SQL] Add sortWithinGroups to KeyValueGroupedDataset

GitBox Tue, 17 Jan 2023 21:21:44 -0800


cloud-fan commented on code in PR #37551:
URL: https://github.com/apache/spark/pull/37551#discussion_r1073105645



##########
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala:
##########
@@ -119,13 +130,66 @@ class KeyValueGroupedDataset[K, V] private[sql](
         Project(groupingAttributes, logicalPlan)))
   }
 
+  /**
+   * Returns a new [[KeyValueGroupedDataset]] with each group sorted by the given expressions.
+   * Operations that provide an iterator that contains all of the elements in a group will
+   * then provide a sorted iterator (flatMapGroups, mapGroups, cogroup).
+   *
+   * This is not supported for streaming Datasets (mapGroupsWithState, flatMapGroupsWithState).
+   *
+   * @tparam S The type of the sort value. Must be encodable to Spark SQL types.
+   * @param sortBy A function that provides a sort value for each row.
+   * @param direction The sort direction.
+   *
+   * @since 3.4.0
+   */
+  def sortWithinGroups[S: Encoder](
+      sortBy: V => S, direction: SortDirection = Ascending): KeyValueGroupedDataset[K, V] = {

Review Comment:
   I feel it's tricky to define the sort order for an arbitrary class `S`.
   Should we respect its `compareTo` method? There is no `Dataset.sortBy` that
   takes a class either. Shall we remove this overload?
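
For illustration only (not part of the original comment), the ambiguity can be sketched in plain Scala without Spark. `Version` and `SortOrderAmbiguity` are hypothetical names invented for this example. The sketch assumes that an encoded sort value would be compared field by field in declaration order, which may differ from the order the class's own `compareTo` defines; whether the proposed `sortWithinGroups` would behave this way is exactly the open question.

```scala
// Hypothetical sort-key class; not part of Spark or of the PR.
final case class Version(major: Int, minor: Int) extends Ordered[Version] {
  // Custom ordering that deliberately compares `minor` before `major`.
  override def compare(that: Version): Int = {
    val byMinor = Integer.compare(minor, that.minor)
    if (byMinor != 0) byMinor else Integer.compare(major, that.major)
  }
}

object SortOrderAmbiguity {
  val values: Seq[Version] = Seq(Version(2, 0), Version(1, 5), Version(1, 1))

  // Order defined by the class itself (its compareTo / compare method).
  val byCompareTo: Seq[Version] = values.sorted

  // Order defined by the fields in declaration order, standing in for
  // (as an assumption) how an encoded struct-like value would be compared.
  val byFields: Seq[Version] = values.sortBy(v => (v.major, v.minor))

  def main(args: Array[String]): Unit = {
    println(s"compareTo order:  $byCompareTo")
    println(s"field-wise order: $byFields")
  }
}
```

The two results differ for the same input, so an API that accepts `V => S` with only an `Encoder[S]` has to pick one of them, and neither choice is obvious to the caller.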



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
