Re: [PR] [SPARK-55322][SQL] `MaxBy` and `MinBy` Overload with K Elements [spark]

via GitHub Wed, 18 Feb 2026 10:28:12 -0800


viirya commented on code in PR #54134:
URL: https://github.com/apache/spark/pull/54134#discussion_r2823816165



##########
sql/api/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -989,6 +1019,38 @@ object functions {
    */
   def min_by(e: Column, ord: Column): Column = Column.fn("min_by", e, ord)
 
+  /**
+   * Aggregate function: returns an array of values associated with the bottom `k` values of
+   * `ord`.
+   *
+   * The result array contains values in ascending order by their associated ordering values.
+   *
+   * @note
+   *   The function is non-deterministic because the order of collected results depends on the
+   *   order of the rows which may be non-deterministic after a shuffle when there are ties in the
+   *   ordering expression.

Review Comment:
   Please also note the maximum limit of `k` for the functions in this file.
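For readers following the thread, here is a minimal local sketch of the semantics the `min_by` docstring describes, plus the kind of bound on `k` the reviewer asks to document. It runs on plain Scala collections, not Spark. The names `MinByKSketch`, `minByK`, and `MaxK` are hypothetical, the value of `MaxK` is a placeholder (the actual limit is not stated in this excerpt), and the requirement that `k` be positive is an assumption.

```scala
// Hypothetical local model of the documented min_by-with-k behaviour.
// This is NOT Spark's implementation; MaxK is a placeholder, not the real limit.
object MinByKSketch {
  // Placeholder upper bound on k; the real limit from the PR is not given here.
  val MaxK: Int = 100000

  /** Returns the values paired with the k smallest ordering keys, in ascending key order. */
  def minByK[V, O](rows: Seq[(V, O)], k: Int)(implicit ord: Ordering[O]): Seq[V] = {
    // Assumed validation: k must be positive and must not exceed the documented maximum.
    require(k > 0 && k <= MaxK, s"k must be in [1, $MaxK], got $k")
    // sortBy is stable, so tied keys keep their input order. In Spark the input
    // order of tied rows can change after a shuffle, which is why the docstring
    // calls the function non-deterministic.
    rows.sortBy(_._2).take(k).map(_._1)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(("a", 3), ("b", 1), ("c", 2), ("d", 1))
    // "b" and "d" tie on key 1; here the tie resolves by input order.
    println(minByK(rows, 2)) // List(b, d)
  }
}
```

Documenting the bound in the Scaladoc `@note`, as the reviewer suggests, lets callers see the limit on `k` without having to trigger the validation error first.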



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

