Re: [PR] [SPARK-48728][SQL] Support ignoreNulls for collect_list and collect_set [spark]

via GitHub Thu, 02 Jan 2025 20:08:10 -0800


cloud-fan commented on code in PR #47149:
URL: https://github.com/apache/spark/pull/47149#discussion_r1901452155



##########
sql/api/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -354,7 +354,23 @@ object functions {
   def collect_list(columnName: String): Column = collect_list(Column(columnName))
 
   /**
-   * Aggregate function: returns a set of objects with duplicate elements eliminated.
+   * Aggregate function: returns a list of objects with duplicates.
+   *
+   * The parameter ignoreNulls controls if nulls should be excluded from the result.
+   *
+   * @note
+   *   The function is non-deterministic because the order of collected results depends on the
+   *   order of the rows which may be non-deterministic after a shuffle.
+   *
+   * @group agg_funcs
+   * @since 4.0.0
+   */
+  def collect_list(e: Column, ignoreNulls: Column): Column =

Review Comment:
   I think it's better to follow the existing style. Many functions in this
file (sort_array, first, etc.) take a boolean parameter if the corresponding
SQL function only accepts a boolean constant. cc @HyukjinKwon
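
   To illustrate the style being referred to, here is a minimal sketch. It
uses only the `first(e: Column, ignoreNulls: Boolean)` and
`sort_array(e: Column, asc: Boolean)` overloads that already exist in
`functions.scala`. The object name `BooleanFlagStyle` and its helper methods
are made up for this example, and the `collect_list` signature in the trailing
comment is only a guess at what a Boolean-based overload might look like, not
an existing API.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{first, sort_array}

// Hypothetical wrapper object, used only to show the call style.
object BooleanFlagStyle {
  // Existing API: the flag is a plain Scala Boolean, not a Column.
  def firstNonNull(c: Column): Column = first(c, ignoreNulls = true)

  // Existing API: sort direction is also a plain Boolean.
  def sortDescending(c: Column): Column = sort_array(c, asc = false)

  // Assumed shape if this PR followed the same convention (not an existing API):
  //   def collect_list(e: Column, ignoreNulls: Boolean): Column
}
```

   With a Boolean parameter the flag is fixed when the query is built, which
matches a SQL function that only accepts a boolean constant for that argument.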



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


