eejbyfeldt commented on code in PR #47149:
URL: https://github.com/apache/spark/pull/47149#discussion_r1900996338


##########
sql/api/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -354,7 +354,23 @@ object functions {
   def collect_list(columnName: String): Column = collect_list(Column(columnName))
 
   /**
-   * Aggregate function: returns a set of objects with duplicate elements eliminated.
+   * Aggregate function: returns a list of objects with duplicates.
+   *
+   * The parameter ignoreNulls controls if nulls should be excluded from the result.
+   *
+   * @note
+   *   The function is non-deterministic because the order of collected results depends on the
+   *   order of the rows which may be non-deterministic after a shuffle.
+   *
+   * @group agg_funcs
+   * @since 4.0.0
+   */
+  def collect_list(e: Column, ignoreNulls: Column): Column =

Review Comment:
   That was how I implemented it originally, but I changed it based on comments from @HyukjinKwon (https://github.com/apache/spark/pull/47149#discussion_r1660331121).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
