cloud-fan commented on code in PR #47149:
URL: https://github.com/apache/spark/pull/47149#discussion_r1901452155
##########
sql/api/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -354,7 +354,23 @@ object functions {
def collect_list(columnName: String): Column =
collect_list(Column(columnName))
/**
- * Aggregate function: returns a set of objects with duplicate elements eliminated.
+ * Aggregate function: returns a list of objects with duplicates.
+ *
+ * The parameter ignoreNulls controls if nulls should be excluded from the result.
+ *
+ * @note
+ * The function is non-deterministic because the order of collected results depends on the
+ * order of the rows which may be non-deterministic after a shuffle.
+ *
+ * @group agg_funcs
+ * @since 4.0.0
+ */
+ def collect_list(e: Column, ignoreNulls: Column): Column =
Review Comment:
I think it's better to follow the existing style. Many functions in this
file (sort_array, first, etc.) take a Boolean parameter when the corresponding
SQL function only accepts a boolean constant. cc @HyukjinKwon
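
For reference, a minimal sketch of the existing style mentioned above, using the two functions named in the comment. The column names ("value", "items") and the wrapper object are made up for illustration, and the Boolean overload of collect_list in the trailing comment is only a possible shape for this PR, not an existing API.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, first, sort_array}

object ExistingBooleanStyle {
  // first(e: Column, ignoreNulls: Boolean) takes a plain Scala Boolean,
  // not a Column, for its flag.
  val firstNonNull: Column = first(col("value"), ignoreNulls = true)

  // sort_array(e: Column, asc: Boolean) follows the same convention.
  val sortedDescending: Column = sort_array(col("items"), asc = false)

  // Hypothetical signature for this PR if it followed the same convention
  // (an assumption, not part of the current diff):
  //   def collect_list(e: Column, ignoreNulls: Boolean): Column
}
```

With that shape, a caller would write collect_list(col("value"), ignoreNulls = true) instead of wrapping the flag in lit(true).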
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.