cloud-fan commented on code in PR #47149:
URL: https://github.com/apache/spark/pull/47149#discussion_r1895515932
##########
sql/api/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -354,7 +354,23 @@ object functions {
def collect_list(columnName: String): Column =
collect_list(Column(columnName))
/**
- * Aggregate function: returns a set of objects with duplicate elements eliminated.
+ * Aggregate function: returns a list of objects with duplicates.
+ *
+ * The parameter ignoreNulls controls if nulls should be excluded from the result.
+ *
+ * @note
+ * The function is non-deterministic because the order of collected results depends on the
+ * order of the rows which may be non-deterministic after a shuffle.
+ *
+ * @group agg_funcs
+ * @since 4.0.0
+ */
+ def collect_list(e: Column, ignoreNulls: Column): Column =
Review Comment:
Since the SQL syntax requires users to explicitly specify IGNORE/RESPECT
NULLS, the `ignoreNulls` flag must be a constant. Do we really want to allow
a `Column` as the `ignoreNulls` parameter?
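
To make the concern concrete, here is a minimal sketch of what a constant-only signature could look like. It is not code from this PR: the `Column` case class is a hypothetical stand-in so the snippet compiles without Spark, `FunctionsSketch` is an invented name, and the rendered string is only illustrative of the IGNORE/RESPECT NULLS choice, not Spark's actual SQL output.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Column, used only so this
// sketch compiles on its own. It is NOT the real Spark class.
final case class Column(expr: String)

object FunctionsSketch {
  // Assumption: taking a Boolean instead of a Column means the flag can never
  // vary per row, which matches the explicit IGNORE/RESPECT NULLS keywords
  // that the SQL syntax requires.
  def collect_list(e: Column, ignoreNulls: Boolean): Column = {
    val nullHandling = if (ignoreNulls) "IGNORE NULLS" else "RESPECT NULLS"
    // Illustrative rendering only; the real function would build an expression.
    Column(s"collect_list(${e.expr}) $nullHandling")
  }

  def main(args: Array[String]): Unit = {
    // The flag is a plain Scala value, so a column reference cannot be passed.
    println(collect_list(Column("value"), ignoreNulls = true).expr)
  }
}
```

With a `Boolean` parameter, passing something like `col("flag")` fails at compile time, whereas a `Column` parameter would need a runtime check that the argument is a foldable literal.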