Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Change the delimiter parameter of listagg scala functions from Column to String [spark]

via GitHub Tue, 11 Feb 2025 04:31:24 -0800


zhengruifeng commented on PR #49879:
URL: https://github.com/apache/spark/pull/49879#issuecomment-2650680171

https://github.com/apache/spark/pull/49879#issuecomment-2650528939

@yaooqinn The problem is that Spark doesn't handle string arguments
consistently: the same argument in very similar functions can be treated
in different ways.

For example,
https://github.com/apache/spark/blob/59dd406ffab6f7df7f36fe7befe121822e68bf00/python/pyspark/sql/functions/builtin.py#L18495-L18499

And this inconsistency actually caused unexpected results:

A user changed his code from `element_at(c, "a")` to `try_element_at(c,
"a")`, and the query still ran successfully but generated unexpected results:
the input dataframe has a column 'a', so `"a"` was resolved as a column
reference rather than as a literal key. That is why I fixed such type hints
and added notes like this one.

There are 500+ function APIs and column APIs; we cannot expect users
to always check the API reference.

With a `Column` argument, users can express exactly what they want: `col("a")`
or `lit("a")`. The query may fail and the SQL engine will report what happened,
but it won't silently generate _wrong_ results.
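
To make the ambiguity concrete, here is a toy model in plain Python. It is not Spark code: `Col`, `Lit`, `lookup`, `lookup_str_as_literal` and `lookup_str_as_column` are hypothetical names invented for this sketch, and a "row" is just a dict. It only assumes what is described above, namely that one function treats a bare string as a literal key while a very similar one treats it as a column name.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class Col:
    """Explicit reference to a column by name (stand-in for col("a"))."""
    name: str


@dataclass(frozen=True)
class Lit:
    """Explicit literal value (stand-in for lit("a"))."""
    value: Any


Row = Dict[str, Any]


def resolve(row: Row, arg: Union[Col, Lit]) -> Any:
    """Evaluate an explicit argument against one row."""
    if isinstance(arg, Col):
        if arg.name not in row:
            # Loud failure, analogous to an unresolved-column error.
            raise KeyError(f"column {arg.name!r} not found")
        return row[arg.name]
    return arg.value


def lookup(row: Row, map_col: str, key: Union[Col, Lit]) -> Any:
    """Map lookup with an explicit key; a missing key yields None."""
    return row[map_col].get(resolve(row, key))


def lookup_str_as_literal(row: Row, map_col: str, key: str) -> Any:
    """Convention 1 (hypothetical): a bare string key is a literal."""
    return lookup(row, map_col, Lit(key))


def lookup_str_as_column(row: Row, map_col: str, key: str) -> Any:
    """Convention 2 (hypothetical): a bare string key is a column name."""
    return lookup(row, map_col, Col(key))


# The row has a map column "c" and, by coincidence, a column named "a".
row = {"c": {"a": 1, "b": 2}, "a": "b"}

literal_result = lookup_str_as_literal(row, "c", "a")  # looks up key "a"
column_result = lookup_str_as_column(row, "c", "a")    # looks up key row["a"]
```

Both calls succeed, but they read different map entries because the row happens to contain a column named `a`. Without that column, the second convention raises instead of returning a value, and with explicit `Col("a")` or `Lit("a")` the caller's intent is never guessed.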

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

