zhengruifeng commented on PR #49879: URL: https://github.com/apache/spark/pull/49879#issuecomment-2650680171
https://github.com/apache/spark/pull/49879#issuecomment-2650528939 @yaooqinn The problem is that Spark doesn't handle string arguments consistently: the same argument can be treated in different ways by very similar functions. For example, https://github.com/apache/spark/blob/59dd406ffab6f7df7f36fe7befe121822e68bf00/python/pyspark/sql/functions/builtin.py#L18495-L18499

This inconsistency has actually caused unexpected results. A user changed their code from `element_at(c, "a")` to `try_element_at(c, "a")`, and the query still ran successfully but produced unexpected results. The input DataFrame had a column named `a`, so `try_element_at` resolved `"a"` as that column rather than as the literal key `"a"`.

That is why I fixed these type hints and added notes like this one. There are 500+ function and column APIs, and we cannot expect users to always check the API reference. With a `Column` argument, users can express exactly what they want: `col("a")` or `lit("a")`. The query may then fail, with the SQL engine reporting what went wrong, but it won't silently generate _wrong_ results.
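To make the failure mode concrete, here is a toy model in plain Python (not PySpark) of the two string-argument conventions described above. `Col`, `Lit`, `toy_element_at` and `toy_try_element_at` are hypothetical stand-ins, not the real `pyspark.sql.functions` API; the only behavior they mimic is the one from the user report: a bare `str` is taken as a literal key by one function and as a column name by the other. Null/ANSI error semantics of the real functions are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


# Hypothetical stand-ins for col("...") and lit(...); NOT the PySpark classes.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Any


Expr = Union[Col, Lit]
Row = Dict[str, Any]


def evaluate(expr: Expr, row: Row) -> Any:
    """Resolve an expression against one row of input data."""
    if isinstance(expr, Col):
        return row[expr.name]  # column reference: raises KeyError if the column is missing
    return expr.value  # literal: used as-is


def as_literal(arg: Union[str, Expr]) -> Expr:
    # Convention 1 (as in the element_at call from the report): bare str -> literal.
    return Lit(arg) if isinstance(arg, str) else arg


def as_column(arg: Union[str, Expr]) -> Expr:
    # Convention 2 (as in the try_element_at call from the report): bare str -> column name.
    return Col(arg) if isinstance(arg, str) else arg


def toy_element_at(m: Dict[str, Any], key: Union[str, Expr], row: Row) -> Any:
    # Both toys return None for a missing map key (a simplification).
    return m.get(evaluate(as_literal(key), row))


def toy_try_element_at(m: Dict[str, Any], key: Union[str, Expr], row: Row) -> Any:
    return m.get(evaluate(as_column(key), row))


if __name__ == "__main__":
    m = {"a": 1, "b": 2}
    row = {"a": "b"}  # the input happens to have a column named "a" holding "b"

    print(toy_element_at(m, "a", row))          # 1: "a" used as the literal key
    print(toy_try_element_at(m, "a", row))      # 2: "a" resolved as column a, i.e. key "b"
    print(toy_try_element_at(m, Lit("a"), row))  # 1: explicit literal, no ambiguity
    print(toy_element_at(m, Col("a"), row))     # 2: explicit column, no ambiguity
```

The same bare `"a"` yields different answers, and neither call fails, which is exactly the silent-wrong-result scenario. Passing an explicit `Col`/`Lit` makes both functions agree, and referencing a column that does not exist raises `KeyError` in `evaluate` instead of quietly returning something plausible, mirroring the "fail loudly" argument for `Column` parameters.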
