itholic opened a new pull request, #47518: URL: https://github.com/apache/spark/pull/47518
### What changes were proposed in this pull request?

This PR proposes to raise a proper error from `dropDuplicates` when a wrong `subset` is given.

### Why are the changes needed?

The current error message is hard to understand, since it raises an unrelated `INTERNAL_ERROR`:

```python
>>> df.dropDuplicates(None)
[INTERNAL_ERROR] Undefined error message parameter for error class: '_LEGACY_ERROR_TEMP_1201', MessageTemplate: Cannot resolve column name "<colName>" among (<fieldNames>)., Parameters: Map(colName -> null, fieldNames -> name, age) SQLSTATE: XX000
```

### Does this PR introduce _any_ user-facing change?

No API changes, but the user-facing error message is improved:

```python
>>> df.dropDuplicates(None)
[NOT_STR] Argument `subset` should be a str, got NoneType.
```

### How was this patch tested?

Added UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
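The kind of up-front validation the PR describes can be sketched as follows. Note this is a hypothetical stand-alone helper written for illustration, not the actual PySpark source; it only assumes the PR's observable behavior of rejecting a non-str `subset` with a `NOT_STR`-style message instead of surfacing an unrelated `INTERNAL_ERROR` later:

```python
def validate_subset(subset):
    """Hypothetical sketch of the argument check described in the PR:
    validate `subset` eagerly and raise a clear error naming the
    offending type, rather than failing later during column resolution."""
    if not isinstance(subset, (list, tuple)):
        # Mirrors the PR's example: passing None now reports the bad
        # argument type up front.
        raise TypeError(
            f"[NOT_STR] Argument `subset` should be a str, "
            f"got {type(subset).__name__}."
        )
    for col in subset:
        if not isinstance(col, str):
            raise TypeError(
                f"[NOT_STR] Argument `subset` should be a str, "
                f"got {type(col).__name__}."
            )
    return list(subset)
```

With a helper like this, `validate_subset(None)` fails with the improved message shown above, while a valid list such as `["name", "age"]` passes through unchanged.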
