uros-db commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-1991590743
@cloud-fan yes, that is a problem... Should we settle on only `string functions` for now? I think functions that are meant to work with strings are the most sensitive to this error.

On a more important note: even if we were to update `StringType.acceptsType`, it would still not address collation support level (introduced in this PR), which prevents passing correctly matched arguments to a function that simply does not (yet) support that particular collation. We will need this in the near future, and overriding `checkInputTypes` seems to solve both problems.

Type coercion is a separate effort and will probably cover other parts of the codebase, so what do we think about implementing this for now? @dbatomic

> A bit more context for readers: for now, everything in the codebase that supports `StringType` will take `StringType(#)` (any collation) and treat it as the default collation (UTF8_BINARY). This is especially problematic for string functions, which essentially return incorrect results without warning.
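To make the gap described in the comment concrete, here is a minimal sketch in Python. It is a toy model, not Spark code: the `StringType` dataclass, `accepts_any_string`, and `check_input_types` are hypothetical stand-ins, and the collation name `UNICODE_CI` is used only for illustration. It contrasts a type check that ignores collation with a per-function check that also validates the collation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Hypothetical stand-in for a string type that carries a collation name."""
    collation: str = "UTF8_BINARY"


def accepts_any_string(arg_type):
    """Collation-blind check: any string type passes, whatever its collation."""
    return isinstance(arg_type, StringType)


def check_input_types(arg_types, supported_collations):
    """Per-function check (illustrative): returns None on success, else an error message.

    Rejects non-string arguments, collations the function does not support,
    and arguments whose collations do not match each other.
    """
    for position, arg_type in enumerate(arg_types):
        if not isinstance(arg_type, StringType):
            return f"argument {position} is not a string type"
        if arg_type.collation not in supported_collations:
            return f"argument {position} uses unsupported collation {arg_type.collation}"
    if len({t.collation for t in arg_types}) > 1:
        return "arguments use mismatched collations"
    return None


if __name__ == "__main__":
    collated = StringType("UNICODE_CI")
    # The collation-blind check lets the argument through silently.
    print(accepts_any_string(collated))
    # The per-function check reports that the collation is not supported.
    print(check_input_types([collated, collated], {"UTF8_BINARY"}))
```

In this model, the arguments in the last call match each other, yet the function still rejects them because it only supports `UTF8_BINARY`. That is the case a change to the acceptance check alone would not catch.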
