srowen commented on issue #24872: [SPARK-28023][SQL] Trim the string when cast string type to Boolean/Numeric types
URL: https://github.com/apache/spark/pull/24872#issuecomment-502445215

Presumably, right now _all_ input to these functions has no surrounding spaces -- otherwise it would fail.

If it's not clear from a standards perspective whether these functions should trim their input, then I'd say don't make this change at all. Just leave the behavior as it is unless there is a compelling reason to change it. If there is one, then we need to enforce it.

You're right, that leaves users with a possibly redundant trim() in their code. If they know enough to know this, they'd just remove the manual trim(), then -- not undo this 'fix' going forward for future usages. Most people won't know about the flag either way, anyway, if one were added.

What's the cost? I put together a crude benchmark of 90% strings that have no whitespace at the ends, and 10% that do. It comes to 20 nanoseconds per call or so. If I add one extra short-circuit to trim() for that common case, it's 6 nanoseconds. We can at least bring the overhead of the common case down a lot, but it's already very small. I'll propose that change separately anyway.

_If_ this change is important, I think a flag isn't necessary. But it may just not be the right behavior change anyway.
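
For illustration only, here is a rough Python sketch of the kind of short-circuit and crude 90/10 benchmark described above. It is not Spark's trim() and not the benchmark that produced the numbers quoted in this comment; the names `trim_scan`, `trim_fast`, `make_inputs` and `bench` are hypothetical, and the sketch assumes that only the ASCII space character is trimmed. Absolute timings in Python will differ greatly from the JVM figures.

```python
import timeit


def trim_scan(s: str) -> str:
    """Baseline: always scan inward from both ends, then slice."""
    start, end = 0, len(s)
    while start < end and s[start] == " ":
        start += 1
    while end > start and s[end - 1] == " ":
        end -= 1
    return s[start:end]


def trim_fast(s: str) -> str:
    """Same result, with one extra check for the common case.

    If the string is empty, or neither its first nor its last character
    is a space, there is nothing to trim, so return the input as-is.
    """
    if not s or (s[0] != " " and s[-1] != " "):
        return s
    return trim_scan(s)


def make_inputs(n: int = 1000) -> list:
    """Assumed workload: every 10th string is padded with spaces."""
    return [f" {i} " if i % 10 == 0 else str(i) for i in range(n)]


def bench(fn, inputs, repeat: int = 200) -> float:
    """Return the average wall-clock seconds per call of fn."""
    total = timeit.timeit(lambda: [fn(s) for s in inputs], number=repeat)
    return total / (repeat * len(inputs))


if __name__ == "__main__":
    inputs = make_inputs()
    for fn in (trim_scan, trim_fast):
        print(f"{fn.__name__}: {bench(fn, inputs) * 1e9:.1f} ns/call")
```

The point is only that the common case (no whitespace at either end) can exit after two character comparisons, so the cost of an unconditional trim inside the cast is dominated by that check rather than by any scanning or copying.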
