jovanpavl-db commented on PR #48545: URL: https://github.com/apache/spark/pull/48545#issuecomment-2443694564
> Does it make sense to use the Levenshtein distance for suggestions like we do in `UNRESOLVED_FIELD.WITH_SUGGESTION`, see
>
> https://github.com/apache/spark/blob/bb15eb7b91ab775bdb84b6b17353a706794b122d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala#L75-L97

I went through this implementation and it seems useful. However, doing suggestions on specifiers can be quite problematic in our case: we currently require a 100% match on them because they are usually only 2 characters long. For example, if someone writes `_C`, we could suggest `_CI`, `_CS`, and even some UNICODE collations like ci. Every wrong specifier can have multiple suggestions, so the number of candidates can really explode.

That's why I think we have (for now) stayed out of the business of suggesting individual specifiers and instead make suggestions on the whole collation. We can definitely rethink this approach in the future, at least for some of the specifiers like trim.
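
To make the ambiguity concrete, here is a minimal Python sketch, not Spark's implementation. The specifier list, the `levenshtein` and `suggest` helpers, and the distance threshold of 1 are all assumptions chosen for illustration. It shows that a one-character typo of a 2-character specifier matches several specifiers equally well, while a typo of a longer specifier such as a trim one tends to have a single close match.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical specifier list, for illustration only.
SPECIFIERS = ["CI", "CS", "AI", "AS", "LTRIM", "RTRIM", "TRIM"]


def suggest(bad: str, max_distance: int = 1) -> list[str]:
    """Return every known specifier within max_distance edits of the input."""
    bad = bad.upper()
    return [s for s in SPECIFIERS if levenshtein(bad, s) <= max_distance]


print(suggest("C"))     # ['CI', 'CS']  -> two equally good candidates
print(suggest("RTIM"))  # ['RTRIM']     -> a single close candidate
```

Under these assumptions, short specifiers give no way to rank the candidates against each other, whereas longer ones like the trim specifiers could plausibly support a suggestion.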
