jovanpavl-db commented on PR #48545:
URL: https://github.com/apache/spark/pull/48545#issuecomment-2443694564

   > Does it make sense to use the Levenshtein distance for suggestions like we do in `UNRESOLVED_FIELD.WITH_SUGGESTION`, see
   >
   > https://github.com/apache/spark/blob/bb15eb7b91ab775bdb84b6b17353a706794b122d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala#L75-L97
   
   I went through this implementation and it seems useful. However, making
suggestions for specifiers can be quite problematic in our case (we currently
require an exact match on them because they are usually only 2 characters
long). For example, if someone writes _C, we could suggest _CI, _CS, and even
some UNICODE collations like ci. Every wrong specifier can have multiple
suggestions, so the list can really explode. That's why I think we have (for
now) stayed out of the business of suggesting specifiers and suggest whole
collations instead. We can definitely rethink this approach in the future, at
least for some of the specifiers, like trim.
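
To illustrate the concern, here is a minimal Python sketch (not Spark's Scala code). It assumes a hypothetical specifier list and a hypothetical `suggest` helper with a distance threshold of 1; none of these names or values come from the PR. It shows that a short, mistyped specifier sits within one edit of several valid ones, while a longer specifier yields a single candidate.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical specifier list, for illustration only.
SPECIFIERS = ["CI", "CS", "AI", "AS", "RTRIM"]


def suggest(wrong, candidates=SPECIFIERS, max_distance=1):
    """Return every candidate within max_distance edits of the input."""
    wrong = wrong.upper()
    return [c for c in candidates if levenshtein(wrong, c) <= max_distance]


print(suggest("C"))     # several equally close candidates
print(suggest("RTIM"))  # a longer specifier narrows to one candidate
```

With 2-character specifiers almost any typo is one edit away from more than one valid value, so the suggestion cannot point to a single likely fix. Longer specifiers behave better, which is why something like trim might be a reasonable first candidate for suggestions.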


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

