nikolamand-db commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1601334716
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -245,29 +599,26 @@ public static StringSearch getStringSearch(
    * Returns the collation id for the given collation name.
    */
   public static int collationNameToId(String collationName) throws SparkException {
-    String normalizedName = collationName.toUpperCase();
-    if (collationNameToIdMap.containsKey(normalizedName)) {
-      return collationNameToIdMap.get(normalizedName);
-    } else {
-      Collation suggestion = Collections.min(List.of(collationTable), Comparator.comparingInt(
-        c -> UTF8String.fromString(c.collationName).levenshteinDistance(
-          UTF8String.fromString(normalizedName))));
-
-      Map<String, String> params = new HashMap<>();
-      params.put("collationName", collationName);
-      params.put("proposal", suggestion.collationName);
-
-      throw new SparkException(
-        "COLLATION_INVALID_NAME", SparkException.constructMessageParams(params), null);
-    }
+    return Collation.CollationSpec.collationNameToId(collationName);
+  }
+
+  public static Collation fetchCollationUnsafe(int collationId) throws SparkException {
+    return Collation.CollationSpec.fetchCollation(collationId);
   }
 
   public static Collation fetchCollation(int collationId) {
-    return collationTable[collationId];
+    try {
+      return fetchCollationUnsafe(collationId);
+    } catch (SparkException e) {
+      return Collation.CollationSpecUTF8Binary.UTF8_BINARY_COLLATION;
+    }

Review Comment:
The idea is that this function never throws, because we assume internal code always calls it with a valid collation id obtained earlier by parsing a collation name string. We prevent users from passing a collation id to `StringType` directly by marking that constructor as private. However, the internal fetch by collation id can still throw an exception.
So by returning `UTF8_BINARY` when the error does occur (which would indicate an internal error in the code logic), we don't need to change this function's signature to throw an exception and propagate that change to the numerous places where the function is called (mainly in `CollationSupport`).
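The fallback pattern described in the review comment can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in rather than Spark's actual code: `CollationLookupSketch`, the toy `Collation` class, `InvalidCollationException` (standing in for `SparkException`), and the two-entry id table are assumptions made only to show the shape of a throwing lookup wrapped by a non-throwing one.

```java
import java.util.Map;

public class CollationLookupSketch {
  // Hypothetical stand-in for Spark's Collation class; only the name is modeled.
  static final class Collation {
    final String name;
    Collation(String name) { this.name = name; }
  }

  // Hypothetical checked exception standing in for SparkException.
  static class InvalidCollationException extends Exception {
    InvalidCollationException(String message) { super(message); }
  }

  static final Collation UTF8_BINARY = new Collation("UTF8_BINARY");

  // Toy id-to-collation table; these ids are illustrative, not Spark's real ids.
  private static final Map<Integer, Collation> TABLE = Map.of(
      0, UTF8_BINARY,
      1, new Collation("UNICODE"));

  // Throwing variant: the checked exception forces callers to handle unknown ids.
  static Collation fetchCollationUnsafe(int collationId) throws InvalidCollationException {
    Collation collation = TABLE.get(collationId);
    if (collation == null) {
      throw new InvalidCollationException("Unknown collation id: " + collationId);
    }
    return collation;
  }

  // Non-throwing variant: keeps an exception-free signature by falling back
  // to UTF8_BINARY when the id is unknown.
  static Collation fetchCollation(int collationId) {
    try {
      return fetchCollationUnsafe(collationId);
    } catch (InvalidCollationException e) {
      return UTF8_BINARY;
    }
  }

  public static void main(String[] args) {
    System.out.println(fetchCollation(1).name);   // known id
    System.out.println(fetchCollation(42).name);  // unknown id, falls back
  }
}
```

One trade-off of this shape is that an invalid id is silently mapped to `UTF8_BINARY`, so an internal logic error would not surface at the call site; callers that need to detect it would use the throwing variant instead.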