nikolamand-db commented on code in PR #46180:
URL: https://github.com/apache/spark/pull/46180#discussion_r1601334716
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -245,29 +599,26 @@ public static StringSearch getStringSearch(
* Returns the collation id for the given collation name.
*/
public static int collationNameToId(String collationName) throws SparkException {
- String normalizedName = collationName.toUpperCase();
- if (collationNameToIdMap.containsKey(normalizedName)) {
- return collationNameToIdMap.get(normalizedName);
- } else {
- Collation suggestion = Collections.min(List.of(collationTable), Comparator.comparingInt(
- c -> UTF8String.fromString(c.collationName).levenshteinDistance(
- UTF8String.fromString(normalizedName))));
-
- Map<String, String> params = new HashMap<>();
- params.put("collationName", collationName);
- params.put("proposal", suggestion.collationName);
-
- throw new SparkException(
- "COLLATION_INVALID_NAME", SparkException.constructMessageParams(params), null);
- }
+ return Collation.CollationSpec.collationNameToId(collationName);
+ }
+
+ public static Collation fetchCollationUnsafe(int collationId) throws SparkException {
+ return Collation.CollationSpec.fetchCollation(collationId);
}
public static Collation fetchCollation(int collationId) {
- return collationTable[collationId];
+ try {
+ return fetchCollationUnsafe(collationId);
+ } catch (SparkException e) {
+ return Collation.CollationSpecUTF8Binary.UTF8_BINARY_COLLATION;
+ }
Review Comment:
The idea is for this function to be exception-free: we assume internal
callers always pass a valid collation id, obtained earlier by parsing a
collation name string. Users cannot pass a collation id to `StringType`
explicitly, because that constructor is marked private.
However, the internal fetch by collation id can still throw an exception.
By returning `UTF8_BINARY` when that happens (which would indicate a logic
bug, i.e. an internal error), we avoid changing this function's signature to
declare an exception and propagating that change to the many places where the
function is called (mainly in `CollationSupport`).
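
For illustration, the pattern described above can be sketched in isolation roughly as follows. This is only a minimal sketch, not the PR's code: `CollationLookupSketch`, `LookupException`, the `TABLE` map, and the `EXAMPLE_LCASE` entry are hypothetical stand-ins (for `CollationFactory`, `SparkException`, and the real lookup), assumed here just to make the example self-contained.

```java
import java.util.Map;

public class CollationLookupSketch {
    // Hypothetical stand-in for Spark's Collation; only a name is modeled.
    static final class Collation {
        final String name;
        Collation(String name) { this.name = name; }
    }

    // Hypothetical checked exception standing in for SparkException.
    static class LookupException extends Exception {
        LookupException(String message) { super(message); }
    }

    static final Collation UTF8_BINARY = new Collation("UTF8_BINARY");

    // Hypothetical id-to-collation table; the real lookup decodes the id instead.
    static final Map<Integer, Collation> TABLE = Map.of(
        0, UTF8_BINARY,
        1, new Collation("EXAMPLE_LCASE"));

    // Throwing variant: reports an unknown id through a checked exception.
    static Collation fetchCollationUnsafe(int collationId) throws LookupException {
        Collation collation = TABLE.get(collationId);
        if (collation == null) {
            throw new LookupException("Unknown collation id: " + collationId);
        }
        return collation;
    }

    // Non-throwing variant: keeps an exception-free signature for callers and
    // falls back to UTF8_BINARY if the id turns out to be invalid.
    static Collation fetchCollation(int collationId) {
        try {
            return fetchCollationUnsafe(collationId);
        } catch (LookupException e) {
            return UTF8_BINARY;
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchCollation(1).name);   // valid id
        System.out.println(fetchCollation(42).name);  // invalid id, falls back
    }
}
```

Callers of `fetchCollation` need no `throws` clause or `try` block; the trade-off is that an invalid id is silently mapped to the default collation rather than surfaced.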
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]