mihailom-db commented on code in PR #47364:
URL: https://github.com/apache/spark/pull/47364#discussion_r1751315215
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -923,4 +1063,13 @@ public static String getClosestSuggestionsOnInvalidName(
return String.join(", ", suggestions);
}
+
+ public static List<CollationIdentifier> listCollations(String catalog,
String schema) {
+ return Collation.CollationSpec.listCollations(catalog, schema);
Review Comment:
I would maybe say lets keep catalog and schema outside of CollationFactory.
How I see it is that we have sessionCatalog which has some catalog/schema
defined and it is listing collations that it has under it. This way we keep
Collations simple objects that only depend on the actual info they need to be
constructed.
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala:
##########
@@ -1112,4 +1112,29 @@ class SparkSqlAstBuilder extends AstBuilder {
withIdentClause(ctx.identifierReference(), UnresolvedNamespace(_)),
cleanedProperties)
}
+
+ /**
+ * Create a [[ShowCollationsCommand]] command.
+ * Expected format:
+ * {{{
+ * SHOW identifier? COLLATIONS ((FROM | IN) ns=identifierReference)?
(LIKE? pattern=stringLit);
+ * }}}
+ */
+ override def visitShowCollations(ctx: ShowCollationsContext): LogicalPlan =
withOrigin(ctx) {
+ val ns = if (ctx.ns != null) {
+ withIdentClause(ctx.ns, UnresolvedNamespace(_))
+ } else {
+ CurrentNamespace
+ }
+ val (userScope, systemScope) = Option(ctx.identifier)
+ .map(_.getText.toLowerCase(Locale.ROOT)) match {
+ case None | Some("all") => (true, true)
+ case Some("system") => (false, true)
+ case Some("user") =>
+ throw QueryParsingErrors.showCollationsUnsupportedError("user",
ctx.identifier())
Review Comment:
Could we maybe remove this scoping totally and just return the current
namespace? I would argue for now it is not necessary to allow any additional
feature that we do not actually need or have implemented. So simple SHOW
COLLATIONS (LIKE pattern)? should be enough and we can just keep
ShowCollationsCommand to use CurrentNamespace, userScope=false,
systemScope=true. What do you think about that? This way I would say we are
keeping the idea of what we will do when user defined collations are in, but
will not pollute users with unnecessary errors.
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -704,6 +809,39 @@ private String collationName() {
}
return builder.toString();
}
+
+ private static List<String> allCollationNames() {
+ List<String> collationNames = new ArrayList<>();
+ for (String locale: ICULocaleToId.keySet()) {
+ // CaseSensitivity.CS + AccentSensitivity.AS
+ collationNames.add(locale);
+ // CaseSensitivity.CI + AccentSensitivity.AS
+ collationNames.add(locale + "_CI");
+ // CaseSensitivity.CS + AccentSensitivity.AI
+ collationNames.add(locale + "_AI");
Review Comment:
nit: order these alphabetically first _AI then _CI
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]