mihailom-db commented on code in PR #47364:
URL: https://github.com/apache/spark/pull/47364#discussion_r1747099741


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala:
##########
@@ -1112,4 +1112,28 @@ class SparkSqlAstBuilder extends AstBuilder {
       withIdentClause(ctx.identifierReference(), UnresolvedNamespace(_)),
       cleanedProperties)
   }
+
+  /**
+   * Create a [[ShowCollationsCommand]] command.
+   * Expected format:
+   * {{{
+   *   SHOW identifier? COLLATIONS ((FROM | IN) ns=identifierReference)? 
(LIKE? pattern=stringLit);
+   * }}}
+   */
+  override def visitShowCollations(ctx: ShowCollationsContext): LogicalPlan = 
withOrigin(ctx) {
+    val ns = if (ctx.ns != null) {
+      withIdentClause(ctx.ns, UnresolvedNamespace(_))
+    } else {
+      CurrentNamespace
+    }
+    val (userScope, systemScope) = Option(ctx.identifier)
+      .map(_.getText.toLowerCase(Locale.ROOT)) match {
+      case None | Some("all") => (true, true)
+      case Some("system") => (false, true)
+      case Some("user") => (true, false)

Review Comment:
   Please block this case as well for now, since user-defined collations are not 
supported.



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -923,4 +1076,12 @@ public static String getClosestSuggestionsOnInvalidName(
 
     return String.join(", ", suggestions);
   }
+
+  public static List<CollationIdentifier> listCollations(String catalog, 
String schema) {
+    return Collation.CollationSpec.listCollations(catalog, schema);
+  }
+
+  public static Collation loadCollation(CollationIdentifier 
collationIdentifier) {

Review Comment:
   I would say we need to avoid passing a Collation object anywhere for default 
collations. The problem is that we store these in the hash map when 
loadCollation is called, so we would just bloat the map. Could we maybe 
create a record like you did and pass that as the result? Also, I do not think we 
need to add schema and catalog to Collation in the current implementation, as 
SHOW COLLATIONS already has that information.



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala:
##########
@@ -1112,4 +1112,15 @@ class SparkSqlAstBuilder extends AstBuilder {
       withIdentClause(ctx.identifierReference(), UnresolvedNamespace(_)),
       cleanedProperties)
   }
+
+  /**
+   * Create a [[ShowCollationsCommand]] command.
+   * Expected format:
+   * {{{
+   *   SHOW COLLATIONS (LIKE? pattern)?;
+   * }}}
+   */
+  override def visitShowCollations(ctx: ShowCollationsContext): LogicalPlan = 
withOrigin(ctx) {

Review Comment:
   I agree with that. What do you think about introducing ShowCollations as a new 
LogicalPlan node and then resolving it to either ShowCollationsCommand or 
ShowCollationsExec, since ShowCollationsExec represents the v1 execution path? 
(Something similar was done for functions.)



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ShowCollationsCommand.scala:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis.ResolvedNamespace
+import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.CollationFactory.Collation
+import org.apache.spark.sql.types.StringType
+
+/**
+ * A command for `SHOW COLLATIONS`.
+ *
+ * The syntax of this command is:
+ * {{{
+ *    SHOW identifier? COLLATIONS ((FROM | IN) ns=identifierReference)? (LIKE? 
pattern=stringLit);
+ * }}}
+ */
+case class ShowCollationsCommand(
+    ns: LogicalPlan,
+    userScope: Boolean,
+    systemScope: Boolean,
+    pattern: Option[String]) extends UnaryRunnableCommand {
+
+  override val output: Seq[Attribute] = Seq(
+    AttributeReference("COLLATION_CATALOG", StringType, nullable = false)(),
+    AttributeReference("COLLATION_SCHEMA", StringType, nullable = false)(),
+    AttributeReference("COLLATION_NAME", StringType, nullable = false)(),
+    AttributeReference("LANGUAGE", StringType)(),
+    AttributeReference("COUNTRY", StringType)(),
+    AttributeReference("ACCENT_SENSITIVITY", StringType, nullable = false)(),
+    AttributeReference("CASE_SENSITIVITY", StringType, nullable = false)(),
+    AttributeReference("PAD_ATTRIBUTE", StringType, nullable = false)(),
+    AttributeReference("ICU_VERSION", StringType)())
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val ResolvedNamespace(catalog, ns, _) = child
+
+    val systemCollations: Seq[Collation] = if (systemScope) {
+      sparkSession.sessionState.catalog.listCollations(pattern)
+    } else Seq.empty
+
+    val userCollations: Seq[Collation] = if (userScope) {

Review Comment:
   This looks great. Could we maybe force UTF8_BINARY and UTF8_LCASE to be the 
first ones? These are Spark-internal implementations, and I would expect the 
default collation to appear first. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to