dejankrak-db commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1951165528
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##########
@@ -18,80 +18,48 @@
package org.apache.spark.sql.catalyst.analysis
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
-import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns,
-  AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan,
-  QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
-import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns,
+  AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable,
+  CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan,
+  V2CreateTablePlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.connector.catalog.TableCatalog
import org.apache.spark.sql.types.{DataType, StringType}
/**
- * Resolves default string types in queries and commands. For queries, the default string type is
- * determined by the session's default string type. For DDL, the default string type is the
- * default type of the object (table -> schema -> catalog). However, this is not implemented yet.
- * So, we will just use UTF8_BINARY for now.
+ * Resolves string types in DDL commands, where the string type inherits the
+ * collation from the corresponding object (table/view -> schema -> catalog).
  */
-object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
+object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-    val newPlan = apply0(plan)
-    if (plan.ne(newPlan)) {
-      // Due to how tree transformations work and StringType object being equal to
-      // StringType("UTF8_BINARY"), we need to transform the plan twice
-      // to ensure the correct results for occurrences of default string type.
-      val finalPlan = apply0(newPlan)
-      RuleExecutor.forceAdditionalIteration(finalPlan)
Review Comment:
Regarding potential perf overhead: the hack removed in this PR still ran the remaining logic via the apply0(plan) call, so keeping it would not have avoided this overhead.
There is room for a perf optimization for non-DDL statements, to avoid re-checking isDDLCommand(plan) on every invocation: for example, tag the plan the first time ResolveDDLCommandStringTypes runs and isDDLCommand(plan) returns false, then simply check for that tag on subsequent invocations of the rule for the same plan.
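The tag-based skip described above could be sketched roughly as follows. This is only an illustration, not the PR's implementation: the tag name and the `resolveStringTypes`/`isDDLCommand` placeholders are hypothetical stand-ins for the rule's actual logic, and only `TreeNodeTag`/`setTagValue`/`getTagValue` are real Catalyst APIs.

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreeNodeTag

object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
  // Hypothetical tag marking plans already classified as non-DDL, so later
  // invocations of this rule can skip the isDDLCommand check entirely.
  private val NOT_DDL_COMMAND = TreeNodeTag[Unit]("notDdlCommand")

  def apply(plan: LogicalPlan): LogicalPlan = {
    if (plan.getTagValue(NOT_DDL_COMMAND).isDefined) {
      // Already classified as non-DDL on a previous pass: nothing to resolve.
      plan
    } else if (!isDDLCommand(plan)) {
      // First pass over a non-DDL plan: remember the result via a tag.
      plan.setTagValue(NOT_DDL_COMMAND, ())
      plan
    } else {
      // Placeholder for the rule's actual string-type resolution.
      resolveStringTypes(plan)
    }
  }

  // Placeholders standing in for the existing implementations in the PR.
  private def isDDLCommand(plan: LogicalPlan): Boolean = ???
  private def resolveStringTypes(plan: LogicalPlan): LogicalPlan = ???
}
```

One caveat with this approach: plan tags survive only as long as the same plan instance is reused across rule invocations, so the tag would need to be set on the returned plan (or copied across transformations) for the skip to take effect.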
However, this PR has already grown in size by reverting all the previous session-level collation logic, and it already brings value by simplifying the logic, removing the previously added hacks, and introducing DDL collation resolution, all while preserving correctness and not increasing perf overhead compared to the code that is already merged. My proposal is therefore not to block on this: merge the PR (if there are no other outstanding issues) to deliver its value, and follow up on potential optimizations separately.
@cloud-fan, please let me know what you think and if the proposed approach
makes sense to you.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]