dejankrak-db commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1951165528
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##########
@@ -18,80 +18,48 @@
package org.apache.spark.sql.catalyst.analysis
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
-import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns,
-  AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan,
-  QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
-import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns,
+  AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable,
+  CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan,
+  V2CreateTablePlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.connector.catalog.TableCatalog
import org.apache.spark.sql.types.{DataType, StringType}
/**
- * Resolves default string types in queries and commands. For queries, the default string type is
- * determined by the session's default string type. For DDL, the default string type is the
- * default type of the object (table -> schema -> catalog). However, this is not implemented yet.
- * So, we will just use UTF8_BINARY for now.
+ * Resolves string types in DDL commands, where the string type inherits the
+ * collation from the corresponding object (table/view -> schema -> catalog).
  */
-object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
+object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-    val newPlan = apply0(plan)
-    if (plan.ne(newPlan)) {
-      // Due to how tree transformations work and StringType object being equal to
-      // StringType("UTF8_BINARY"), we need to transform the plan twice
-      // to ensure the correct results for occurrences of default string type.
-      val finalPlan = apply0(newPlan)
-      RuleExecutor.forceAdditionalIteration(finalPlan)
Review Comment:
Regarding potential perf overhead: the hack removed in this PR still ran the remaining logic via the apply0(plan) call, so keeping it would not have avoided this overhead.
There is room for a perf optimization for non-DDL statements, to avoid re-checking isDDLCommand(plan) on every invocation: for example, tag the plan the first time ResolveDDLCommandStringTypes runs and isDDLCommand(plan) returns false, then simply check for that tag on subsequent invocations of the rule for the same plan.
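The tag-based skip described above could be sketched roughly as follows. This is only an illustration, not the PR's implementation: the tag name and the `resolveStringTypes`/`isDDLCommand` placeholders are hypothetical stand-ins for the rule's actual logic, and only `TreeNodeTag`/`setTagValue`/`getTagValue` are real Catalyst APIs.

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreeNodeTag

object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
  // Hypothetical tag marking plans already classified as non-DDL, so later
  // invocations of this rule can skip the isDDLCommand check entirely.
  private val NOT_DDL_COMMAND = TreeNodeTag[Unit]("notDdlCommand")

  def apply(plan: LogicalPlan): LogicalPlan = {
    if (plan.getTagValue(NOT_DDL_COMMAND).isDefined) {
      // Already classified as non-DDL on a previous pass: nothing to resolve.
      plan
    } else if (!isDDLCommand(plan)) {
      // First pass over a non-DDL plan: remember the result via a tag.
      plan.setTagValue(NOT_DDL_COMMAND, ())
      plan
    } else {
      // Placeholder for the rule's actual string-type resolution.
      resolveStringTypes(plan)
    }
  }

  // Placeholders standing in for the existing implementations in the PR.
  private def isDDLCommand(plan: LogicalPlan): Boolean = ???
  private def resolveStringTypes(plan: LogicalPlan): LogicalPlan = ???
}
```

One caveat with this approach: plan tags survive only as long as the same plan instance is reused across rule invocations, so the tag would need to be set on the returned plan (or copied across transformations) for the skip to take effect.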
However, this PR has already grown in size by reverting all the previous session-level collation logic, and it already brings value by simplifying the logic, removing the previously added hacks, and introducing DDL collation resolution, all while preserving correctness and not increasing perf overhead compared to the code that is already merged. My proposal is therefore not to block on this: merge the PR (if there are no other outstanding issues) to deliver its value, and follow up on potential optimizations separately.
@cloud-fan, please let me know what you think and if the proposed approach
makes sense to you.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]