Re: [PR] [WIP][SPARK-48156][SQL] Eliminate unnecessary COLLATE expressions in query analysis [spark]

via GitHub Tue, 07 May 2024 00:58:24 -0700


mihailom-db commented on code in PR #46421:
URL: https://github.com/apache/spark/pull/46421#discussion_r1591982876



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -3943,6 +3943,16 @@ object EliminateUnions extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Removes [[Collate]] expressions if the input is already the correct type.
+ */
+object EliminateCollates extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
+    case Collate(child, collation) if child.dataType.sameType(StringType(collation)) =>

Review Comment:
   This seems like a nice solution, but unfortunately we can't do it in this 
setting. If we blindly remove `Collate` expressions everywhere, we change the 
meaning of the `StringType` priority: the collation no longer counts as 
explicit, only as implicit/default. That is incorrect, because the user 
specifically wrote COLLATE in their query.
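
   To illustrate the concern, here is a minimal sketch using a toy model rather 
than the real Catalyst classes. Everything in it (`CollationPriorityDemo`, 
`Strength`, `Column`, `Literal`, `strength`, `eliminate`) is hypothetical and 
only assumes that an explicit `Collate` ranks above a collation inherited from 
a column type, which in turn ranks above a default one:

```scala
// Hypothetical toy model; these are NOT Spark's Catalyst classes.
object CollationPriorityDemo {
  // Assumed priority levels for where a collation came from.
  sealed trait Strength
  case object Explicit extends Strength // user wrote COLLATE
  case object Implicit extends Strength // collation carried by a column's type
  case object Default extends Strength  // e.g. a plain string literal

  sealed trait Expr { def collation: String }
  case class Column(name: String, collation: String) extends Expr
  case class Literal(value: String, collation: String = "UTF8_BINARY") extends Expr
  case class Collate(child: Expr, collation: String) extends Expr

  // Priority is derived from the shape of the expression, not from its type.
  def strength(e: Expr): Strength = e match {
    case _: Collate => Explicit
    case _: Column  => Implicit
    case _: Literal => Default
  }

  // Same shape as the proposed rule: drop Collate when the child already
  // has the requested collation.
  def eliminate(e: Expr): Expr = e match {
    case Collate(child, collation) if child.collation == collation => eliminate(child)
    case other => other
  }
}
```

   In this model, `Collate(Column("c", "UNICODE"), "UNICODE")` has strength 
`Explicit`, but after `eliminate` it is just the column and its strength drops 
to `Implicit`, even though the resulting data type is unchanged.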



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


