srowen commented on a change in pull request #31189:
URL: https://github.com/apache/spark/pull/31189#discussion_r558585438



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -620,6 +620,30 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper {
   }
 }
 
+/**
+ * Remove duplicated case when branches.
+ */
+object RemoveDuplicatedBranches extends Rule[LogicalPlan] with PredicateHelper {
+
+  private def contains(branches: Seq[(Expression, Expression)], elem: (Expression, Expression)) = {
+    branches.exists { case (condExpr, valueExpr) =>
+      condExpr.semanticEquals(elem._1) && valueExpr.semanticEquals(elem._2)
+    }
+  }
+
+  private def deduplicate[T](branches: Seq[(Expression, Expression)]) = {
+    branches.foldLeft(Seq.empty[(Expression, Expression)]) { (seq, elem) =>

Review comment:
       In the usual case, where there are no duplicates, this becomes an O(N^2) check. I wonder if that's going to be an issue for large numbers of branches, which are not uncommon. (That said, realistically you might see tens or hundreds of branches at worst; the real problem would be generated SQL, where some tool emits thousands of branches.)
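
       As a rough illustration (not code from the PR), one way to keep this roughly linear is to key a hash set on the canonicalized form of each (condition, value) pair, since `semanticEquals` is defined in terms of `Expression.canonicalized`. The object and method names below are made up for the sketch:

```scala
import scala.collection.mutable

import org.apache.spark.sql.catalyst.expressions.Expression

// Hypothetical helper, for illustration only: deduplicates CASE WHEN branches
// in a single pass by hashing the canonicalized form of each expression pair,
// which is what semanticEquals compares. filter keeps the first occurrence and
// preserves branch order, which matters for CASE WHEN semantics.
object DeduplicateBranchesSketch {
  def deduplicate(branches: Seq[(Expression, Expression)]): Seq[(Expression, Expression)] = {
    val seen = mutable.HashSet.empty[(Expression, Expression)]
    branches.filter { case (cond, value) =>
      // add returns false when a semantically equal pair was already seen,
      // so the duplicate branch gets filtered out.
      seen.add((cond.canonicalized, value.canonicalized))
    }
  }
}
```

       The trade-off is hashing each canonicalized tree once per branch instead of doing pairwise semanticEquals comparisons, which would matter mostly in the generated-SQL case with thousands of branches.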



