zhidongqu-db commented on code in PR #55629:
URL: https://github.com/apache/spark/pull/55629#discussion_r3184581851


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -2541,14 +2542,36 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
-  def apply(plan: LogicalPlan): LogicalPlan =
+  def apply(plan: LogicalPlan): LogicalPlan = {
     if (conf.crossJoinEnabled) {
-      plan
-    } else plan.transformWithPruning(_.containsAnyPattern(INNER_LIKE_JOIN, OUTER_JOIN))  {
+      return plan
+    }
+
+    // Joins synthesized by `RewriteNearestByJoin` are an intentional, bounded cross-product
+    // wrapped by a `MaxMinByK` aggregate. Identify them by their unambiguous post-rewrite
+    // signature -- `Aggregate(_, exprs, Join(_, _, LeftOuter, None, _))` where `exprs`
+    // contains a `MaxMinByK` -- and skip them so user queries written as `NEAREST BY` are not
+    // rejected when `spark.sql.crossJoin.enabled = false`. We use structural detection rather
+    // than a `TreeNodeTag` because a tag set on the `Join` would be silently dropped by any
+    // intervening optimizer rule that constructs a fresh `Join` via the case-class
+    // constructor without calling `copyTagsFrom`.
+    val nearestByJoins: java.util.IdentityHashMap[Join, Unit] = {
+      val acc = new java.util.IdentityHashMap[Join, Unit]()
+      plan.foreach {

Review Comment:
   I agree with @sigmod here: we should probably let it fail in this case. Users should explicitly set `spark.sql.crossJoin.enabled = true` to use a NEAREST BY join.
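
   For illustration, the explicit opt-in could look roughly like the following. This is a minimal sketch that assumes an existing `SparkSession` named `spark` (as in `spark-shell`); the NEAREST BY query itself is left out because its syntax is defined by this PR.

   ```scala
   // Assumption: `spark` is an already-created SparkSession (e.g. in spark-shell).
   // Opt in to cross products for this session so that CheckCartesianProducts
   // does not reject the join.
   spark.conf.set("spark.sql.crossJoin.enabled", "true")

   // Equivalent opt-in through SQL:
   spark.sql("SET spark.sql.crossJoin.enabled=true")
   ```

   With this in place the rule needs no special case for the rewritten join, since `apply` returns the plan unchanged when `conf.crossJoinEnabled` is true.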



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

