dilipbiswal commented on code in PR #55629:
URL: https://github.com/apache/spark/pull/55629#discussion_r3177345024


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -657,6 +657,29 @@ trait CheckAnalysis extends LookupCatalog with 
QueryErrorsBase with PlanToString
                 messageParameters = Map.empty)
             }
 
+          // Reject streaming inputs early. The optimizer rewrite introduces
+          // `MonotonicallyIncreasingID()`, which is per-batch only and would 
silently produce
+          // incorrect results across micro-batches; failing at analysis time 
is clearer than
+          // letting the streaming check fire on an incidental MID node.
+          case j: NearestByJoin if j.isStreaming =>

Review Comment:
   @zhidongqu-db Thanks for the suggestion. I would try to address this in a 
folllow-up.



##########
sql/core/src/test/resources/sql-tests/inputs/join-nearest-by.sql:
##########
@@ -0,0 +1,57 @@
+-- Test cases for NEAREST BY top-K ranking join.
+

Review Comment:
   Have added the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to