WweiL commented on code in PR #44076:
URL: https://github.com/apache/spark/pull/44076#discussion_r1411160900


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala:
##########
@@ -637,18 +653,22 @@ case class StreamingSymmetricHashJoinExec(
         thisRow: UnsafeRow,
         subIter: Iterator[InternalRow])
       extends CompletionIterator[InternalRow, Iterator[InternalRow]](subIter) {
-
+      // scalastyle:off
       private val iteratorNotEmpty: Boolean = super.hasNext
 
       override def completion(): Unit = {
         val isLeftSemiWithMatch =
           joinType == LeftSemi && joinSide == LeftSide && iteratorNotEmpty
         // Add to state store only if both removal predicates do not match,
         // and the row is not matched for left side of left semi join.
+        println(s"!stateKeyWatermarkPredicateFunc(key): ${!stateKeyWatermarkPredicateFunc(key)}" +
+          s" !stateValueWatermarkPredicateFunc(thisRow): ${!stateValueWatermarkPredicateFunc(thisRow)}")
         val shouldAddToState =
           !stateKeyWatermarkPredicateFunc(key) && !stateValueWatermarkPredicateFunc(thisRow) &&
           !isLeftSemiWithMatch
         if (shouldAddToState) {
+          println(s"wei==add to state: $thisRow")

Review Comment:
   Here is what happens: in the no-data batch, the watermark used by `stateKeyWatermarkPredicateFunc` is advanced to the new global watermark (8), while the key emitted by both parent window aggregations is the window [0, 5). Hence `stateKeyWatermarkPredicateFunc(key)` returns true, meaning the window is not added to the join state store.

   This is wrong, because the two [0, 5) windows should be joined here.

   This looks like one of the updates for multiple stateful operators that we need to consider.
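To make the predicate step concrete, here is a minimal, hypothetical Scala model of the `shouldAddToState` decision (not Spark's actual classes). It assumes that, for a window key, the key watermark predicate is true once the window end is at or below the watermark, and it ignores the value predicate and the left-semi case. `Window`, `keyWatermarkPredicate`, and `shouldAddToState` here are illustrative names only.

```scala
// Hypothetical, simplified model of the decision made in completion(); not Spark's real code.
object JoinStateAddSketch {
  // A time window [start, end), in the same (assumed) units as the watermark.
  final case class Window(start: Long, end: Long)

  // Assumption: a window key is considered evictable once its end is <= the watermark.
  def keyWatermarkPredicate(watermark: Long)(key: Window): Boolean =
    key.end <= watermark

  // Mirrors the shape of shouldAddToState in the diff, keeping only the key predicate.
  def shouldAddToState(watermark: Long, key: Window): Boolean =
    !keyWatermarkPredicate(watermark)(key)

  def main(args: Array[String]): Unit = {
    val key = Window(0L, 5L)
    // Watermark advanced to 8 in the no-data batch: 5 <= 8, so the key is not stored.
    println(s"watermark=8 -> add to state: ${shouldAddToState(8L, key)}")
    // A watermark below the window end would keep the key in state.
    println(s"watermark=4 -> add to state: ${shouldAddToState(4L, key)}")
  }
}
```

With watermark 8 the [0, 5) key is rejected, matching the behavior described above; with any watermark below 5 it would be kept.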



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

