neilramaswamy commented on code in PR #44323:
URL: https://github.com/apache/spark/pull/44323#discussion_r1582300122


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala:
##########
@@ -219,10 +222,41 @@ object StreamingSymmetricHashJoinHelper extends Logging {
           attributesWithEventWatermark = AttributeSet(otherSideInputAttributes),
           condition,
           eventTimeWatermarkForEviction)
-        val inputAttributeWithWatermark = oneSideInputAttributes.find(_.metadata.contains(delayKey))
-        val expr = watermarkExpression(inputAttributeWithWatermark, stateValueWatermark)
-        expr.map(JoinStateValueWatermarkPredicate.apply _)
 
+        // If the condition itself is empty (for example, left_time < left_time + INTERVAL ...),
+        // then we will not have generated a stateValueWatermark.
+        if (stateValueWatermark.isEmpty) {
+          None
+        } else {
+          // For example, if the condition is of the form:
+          //    left_time > right_time + INTERVAL 30 MINUTES
+          // Then this extracts left_time and right_time.
+          val attributesInCondition = AttributeSet(
+            condition.get.collect { case a: AttributeReference => a }
+          )
+
+          // Construct an AttributeSet so that we can perform equality between attributes,
+          // which we do in the filter below.
+          val oneSideInputAttributeSet = AttributeSet(oneSideInputAttributes)
+
+          // oneSideInputAttributes could be [left_value, left_time], and we just
+          // want the attribute _in_ the time-interval condition.
+          val oneSideStateWatermarkAttributes = attributesInCondition.filter { a =>
+            oneSideInputAttributeSet.contains(a)
+          }
+
+          // There should be a single attribute per side in the time-interval condition, so,
+          // filtering for oneSideInputAttributes as done above should lead us with 1 attribute.
+          if (oneSideStateWatermarkAttributes.size == 1) {
+            val expr =
+              watermarkExpression(Some(oneSideStateWatermarkAttributes.head), stateValueWatermark)
+            expr.map(JoinStateValueWatermarkPredicate.apply _)
+          } else {
+            // This should never happen, since the grammar will ensure that we have one attribute

Review Comment:
   Good question. I thought more about this, and I think I might be wrong about an edge case that none of our tests cover. Suppose the user writes:
   
   `left_time > right_time + m AND other_left_time > right_time + n`. Then there will be _three_ attributes in the condition, so `oneSideStateWatermarkAttributes.size` will be 2 (`left_time` and `other_left_time`, neither of which is a watermark attribute), and the condition we need to return would be a conjunctive watermark predicate: `left_time <= watermark(right) + m AND other_left_time <= watermark(right) + n`.
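   To make the failure mode concrete, here is a minimal sketch that mirrors the attribute-filtering step from the diff on a toy expression tree. `Expr`, `Attr`, `Lit`, `Add`, `GreaterThan`, `And`, `attributesIn`, and `oneSideConditionAttributes` are hypothetical stand-ins, not Spark's Catalyst API, and the sketch assumes `other_left_time` is one of the left side's input columns.

```scala
// Hypothetical toy expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(minutes: Long) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class GreaterThan(left: Expr, right: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

// Collect every attribute referenced in the condition, playing the role of
// condition.get.collect { case a: AttributeReference => a } in the diff.
def attributesIn(e: Expr): Set[Attr] = e match {
  case a: Attr           => Set(a)
  case Lit(_)            => Set.empty
  case Add(l, r)         => attributesIn(l) ++ attributesIn(r)
  case GreaterThan(l, r) => attributesIn(l) ++ attributesIn(r)
  case And(l, r)         => attributesIn(l) ++ attributesIn(r)
}

// Keep only the condition attributes that belong to one side's input,
// like the attributesInCondition.filter { ... } step in the diff.
def oneSideConditionAttributes(condition: Expr, oneSideInput: Set[Attr]): Set[Attr] =
  attributesIn(condition).filter(oneSideInput.contains)

// Assumed left-side input columns.
val leftInput = Set(Attr("left_value"), Attr("left_time"), Attr("other_left_time"))

// left_time > right_time + m  (m = 30 minutes, assumed)
val simple = GreaterThan(Attr("left_time"), Add(Attr("right_time"), Lit(30)))

// left_time > right_time + m AND other_left_time > right_time + n  (n = 10 minutes, assumed)
val edgeCase = And(
  simple,
  GreaterThan(Attr("other_left_time"), Add(Attr("right_time"), Lit(10)))
)

// The simple condition leaves exactly one left attribute, so the size == 1 branch runs.
println(oneSideConditionAttributes(simple, leftInput).size)
// The edge case leaves two left attributes, so the size == 1 check fails.
println(oneSideConditionAttributes(edgeCase, leftInput).size)
```

   `right_time` is dropped by the filter because it is not a left input column, which is why the edge case ends up with exactly the two left-side attributes.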
   
   State can in principle still be removed in this case, but I'm pretty sure the current implementation in Spark master would fail. I need to check this; it might be out of scope for this PR.
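   For reference, evaluating the conjunctive predicate `left_time <= watermark(right) + m AND other_left_time <= watermark(right) + n` against a buffered left row could look like the minimal sketch below. It assumes times are plain minute counts, and `LeftRow`, `Bound`, and `shouldEvict` are hypothetical names, not anything in Spark.

```scala
// Hypothetical buffered left-side state row; times are plain minute counts.
final case class LeftRow(leftTime: Long, otherLeftTime: Long)

// One bound per conjunct of the form `attr > right_time + offset`.
final case class Bound(name: String, extract: LeftRow => Long, offsetMinutes: Long)

// Conjunctive eviction check: the row may be dropped only when
// attr_i <= watermark(right) + offset_i holds for every bound.
def shouldEvict(row: LeftRow, rightWatermark: Long, bounds: Seq[Bound]): Boolean =
  bounds.forall(b => b.extract(row) <= rightWatermark + b.offsetMinutes)

val bounds = Seq(
  Bound("left_time", _.leftTime, 30L),           // m = 30 minutes (assumed)
  Bound("other_left_time", _.otherLeftTime, 10L) // n = 10 minutes (assumed)
)

val rightWatermark = 100L
// 120 <= 130 and 105 <= 110: both bounds hold, so the row is evictable.
println(shouldEvict(LeftRow(120, 105), rightWatermark, bounds))
// 115 > 110: the second bound fails, so the row is kept.
println(shouldEvict(LeftRow(120, 115), rightWatermark, bounds))
```

   The sketch only evaluates the predicate as written above; it does not model how Spark would derive `m`, `n`, or the right-side watermark from the plan.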



-- 
This is an automated message from the Apache Git Service.