HeartSaVioR edited a comment on issue #23634: [SPARK-26154][SS] Streaming 
left/right outer join should not return outer nulls for already matched rows
URL: https://github.com/apache/spark/pull/23634#issuecomment-457789591
 
 
   > In update mode it may be ok to emit null value for one side and later when 
the matching events arrive on the other side the new rows be re-emitted.
   
   In this case it produces an incorrect result: the matched row is emitted 
first and the null-matched row is emitted later, so the later row may 
"overwrite" the earlier one and the final result is treated as null-matched.
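   
   A minimal sketch of the overwrite problem, in plain Python rather than 
Spark. The sink that keeps only the latest row per key, and the names 
`apply_upserts` and `emitted`, are assumptions made for illustration; they are 
not part of the PR.

```python
# Hypothetical sink that keeps only the most recent row per key (upsert semantics).
def apply_upserts(emitted_rows):
    table = {}
    for key, left_value, right_value in emitted_rows:
        # A later row for the same key replaces the earlier one.
        table[key] = (left_value, right_value)
    return table


# Emission order described above: matched row first, null-matched row later.
emitted = [
    ("k1", "left-1", "right-1"),  # matched row
    ("k1", "left-1", None),       # outer null row for the same, already matched, row
]

final_table = apply_upserts(emitted)
print(final_table)
```

   With this ordering the sink ends up holding the null-matched row for `k1`, 
even though a match existed.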
   
   IMHO, regardless of output mode, the final result per key should be the 
same, and it should be the same for batch and streaming. If the query ran as a 
batch query, the null-matched row would never be produced. So I'm not sure 
this is related to output mode.
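   
   To illustrate the batch side of that comparison, here is a small sketch of 
a left outer join over complete inputs, again in plain Python rather than 
Spark. The function `left_outer_join` and the sample rows are hypothetical.

```python
# Hypothetical batch-style left outer join over fully known inputs.
def left_outer_join(left_rows, right_rows):
    # Index the right side by key so every match is visible up front.
    right_by_key = {}
    for key, value in right_rows:
        right_by_key.setdefault(key, []).append(value)

    result = []
    for key, left_value in left_rows:
        matches = right_by_key.get(key)
        if matches:
            # A matched left row yields only matched output rows.
            result.extend((key, left_value, right_value) for right_value in matches)
        else:
            # Only a left row with no match at all yields a null-matched row.
            result.append((key, left_value, None))
    return result


left = [("k1", "left-1"), ("k2", "left-2")]
right = [("k1", "right-1")]

batch_result = left_outer_join(left, right)
print(batch_result)
```

   Because the batch join sees both inputs in full, `k1` appears only as a 
matched row, and only `k2` gets a null on the right side.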
   
   > So it seems that there is two different watermarks here (one for each 
input) which seems wrong. Ideally the watermark should be tied to the operator 
(join) and not separate watermarks for each input so that the operator can 
compute the result based on its watermark.
   
   As I commented earlier to Jose, the watermark used for late tuples is the 
same across operators. The difference is when to evict rows from state, which 
I guess could depend on the join condition.
