HeartSaVioR edited a comment on issue #23634: [SPARK-26154][SS] Streaming left/right outer join should not return outer nulls for already matched rows
URL: https://github.com/apache/spark/pull/23634#issuecomment-457789591

> In update mode it may be ok to emit null value for one side and later when the matching events arrive on the other side the new rows be re-emitted.

In this case it produces an incorrect result: the matched row is emitted first and the null-matched row is emitted later, which may "overwrite" the earlier output so that the final result is treated as null-matched. IMHO, regardless of output mode, the final result per key should be the same, and streaming should produce the same result as batch. If the query ran as a batch query, a null-matched row would never be produced for a row that has a match. So I'm not sure this is related to the output mode.

> So it seems that there is two different watermarks here (one for each input) which seems wrong. Ideally the watermark should be tied to the operator (join) and not separate watermarks for each input so that the operator can compute the result based on its watermark.

As I commented earlier to Jose, the watermark used to drop late tuples is the same across operators. The difference is when rows are evicted from state, which I guess could be determined by the join condition.
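To make the ordering problem concrete, here is a minimal Scala sketch. It is a hypothetical, single-threaded model, not Spark's stream-stream join implementation: it assumes buffered rows are evicted once the watermark passes their event time, and it uses an illustrative per-row `matched` flag so that eviction emits a null-outer row only for a left row that never found a match. With `trackMatched = false` it reproduces the matched-then-null sequence discussed in the comment; with `trackMatched = true` it yields what a batch left outer join would return.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a streaming left outer join.
// Names and eviction rules are illustrative assumptions, not Spark internals.
object LeftOuterJoinSketch {
  final case class LeftRow(key: String, value: String, eventTime: Long)
  final case class RightRow(key: String, value: String, eventTime: Long)

  // A buffered left row plus a flag recording whether it has ever matched.
  final class Buffered(val row: LeftRow, var matched: Boolean = false)

  sealed trait Event
  final case class FromLeft(row: LeftRow) extends Event
  final case class FromRight(row: RightRow) extends Event
  final case class Watermark(time: Long) extends Event

  /** Returns emitted (key, leftValue, rightValueOrNull) tuples in emission order.
    * When trackMatched is false, eviction emits a null-outer row for every
    * evicted left row, even one that was already matched (the incorrect case). */
  def run(events: Seq[Event], trackMatched: Boolean): Vector[(String, String, Option[String])] = {
    val leftState = mutable.ArrayBuffer.empty[Buffered]
    val rightState = mutable.ArrayBuffer.empty[RightRow]
    val out = Vector.newBuilder[(String, String, Option[String])]

    events.foreach {
      case FromLeft(l) =>
        // Join the new left row against buffered right rows with the same key.
        val b = new Buffered(l)
        rightState.filter(_.key == l.key).foreach { r =>
          out += ((l.key, l.value, Some(r.value)))
          b.matched = true
        }
        leftState += b

      case FromRight(r) =>
        // Join the new right row against buffered left rows with the same key.
        rightState += r
        leftState.filter(_.row.key == r.key).foreach { b =>
          out += ((r.key, b.row.value, Some(r.value)))
          b.matched = true
        }

      case Watermark(t) =>
        // Evict left rows older than the watermark; emit outer nulls only
        // for rows that never matched (unless tracking is disabled).
        val (evicted, kept) = leftState.partition(_.row.eventTime < t)
        evicted.foreach { b =>
          if (!trackMatched || !b.matched) out += ((b.row.key, b.row.value, None))
        }
        leftState.clear()
        leftState ++= kept
        val keptRight = rightState.filter(_.eventTime >= t)
        rightState.clear()
        rightState ++= keptRight
    }
    out.result()
  }
}
```

For example, a left row `("k1", "a", 1L)` followed by a right row `("k1", "x", 2L)` and then a watermark of `10L` produces `("k1", "a", Some("x"))` and then `("k1", "a", None)` without tracking, but only `("k1", "a", Some("x"))` with tracking, which matches the batch result the comment argues for.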
