GitHub user fuyufjh commented on the issue:
https://github.com/apache/spark/pull/21890
Hi @attilapiros, before I go on adding test cases (I am actually working on
them right now), I think there is one more thing that needs to be figured out
first.
As this PR is written, I want to support stream-stream joins in update mode by
making them behave exactly the same as in append mode. This is totally fine for
inner joins, but not so straightforward for outer joins.
For example:
Assuming the watermark delay is set to 10 minutes, we run a query like
`A left outer join B`, where event `A1` arrives at 10:01 and event `B1`
arrives at 10:02.
In append mode, of course, `A1` will wait for `B1` and produce the join result
`A1-B1` at 10:02.
However, in update mode, we can either keep this behavior, ***OR*** take the
following actions, which also look reasonable but are somewhat costly:
1. Emit `A1-null` at 10:01.
2. Emit `A1-B1` at 10:02 when `B1` appears, and expect the data sink to
overwrite the previous result.
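
To make the difference concrete, here is a rough sketch of the two options in
plain Python. It is not Spark code and not what this PR implements: the
function names (`emit_like_append`, `emit_eagerly`, `apply_to_upsert_sink`)
are made up for illustration, all rows are assumed to share the same join key,
and watermark expiry (when an unmatched `A` row would finally be emitted with
null) is not modeled.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str, str]              # (arrival time, side "A" or "B", row name)
Output = Tuple[str, str, Optional[str]]   # (emit time, left row, right row or None)

def emit_like_append(events: List[Event]) -> List[Output]:
    """Option 1: a left row produces output only once a right row matches it."""
    out: List[Output] = []
    left_rows: List[str] = []
    right_rows: List[str] = []
    for time, side, row in events:
        if side == "A":
            left_rows.append(row)
            out.extend((time, row, b) for b in right_rows)
        else:
            right_rows.append(row)
            out.extend((time, a, row) for a in left_rows)
    return out

def emit_eagerly(events: List[Event]) -> List[Output]:
    """Option 2: emit a provisional A-null right away, then the real match later."""
    out: List[Output] = []
    left_rows: List[str] = []
    right_rows: List[str] = []
    for time, side, row in events:
        if side == "A":
            left_rows.append(row)
            if right_rows:
                out.extend((time, row, b) for b in right_rows)
            else:
                out.append((time, row, None))  # provisional result, to be overwritten
        else:
            right_rows.append(row)
            out.extend((time, a, row) for a in left_rows)
    return out

def apply_to_upsert_sink(outputs: List[Output]) -> Dict[str, Optional[str]]:
    """Hypothetical sink keyed by the left row: later results overwrite earlier ones."""
    sink: Dict[str, Optional[str]] = {}
    for _, left, right in outputs:
        sink[left] = right
    return sink

events = [("10:01", "A", "A1"), ("10:02", "B", "B1")]
append_like = emit_like_append(events)
eager = emit_eagerly(events)
print(append_like)
print(eager)
print(apply_to_upsert_sink(append_like) == apply_to_upsert_sink(eager))
```

With the example events, both options leave the same final state in an
overwriting sink; the second one just writes one extra intermediate row, which
is where the additional cost comes from.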
So which is the *correct* behavior: the same as in append mode, or the
approach above? Please let me know your opinion.
cc. @jose-torres @tdas