[GitHub] [spark] c21 commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

GitBox Tue, 17 Nov 2020 21:10:37 -0800


c21 commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-729427520



   Just to add my thoughts; I would definitely like to get opinions from the 
previous authors and reviewers in this area.
   
   * In terms of scalability, reliability, etc., a full outer join stores the 
same amount of data in the state store as a left outer, right outer, or inner 
join. The only difference is that a full outer join outputs more rows 
(unmatched rows from both stream sides, not just from one side). So from a 
system perspective, I think we can probably support it, similar to the other 
joins.
   
   * My motivation is that I am investigating the performance difference 
between Spark Structured Streaming (micro-batch) and other internal streaming 
systems, and I plan to build more on top of it to support internal use cases. 
So it would be good to have a basic full outer join implementation here 
(merged upstream, rather than maintaining my own fork across every Spark 
version upgrade).
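   To illustrate the first point, here is a minimal Python sketch, assuming a 
simplified symmetric hash join model. It is not Spark's actual 
`StreamingSymmetricHashJoinExec`; `run_join` and `BufferedRow` are hypothetical 
names. Every join type buffers rows from both sides in state, and the join 
types differ only in which unmatched rows are emitted (padded with nulls) when 
the watermark evicts them.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy model, NOT Spark's StreamingSymmetricHashJoinExec.
@dataclass
class BufferedRow:
    key: int
    value: str
    event_time: int
    matched: bool = False

def run_join(join_type: str, left_rows: List[Tuple], right_rows: List[Tuple], watermark: int):
    """Return (number of rows buffered in state, joined output rows)."""
    state = {"left": [], "right": []}
    output = []
    events = [("left", r) for r in left_rows] + [("right", r) for r in right_rows]
    for side, (key, value, ts) in events:
        other = "right" if side == "left" else "left"
        row = BufferedRow(key, value, ts)
        # Probe the other side's buffered state for matches on the join key.
        for buffered in state[other]:
            if buffered.key == key:
                buffered.matched = True
                row.matched = True
                pair = (row.value, buffered.value) if side == "left" else (buffered.value, row.value)
                output.append((key, *pair))
        # Every join type (inner included) buffers the row on its own side.
        state[side].append(row)
    state_size = len(state["left"]) + len(state["right"])

    # Watermark eviction: only here do the join types behave differently.
    emit_left = join_type in ("left_outer", "full_outer")
    emit_right = join_type in ("right_outer", "full_outer")
    for side, emit in (("left", emit_left), ("right", emit_right)):
        kept = []
        for r in state[side]:
            if r.event_time < watermark:
                if emit and not r.matched:
                    # Unmatched row is emitted with a null for the other side.
                    output.append((r.key, r.value, None) if side == "left" else (r.key, None, r.value))
            else:
                kept.append(r)
        state[side] = kept
    return state_size, output

if __name__ == "__main__":
    left = [(1, "a", 10), (2, "b", 10)]
    right = [(1, "x", 10), (3, "y", 10)]
    for jt in ("inner", "left_outer", "right_outer", "full_outer"):
        size, rows = run_join(jt, left, right, watermark=20)
        print(jt, size, rows)
```

   In this toy run the buffered state size is the same for all four join 
types; only the full outer join emits both the left-only and the right-only 
unmatched rows.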


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
