c21 commented on pull request #30395: URL: https://github.com/apache/spark/pull/30395#issuecomment-729427520
Just to add my thoughts; I would also like to hear opinions from previous authors and reviewers in this area.

* In terms of scalability, reliability, etc., a full outer join stores the same amount of data in the state store as a left outer, right outer, or inner join. The only difference is that a full outer join outputs more rows (all rows from both stream sides). So from a system perspective, I think we can probably support it in the same way as the other joins.
* My motivation is that I am investigating the performance difference between Spark Structured Streaming (micro-batch) and other internal streaming systems, and I plan to build more on top of it to support internal use cases. It would therefore be good to have a basic full outer join implementation merged upstream, rather than maintaining a fork across each Spark version upgrade.
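The first point, that the join type changes the output but not the buffered state, can be illustrated with a toy model. The sketch below is plain Python and is not Spark's implementation: `stream_join` and `state_rows` are hypothetical names, the inputs are small finite lists rather than streams, and watermarks and state eviction are ignored entirely.

```python
from collections import defaultdict

JOIN_TYPES = {"inner", "left_outer", "right_outer", "full_outer"}


def stream_join(left, right, how):
    """Toy symmetric hash join over two finite lists of (key, value) rows.

    Returns (output_rows, state_rows). This is a simplified model, not
    Spark's stream-stream join: no watermarks, no eviction.
    """
    if how not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how}")

    # Both sides are always buffered, whatever the join type is.
    # Each buffered entry is [value, matched_flag].
    left_state = defaultdict(list)
    right_state = defaultdict(list)
    for key, value in left:
        left_state[key].append([value, False])
    for key, value in right:
        right_state[key].append([value, False])

    output = []

    # Matched pairs are emitted for every join type.
    for key, left_rows in left_state.items():
        for l_row in left_rows:
            for r_row in right_state.get(key, []):
                output.append((key, l_row[0], r_row[0]))
                l_row[1] = True
                r_row[1] = True

    # Outer joins additionally emit unmatched rows padded with None.
    if how in {"left_outer", "full_outer"}:
        for key, rows in left_state.items():
            output.extend((key, v, None) for v, matched in rows if not matched)
    if how in {"right_outer", "full_outer"}:
        for key, rows in right_state.items():
            output.extend((key, None, v) for v, matched in rows if not matched)

    state_rows = sum(len(rows) for rows in left_state.values())
    state_rows += sum(len(rows) for rows in right_state.values())
    return output, state_rows


if __name__ == "__main__":
    left = [(1, "a"), (2, "b")]
    right = [(2, "x"), (3, "y")]
    for how in sorted(JOIN_TYPES):
        rows, state_rows = stream_join(left, right, how)
        print(how, "output rows:", len(rows), "state rows:", state_rows)
```

In this model, the four buffered rows are the same for every join type; only the number of emitted rows changes (one for inner, two for each one-sided outer join, three for full outer).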
