AnishMahto commented on PR #56311: URL: https://github.com/apache/spark/pull/56311#issuecomment-4672605945
@jose-torres @hhhhhazelnut for review. This is about 400 LOC of actual logic, which is still fairly large; sorry in advance. I don't think I can reasonably break this up into further PRs without losing coherence, but hopefully the PR description makes it pretty clear what's going on. The logic here isn't groundbreaking, but it gets into the heart of SCD2 microbatch reconciliation. The sort order of affected rows is especially central to how the algorithm works; all other logic follows fairly naturally if we agree on the sort order. I put thought into handling edge cases gracefully on a best-effort basis, such as duplicate sequence events (including full row duplicates).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
