AnishMahto commented on PR #56311:
URL: https://github.com/apache/spark/pull/56311#issuecomment-4672605945

   @jose-torres @hhhhhazelnut for review.
   
   This is about 400 LOC of actual logic, which is still fairly large; sorry in 
advance. I don't think I can reasonably break it up into further PRs without 
losing coherence, but hopefully the PR description makes it clear what's 
going on.
   
   The logic here isn't groundbreaking, but it gets into the heart of SCD2 
microbatch reconciliation. The sort order of affected rows is especially 
central to how the algorithm works; all other logic follows fairly naturally if 
we agree on the sort order. I put thought into gracefully handling, on a 
best-effort basis, edge cases like duplicate sequence events (including full 
row duplicates).
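
   As a rough illustration only (this is not the code in the PR), the idea of 
sort-order-driven SCD2 reconciliation can be sketched in plain Python. The 
function name `reconcile_scd2`, the `(key, sequence, value)` event shape, and 
the tie-breaking rule for duplicate sequences are all assumptions made for the 
sketch; the PR's actual sort order and duplicate handling may differ.

```python
from itertools import groupby
from operator import itemgetter


def reconcile_scd2(events):
    """Hypothetical sketch: turn one microbatch of change events into SCD2 rows.

    events: iterable of (key, sequence, value) tuples (assumed shape).
    Returns a list of (key, value, start_seq, end_seq) tuples, where
    end_seq is None for the row that is still current.
    """
    # set() drops full-row duplicates. The sort is by key, then sequence,
    # then repr(value) as an assumed deterministic tiebreaker.
    ordered = sorted(set(events), key=lambda e: (e[0], e[1], repr(e[2])))

    rows = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Events sharing a sequence collapse to the last one in sort order
        # (best-effort choice for this sketch).
        by_seq = {}
        for _, seq, value in group:
            by_seq[seq] = value

        seqs = sorted(by_seq)
        for i, seq in enumerate(seqs):
            # Each version ends where the next version of the same key starts.
            end_seq = seqs[i + 1] if i + 1 < len(seqs) else None
            rows.append((key, by_seq[seq], seq, end_seq))
    return rows
```

   In this sketch everything after the sort is a single linear pass per key, 
which is the sense in which the rest of the logic follows from the sort order.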


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

