peter-toth commented on PR #52776: URL: https://github.com/apache/spark/pull/52776#issuecomment-3467056116
> > It makes the quality of the checksum worse.
>
> I'm worried that some bits may be lost, which could actually affect the reliability of the comparative checksum.

Checksum computation is always about losing bits :smile:; the fewer we lose, the better the quality of the checksum. Here we combine the order-agnostic `RowBasedChecksum`s computed per partition into one final checksum that represents the whole data. Losing one more bit doesn't seem like a good idea, but if you have a problematic test case, let's investigate it.
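To illustrate the point about losing bits, here is a minimal Python sketch. It is not Spark's actual code: `row_hash`, `partition_checksum`, and `combine` are hypothetical helpers, and the wrapping 64-bit sum and multiply-by-31 fold are assumptions chosen only for illustration. It shows that a per-partition checksum can be order agnostic, and that keeping one bit fewer in the final value makes two previously distinct inputs collide.

```python
import hashlib

MASK64 = (1 << 64) - 1


def row_hash(row: bytes) -> int:
    """Hypothetical 64-bit row hash: the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(row).digest()[:8], "big")


def partition_checksum(rows) -> int:
    """Order-agnostic checksum (assumed scheme): wrapping 64-bit sum of row hashes."""
    total = 0
    for row in rows:
        total = (total + row_hash(row)) & MASK64
    return total


def combine(partition_checksums, bits: int = 64) -> int:
    """Fold per-partition checksums into one value, keeping only `bits` bits."""
    mask = (1 << bits) - 1
    acc = 0
    for checksum in partition_checksums:
        acc = (acc * 31 + checksum) & mask
    return acc


# Row order inside a partition does not change its checksum.
rows = [b"a", b"b", b"c"]
same_any_order = partition_checksum(rows) == partition_checksum(rows[::-1])

# Two inputs that differ only in the top bit stay distinct with 64 bits,
# but collide as soon as the final checksum keeps one bit fewer.
x = [0]
y = [1 << 63]
collide_64 = combine(x, bits=64) == combine(y, bits=64)
collide_63 = combine(x, bits=63) == combine(y, bits=63)
```

With these assumptions, `same_any_order` is true, `collide_64` is false, and `collide_63` is true: each bit dropped from the final value halves the number of distinguishable checksums.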
