Hey all, I've been playing around with Hudi for a little while now. Really like it! Thanks for all the work :-)
I do have a question about S3 and consistency: how does Hudi get around eventual consistency in S3, particularly in the case of metadata files? I can see there is a ConsistencyGuard [1], which ensures that the JVM thread it runs in can see a path; however, it isn't clear to me that this would be valid across a system.

If a writer 'A' performs an action which requires a rename, for example, how can we ensure that readers B and C see the newly renamed file? Or even that nodes across reader B (e.g. a Spark cluster) see the same file content? To me this is checking whether an object is visible from a particular thread rather than checking the eventual consistency restrictions of S3 [2]. People have gone to great lengths to get around S3's consistency issues as well [3][4].

Apologies if this is a naive question; I am still grappling with the Hudi commit model.

Best,
Ryan

[1] https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/ConsistencyGuard.java
[2] https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
[3] https://github.com/Netflix/s3mper
[4] https://docs.delta.io/latest/delta-storage.html#amazon-s3
