Hey all,

I've been playing around with Hudi for a little while now. Really like it!
Thanks for all the work :-)

I do have a question about S3 and consistency: how does Hudi get around
eventual consistency in S3, particularly in the case of metadata files?

I can see there is a ConsistencyGuard[1] which ensures that the JVM thread
it runs in can see a path, but it isn't clear to me that this guarantee
would hold across the whole system.

If a writer 'A' performs an action which requires a rename, for example, how
can we ensure that readers B and C see the newly renamed file? Or even that
nodes within reader B (e.g. a Spark cluster) see the same file content?

To me, this is checking whether an object is visible from a particular thread
rather than addressing the eventual consistency restrictions of S3[2]. People
have gone to great lengths to get around S3's consistency issues as well
[3][4].
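
To make the concern concrete, here is a toy sketch of what I mean. It is
not Hudi's code and not how S3 actually behaves internally: the
EventuallyConsistentStore class, the client names, the key and the delays
are all made up for illustration. It only shows that a "poll until I can
see the path" check can pass for the writer while another client still
cannot see the same object.

```python
import time


class EventuallyConsistentStore:
    """Toy model (hypothetical): each client sees a new key only after it
    has polled for it more than its own configured number of times."""

    def __init__(self, delays):
        self.delays = delays   # client name -> polls needed before visibility
        self.polls = {}        # key -> {client name -> polls made so far}

    def put(self, key):
        self.polls[key] = {client: 0 for client in self.delays}

    def exists(self, client, key):
        if key not in self.polls:
            return False
        self.polls[key][client] += 1
        return self.polls[key][client] > self.delays[client]


def wait_till_visible(store, client, key, max_checks=5, sleep_s=0.0):
    """Poll from ONE client's point of view until the key appears."""
    for _ in range(max_checks):
        if store.exists(client, key):
            return True
        time.sleep(sleep_s)
    return False


# Assumed setup: the writer converges quickly, reader B lags behind.
store = EventuallyConsistentStore({"writer_A": 1, "reader_B": 10})
store.put("commit_file")  # hypothetical key, not a real Hudi path

writer_sees = wait_till_visible(store, "writer_A", "commit_file")
reader_sees = store.exists("reader_B", "commit_file")
print(writer_sees, reader_sees)
```

In this sketch the writer's check succeeds while reader B's first lookup
still misses the object, which is the gap I am asking about.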

Apologies if this is a naive question, I am still grappling with the Hudi
commit model.

Best,
Ryan

[1]
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/ConsistencyGuard.java
[2]
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
[3] https://github.com/Netflix/s3mper
[4] https://docs.delta.io/latest/delta-storage.html#amazon-s3
