Thanks Balaji, that makes a lot of sense. I haven't seen any issues in my testing; I am just trying to understand all the edge cases.
I suppose the only theoretical issue is that a reader may not see the most recent update from the writer, but that would be a rare and transient occurrence in real life.

Best,
Ryan

On Thu, Aug 13, 2020 at 5:38 PM Balaji Varadarajan <[email protected]> wrote:

> Hey Ryan,
>
> Thanks for the detailed writeup and great job explaining the question and
> the links :)
>
> W.r.t. renaming, Hudi avoids renaming metadata files altogether and creates
> immutable metadata filenames encoded with the state of the commit.
>
> Generally, we believe some of the consistency solutions out there were
> written in the early days of S3, when the guarantees were not well
> established/understood.
>
> The S3 consistency guard in Hudi has been fairly battle-tested for a while by
> the community now in their production clusters. Are you seeing any specific
> issues in your setup?
>
> Once again, thanks for your interest in Hudi.
>
> Balaji.V
>
> On Wednesday, August 12, 2020, 10:35:05 AM PDT, Ryan Murray <
> [email protected]> wrote:
>
> Hey all,
>
> I've been playing around with Hudi for a little while now. Really like it!
> Thanks for all the work :-)
>
> I do have a question about S3 and consistency: How does Hudi get around
> eventual consistency in S3? Particularly in the case of metadata files.
>
> I can see there is a ConsistencyGuard[1] which ensures that the JVM thread
> it's run in can see a path; however, it isn't clear to me that this would be
> valid across a system.
>
> If a writer 'A' performs an action which requires a rename, for example, how
> can we ensure that readers B and C see the newly renamed file? Or even that
> nodes across reader B (e.g. a Spark cluster) see the same file content?
>
> To me this is checking whether an object is visible from a particular thread
> rather than checking the eventual consistency restrictions of S3[2]. People
> have gone to great lengths to get around S3's consistency issues as well
> [3][4].
>
> Apologies if this is a naive question, I am still grappling with the Hudi
> commit model.
>
> Best,
> Ryan
>
> [1] https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/ConsistencyGuard.java
> [2] https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
> [3] https://github.com/Netflix/s3mper
> [4] https://docs.delta.io/latest/delta-storage.html#amazon-s3
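
For anyone following the thread, here is a rough sketch of the "immutable filenames encoded with state" idea Balaji describes. It is not Hudi's code or file layout: the state names, the `<instant>.<state>` naming, and the helpers `publish` and `latest_states` are all made up for illustration, and a local temp directory stands in for S3. The point is only that each state change creates a new file and nothing is ever renamed or overwritten, so a reader with a stale listing sees an older state rather than a half-renamed one.

```python
import os
import tempfile

# Hypothetical state suffixes, ordered from least to most advanced.
STATES = ("requested", "inflight", "completed")


def publish(timeline_dir, instant, state):
    """Record a state transition by creating a new, never-modified file."""
    name = f"{instant}.{state}"
    # Mode "x" raises FileExistsError if the file exists, so each state
    # marker is written at most once and never overwritten.
    with open(os.path.join(timeline_dir, name), "x"):
        pass
    return name


def latest_states(timeline_dir):
    """Derive each instant's most advanced state from a directory listing."""
    result = {}
    for name in os.listdir(timeline_dir):
        instant, _, state = name.partition(".")
        if state not in STATES:
            continue  # ignore files that are not state markers
        current = result.get(instant)
        if current is None or STATES.index(state) > STATES.index(current):
            result[instant] = state
    return result


def demo():
    """Publish one finished and one pending instant, then read them back."""
    with tempfile.TemporaryDirectory() as timeline_dir:
        for state in STATES:
            publish(timeline_dir, "20200813173800", state)
        publish(timeline_dir, "20200813174500", "requested")
        return latest_states(timeline_dir)


if __name__ == "__main__":
    print(demo())
```

If a reader's listing misses the newest marker file, `latest_states` simply reports the previous state for that instant, which matches the "rare and transient" staleness Ryan mentions above.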
