[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-494106274 Closing this PR and kicking off a new one This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-492739025 just pushed out my addition of a bulk update context all the way through the commit operation; it's not yet wired up for the commit phase and too many needless PUTs are being generated. This does represent my model of how we can have an ongoing rename/commit operation where we know not to create duplicate parents Not incorporated andrew's comments...will do that separately. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-491055414 Test failure: interesting. Will fix. For people watching this, rename seems happy, but I don't like the way every put() is creating the entire list of parent entries and pushing them out, even for bulk operations like rename into a directory tree or commits into a tree. Both of these generate needless write load, one of O(files + depth(files) plan: the notion of a rename operation is expanded to cover a "bulk add operation" which is also initiated for the commits" -it'll track which entries have already been created and not reissue them. There's a separate JIRA for commit load; there's enough commonality to cover it here too. Not going to touch this until May 14; doing review of other people's work and some backporting until then ## reviews of this PR as is welcome! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-490219003 +testing. S3 ireland (versioned store), with/without auth. Also: local. I've expressed interest in removing the local mode as its a distraction in tests (it doesn't match production, what does it prove?): the changes here have the potential to amplify that mismatch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-490218551 This patch is now of a state where it is ready for review * it's going to have to be changed to keep up with the S3Guard versioning patches so I'm hoping to nurture those in, but the incompatibilities are related to the type of FileStatus passed around & general git merge problems, rather than functional conflict. There's one production side improvement I'd like to add. This new patch does the move incrementally: whenever you add a file we call s3guard.move(null, dest-file-status) to add the destination (and ancestors), on a bulk delete we update the deletes, But: that move(List, List) call creates all the parent paths, relying on a hash table to avoid duplicates,. Once you move to single-file additions then both that and metastore.put() are creating too many entries due to their need to meet the goal of "no duplicates". I want to restore the original behavior by passing in to the metastore the map being built up in the rename tracker, so it knows what already exists. (Note: this all needs to be done thread safely, so that when > 1 copy completes...I don't want the locks for that to also block other updates to the metastore) This isn't a functionality change, it's a performance and cost improvement, one designed to keep those DDB write IOPs down. ## Please take a look at the code as it stands. The architecture is based on my [refactoring S3A](https://github.com/steveloughran/engineering-proposals/blob/master/refactoring-s3a.md) doc -the new classes are designed to work with the new `StoreContext` class; the metastore moves with this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-487529747 @harshavardhana I'm not making any promises of performance. Already on large files the AWS SDK transfer manager breaks things up into 128MB units, so what we gain here is the copying files in parallel above that, which is probably best for smaller files. I'm looking at resilience of failures and S3Guard consistency —that's the key driver and takes priority over the speedups. Note that time to rename on AWS will always been O(data), even if we do more files in parallel; it won't be atomic. For high performance commit algorithms you need something like the S3A committers, or S3-first data structures like Apache Iceberg This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-487165270 looks like the last version doesn't compile. oops This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-485048733 squashed entire patch into one This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-485047484 OK, some bit of the patch history is causing confusion ``` Checking patch hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/ContractTestUtils.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInstrumentation.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Statistic.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AExceptionTranslation.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/MultiObjectDeleteSupport.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ExtraAssertions.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AExceptionTranslation.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/ITestPartialRenamesDeletes.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestPartialDeleteFailures.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/MultiObjectDeleteSupport.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/ITestPartialRenamesDeletes.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestPartialDeleteFailures.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ExtraAssertions.java => hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/ExtraAssertions.java... error: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ExtraAssertions.java: does not exist in index Checking patch hadoop-tools/hadoop-aws/pom.xml... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/RoleTestUtils.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/ITestPartialRenamesDeletes.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/MultiObjectDeleteSupport.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/RoleTestUtils.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/ITestPartialRenamesDeletes.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestPartialDeleteFailures.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/MultiObjectDeleteSupport.java... Checking patch hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/StoreContext.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java... Checking patch hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestPartialDeleteFailures.java... ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-485047355 Having weirdness here as yetus can't apply patch. Local attempt (over SFO airport wifi) ``` > dev-support/bin/smart-apply-patch --project=hadoop GH:654 Processing: GH:654 GITHUB PR #654 is being downloaded at Fri 19 Apr 2019 18:14:35 PDT from https://github.com/apache/hadoop/pull/654 Patch from GITHUB PR #654 is being downloaded at Fri 19 Apr 2019 18:14:36 PDT from https://github.com/apache/hadoop/pull/654.patch ERROR: Aborting! GH:654 cannot be verified. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-485046885 not applying to trunk...don't understand this. Rebased This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename
steveloughran commented on issue #654: HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename URL: https://github.com/apache/hadoop/pull/654#issuecomment-480226353 Note, layout of packages & aspects of arch are based on https://github.com/steveloughran/engineering-proposals/blob/master/refactoring-s3a.md , primarily * moving multidelete support into its own module in .impl & testing in isolation This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org