[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-530748126

# +1

As well as all the automated tests, I did some manual command line operations:

* empty args
* command without -check
* -check without path
* against store marked as auth but with incomplete MS
* after doing an import, same store
* empty store
* unguarded store

All outcomes were as expected. I'm happy with this.

## Followup

One of the changes with the HADOOP-16430 PR is that we now have an S3A FS method `boolean allowAuthoritative(final Path path)` that takes a path and returns true iff it is authoritative, either by the MS being auth *or* by the given path being marked as one of the authoritative dirs. I think the validation of whether an authoritative directory is consistent between the metastore and S3 should use this when it wants to highlight that an authoritative path is inconsistent. This can be a follow-on patch because, as usual, it will need more tests in the code, and someone to try out the command line.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
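A minimal sketch of the rule described in the followup, for illustration only. It is not the S3AFileSystem code: the class name, the use of plain strings instead of Hadoop `Path` objects, and the choice to treat everything underneath a configured authoritative directory as authoritative are all assumptions.

```java
import java.util.List;

/** Hypothetical stand-in for the authoritative-path check; not the real S3A code. */
class AuthoritativePathSketch {
  private final boolean metadataStoreAuthoritative;
  private final List<String> authoritativeDirs;

  AuthoritativePathSketch(boolean metadataStoreAuthoritative,
      List<String> authoritativeDirs) {
    this.metadataStoreAuthoritative = metadataStoreAuthoritative;
    this.authoritativeDirs = authoritativeDirs;
  }

  /**
   * True iff the whole store is authoritative, or the path equals or sits
   * under one of the configured authoritative directories (assumption).
   */
  boolean allowAuthoritative(String path) {
    if (metadataStoreAuthoritative) {
      return true;
    }
    String candidate = withTrailingSlash(path);
    for (String dir : authoritativeDirs) {
      // Compare with trailing slashes so "/auth" does not match "/authority".
      if (candidate.startsWith(withTrailingSlash(dir))) {
        return true;
      }
    }
    return false;
  }

  private static String withTrailingSlash(String s) {
    return s.endsWith("/") ? s : s + "/";
  }
}
```

An fsck check could then ask this single question per directory before flagging an authoritative-listing mismatch, instead of looking only at the store-wide flag.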
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-530564476

OK, the full test run with DDB non-auth is happy; repeating with auth.

Tomorrow I'll build the CLI and run the various manual operations which were failing.
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-530380233

My branch with changes to the assertion: https://github.com/steveloughran/hadoop/tree/incoming/HADOOP-16423-fsck-log-changes
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-530357583

last test run:

```
[ERROR] Failures:
[ERROR]   ITestS3GuardFsck.testIDetectParentTombstoned:194->assertComparePairsSize:452 [Number of compare pairs] expected:<[1]> but was:<[2]>
[ERROR] Errors:
[ERROR]   ITestS3GuardFsck.testIAuthoritativeDirectoryContentMismatch:292->checkForViolationInPairs:474 » NoSuchElement
[INFO]
[INFO] Running org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat
[ERROR] Tests run: 12, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 30.746 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
[ERROR] testIDetectParentTombstoned(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) Time elapsed: 8.026 s <<< FAILURE!
org.junit.ComparisonFailure: [Number of compare pairs] expected:<[1]> but was:<[2]>
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.assertComparePairsSize(ITestS3GuardFsck.java:452)
  at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIDetectParentTombstoned(ITestS3GuardFsck.java:194)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)

[ERROR] testIAuthoritativeDirectoryContentMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) Time elapsed: 4.626 s <<< ERROR!
java.util.NoSuchElementException: No value present
  at java.util.Optional.get(Optional.java:135)
  at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.checkForViolationInPairs(ITestS3GuardFsck.java:474)
  at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIAuthoritativeDirectoryContentMismatch(ITestS3GuardFsck.java:292)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)
```
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-52599

Test runs before your last commit: the first was fine; the second, with -Dauth, failed.

```
[ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.371 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
[ERROR] testIAuthoritativeDirectoryContentMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) Time elapsed: 3.461 s <<< ERROR!
java.util.NoSuchElementException: No value present
  at java.util.Optional.get(Optional.java:135)
  at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIAuthoritativeDirectoryContentMismatch(ITestS3GuardFsck.java:403)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)
```
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-529980416

OK, I've done a quick scan. The changes all LGTM.
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528905689

Overall then, the last iteration has a working CLI.

* One of the tests is brittle in parallel runs
* fsck must return an error code on a failure, for scripts and tests
* modtime handling needs to be tuned (followup?)
* no docs that I can see
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528894617

OK, the latest review is good with etags; modtime is something we can worry about as an extra iteration.

Tested on a store which is set up for auth listings and is clearly considered inconsistent. It warns me of this, maybe in too much detail.

```
...
2019-09-06 15:51:25,549 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-london/fork-0006/test/ITestS3AContractDistCp/testTrackDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir
The content of an authoritative directory listing does not match the content of the S3 listing. S3: [[S3AFileStatus{path=s3a://hwdev-steve-london/fork-0006/test/ITestS3AContractDistCp/testTrackDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir1; isDirectory=true; modification_time=0; access_time=0; owner=stevel; group=stevel; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=null versionId=null, S3AFileStatus{path=s3a://hwdev-steve-london/fork-0006/test/ITestS3AContractDistCp/testTrackDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir/subDir2; isDirectory=true; modification_time=0; access_time=0; owner=stevel; group=stevel; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=null versionId=null]], MS: []
2019-09-06 15:51:25,549 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(144)) - Total scan time: 3s
2019-09-06 15:51:25,549 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(145)) - Scanned entries: 51
~/P/R/fsck echo $status
0
```

It's noisy, but that could be tuned by cutting back on the number of S3AFileStatus fields printed.

What is clear is that it found real problems where the DDB table was incomplete. But the exit code was still 0. I think we should return a non-zero exit code when there is a mismatch of this scale.
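The exit-code concern above could be addressed along these lines. This is a sketch under stated assumptions, not the S3GuardTool implementation: the `Severity` enum, the class name and the value chosen for `EXIT_FAIL` are all hypothetical, and the real tool may well pick different codes.

```java
import java.util.List;

/** Hypothetical mapping from fsck findings to a process exit code. */
class FsckExitCodeSketch {
  /** Assumed severity levels; the actual violation model may differ. */
  enum Severity { INFO, WARNING, ERROR }

  static final int EXIT_SUCCESS = 0;
  /** Hypothetical failure code; any non-zero value lets scripts detect it. */
  static final int EXIT_FAIL = 1;

  /** Non-zero as soon as one finding is severe enough to count as a failure. */
  static int exitCodeFor(List<Severity> findings) {
    for (Severity severity : findings) {
      if (severity == Severity.ERROR) {
        return EXIT_FAIL;
      }
    }
    return EXIT_SUCCESS;
  }
}
```

With something like this, a shell script or an integration test can assert on the status instead of parsing the log output.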
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528883425

The testCLIFsckWithParam test works standalone. It looks to me like a race condition: the fsck is being performed on an active bucket, and files have been deleted between being listed and queued for scanning and the actual scan.

1. The test should only scan the test directory, or we handle FNFEs as something to ignore.
2. Could the fsck code itself change here? On a stable bucket the FNFE could be a sign of a mismatch between DDB and the store; you don't want them ignored. But the failure could be delayed and reported as a missing file, rather than triggering a fast failure.
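Option 2 above, delaying the failure and reporting a missing file, might look roughly like the following. It is only a sketch: `StatusProbe` is a hypothetical stand-in for a `getFileStatus`-style call, and paths are plain strings so the example compiles without Hadoop on the classpath.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scan loop that records missing files instead of failing fast. */
class DeferredMissingFileSketch {
  /** Stand-in for a status lookup that may throw FileNotFoundException. */
  interface StatusProbe {
    void check(String path) throws FileNotFoundException;
  }

  /** Probes every queued path and returns those that were not found. */
  static List<String> scan(List<String> queuedPaths, StatusProbe probe) {
    List<String> missing = new ArrayList<>();
    for (String path : queuedPaths) {
      try {
        probe.check(path);
      } catch (FileNotFoundException e) {
        // Record and continue: the entry vanished after listing, or the
        // metadata store points at something the store no longer has.
        missing.add(path);
      }
    }
    return missing;
  }
}
```

The caller can then decide whether the collected paths are violations (stable bucket) or expected churn (active test bucket), rather than having the first FNFE abort the whole scan.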
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528873116

and

```
[ERROR] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.342 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck
[ERROR] testIVersionIdMismatch(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck) Time elapsed: 1.591 s <<< FAILURE!
java.lang.AssertionError: [Violations in the childPair]
Expecting:
 <[ETAG_MISMATCH, LENGTH_MISMATCH, MOD_TIME_MISMATCH]>
to contain:
 <[VERSIONID_MISMATCH]>
but could not find:
 <[VERSIONID_MISMATCH]>
  at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardFsck.testIVersionIdMismatch(ITestS3GuardFsck.java:589)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)
```
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528871326

Got a test run failure in a new test:

```
testCLIFsckWithParam(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB) Time elapsed: 8.028 s <<< ERROR!
java.io.FileNotFoundException: No such file or directory: s3a://hwdev-steve-ireland-new/fork-0004/test
  at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2788)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2677)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2571)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:2360)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$10(S3AFileSystem.java:2339)
  at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2339)
  at org.apache.hadoop.fs.s3a.s3guard.S3GuardFsck.compareS3ToMs(S3GuardFsck.java:115)
  at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$Fsck.run(S3GuardTool.java:1560)
  at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:402)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
  at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1763)
  at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.run(AbstractS3GuardToolTestBase.java:137)
  at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB.testCLIFsckWithParam(ITestS3GuardToolDynamoDB.java:301)
```
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528867021

Did another test run on an unversioned bucket where the DDB table was built up with an ls -R, so filled up straight from S3. All checks are happy (e.g. modtime), but it is still warning of null etags on all directories, including the root one.

```
2019-09-06 14:59:23,893 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-london/
No etag.
```
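One way to stop the spurious directory warnings is to gate the etag check on the entry type. The sketch below is illustrative only; the class and method names are hypothetical and it takes plain values rather than the S3AFileStatus and metadata store types used by the real fsck code.

```java
/** Hypothetical guard deciding when a missing etag is worth reporting. */
class EtagCheckSketch {
  /**
   * Directories are skipped entirely, since the scans above show them
   * with a null etag on both sides; only files without an etag are reported.
   */
  static boolean shouldReportMissingEtag(boolean isDirectory, String etag) {
    if (isDirectory) {
      return false;
    }
    return etag == null || etag.isEmpty();
  }
}
```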
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528866219

For example:

```
bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d
2019-09-06 14:51:46,377 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
2019-09-06 14:51:46,705 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d - Length S3: 0, MS: 0 - Etag S3: null, MS: null
2019-09-06 14:51:46,764 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d/example.sh - Length S3: 3880, MS: 3880 - Etag S3: c7dbe1b877a287175df9dfc32c226765, MS: c7dbe1b877a287175df9dfc32c226765
2019-09-06 14:51:46,792 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d
No etag.
2019-09-06 14:51:46,792 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d/example.sh
getModificationTime mismatch - s3: 1567773091000, ms: 1567773090841
2019-09-06 14:51:46,792 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(144)) - Total scan time: 0s
2019-09-06 14:51:46,792 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(145)) - Scanned entries: 2
~/P/R/fsck echo $status
0
```

The good news: the return code is 0, so it passed the scan. These are just info rather than warnings.
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528864203

OK, the latest patch is better.

* It still warns of no etag on a dir.
* When you pass a path to a file, it is scanned twice.
* I think we need to be able to disable the modtime checks, because you tend to get mismatches whenever you create an entry after writing a file (the system clock is used); they get updated on the first read. Or: we allow a range of accuracy?
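The "range of accuracy" idea could be sketched as a tolerance-based comparison. This is an assumption about how such a check might be written, not the fsck code: the class name, method name and the notion of a caller-supplied tolerance are all hypothetical.

```java
/** Hypothetical modtime comparison that tolerates small clock differences. */
class ModTimeToleranceSketch {
  /**
   * Treats two modification times as equal when they differ by no more
   * than toleranceMillis. A tolerance of 0 gives the strict comparison.
   */
  static boolean modTimesMatch(long s3Millis, long msMillis, long toleranceMillis) {
    return Math.abs(s3Millis - msMillis) <= toleranceMillis;
  }
}
```

Taking the values from the example.sh scan in this thread (s3: 1567773091000, ms: 1567773090841), the difference is 159 ms, so a one-second tolerance would suppress that report while a strict comparison would not.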
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-528858428

Working on the CLI, but overreporting errors, especially on versioning.

Warns of no etag on directories; there's no need to check here.

```
2019-09-06 13:46:33,571 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-ireland-new/etc/hadoop
No etag.
```

Reports a mismatch on a directory scan, where the listing doesn't include the versions. Maybe this is just the time mismatch triggering the reporting, in which case it is misleading.

```
2019-09-06 13:46:33,572 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getModificationTime mismatch - s3: 1567773093000, ms: 1567773092204
getVersionId mismatch - s3: null, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t
```

The DDB table has the version ID, but I'm assuming that the scan doesn't get them from S3 because we'd need to use HEAD rather than LIST.

When I give the full path, it says there's a mismatch but now prints the same value on both sides. This is not a mismatch and should not appear.

```
~/P/R/fsck bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
2019-09-06 13:59:52,857 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
2019-09-06 13:59:53,057 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e
2019-09-06 13:59:53,115 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e
2019-09-06 13:59:53,142 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getModificationTime mismatch - s3: 1567773093000, ms: 1567773092204
getVersionId mismatch - s3: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t
2019-09-06 13:59:53,142 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
getModificationTime mismatch - s3: 1567773093000, ms: 1567773092204
getVersionId mismatch - s3: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t, ms: AfxJ3agigvhWyYhkCVXikPCpgx1C5z1t
2019-09-06 13:59:53,142 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(144)) - Total scan time: 0s
2019-09-06 13:59:53,142 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareS3ToMs(145)) - Scanned entries: 2
```

Note also that the file gets scanned twice. This hints at the scanning playing up when the supplied path is a file, not a dir.

Now I open the file with `hadoop fs -cat s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml`; there's a PUT to the DDB table as the modtime is updated; the next scan doesn't report modtime issues, but it does still mistakenly report that the version IDs are different.
``` bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml 2019-09-06 14:33:59,582 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized. 2019-09-06 14:33:59,773 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e 2019-09-06 14:33:59,828 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(217)) - Path: s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml - Length S3: 8260, MS: 8260 - Etag S3: 2887e7740b821abd405e6a5c70d2081e, MS: 2887e7740b821abd405e6a5c70d2081e 2019-09-06 14:33:59,856 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(79)) - On path: s3a:
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
steveloughran commented on issue #1208: HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-523927611

## Overall

There are checks, but as it doesn't recurse for me it's hard to validate them.

The UX can be improved. I propose:

* for successful entries, print their details as they are processed, such as length and etag.
* failure to initialize the fs to include the error.
* print the total duration of the check and the number of entries scanned.

## operations

failed on root entry.

```
bin/hadoop s3guard fsck -check s3a://guarded-table/
2019-08-22 14:17:37,973 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
== Path: s3a://guarded-table/
2019-08-22 14:17:38,160 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(220)) - Entry is in the root, so there's no parent
== Path: s3a://guarded-table/example
2019-08-22 14:17:38,189 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) - On path: s3a://guarded-table/
No etag.
```

with a path, I got the same message twice

```
bin/hadoop s3guard fsck -check s3a://guarded-table/example
2019-08-22 14:19:26,674 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
== Path: s3a://guarded-table/example
== Path: s3a://guarded-table/example
```

missing file.
```
bin/hadoop s3guard fsck -check s3a://guarded-table/example/missing
2019-08-22 14:21:23,252 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
java.io.FileNotFoundException: No such file or directory: s3a://guarded-table/example/missing
2019-08-22 14:21:23,404 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 44: java.io.FileNotFoundException: No such file or directory: s3a://guarded-table/example/missing
```

This is good. Add a test for it.

s3a://bucket/..

This is bad. Add test and then fix.

```
bin/hadoop s3guard fsck -check s3a://guarded-table/..
2019-08-22 14:23:14,640 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://guarded-table/..: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:400 Invalid URI: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null) at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:237) at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:164) at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2732) at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2694) at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2587) at org.apache.hadoop.fs.s3a.s3guard.S3GuardFsck.compareS3RootToMs(S3GuardFsck.java:94) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$Fsck.run(S3GuardTool.java:1560) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:402) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1759) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.main(S3GuardTool.java:1768) Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) at com.amazonaws.http.AmazonHttpClient$Requ