[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841079#comment-16841079 ] Gabor Bota commented on HADOOP-15999: - [~fabbri] that is covered in HADOOP-16279 and HADOOP-16184. You can find the test cases under HADOOP-16184 PR. HADOOP-16184 can be fixed after creating the metadata expiry for all MS entries in HADOOP-16279. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840846#comment-16840846 ] Sean Mackrory commented on HADOOP-15999: {quote} Seems like a common case{quote} Yeah that's actually the exact case that motivated this ticket, IIRC... > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840844#comment-16840844 ] Aaron Fabbri commented on HADOOP-15999: --- What about in-band delete (create tombstone) and then OOB create? Didn't see this covered in the cases here but maybe I missed it. Seems like a common case (OOB process dropping data in bucket). Might want to add a test case and document this in a new JIRA if it is not already covered here. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804150#comment-16804150 ] Hudson commented on HADOOP-15999: - FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16299 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16299/]) HADOOP-15999. S3Guard: Better support for out-of-band operations. (stevel: rev b5db2383832881034d57d836a8135a07a2bd1cf4) * (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java * (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardOutOfBandOperations.java * (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java * (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md * (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java * (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractGetFileStatusTest.java > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796148#comment-16796148 ] Gabor Bota commented on HADOOP-15999: - updated pull request with the directory skipping: [https://github.com/apache/hadoop/pull/624] successful itests run against Ireland > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793031#comment-16793031 ] Steve Loughran commented on HADOOP-15999: - really close to getting in, ran lots of tests, am happy. I tried adding a new test but failed and gave up HADOOP-16193 is the outcome there. One more change to request: skip going to s3 if the file checked is a directory. Because if the dest is also a directory, there's no difference. Pro: misses out the two failing HEAD calls and an expensive LIST whose output is discarded Con: doesn't catch up on the special failure case: someone has taken a directory path /a/b/ and overwritten it with a file /a/b . ' If we did want to worry about that, then rather than doing the whole s3GetFileStatus call, we only need to execute a single getObjectMetadata for the key "a/b" and, if something is actually there do an update. That would still be hitting the store, but it'd only be doing 1/3 as many requests > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792794#comment-16792794 ] Steve Loughran commented on HADOOP-15999: - bq. (note to myself: if I use low ddb.table.capacity.read and forget to modify it on the dashboard tests will timeout and fail) try switching to PAYG capacity. I have; there's a few tests we need to fix, but otherwise all seems well. Regarding the tests timing out *rather than failing*, I consider that a success of HADOOP-15426: no matter how overloaded things are, your client shouldn't fail > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792645#comment-16792645 ] Gabor Bota commented on HADOOP-15999: - verify ran successfuly against ireland. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791892#comment-16791892 ] Gabor Bota commented on HADOOP-15999: - I get random failures/timeouts with integration tests but seems unrelated. I'll run the test tomorrow again and describe the failures if still persist. I'm going to create the proposed issues now. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790839#comment-16790839 ] Hadoop QA commented on HADOOP-15999: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 29s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | HADOOP-15999 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12962164/HADOOP-15999.009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7dbe88a9baf5 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 34b1406 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/16045/testReport/ | | Max. process+thread count | 323 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16045/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL:
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790770#comment-16790770 ] Gabor Bota commented on HADOOP-15999: - fixed the ordering in patch 009. I will create those issues you've proposed tomorrow. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > HADOOP-15999.009.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787977#comment-16787977 ] Steve Loughran commented on HADOOP-15999: - This is ~ready to go in; I've only got one change to the code (see the bottom). I do think we need to be sure that we've got all opportunties for inconsistencies to arise covered, and I'm now considering performance too. h3. Deletions I think the whole of S3Guard is potentially brittle to * OOB deletions: you skip it here, so no worse, but because the S3AInputStream retries on FNFE, so as to "debounce" cached 404s, it's potentially going to retry forever * OOB creation of a file which has a deletion tombstone marker. You are already documenting this, so the next step to think about is code: *Proposed*: write a test to simulate that deletion problem, to see what happens. I'm actually curious now. We ought to have the S3AInputStream retry briefly on that initial GET failing, but only on that initial one. (after setting "fs.s3a.retry.limit" to something low & the interval down to 10ms or so to fail fast) sequences {code} 1. create; delete; open; read -> fail after retry 2. create; open, read, delete, read -> fail fast on second read {code} The StoreStatistics of the filesystem's IGNORED_ERRORS stat will be increased on the ignored error, so on sequence 1 will have increased, whereas on sequence 2 it will not have. If either of these tests don't quite fail as expected, we can disable the tests and continue, at least now with some tests to simulate a condition we don't have a fix for *Proposed* add a JIRA on this for us all to worry about. For both we just need to have some model of how long it takes for debouncing to stabilise. Then in this new check, if an FNFE is raised *and* the check is happening > (modtime+ debounce-delay) then its a real FNFE. h3. Timestamp ordering I'm going to add a new complication here. When you initiate a PUT, AFAIK (and [~Thomas Demoor] should be able to confirm), the modified time is that of the time the PUT began, not when the PUT completed. Which means I can have a workflow of {code} write1 = fs.create(path, true) write2 = fs.create(path, true) write2.close() status = fs.getFileStatus(path) write1.write(128MB of data) write1.close() status2 = fs.getFileStatus(path) assertTrue(status2.getLastModified() < status1.getLastModified()) {code} There's no way we are going to be able to defend against that except by tracking versions in the DDB tables, and the S3a Status including that when known. What we'll have to do then is make sure that this issue is documented today, and for the extension to do tag tracking in S3Guard, it keeps an eye on versions *Proposed*: mention this problem in the docs. Once version tracking goes in to s3guard, we'll need to move the ("is newer than") operator out of this modtime check into somewhere else (proposed: do it in the S3AFileStatus, which will look @ version info if set, falling back to etags). (actually, if version checking is on in the GET, we'd never see the updated file, would we?) h3. Performance impact this is going to reinstate the HEAD on every read, so making non-auth S3Guard a bit slower. We could think about addressing that by moving the checks into the input stream itself. That is: the first GET which returns data will also act as the metadata check. That'd mean the read context will need updating with some "metastoreProcessHeader" callback to invoke on the first GET. *Proposed*: Add a JIRA for this to become an optimization The good news is that because it's reading a file, its only one HTTP HEAD request: no need to do any of the other two directory probes except in the case that the file isn't there. h2. code review h3. ITestS3GuardOutOfBandOperations Check your import ordering: new files are where we should start off with getting things "correct" according to our style rules. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784464#comment-16784464 ] Hadoop QA commented on HADOOP-15999: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 27s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 54m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | HADOOP-15999 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12961163/HADOOP-15999.008.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0bd51f8b96a3 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0aefe28 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/16016/testReport/ | | Max. process+thread count | 341 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16016/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL:
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784393#comment-16784393 ] Gabor Bota commented on HADOOP-15999: - Uploaded patch 8. Integration tests run against Ireland without issues. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, > out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784299#comment-16784299 ] Gabor Bota commented on HADOOP-15999: - I'm thinking about to use a {{static}} cache in LocalMS. Basically it would emulate the same behaviour as in dynamo: whenever we create a new metadatastore instance, the content of the underlying cache would be the same for all ms. It's likely that I will create an issue for that if I run this "reference tho the LocalMS is lost" issue agan, and go through all tests to fix this. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784294#comment-16784294 ] Gabor Bota commented on HADOOP-15999: - I figured it out: the timeout in the LocalMS for file metadata was set to 10s and the test was running longer than this. Because of the timeout the metadata was no longer available from the local metadata store, so we hot the exception on {{getFileStatus.}} The {{DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL}} will be increased to 60s, so no there will be no more issues with this kind of test. We can increase this value safely as the LocalMS is an only testing implementation. I'll run all other tests with dynamo && local && null. If everything's ok I'll upload a new patch. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783633#comment-16783633 ] Gabor Bota commented on HADOOP-15999: - It was really me - so running the tests in my ide with the setting: {noformat} fs.s3a.s3guard.test.implementation local {noformat} Running the same test with *dynamo* everything passes. Turned out the reason for *NPE*s when using local was we had the issue with the reference for the localms again. When we rebuild the fs or build a new fs instance we have to set the same cache and the NPEs are gone. After fixing the NPEs the next issue is {{java.util.concurrent.ExecutionException: java.io.FileNotFoundException:}} - only for *local* again. In {{expectExceptionWhenReadingOpenFileAPI}} when the following is called: {code:java} try (FSDataInputStream in = guardedFs.openFile(testFilePath).build().get()) { intercept(FileNotFoundException.class, () -> { byte[] bytes = new byte[text.length()]; return in.read(bytes, 0, bytes.length); }); } {code} The *{{FSDataInputStream in = guardedFs.openFile(testFilePath).build().get()}}* throws *FNFE*, and that's even before it's expected. That means there's something wrong going on with open file API is used. I don't have a clue right now why would this happen just when using local and not when using dynamo, but I need to figure it out. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783325#comment-16783325 ] Steve Loughran commented on HADOOP-15999: - {code} java.lang.NullPointerException at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.overwriteFileInListing(ITestS3GuardOutOfBandOperations.java:319) at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.testListingSameLengthOverwrite(ITestS3GuardOutOfBandOperations.java:211) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} you get to see what is null at that line. Maybe add an extra assert above to help debug > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783322#comment-16783322 ] Steve Loughran commented on HADOOP-15999: - {code} java.util.concurrent.ExecutionException: java.io.FileNotFoundException: No such file or directory: s3a://cloudera-dev-gabor-ireland/OutOfBandDelete-abc0f4f4-741e-4e25-b6bf-3e60b180e6b4 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.expectExceptionWhenReadingOpenFileAPI(ITestS3GuardOutOfBandOperations.java:439) at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.expectExceptionWhenReading(ITestS3GuardOutOfBandOperations.java:427) at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.outOfBandDeletes(ITestS3GuardOutOfBandOperations.java:240) at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.testOutOfBandDeletes(ITestS3GuardOutOfBandOperations.java:206) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.FileNotFoundException: No such file or directory: s3a://cloudera-dev-gabor-ireland/OutOfBandDelete-abc0f4f4-741e-4e25-b6bf-3e60b180e6b4 at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2526) at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2420) at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2331) at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:863) at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$null$18(S3AFileSystem.java:3764) at org.apache.hadoop.util.LambdaUtils.eval(LambdaUtils.java:52) at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$openFileWithOptions$19(S3AFileSystem.java:3763) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more {code} maybe its an observed inconsistency. if the check for status for a file fails second time round, you'll have to think about using the {{s3guardInvoker}} we use for file reading to check this. This uses `S3GuardExistsRetryPolicy` to retry on FNFE. Until now we've assumed that if the entry is in DDB then we can open the file, and its the `S3aInputSTream.reopen()` which gets to handle OOB deletion. Now it'll need to be done earlier on. Or: not. If the file isn't there, carry on with the open and expect the read to handle it, at the specific place where it is needed. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781817#comment-16781817 ] Gabor Bota commented on HADOOP-15999: - Sure, here are the stacktraces: https://gist.github.com/bgaborg/4378fd13cf9ee8dab9475274a6dd251d > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781712#comment-16781712 ] Steve Loughran commented on HADOOP-15999: - other tests run parametrized fine; worked for me in intellij. So there is a problem here it's not related to parameterization, more in how JUnit runs under the IDE are different from those of external runner. Can you stick the stack traces up. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780436#comment-16780436 ] Gabor Bota commented on HADOOP-15999: - -1 for the latest (007) patch, because 8 out of the 12 new tests are failing in my IDE. CLI runs clear, but I had no problems with running test before in my IDE like this. Will debug more and provide another method maybe without using {{Parameterized.class}} > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779400#comment-16779400 ] Gabor Bota commented on HADOOP-15999: - Downloaded the patch and looking into the tombstone problem. > S3Guard: Better support for out-of-band operations > -- > > Key: HADOOP-15999 > URL: https://issues.apache.org/jira/browse/HADOOP-15999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Gabor Bota >Priority: Major > Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, > HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, > HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch > > > S3Guard was initially done on the premise that a new MetadataStore would be > the source of truth, and that it wouldn't provide guarantees if updates were > done without using S3Guard. > I've been seeing increased demand for better support for scenarios where > operations are done on the data that can't reasonably be done with S3Guard > involved. For example: > * A file is deleted using S3Guard, and replaced by some other tool. S3Guard > can't tell the difference between the new file and delete / list > inconsistency and continues to treat the file as deleted. > * An S3Guard-ed file is overwritten by a longer file by some other tool. When > reading the file, only the length of the original file is read. > We could possibly have smarter behavior here by querying both S3 and the > MetadataStore (even in cases where we may currently only query the > MetadataStore in getFileStatus) and use whichever one has the higher modified > time. > This kills the performance boost we currently get in some workloads with the > short-circuited getFileStatus, but we could keep it with authoritative mode > which should give a larger performance boost. At least we'd get more > correctness without authoritative mode and a clear declaration of when we can > make the assumptions required to short-circuit the process. If we can't > consider S3Guard the source of truth, we need to defer to S3 more. > We'd need to be extra sure of any locality / time zone issues if we start > relying on mod_time more directly, but currently we're tracking the > modification time as returned by S3 anyway. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations
[ https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778758#comment-16778758 ] Hadoop QA commented on HADOOP-15999: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 18s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 new + 6 unchanged - 0 fixed = 7 total (was 6) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 31s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 55m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | HADOOP-15999 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12960256/HADOOP-15999-007.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 411e690ab4b1 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9192f71 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/15981/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/15981/testReport/ | | Max. process+thread count | 341 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15981/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This