[jira] [Commented] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-06 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571116#comment-16571116
 ] 

Aaron Fabbri commented on HADOOP-14154:
---

Thanks for the patch. Running through some tests; I should have some feedback 
in another day or so.

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks, so I will also file a 
> documentation Jira to explain the design better. 
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  
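For illustration, the persistence itself amounts to one extra boolean attribute 
on each directory entry in the table. A minimal sketch with the AWS DynamoDB 
Document API; the attribute and key names here are illustrative, not the actual 
S3Guard schema:

{code}
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

public class AuthoritativeBitSketch {

  /** Write a directory entry carrying the authoritative bit. */
  static void putDirEntry(DynamoDB dynamo, String tableName,
      String parent, String child, boolean isAuthoritative) {
    Table table = dynamo.getTable(tableName);
    Item item = new Item()
        .withPrimaryKey("parent", parent, "child", child) // illustrative keys
        .withBoolean("is_dir", true)
        // the bit DirListingMetadata.isAuthoritative would round-trip through
        .withBoolean("is_authoritative", isAuthoritative);
    table.putItem(item);
  }
}
{code}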



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571091#comment-16571091
 ] 

Sean Mackrory commented on HADOOP-15546:


As a data point, prior to applying this patch, the ListStatus / FileStatus 
tests work, but the BackCompat test still fails, and the other failures 
mentioned above are either skipped or not being run.

> ABFS: tune imports & javadocs; stabilise tests
> --
>
> Key: HADOOP-15546
> URL: https://issues.apache.org/jira/browse/HADOOP-15546
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: HADOOP-15407
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15546-001.patch, 
> HADOOP-15546-HADOOP-15407-001.patch, HADOOP-15546-HADOOP-15407-002.patch, 
> HADOOP-15546-HADOOP-15407-003.patch, HADOOP-15546-HADOOP-15407-004.patch, 
> HADOOP-15546-HADOOP-15407-005.patch, HADOOP-15546-HADOOP-15407-006.patch, 
> HADOOP-15546-HADOOP-15407-006.patch, HADOOP-15546-HADOOP-15407-007.patch, 
> HADOOP-15546-HADOOP-15407-008.patch, HADOOP-15546-HADOOP-15407-009.patch, 
> azure-auth-keys.xml
>
>
> Follow-up on HADOOP-15540 with some initial review tuning.
> h2. Tuning
> * ordering of imports
> * rely on azure-auth-keys.xml to store credentials (change imports, docs, 
> .gitignore)
> * log4j -> info
> * add a "." to the first sentence of all the javadocs I noticed.
> * remove @Public annotations except for some constants (which includes some 
> commitment to maintain them).
> * move the AbstractFS declarations out of the src/test/resources XML file 
> into core-default.xml for all to use
> * other IDE-suggested tweaks
> h2. Testing
> Review the tests, move to ContractTestUtils assertions, make them more 
> consistent with contract test setup, and general work to make the tests 
> work well over slower links, document, etc.






[jira] [Commented] (HADOOP-15069) support git-secrets commit hook to keep AWS secrets out of git

2018-08-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571059#comment-16571059
 ] 

genericqa commented on HADOOP-15069:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 31m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
73m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 36m 
41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
51s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15069 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899103/HADOOP-15069-002.patch
 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 172db90da713 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca20e0d |
| maven | version: Apache Maven 3.3.9 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14994/artifact/out/whitespace-eol.txt
 |
| Max. process+thread count | 327 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14994/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> support git-secrets commit hook to keep AWS secrets out of git
> --
>
> Key: HADOOP-15069
> URL: https://issues.apache.org/jira/browse/HADOOP-15069
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15069-001.patch, HADOOP-15069-002.patch
>
>
> The latest Uber breach looks like it involved AWS keys in git repos.
> Nobody wants that, which is why Amazon provides 
> [git-secrets|https://github.com/awslabs/git-secrets]: a script you can use to 
> scan a repo and its history, *and* add as an automated check.
> Anyone can set this up, but there are a few false positives in the scan, 
> mostly from longs and a few all-upper-case constants. These can all be added 
> to a .gitignore file.
> Also: mention git-secrets in the aws testing docs; say "use it".






[jira] [Commented] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-08-06 Thread Larry McCay (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571048#comment-16571048
 ] 

Larry McCay commented on HADOOP-15583:
--

Couple more nits...

* AWSCredentialProviderList.java -

// do this outsize the synchronized block.

s/outsize/outside/

* S3AUtils - 

result= new AWSServiceIOException(message, ddbException);

s/= / = /

 

> Stabilize S3A Assumed Role support
> --
>
> Key: HADOOP-15583
> URL: https://issues.apache.org/jira/browse/HADOOP-15583
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15583-001.patch, HADOOP-15583-002.patch, 
> HADOOP-15583-003.patch, HADOOP-15583-004.patch, HADOOP-15583-005.patch
>
>
> This started off as just sharing credentials across S3A and S3Guard, but in 
> the process it has grown into stabilising the assumed role support so it can 
> be used for more than just testing.
> Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
> shutdown"
> h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync
> S3Guard builds its DDB auth chain itself, which stops it having to worry 
> about being created standalone vs part of an S3AFS, but it means its 
> authenticators are in a separate chain.
> When you are using short-lived assumed roles or other session credentials 
> updated in the S3A FS authentication chain, you need that same set of 
> credentials picked up by DDB. Otherwise, at best you are doubling load; at 
> worst, the DDB connector may not get refreshed credentials.
> Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an 
> optional ref to aws credentials. If set: don't create a new set. 
> There's one little complication here: our {{AWSCredentialProviderList}} list 
> is autocloseable; its close() will go through all children and close them. 
> Apparently the AWS S3 client (and hopefully the DDB client) will close this 
> when they are closed themselves. If DDB has the same set of credentials as 
> the FS, then there could be trouble if they are closed in one place when the 
> other still wants to use them.
> Solution: keep a use count on the credentials list, starting at one; every 
> close() call decrements it, and when this hits zero the cleanup is kicked 
> off.
> h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
> the s3a connection settings, including proxy.
> h3. Issue: we're not using getPassword() to get user/password for proxy 
> binding for STS. Fix: use that and pass down the bucket ref for per-bucket 
> secrets in a JCEKS file.
> h3. Issue: hard to debug what's going wrong :)
> h3. Issue: docs about KMS permissions for SSE-KMS are wrong, and the 
> ITestAssumedRole* tests don't request KMS permissions, so fail in a bucket 
> when the base s3 FS is using SSE-KMS. KMS permissions need to be included in 
> generated profiles.
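To make the proposed use count concrete, a minimal sketch of a 
reference-counted close; names are illustrative, not the actual patch:

{code}
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: a closeable credential list shared between the S3 and DDB clients.
 * The count starts at one for the creator; each additional user calls share().
 */
public class RefCountedCredentials implements Closeable {
  private final AtomicInteger refCount = new AtomicInteger(1);

  public RefCountedCredentials share() {
    refCount.incrementAndGet();
    return this;
  }

  @Override
  public void close() {
    // Only the final close() triggers the real cleanup of child providers,
    // so one client closing does not break the other.
    if (refCount.decrementAndGet() == 0) {
      // ... close each wrapped credential provider here ...
    }
  }
}
{code}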






[jira] [Commented] (HADOOP-14776) clean up ITestS3AFileSystemContract

2018-08-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571023#comment-16571023
 ] 

genericqa commented on HADOOP-14776:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
28s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-14776 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12882007/HADOOP-14776.01.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5eaeb4a563a6 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca20e0d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14996/testReport/ |
| Max. process+thread count | 302 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14996/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> clean up ITestS3AFileSystemContract
> ---
>
> Key: HADOOP-14776
> URL: 

[jira] [Commented] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571012#comment-16571012
 ] 

Steve Loughran commented on HADOOP-15546:
-

[~tmarquardt] the only key conflict that is really surfacing is 
{{fs.azure.test.account.name}}. How about a new key is used there, with the old 
one used solely for the WASB connector suites?

* {{fs.azure.test.account.name}} set: run wasb tests
* {{fs.abfs.test.account.name}} set: run abfs tests
* both set: run all tests.

This makes it easy to do full test runs while doing abfs dev by leaving the 
azure key unset; regression testing of changes in hadoop-common, poms, and 
shared tests can be covered by running both.
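A sketch of how a suite could gate itself on the proposed key 
({{fs.abfs.test.account.name}} is the proposal above, not an existing constant):

{code}
import org.apache.hadoop.conf.Configuration;
import org.junit.Assume;

public class AbfsTestAssumptions {
  /**
   * Skip the ABFS suites unless the proposed abfs-specific key is set;
   * the WASB suites would make the same check against
   * fs.azure.test.account.name.
   */
  static void assumeAbfsConfigured(Configuration conf) {
    Assume.assumeTrue("fs.abfs.test.account.name unset: skipping ABFS tests",
        conf.get("fs.abfs.test.account.name") != null);
  }
}
{code}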

> ABFS: tune imports & javadocs; stabilise tests
> --
>
> Key: HADOOP-15546
> URL: https://issues.apache.org/jira/browse/HADOOP-15546
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: HADOOP-15407
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15546-001.patch, 
> HADOOP-15546-HADOOP-15407-001.patch, HADOOP-15546-HADOOP-15407-002.patch, 
> HADOOP-15546-HADOOP-15407-003.patch, HADOOP-15546-HADOOP-15407-004.patch, 
> HADOOP-15546-HADOOP-15407-005.patch, HADOOP-15546-HADOOP-15407-006.patch, 
> HADOOP-15546-HADOOP-15407-006.patch, HADOOP-15546-HADOOP-15407-007.patch, 
> HADOOP-15546-HADOOP-15407-008.patch, HADOOP-15546-HADOOP-15407-009.patch, 
> azure-auth-keys.xml
>
>
> Follow-up on HADOOP-15540 with some initial review tuning.
> h2. Tuning
> * ordering of imports
> * rely on azure-auth-keys.xml to store credentials (change imports, docs, 
> .gitignore)
> * log4j -> info
> * add a "." to the first sentence of all the javadocs I noticed.
> * remove @Public annotations except for some constants (which includes some 
> commitment to maintain them).
> * move the AbstractFS declarations out of the src/test/resources XML file 
> into core-default.xml for all to use
> * other IDE-suggested tweaks
> h2. Testing
> Review the tests, move to ContractTestUtils assertions, make them more 
> consistent with contract test setup, and general work to make the tests 
> work well over slower links, document, etc.






[jira] [Commented] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571010#comment-16571010
 ] 

Steve Loughran commented on HADOOP-15546:
-

HADOOP-15546 patch 009

* resource azure-bfs-test.xml imports azure-bfs-auth-keys.xml so you can set 
fs.azure.test.account.name to an abfs account
* and the abfs tests check that this is true
* fix test which didn't compile with the touch() call in the superclass
* log4j.properties back to logging all at debug. noisier, but sets you up to 
debug things.
* ITestAzureNativeContractAppend.testAppendDirectory() expects the exception 
raised by that FS

The sole reason for needing a separate {{azure-bfs-auth-keys.xml}} seems to be 
reusing the {{fs.azure.test.account.name}} key to refer to an abfs account 
with the abfs tests. Given this doesn't get along with the wasb tests, I don't 
understand why this is needed. Can't we just have a separate config key, say 
{{fs.abfs.test.account.name}}, for the abfs tests? That will also provide an 
easy way to skip all the wasb tests in the module: simply undefine 
{{fs.azure.test.account.name}} from the config resources.

Testing: wasb against UK; abfs against US. I'm in the US right now and a lot of 
the WASB tests were timing out in a (very slow) parallel test run.
The only real outlier here was {{ITestBlobDataValidation}}, where the assertion 
that MD5 checksums were coming back failed. This happens repeatedly, even when 
run as a single test. I have no idea what is going on here.
 
 {code}
 [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.424 
s <<< FAILURE! - in org.apache.hadoop.fs.azure.ITestBlobDataValidation
 [ERROR] testCheckBlockMd5(org.apache.hadoop.fs.azure.ITestBlobDataValidation)  
Time elapsed: 2.541 s  <<< ERROR!
 java.io.IOException: java.lang.AssertionError Please see the cause for further 
information.
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.flush(BlobOutputStreamInternal.java:533)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:1058)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
at 
org.apache.hadoop.fs.azure.ITestBlobDataValidation.testCheckBlockMd5(ITestBlobDataValidation.java:235)
at 
org.apache.hadoop.fs.azure.ITestBlobDataValidation.testCheckBlockMd5(ITestBlobDataValidation.java:151)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Suppressed: java.io.IOException: java.lang.AssertionError Please see 
the cause for further information.
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.flush(BlobOutputStreamInternal.java:533)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.close(BlobOutputStreamInternal.java:317)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
... 17 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.AssertionError
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.flush(BlobOutputStreamInternal.java:530)
... 19 more
Caused by: java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at 

[jira] [Comment Edited] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571010#comment-16571010
 ] 

Steve Loughran edited comment on HADOOP-15546 at 8/7/18 1:43 AM:
-

HADOOP-15546 patch 009

* resource azure-bfs-test.xml imports azure-bfs-auth-keys.xml so you can set 
fs.azure.test.account.name to an abfs account
* and the abfs tests check that this is true
* fix test which didn't compile with the touch() call in the superclass
* log4j.properties back to logging all at debug. noisier, but sets you up to 
debug things.
* ITestAzureNativeContractAppend.testAppendDirectory() expects the exception 
raised by that FS

The sole reason for needing a separate {{azure-bfs-auth-keys.xml}} seems to be 
reusing the {{fs.azure.test.account.name}} key to refer to an abfs account 
with the abfs tests. Given this doesn't get along with the wasb tests, I don't 
understand why this is needed. Can't we just have a separate config key, say 
{{fs.abfs.test.account.name}}, for the abfs tests? That will also provide an 
easy way to skip all the wasb tests in the module: simply undefine 
{{fs.azure.test.account.name}} from the config resources.

Testing: wasb against UK; abfs against US. I'm in the US right now and a lot of 
the WASB tests were timing out in a (very slow) parallel test run. 

The only real outlier here was {{ITestBlobDataValidation}}, where the assertion 
that MD5 checksums were coming back failed. This happens repeatedly, even when 
run as a single test. I have no idea what is going on here.
 
 {code}
 [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.424 
s <<< FAILURE! - in org.apache.hadoop.fs.azure.ITestBlobDataValidation
 [ERROR] testCheckBlockMd5(org.apache.hadoop.fs.azure.ITestBlobDataValidation)  
Time elapsed: 2.541 s  <<< ERROR!
 java.io.IOException: java.lang.AssertionError Please see the cause for further 
information.
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.flush(BlobOutputStreamInternal.java:533)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:1058)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
at 
org.apache.hadoop.fs.azure.ITestBlobDataValidation.testCheckBlockMd5(ITestBlobDataValidation.java:235)
at 
org.apache.hadoop.fs.azure.ITestBlobDataValidation.testCheckBlockMd5(ITestBlobDataValidation.java:151)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Suppressed: java.io.IOException: java.lang.AssertionError Please see 
the cause for further information.
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.flush(BlobOutputStreamInternal.java:533)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.close(BlobOutputStreamInternal.java:317)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
... 17 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.AssertionError
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.flush(BlobOutputStreamInternal.java:530)
... 19 more
Caused by: java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
 

[jira] [Updated] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15546:

Attachment: HADOOP-15546-HADOOP-15407-009.patch

> ABFS: tune imports & javadocs; stabilise tests
> --
>
> Key: HADOOP-15546
> URL: https://issues.apache.org/jira/browse/HADOOP-15546
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: HADOOP-15407
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15546-001.patch, 
> HADOOP-15546-HADOOP-15407-001.patch, HADOOP-15546-HADOOP-15407-002.patch, 
> HADOOP-15546-HADOOP-15407-003.patch, HADOOP-15546-HADOOP-15407-004.patch, 
> HADOOP-15546-HADOOP-15407-005.patch, HADOOP-15546-HADOOP-15407-006.patch, 
> HADOOP-15546-HADOOP-15407-006.patch, HADOOP-15546-HADOOP-15407-007.patch, 
> HADOOP-15546-HADOOP-15407-008.patch, HADOOP-15546-HADOOP-15407-009.patch, 
> azure-auth-keys.xml
>
>
> Follow-up on HADOOP-15540 with some initial review tuning.
> h2. Tuning
> * ordering of imports
> * rely on azure-auth-keys.xml to store credentials (change imports, docs, 
> .gitignore)
> * log4j -> info
> * add a "." to the first sentence of all the javadocs I noticed.
> * remove @Public annotations except for some constants (which includes some 
> commitment to maintain them).
> * move the AbstractFS declarations out of the src/test/resources XML file 
> into core-default.xml for all to use
> * other IDE-suggested tweaks
> h2. Testing
> Review the tests, move to ContractTestUtils assertions, make them more 
> consistent with contract test setup, and general work to make the tests 
> work well over slower links, document, etc.






[jira] [Updated] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15546:

Status: Open  (was: Patch Available)

> ABFS: tune imports & javadocs; stabilise tests
> --
>
> Key: HADOOP-15546
> URL: https://issues.apache.org/jira/browse/HADOOP-15546
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: HADOOP-15407
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15546-001.patch, 
> HADOOP-15546-HADOOP-15407-001.patch, HADOOP-15546-HADOOP-15407-002.patch, 
> HADOOP-15546-HADOOP-15407-003.patch, HADOOP-15546-HADOOP-15407-004.patch, 
> HADOOP-15546-HADOOP-15407-005.patch, HADOOP-15546-HADOOP-15407-006.patch, 
> HADOOP-15546-HADOOP-15407-006.patch, HADOOP-15546-HADOOP-15407-007.patch, 
> HADOOP-15546-HADOOP-15407-008.patch, azure-auth-keys.xml
>
>
> Follow-up on HADOOP-15540 with some initial review tuning.
> h2. Tuning
> * ordering of imports
> * rely on azure-auth-keys.xml to store credentials (change imports, docs, 
> .gitignore)
> * log4j -> info
> * add a "." to the first sentence of all the javadocs I noticed.
> * remove @Public annotations except for some constants (which includes some 
> commitment to maintain them).
> * move the AbstractFS declarations out of the src/test/resources XML file 
> into core-default.xml for all to use
> * other IDE-suggested tweaks
> h2. Testing
> Review the tests, move to ContractTestUtils assertions, make them more 
> consistent with contract test setup, and general work to make the tests 
> work well over slower links, document, etc.






[jira] [Updated] (HADOOP-14707) AbstractContractDistCpTest to test attr preservation with -p, verify blobstores downgrade

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14707:

Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-15620

> AbstractContractDistCpTest to test attr preservation with -p, verify 
> blobstores downgrade
> -
>
> Key: HADOOP-14707
> URL: https://issues.apache.org/jira/browse/HADOOP-14707
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3, test, tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14707-001.patch, HADOOP-14707-002.patch, 
> HADOOP-14707-003.patch
>
>
> It *may* be that trying to use {{distcp -p}} with S3a triggers a stack trace 
> {code}
> java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
> getXAttrs 
> at org.apache.hadoop.fs.FileSystem.getXAttrs(FileSystem.java:2559) 
> at 
> org.apache.hadoop.tools.util.DistCpUtils.toCopyListingFileStatus(DistCpUtils.java:322)
>  
> {code}
> Add a test to {{AbstractContractDistCpTest}} to verify that this is handled 
> better. What is "handled better" here? Either ignore the option or fail with 
> "don't do that" text.






[jira] [Updated] (HADOOP-14707) AbstractContractDistCpTest to test attr preservation with -p, verify blobstores downgrade

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14707:

Target Version/s:   (was: 3.2.0)

> AbstractContractDistCpTest to test attr preservation with -p, verify 
> blobstores downgrade
> -
>
> Key: HADOOP-14707
> URL: https://issues.apache.org/jira/browse/HADOOP-14707
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/azure, fs/s3, test, tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14707-001.patch, HADOOP-14707-002.patch, 
> HADOOP-14707-003.patch
>
>
> It *may* be that trying to use {{distcp -p}} with S3a triggers a stack trace 
> {code}
> java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
> getXAttrs 
> at org.apache.hadoop.fs.FileSystem.getXAttrs(FileSystem.java:2559) 
> at 
> org.apache.hadoop.tools.util.DistCpUtils.toCopyListingFileStatus(DistCpUtils.java:322)
>  
> {code}
> Add a test to {{AbstractContractDistCpTest}} to verify that this is handled 
> better. What is "handled better" here? Either ignore the option or fail with 
> "don't do that" text.






[jira] [Updated] (HADOOP-15281) Distcp to add no-rename copy option

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15281:

Target Version/s:   (was: 3.2.0)

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by one of two strategies:
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path, either as a conf 
> option or a CLI option.
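A sketch of the proposed switch; the option name {{distcp.direct.write}} is 
made up for illustration:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CopyTargetSketch {

  /**
   * With direct write enabled, copy straight to the final path and skip
   * the tmp-file + delete + mkdirs + rename sequence entirely, relying on
   * the store manifesting the object atomically when the write completes.
   */
  static Path chooseWorkPath(Configuration conf, Path target, Path tmpTarget) {
    boolean direct = conf.getBoolean("distcp.direct.write", false);
    return direct ? target : tmpTarget;
  }
}
{code}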






[jira] [Updated] (HADOOP-14762) S3A warning of obsolete encryption key which is never used

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14762:

Parent Issue: HADOOP-15620  (was: HADOOP-15220)

> S3A warning of obsolete encryption key which is never used
> --
>
> Key: HADOOP-14762
> URL: https://issues.apache.org/jira/browse/HADOOP-14762
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Priority: Minor
>
> During CLI init, I get warned of using an obsolete file. 
> {code}
> ./hadoop s3guard import  s3a://hwdev-steve-ireland-new
> 2017-08-11 18:08:43,935 DEBUG conf.Configuration: Reloading 3 existing 
> configurations
> 2017-08-11 18:08:45,146 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> 2017-08-11 18:08:45,702 INFO s3guard.S3GuardTool: Metadata store 
> DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new} is 
> initialized.
> ./hadoop s3guard import  s3a://hwdev-steve-ireland-new
> 2017-08-11 18:08:43,935 DEBUG conf.Configuration: Reloading 3 existing 
> configurations
> 2017-08-11 18:08:45,146 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> 2017-08-11 18:08:45,702 INFO s3guard.S3GuardTool: Metadata store 
> DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new} is 
> initialized.
> ./hadoop s3guard import  s3a://hwdev-steve-ireland-new
> 2017-08-11 18:08:43,935 DEBUG conf.Configuration: Reloading 3 existing 
> configurations
> 2017-08-11 18:08:45,146 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> 2017-08-11 18:08:45,702 INFO s3guard.S3GuardTool: Metadata store 
> DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new} is 
> initialized.
> {code}
> I don't have this setting set as far as I can see, not in an XML file nor any 
> jceks file. Maybe it's falsely being picked up during the scan for jceks files?






[jira] [Commented] (HADOOP-15576) S3A Multipart Uploader to work with S3Guard and encryption

2018-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570980#comment-16570980
 ] 

Steve Loughran commented on HADOOP-15576:
-

I think this is good to go in, if you are happy with my changes.
And there's no need for protobuf; we aren't worrying about wire compat over 
time, are we?

But there's still some ambiguity about things, something which surfaces 
precisely because the spec of HDFS-13713 doesn't exist, and we are left looking 
at the behaviours of the two existing implementations and guessing which are 
"the reference" behaviours vs "implementation artifacts":

* what the policy for 0-entry commits MUST be (here: fail; see the probe 
sketched after this list)
* what happens if, in your MPU complete call, there are uploaded parts which 
are not listed
* what happens if a part is listed twice in completion
* what happens if you try to upload a part after the MPU has completed
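For the first case, one can probe S3 itself with the AWS SDK, independent of 
the Hadoop API under discussion; bucket and key names below are illustrative, 
and the expectation is that S3 rejects a zero-part complete:

{code}
import java.util.Collections;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;

public class ZeroPartCompleteProbe {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    String uploadId = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest("example-bucket", "probe-key"))
        .getUploadId();
    // Complete with an empty part list; S3 is expected to reject this,
    // which is the behaviour a "0-entry commits MUST fail" policy mirrors.
    s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        "example-bucket", "probe-key", uploadId,
        Collections.<PartETag>emptyList()));
  }
}
{code}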

If we had the time, I'd say "pull that specification task into this one and 
define things alongside the tests", that being how I like to do tests and 
reverse-engineer a spec from behaviours: build the spec, come up with ways to 
break the preconditions, see what happens when you try that in tests, fix 
code/refine spec.

But...we are approaching the cutoff for 3.2, and ideally I'd like this in ASAP, 
along with the other 3.2 features. Getting this in gives us time to finish & 
review those.

So

I'm +1 for this patch as is. It is working for me against us-west-1 + s3guard, 
where it wasn't before. (I'm not reliably testing encryption BTW, as there's no 
test actually verifying the object has the encryption header.)

If you are happy with the patch-as-modified, it's good to go. But we do need 
that spec still, which I'd like before the actual apps using this stuff come 
together.



> S3A  Multipart Uploader to work with S3Guard and encryption
> ---
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HADOOP-15576-005.patch, HADOOP-15576-007.patch, 
> HADOOP-15576-008.patch, HADOOP-15576.001.patch, HADOOP-15576.002.patch, 
> HADOOP-15576.003.patch, HADOOP-15576.004.patch
>
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this:
> # move from low-level calls of the S3A client to calls of WriteOperationHelper, 
> adding any new methods needed there.
> # Tests: the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there are no checks that listFiles/listStatus show the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, 
> encryption settings aren't picked up. Files being uploaded this way *are not 
> being encrypted*.






[jira] [Commented] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-08-06 Thread Larry McCay (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570977#comment-16570977
 ] 

Larry McCay commented on HADOOP-15583:
--

Hi [~ste...@apache.org] - I thought STS was Security Token Service - no?

I am still in the process of reviewing it all in detail but wanted to ask that 
up front.

> Stabilize S3A Assumed Role support
> --
>
> Key: HADOOP-15583
> URL: https://issues.apache.org/jira/browse/HADOOP-15583
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15583-001.patch, HADOOP-15583-002.patch, 
> HADOOP-15583-003.patch, HADOOP-15583-004.patch, HADOOP-15583-005.patch
>
>
> This started off as just sharing credentials across S3A and S3Guard, but in 
> the process it has grown into stabilising the assumed role support so it can 
> be used for more than just testing.
> Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
> shutdown"
> h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync
> S3Guard builds its DDB auth chain itself, which stops it having to worry 
> about being created standalone vs part of an S3AFS, but it means its 
> authenticators are in a separate chain.
> When you are using short-lived assumed roles or other session credentials 
> updated in the S3A FS authentication chain, you need that same set of 
> credentials picked up by DDB. Otherwise, at best you are doubling load; at 
> worst, the DDB connector may not get refreshed credentials.
> Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an 
> optional ref to aws credentials. If set: don't create a new set. 
> There's one little complication here: our {{AWSCredentialProviderList}} list 
> is autocloseable; its close() will go through all children and close them. 
> Apparently the AWS S3 client (and hopefully the DDB client) will close this 
> when they are closed themselves. If DDB has the same set of credentials as 
> the FS, then there could be trouble if they are closed in one place when the 
> other still wants to use them.
> Solution: keep a use count on the credentials list, starting at one; every 
> close() call decrements it, and when this hits zero the cleanup is kicked 
> off.
> h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
> the s3a connection settings, including proxy.
> h3. Issue: we're not using getPassword() to get user/password for proxy 
> binding for STS. Fix: use that and pass down the bucket ref for per-bucket 
> secrets in a JCEKS file.
> h3. Issue: hard to debug what's going wrong :)
> h3. Issue: docs about KMS permissions for SSE-KMS are wrong, and the 
> ITestAssumedRole* tests don't request KMS permissions, so fail in a bucket 
> when the base s3 FS is using SSE-KMS. KMS permissions need to be included in 
> generated profiles.






[jira] [Commented] (HADOOP-14661) S3A to support Requester Pays Buckets

2018-08-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570975#comment-16570975
 ] 

genericqa commented on HADOOP-14661:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HADOOP-14661 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14661 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877218/HADOOP-14661.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14995/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> S3A to support Requester Pays Buckets
> -
>
> Key: HADOOP-14661
> URL: https://issues.apache.org/jira/browse/HADOOP-14661
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, util
>Affects Versions: 3.0.0-alpha3
>Reporter: Mandus Momberg
>Assignee: Mandus Momberg
>Priority: Minor
> Attachments: HADOOP-14661.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Amazon S3 has the ability to charge the requester for the cost of accessing 
> S3. This is called Requester Pays Buckets. 
> In order to access these buckets, each request needs to be signed with a 
> specific header. 
> http://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html
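At the SDK level the feature is a per-request flag; a sketch of a direct GET 
against a requester-pays bucket (bucket/key names illustrative; the open work 
is wiring an equivalent switch through S3A configuration):

{code}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class RequesterPaysSketch {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // The third constructor argument marks the request as requester-pays;
    // the SDK translates it into the "x-amz-request-payer: requester" header.
    GetObjectRequest request =
        new GetObjectRequest("example-requester-pays-bucket", "data/key", true);
    S3Object object = s3.getObject(request);
    System.out.println("Content-Length: "
        + object.getObjectMetadata().getContentLength());
  }
}
{code}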






[jira] [Updated] (HADOOP-14776) clean up ITestS3AFileSystemContract

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14776:

Parent: HADOOP-15620  (was: HADOOP-15220)

> clean up ITestS3AFileSystemContract
> ---
>
> Key: HADOOP-14776
> URL: https://issues.apache.org/jira/browse/HADOOP-14776
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-14776.01.patch
>
>
> With the move of the {{FileSystemContractTest}} test to JUnit 4, the bits of 
> {{ITestS3AFileSystemContract}} which override existing methods just to skip 
> them can be cleaned up: the subclasses could raise an assume() so their 
> skippage gets noted.
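A sketch of the suggested pattern with JUnit 4's {{Assume}}; the method name is 
one example of an override that currently skips silently:

{code}
import org.junit.Assume;
import org.junit.Test;

public class SkippageSketch /* extends FileSystemContractBaseTest */ {

  /**
   * Instead of overriding with an empty body (which reports a silent pass),
   * raise an assumption so the runner records the test as skipped.
   */
  @Test
  public void testRenameNonExistentPath() {
    Assume.assumeTrue("covered by the S3A contract tests instead", false);
  }
}
{code}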






[jira] [Updated] (HADOOP-13636) s3a rm on the CLI generates deprecation warning on io.bytes.per.checksum

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13636:

Parent: HADOOP-15620  (was: HADOOP-15220)

> s3a rm on the CLI generates deprecation warning on io.bytes.per.checksum
> 
>
> Key: HADOOP-13636
> URL: https://issues.apache.org/jira/browse/HADOOP-13636
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> An rm -r triggers a deprecation warning of {{io.bytes.per.checksum}}. This 
> is a property that is not explicitly used anywhere in S3: something else is 
> asking for it, something that should switch to the new value.
> {code}
> $ ./hadoop fs -rm -r s3a://XYZ/lib
> 16/09/21 16:24:04 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/09/21 16:24:05 INFO Configuration.deprecation: io.bytes.per.checksum is 
> deprecated. Instead, use dfs.bytes-per-checksum
> {code}






[jira] [Updated] (HADOOP-14661) S3A to support Requester Pays Buckets

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14661:

Parent: HADOOP-15620  (was: HADOOP-15220)

> S3A to support Requester Pays Buckets
> -
>
> Key: HADOOP-14661
> URL: https://issues.apache.org/jira/browse/HADOOP-14661
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, util
>Affects Versions: 3.0.0-alpha3
>Reporter: Mandus Momberg
>Assignee: Mandus Momberg
>Priority: Minor
> Attachments: HADOOP-14661.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Amazon S3 has the ability to charge the requester for the cost of accessing 
> S3. This is called Requester Pays Buckets. 
> In order to access these buckets, each request needs to be signed with a 
> specific header. 
> http://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html






[jira] [Updated] (HADOOP-15069) support git-secrets commit hook to keep AWS secrets out of git

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15069:

Parent: HADOOP-15620  (was: HADOOP-15220)

> support git-secrets commit hook to keep AWS secrets out of git
> --
>
> Key: HADOOP-15069
> URL: https://issues.apache.org/jira/browse/HADOOP-15069
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15069-001.patch, HADOOP-15069-002.patch
>
>
> The latest Uber breach looks like it involved AWS keys in git repos.
> Nobody wants that, which is why Amazon provides 
> [git-secrets|https://github.com/awslabs/git-secrets]: a script you can use to 
> scan a repo and its history, *and* add as an automated check.
> Anyone can set this up, but there are a few false positives in the scan, 
> mostly from longs and a few all-upper-case constants. These can all be added 
> to a .gitignore file.
> Also: mention git-secrets in the aws testing docs; say "use it".






[jira] [Updated] (HADOOP-15091) S3aUtils.getEncryptionAlgorithm() always logs@Debug "Using SSE-C"

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15091:

Parent: HADOOP-15620  (was: HADOOP-15220)

> S3aUtils.getEncryptionAlgorithm() always logs@Debug "Using SSE-C"
> -
>
> Key: HADOOP-15091
> URL: https://issues.apache.org/jira/browse/HADOOP-15091
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Priority: Trivial
>
> even when you have encryption off or set to sse-kms/aes256, the debug logs 
> print a comment about using SSE-C
> {code}
> 2017-12-05 12:44:33,292 [main] DEBUG s3a.S3AUtils 
> (S3AUtils.java:getEncryptionAlgorithm(1097)) - Using SSE-C with null key
> {code}
> That log statement should be moved to only get printed with SSE-C enabled.
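The fix is presumably a one-line guard; a sketch with names simplified from the 
real S3AUtils code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class EncryptionLogSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(EncryptionLogSketch.class);

  enum Algorithm { NONE, AES256, SSE_KMS, SSE_C }

  static void logAlgorithm(Algorithm algorithm, String key) {
    // Only mention SSE-C when SSE-C is actually selected.
    if (algorithm == Algorithm.SSE_C) {
      LOG.debug("Using SSE-C with key of length {}",
          key == null ? 0 : key.length());
    }
  }
}
{code}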






[jira] [Updated] (HADOOP-15243) test and document use of fs.s3a.signing-algorithm

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15243:

Parent: HADOOP-15620  (was: HADOOP-15220)

> test and document use of fs.s3a.signing-algorithm
> -
>
> Key: HADOOP-15243
> URL: https://issues.apache.org/jira/browse/HADOOP-15243
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs/s3, test
>Reporter: Steve Loughran
>Priority: Minor
>
> We suggest changing {{fs.s3a.signing-algorithm}} as a way to deal with 
> some auth problems, but there's no standalone section on what is available, 
> nor any tests trying to switch to different methods. 
> This is hampered by the fact that none of us have owned up to understanding 
> all the choices available, and the AWS docs don't cover it... all we have is 
> the source, which sets up the options in more than one place.
> Proposed:
> * add some tests of the actual valid options.
> * document the values in the s3a docs
> * and in core-default.xml
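A sketch of what such a test could look like; the two signer names below are 
the SDK values most often suggested, and enumerating the full set is exactly 
the open documentation question:

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SigningAlgorithmProbe {

  /**
   * Instantiate the FS with an explicit signer and make one call, so a
   * rejected signer name or failing authentication shows up immediately.
   * E.g. probeSigner("AWSS3V4SignerType") or probeSigner("S3SignerType").
   */
  static void probeSigner(String signer) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.signing-algorithm", signer);
    try (FileSystem fs = FileSystem.newInstance(
        new URI("s3a://example-bucket/"), conf)) {
      fs.getFileStatus(new Path("/"));
    }
  }
}
{code}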






[jira] [Updated] (HADOOP-15460) S3A FS to add "s3a:no-existence-checks" to the builder file creation option set

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15460:

Parent: HADOOP-15620  (was: HADOOP-15220)

> S3A FS to add  "s3a:no-existence-checks" to the builder file creation option 
> set
> 
>
> Key: HADOOP-15460
> URL: https://issues.apache.org/jira/browse/HADOOP-15460
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
>
> As promised to [~StephanEwen]: add an s3a-specific option to the file-creation 
> builder API so that all existence checks are skipped.
> This
> # eliminates a few hundred milliseconds
> # avoids any caching of negative HEAD/GET responses in the S3 load balancers.
> Callers will be expected to know what they are doing.
> FWIW, we are doing some PUT calls in the committer which bypass this stuff, 
> for the same reason. If you've just created a directory, you know there's 
> nothing underneath, so no need to check.
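
A sketch of how the option might look from the caller's side; the option name 
is the proposal in the title, not an implemented key:
{code:java}
// Proposal sketch: "s3a:no-existence-checks" does not exist yet.
FSDataOutputStream out = fs.createFile(path)
    .opt("s3a:no-existence-checks", true)   // skip the HEAD/GET probes on create
    .build();
{code}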



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15303) make s3a read fault injection configurable including "off"

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15303:

Parent: HADOOP-15620  (was: HADOOP-15220)

> make s3a read fault injection configurable including "off"
> --
>
> Key: HADOOP-15303
> URL: https://issues.apache.org/jira/browse/HADOOP-15303
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When trying to test distcp with large files and an inconsistent destination 
> (P(fail) = 0.4), read() failures on the download can overload the retry logic 
> in S3AInputStream, even though all I want to see is how the listings cope.
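
Something along these lines would do; the key name below is purely illustrative, 
the point is a probability knob that supports 0.0 ("off"):
{code:java}
// Hypothetical key name, not an existing S3A property.
Configuration conf = new Configuration();
conf.setDouble("fs.s3a.failinject.read.probability", 0.0);  // disable read fault injection
{code}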



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15575) Add a way for an FS instance to say "really, no trash interval at all"

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15575:

Parent: HADOOP-15620  (was: HADOOP-15220)

> Add a way for an FS instance to say "really, no trash interval at all"
> --
>
> Key: HADOOP-15575
> URL: https://issues.apache.org/jira/browse/HADOOP-15575
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
>
> Object stores like S3 often offer [object 
> versioning|https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html]
>  as a way to recover from accidental deletions. There is no need to use the 
> Fs Shell trash mechanism.
> Proposed: a way for an FS instance to tell the shell to not use trash, even 
> if the client has configured it. The current 
> {{getServerDefaults().getTrashInterval()}} returns a value from the server, 
> but it *only* overrides the ""fs.trash.interval" if it !=0. So if a server 
> says "don't use trash" it gets ignored.
> If a special value (-1) is returned (or we use a new field?), FS instances 
> can return to the shell saying "no need for trash". Then you can enable it on 
> a bucket-by-bucket basis; any store with the feature enabled can do the 
> same thing.
> Alternative option: a special S3A-aware trash policy which does a 
> getFileSystem().getConf().getBoolean("fs.s3a.trash.skip") and skips trash if 
> set. This will allow you to disable it on a bucket-by-bucket basis. 
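
A sketch of the proposed client-side logic, with -1 as the hypothetical 
sentinel (current behaviour is the middle two branches only):
{code:java}
// Sketch only: -1 as a "really, no trash" sentinel is the proposal here.
long serverInterval = fs.getServerDefaults(path).getTrashInterval();
long effectiveInterval;
if (serverInterval < 0) {
  effectiveInterval = 0;  // server says: no trash at all, ignore client settings
} else if (serverInterval > 0) {
  effectiveInterval = serverInterval;  // server override, as today
} else {
  effectiveInterval = conf.getLong("fs.trash.interval", 0);  // client config
}
{code}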



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15370) S3A log message on rm s3a://bucket/ not intuitive

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15370:

Parent: HADOOP-15620  (was: HADOOP-15220)

> S3A log message on rm s3a://bucket/ not intuitive
> -
>
> Key: HADOOP-15370
> URL: https://issues.apache.org/jira/browse/HADOOP-15370
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Trivial
>
> when you try to delete the root of a bucket from command line, e.g. {{hadoop 
> fs -rm -r -skipTrash s3a://hwdev-steve-new/}}, the output isn't that useful
> {code}
> 2018-04-06 16:35:23,048 [main] INFO  s3a.S3AFileSystem 
> (S3AFileSystem.java:rejectRootDirectoryDelete(1837)) - s3a delete the 
> hwdev-steve-new root directory of true
> rm: `s3a://hwdev-steve-new/': Input/output error
> 2018-04-06 16:35:23,050 [pool-2-thread-1] DEBUG s3a.S3AFileSystem
> {code}
> the single log message doesn't read as a sentence, and the error message 
> raised is lost by the FS -rm CLI command (why?)
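
For comparison, a message of roughly this shape would at least parse; the 
original format string appears to have its arguments in the wrong order 
(parameter names here are illustrative):
{code:java}
// Sketch of a clearer rejection message.
LOG.info("S3A: rejecting delete of the root directory of bucket {} (recursive={})",
    bucket, recursive);
{code}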



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14937:

Parent: HADOOP-15620  (was: HADOOP-15220)

> initial part uploads seem to block unnecessarily in S3ABlockOutputStream
> 
>
> Key: HADOOP-14937
> URL: https://issues.apache.org/jira/browse/HADOOP-14937
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: yjp_threads.png
>
>
> From looking at a YourKit snapshot of an FsShell process running a {{hadoop 
> fs -put file:///... s3a://...}}, it seems that the first part in the 
> multipart upload doesn't begin to upload until n of the 
> {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is 
> the value of {{fs.s3a.fast.upload.active.blocks}}.
> To hopefully clarify a bit, the series of events that I expected to see with 
> {{fs.s3a.fast.upload.active.blocks}} set to 4 is:
> 1.  An amount of data equal to {{fs.s3a.multipart.size}} is buffered into 
> off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
> 2. As soon as that happens, a thread begins to upload that part. Meanwhile, 
> the main thread continues to buffer data into off-heap memory.
> 3. Once another part has been buffered into off-heap memory, a separate 
> thread uploads that part, and so on.
> Whereas what I think the YK snapshot shows happening is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into 
> off-heap memory.
> 2. Four threads start to upload one part each at the same time.
> I've attached a picture of the "Threads" tab to show what I mean. Basically 
> the times at which the first four {{s3a-transfer-shared-pool}} threads start 
> to upload are roughly the same, whereas I would've expected them to be more 
> staggered.
> I'm actually not sure whether this is the expected behavior or not, so feel 
> free to close if this doesn't come as a surprise to anyone.
> For some context, I've been trying to get a sense for roughly which values of 
> {{fs.s3a.multipart.size}} perform the best at different file sizes. One thing 
> that I found confusing is that a part size of 5 MB seems to outperform a part 
> size of 64 MB up until files that are upwards of about 500 MB in size. This 
> seems odd, since each {{uploadPart}} call is its own HTTP request, and I 
> would've expected the overhead of those to become costly at small part sizes. 
> My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have 
> to wait until 256 MB are buffered before we can start uploading, while with 5 
> MB blocks we can start uploading as soon as we buffer 20 MB, and that's what 
> gives the smaller parts the advantage for smaller files.
> I'm happy to submit a patch if this is in fact a problem, but wanted to check 
> to make sure I'm not just misunderstanding something.
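
For reproduction, the settings described above amount to (values as stated in 
the report; the "64M" suffix form assumes a Hadoop version that accepts size 
suffixes for this key):
{code:java}
Configuration conf = new Configuration();
conf.set("fs.s3a.fast.upload.buffer", "bytebuffer");
conf.setInt("fs.s3a.fast.upload.active.blocks", 4);
conf.set("fs.s3a.multipart.size", "64M");  // or 67108864 if suffixes aren't supported
{code}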



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15555) S3AFileStatus to add a serialVersionUID; review & test serialization

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-1:

Parent: HADOOP-15620  (was: HADOOP-15220)

> S3AFileStatus to add a serialVersionUID; review & test serialization
> 
>
> Key: HADOOP-1
> URL: https://issues.apache.org/jira/browse/HADOOP-1
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
>
> As FileStatus is now serializable, review S3AFileStatus:
> * add a version field
> * review deserialization to see if we need any checks to stop 
> invalid/malicious status instances being created
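
A minimal sketch of both points; the field and the check are illustrative, not 
the actual S3AFileStatus internals:
{code:java}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class S3AFileStatusSketch implements Serializable {
  private static final long serialVersionUID = 1L;  // pinned version field

  private long length;

  // validate state on deserialization to reject invalid/malicious instances
  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    if (length < 0) {
      throw new IOException("rejecting deserialized status: negative length " + length);
    }
  }
}
{code}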



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15503) strip s3.amazonaws.com off hostnames before making s3a calls

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15503:

Parent: HADOOP-15620  (was: HADOOP-15220)

> strip s3.amazonaws.com off hostnames before making s3a calls
> 
>
> Key: HADOOP-15503
> URL: https://issues.apache.org/jira/browse/HADOOP-15503
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> If you copy an HTTP URL https://bucketname.s3.amazonaws.com/ and convert it to 
> an s3a one by replacing the scheme, you can try to list
> {code}
> bin/hadoop fs -ls -r s3a://hwdev-steve-new.s3.amazonaws.com/
> {code}
> But do that and you get told there's no such bucket
> {code}
> ls: Bucket hwdev-steve-new.s3.amazonaws.com does not exist
> {code}
> This is non-intuitive, and catches me out.
> We could strip this automatically during initialization to produce the actual 
> bucket, which would need to be done before any per-bucket init is done.
> I do worry about what could break, though.
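
The stripping itself is trivial; the open question is the side effects (region 
and endpoint variants aren't handled in this sketch, where {{name}} is the 
filesystem URI passed to initialization):
{code:java}
// Sketch of the proposed normalization during initialization.
String host = name.getHost();                 // e.g. "hwdev-steve-new.s3.amazonaws.com"
final String suffix = ".s3.amazonaws.com";
String bucket = host.endsWith(suffix)
    ? host.substring(0, host.length() - suffix.length())
    : host;
{code}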



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15654) wasb client "java.lang.IllegalArgumentException: Self-suppression not permitted" in close()

2018-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570962#comment-16570962
 ] 

Steve Loughran commented on HADOOP-15654:
-

Full stack
{code}
[ERROR] Tests run: 8, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 116.298 
s <<< FAILURE! - in 
org.apache.hadoop.fs.azure.metrics.ITestAzureFileSystemInstrumentation
[ERROR] 
testMetricsOnBigFileCreateRead(org.apache.hadoop.fs.azure.metrics.ITestAzureFileSystemInstrumentation)
  Time elapsed: 71.585 s  <<< ERROR!
java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:1058)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
at 
org.apache.hadoop.fs.azure.metrics.ITestAzureFileSystemInstrumentation.testMetricsOnBigFileCreateRead(ITestAzureFileSystemInstrumentation.java:260)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.io.IOException: Operation could not be completed within the 
specified time. Please see the cause for further information.
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:462)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:47)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:406)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:403)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: Operation could not be 
completed within the specified time.
at 
com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
at 
com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
at 
com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:185)
at 
com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1097)
at 
com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:1069)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:456)
... 9 more
{code}

> wasb client "java.lang.IllegalArgumentException: Self-suppression not 
> permitted" in close()
> ---
>
> Key: HADOOP-15654
> URL: https://issues.apache.org/jira/browse/HADOOP-15654
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Minor
>
> On a test run over a network connection which is clearly overloaded (parallel 
> download of the Hadoop 3.1 JARs, 8 test threads active), got an exception
> {code}
> [ERROR] 
> testMetricsOnBigFileCreateRead(org.apache.hadoop.fs.azure.metrics.ITestAzureFileSystemInstrumentation)
> 

[jira] [Created] (HADOOP-15654) wasb client "java.lang.IllegalArgumentException: Self-suppression not permitted" in close()

2018-08-06 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15654:
---

 Summary: wasb client "java.lang.IllegalArgumentException: 
Self-suppression not permitted" in close()
 Key: HADOOP-15654
 URL: https://issues.apache.org/jira/browse/HADOOP-15654
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.2.0
Reporter: Steve Loughran


On a test run over a network connection which is clearly overloaded (parallel 
download of the Hadoop 3.1 JARs, 8 test threads active), got an exception
{code}
[ERROR] 
testMetricsOnBigFileCreateRead(org.apache.hadoop.fs.azure.metrics.ITestAzureFileSystemInstrumentation)
  Time elapsed: 71.585 s  <<< ERROR! java.lang.IllegalArgumentException: 
Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:1058)
{code}
This seems to imply that addSuppressed() was called on an exception with itself 
as the argument. I don't see how the codepath would permit that, though.
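
For reference, the JDK raises that exact message when a throwable suppresses 
itself:
{code:java}
// Minimal reproduction of the JDK check behind the message.
IOException e = new IOException("Operation could not be completed within the specified time.");
e.addSuppressed(e);  // throws IllegalArgumentException: Self-suppression not permitted
{code}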



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15642) Update to latest/recent version of aws-sdk for Hadoop 3.2

2018-08-06 Thread Larry McCay (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570955#comment-16570955
 ] 

Larry McCay commented on HADOOP-15642:
--

LGTM - +1

I like the idea of documenting the upgrade process itself - especially since it 
requires testing across multiple "components"!

 

> Update to latest/recent version of aws-sdk for Hadoop 3.2
> -
>
> Key: HADOOP-15642
> URL: https://issues.apache.org/jira/browse/HADOOP-15642
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15642-001.patch, HADOOP-15642-003.patch, Screen 
> Shot 2018-07-30 at 14.11.22.png
>
>
> Move to a later version of the AWS SDK library for a different set of 
> features and issues.
> proposed version: 1.11.375
> One thing which doesn't work on the one we ship with is the ability to create 
> assumed role sessions >1h, as there's a check in the client lib for 
> role-duration <= 3600 seconds. I'll assume more recent SDKs delegate duration 
> checks to the far end.
> see: 
> [https://aws.amazon.com/about-aws/whats-new/2018/03/longer-role-sessions/]
> * assuming later versions will extend assumed role life, docs will need 
> changing, 
> * Adding a test in HADOOP-15583 which expects an error message if you ask for 
> a duration of 3h; this should act as the test to see what happens.
> * think this time it would be good to explicitly write down the SDK update 
> process



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15642) Update to latest/recent version of aws-sdk for Hadoop 3.2

2018-08-06 Thread Larry McCay (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570955#comment-16570955
 ] 

Larry McCay edited comment on HADOOP-15642 at 8/7/18 12:17 AM:
---

[~ste...@apache.org] - LGTM - +1

I like the idea of documenting the upgrade process itself - especially since it 
requires testing across multiple "components"!

 


was (Author: lmccay):
LGTM - +1

I like the idea of documenting the upgrade process itself - especially since it 
requires testing across multiple "components"!

 

> Update to latest/recent version of aws-sdk for Hadoop 3.2
> -
>
> Key: HADOOP-15642
> URL: https://issues.apache.org/jira/browse/HADOOP-15642
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15642-001.patch, HADOOP-15642-003.patch, Screen 
> Shot 2018-07-30 at 14.11.22.png
>
>
> Move to a later version of the AWS SDK library for a different set of 
> features and issues.
> proposed version: 1.11.375
> One thing which doesn't work on the one we ship with is the ability to create 
> assumed role sessions >1h, as there's a check in the client lib for 
> role-duration <= 3600 seconds. I'll assume more recent SDKs delegate duration 
> checks to the far end.
> see: 
> [https://aws.amazon.com/about-aws/whats-new/2018/03/longer-role-sessions/]
> * assuming later versions will extend assumed role life, docs will need 
> changing, 
> * Adding a test in HADOOP-15583 which expects an error message if you ask for 
> a duration of 3h; this should act as the test to see what happens.
> * think this time it would be good to explicitly write down the SDK update 
> process



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15583) Stabilize S3A Assumed Role support

2018-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570914#comment-16570914
 ] 

Steve Loughran commented on HADOOP-15583:
-

The javac warning is about deprecation, which I'm not going to fix. 
This patch is ready for review/commit

> Stabilize S3A Assumed Role support
> --
>
> Key: HADOOP-15583
> URL: https://issues.apache.org/jira/browse/HADOOP-15583
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15583-001.patch, HADOOP-15583-002.patch, 
> HADOOP-15583-003.patch, HADOOP-15583-004.patch, HADOOP-15583-005.patch
>
>
> started off just on sharing credentials across S3A and S3Guard, but in the 
> process it has grown into stabilising the assumed role support 
> so it can be used for more than just testing.
> Was: "S3Guard to get AWS Credential chain from S3AFS; credentials closed() on 
> shutdown"
> h3. Issue: lack of auth chain sharing causes ddb and s3 to get out of sync
> S3Guard builds its DDB auth chain itself, which stops it having to worry 
> about being created standalone vs part of an S3AFS, but it means its 
> authenticators are in a separate chain.
> When you are using short-lived assumed roles or other session credentials 
> updated in the S3A FS authentication chain, you need that same set of 
> credentials picked up by DDB. Otherwise, at best you are doubling load; at 
> worst, the DDB connector may not get refreshed credentials.
> Proposed: {{DynamoDBClientFactory.createDynamoDBClient()}} to take an 
> optional ref to aws credentials. If set: don't create a new set. 
> There's one little complication here: our {{AWSCredentialProviderList}} is 
> autocloseable; its close() will go through all children and close them. 
> Apparently the AWS S3 client (and hopefully the DDB client) will close this 
> when they are closed themselves. If DDB  has the same set of credentials as 
> the FS, then there could be trouble if they are closed in one place when the 
> other still wants to use them.
> Solution: keep a use count on the credentials list, starting at one; every 
> close() call decrements it, and when it hits zero the cleanup is kicked off 
> (a sketch follows below).
> h3. Issue: {{AssumedRoleCredentialProvider}} connector to STS not picking up 
> the s3a connection settings, including proxy.
> h3. Issue: we're not using getPassword() to get the user/password for proxy 
> binding for STS. Fix: use that and pass down the bucket ref for per-bucket 
> secrets in a JCEKS file.
> h3. Issue: hard to debug what's going wrong :)
> h3. Issue: docs about KMS permissions for SSE-KMS are wrong, and the 
> ITestAssumedRole* tests don't request KMS permissions, so fail in a bucket 
> when the base s3 FS is using SSE-KMS. KMS permissions need to be included in 
> generated profiles
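
A sketch of the ref-counted close mentioned in the solution above (not the 
actual AWSCredentialProviderList code):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: ref-counted close, starting at one as described above.
public class RefCountedCredentialList implements AutoCloseable {
  private final AtomicInteger refCount = new AtomicInteger(1);

  /** Take another reference, e.g. when handing the list to the DDB client. */
  public RefCountedCredentialList share() {
    refCount.incrementAndGet();
    return this;
  }

  @Override
  public void close() {
    if (refCount.decrementAndGet() == 0) {
      // last user: actually close the wrapped credential providers here
    }
  }
}
{code}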



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570910#comment-16570910
 ] 

Steve Loughran commented on HADOOP-15628:
-

Steve, thanks for the update. 

* If your store is correct, you don't need s3guard, and you won't need to worry 
about any inconsistency from partial delete failures.
* this is potentially a mismatch in the store behaviour vs the AWS S3 
implementation.

bq. the response codes coming back were correct, and the xml was as well, it 
just wasn't being parsed by hdfs. 

we rely on the AWS SDK to parse this stuff, so if it's not doing it right, 
there are problems. Again, the latest SDKs should be best here. FWIW, our 
assumed-role tests do seem to pick up partial deletes, though it may be the 
specific details of the failure which we aren't replicating. 

If you've evidence that this isn't happening, even with the latest SDK 
offered by HADOOP-15642, then that's possibly the best place to fix it.

# raise it with the AWS SDK issue tracker on github; include a ref to this JIRA 
(and add a link from this JIRA to the open issue after)
# include as much diagnostic info there as you can; the s3a docs show how to 
print out the HTTP-level logs, which it sounds like you already have. 

In the S3A code, we're going to need something which checks that return code 
and at least escalates it to an IOE; for now I'd suggest 
{{org.apache.hadoop.fs.PathPermissionException}}. Because this is all happening 
with your store, you'll have to be the one showing it works there; you'll 
also need to get set up to build/test Hadoop branch-3 against that store (which 
is really useful for us for better coverage).
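
Roughly what that check could look like; exception type and wiring are open 
questions, and DeleteObjectsResult/getDeletedObjects are the SDK calls already 
named in the report:
{code:java}
// Sketch only: escalate a partial multi-delete to an IOException.
DeleteObjectsResult result = s3.deleteObjects(deleteRequest);
int deleted = result.getDeletedObjects().size();
if (deleted != keysToDelete.size()) {
  throw new IOException("Multi-object delete under " + bucket + " removed only "
      + deleted + " of " + keysToDelete.size() + " objects");
}
{code}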

+[~Thomas Demoor] [~ehiggs]

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>Reporter: Steve Jacobs
>Assignee: Steve Loughran
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see if all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without any 
> warning from S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15628) S3A Filesystem does not check return from AmazonS3Client deleteObjects

2018-08-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15628:

Environment: 
Hadoop 3.0.2 / Hadoop 2.8.3

Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0

Non-AWS S3 implementation

  was:
Hadoop 3.0.2 / Hadoop 2.8.3

Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0


> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> --
>
> Key: HADOOP-15628
> URL: https://issues.apache.org/jira/browse/HADOOP-15628
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
> Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
> Non-AWS S3 implementation
>Reporter: Steve Jacobs
>Assignee: Steve Loughran
>Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check to see if all objects have been successfully deleted. In the 
> event of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 
> 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
> DeleteObjectsRequest deleteRequest =
> new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
> s3.deleteObjects(deleteRequest);
> statistics.incrementWriteOps(1);
> keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class from the 
> S3Client: 
> [Amazon Code 
> Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes = 
> s3Client.deleteObjects(multiObjectDeleteRequest); int successfulDeletes = 
> delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will fail without any 
> warning from S3A clients.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15651) NPE in S3AInputStream.read() in ITestS3AInconsistency.testOpenFailOnRead

2018-08-06 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570738#comment-16570738
 ] 

Aaron Fabbri commented on HADOOP-15651:
---

Yeah, this one is not obvious.  Let's see:
 * Stack trace (assuming up to date) implies wrappedStream is null.
 * wrappedStream set to null in {{closeStream()}}
 * {{closeStream()}} called by close(), resetConnection(), reopen(), 
seekInStream()
 * reopen() should not leave wrappedStream null
 * seekInStream() is called via lazySeek(), but lazySeek() will call reopen() 
if wrappedStream ends up null after calling seekInStream().
 * close() and resetConnection() should not be getting called here.

Wild guess: something else is calling close() on the FS?  Just because (1) I 
don't see any normal codepaths getting us here, and (2) the history of trouble 
with tests trampling on each others' FS instances via close().
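
If nothing else, a defensive guard would turn the NPE into a diagnosable error 
(sketch only; the field names follow S3AInputStream but are assumptions):
{code:java}
// Sketch: fail with a meaningful error instead of an NPE if reopening didn't happen.
if (wrappedStream == null) {
  throw new PathIOException(uri, "read() with no open stream at position " + pos);
}
{code}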

> NPE in S3AInputStream.read() in ITestS3AInconsistency.testOpenFailOnRead
> 
>
> Key: HADOOP-15651
> URL: https://issues.apache.org/jira/browse/HADOOP-15651
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: org.apache.hadoop.fs.s3a.ITestS3AInconsistency-output.txt
>
>
> Test {{ITestS3AInconsistency.testOpenFailOnRead()}} raise an NPE in read(); 
> could only happen on that line if {{wrappedStream==null}}, which implies that 
> previous attempts to re-open the closed stream failed and that this didn't 
> trigger anything.
> Not seen this before, but given that the fault injection is random, it may be 
> that so far test runs have been *unlucky* and missed this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570685#comment-16570685
 ] 

Thomas Marquardt commented on HADOOP-15546:
---

I'm back from paternity leave.  Sorry for the long pause.

The preview of ABFS shared many of the same config property names as WASB, and 
we now have a lot of customers using them.  I agree that we need to improve the 
configuration, make it easier to use, and easier to troubleshoot (better error 
handling).  However, we cannot remove or rename the current config properties 
without breaking our customers. I spoke with Steve about this on a phone call 
this morning, and would like you to make a list of the changes you'd like to 
see regarding configuration and we will work on them in a separate JIRA.  For 
now, I'd like to revert back to using two test configuration files, one for 
WASB and one for ABFS.

I promise we will improve the situation, but I want to do it right, and need 
some time to think through it.  I don't want to block progress in the meantime.  So 
our primary goal is to stabilize ABFS (get all the tests passing, bugs fixed, 
etc), and then we can fix the config issue.  ABFS was originally designed to 
replace WASB, which is why it uses the same config.  The idea was that users 
would simply replace the class mapping for the WASB scheme, and they'd suddenly 
be using the new ABFS code.  We have since realized that ABFS needs to be 
independent of WASB, allowing WASB to remain an option, even though it will be 
deprecated and all future development work will target ABFS.  So we will think 
about this and come up with a way to onboard customers to a better config 
solution for ABFS, but for now please do not remove/rename or change the 
behavior of existing config keys, as this will break customers.

> ABFS: tune imports & javadocs; stabilise tests
> --
>
> Key: HADOOP-15546
> URL: https://issues.apache.org/jira/browse/HADOOP-15546
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: HADOOP-15407
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15546-001.patch, 
> HADOOP-15546-HADOOP-15407-001.patch, HADOOP-15546-HADOOP-15407-002.patch, 
> HADOOP-15546-HADOOP-15407-003.patch, HADOOP-15546-HADOOP-15407-004.patch, 
> HADOOP-15546-HADOOP-15407-005.patch, HADOOP-15546-HADOOP-15407-006.patch, 
> HADOOP-15546-HADOOP-15407-006.patch, HADOOP-15546-HADOOP-15407-007.patch, 
> HADOOP-15546-HADOOP-15407-008.patch, azure-auth-keys.xml
>
>
> Followup on HADOOP-15540 with some initial review tuning
> h2. Tuning
> * ordering of imports
> * rely on azure-auth-keys.xml to store credentials (change imports, 
> docs,.gitignore)
> * log4j -> info
> * add a "." to the first sentence of all the javadocs I noticed.
> * remove @Public annotations except for some constants (which includes some 
> commitment to maintain them).
> * move the AbstractFS declarations out of the src/test/resources XML file 
> into core-default.xml for all to use
> * other IDE-suggested tweaks
> h2. Testing
> Review the tests, move to ContractTestUtil assertions, make more consistent 
> to contract test setup, and general work to make the tests work well over 
> slower links, document, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15546) ABFS: tune imports & javadocs; stabilise tests

2018-08-06 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570664#comment-16570664
 ] 

Sean Mackrory commented on HADOOP-15546:


So with this small change I was able to get WASB and ABFS tests working. We 
could leave the other properties as fs.azure.test because they should usually 
be the same between WASB and ABFS. This is the only case I see where the values 
MUST be different for the two, and we don't have any way to specify two different 
values at once unless we rename that property, hence azure -> azure-bfs to resolve 
the conflict. I don't think it's very clean, but the alternatives all seem 
messier.

{code}
diff --git 
a/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/constants/TestConfigurationKeys.java
 
b/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/constants/TestConfigurationKeys.java
index a2251ee..704b0b2 100644
--- 
a/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/constants/TestConfigurationKeys.java
+++ 
b/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/constants/TestConfigurationKeys.java
@@ -26,7 +26,7 @@
  */
 @InterfaceStability.Evolving
 public final class TestConfigurationKeys {
-  public static final String FS_AZURE_TEST_ACCOUNT_NAME = 
"fs.azure.test.account.name";
+  public static final String FS_AZURE_TEST_ACCOUNT_NAME = 
"fs.azure-bfs.test.account.name";
   public static final String FS_AZURE_TEST_ACCOUNT_KEY_PREFIX = 
"fs.azure.test.account.key.";
   public static final String FS_AZURE_TEST_HOST_NAME = 
"fs.azure.test.host.name";
   public static final String FS_AZURE_TEST_HOST_PORT = 
"fs.azure.test.host.port";
{code}

This leaves me with a much cleaner list of test failures. Any thoughts on 
including this in the current patch or resolving this conflict some other way?

* 3/5 tests fail in ITestWasbAbfsCompatibility and they look legit: 1 file 
found when expecting 2, and 404's on some file paths.
* ITestAzureBlobFileSystemBackCompat also fails with a 404.
* ITestAzureBlobFileSystemRenameUnicode.testRenameFileUsingUnicode fails (gets 
an empty listing)
* ITestAzureBlobFileSystemFileStatus.testEnsureStatusWorksForRoot and 
ITestAzureBlobFileSystemListStatus.testListFiles both fail because of otherwise 
identical instances of VersionedFileStatus and LocatedFileStatus not being 
equal (only the type differs, and the version is null).
* 16/20 tests failed in ITestFileSystemOperationsWithThreads, with "No ... 
operation found in logs"

I suspect the first 3 items are probably something server-side, so I'm going to 
start looking into the bottom 2.

> ABFS: tune imports & javadocs; stabilise tests
> --
>
> Key: HADOOP-15546
> URL: https://issues.apache.org/jira/browse/HADOOP-15546
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: HADOOP-15407
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15546-001.patch, 
> HADOOP-15546-HADOOP-15407-001.patch, HADOOP-15546-HADOOP-15407-002.patch, 
> HADOOP-15546-HADOOP-15407-003.patch, HADOOP-15546-HADOOP-15407-004.patch, 
> HADOOP-15546-HADOOP-15407-005.patch, HADOOP-15546-HADOOP-15407-006.patch, 
> HADOOP-15546-HADOOP-15407-006.patch, HADOOP-15546-HADOOP-15407-007.patch, 
> HADOOP-15546-HADOOP-15407-008.patch, azure-auth-keys.xml
>
>
> Followup on HADOOP-15540 with some initial review tuning
> h2. Tuning
> * ordering of imports
> * rely on azure-auth-keys.xml to store credentials (change imports, 
> docs,.gitignore)
> * log4j -> info
> * add a "." to the first sentence of all the javadocs I noticed.
> * remove @Public annotations except for some constants (which includes some 
> commitment to maintain them).
> * move the AbstractFS declarations out of the src/test/resources XML file 
> into core-default.xml for all to use
> * other IDE-suggested tweaks
> h2. Testing
> Review the tests, move to ContractTestUtil assertions, make more consistent 
> to contract test setup, and general work to make the tests work well over 
> slower links, document, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15576) S3A Multipart Uploader to work with S3Guard and encryption

2018-08-06 Thread Ewan Higgs (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570240#comment-16570240
 ] 

Ewan Higgs edited comment on HADOOP-15576 at 8/6/18 2:05 PM:
-

{quote}legit regression; the rejection of 0-part MPUs is breaking a mock test. 
{quote}
This is checked in {{testCompleteEmptyUpload}} that you've added.

Is there anything outstanding here? Just the question of whether we want to go 
full protobuf to avoid Java Serialization?


was (Author: ehiggs):
{quote}legit regression; the rejection of 0-part MPUs is breaking a mock test. 
{quote}
This is checked in {{testCompleteEmptyUpload}} that you've added. AIUI, S3 
requires an MPU to have at least 1 part uploaded even if that part is 0 bytes.

Is there anything outstanding here? Just the question of whether we want to go 
full protobuf to avoid Java Serialization?

> S3A  Multipart Uploader to work with S3Guard and encryption
> ---
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HADOOP-15576-005.patch, HADOOP-15576-007.patch, 
> HADOOP-15576-008.patch, HADOOP-15576.001.patch, HADOOP-15576.002.patch, 
> HADOOP-15576.003.patch, HADOOP-15576.004.patch
>
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this
> # move from low-level calls of the S3A client to calls of WriteOperationHelper, 
> adding any new methods needed there.
> # Tests: the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there's no checks that listFiles/listStatus shows the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, 
> encryption settings aren't picked up. Files being uploaded this way *are not 
> being encrypted*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15576) S3A Multipart Uploader to work with S3Guard and encryption

2018-08-06 Thread Ewan Higgs (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570240#comment-16570240
 ] 

Ewan Higgs commented on HADOOP-15576:
-

{quote}legit regression; the rejection of 0-part MPUs is breaking a mock test. 
{quote}
This is checked in {{testCompleteEmptyUpload}} that you've added. AIUI, S3 
requires an MPU to have at least 1 part uploaded even if that part is 0 bytes.

Is there anything outstanding here? Just the question of whether we want to go 
full protobuf to avoid Java Serialization?

> S3A  Multipart Uploader to work with S3Guard and encryption
> ---
>
> Key: HADOOP-15576
> URL: https://issues.apache.org/jira/browse/HADOOP-15576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2
>Reporter: Steve Loughran
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HADOOP-15576-005.patch, HADOOP-15576-007.patch, 
> HADOOP-15576-008.patch, HADOOP-15576.001.patch, HADOOP-15576.002.patch, 
> HADOOP-15576.003.patch, HADOOP-15576.004.patch
>
>
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with 
> the tests to demonstrate this
> # move from low-level calls of the S3A client to calls of WriteOperationHelper, 
> adding any new methods needed there.
> # Tests: the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and 
> even if there was, it might not show that S3Guard was bypassed, because 
> there's no checks that listFiles/listStatus shows the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, 
> encryption settings aren't picked up. Files being uploaded this way *are not 
> being encrypted*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15400) Improve S3Guard documentation on Authoritative Mode implementation

2018-08-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570193#comment-16570193
 ] 

genericqa commented on HADOOP-15400:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
38m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15400 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934469/HADOOP-15400.001.patch
 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 5f56b916d4b5 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bcfc985 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 330 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14993/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve S3Guard documentation on Authoritative Mode implementation
> --
>
> Key: HADOOP-15400
> URL: https://issues.apache.org/jira/browse/HADOOP-15400
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15400.001.patch
>
>
> Part of the design of S3Guard is support for skipping the call to S3 
> listObjects and serving directory listings out of the MetadataStore under 
> certain circumstances.  This feature is called "authoritative" mode.  I've 
> talked to many people about this feature and it seems to be universally 
> confusing.
> I suggest we improve / add a section to the s3guard.md site docs elaborating 
> on what Authoritative Mode is.
> It is *not* treating the MetadataStore (e.g. dynamodb) as the source of truth 
> in general.
> It *is* the ability to short-circuit S3 list objects and serve listings from 
> the MetadataStore in some circumstances: 
> For S3A to skip S3's list objects on some *path*, and serve it directly from 
> the MetadataStore, the following things must all be true:
>  # The MetadataStore implementation persists the bit 
> {{DirListingMetadata.isAuthoritative}} set when calling 
> {{MetadataStore#put(DirListingMetadata)}}
>  # The S3A client is configured to allow metadatastore to be authoritative 
> source of a directory listing (fs.s3a.metadatastore.authoritative=true).
>  # The MetadataStore has a full listing for *path* stored in it.  This only 
> happens if the FS client (s3a) explicitly has stored a full directory listing 
> with {{DirListingMetadata.isAuthoritative=true}} before the said listing 
> request happens.
> Note that #1 only currently happens in LocalMetadataStore. Adding support to 
> DynamoDBMetadataStore is covered in HADOOP-14154.
> Also, the multiple uses of 

[jira] [Updated] (HADOOP-15400) Improve S3Guard documentation on Authoritative Mode implementation

2018-08-06 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15400:

Status: Patch Available  (was: In Progress)

> Improve S3Guard documentation on Authoritative Mode implementation
> --
>
> Key: HADOOP-15400
> URL: https://issues.apache.org/jira/browse/HADOOP-15400
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15400.001.patch
>
>
> Part of the design of S3Guard is support for skipping the call to S3 
> listObjects and serving directory listings out of the MetadataStore under 
> certain circumstances.  This feature is called "authoritative" mode.  I've 
> talked to many people about this feature and it seems to be universally 
> confusing.
> I suggest we improve / add a section to the s3guard.md site docs elaborating 
> on what Authoritative Mode is.
> It is *not* treating the MetadataStore (e.g. dynamodb) as the source of truth 
> in general.
> It *is* the ability to short-circuit S3 list objects and serve listings from 
> the MetadataStore in some circumstances: 
> For S3A to skip S3's list objects on some *path*, and serve it directly from 
> the MetadataStore, the following things must all be true:
>  # The MetadataStore implementation persists the bit 
> {{DirListingMetadata.isAuthoritative}} set when calling 
> {{MetadataStore#put(DirListingMetadata)}}
>  # The S3A client is configured to allow metadatastore to be authoritative 
> source of a directory listing (fs.s3a.metadatastore.authoritative=true).
>  # The MetadataStore has a full listing for *path* stored in it.  This only 
> happens if the FS client (s3a) explicitly has stored a full directory listing 
> with {{DirListingMetadata.isAuthoritative=true}} before the said listing 
> request happens.
> Note that #1 only currently happens in LocalMetadataStore. Adding support to 
> DynamoDBMetadataStore is covered in HADOOP-14154.
> Also, the multiple uses of the word "authoritative" are confusing. Two 
> meanings are used:
>  1. In the FS client configuration fs.s3a.metadatastore.authoritative
>  - Behavior of S3A code (not MetadataStore)
>  - "S3A is allowed to skip S3.list() when it has full listing from 
> MetadataStore"
> 2. MetadataStore
>  When storing a dir listing, can set a bit isAuthoritative
>  1 : "full contents of directory"
>  0 : "may not be full listing"
> Note that a MetadataStore *MAY* persist this bit. (not *MUST*).
> We should probably rename the {{DirListingMetadata.isAuthoritative}} to 
> {{.fullListing}} or at least put a comment where it is used to clarify its 
> meaning.
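
For reference, condition #2 in the list above is the only one an operator sets 
directly (a minimal sketch):
{code:java}
// Condition #2: allow the MetadataStore to be the authoritative listing source.
Configuration conf = new Configuration();
conf.setBoolean("fs.s3a.metadatastore.authoritative", true);
{code}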



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15400) Improve S3Guard documentation on Authoritative Mode implementation

2018-08-06 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570126#comment-16570126
 ] 

Gabor Bota commented on HADOOP-15400:
-

I've uploaded a patch for this because HADOOP-14154 seems to be a long-running 
issue, for testing reasons. I'll update the documentation with that patch, but I 
think this issue could be resolved faster than the other one.

[~fabbri], please do as you see fit, and if you have anything in mind that I 
haven't included, just go ahead and upload a new patch with your fixes during 
the review.

> Improve S3Guard documentation on Authoritative Mode implementation
> --
>
> Key: HADOOP-15400
> URL: https://issues.apache.org/jira/browse/HADOOP-15400
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15400.001.patch
>
>
> Part of the design of S3Guard is support for skipping the call to S3 
> listObjects and serving directory listings out of the MetadataStore under 
> certain circumstances.  This feature is called "authoritative" mode.  I've 
> talked to many people about this feature and it seems to be universally 
> confusing.
> I suggest we improve / add a section to the s3guard.md site docs elaborating 
> on what Authoritative Mode is.
> It is *not* treating the MetadataStore (e.g. dynamodb) as the source of truth 
> in general.
> It *is* the ability to short-circuit S3 list objects and serve listings from 
> the MetadataStore in some circumstances: 
> For S3A to skip S3's list objects on some *path*, and serve it directly from 
> the MetadataStore, the following things must all be true:
>  # The MetadataStore implementation persists the bit 
> {{DirListingMetadata.isAuthoritative}} set when calling 
> {{MetadataStore#put(DirListingMetadata)}}
>  # The S3A client is configured to allow metadatastore to be authoritative 
> source of a directory listing (fs.s3a.metadatastore.authoritative=true).
>  # The MetadataStore has a full listing for *path* stored in it.  This only 
> happens if the FS client (s3a) explicitly has stored a full directory listing 
> with {{DirListingMetadata.isAuthoritative=true}} before the said listing 
> request happens.
> Note that #1 only currently happens in LocalMetadataStore. Adding support to 
> DynamoDBMetadataStore is covered in HADOOP-14154.
> Also, the multiple uses of the word "authoritative" are confusing. Two 
> meanings are used:
>  1. In the FS client configuration fs.s3a.metadatastore.authoritative
>  - Behavior of S3A code (not MetadataStore)
>  - "S3A is allowed to skip S3.list() when it has full listing from 
> MetadataStore"
> 2. MetadataStore
>  When storing a dir listing, can set a bit isAuthoritative
>  1 : "full contents of directory"
>  0 : "may not be full listing"
> Note that a MetadataStore *MAY* persist this bit. (not *MUST*).
> We should probably rename the {{DirListingMetadata.isAuthoritative}} to 
> {{.fullListing}} or at least put a comment where it is used to clarify its 
> meaning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15400) Improve S3Guard documentation on Authoritative Mode implementation

2018-08-06 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15400:

Attachment: HADOOP-15400.001.patch

> Improve S3Guard documentation on Authoritative Mode implementation
> --
>
> Key: HADOOP-15400
> URL: https://issues.apache.org/jira/browse/HADOOP-15400
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15400.001.patch
>
>
> Part of the design of S3Guard is support for skipping the call to S3 
> listObjects and serving directory listings out of the MetadataStore under 
> certain circumstances.  This feature is called "authoritative" mode.  I've 
> talked to many people about this feature and it seems to be universally 
> confusing.
> I suggest we improve / add a section to the s3guard.md site docs elaborating 
> on what Authoritative Mode is.
> It is *not* treating the MetadataStore (e.g. dynamodb) as the source of truth 
> in general.
> It *is* the ability to short-circuit S3 list objects and serve listings from 
> the MetadataStore in some circumstances: 
> For S3A to skip S3's list objects on some *path*, and serve it directly from 
> the MetadataStore, the following things must all be true:
>  # The MetadataStore implementation persists the 
> {{DirListingMetadata.isAuthoritative}} bit when it is set in a call to 
> {{MetadataStore#put(DirListingMetadata)}}.
>  # The S3A client is configured to allow the MetadataStore to be the 
> authoritative source of a directory listing 
> (fs.s3a.metadatastore.authoritative=true).
>  # The MetadataStore has a full listing for *path* stored in it. This only 
> happens if the FS client (s3a) has explicitly stored a full directory listing 
> with {{DirListingMetadata.isAuthoritative=true}} before the listing request 
> in question happens.
> Note that #1 currently happens only in LocalMetadataStore. Adding support to 
> DynamoDBMetadataStore is covered in HADOOP-14154.
> Also, the multiple uses of the word "authoritative" are confusing. Two 
> meanings are used:
>  1. The FS client configuration fs.s3a.metadatastore.authoritative
>  - Describes the behavior of the S3A code (not the MetadataStore)
>  - "S3A is allowed to skip S3.list() when it has a full listing from the 
> MetadataStore"
>  2. The MetadataStore
>  - When storing a dir listing, it can set the isAuthoritative bit
>  - 1 : "full contents of directory"
>  - 0 : "may not be a full listing"
> Note that a MetadataStore *MAY* persist this bit (not *MUST*).
> We should probably rename {{DirListingMetadata.isAuthoritative}} to 
> {{.fullListing}}, or at least add a comment where it is used to clarify its 
> meaning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs

2018-08-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569843#comment-16569843
 ] 

genericqa commented on HADOOP-15107:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 45s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 31s | trunk passed |
| +1 | compile | 28m 33s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 1m 25s | trunk passed |
| +1 | shadedclient | 13m 12s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 51s | trunk passed |
| +1 | javadoc | 1m 7s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 27m 45s | the patch passed |
| +1 | javac | 27m 45s | the patch passed |
| +1 | checkstyle | 0m 24s | the patch passed |
| +1 | mvnsite | 1m 24s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 3s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 4s | the patch passed |
| +1 | javadoc | 1m 6s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 49s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | unit | 4m 27s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 129m 25s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15107 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934423/HADOOP-15107-004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux b727c202a825 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bcfc985 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace |