[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386178#comment-16386178
 ] 

Hudson commented on HADOOP-13761:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13769 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13769/])
HADOOP-13761. S3Guard: implement retries for DDB failures and (stevel: rev 
8110d6a0d59e7dc2ddb25fa424fab188c5e9ce35)
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentS3Object.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStore.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStoreScale.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentAmazonS3Client.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractCommitITest.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardListConsistency.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AInconsistency.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Invoker.java
* (edit) hadoop-tools/hadoop-aws/dev-support/findbugs-exclude.xml
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3GuardExistsRetryPolicy.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
* (delete) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestDynamoDBMetadataStoreScale.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/FailureInjectionPolicy.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AReadOpContext.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AOpContext.java


> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761-012.patch, HADOOP-13761-013.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch, HADOOP-15183-013.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386098#comment-16386098
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
13s{color} | {color:red} Docker failed to build yetus/hadoop:d4cc50f. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13761 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913046/HADOOP-13761-013.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14254/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761-012.patch, HADOOP-13761-013.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch, HADOOP-15183-013.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386097#comment-16386097
 ] 

Steve Loughran commented on HADOOP-13761:
-

+1 committing. I see I used the wrong JIRA name on the -013 patch; resubmitting 
it with the correct name just to avoid confusion.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761-012.patch, HADOOP-13761-013.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch, HADOOP-15183-013.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386091#comment-16386091
 ] 

genericqa commented on HADOOP-13761:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} hadoop-tools/hadoop-aws: The patch generated 0 new + 
15 unchanged - 1 fixed = 15 total (was 16) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
35s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-13761 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913033/HADOOP-15183-013.patch
 |
| Optional Tests |  asflicense  findbugs  xml  compile  javac  javadoc  
mvninstall  mvnsite  unit  shadedclient  checkstyle  |
| uname | Linux 80e1525b0b25 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e8c5be6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14252/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14252/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385084#comment-16385084
 ] 

Steve Loughran commented on HADOOP-13761:
-

LGTM, just need to get rid of findbugs

+1 pending that. I'll have a go monday am.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761-012.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-02 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384246#comment-16384246
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Findbugs seems to be smoking crack and/or lambda-challenged.
|Return value of S3AReadOpContext.getReadInvoker() ignored, but method has no 
side effect At S3AInputStream.java:but method has no side effect At 
S3AInputStream.java:[line 181]|
{noformat}
S3Object object = context.getReadInvoker().once(text, uri,
() -> client.getObject(request));{noformat}

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761-012.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-02 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384241#comment-16384241
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m  
3s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} hadoop-tools/hadoop-aws: The patch generated 0 new + 
13 unchanged - 1 fixed = 13 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
53s{color} | {color:red} hadoop-tools/hadoop-aws generated 1 new + 0 unchanged 
- 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
46s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aws |
|  |  Return value of S3AReadOpContext.getReadInvoker() ignored, but method has 
no side effect  At S3AInputStream.java:but method has no side effect  At 
S3AInputStream.java:[line 181] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-13761 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12912718/HADOOP-13761-012.patch
 |
| Optional Tests |  asflicense  findbugs  xml  compile  javac  javadoc  
mvninstall  mvnsite  unit  shadedclient  checkstyle  |
| uname | Linux 8563509fe6cd 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60080fb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| findbugs | 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-01 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383164#comment-16383164
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

v12 patch attached which eliminates nested retries. Changes from v11:

- reopen() RetryTranslated -> OnceTranslated.  Use once() instead of retry().  
All callers (lazySeek() and callers of onReadFailure()) have their own 
higher-level retries.

- onReadFailure() RetryTranslated -> OnceTranslated, because reopen() no longer 
has internal retries.

- Move lazySeek() in read() above the retry'd lambda; eliminates nested retry 
and matches what read(b, off, len) does.

-  Remove reopen() from read(b, off, len) after wrapped stream throws EOF. 
*This was existing code I changed*: I don't understand why you'd reopen() on 
EOF, [~ste...@apache.org]?

Tested in us-west-2.  I was able to reproduce the test timeout for 
ITestS3AFailureHandling, which is now passing for me.

Running on little sleep but wanted to get a patch posted to give a chance for 
European Friday code review.
 
Here is the delta from v11:
{noformat}
diff -ur ./src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java 
/Users/fabbri/Code/hadoop/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
--- ./src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
2018-03-01 20:34:33.0 -0800
+++ 
/Users/fabbri/Code/hadoop/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
2018-03-01 20:09:05.0 -0800
@@ -154,7 +154,7 @@
* @param length length requested
* @throws IOException on any failure to open the object
*/
-  @Retries.RetryTranslated
+  @Retries.OnceTranslated
   private synchronized void reopen(String reason, long targetPos, long length)
   throws IOException {

@@ -178,7 +178,7 @@
 }
 String text = String.format("Failed to %s %s at %d",
 (opencount == 0 ? "open" : "re-open"), uri, targetPos);
-S3Object object = context.invoker.retry(text, uri, true,
+S3Object object = context.getReadInvoker().once(text, uri,
 () -> client.getObject(request));
 wrappedStream = object.getObjectContent();
 contentRangeStart = targetPos;
@@ -349,6 +349,12 @@
   return -1;
 }

+try {
+  lazySeek(nextReadPos, 1);
+} catch (EOFException e) {
+  return -1;
+}
+
 // With S3Guard, the metadatastore gave us metadata for the file in
 // open(), so we use a slightly different retry policy.
 // read() may not be likely to fail, but reopen() does a GET which
@@ -358,7 +364,6 @@
 () -> {
   int b;
   try {
-lazySeek(nextReadPos, 1);
 b = wrappedStream.read();
   } catch (EOFException e) {
 return -1;
@@ -387,7 +392,7 @@
* @param length length of data being attempted to read
* @throws IOException any exception thrown on the re-open attempt.
*/
-  @Retries.RetryTranslated
+  @Retries.OnceTranslated
   private void onReadFailure(IOException ioe, int length) throws IOException {

 LOG.info("Got exception while trying to read from stream {}" +
@@ -439,7 +444,6 @@
   try {
 bytes = wrappedStream.read(buf, off, len);
   } catch (EOFException e) {
-onReadFailure(e, len);
 // the base implementation swallows EOFs.
 return -1;
   } catch (IOException e) {

{noformat}
 

 

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761-012.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383163#comment-16383163
 ] 

Wangda Tan commented on HADOOP-13761:
-

Thanks [~fabbri] for help, we have last two blockers, this is one of them, if 
everything goes very well, we can get RC0 by end of this week or early next 
week. Just want to make sure this is not the last one :) 

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761-012.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-01 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383066#comment-16383066
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Thanks [~leftnoteasy] I've been travelling week, so sorry for the delay.  I 
should be able to get out a new patch by Friday morning pacific time. Not sure 
how quick we can turn around a code review and testing.  When do you want to 
cut RC0?

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-03-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382580#comment-16382580
 ] 

Wangda Tan commented on HADOOP-13761:
-

[~fabbri]/[~ste...@apache.org],

Given the other blockers are close to finish, how much more time needed for 
this JIRA? Asking this because I want to cut RC0 of 3.1.0 earlier if possible.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380326#comment-16380326
 ] 

Steve Loughran commented on HADOOP-13761:
-

My failures are HADOOP-15269; something has gone very wrong with a bucket of 
mine, one which I want to understand before anything else. But we do need to 
handle that double retry when such a situation arises. I just now have a test 
case

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376983#comment-16376983
 ] 

Steve Loughran commented on HADOOP-13761:
-

-1

I'd committed this locally and was doing the cherry pick to branch-3.1 & got a 
test timeout in {{ITestS3AFailureHandling.testReadFileChanged}} on that branch

{code}
java.lang.Exception: test timed out after 60 milliseconds
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:344)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:231)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:181)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.lambda$lazySeek$1(S3AInputStream.java:327)
at 
org.apache.hadoop.fs.s3a.S3AInputStream$$Lambda$23/570183744.execute(Unknown 
Source)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$2(Invoker.java:190)
at 
org.apache.hadoop.fs.s3a.Invoker$$Lambda$24/1791082625.execute(Unknown Source)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
at 
org.apache.hadoop.fs.s3a.Invoker$$Lambda$13/1380113967.execute(Unknown Source)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:188)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:210)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:320)
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:423)
at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
at 
org.apache.hadoop.fs.s3a.ITestS3AFailureHandling.testReadFileChanged(ITestS3AFailureHandling.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

This is the bit where read failures are being retried: are EOF exceptions being 
over-retried? 

Switching back to my dev terminal and trunk and I managed to get a 400 on all 
those failure tests, which makes me think maybe S3 ireland has startedf is 
playing up: switching to london fixes it.

Anyway, assuming there is a problem with S3 in a region, is this recovery code 
going to keep trying too often. That is: are we overdoing in retry on retry, as 
lazySeek does a retry with the chosen retryInvoker, and reopen does its retry 
too: with retry on retry things are taking so long to fail in a read that tests 
time out.

I think what needs to be done is to not have that double retry, or have the 
outer retry policy only handle FNFEs, and even then, only on s3guard.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-26 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376781#comment-16376781
 ] 

genericqa commented on HADOOP-13761:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} hadoop-tools/hadoop-aws: The patch generated 0 new + 
13 unchanged - 1 fixed = 13 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m  
3s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-13761 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12912023/HADOOP-13761-011.patch
 |
| Optional Tests |  asflicense  findbugs  xml  compile  javac  javadoc  
mvninstall  mvnsite  unit  shadedclient  checkstyle  |
| uname | Linux 73bfc4656ad6 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2fa7963 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14209/testReport/ |
| Max. process+thread count | 302 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14209/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376698#comment-16376698
 ] 

Steve Loughran commented on HADOOP-13761:
-

* ..forgot to include the wrapped-> private change in patch 010; here it is in 
patch 011
*. w.r.t annotations, I see findbugs has some. I was wondering if they'd allow 
us to annotate lambdas

I was thinking findbugs should be able to conclude that, "a closure declared 
and executed within a synchronized env is itself synchronized", but I now think 
that fb can't be sure. It doesn't know that Invoker.once() executes the closure 
in that method...it queuing it for async use. So findbugs is right to warn

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761-011.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374834#comment-16374834
 ] 

genericqa commented on HADOOP-13761:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 
new + 13 unchanged - 1 fixed = 14 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
30s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-13761 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911763/HADOOP-13761-010.patch
 |
| Optional Tests |  asflicense  findbugs  xml  compile  javac  javadoc  
mvninstall  mvnsite  unit  shadedclient  checkstyle  |
| uname | Linux fa9b40046281 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d1cd573 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14204/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14204/testReport/ |
| Max. process+thread count | 328 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-23 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374799#comment-16374799
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Thanks.. I added some @ annotations for checkstyle but it didn't seem to pick 
them up.

Didn't we used to be able to use bin/dev-support/test-patch to preview 
checkstyle deltas?  I had a completely green run of that script before 
submitting v9 patch so I don't think it is working.  I'll probably start a 
thread on common-dev.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374746#comment-16374746
 ] 

Steve Loughran commented on HADOOP-13761:
-

attached: patch 010; fixes findbugs by marking one field as private, turning of 
serialization warnigns for InconsistentS3Object and turning off all IDE sync 
warnings for S3AInputStream. The latter is risky, but I couldn't do any other 
way given the names of lambdas aren't known/declared.

Apparently findbugs now supports some @annotations which can be used to disable 
the checks, they may be things you can apply on a specific lambda-exp. If/when 
we adopt them in hadoop, this exclusion can be revisited. 

Until then, IDE only.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761-010.patch, HADOOP-13761-010.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374625#comment-16374625
 ] 

Steve Loughran commented on HADOOP-13761:
-

LGTM. Like the opcontext model.

+1

just trying to see if I can get findbugs to shut up.

One minor worry of mine: if someone deletes a file being read, when the client 
with the open stream does a seek(), the FNFE may get retried a few times before 
giving up. It's a failure case anyway, so I'm not that worried, and trying to 
cleanly model it would need for the input stream to record whether it had done 
1+ successful open, and pass that info to the read conext with 
{{getReadInvoker()}} adding a param {{isFirstRead}} to use in its decision 
making. 



> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761-008.patch, 
> HADOOP-13761-009.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373730#comment-16373730
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 
new + 13 unchanged - 1 fixed = 14 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
44s{color} | {color:red} hadoop-tools/hadoop-aws generated 7 new + 0 unchanged 
- 0 fixed = 7 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
54s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aws |
|  |  org.apache.hadoop.fs.s3a.InconsistentS3Object is Serializable; consider 
declaring a serialVersionUID  At InconsistentS3Object.java:a serialVersionUID  
At InconsistentS3Object.java:[lines 39-181] |
|  |  The field org.apache.hadoop.fs.s3a.InconsistentS3Object.wrapped is 
transient but isn't set by deserialization  In InconsistentS3Object.java:but 
isn't set by deserialization  In InconsistentS3Object.java |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.contentRangeFinish; locked 88% of time  
Unsynchronized access at S3AInputStream.java:88% of time  Unsynchronized access 
at S3AInputStream.java:[line 299] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.nextReadPos; locked 93% of time  
Unsynchronized access at S3AInputStream.java:93% of time  Unsynchronized access 
at S3AInputStream.java:[line 361] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.pos; locked 56% of time  Unsynchronized 
access at S3AInputStream.java:56% of time  Unsynchronized access at 
S3AInputStream.java:[line 244] |
|  |  

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373534#comment-16373534
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 13 
new + 13 unchanged - 1 fixed = 26 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-tools/hadoop-aws generated 7 new + 0 unchanged 
- 0 fixed = 7 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
39s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aws |
|  |  org.apache.hadoop.fs.s3a.InconsistentS3Object is Serializable; consider 
declaring a serialVersionUID  At InconsistentS3Object.java:a serialVersionUID  
At InconsistentS3Object.java:[lines 39-181] |
|  |  The field org.apache.hadoop.fs.s3a.InconsistentS3Object.wrapped is 
transient but isn't set by deserialization  In InconsistentS3Object.java:but 
isn't set by deserialization  In InconsistentS3Object.java |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.contentRangeFinish; locked 88% of time  
Unsynchronized access at S3AInputStream.java:88% of time  Unsynchronized access 
at S3AInputStream.java:[line 299] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.nextReadPos; locked 93% of time  
Unsynchronized access at S3AInputStream.java:93% of time  Unsynchronized access 
at S3AInputStream.java:[line 361] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.pos; locked 56% of time  Unsynchronized 
access at S3AInputStream.java:56% of time  Unsynchronized access at 
S3AInputStream.java:[line 244] |
|  |  

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-22 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373419#comment-16373419
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Cleaning up checkstyle / findbugs now.. new patch soon.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372426#comment-16372426
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 40 
new + 13 unchanged - 1 fixed = 53 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
43s{color} | {color:red} hadoop-tools/hadoop-aws generated 6 new + 0 unchanged 
- 0 fixed = 6 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
36s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aws |
|  |  Class org.apache.hadoop.fs.s3a.InconsistentS3Object defines non-transient 
non-serializable instance field policy  In InconsistentS3Object.java:instance 
field policy  In InconsistentS3Object.java |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.contentRangeFinish; locked 88% of time  
Unsynchronized access at S3AInputStream.java:88% of time  Unsynchronized access 
at S3AInputStream.java:[line 300] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.nextReadPos; locked 93% of time  
Unsynchronized access at S3AInputStream.java:93% of time  Unsynchronized access 
at S3AInputStream.java:[line 362] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.pos; locked 56% of time  Unsynchronized 
access at S3AInputStream.java:56% of time  Unsynchronized access at 
S3AInputStream.java:[line 245] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.readahead; locked 83% of time  
Unsynchronized access at S3AInputStream.java:83% of time  Unsynchronized access 
at S3AInputStream.java:[line 251] 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372378#comment-16372378
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HADOOP-13761 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13761 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911491/HADOOP-13761-007.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14183/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-21 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372377#comment-16372377
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

v7 patch incoming.  Since last revision:

 
 * Add some Retries annotations to S3AInputStream methods.
 * Also add retry loop around single-byte read().  Also make new test case 
exercise both read() variants.
 * Consolidate logic around whether or not to use the S3Guard Invoker.  It is 
really a "S3Guard file present" invoker, not only a read() invoker; we may end 
up using the same policy for other methods.  
 * Stash output of path.toString() in S3AInputStream instead of recreating it 
over and over from file status.
 * Inline "operation to be retried" closures.
 * Add similar test case to what [~ste...@apache.org] suggested above (but a 
little different).. See testOpenDeleteRead() in the new patch

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761-007.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-21 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372014#comment-16372014
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

I will try to post an updated patch today.

{quote}

will not handle the situation where the GET still gets back the initial PUT.

{quote}

Seems like the ultimate solution to that would be to store an etag with the 
metadata in S3Guard and then pass it along with the operation context and have 
it treat the wrong etag similarly to how we treat FNFE here today.  If we ever 
decide to do that we can introduce specific fault injection which mimics the 
stale GET case.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369187#comment-16369187
 ] 

Steve Loughran commented on HADOOP-13761:
-

bq. Or just call them Retry with a clarifying comment–since they "are covered 
WRT retries that we actually want".  I'm thinking the latter.

+1

# I believe throttling can reject any new HTTP verb issued over an open 
HTTP/1.1 channel. That is: every request to a shard is checked against that 
shard's tracking of an AWS account's load & throttled if need be
# same for requests of AWS KMS
# things like bandwidth throttling are more likely to happen between EC2 VM and 
the store
# load balancers can get overloaded. AWS recommend not caching the DNS entries 
for very long.

One failure mode to consider is delete + create inconsistency. The retry logic 
in this patch will catch the situation of PUT, DELETE, PUT, GET, as if the 
second PUT hasn't trickled through, it will spin. I will not handle the 
situation where the GET still gets back the initial PUT. We don't have any 
explicit tests for that situation, and if we did, not much which could be done.





> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-16 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367954#comment-16367954
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

{quote}Mark up the seek/read calls with @RetryPolicy, so it's easier to see 
what happens transitively
{quote}
Humm.. these codepaths are sort of a new case. They retry sometimes, depending 
on s3guard state. Should we add some new annotations 
S3GuardRetry[Mixed,Raw,Translated]? Or just call them Retry with a clarifying 
comment–since they "are covered WRT retries that we actually want".  I'm 
thinking the latter.
{quote}only have the s3guardInvoker field set to S3GuardExistsRetryPolicy if 
S3Guard is enabled, otherwise just set s3guardInvoker=invoker.
 and maybe just call it readInvoker
 then you can go context.readInvoker on those calls without much logic in the 
input stream.
{quote}
I thought about this. I'm concerned about misuse of the new retry policy 
breaking fail-fast behavior where it is needed, so I opted to make each 
codepath very explicit where the change in behavior happens and why.  Let me 
take a look at it again.
{quote}the invoked operations themselves can be stuck inline
{quote}
This statement is true.  ;)

Thanks for taking the time to think through this stuff–I wanted extra eyes on 
it.  I was a bit on the fence about the read failpoints, not knowing if they 
really occur in the wild: getObject() seems to actually open an HTTP stream and 
read usually consumes that, so not sure you can really get FNFE there. 
Wondering if we could get throttling there too–but it depends on unknown S3 
service behavior (i.e. does throttling only affect new read sessions and 
metadata requests, or can it affect an open HTTP/TCP stream.  I suppose we 
could have a reconnect that hits another shard and gets FNFE?).

 

 

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367654#comment-16367654
 ] 

Steve Loughran commented on HADOOP-13761:
-

bq. was this part of your diff intentional or accident?

kind of unrelated, but as the IDE was telling me off I fixed it

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367653#comment-16367653
 ] 

Steve Loughran commented on HADOOP-13761:
-

Spent a lot of time staring at that read code, good to follow those failure 
modes. I like you did with the fault injection.

What this adds is not just the s3guard FNFE handling, but the whole resilience 
to read failures. Which is nice.

Some thoughts

* Mark up the seek/read calls with @RetryPolicy, so it's easier to see what 
happens transitively
* only have the s3guardInvoker field set to S3GuardExistsRetryPolicy if S3Guard 
is enabled, otherwise just set {{s3guardInvoker=invoker}}.
* and maybe just call it {{readInvoker}}
* then you can go {{context.readInvoker}} on those calls without much logic in 
the input stream.
* the invoked operations themselves can be stuck inline

BTW, one thing I don't remember us having any test for is the code sequence

{code}
Path p = path();
writeTextFile(fs, p, "hello")
InputStream s = fs.open(path)
fs.delete(p)
intercept(FileNotFoundException.class, () -> s.read())
{code}

This would let us check that the deletion of a file does still get picked up in 
a bounded-time when s3guard is turned on.

+ same for readfully().

A really clever variant of that test would be a test which did the delete after 
the first read and before a seek we knew would trigger a new GET (such as a 
backwards one). Again, we'd expect a failure with/without s3guard.


Finally, you know, we don't actually implement skip() ourselves in S3A, just do 
a read + discard all the way to the end. Filed HADOOP-15245 to do that later. 
Luckily, none of the columnar formats appear to use skip(), or, if anything 
uses skip, it's doing it over short distances. Whatever gets done there should 
think about resilience testing too.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366675#comment-16366675
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 14s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 38 
new + 13 unchanged - 1 fixed = 51 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-tools/hadoop-aws generated 5 new + 0 unchanged 
- 0 fixed = 5 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
42s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aws |
|  |  Class org.apache.hadoop.fs.s3a.InconsistentS3Object defines non-transient 
non-serializable instance field policy  In InconsistentS3Object.java:instance 
field policy  In InconsistentS3Object.java |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.contentRangeFinish; locked 88% of time  
Unsynchronized access at S3AInputStream.java:88% of time  Unsynchronized access 
at S3AInputStream.java:[line 296] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.pos; locked 62% of time  Unsynchronized 
access at S3AInputStream.java:62% of time  Unsynchronized access at 
S3AInputStream.java:[line 241] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.readahead; locked 83% of time  
Unsynchronized access at S3AInputStream.java:83% of time  Unsynchronized access 
at S3AInputStream.java:[line 247] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.wrappedStream; locked 57% of time  
Unsynchronized access at S3AInputStream.java:57% of time  Unsynchronized access 
at S3AInputStream.java:[line 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366647#comment-16366647
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

About to post v6 patch.  Changes from v5:

- Run testOpenFailOnRead() in a loop with 0.5 failure probability. (My idea 
about making failpoint probing more deterministic wasn't worth the trouble.)  
It runs 20 times, but I tested it with 500 iterations.  I no longer assert 
non-s3guard fails since it is now RNG.
- Add status=404 failpoint to InconsistentAmazonS3Client#getObject(): that is 
the first GET AFAICT
- Add FNFE failpoint to to skip() (which may use read() underneath to advance 
stream, I believe).
- Add s3guard-specific retry() around lazySeek().  Now both callers of reopen() 
are covered in a retry, only when S3Guard is enabled.

I'll add the v6 patch as well as a v5-to-v6 diff so you can skim the 
changes..(diff-diff is not exact: it contains a change or two from your v5 
tweaks).


> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366622#comment-16366622
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

[~ste...@apache.org] was this part of your diff intentional or accident?

{noformat}
diff --git 
a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java
 
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java
index 61ce68441e3..514a5e606e8 100644
--- 
a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java
+++ 
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java
@@ -153,7 +153,6 @@ public S3ARetryPolicy(Configuration conf) {
 policyMap.put(InterruptedException.class, fail);
 // note this does not pick up subclasses (like socket timeout)
 policyMap.put(InterruptedIOException.class, fail);
-policyMap.put(AWSRedirectException.class, fail);
{noformat}



> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, HADOOP-13761-005.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366433#comment-16366433
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

[~ste...@apache.org] on your question about changing the retry() to enclose 
lazySeek() instead of around stream.read(), that is not sufficient with the 
current failure model (i.e. how I'm injecting failures).  I think the failure 
injection needs work.

{noformat}
2018-02-15 15:27:00,907 [JUnit-testOpenFailOnRead] ERROR 
s3a.AbstractS3ATestBase (ITestS3AInconsistency.java:testOpenFailOnRead(129)) - 
Error:
java.io.FileNotFoundException: read(b, 0, 4) on key 
test/ancestor/file-to-read-DELAY_LISTING_ME failed: injecting error 3/5 for 
test.
at 
org.apache.hadoop.fs.s3a.InconsistentS3Object$InconsistentS3InputStream.readFailpoint(InconsistentS3Object.java:169)
at 
org.apache.hadoop.fs.s3a.InconsistentS3Object$InconsistentS3InputStream.read(InconsistentS3Object.java:159)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.lambda$read$56(S3AInputStream.java:426)
at 
org.apache.hadoop.fs.s3a.S3AInputStream$$Lambda$31/1279551328.execute(Unknown 
Source)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$7(Invoker.java:260)
at 
org.apache.hadoop.fs.s3a.Invoker$$Lambda$12/89609.execute(Unknown Source)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:231)
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:438)
at java.io.DataInputStream.read(DataInputStream.java:149)
at 
org.apache.hadoop.fs.s3a.ITestS3AInconsistency.testOpenFailOnRead(ITestS3AInconsistency.java:126)
{noformat}

If we can agree on where to inject failures I think I can come up with a good 
solution.

Maybe:

- We need both lazySeek() and the stream.read() retries?
- The failure injection for InconsistentS3InputStream() should also have a 
failpoint in skip(), which would have exposed the lack of retry in lazySeek().
- AmazonS3Client.getObject() currently does not fail, but returns an 
InconsistentS3Object with the read/skip/etc. failpoints mentioned.  Seems like 
getObject() needs to fail as, looking at the SDK code, it actually does the GET 
request I believe.

Does this sound right?

Also: My current approach of failing read 5 times then succeeding (5<20 which 
is retry max) is not going to work to expose all the codepaths that fail.  I 
need a loop runs multiple tests and either (1) increases max failures, or 
failure offset, by one each duration (how traceroute uses TTLs to probe router 
hops; I use to probe failure points) or (2) uses randomization to be likely to 
hit different failure points.

#1 actually seems more deterministic.



> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, HADOOP-13761-005.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366211#comment-16366211
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Your changes all look straightforward, thanks.  I can play with moving the 
retry this evening.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, HADOOP-13761-005.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366200#comment-16366200
 ] 

genericqa commented on HADOOP-13761:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 38 
new + 13 unchanged - 1 fixed = 51 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-tools/hadoop-aws generated 4 new + 0 unchanged 
- 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
29s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aws |
|  |  org.apache.hadoop.fs.s3a.InconsistentS3Object.readFailureCounter should 
be package protected  At InconsistentS3Object.java: At 
InconsistentS3Object.java:[line 42] |
|  |  Class org.apache.hadoop.fs.s3a.InconsistentS3Object defines non-transient 
non-serializable instance field policy  In InconsistentS3Object.java:instance 
field policy  In InconsistentS3Object.java |
|  |  Format-string method String.format(String, Object[]) called with format 
string "read(b, %d, %d) on key %s failed:with format string "read(b, %d, %d) on 
key %s failed: injecting error {}/{} for test." wants 3 arguments but is given 
5 in 
org.apache.hadoop.fs.s3a.InconsistentS3Object$InconsistentS3InputStream.readFailpoint(int,
 int)  At InconsistentS3Object.java:[line 169] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.fs.s3a.S3AInputStream.wrappedStream; locked 88% of time  
Unsynchronized access at S3AInputStream.java:88% of time  Unsynchronized access 
at S3AInputStream.java:[line 409] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA 

[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366140#comment-16366140
 ] 

Steve Loughran commented on HADOOP-13761:
-

submitted. Other than the big q around the read() retry, all LGTM

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, HADOOP-13761-005.patch, 
> HADOOP-13761.001.patch, HADOOP-13761.002.patch, HADOOP-13761.003.patch, 
> HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366138#comment-16366138
 ] 

Steve Loughran commented on HADOOP-13761:
-

I think you forgot to hit the submit button. I do that all too often.

Here's my review. I've done it in the IDE and am attaching (a) the new patch 
and (b) the diff between yours and mine. That'll simplify your reviewing my 
changes.

One thing I've not done is the change in  S3AInputStream invoke on read, where 
I have three comments

# I think the S3Guard invoker should be around/in lazySeek(nextReadPos, len) 
-that is where the GET is initiated. The read() is just trying to work with an 
existing stream.
# what happens in the state that the file really isn't there? That is: how 
rapidly do we expect to timeout on retries for it?
# context.dstFileStatus.getPath().toString()  => move to a const to avoid 
recreating it on every read() call.

The key issues is #1: is the retry in the right place? I don't think so.

h2. changes in the patch

h3. S3AFileSystem

* L83 remove unused import
* L704 reformatted new input stream constructor invocation for clarity; do like 
the new structure BTW.
* L2106 reinstated the log debug statement, unless you really don't want it.
 
 
h3. ITestS3AInconsistency
 
* L120: this stream needs closing in a try-with-resources call
* L131 the fail() call should nest the inner exception as the cause 
* move waitUntilDeleted to java 8 lambda
 
h3. Other
* InconsistentS3Object: IDE complains that the LOG is non serializable; mark as 
transient to get it to quieten down
* AbstractCommitITest, ITestDynamoDBMetadataStoreScale, FailureInjectionPolicy, 
S3AOpContext, S3GuardExistsRetryPolicy : import ordering
* InconsistentS3Object: import ordering & tell IDE to stop complaining about 
serialization issues on this subclass
* S3ARetryPolicy - fixed up something wrong in the original code; duplicate 
AWSRedirectException in map



> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761-004-to-005.patch, HADOOP-13761.001.patch, 
> HADOOP-13761.002.patch, HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-14 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365068#comment-16365068
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

tested in us-west-2, with and without s3guard.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-14 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365059#comment-16365059
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

v4 patch: v3 was missing a couple of chunks.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-14 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365040#comment-16365040
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Just posted v3 patch.
{noformat}
Feedback from previous patch
  - info -> debug
  - switch to provisionTableBlocking() in DDBMS#updateParameters()
  - use getClass().getSimpleName() for test folder
  - Add warning to testing doc: -Dscale + -Ds3guard + -Ddynamo requires you use
a private table that you can afford to mess up.

New retry policy when failing read() with S3Guard.

- ITestS3AInconsistency#testOpenFailOnRead() shows the new retry behavior maybe
  read that first.

- Add new failure injection around S3Object API.
  - New class InconsistentS3Object subclasses the AWS S3Object,
and produces the new InconsistentS3InputStream.
  - InconsistentS3Input stream just wraps S3ObjectInputStream, adding failpoints
to read() and read(byte[], int, int).
  - Factor some S3AInputStream constructor args into a new struct, S3AOpContext.
(I expect we will want to subclass this and add more per-operation state
when we rewrite rename(). Any per-op state, such as S3Guard details on
expected metadata, can go here.)
  - Add retry to S3AInputStream#onReadFailure(), only when metadata store is in
use. Currently we infer that isS3GuardEnabled ==> S3guard had metadata for
the file, which is how the code is currently structured, but we could make
this more explicit.

- Move failure injection config / policy into separate class,
  FailureInjectionPolicy

- Refactor S3ARetryPolicy to make the new S3GuardExistsRetryPolicy possible.
{noformat}

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-14 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364725#comment-16364725
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Finishing up the fail-on-read after open() retry stuff.. Should be posting a 
patch today.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363852#comment-16363852
 ] 

Steve Loughran commented on HADOOP-13761:
-

I'll call that a success. Is the patch ready to go in?

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-13 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363517#comment-16363517
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Some entertaining logs from the new scale test introduced here (batch size 
exceeds provisioned throughput).  Multiple of seconds of backoff timer.. still 
never fails though.
{noformat}
DEBUG s3guard.DynamoDBMetadataStore 
(DynamoDBMetadataStore.java:completeAncestry(548)) - auto-create ancestor path 
s3a://bucket-new/scaleTestBWEP0 for child path 
s3a://bucket-new/scaleTestBWEP0/lvl1/lvl2/lvl3/lvl4/lvl5/lvl6/lvl7/lvl8/lvl9/lvl10/lvl11/lvl12/lvl13/lvl14/lvl15/lvl16/lvl17/lvl18/lvl19/lvl20/lvl21/lvl22/lvl23/lvl2423

DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:put(693)) - 
Saving batch of 25 items to table fabbri-dev3, region us-west-2

DEBUG s3guard.DynamoDBMetadataStore 
(DynamoDBMetadataStore.java:retryBackoff(660)) - Sleeping 250 msec before next 
retry

DEBUG s3guard.DynamoDBMetadataStore 
(DynamoDBMetadataStore.java:retryBackoff(660)) - Sleeping 586 msec before next 
retry

DEBUG s3guard.DynamoDBMetadataStore 
(DynamoDBMetadataStore.java:retryBackoff(660)) - Sleeping 747 msec before next 
retry

DEBUG s3guard.DynamoDBMetadataStore 
(DynamoDBMetadataStore.java:retryBackoff(660)) - Sleeping 2295 msec before next 
retry

DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:put(681)) - 
Saving to table table-dev3 in region us-west-2: 
PathMetadata{fileStatus=FileStatus{path=s3a{noformat}

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357386#comment-16357386
 ] 

Steve Loughran commented on HADOOP-13761:
-

I was looking at the input stream retry logic in the context of SPARK-23308, 
where someone else's S3 client was getting SocketException during their reads., 
suspected EC2 network throttling. 

created: HADOOP-15216

For s3guard, yes. an FNFE in the stream open is something we may want to retry 
on. Without s3guard the open() would fail fast, so it's only when s3guard = on 
and there's some delete inconsistency surfacing

Rename is scary

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356378#comment-16356378
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Thanks for the feedback on the scale test patch [~ste...@apache.org].  On the 
retries, I've been looking at open(), delete(), and rename() in terms of what 
happens when MetadataStore has metadata but we hit eventual consistency when we 
start operating on S3.

 
 * open(): This is the only code that creates a S3AInputStream constructor, and 
it does a getFileStatus() first, so failures should be happening later in 
read() paths.  There is already a single handler for read failures: 
{{onReadFailure()}}.  I suggest we add invoker/retry policy to this function. 
Thoughts? As is, it appears to only retry once without any backoff timer.
 * delete(): this uses deleteObject(), which already has retries. I think we 
are good here?
 * rename(): this is more complicated, and we probably need to do more in-depth 
rename() rewrite to include HADOOP-15183.  One thing I noticed is, 
{{copyFile()}} is annotated {{RetryMixed}} but only uses {{once()}} internally, 
so I might fix that.

I'd like to address Steve's feedback on the scale test and make the above 
changes if it sounds good. I think rename() would benefit from its own 
patch/JIRA, (possibly even add a new experimental "v2" rename codepath that can 
be switched in via config while we improve and test with it, dunno).

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356048#comment-16356048
 ] 

Steve Loughran commented on HADOOP-13761:
-

I've not seen that exception since; it was on ASF trunk/, so unless we've 
updated the SDK since then (we have, haven't we?), I don't know what's up

h3. DynamoDBMetadataStore

L693: leave @ debug


{{DynamoDBMetadataStore.updateParameters()}} should which to 
{{provisionTableBlocking()}}, so the CLI tool will not complete until the 
provisioning has happened. This will improve its ability to be used in scripts 
& tests.

h3. ITestDynamoDBMetadataStoreScale


* I like the fact the DB gets shrunk back after. Currently the CLI tests slowly 
leak capacity, even though it should, on my reading, clean up
* {{pathOfDepth}} has the base path "/scaleTestBWEP" . I think it should be 
getClass.getShortName()

This is going to be fun on a shared test run

# need to make sure that is not run in parallel
# need to doc that this is test must be run on a private DDB instance, not one 
shared across other buckets. Don't want other teams getting upset because their 
tests on a different bucket are failing.

For HADOOP-14918 I've been wondering if we should have tests explicitly declare 
a "test DDB table"; {{ITestS3GuardToolDynamoDB}} hard codes to 
"testDynamoDBInitDestroy", which relies on at most one user in a shared AWS a/c 
from running the test at the same time. This is one which can be created on 
demand, destroyed afterwards, so the test suite can do what it wants. This 
would line up for future tests of things like upgrades, TTL checking, etc. This 
scale test could share the same config option.



> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-26 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341807#comment-16341807
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

v2 patch is same as v1 (just a test which tries to reproduce Exceeded 
exception).. just has a small tweak to make sure both the put and delete 
batches are full size.  Still no reproduction of that issue.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-25 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340137#comment-16340137
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Attached v1 RFC patch that just adds a test that attempts to reproduce the 
ProvisionedThroughputExceededException that [~ste...@apache.org] reported 
before.  Look for testBatchedWriteExceedsProvisioned().  I moved the DDB scale 
test to the .s3guard package for better access to testing functions in DDB 
metastore.

I have not been able to reproduce the SDK failing to do its own retries. I'm 
wondering if [~ste...@apache.org] would have more luck getting it to 
reproduce.. Maybe there are differences in DDB service in the region you use 
(I'm in us-west-2).  IIRC the SDK version is still the same as when you posted 
that stack trace?

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-24 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338852#comment-16338852
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Wrote a test to check above hypothesis on batch size vs. provisioned I/O.  No 
repo yet but will run it overnight and share a patch tomorrow.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-24 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338609#comment-16338609
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

{quote}
Managed to break tests when working with a bucket whose DDB table was 
precreated at {{ hadoop s3guard init -write 20 -read 20}} & five parallel test 
cases
{quote}

Just noticing your comment from September [~ste...@apache.org].  This is pretty 
annoying, according to [sdk 
docs|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.RetryAndBackoff]:

{quote}
The AWS SDKs implement automatic retry logic and exponential backoff.
{quote}

I confirmed this when I created the {{ITestDynamoDBMetadataStoreScale}} tests 
with low provisioned I/O as well.  However, here is a hint from those same SDK 
docs:

{quote}
However, after a minute, if the request has not succeeded, the problem might be 
the request size exceeding your provisioned throughput, and not the request 
rate. 
{quote}

Hypothesis: SDK does behave as advertised (the bulk write interface explicitly 
exposes "stuff left for retry" so it seems to be the intention), *except* when 
your batch size > provisioned I/O units.

That is, you had a batch of over 20 items to write at once and the SDK gives up 
and says "give me a smaller batch".

Another thing to note here, is this JIRA has a lot of work mentioned in it.. so 
heads up I may split things up.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-16 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328003#comment-16328003
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

[~ste...@apache.org] sure.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Steve Loughran
>Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325113#comment-16325113
 ] 

Steve Loughran commented on HADOOP-13761:
-

Aaron: you got time to look @ this?

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Steve Loughran
>Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322178#comment-16322178
 ] 

Steve Loughran commented on HADOOP-13761:
-

+think we could add an annotation 
RetrySwallowsExceptions/OnceSwallowsExceptions & use, tag also the headers of 
the interface and declare what's required, as you node

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Steve Loughran
>Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-10 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321643#comment-16321643
 ] 

Aaron Fabbri commented on HADOOP-13761:
---

Reminder comment here: We should also:

- Update MetadataStore javadoc to say that implementations *must* implement 
their own retries as needed to recover from transient issues like services 
unavailable or network connectivity blips. (The most sane design decision IMO.. 
especially if more backends are added in the future).

- Update any S3Guard-specific retry annotations in S3A FS to assume 
MetadataStores do their own retries.  e.g. the 
{{Retries.OnceTranslated("s3guard not retrying")}} being added to 
listLocatedStatus() in HADOOP-15079.

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Steve Loughran
>Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2017-12-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278672#comment-16278672
 ] 

Steve Loughran commented on HADOOP-13761:
-

pulling HADOOP-15035 in here, which adds the extra requirement "translate AWS 
SDK exceptions into IOEs". For that reason, uprated to a blocker for 3.1

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Steve Loughran
>Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org