[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-11-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676483#comment-16676483
 ] 

Sunil Govindan commented on HADOOP-15349:
-

I added 3.2.0 and 3.3.0 to the Fix Version field. Please add any versions I missed.

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.2.0, 3.3.0
>
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554923#comment-16554923
 ] 

Steve Loughran commented on HADOOP-15349:
-

I'm pleased to say I can now trigger DDB overloads, and the new message is 
being printed:
{code}
[ERROR] 
testFakeDirectoryDeletion(org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost)  
Time elapsed: 32.643 s  <<< ERROR!
java.io.IOException: Max retries exceeded (5) for DynamoDB. This may be because 
write threshold of DynamoDB is set too low.
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoff(DynamoDBMetadataStore.java:693)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:672)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$move$4(DynamoDBMetadataStore.java:625)
at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:127)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:125)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:624)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:1072)
at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:862)
at 
org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:299)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}


> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-13 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542958#comment-16542958
 ] 

Sean Mackrory commented on HADOOP-15349:


Thanks, [~gabor.bota]. I committed the change to trunk, so resolving.

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-13 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542760#comment-16542760
 ] 

Gabor Bota commented on HADOOP-15349:
-

I've just created HADOOP-15604, so this can be closed.

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-12 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541627#comment-16541627
 ] 

Sean Mackrory commented on HADOOP-15349:


+1 to the change. It would be nice if we could confirm that the I/O thresholds 
really are the reason for the unprocessed items. I don't know what else would 
cause that - the JavaDocs don't mention anything. We can retrieve the capacity 
used in a given attempt from the BatchWriteItemResult, but the configured 
capacity doesn't seem to be exposed in that API (and we can't assume the 
capacity Hadoop configures for new tables matches the current table's).
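
A minimal sketch of that consumed-capacity point, assuming the AWS SDK v1 model 
classes used by hadoop-aws: the per-attempt consumed capacity is on the result 
(when the request asks for it), but the table's provisioned capacity is not.
{code}
import java.util.List;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
import com.amazonaws.services.dynamodbv2.model.ConsumedCapacity;

final class ConsumedCapacityLogger {
  /** Log how many write capacity units a single batch attempt consumed. */
  static void logConsumedCapacity(BatchWriteItemResult result) {
    // Only populated if the request set ReturnConsumedCapacity.TOTAL.
    List<ConsumedCapacity> consumed = result.getConsumedCapacity();
    if (consumed == null || consumed.isEmpty()) {
      System.out.println("No consumed-capacity data returned");
      return;
    }
    for (ConsumedCapacity cc : consumed) {
      System.out.printf("table %s: %.1f write capacity units consumed; "
              + "%d tables still have unprocessed items%n",
          cc.getTableName(), cc.getCapacityUnits(),
          result.getUnprocessedItems().size());
    }
  }
}
{code}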

It might be nice to do a bit of testing with some large batch sizes and see if 
we can at least document a recommended minimum that seems to reliably not 
exhaust the exponential back-off's buffer time. Can you file a follow-up JIRA 
for that and I'll commit this patch?

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-11 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540137#comment-16540137
 ] 

Gabor Bota commented on HADOOP-15349:
-

is it enough?

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-11 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540136#comment-16540136
 ] 

Steve Loughran commented on HADOOP-15349:
-

patch seems good though

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-11 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540135#comment-16540135
 ] 

Steve Loughran commented on HADOOP-15349:
-

Sorry, missed your comment. I'd just set the table capacity to something like 1 
or 2 and had the normal tests fail.
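
For reference, a minimal sketch of one way to do that, assuming the standard 
S3Guard DDB option names from hadoop-aws; the tiny values are only there to 
force throttling and only take effect when S3Guard creates the table.
{code}
import org.apache.hadoop.conf.Configuration;

class TinyCapacityConf {
  /** Provision the S3Guard table with deliberately tiny capacity to force throttling. */
  static Configuration tinyCapacity(Configuration conf) {
    // Only applied when S3Guard creates the table.
    conf.setBoolean("fs.s3a.s3guard.ddb.table.create", true);
    conf.setInt("fs.s3a.s3guard.ddb.table.capacity.read", 1);
    conf.setInt("fs.s3a.s3guard.ddb.table.capacity.write", 1);
    return conf;
  }
}
{code}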

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15349.001.patch, failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539704#comment-16539704
 ] 

genericqa commented on HADOOP-15349:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 23s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 56s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15349 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931032/HADOOP-15349.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux df44e6f5dfb4 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7f1d3d0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14873/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14873/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-07-04 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532811#comment-16532811
 ] 

Gabor Bota commented on HADOOP-15349:
-

Hi [~ste...@apache.org],
I'm not able to find the test you're running in upstream/trunk. Could you send 
me the test, or is it already there somewhere and I'm just not finding it?

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-03-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417233#comment-16417233
 ] 

Steve Loughran commented on HADOOP-15349:
-

There are ~50 files being committed, each in its own thread from the commit 
pool; I assume the DDB table is being overloaded just from a single process 
doing task commit. We should be backing off more, especially given that failing 
on a write could potentially leave the store inconsistent with the FS (renames, 
etc.).
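
A minimal sketch of the kind of back-off being discussed (not the committed 
change), built from Hadoop's own RetryPolicies helper; the retry count and base 
delay here are illustrative values only.
{code}
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

class BackoffSketch {
  // Bounded exponential back-off: roughly 100ms, 200ms, 400ms, ... up to 9 retries.
  private final RetryPolicy backoff =
      RetryPolicies.exponentialBackoffRetry(9, 100, TimeUnit.MILLISECONDS);

  /** Sleep for the policy-chosen interval, or fail once the retry budget is spent. */
  void retryBackoff(int retryCount) throws IOException, InterruptedException {
    try {
      RetryPolicy.RetryAction action = backoff.shouldRetry(null, retryCount, 0, true);
      if (action.action == RetryPolicy.RetryAction.RetryDecision.FAIL) {
        throw new IOException("Max retries exceeded (" + retryCount
            + ") for DynamoDB; the table's write capacity may be set too low");
      }
      Thread.sleep(action.delayMillis);
    } catch (IOException | InterruptedException e) {
      throw e;
    } catch (Exception e) {
      throw new IOException("Unexpected failure in retry policy", e);
    }
  }
}
{code}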

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-03-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417229#comment-16417229
 ] 

Steve Loughran commented on HADOOP-15349:
-

Plus metrics: the S3Guard retries are not being added to the stats. The bucket 
had a DDB table with read=write=10 capacity units when it overloaded.


{code}
2018-03-28 04:29:34,295 [ScalaTest-main-running-S3ACommitBulkDataSuite] INFO  
s3.S3AOperations (Logging.scala:logInfo(54)) - Metrics:
  S3guard_metadatastore_put_path_latency50thPercentileLatency = 0
  S3guard_metadatastore_put_path_latency75thPercentileLatency = 0
  S3guard_metadatastore_put_path_latency90thPercentileLatency = 0
  S3guard_metadatastore_put_path_latency95thPercentileLatency = 0
  S3guard_metadatastore_put_path_latency99thPercentileLatency = 0
  S3guard_metadatastore_put_path_latencyNumOps = 0
  S3guard_metadatastore_throttle_rate50thPercentileFrequency (Hz) = 0
  S3guard_metadatastore_throttle_rate75thPercentileFrequency (Hz) = 0
  S3guard_metadatastore_throttle_rate90thPercentileFrequency (Hz) = 0
  S3guard_metadatastore_throttle_rate95thPercentileFrequency (Hz) = 0
  S3guard_metadatastore_throttle_rate99thPercentileFrequency (Hz) = 0
  S3guard_metadatastore_throttle_rateNumEvents = 0
  committer_bytes_committed = 12594213
  committer_bytes_uploaded = 12594213
  committer_commits_aborted = 0
  committer_commits_completed = 138
  committer_commits_created = 136
  committer_commits_failed = 0
  committer_commits_reverted = 0
  committer_jobs_completed = 29
  committer_jobs_failed = 0
  committer_magic_files_created = 2
  committer_tasks_completed = 34
  committer_tasks_failed = 0
  directories_created = 25
  directories_deleted = 0
  fake_directories_deleted = 1788
  files_copied = 6
  files_copied_bytes = 8473
  files_created = 39
  files_deleted = 28
  ignored_errors = 92
  object_continue_list_requests = 0
  object_copy_requests = 0
  object_delete_requests = 227
  object_list_requests = 431
  object_metadata_requests = 592
  object_multipart_aborted = 0
  object_put_bytes = 12738607
  object_put_bytes_pending = 0
  object_put_requests = 204
  object_put_requests_active = 0
  object_put_requests_completed = 204
  op_copy_from_local_file = 0
  op_create = 39
  op_create_non_recursive = 0
  op_delete = 77
  op_exists = 177
  op_get_file_checksum = 0
  op_get_file_status = 1232
  op_glob_status = 8
  op_is_directory = 22
  op_is_file = 0
  op_list_files = 13
  op_list_located_status = 6
  op_list_status = 110
  op_mkdirs = 10
  op_open = 407
  op_rename = 6
  s3guard_metadatastore_initialization = 1
  s3guard_metadatastore_put_path_request = 203
  s3guard_metadatastore_retry = 0
  s3guard_metadatastore_throttled = 0
  store_io_throttled = 0
  stream_aborted = 0
  stream_backward_seek_operations = 145
  stream_bytes_backwards_on_seek = 8272842
  stream_bytes_discarded_in_abort = 0
  stream_bytes_read = 9199577
  stream_bytes_read_in_close = 783876
  stream_bytes_skipped_on_seek = 187038
  stream_close_operations = 549
  stream_closed = 549
  stream_forward_seek_operations = 72
  stream_opened = 549
  stream_read_exceptions = 0
  stream_read_fully_operations = 448
  stream_read_operations = 13129
  stream_read_operations_incomplete = 932
  stream_seek_operations = 217
  stream_write_block_uploads = 2
  stream_write_block_uploads_aborted = 0
  stream_write_block_uploads_active = 0
  stream_write_block_uploads_committed = 0
  stream_write_block_uploads_data_pending = 0
  stream_write_block_uploads_pending = 37
  stream_write_failures = 0
  stream_write_total_data = 153642
  stream_write_total_time = 738
{code}

(PS: I suspect that uploads_pending is a false stat & it's really the 
uncommitted uploads being counted.)
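
For anyone wanting to dump these counters themselves, a minimal sketch using 
only the generic FileSystem storage-statistics API (the bucket URI is a 
placeholder); it prints whatever long counters the S3A instrumentation 
publishes, including the s3guard_* ones above.
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.StorageStatistics;

public class DumpS3AStats {
  public static void main(String[] args) throws Exception {
    // Placeholder bucket URI; substitute a real S3A bucket.
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), new Configuration());
    StorageStatistics stats = fs.getStorageStatistics();
    stats.getLongStatistics().forEachRemaining(s ->
        System.out.printf("%s = %d%n", s.getName(), s.getValue()));
  }
}
{code}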

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-03-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417224#comment-16417224
 ] 

Steve Loughran commented on HADOOP-15349:
-

Attached full log

# No meaningful details in the exception; it should say something like "DDB 
calls not completing", maybe with some history.
# We should compare the before & after counts of results processed. If the 
count is decreasing, then it's OK to keep retrying, as things have slowed down, 
not failed (see the sketch below).
# Also review the timeout defaults & include details in the exception, e.g. 
"after 20s".

This happened during job commit, which is pretty sensitive. The job *did* 
complete successfully, because it's wrapped in retry code too. But I think we 
could have handled it better at the lower levels, as not all apps will be 
retrying so much.
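
A rough sketch of item 2 above (a hypothetical helper, not the shipped code, 
using the AWS SDK v1 batch-write API): keep resubmitting unprocessed items 
while each attempt makes progress, and only spend a back-off retry when the 
unprocessed count stops shrinking.
{code}
import java.io.IOException;
import java.util.List;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;

class ProgressAwareBatchWrite {
  private static final int MAX_STALLED_RETRIES = 9;

  static void writeWithProgressCheck(AmazonDynamoDB ddb,
      Map<String, List<WriteRequest>> items) throws IOException {
    int stalledRetries = 0;
    int previousUnprocessed = Integer.MAX_VALUE;
    while (!items.isEmpty()) {
      BatchWriteItemResult result =
          ddb.batchWriteItem(new BatchWriteItemRequest().withRequestItems(items));
      items = result.getUnprocessedItems();
      int unprocessed = items.values().stream().mapToInt(List::size).sum();
      if (unprocessed == 0) {
        return;                        // everything written
      }
      if (unprocessed < previousUnprocessed) {
        stalledRetries = 0;            // progress: slow, not stuck
      } else if (++stalledRetries > MAX_STALLED_RETRIES) {
        throw new IOException("Batch write made no progress after "
            + MAX_STALLED_RETRIES + " retries; " + unprocessed
            + " items still unprocessed");
      }
      previousUnprocessed = unprocessed;
      // A real implementation would sleep per the back-off policy here.
    }
  }
}
{code}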

+[~fabbri] [~gabor.bota]

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
> Attachments: failure.log
>
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception






[jira] [Commented] (HADOOP-15349) S3Guard DDB retryBackoff to be more informative on limits exceeded

2018-03-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417219#comment-16417219
 ] 

Steve Loughran commented on HADOOP-15349:
-

Stack
{code}
2018-03-28 04:22:17,375 [s3-committer-pool-2] ERROR s3a.S3AFileSystem 
(S3AFileSystem.java:finishedWrite(2730)) - S3Guard: Error updating 
MetadataStore for write to 
cloud-integration/DELAY_LISTING_ME/S3ACommitBulkDataSuite/bulkdata/output/landsat/parquet/parted-1/year=2016/month=6/part-0-24152aa2-c86d-49d2-98d4-820dc37a6df1-local-1522235507089.c000.snappy.parquet:
java.io.IOException: Max retries exceeded (9) for DynamoDB
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoff(DynamoDBMetadataStore.java:657)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:636)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.put(DynamoDBMetadataStore.java:695)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.put(DynamoDBMetadataStore.java:685)
at 
org.apache.hadoop.fs.s3a.s3guard.S3Guard.putAndReturn(S3Guard.java:149)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2727)
at 
org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$finalizeMultipartUpload$1(WriteOperationHelper.java:234)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
at 
org.apache.hadoop.fs.s3a.WriteOperationHelper.finalizeMultipartUpload(WriteOperationHelper.java:222)
at 
org.apache.hadoop.fs.s3a.WriteOperationHelper.completeMPUwithRetries(WriteOperationHelper.java:267)
at 
org.apache.hadoop.fs.s3a.commit.CommitOperations.innerCommit(CommitOperations.java:179)
at 
org.apache.hadoop.fs.s3a.commit.CommitOperations.commit(CommitOperations.java:151)
at 
org.apache.hadoop.fs.s3a.commit.CommitOperations.commitOrFail(CommitOperations.java:134)
at 
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.lambda$commitPendingUploads$3(AbstractS3ACommitter.java:451)
at org.apache.hadoop.fs.s3a.commit.Tasks$Builder$1.run(Tasks.java:254)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
{code}

> S3Guard DDB retryBackoff to be more informative on limits exceeded
> --
>
> Key: HADOOP-15349
> URL: https://issues.apache.org/jira/browse/HADOOP-15349
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When S3Guard can't update the DB and so throws an IOE after the retry limit 
> is exceeded, it's not at all informative. Improve logging & exception


