[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466862#comment-16466862
 ] 

genericqa commented on HADOOP-15446:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 15s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 12s{color} | {color:orange} hadoop-tools/hadoop-azure: The patch generated 20 new + 3 unchanged - 0 fixed = 23 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s{color} | {color:green} hadoop-azure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 12s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:f667ef1 |
| JIRA Issue | HADOOP-15446 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922393/HADOOP-15446-branch-2.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0b0a6a021c83 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 5679920 |
| maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_171 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14600/artifact/out/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14600/testReport/ |
| Max. process+thread count | 190 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14600/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 2.9.0, 3.0.2
>

[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread Thomas Marquardt (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466846#comment-16466846
 ] 

Thomas Marquardt commented on HADOOP-15446:
---

FYI, the same changes apply to both branch-2 and trunk.

> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 2.9.0, 3.0.2
>Reporter: Thomas Marquardt
>Assignee: Thomas Marquardt
>Priority: Major
> Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, 
> HADOOP-15446-003.patch, HADOOP-15446-branch-2.001.patch
>
>
> Page Blobs are primarily used by HBASE.  HBASE replication, which apparently 
> has not been used with WASB until recently, performs non-sequential reads on 
> log files using PageBlobInputStream.  There are bugs in this stream 
> implementation which prevent skip and seek from working properly, and 
> eventually the stream state becomes corrupt and unusable.
> I believe this bug affects all releases of WASB/HADOOP.  It appears to be a 
> day-0 bug in PageBlobInputStream.  There were similar bugs opened in the past 
> (HADOOP-15042) but the issue was not properly fixed, and no test coverage was 
> added.
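
The skip-then-read behaviour that the description says is broken can be illustrated with a minimal sketch. It uses `java.io.ByteArrayInputStream` as a stand-in for `PageBlobInputStream` (an assumption made so the example is self-contained; the real stream reads from Azure storage), and the class and method names are hypothetical, not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration of the InputStream skip contract; not code from HADOOP-15446.
public class SkipContractSketch {

    /**
     * Skips exactly n bytes or throws EOFException.
     * InputStream.skip may skip fewer bytes than requested, so the caller loops.
     */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip made no progress: read one byte to tell EOF apart from a stalled skip
                if (in.read() < 0) {
                    throw new EOFException("stream ended with " + remaining + " bytes left to skip");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    /** Returns the byte at the given offset (or -1 at end of stream) by skipping, then reading. */
    static int readAt(byte[] data, long offset) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            skipFully(in, offset);
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A correct stream returns the same byte from `readAt(data, k)` as from `data[k]`. A regression test for a stream like the one described would assert this for several non-sequential offsets.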



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread Thomas Marquardt (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466843#comment-16466843
 ] 

Thomas Marquardt commented on HADOOP-15446:
---

For branch-2, I've attached HADOOP-15446-branch-2.001.patch.  All tests are 
passing against my storage account:

*$ mvn test -Dtest=ITestPageBlobInputStream#**

[INFO] Running org.apache.hadoop.fs.azure.ITestPageBlobInputStream
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.212 s 
- in org.apache.hadoop.fs.azure.ITestPageBlobInputStream

 

*$ mvn -T 1C clean verify*

[INFO] Results:
[INFO]
[WARNING] Tests run: 233, Failures: 0, Errors: 0, Skipped: 4


[INFO] Results:
[INFO]
[WARNING] Tests run: 570, Failures: 0, Errors: 0, Skipped: 12




[jira] [Updated] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread Thomas Marquardt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Marquardt updated HADOOP-15446:
--
Attachment: HADOOP-15446-branch-2.001.patch




[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466836#comment-16466836
 ] 

genericqa commented on HADOOP-13649:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 32s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 19s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-13649 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922382/HADOOP-13649.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 48d7bc95e671 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 08ea90e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14599/testReport/ |
| Max. process+thread count | 352 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14599/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
>   

[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466798#comment-16466798
 ] 

Aaron Fabbri edited comment on HADOOP-13649 at 5/8/18 3:07 AM:
---

Nice work [~gabor.bota], this looks good. I hope you don't mind, I've attached 
a v3 of the patch with a couple of tweaks:

- Fix the checkstyle issue.
- TTL is optional (if the TTL config is zero, it is disabled)
- TTL units seconds -> milliseconds (in case anyone wants to play with very 
short TTLs, which I think are interesting)
- Minor test changes to make sure they still work with zero TTL
- Added "Evolving" API annotations to the "undocumented" configs used for 
LocalMetadataStore.

Please review and give a "+1 (nonbinding)" if you like these changes.  I tested 
in US West 2.
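
The TTL semantics listed above (millisecond units, zero meaning disabled) can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: the class name, method name, and parameters are hypothetical, and the real LocalMetadataStore may track entry age differently.

```java
// Hypothetical sketch of an optional, millisecond-based TTL check; not taken from HADOOP-13649.
public class TtlSketch {

    /**
     * Returns true if an entry created at createdMillis has reached ttlMillis of age
     * as of nowMillis. A TTL of zero (or less) disables expiry entirely.
     */
    static boolean isExpired(long createdMillis, long ttlMillis, long nowMillis) {
        if (ttlMillis <= 0) {
            return false; // TTL disabled: entries never expire
        }
        return nowMillis - createdMillis >= ttlMillis;
    }
}
```

Passing the current time in as a parameter, rather than calling `System.currentTimeMillis()` inside the check, keeps the sketch deterministic and makes very short TTLs easy to test.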


was (Author: fabbri):
Nice work [~gabor.bota], this looks good. I hope you don't mind, I've attached 
a v3 of the patch with a couple of tweaks:

- Fix the checkstyle issue.
- TTL is optional (if the TTL config is zero, it is disabled)
- TTL units seconds -> milliseconds (in case anyone wants to play with very 
short TTLs, which I think are interesting)
- Minor test changes to make sure they still work with zero TTL

Please review and give a "+1 (nonbinding)" if you like these changes.  I tested 
in US West 2.

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch, 
> HADOOP-13649.003.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore






[jira] [Updated] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-13649:
--
Attachment: HADOOP-13649.003.patch




[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466798#comment-16466798
 ] 

Aaron Fabbri commented on HADOOP-13649:
---

Nice work [~gabor.bota], this looks good. I hope you don't mind, I've attached 
a v3 of the patch with a couple of tweaks:

- Fix the checkstyle issue.
- TTL is optional (if the TTL config is zero, it is disabled)
- TTL units seconds -> milliseconds (in case anyone wants to play with very 
short TTLs, which I think are interesting)
- Minor test changes to make sure they still work with zero TTL

Please review and give a "+1 (nonbinding)" if you like these changes.  I tested 
in US West 2.




[jira] [Comment Edited] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests

2018-05-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466764#comment-16466764
 ] 

Aaron Fabbri edited comment on HADOOP-15420 at 5/8/18 2:44 AM:
---

Thank you for working on this issue [~gabor.bota]. Good work identifying the 
bug. A couple of comments on the v1 patch:
{noformat}
   private boolean expired(FileStatus status, long expiry, String keyPrefix) {
+// remove the protocol from path string to be able to compare
+String bucket = status.getPath().toUri().getHost();

+  statusTranslatedPath = status.getPath().toUri().getPath();
+}
+
{noformat}

Can you use helper func {{standardize(Path)}} here instead?
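
For readers unfamiliar with the comparison being discussed, the idea of dropping the scheme and bucket from a path before comparing can be sketched with plain `java.net.URI`. This is not the `standardize(Path)` helper itself, whose behaviour is not shown in this thread; the class and method names below are hypothetical.

```java
import java.net.URI;

// Hypothetical sketch: reduce a fully qualified object-store URI to its path component.
public class PathKeySketch {

    /** Returns only the path part, e.g. "s3a://bucket/dir/file" becomes "/dir/file". */
    static String stripSchemeAndBucket(String location) {
        return URI.create(location).getPath();
    }

    /** Returns the host part of the URI, which for s3a-style URIs is the bucket name. */
    static String bucketOf(String location) {
        return URI.create(location).getHost();
    }
}
```

With both sides reduced to the same form, a key prefix can be compared against a path directly instead of against the full `s3a://...` string.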

Thanks for moving {{ testDiffCommand() }} to the base class.  Did you test this 
with Dynamo?  (`mvn clean test -Ds3guard -Ddynamo`) Unfortunately, the DynamoDB 
metadata store tests still use the local (in-memory) test Dynamo (until we 
finish HADOOP-14918). 



was (Author: fabbri):
Thank you for working on this issue [~gabor.bota]. Good work identifying the 
bug. A couple of comments on the v1 patch:
{noformat}
   private boolean expired(FileStatus status, long expiry, String keyPrefix) {
+// remove the protocol from path string to be able to compare
+String bucket = status.getPath().toUri().getHost();

+  statusTranslatedPath = status.getPath().toUri().getPath();
+}
+
{noformat}

Can you use helper func {{standardize(Path)}} here instead?

Thanks for moving {{ testDiffCommand() }} to the base class.  Did you test this 
with Dynamo?  (`mvn clean test -Ds3guard -Ddynamo`) Unfortunately, the DynamoDB 
metadata store tests still use the local (in-memory) test Dynamo (until we 
finish HADOOP-14918). 

Also, a reminder: please declare which AWS region you ran the integration tests in. 

> s3guard ITestS3GuardToolLocal failures in diff tests
> 
>
> Key: HADOOP-15420
> URL: https://issues.apache.org/jira/browse/HADOOP-15420
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch
>
>
> Noticed this when testing the patch for HADOOP-13756.
>  
> {code:java}
> [ERROR] Failures:
> [ERROR]   
> ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88
>  Pruned children count 
> [PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale;
>  isDirectory=false; length=100; replication=1; blocksize=512; 
> modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; 
> permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
> isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
> isDeleted=false}, 
> PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh;
>  isDirectory=false; length=100; replication=1; blocksize=512; 
> modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; 
> permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
> isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
> isDeleted=false}] expected:<1> but was:<2>{code}
>  
> Looking through the code, I'm noticing a couple of issues.
>  
> 1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should 
> really be running for all MetadataStore implementations.  Seems like it 
> should live in {{AbstractS3GuardToolTestBase}}.
> 2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. When 
> {{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, 
> but the fs is initialized with a MetadataStore present, so it seems like the fs 
> will still put the file in the MetadataStore?
> There are other tests which explicitly go around the MetadataStore by using 
> {{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do 
> something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding 
> any issues with parallel test runs.






[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466769#comment-16466769
 ] 

genericqa commented on HADOOP-14444:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 35 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 53s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 27s{color} | {color:red} hadoop-tools in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 20s{color} | {color:orange} root: The patch generated 4 new + 7 unchanged - 0 fixed = 11 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 28s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 6s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 24s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 56s{color} | {color:red} hadoop-ftp in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 9s{color} | {color:red} hadoop-tools in the patch failed. 

[jira] [Updated] (HADOOP-15399) KMSAcls should read kms-site.xml file.

2018-05-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HADOOP-15399:

Fix Version/s: (was: 2.8.4)

> KMSAcls should read kms-site.xml file.
> --
>
> Key: HADOOP-15399
> URL: https://issues.apache.org/jira/browse/HADOOP-15399
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Major
>
> KMSACLs uses {{AccessControlList}} for authorization.
> For creating groups membership, the group implementation class that will be 
> instantiated is configured by {{hadoop.security.group.mapping}}.
> Today {{KMSACLs}} class reads only {{kms-acls.xml}} file to create 
> {{AccessControlList}}.
> {{kms-acls.xml}} doesn't look like the right place to add the above config.
> So KMSACLs should read either kms-site.xml or core-site.xml.
> [~xiaochen]: Any preference for which file the ACLs should load?
> IMO it should be kms-site.xml because that file is mandatory. But since all the 
> properties in kms-site.xml start with {{hadoop.kms}}, I am a little bit 
> inclined towards core-site.xml.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests

2018-05-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466764#comment-16466764
 ] 

Aaron Fabbri commented on HADOOP-15420:
---

Thank you for working on this issue [~gabor.bota]. Good work identifying the 
bug. A couple of comments on the v1 patch:
{noformat}
   private boolean expired(FileStatus status, long expiry, String keyPrefix) {
+// remove the protocol from path string to be able to compare
+String bucket = status.getPath().toUri().getHost();

+  statusTranslatedPath = status.getPath().toUri().getPath();
+}
+
{noformat}

Can you use helper func {{standardize(Path)}} here instead?

Thanks for moving {{testDiffCommand()}} to the base class.  Did you test this 
with Dynamo?  (`mvn clean test -Ds3guard -Ddynamo`) Unfortunately, the DynamoDB 
MetadataStore tests still run against the local (in-memory) test DynamoDB 
(until we finish HADOOP-14918). 

Also, a reminder: please declare which AWS region you ran the integration tests in. 
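
The path comparison discussed above (dropping the protocol and bucket before comparing against the key prefix) can be sketched with plain java.net.URI. This is only an illustration under stated assumptions: it is not the patch and not the {{standardize(Path)}} helper, and the class and method names are hypothetical.

```java
import java.net.URI;

/** Hypothetical helper: compares a fully qualified object URI with a key prefix. */
final class PathKeySketch {

  /** Returns only the path component, e.g. "s3a://bucket/a/b" becomes "/a/b". */
  static String stripSchemeAndBucket(String qualifiedPath) {
    URI uri = URI.create(qualifiedPath);
    String path = uri.getPath();
    // Opaque or root-only URIs have no usable path; treat them as the root.
    return (path == null || path.isEmpty()) ? "/" : path;
  }

  /** True if the path, ignoring scheme and bucket, falls under keyPrefix. */
  static boolean underPrefix(String qualifiedPath, String keyPrefix) {
    return stripSchemeAndBucket(qualifiedPath).startsWith(keyPrefix);
  }

  public static void main(String[] args) {
    String stale = "s3a://bucket-new/test/testPruneCommandCLI/stale";
    System.out.println(stripSchemeAndBucket(stale));
    System.out.println(underPrefix(stale, "/test/testPruneCommandCLI"));
  }
}
```

Keeping this logic in one shared helper, rather than repeating it inline, is the point of the {{standardize(Path)}} suggestion.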

> s3guard ITestS3GuardToolLocal failures in diff tests
> 
>
> Key: HADOOP-15420
> URL: https://issues.apache.org/jira/browse/HADOOP-15420
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch
>
>
> Noticed this when testing the patch for HADOOP-13756.
>  
> {code:java}
> [ERROR] Failures:
> [ERROR]   
> ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88
>  Pruned children count 
> [PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale;
>  isDirectory=false; length=100; replication=1; blocksize=512; 
> modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; 
> permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
> isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
> isDeleted=false}, 
> PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh;
>  isDirectory=false; length=100; replication=1; blocksize=512; 
> modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; 
> permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
> isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
> isDeleted=false}] expected:<1> but was:<2>{code}
>  
> Looking through the code, I'm noticing a couple of issues.
>  
> 1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should 
> really be running for all MetadataStore implementations.  Seems like it 
> should live in {{AbstractS3GuardToolTestBase}}.
> 2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. When 
> {{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, 
> but the fs is initialized with a MetadataStore present, so it seems like the fs 
> will still put the file in the MetadataStore?
> There are other tests which explicitly go around the MetadataStore by using 
> {{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do 
> something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding 
> any issues with parallel test runs.






[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation

2018-05-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HADOOP-15450:

Target Version/s: 3.1.1, 2.9.2, 3.0.3, 2.8.5  (was: 2.8.4, 3.1.1, 2.9.2, 
3.0.3)

> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -
>
> Key: HADOOP-15450
> URL: https://issues.apache.org/jira/browse/HADOOP-15450
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Arpit Agarwal
>Priority: Blocker
>
> Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
> # When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the 
> drive is marked bad and replication happens. This makes the cluster-wide space 
> problem worse. If the number of "failed" drives exceeds the DFIP limit, the 
> datanode shuts down.
> # There are non-HDFS users of DiskChecker, who use it proactively, not just 
> on failures. This was fine before, but now it incurs heavy I/O due to the 
> introduction of fsync() in the code.






[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO

2018-05-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466748#comment-16466748
 ] 

Junping Du commented on HADOOP-13738:
-

Reverted it from branch-2.8.4 but kept it on branch-2.8, as I plan to kick off 
2.8.4 RC0 today. We can leave the work to 2.8.5.

> DiskChecker should perform some disk IO
> ---
>
> Key: HADOOP-13738
> URL: https://issues.apache.org/jira/browse/HADOOP-13738
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.5
>
> Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, 
> HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, 
> HADOOP-13738.05.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. 
> We have seen this in real clusters. DiskChecker performs simple 
> permissions-based checks on directories which do not guarantee that any disk 
> IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.
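
The "write some data and flush it to the disk" idea can be sketched with the JDK alone. This is a minimal illustration, not the DiskChecker implementation; the class name, the method name {{probeWrite}} and the temp-file naming are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical sketch of a directory check that forces real disk IO. */
final class DiskIoCheckSketch {

  /**
   * Writes one byte to a temporary file in dir and syncs it to the device.
   * Throws IOException if creating, writing or syncing the file fails.
   */
  static void probeWrite(File dir) throws IOException {
    File probe = File.createTempFile("disk-check-", ".tmp", dir);
    try {
      try (FileOutputStream out = new FileOutputStream(probe)) {
        out.write(1);
        out.flush();
        // FileDescriptor.sync() asks the OS to push buffered data to the device.
        out.getFD().sync();
      }
    } finally {
      // Always remove the probe file, even when the write or sync failed.
      Files.deleteIfExists(probe.toPath());
    }
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("disk-check-demo").toFile();
    probeWrite(dir);
    System.out.println("disk IO check passed for " + dir);
    dir.delete();
  }
}
```

The sync call is what makes the check meaningful, and it is also the expensive part: HADOOP-15450 reports heavy I/O from callers that run such a check proactively.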






[jira] [Updated] (HADOOP-13738) DiskChecker should perform some disk IO

2018-05-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HADOOP-13738:

Fix Version/s: (was: 2.8.4)
   2.8.5

> DiskChecker should perform some disk IO
> ---
>
> Key: HADOOP-13738
> URL: https://issues.apache.org/jira/browse/HADOOP-13738
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.5
>
> Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, 
> HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, 
> HADOOP-13738.05.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. 
> We have seen this in real clusters. DiskChecker performs simple 
> permissions-based checks on directories which do not guarantee that any disk 
> IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.






[jira] [Commented] (HADOOP-15449) ZK performance issues causing frequent Namenode failover

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466685#comment-16466685
 ] 

genericqa commented on HADOOP-15449:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
67m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
14s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}119m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15449 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/1293/HADOOP-15449.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  |
| uname | Linux e2eea1bc6f6f 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14597/testReport/ |
| Max. process+thread count | 1513 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14597/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> ZK performance issues causing frequent Namenode failover 
> -
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>

[jira] [Commented] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466624#comment-16466624
 ] 

genericqa commented on HADOOP-15420:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
40s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15420 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922327/HADOOP-15420.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f57ac7a5353d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14598/testReport/ |
| Max. process+thread count | 345 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14598/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> s3guard ITestS3GuardToolLocal failures in diff tests
> 
>
> Key: HADOOP-15420
> URL: 

[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread Thomas Marquardt (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466557#comment-16466557
 ] 

Thomas Marquardt commented on HADOOP-15446:
---

Yes, I will submit a branch-2 patch later today.

> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 2.9.0, 3.0.2
>Reporter: Thomas Marquardt
>Assignee: Thomas Marquardt
>Priority: Major
> Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, 
> HADOOP-15446-003.patch
>
>
> Page Blobs are primarily used by HBASE.  HBASE replication, which apparently 
> has not been used with WASB until recently, performs non-sequential reads on 
> log files using PageBlobInputStream.  There are bugs in this stream 
> implementation which prevent skip and seek from working properly, and 
> eventually the stream state becomes corrupt and unusable.
> I believe this bug affects all releases of WASB/HADOOP.  It appears to be a 
> day-0 bug in PageBlobInputStream.  There were similar bugs opened in the past 
> (HADOOP-15042) but the issue was not properly fixed, and no test coverage was 
> added.
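
The behaviour that skip has to preserve on a buffered stream can be sketched as follows. This is not PageBlobInputStream and makes no claim about the actual fix; it is a hypothetical in-memory stand-in in which a byte array plays the role of the remote blob and fixed-size chunks play the role of downloaded pages.

```java
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical buffered stream; a byte array stands in for the remote blob. */
final class BufferedSkipSketch extends InputStream {
  private final byte[] source;     // pretend remote data
  private final int chunkSize;     // bytes fetched per "download"
  private byte[] buffer = new byte[0];
  private int bufferPos = 0;       // next unread index in buffer
  private int nextFetch = 0;       // offset in source of the next chunk

  BufferedSkipSketch(byte[] source, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.source = source;
    this.chunkSize = chunkSize;
  }

  /** Loads the next chunk; returns false at end of data. */
  private boolean fill() {
    if (nextFetch >= source.length) {
      return false;
    }
    int len = Math.min(chunkSize, source.length - nextFetch);
    buffer = Arrays.copyOfRange(source, nextFetch, nextFetch + len);
    bufferPos = 0;
    nextFetch += len;
    return true;
  }

  @Override
  public int read() {
    if (bufferPos >= buffer.length && !fill()) {
      return -1;
    }
    return buffer[bufferPos++] & 0xff;
  }

  @Override
  public long skip(long n) {
    if (n <= 0) {
      return 0;
    }
    // Consume already-buffered bytes first so position and buffer stay in step.
    long fromBuffer = Math.min(n, buffer.length - bufferPos);
    bufferPos += (int) fromBuffer;
    // Then advance the fetch offset, clamped to the end of the data.
    long fromSource = Math.min(n - fromBuffer, source.length - nextFetch);
    nextFetch += (int) fromSource;
    return fromBuffer + fromSource;  // bytes actually skipped
  }

  /** Logical position of the next byte that read() would return. */
  long getPos() {
    return nextFetch - (buffer.length - bufferPos);
  }

  public static void main(String[] args) {
    byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    BufferedSkipSketch in = new BufferedSkipSketch(data, 4);
    System.out.println(in.read());     // first byte
    System.out.println(in.skip(5));    // crosses a chunk boundary
    System.out.println(in.read());     // byte at the new position
    System.out.println(in.skip(100));  // clamped to the remaining bytes
  }
}
```

The two invariants worth testing are that skip returns the number of bytes actually skipped and that the logical position stays consistent with what the next read returns, including when a skip crosses a buffer boundary.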






[jira] [Commented] (HADOOP-15441) After HADOOP-14445, encryption zone operations print unnecessary INFO logs

2018-05-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466486#comment-16466486
 ] 

Xiao Chen commented on HADOOP-15441:


Changed links based on Wei-Chiu's comment

> After HADOOP-14445, encryption zone operations print unnecessary INFO logs
> --
>
> Key: HADOOP-15441
> URL: https://issues.apache.org/jira/browse/HADOOP-15441
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15441.001.patch, HADOOP-15441.002.patch
>
>
> It looks like after HADOOP-14445, any encryption zone operation prints extra 
> INFO log messages as follows:
> {code:java}
> $ hdfs dfs -copyFromLocal /etc/krb5.conf /scale/
> 18/05/02 11:54:55 INFO kms.KMSClientProvider: KMSClientProvider for KMS url: 
> https://hadoop3-1.example.com:16000/kms/v1/ delegation token service: 
> kms://ht...@hadoop3-1.example.com:16000/kms created.
> {code}
> It might make sense to make it a DEBUG message instead.






[jira] [Updated] (HADOOP-15408) HADOOP-14445 broke Spark.

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15408:
---
Resolution: Invalid
Status: Resolved  (was: Patch Available)

With HADOOP-14445 reverted (see 
[discussion|https://issues.apache.org/jira/browse/HADOOP-14445?focusedCommentId=16464600=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16464600]),
 this is no longer an issue.
Thanks all for the report and investigation.

> HADOOP-14445 broke Spark.
> -
>
> Key: HADOOP-15408
> URL: https://issues.apache.org/jira/browse/HADOOP-15408
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Rushabh S Shah
>Priority: Blocker
> Attachments: HADOOP-15408-trunk.001.patch, 
> HADOOP-15408.trunk.poc.patch, split.patch, split.prelim.patch
>
>
> Spark bundles hadoop related jars in their package.
>  Spark expects backwards compatibility between minor versions.
>  Their job failed after we deployed HADOOP-14445 in our test cluster.
> {noformat}
> 2018-04-20 21:09:53,245 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
> 2018-04-20 21:09:53,273 ERROR [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.util.ServiceConfigurationError: 
> org.apache.hadoop.security.token.TokenIdentifier: Provider 
> org.apache.hadoop.crypto.key.kms.KMSDelegationToken$
> KMSLegacyDelegationTokenIdentifier could not be instantiated
> at java.util.ServiceLoader.fail(ServiceLoader.java:232)
> at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at 
> org.apache.hadoop.security.token.Token.getClassForIdentifier(Token.java:117)
> at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:138)
> at org.apache.hadoop.security.token.Token.identifierToString(Token.java:393)
> at org.apache.hadoop.security.token.Token.toString(Token.java:413)
> at java.lang.String.valueOf(String.java:2994)
> at 
> org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:155)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1634)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1583)
> Caused by: java.lang.NoSuchFieldError: TOKEN_LEGACY_KIND
> at 
> org.apache.hadoop.crypto.key.kms.KMSDelegationToken$KMSLegacyDelegationTokenIdentifier.<init>(KMSDelegationToken.java:64)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at java.lang.Class.newInstance(Class.java:442)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
> ... 10 more
> 2018-04-20 21:09:53,278 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1
> {noformat}
> Their classpath looks like 
> {{\{...:hadoop-common-pre-HADOOP-14445.jar:.:hadoop-common-with-HADOOP-14445.jar:\}}}
> This is because the container loaded the {{KMSDelegationToken}} class from an 
> older jar and {{KMSLegacyDelegationTokenIdentifier}} from the new jar, and it 
> fails when {{KMSLegacyDelegationTokenIdentifier}} tries to read 
> {{TOKEN_LEGACY_KIND}} from {{KMSDelegationToken}}, which did not exist in the older jar.
>  Cc [~xiaochen]
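
When two versions of a jar sit on the same classpath, it helps to check which one a class was actually loaded from. The sketch below uses only standard JDK calls; the class name {{WhichJarSketch}} is hypothetical, and on a real cluster one would pass the classes named in the report instead of the demo classes used here.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class was loaded from. */
final class WhichJarSketch {

  /** Returns the jar or directory URL for cls, or a marker if none is recorded. */
  static String locationOf(Class<?> cls) {
    CodeSource source = cls.getProtectionDomain().getCodeSource();
    // JDK core classes usually have no code source, so guard against null.
    return (source == null || source.getLocation() == null)
        ? "(bootstrap or unknown)"
        : source.getLocation().toString();
  }

  public static void main(String[] args) {
    // On a cluster, the outer class and the nested identifier class from the
    // stack trace would be passed here to see whether they come from different jars.
    System.out.println(String.class.getName() + " <- " + locationOf(String.class));
    System.out.println(WhichJarSketch.class.getName() + " <- "
        + locationOf(WhichJarSketch.class));
  }
}
```

If an outer class and its nested class report different locations, a mixed-version classpath like the one described above is the likely cause.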






[jira] [Updated] (HADOOP-15431) KMSTokenRenewer should work with KMS_DELEGATION_TOKEN which has ip:port as service

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15431:
---
Resolution: Invalid
Status: Resolved  (was: Patch Available)

With HADOOP-14445 reverted (see 
[discussion|https://issues.apache.org/jira/browse/HADOOP-14445?focusedCommentId=16464600=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16464600]),
 this is no longer an issue.
Thanks all for the report and investigation.

> KMSTokenRenewer should work with KMS_DELEGATION_TOKEN which has ip:port as 
> service
> --
>
> Key: HADOOP-15431
> URL: https://issues.apache.org/jira/browse/HADOOP-15431
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Blocker
> Attachments: HADOOP-15431.01.patch, HADOOP-15431.02.patch
>
>
> Saw a test failure where an MR job failed to submit.
> RM log has:
> {noformat}
> 2018-04-30 15:00:17,864 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer.
> java.lang.IllegalArgumentException: Invalid token service IP_ADDR:16000
> at 
> org.apache.hadoop.util.KMSUtil.createKeyProviderFromTokenService(KMSUtil.java:237)
> at 
> org.apache.hadoop.crypto.key.kms.KMSTokenRenewer.createKeyProvider(KMSTokenRenewer.java:100)
> at 
> org.apache.hadoop.crypto.key.kms.KMSTokenRenewer.renew(KMSTokenRenewer.java:57)
> at org.apache.hadoop.security.token.Token.renew(Token.java:414)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:590)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:587)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:585)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:894)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:871)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> while client log has
> {noformat}
> 18/04/30 15:53:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1525128478242_0001
> 18/04/30 15:53:28 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, 
> Service: ha-hdfs:ns1, Ident: (token for systest: HDFS_DELEGATION_TOKEN 
> owner=syst...@example.com, renewer=yarn, realUser=, issueDate=1525128807236, 
> maxDate=1525733607236, sequenceNumber=1038, masterKeyId=20)
> 18/04/30 15:53:28 INFO mapreduce.JobSubmitter: Kind: HBASE_AUTH_TOKEN, 
> Service: 621a942b-292f-493d-ba50-f9b783704359, Ident: 
> (org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@0)
> 18/04/30 15:53:28 INFO mapreduce.JobSubmitter: Kind: KMS_DELEGATION_TOKEN, 
> Service: IP_ADDR:16000, Ident: 00 07 73 79 73 74 65 73 74 04 79 61 72 6e 00 
> 8a 01 63 18 c2 c3 d5 8a 01 63 3c cf 47 d5 8e 01 ec 10
> 18/04/30 15:53:29 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/systest/.staging/job_1525128478242_0001
> 18/04/30 15:53:29 WARN security.UserGroupInformation: 
> PriviledgedActionException as:syst...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: 
> Failed to submit application_1525128478242_0001 to YARN : Invalid token 
> service IP_ADDR:16000
> 18/04/30 15:53:29 INFO client.ConnectionManager$HConnectionImplementation: 
> Closing master protocol: MasterService
> 18/04/30 15:53:29 INFO client.ConnectionManager$HConnectionImplementation: 
> Closing zookeeper sessionid=0x1630ba2d0001cb5
> 18/04/30 15:53:29 INFO zookeeper.ZooKeeper: Session: 

[jira] [Assigned] (HADOOP-15408) HADOOP-14445 broke Spark.

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reassigned HADOOP-15408:
--

Assignee: Rushabh S Shah

> HADOOP-14445 broke Spark.
> -
>
> Key: HADOOP-15408
> URL: https://issues.apache.org/jira/browse/HADOOP-15408
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
> Attachments: HADOOP-15408-trunk.001.patch, 
> HADOOP-15408.trunk.poc.patch, split.patch, split.prelim.patch
>
>
> Spark bundles hadoop related jars in their package.
>  Spark expects backwards compatibility between minor versions.
>  Their job failed after we deployed HADOOP-14445 in our test cluster.
> {noformat}
> 2018-04-20 21:09:53,245 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
> 2018-04-20 21:09:53,273 ERROR [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.util.ServiceConfigurationError: 
> org.apache.hadoop.security.token.TokenIdentifier: Provider 
> org.apache.hadoop.crypto.key.kms.KMSDelegationToken$
> KMSLegacyDelegationTokenIdentifier could not be instantiated
> at java.util.ServiceLoader.fail(ServiceLoader.java:232)
> at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at 
> org.apache.hadoop.security.token.Token.getClassForIdentifier(Token.java:117)
> at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:138)
> at org.apache.hadoop.security.token.Token.identifierToString(Token.java:393)
> at org.apache.hadoop.security.token.Token.toString(Token.java:413)
> at java.lang.String.valueOf(String.java:2994)
> at 
> org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:155)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1634)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1583)
> Caused by: java.lang.NoSuchFieldError: TOKEN_LEGACY_KIND
> at 
> org.apache.hadoop.crypto.key.kms.KMSDelegationToken$KMSLegacyDelegationTokenIdentifier.<init>(KMSDelegationToken.java:64)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at java.lang.Class.newInstance(Class.java:442)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
> ... 10 more
> 2018-04-20 21:09:53,278 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1
> {noformat}
> Their classpath looks like 
> {{\{...:hadoop-common-pre-HADOOP-14445.jar:.:hadoop-common-with-HADOOP-14445.jar:\}}}
> This is because the container loaded the {{KMSDelegationToken}} class from the 
> older jar and {{KMSLegacyDelegationTokenIdentifier}} from the new jar. 
> Instantiation fails when {{KMSLegacyDelegationTokenIdentifier}} tries to read 
> {{TOKEN_LEGACY_KIND}} from {{KMSDelegationToken}}, a field that does not exist 
> in the older class.
>  Cc [~xiaochen]
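
One way to confirm a mixed classpath like the one described above is to print where each class was loaded from. The following is a minimal sketch and not part of any attached patch; the class name WhichJar and its helper are hypothetical, and it relies only on standard JDK reflection calls.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports the jar or directory a class was loaded from. */
class WhichJar {

  /** Returns the code source location of a class, or a marker for JDK classes. */
  static String locationOf(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
    return source == null ? "<bootstrap>" : String.valueOf(source.getLocation());
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Pass fully qualified class names as arguments, for example the outer
    // KMSDelegationToken class and its nested identifier class.
    for (String name : args) {
      // initialize=false: load the class without running its static initializers.
      Class<?> clazz = Class.forName(name, false, WhichJar.class.getClassLoader());
      System.out.println(name + " -> " + locationOf(clazz));
    }
  }
}
```

Run inside the failing container's classpath with the two class names from the stack trace (the nested class uses the `$` form). If the two lines print different jars, the outer and nested classes come from different Hadoop builds.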



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Fix Version/s: (was: 3.0.3)
   (was: 2.9.2)
   (was: 3.1.1)
   (was: 2.8.4)
   (was: 2.10.0)
   Status: Open  (was: Patch Available)

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 3.0.0-alpha1, 2.8.0
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, 
> HADOOP-14445.revert.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token):
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.
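
The effect of keying the token by address and port can be illustrated with a small stand-alone sketch. It does not use the Hadoop classes: TokenLookupSketch, the host names, and the plain Map standing in for the client's credentials are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a credential store keyed by "host:port". */
class TokenLookupSketch {

  // Stand-in for the client's credentials: service name -> token.
  private final Map<String, String> tokensByService = new HashMap<>();

  /** Mirrors the idea of building the service key from the URL's host and port. */
  static String serviceKey(String host, int port) {
    return host + ":" + port;
  }

  void addToken(String host, int port, String token) {
    tokensByService.put(serviceKey(host, port), token);
  }

  /** Returns the token stored for this exact host and port, or null. */
  String lookup(String host, int port) {
    return tokensByService.get(serviceKey(host, port));
  }

  public static void main(String[] args) {
    TokenLookupSketch creds = new TokenLookupSketch();
    // Token obtained while talking to the first KMS instance.
    creds.addToken("kms1.example.com", 16000, "dt-from-kms1");

    // Same instance: the token is found.
    System.out.println(creds.lookup("kms1.example.com", 16000));
    // Another instance behind the same load-balancing provider: nothing is
    // found, even if the server side could verify the token.
    System.out.println(creds.lookup("kms2.example.com", 16000));
  }
}
```

With this kind of lookup, a token is only reusable across instances if the key identifies the KMS service as a whole rather than a single host and port.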






[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466466#comment-16466466
 ] 

Hudson commented on HADOOP-14445:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14133 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14133/])
Revert "HADOOP-14445. Delegation tokens are not shared between KMS (xiao: rev 
a3a1552c33d5650fbd0a702369fccd21b8c9d3e2)
* (delete) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSTokenRenewer.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSDelegationToken.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (delete) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestKMSUtil.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
* (delete) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/package-info.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticatedURL.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationHandler.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/KMSUtil.java
* (delete) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestKMSClientProvider.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
* (edit) 
hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
* (delete) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSLegacyTokenRenewer.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticator.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer
* (delete) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/KMSUtilFaultInjector.java


> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, 
> HADOOP-14445.revert.patch
>
>

[jira] [Assigned] (HADOOP-15416) s3guard diff assert failure if source path not found

2018-05-07 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota reassigned HADOOP-15416:
---

Assignee: Gabor Bota

> s3guard diff assert failure if source path not found
> 
>
> Key: HADOOP-15416
> URL: https://issues.apache.org/jira/browse/HADOOP-15416
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: s3a with fault injection turned on
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Minor
>
> Got an illegal argument exception trying to do a s3guard diff in a test run. 
> Underlying cause: directory in supplied s3a path didn't exist






[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Fix Version/s: (was: 3.2.0)

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, 
> HADOOP-14445.revert.patch
>
>






[jira] [Comment Edited] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466221#comment-16466221
 ] 

Xiao Chen edited comment on HADOOP-14445 at 5/7/18 7:54 PM:


Reopening Jira as I'm reverting those changes.

Will remove fix versions as I proceed. Some minor conflicts due to 
HADOOP-14188, HADOOP-15390 and HADOOP-15313. Ran {{mvn clean test -DskipShade  
-Dmaven.javadoc.skip=true  -Dtest=TestKMS*,TestDelegationTokenRenewer}} before 
pushing. Attached a trunk and a branch-2.8 version for reference - 3.x lines 
are similar to trunk, and 2.x lines similar to 2.8.

(HDFS-13430 will also be reverted to accommodate this)


was (Author: xiaochen):
Reopening Jira as I'm reverting those changes.

Will remove fix versions as I proceed. Some minor conflicts due to HADOOP-14188 
and HADOOP-15313, so building repo and running touched tests before I push.

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, 
> HADOOP-14445.revert.patch
>
>






[jira] [Updated] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests

2018-05-07 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15420:

Attachment: HADOOP-15420.002.patch

> s3guard ITestS3GuardToolLocal failures in diff tests
> 
>
> Key: HADOOP-15420
> URL: https://issues.apache.org/jira/browse/HADOOP-15420
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch
>
>
> Noticed this when testing the patch for HADOOP-13756.
>  
> {code:java}
> [ERROR] Failures:
> [ERROR]   
> ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88
>  Pruned children count 
> [PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale;
>  isDirectory=false; length=100; replication=1; blocksize=512; 
> modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; 
> permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
> isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
> isDeleted=false}, 
> PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh;
>  isDirectory=false; length=100; replication=1; blocksize=512; 
> modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; 
> permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
> isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; 
> isDeleted=false}] expected:<1> but was:<2>{code}
>  
> Looking through the code, I'm noticing a couple of issues.
>  
> 1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should 
> really be running for all MetadataStore implementations.  Seems like it 
> should live in {{AbstractS3GuardToolTestBase}}.
> 2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. When 
> {{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, 
> but the fs is initialized with a MetadataStore present, so it seems the fs 
> will still put the file in the MetadataStore.
> There are other tests which explicitly go around the MetadataStore by using 
> {{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do 
> something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding 
> any issues with parallel test runs.






[jira] [Commented] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests

2018-05-07 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466387#comment-16466387
 ] 

Gabor Bota commented on HADOOP-15420:
-

Fixed checkstyle issues

> s3guard ITestS3GuardToolLocal failures in diff tests
> 
>
> Key: HADOOP-15420
> URL: https://issues.apache.org/jira/browse/HADOOP-15420
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch
>
>






[jira] [Updated] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-15390:
---
Fix Version/s: 2.10.0

> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>
> When looking at a recent issue with [~rkanter] and [~yufeigu], we found that 
> the RM log in a cluster was flooded by KMS token renewal errors below:
> {noformat}
> $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> ...
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> {noformat}
> Further inspection shows the KMS IP is from another cluster. The RM runs a 
> version from before HADOOP-14445, so it needs to read the KMS provider from 
> config. The config rightfully doesn't have the other cluster's KMS configured.
> Although HADOOP-14445 will make this a non-issue by creating the provider 
> from the token service, we should fix 2 things here:
> - The KMS token renewer should throw instead of returning 0. Returning 0 when 
> unable to renew should be considered a bug in the renewer.
> - Yarn RM's {{DelegationTokenRenewer}} service should validate the return 
> value and not go into this busy loop.
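
The negative delay in the log above is what results from treating a return value of 0 as an expiry time. A minimal sketch of that arithmetic and of a validation step follows; RenewalCheck and its methods are hypothetical and are not the actual DelegationTokenRenewer code.

```java
import java.io.IOException;

/** Hypothetical sketch of validating a token renewer's return value. */
class RenewalCheck {

  /** Delay until the next renewal, computed as expiry minus current time. */
  static long delayMillis(long expiryMillis, long nowMillis) {
    return expiryMillis - nowMillis;
  }

  /** Rejects an expiry that is not in the future instead of scheduling it. */
  static long validatedDelay(long expiryMillis, long nowMillis) throws IOException {
    long delay = delayMillis(expiryMillis, nowMillis);
    if (delay <= 0) {
      throw new IOException("Renewer returned expiry " + expiryMillis
          + ", which is not after now (" + nowMillis + ")");
    }
    return delay;
  }

  public static void main(String[] args) {
    // Current time implied by the log's delay of -1523446449367 ms.
    long now = 1523446449367L;
    // An expiry of 0 yields the negative delay seen in the log.
    System.out.println(delayMillis(0L, now)); // prints -1523446449367
    try {
      validatedDelay(0L, now);
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Either side can apply the check: the renewer by throwing when it cannot renew, or the caller by refusing to schedule a renewal whose delay is not positive.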






[jira] [Commented] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-05-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466369#comment-16466369
 ] 

Xiao Chen commented on HADOOP-15390:


Just found out this was missing from branch-2, cherry-picked there.

> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>






[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation

2018-05-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15450:

Target Version/s: 2.8.4, 3.1.1, 2.9.2, 3.0.3
   Fix Version/s: (was: 2.8.4)

> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -
>
> Key: HADOOP-15450
> URL: https://issues.apache.org/jira/browse/HADOOP-15450
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Arpit Agarwal
>Priority: Blocker
>
> Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
> # When space is low, the OS returns ENOSPC. Instead of simply stopping writes, 
> the drive is marked bad and replication happens. This makes the cluster-wide 
> space problem worse. If the number of "failed" drives exceeds the DFIP limit, 
> the datanode shuts down.
> # There are non-HDFS users of DiskChecker, who use it proactively, not just 
> on failures. This was fine before, but now it incurs heavy I/O due to the 
> introduction of fsync() in the code.
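
One possible way to treat a full disk differently from a failed one is to look at the usable space before doing a write-based check. This is only a sketch under that assumption, not the fix attached to this issue; DiskSpaceProbe and the threshold parameter are hypothetical, while Files.getFileStore and FileStore.getUsableSpace are standard JDK calls.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical probe that separates "disk full" from "disk bad". */
class DiskSpaceProbe {

  /** True if the volume holding dir reports at least minFreeBytes usable. */
  static boolean hasRoom(Path dir, long minFreeBytes) throws IOException {
    FileStore store = Files.getFileStore(dir);
    return store.getUsableSpace() >= minFreeBytes;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    long oneMiB = 1024L * 1024L;
    if (hasRoom(dir, oneMiB)) {
      System.out.println(dir + ": enough space, a write-based check is meaningful");
    } else {
      // Low space is reported, but the volume is not treated as failed.
      System.out.println(dir + ": low on space, skip the write check and stop writing");
    }
  }
}
```

Querying the file store does not write or sync anything, so a probe like this would also be cheap for callers that run the check proactively.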






[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation

2018-05-07 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HADOOP-15450:

Fix Version/s: 2.8.4

> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -
>
> Key: HADOOP-15450
> URL: https://issues.apache.org/jira/browse/HADOOP-15450
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Arpit Agarwal
>Priority: Blocker
> Fix For: 2.8.4
>
>






[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread Lukas Waldmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Waldmann updated HADOOP-14444:

Attachment: (was: HADOOP-14444.15.patch)

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
>Priority: Major
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, 
> HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, 
> HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, 
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, 
> HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe limitations 
> and performance issues when dealing with a high number of files. My patch 
> solves those issues and integrates both filesystems in such a way that most of the 
> core functionality is common to both, which simplifies maintenance.
> The core features:
>  * Support for HTTP/SOCKS proxies
>  * Support for passive FTP
>  * Support for explicit FTPS (SSL/TLS)
>  * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
>  For a huge number of files it shows an order of magnitude performance improvement 
> over non-pooled connections.
>  * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
>  Again, for a huge number of files it shows an order of magnitude performance 
> improvement over non-cached connections.
>  * Support for keep-alive (NOOP) messages to avoid connection drops
>  * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across a whole directory tree
>  * Support for re-establishing broken FTP data transfers - which can happen 
> surprisingly often
>  * Support for SFTP private keys (including passphrase)
>  * Support for keeping passwords, private keys and passphrases in JCEKS 
> key stores
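
The connection pooling item above can be illustrated with a small generic sketch. This is not code from the attached patch; ConnectionPoolSketch and its method names are hypothetical, and a real pool would also need a size limit, liveness checks and a way to close idle connections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical minimal pool: reuse an idle connection if there is one. */
final class ConnectionPoolSketch<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;
    private int created = 0;

    ConnectionPoolSketch(Supplier<C> factory) {
        this.factory = factory;
    }

    /** Returns an idle connection, or creates a new one if none is idle. */
    synchronized C borrow() {
        C c = idle.pollFirst();
        if (c == null) {
            c = factory.get();
            created++;
        }
        return c;
    }

    /** Hands a connection back so the next command can reuse it. */
    synchronized void release(C c) {
        idle.addFirst(c);
    }

    /** Number of connections the factory has been asked to create. */
    synchronized int createdCount() {
        return created;
    }
}
```

With a borrow/release pair around every command, a sequence of commands issued one after another shares a single connection; a second connection is created only when a command starts while the first is still in use.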






[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread Lukas Waldmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Waldmann updated HADOOP-14444:

Attachment: HADOOP-14444.15.patch

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
>Priority: Major
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, 
> HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, 
> HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, 
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, 
> HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe limitations 
> and performance issues when dealing with a high number of files. My patch 
> solves those issues and integrates both filesystems in such a way that most of the 
> core functionality is common to both, which simplifies maintenance.
> The core features:
>  * Support for HTTP/SOCKS proxies
>  * Support for passive FTP
>  * Support for explicit FTPS (SSL/TLS)
>  * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
>  For a huge number of files it shows an order of magnitude performance improvement 
> over non-pooled connections.
>  * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
>  Again, for a huge number of files it shows an order of magnitude performance 
> improvement over non-cached connections.
>  * Support for keep-alive (NOOP) messages to avoid connection drops
>  * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across a whole directory tree
>  * Support for re-establishing broken FTP data transfers - which can happen 
> surprisingly often
>  * Support for SFTP private keys (including passphrase)
>  * Support for keeping passwords, private keys and passphrases in JCEKS 
> key stores






[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Status: Patch Available  (was: Reopened)

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 3.0.0-alpha1, 2.8.0
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, 
> HADOOP-14445.revert.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   // The token is looked up by the host and port of the URL being opened.
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.
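
To illustrate why the tokens are not shared, here is a minimal stand-alone sketch. It does not use the Hadoop classes: TokenLookupSketch, the host names and the port are made up, and the service key is simplified to "host:port" (the string produced by the real SecurityUtil.buildTokenService may differ). A token stored under the address of one KMS instance is simply not found when the client talks to another instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a credentials store keyed by service address. */
final class TokenLookupSketch {
    private final Map<String, String> tokensByService = new HashMap<>();

    /** Simplified service key; an assumption made for this sketch. */
    static String serviceKey(String host, int port) {
        return host + ":" + port;
    }

    /** Stores a token under the address of the instance that issued it. */
    void addToken(String host, int port, String token) {
        tokensByService.put(serviceKey(host, port), token);
    }

    /** Looks a token up by the address being contacted; null if absent. */
    String tokenFor(String host, int port) {
        return tokensByService.get(serviceKey(host, port));
    }
}
```

For example, after addToken("kms1.example.com", 9600, "token-A"), a lookup for kms1.example.com:9600 returns the token, while a lookup for kms2.example.com:9600 returns null even though both instances could verify the same token.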






[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.branch-2.8.revert.patch

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, 
> HADOOP-14445.revert.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   // The token is looked up by the host and port of the URL being opened.
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.






[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.revert.patch

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.revert.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   // The token is looked up by the host and port of the URL being opened.
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.






[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation

2018-05-07 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HADOOP-15450:

Priority: Blocker  (was: Major)

> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -
>
> Key: HADOOP-15450
> URL: https://issues.apache.org/jira/browse/HADOOP-15450
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Arpit Agarwal
>Priority: Blocker
>
> Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
> # When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the 
> drive is marked bad and replication happens. This makes the cluster-wide space 
> problem worse. If the number of "failed" drives exceeds the DFIP limit, the 
> datanode shuts down.
> # There are non-HDFS users of DiskChecker, who use it proactively, not just 
> on failures. This was fine before, but now it incurs heavy I/O due to the 
> introduction of fsync() in the code.






[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread Duo Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466250#comment-16466250
 ] 

Duo Xu commented on HADOOP-15446:
-

[~tmarquardt]  & [~ste...@apache.org]

 

Could we backport this to branch-2? Thanks!

> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 2.9.0, 3.0.2
>Reporter: Thomas Marquardt
>Assignee: Thomas Marquardt
>Priority: Major
> Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, 
> HADOOP-15446-003.patch
>
>
> Page Blobs are primarily used by HBASE.  HBASE replication, which apparently 
> has not been used with WASB until recently, performs non-sequential reads on 
> log files using PageBlobInputStream.  There are bugs in this stream 
> implementation which prevent skip and seek from working properly, and 
> eventually the stream state becomes corrupt and unusable.
> I believe this bug affects all releases of WASB/HADOOP.  It appears to be a 
> day-0 bug in PageBlobInputStream.  There were similar bugs opened in the past 
> (HADOOP-15042) but the issue was not properly fixed, and no test coverage was 
> added.






[jira] [Reopened] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HADOOP-14445:


Reopening Jira as I'm reverting those changes.

Will remove fix versions as I proceed. Some minor conflicts due to HADOOP-14188 
and HADOOP-15313, so building repo and running touched tests before I push.

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, 
> HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, 
> HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, 
> HADOOP-14445.12.patch, HADOOP-14445.13.patch, 
> HADOOP-14445.branch-2.000.precommit.patch, 
> HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, 
> HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, 
> HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, 
> HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, 
> HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, 
> HADOOP-14445.branch-2.8.006.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   // The token is looked up by the host and port of the URL being opened.
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.






[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation

2018-05-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HADOOP-15450:
---
Reporter: Kihwal Lee  (was: Arpit Agarwal)

> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -
>
> Key: HADOOP-15450
> URL: https://issues.apache.org/jira/browse/HADOOP-15450
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Arpit Agarwal
>Priority: Major
>
> Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
> # When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the 
> drive is marked bad and replication happens. This makes the cluster-wide space 
> problem worse. If the number of "failed" drives exceeds the DFIP limit, the 
> datanode shuts down.
> # There are non-HDFS users of DiskChecker, who use it proactively, not just 
> on failures. This was fine before, but now it incurs heavy I/O due to the 
> introduction of fsync() in the code.






[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO

2018-05-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466171#comment-16466171
 ] 

Arpit Agarwal commented on HADOOP-13738:


Filed HADOOP-15450.

> DiskChecker should perform some disk IO
> ---
>
> Key: HADOOP-13738
> URL: https://issues.apache.org/jira/browse/HADOOP-13738
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.4
>
> Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, 
> HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, 
> HADOOP-13738.05.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. 
> We have seen this in real clusters. DiskChecker performs simple 
> permissions-based checks on directories which do not guarantee that any disk 
> IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.
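
The "write some data and flush it to the disk" idea can be sketched as follows. This is an illustration only, not the code from the attached patches; DiskIoProbe and probe are hypothetical names. It writes a few bytes to a temporary file, calls FileDescriptor.sync() to push them to the device, and reads them back. Note that the read-back is likely to be served from the page cache, so the sync call is the step that actually exercises the disk.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

/** Hypothetical probe that forces real disk IO in a directory. */
final class DiskIoProbe {
    /** Throws IOException if the directory cannot be written and synced. */
    static void probe(File dir) throws IOException {
        File f = File.createTempFile("disk-probe", ".tmp", dir);
        try {
            byte[] expected = {1, 2, 3, 4};
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(expected);
                out.flush();
                out.getFD().sync(); // ask the OS to write through to the device
            }
            byte[] actual = Files.readAllBytes(f.toPath());
            if (!Arrays.equals(expected, actual)) {
                throw new IOException("Read-back mismatch in " + dir);
            }
        } finally {
            f.delete(); // best effort; the probe file is not needed afterwards
        }
    }
}
```

The sync on every call is what makes such a probe expensive when it is invoked frequently.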






[jira] [Created] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation

2018-05-07 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HADOOP-15450:
--

 Summary: Avoid fsync storm triggered by DiskChecker and handle 
disk full situation
 Key: HADOOP-15450
 URL: https://issues.apache.org/jira/browse/HADOOP-15450
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
1. When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the 
drive is marked bad and replication happens. This makes the cluster-wide space 
problem worse. If the number of "failed" drives exceeds the DFIP limit, the 
datanode shuts down.
1. There are non-HDFS users of DiskChecker, who use it proactively, not just on 
failures. This was fine before, but now it incurs heavy I/O due to the introduction 
of fsync() in the code.






[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation

2018-05-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HADOOP-15450:
---
Description: 
Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
# When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the 
drive is marked bad and replication happens. This makes the cluster-wide space 
problem worse. If the number of "failed" drives exceeds the DFIP limit, the 
datanode shuts down.
# There are non-HDFS users of DiskChecker, who use it proactively, not just on 
failures. This was fine before, but now it incurs heavy I/O due to the introduction 
of fsync() in the code.

  was:
Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
1. When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the 
drive is marked bad and replication happens. This makes the cluster-wide space 
problem worse. If the number of "failed" drives exceeds the DFIP limit, the 
datanode shuts down.
1. There are non-HDFS users of DiskChecker, who use it proactively, not just on 
failures. This was fine before, but now it incurs heavy I/O due to the introduction 
of fsync() in the code.


> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -
>
> Key: HADOOP-15450
> URL: https://issues.apache.org/jira/browse/HADOOP-15450
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
>
> Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
> # When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the 
> drive is marked bad and replication happens. This makes the cluster-wide space 
> problem worse. If the number of "failed" drives exceeds the DFIP limit, the 
> datanode shuts down.
> # There are non-HDFS users of DiskChecker, who use it proactively, not just 
> on failures. This was fine before, but now it incurs heavy I/O due to the 
> introduction of fsync() in the code.






[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO

2018-05-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466159#comment-16466159
 ] 

Arpit Agarwal commented on HADOOP-13738:


[~daryn], [~kihwal], we can avoid the fsync storm by using disk IO only for 
HDFS-triggered disk checks. These are already throttled to at most once per 15 
minutes.

The other issue you reported - disk full - can also be handled separately. I'll 
file a follow up Jira and post a patch this week.
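
For reference, throttling a check to at most once per interval can be sketched like this. It is not the HDFS implementation; ThrottledCheck, maybeRun and the injectable clock are hypothetical names chosen for the example.

```java
import java.util.function.LongSupplier;

/** Hypothetical throttle: runs a check at most once per interval. */
final class ThrottledCheck {
    private final long minIntervalMs;
    private final LongSupplier clock; // injectable so the logic can be tested
    private long lastRunMs;
    private boolean hasRun = false;

    ThrottledCheck(long minIntervalMs, LongSupplier clock) {
        this.minIntervalMs = minIntervalMs;
        this.clock = clock;
    }

    /** Runs the check unless one ran less than minIntervalMs ago. */
    synchronized boolean maybeRun(Runnable check) {
        long now = clock.getAsLong();
        if (hasRun && now - lastRunMs < minIntervalMs) {
            return false; // skipped: still inside the throttle window
        }
        lastRunMs = now;
        hasRun = true;
        check.run();
        return true;
    }
}
```

With an interval of 15 * 60 * 1000 ms, repeated requests inside the window return false without touching the disk, so an expensive IO-based check behind this throttle runs at most once per 15 minutes.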

> DiskChecker should perform some disk IO
> ---
>
> Key: HADOOP-13738
> URL: https://issues.apache.org/jira/browse/HADOOP-13738
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.4
>
> Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, 
> HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, 
> HADOOP-13738.05.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. 
> We have seen this in real clusters. DiskChecker performs simple 
> permissions-based checks on directories which do not guarantee that any disk 
> IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.






[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466155#comment-16466155
 ] 

genericqa commented on HADOOP-14444:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 35 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 33m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 36m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 36m 17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  3m 45s{color} | {color:orange} root: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m  0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 29s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 38s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 13s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 48s{color} | {color:red} hadoop-ftp in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Commented] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Karthik Palanisamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466119#comment-16466119
 ] 

Karthik Palanisamy commented on HADOOP-15449:
-

[~arpitagarwal] Yes, it should reconnect. But ZooKeeper has already expired the 
session because of the timeout (no heartbeat was received from the ZK client 
within the session timeout). In this case, the znode lock could have been 
acquired by another ZKFC, which eventually leads to a failover.

 

> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> Several users have reported NameNode flip-over caused by either ZooKeeper 
> disk slowness (higher fsync cost) or network issues. We can avoid flip-over 
> to some extent by increasing the HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> The default value is 5000 ms, which seems very low for any production 
> environment. I would suggest 10000 ms as the default session timeout.
>  
> {code}
> ..
> 2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection 
> and attempting reconnect 
> 2018-05-04 03:56:49,088 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection 
> and attempting reconnect
> .. 
> {code}
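
A minimal sketch, not part of the issue or its patch, of pulling the "have not heard from server" gaps out of log lines like the ones quoted above, so they can be compared against a candidate session timeout. The regular expression and the function name are assumptions made for illustration only.

```python
import re

# Matches the ClientCnxn timeout message as it appears in the quoted log.
TIMEOUT_RE = re.compile(
    r"have not heard from server in (\d+)ms for sessionid (0x[0-9a-fA-F]+)"
)

def extract_timeouts(log_text):
    """Return (gap_ms, session_id) pairs for every client session timeout."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [(int(ms), sid) for ms, sid in TIMEOUT_RE.findall(flat)]

sample = """
2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn
(ClientCnxn.java:run(1140)) - Client session timed out, have not heard from
server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection
and attempting reconnect
"""

for gap_ms, session_id in extract_timeouts(sample):
    print(session_id, gap_ms)
```

Collecting these gaps over a longer log window gives a rough picture of how much headroom a proposed timeout value would leave.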



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15449) ZK performance issues causing frequent Namenode failover

2018-05-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HADOOP-15449:
---
Summary: ZK performance issues causing frequent Namenode failover   (was: 
Frequent Namenode Flipover affecting user Jobs.)

> ZK performance issues causing frequent Namenode failover 
> -
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> Several users have reported NameNode flip-over caused by either ZooKeeper 
> disk slowness (higher fsync cost) or network issues. We can avoid flip-over 
> to some extent by increasing the HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> The default value is 5000 ms, which seems very low for any production 
> environment. I would suggest 10000 ms as the default session timeout.
>  
> {code}
> ..
> 2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection 
> and attempting reconnect 
> 2018-05-04 03:56:49,088 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection 
> and attempting reconnect
> .. 
> {code}






[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO

2018-05-07 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466106#comment-16466106
 ] 

Daryn Sharp commented on HADOOP-13738:
--

This jira was not thought out.  It's causing problems on the clusters where 
it's deployed.
# We had a cluster lose 10% of nodes due to this patch.  A few nodes filled up, 
they went dead, and created a domino effect that caused nodes to go dead until 
intervention.
# Jobs may cause severe performance degradation from sync storms during a sort 
phase because local dir allocator calls checkDisk.

The biggest risk is a runaway job may fill disks and cause a cluster to implode.

[~arpitagarwal], do you want me to file another jira for immediate revert?  Or 
do you want to reopen this one?



> DiskChecker should perform some disk IO
> ---
>
> Key: HADOOP-13738
> URL: https://issues.apache.org/jira/browse/HADOOP-13738
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.4
>
> Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, 
> HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, 
> HADOOP-13738.05.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. 
> We have seen this in real clusters. DiskChecker performs simple 
> permissions-based checks on directories which do not guarantee that any disk 
> IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.
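
A minimal sketch of the idea described above: write a small amount of data and force it to disk, rather than relying on permission checks alone. This is not the code from the attached patches; the function name and the one-byte payload are assumptions for illustration.

```python
import os
import tempfile

def check_dir_with_io(directory, payload=b"\x01"):
    """Return True if a small file can be written, fsynced and removed."""
    try:
        fd, path = tempfile.mkstemp(prefix="disk-check-", dir=directory)
    except OSError:
        return False  # cannot even create a file in this directory
    try:
        os.write(fd, payload)
        os.fsync(fd)  # push the data to the device, not just the page cache
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        try:
            os.remove(path)
        except OSError:
            pass  # best-effort cleanup of the probe file
```

A check of this kind also fails when the disk is simply full and adds a sync on every call, which matches the failure modes reported in the comment in this thread.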






[jira] [Commented] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465968#comment-16465968
 ] 

Arpit Agarwal commented on HADOOP-15449:


Thanks for reporting this [~kpalanisamy]. 5 seconds is rather aggressive. +1 
for increasing it to 10.

Another potential issue is why the ZKFCs are triggering failover when they 
reconnect to ZooKeeper. That should also be addressed separately.

> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> Several users have reported NameNode flip-over caused by either ZooKeeper 
> disk slowness (higher fsync cost) or network issues. We can avoid flip-over 
> to some extent by increasing the HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> The default value is 5000 ms, which seems very low for any production 
> environment. I would suggest 10000 ms as the default session timeout.
>  
> {code}
> ..
> 2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection 
> and attempting reconnect 
> 2018-05-04 03:56:49,088 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection 
> and attempting reconnect
> .. 
> {code}






[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HADOOP-15449:
---
Status: Patch Available  (was: Open)

> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> Several users have reported NameNode flip-over caused by either ZooKeeper 
> disk slowness (higher fsync cost) or network issues. We can avoid flip-over 
> to some extent by increasing the HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> The default value is 5000 ms, which seems very low for any production 
> environment. I would suggest 10000 ms as the default session timeout.
>  
> {code}
> ..
> 2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection 
> and attempting reconnect 
> 2018-05-04 03:56:49,088 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection 
> and attempting reconnect
> .. 
> {code}






[jira] [Assigned] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HADOOP-15449:
--

Assignee: Karthik Palanisamy

> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> Several users have reported NameNode flip-over caused by either ZooKeeper 
> disk slowness (higher fsync cost) or network issues. We can avoid flip-over 
> to some extent by increasing the HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> The default value is 5000 ms, which seems very low for any production 
> environment. I would suggest 10000 ms as the default session timeout.
>  
> {code}
> ..
> 2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection 
> and attempting reconnect 
> 2018-05-04 03:56:49,088 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection 
> and attempting reconnect
> .. 
> {code}






[jira] [Comment Edited] (HADOOP-12896) kdiag to add a --DEFAULTREALM option

2018-05-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465877#comment-16465877
 ] 

SammiChen edited comment on HADOOP-12896 at 5/7/18 12:56 PM:
-

Remove the fix version field since it's not fixed actually.



was (Author: sammi):
Remov the fix version field since it's not fixed actually.


> kdiag to add a --DEFAULTREALM option 
> -
>
> Key: HADOOP-12896
> URL: https://issues.apache.org/jira/browse/HADOOP-12896
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> * kdiag to add a --DEFAULTREALM option to say not having a default realm is 
> an error.
> * if this flag is unset, when dumping the credential cache, if there is any 
> entry without a realm, *and there is no default realm*, diagnostics to fail 
> with an error. Hadoop will fail in this situation; kdiag should detect and 
> report






[jira] [Commented] (HADOOP-12896) kdiag to add a --DEFAULTREALM option

2018-05-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465877#comment-16465877
 ] 

SammiChen commented on HADOOP-12896:


Remove the fix version field since it's not fixed actually.


> kdiag to add a --DEFAULTREALM option 
> -
>
> Key: HADOOP-12896
> URL: https://issues.apache.org/jira/browse/HADOOP-12896
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> * kdiag to add a --DEFAULTREALM option to say not having a default realm is 
> an error.
> * if this flag is unset, when dumping the credential cache, if there is any 
> entry without a realm, *and there is no default realm*, diagnostics to fail 
> with an error. Hadoop will fail in this situation; kdiag should detect and 
> report






[jira] [Updated] (HADOOP-12896) kdiag to add a --DEFAULTREALM option

2018-05-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-12896:
---
Fix Version/s: (was: 2.9.1)

> kdiag to add a --DEFAULTREALM option 
> -
>
> Key: HADOOP-12896
> URL: https://issues.apache.org/jira/browse/HADOOP-12896
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> * kdiag to add a --DEFAULTREALM option to say not having a default realm is 
> an error.
> * if this flag is unset, when dumping the credential cache, if there is any 
> entry without a realm, *and there is no default realm*, diagnostics to fail 
> with an error. Hadoop will fail in this situation; kdiag should detect and 
> report






[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread Lukas Waldmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Waldmann updated HADOOP-14444:

Attachment: HADOOP-14444.15.patch

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
>Priority: Major
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, 
> HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, 
> HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, 
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, 
> HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, which simplifies 
> maintenance.
> The core features:
>  * Support for HTTP/SOCKS proxies
>  * Support for passive FTP
>  * Support for explicit FTPS (SSL/TLS)
>  * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
>  For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
>  * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
>  Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over uncached listings.
>  * Support for keep-alive (NOOP) messages to avoid connection drops
>  * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
>  * Support for re-establishing broken FTP data transfers - these can happen 
> surprisingly often
>  * Support for SFTP private keys (including passphrase)
>  * Support for keeping passwords, private keys and passphrases in JCEKS 
> key stores
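
A minimal sketch of the connection-pooling idea in the feature list above: an idle connection is reused when one is available, and a new one is created only when the pool is empty. This is not taken from the attached patches; the class name, the `factory` callable and the `max_idle` limit are all assumptions for illustration.

```python
import queue

class ConnectionPool:
    """Hands out idle connections before creating new ones."""

    def __init__(self, factory, max_idle=4):
        self._factory = factory  # hypothetical callable that opens a connection
        self._idle = queue.LifoQueue(maxsize=max_idle)

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._factory()  # nothing idle, so open a new one

    def release(self, conn):
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            pass  # pool is full; a real client would close conn here
```

With this shape, issuing many commands in sequence costs one connection setup instead of one per command, which is the effect the description attributes to pooling.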






[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread Lukas Waldmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Waldmann updated HADOOP-14444:

Attachment: (was: HADOOP-14444.15.patch)

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
>Priority: Major
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, 
> HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, 
> HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, 
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, 
> HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, which simplifies 
> maintenance.
> The core features:
>  * Support for HTTP/SOCKS proxies
>  * Support for passive FTP
>  * Support for explicit FTPS (SSL/TLS)
>  * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
>  For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
>  * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
>  Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over uncached listings.
>  * Support for keep-alive (NOOP) messages to avoid connection drops
>  * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
>  * Support for re-establishing broken FTP data transfers - these can happen 
> surprisingly often
>  * Support for SFTP private keys (including passphrase)
>  * Support for keeping passwords, private keys and passphrases in JCEKS 
> key stores






[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465825#comment-16465825
 ] 

genericqa commented on HADOOP-14444:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 34 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 38m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 31m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 31m 
18s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 24s{color} | {color:orange} root: The patch generated 3 new + 0 unchanged - 
0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
31s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
26s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 40s{color} 
| {color:red} hadoop-ftp in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 47s{color} 
| {color:red} hadoop-tools in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License 

[jira] [Commented] (HADOOP-15448) Swift auth fails "Expecting to find auth in request body"

2018-05-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465812#comment-16465812
 ] 

Steve Loughran commented on HADOOP-15448:
-

* Well, this isn't a support channel, more for filing bugs and fixes. So I'm 
afraid you are going to have to start creating that bug report (with stacks in 
here, versions defined, etc). And you are going to have to turn up the logging 
in the swift code, the httpclient code, etc, to see what's going on at all. I 
don't expect anyone else to put their hand up here, sorry.

Auth and OpenStack are a real source of pain with that swift module. Every 
endpoint has its own variant of the auth mechanism and, for security reasons, 
nothing provides meaningful information. If it's your own OpenStack instance, 
see what gets received and how it matches the expectations. That's what I would 
probably start with.

> Swift auth fails "Expecting to find auth in request body"
> -
>
> Key: HADOOP-15448
> URL: https://issues.apache.org/jira/browse/HADOOP-15448
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/swift
>Reporter: Bhujay Kumar Bhatta
>Priority: Major
>
> Tried the Hadoop upstream repo as per this document: 
> https://hadoop.apache.org/docs/stable/hadoop-openstack/index.html
> The connection fails with a malformed request; here is the log: 
> http://paste.openstack.org/show/720417/
> I am out of options now. Kindly help.






[jira] [Updated] (HADOOP-15448) Swift auth fails "Expecting to find auth in request body"

2018-05-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15448:

Summary: Swift auth fails "Expecting to find auth in request body"  
(was: Hadoop-8545 Swift Integration)
Component/s: fs/swift

> Swift auth fails "Expecting to find auth in request body"
> -
>
> Key: HADOOP-15448
> URL: https://issues.apache.org/jira/browse/HADOOP-15448
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/swift
>Reporter: Bhujay Kumar Bhatta
>Priority: Major
>
> Tried the Hadoop upstream repo as per this document: 
> https://hadoop.apache.org/docs/stable/hadoop-openstack/index.html
> The connection fails with a malformed request; here is the log: 
> http://paste.openstack.org/show/720417/
> I am out of options now. Kindly help.






[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465802#comment-16465802
 ] 

Gabor Bota edited comment on HADOOP-13649 at 5/7/18 11:23 AM:
--

Mvn test and verify were successful on eu-west-1 with 
fs.s3a.s3guard.test.enabled (_-Ds3guard)._


was (Author: gabor.bota):
Mvn test and verify were successful on eu-west-1 with 
fs.s3a.s3guard.test.enabled _-Ds3guard._

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore
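
A minimal sketch of time-based expiry as described above: each entry records when it was stored, and a lookup treats anything older than the TTL as missing. This is not the LocalMetadataStore implementation or the attached patch; the class name, the `clock` parameter and the dictionary layout are assumptions for illustration.

```python
import time

class TtlStore:
    """In-memory key/value store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = {}   # key -> (value, time the entry was stored)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value
```

Expiring on read keeps the sketch short; it bounds how long a stale entry can be returned, but entries that are never read again stay in memory until something evicts them.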






[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465802#comment-16465802
 ] 

Gabor Bota commented on HADOOP-13649:
-

Mvn test and verify were successful on eu-west-1 with 
fs.s3a.s3guard.test.enabled _-Ds3guard._

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore






[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465799#comment-16465799
 ] 

Hudson commented on HADOOP-15446:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14132 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14132/])
HADOOP-15446. WASB: PageBlobInputStream.skip breaks HBASE replication. (stevel: 
rev 5b11b9fd413470e134ecdc7c50468f8c7b39fa50)
* (edit) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/PageBlobInputStream.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/ITestPageBlobInputStream.java


> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 2.9.0, 3.0.2
>Reporter: Thomas Marquardt
>Assignee: Thomas Marquardt
>Priority: Major
> Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, 
> HADOOP-15446-003.patch
>
>
> Page Blobs are primarily used by HBASE.  HBASE replication, which apparently 
> has not been used with WASB until recently, performs non-sequential reads on 
> log files using PageBlobInputStream.  There are bugs in this stream 
> implementation which prevent skip and seek from working properly, and 
> eventually the stream state becomes corrupt and unusable.
> I believe this bug affects all releases of WASB/HADOOP.  It appears to be a 
> day-0 bug in PageBlobInputStream.  There were similar bugs opened in the past 
> (HADOOP-15042) but the issue was not properly fixed, and no test coverage was 
> added.
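
For readers unfamiliar with the contract at issue: java.io.InputStream.skip(n) may skip fewer than n bytes and returns the count actually skipped, so callers and tests have to verify where the stream really ends up. The sketch below is not the HADOOP-15446 patch and not its ITestPageBlobInputStream test; SkipCheck, skipFully and byteAt are hypothetical names, and a ByteArrayInputStream stands in for the page blob stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; not part of hadoop-azure. */
final class SkipCheck {

  /** Skips exactly n bytes or throws EOFException; loops because skip() may return short counts. */
  static void skipFully(InputStream in, long n) throws IOException {
    long remaining = n;
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped > 0) {
        remaining -= skipped;
      } else if (in.read() >= 0) {
        remaining--; // skip() made no progress, so consume a single byte instead
      } else {
        throw new EOFException("stream ended with " + remaining + " bytes left to skip");
      }
    }
  }

  /** Returns the unsigned byte found at the given offset, or -1 if the offset is the end of the data. */
  static int byteAt(byte[] data, long offset) {
    try (InputStream in = new ByteArrayInputStream(data)) {
      skipFully(in, offset);
      return in.read();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test for a seekable stream could apply the same idea: skip to a known offset, read, and compare against the expected byte, repeating with non-sequential offsets.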






[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465783#comment-16465783
 ] 

Steve Loughran commented on HADOOP-15446:
-

+1, committed to branch 3.1 & trunk. If you want it backported to branch-2, run 
the tests, tell me how it went, & I'll backport.

I did see one failure in my own test run; I'm assuming it's unrelated and just a 
function of network distance.
{code}
[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 26.213 s <<< FAILURE! - in org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer
[ERROR] testManySuccessAndErrorsAndWaiting(org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer)  Time elapsed: 1.123 s  <<< FAILURE!
java.lang.AssertionError: The actual value 9 is not within the expected range: [5.60, 8.40].
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer.fuzzyValidate(TestClientThrottlingAnalyzer.java:46)
    at org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting(TestClientThrottlingAnalyzer.java:168)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}


> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 2.9.0, 3.0.2
>Reporter: Thomas Marquardt
>Assignee: Thomas Marquardt
>Priority: Major
> Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, 
> HADOOP-15446-003.patch
>
>
> Page Blobs are primarily used by HBASE.  HBASE replication, which apparently 
> has not been used with WASB until recently, performs non-sequential reads on 
> log files using PageBlobInputStream.  There are bugs in this stream 
> implementation which prevent skip and seek from working properly, and 
> eventually the stream state becomes corrupt and unusable.
> I believe this bug affects all releases of WASB/HADOOP.  It appears to be a 
> day-0 bug in PageBlobInputStream.  There were similar bugs opened in the past 
> (HADOOP-15042) but the issue was not properly fixed, and no test coverage was 
> added.






[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465674#comment-16465674
 ] 

genericqa commented on HADOOP-13649:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 14s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 35s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 16s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-13649 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922237/HADOOP-13649.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ab0ad586323f 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 67f239c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14593/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14593/testReport/ |
| Max. process+thread count | 356 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14593/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |



[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465636#comment-16465636
 ] 

Gabor Bota edited comment on HADOOP-13649 at 5/7/18 8:53 AM:
-

Thanks for the review.
 # I've created HADOOP-15423 to merge the two caches into one.
 # .expireAfterWrite() vs .expireAfterAccess()
 ** I think that expiry after access could be the better choice in this situation, as 
long as no other client modifies the underlying bucket (for example, deleting 
files in the S3 bucket) while the cache is in use. That way we can say that the 
cache is up to date.
 ** This store is only used for testing right now, so I think it is right to 
choose expireAfterAccess.
 # Locking
 ** The com.google.common.cache.LocalCache has locking for writes (e.g. put, 
replace, remove) but not for simple reads (getIfPresent).
 ** LocalMetadataStore has a lock for reads too: synchronized (this) in get().
 ** As the merge of the two caches will happen in HADOOP-15423, I think that's 
a topic to discuss further on that issue.
 # Performance testing
 ** I've done some performance testing to compare the cache vs hash performance.
 ** I hope that I used sane parameters during the tests.
 ** Based on this, there will be some performance decrease with this 
implementation, but nothing significant with the basic test settings - in my 
tests I've modified (increased) the settings a little. Move() performance 
should improve when merging the caches - it will be interesting to compare 
the results after that change.
 ** Test results are in the following gist: 
[https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] 
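
The difference between the two expiry policies discussed above can be sketched with a small JDK-only cache. This is neither the LocalMetadataStore code nor Guava's implementation; TtlCache, its expireAfterAccess flag and the injected clock are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical illustration only: a tiny TTL cache with a pluggable clock. */
final class TtlCache<K, V> {

  private static final class Entry<V> {
    final V value;
    long stampMillis; // time of last write, or of last access when expireAfterAccess is set

    Entry(V value, long stampMillis) {
      this.value = value;
      this.stampMillis = stampMillis;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final long ttlMillis;
  private final boolean expireAfterAccess;
  private final LongSupplier clock;

  TtlCache(long ttlMillis, boolean expireAfterAccess, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.expireAfterAccess = expireAfterAccess;
    this.clock = clock;
  }

  synchronized void put(K key, V value) {
    entries.put(key, new Entry<>(value, clock.getAsLong()));
  }

  /** Returns the cached value, or null if it is absent or older than the TTL. */
  synchronized V getIfPresent(K key) {
    Entry<V> e = entries.get(key);
    if (e == null) {
      return null;
    }
    long now = clock.getAsLong();
    if (now - e.stampMillis >= ttlMillis) {
      entries.remove(key); // expired: drop lazily on read
      return null;
    }
    if (expireAfterAccess) {
      e.stampMillis = now; // a read restarts the timer under this policy
    }
    return e.value;
  }
}
```

Under the access-based policy an entry that keeps being read can outlive the nominal TTL, which is why the caveat about no other client modifying the bucket matters; under the write-based policy the staleness window is bounded by the TTL regardless of reads.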


was (Author: gabor.bota):
Thanks for the review.
 # I've created HADOOP-15423 to merge the two caches into one.
 # .expireAfterWrite() vs .expireAfterAccess()
 ** I think that access could be better in this situation, as long as there's no
modification in the underlying bucket from another client - so no one else is 
modifying the s3 bucket like deleting files while the cache is in use - that 
way we can
say that the cache is up to date.
 ** This store is only used for testing right now, so I can say that's right to 
choose expireAfterAccess.
 # Locking
 ** The com.google.common.cache.LocalCache has locking for write (e.g put, 
replace, remove) but not for simple read (getIfPresent).
 ** LocalMetadataStore has a lock for read too: synchronized (this) in get().
 ** As the merge of the two caches will happen in HADOOP-15423, I think that's 
a topic to discuss further on that issue.
 # Performance testing
 ** I've done some performance testing to compare the cache vs hash performance.
 ** I hope that used sane parameters during the tests.
 ** Based on this, there will be some performance decrease with this 
implementation, but nothing significant with the basic test settings - in my 
tests I've modified the settings a little bit. Move() performance should 
improve when merging the caches - it will be interesting to compare what's 
happening after that change.
 ** Test results are in the following gist: 
[https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] 

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore






[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465636#comment-16465636
 ] 

Gabor Bota edited comment on HADOOP-13649 at 5/7/18 8:43 AM:
-

Thanks for the review.
 # I've created HADOOP-15423 to merge the two caches into one.
 # .expireAfterWrite() vs .expireAfterAccess()
 ** I think that expiry after access could be the better choice in this situation, as 
long as no other client modifies the underlying bucket (for example, deleting 
files in the S3 bucket) while the cache is in use. That way we can say that the 
cache is up to date.
 ** This store is only used for testing right now, so I think it is right to 
choose expireAfterAccess.
 # Locking
 ** The com.google.common.cache.LocalCache has locking for write (e.g put, 
replace, remove) but not for simple read (getIfPresent).
 ** LocalMetadataStore has a lock for read too: synchronized (this) in get().
 ** As the merge of the two caches will happen in HADOOP-15423, I think that's 
a topic to discuss further on that issue.
 # Performance testing
 ** I've done some performance testing to compare the cache vs hash performance.
 ** I hope that I used sane parameters during the tests.
 ** Based on this, there will be some performance decrease with this 
implementation, but nothing significant with the basic test settings - in my 
tests I've modified the settings a little bit. Move() performance should 
improve when merging the caches - it will be interesting to compare what's 
happening after that change.
 ** Test results are in the following gist: 
[https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] 


was (Author: gabor.bota):
Thanks for the review.
 # I've created HADOOP-15423 to merge the two caches into one.
 # .expireAfterWrite() vs .expireAfterAccess()
 ** I think that access could be better in this situation, as long as there's no
modification in the underlying bucket from another client - so no one else is 
modifying the s3
bucket like deleting files while the cache is in use - that way we can
say that the cache is up to date.
This store is only used for testing right now, so I can say that's right to 
choose expireAfterAccess.
 # Locking
 ** The com.google.common.cache.LocalCache has locking for write (e.g put, 
replace, remove) but not for simple read (getIfPresent).
 ** LocalMetadataStore has a lock for read too: synchronized (this) in get().
 ** As the merge of the two caches will happen in HADOOP-15423, I think that's 
a topic to discuss further on that issue.
 # Performance testing
 ** I've done some performance testing to compare the cache vs hash performance.
 ** I hope that used sane parameters during the tests.
 ** Based on this, there will be some performance decrease with this 
implementation, but nothing significant with the basic test settings - in my 
tests I've modified the settings a little bit. Move() performance should 
improve when merging the caches - it will be interesting to compare what's 
happening after that change.
 ** Test results are in the following gist: 
[https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] 

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore






[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465636#comment-16465636
 ] 

Gabor Bota commented on HADOOP-13649:
-

Thanks for the review.
 # I've created HADOOP-15423 to merge the two caches into one.
 # .expireAfterWrite() vs .expireAfterAccess()
 ** I think that expiry after access could be the better choice in this situation, as 
long as no other client modifies the underlying bucket (for example, deleting 
files in the S3 bucket) while the cache is in use. That way we can say that the 
cache is up to date.
This store is only used for testing right now, so I think it is right to 
choose expireAfterAccess.
 # Locking
 ** The com.google.common.cache.LocalCache has locking for write (e.g put, 
replace, remove) but not for simple read (getIfPresent).
 ** LocalMetadataStore has a lock for read too: synchronized (this) in get().
 ** As the merge of the two caches will happen in HADOOP-15423, I think that's 
a topic to discuss further on that issue.
 # Performance testing
 ** I've done some performance testing to compare the cache vs hash performance.
 ** I hope that I used sane parameters during the tests.
 ** Based on this, there will be some performance decrease with this 
implementation, but nothing significant with the basic test settings - in my 
tests I've modified the settings a little bit. Move() performance should 
improve when merging the caches - it will be interesting to compare what's 
happening after that change.
 ** Test results are in the following gist: 
[https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] 

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore






[jira] [Updated] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore

2018-05-07 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-13649:

Attachment: HADOOP-13649.002.patch

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
> URL: https://issues.apache.org/jira/browse/HADOOP-13649
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch
>
>
> LocalMetadataStore is primarily a reference implementation for testing.  It 
> may be useful in narrow circumstances where the workload can tolerate 
> short-term lack of inter-node consistency:  Being in-memory, one JVM/node's 
> LocalMetadataStore will not see another node's changes to the underlying 
> filesystem.
> To put a bound on the time during which this inconsistency may occur, we 
> should implement time-based (a.k.a. Time To Live / TTL)  expiration for 
> LocalMetadataStore






[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread Lukas Waldmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Waldmann updated HADOOP-14444:

Attachment: (was: HADOOP-14444.15.patch)

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
>Priority: Major
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, 
> HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, 
> HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, 
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, 
> HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, which simplifies 
> maintenance.
> The core features:
>  * Support for HTTP/SOCKS proxies
>  * Support for passive FTP
>  * Support for explicit FTPS (SSL/TLS)
>  * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
>  For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
>  * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
>  Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over non-cached connections.
>  * Support for keep-alive (NOOP) messages to avoid connection drops
>  * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across a whole directory tree
>  * Support for re-establishing broken FTP data transfers - these can happen 
> surprisingly often
>  * Support for SFTP private keys (including pass phrase)
>  * Support for keeping passwords, private keys and pass phrases in jceks 
> key stores
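
The connection-pooling idea in the feature list can be sketched as follows. This is not code from the HADOOP-14444 patch; SimplePool, borrow, release and createdCount are hypothetical names, and a real pool would also need to validate, time out and close connections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical illustration only: reuse idle connections instead of creating one per command. */
final class SimplePool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> factory;
  private int created;

  SimplePool(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Hands out an idle connection if one exists, otherwise creates a new one. */
  synchronized T borrow() {
    T conn = idle.pollFirst();
    if (conn == null) {
      created++;
      conn = factory.get();
    }
    return conn;
  }

  /** Returns a connection to the pool so that the next command can reuse it. */
  synchronized void release(T conn) {
    idle.addFirst(conn);
  }

  synchronized int createdCount() {
    return created;
  }
}
```

With a pool like this, a sequence of borrow/release pairs issued one after another opens a single connection, whereas an unpooled client would open one per command.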






[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems

2018-05-07 Thread Lukas Waldmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Waldmann updated HADOOP-14444:

Attachment: HADOOP-14444.15.patch

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
>Priority: Major
> Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, 
> HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, 
> HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, 
> HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, 
> HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, which simplifies 
> maintenance.
> The core features:
>  * Support for HTTP/SOCKS proxies
>  * Support for passive FTP
>  * Support for explicit FTPS (SSL/TLS)
>  * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
>  For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
>  * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
>  Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over non-cached connections.
>  * Support for keep-alive (NOOP) messages to avoid connection drops
>  * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across a whole directory tree
>  * Support for re-establishing broken FTP data transfers - these can happen 
> surprisingly often
>  * Support for SFTP private keys (including pass phrase)
>  * Support for keeping passwords, private keys and pass phrases in jceks 
> key stores






[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Karthik Palanisamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Palanisamy updated HADOOP-15449:

Description: 
Several users have reported Namenode flip-over caused by either ZooKeeper disk 
slowness (higher fsync cost) or network issues. We could avoid flip-over to 
some extent by increasing the HA session timeout, 
ha.zookeeper.session-timeout.ms.

The default value is 5000 ms, which seems very low for any production 
environment.  I would suggest 10000 ms as the default session timeout.

 

{code}

..

2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection and attempting reconnect
2018-05-04 03:56:49,088 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection and attempting reconnect

.. 

{code}
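
To compare the silences reported in log lines like these against a candidate timeout, the durations can be pulled out with a regular expression. This is a rough sketch only: SessionGapParser, gapsMillis and countAtOrAbove are hypothetical names, and the pattern is inferred from the two log lines quoted above rather than from any documented ZooKeeper log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only: extract "have not heard from server in Nms" durations. */
final class SessionGapParser {
  // Assumed message shape, based solely on the log excerpt in this issue.
  private static final Pattern GAP =
      Pattern.compile("have not heard from server in (\\d+)ms");

  /** Returns every reported silence, in milliseconds, in order of appearance. */
  static List<Long> gapsMillis(String log) {
    List<Long> gaps = new ArrayList<>();
    Matcher m = GAP.matcher(log);
    while (m.find()) {
      gaps.add(Long.parseLong(m.group(1)));
    }
    return gaps;
  }

  /** Counts how many reported silences are at least as long as the given threshold. */
  static long countAtOrAbove(List<Long> gaps, long thresholdMillis) {
    return gaps.stream().filter(g -> g >= thresholdMillis).count();
  }
}
```

Running this over a longer stretch of the failover controller log would show how the observed silences are distributed relative to any timeout value under discussion.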

  was:
We observed from several users regarding Namenode flip-over is due to either 
zookeeper disk slowness (higher fsync cost) or network issue. We would need to 
avoid flip-over issue to some extent by increasing HA session timeout, 
ha.zookeeper.session-timeout.ms.

Default value is 5000 ms, seems very low in any production environment.  I 
would suggest 1 ms as default session timeout.


> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> We observed from several users regarding Namenode flip-over is due to either 
> zookeeper disk slowness (higher fsync cost) or network issue. We would need 
> to avoid flip-over issue to some extent by increasing HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> Default value is 5000 ms, seems very low in any production environment.  I 
> would suggest 1 ms as default session timeout.
>  
> {code}
> ..
> 2018-05-04 03:54:36,848 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection 
> and attempting reconnect 
> 2018-05-04 03:56:49,088 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from 
> server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection 
> and attempting reconnect
> .. 
> {code}






[jira] [Commented] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Karthik Palanisamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465524#comment-16465524
 ] 

Karthik Palanisamy commented on HADOOP-15449:
-

Cc: [~arpitagarwal] [~anu]

> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Karthik Palanisamy
>Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> We observed from several users regarding Namenode flip-over is due to either 
> zookeeper disk slowness (higher fsync cost) or network issue. We would need 
> to avoid flip-over issue to some extent by increasing HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> Default value 5000 ms, seems very low in any production environment.  I would 
> suggest 1 ms as default session timeout.






[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Karthik Palanisamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Palanisamy updated HADOOP-15449:

Description: 
We observed from several users regarding Namenode flip-over is due to either 
zookeeper disk slowness (higher fsync cost) or network issue. We would need to 
avoid flip-over issue to some extent by increasing HA session timeout, 
ha.zookeeper.session-timeout.ms.

Default value is 5000 ms, seems very low in any production environment.  I 
would suggest 10000 ms as default session timeout.

  was:
We observed from several users regarding Namenode flip-over is due to either 
zookeeper disk slowness (higher fsync cost) or network issue. We would need to 
avoid flip-over issue to some extent by increasing HA session timeout, 
ha.zookeeper.session-timeout.ms.

Default value 5000 ms, seems very low in any production environment.  I would 
suggest 10000 ms as default session timeout.


> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
> Issue Type: Wish
> Components: common
> Affects Versions: 2.7.4
> Reporter: Karthik Palanisamy
> Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> We observed from several users regarding Namenode flip-over is due to either 
> zookeeper disk slowness (higher fsync cost) or network issue. We would need 
> to avoid flip-over issue to some extent by increasing HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> Default value is 5000 ms, seems very low in any production environment.  I 
> would suggest 10000 ms as default session timeout.



[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Karthik Palanisamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Palanisamy updated HADOOP-15449:

Attachment: HADOOP-15449.patch

> Frequent Namenode Flipover affecting user Jobs.
> ---
>
> Key: HADOOP-15449
> URL: https://issues.apache.org/jira/browse/HADOOP-15449
> Project: Hadoop Common
> Issue Type: Wish
> Components: common
> Affects Versions: 2.7.4
> Reporter: Karthik Palanisamy
> Priority: Critical
> Attachments: HADOOP-15449.patch
>
>
> We observed from several users regarding Namenode flip-over is due to either 
> zookeeper disk slowness (higher fsync cost) or network issue. We would need 
> to avoid flip-over issue to some extent by increasing HA session timeout, 
> ha.zookeeper.session-timeout.ms.
> Default value 5000 ms, seems very low in any production environment.  I would 
> suggest 10000 ms as default session timeout.



[jira] [Created] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.

2018-05-07 Thread Karthik Palanisamy (JIRA)
Karthik Palanisamy created HADOOP-15449:
---

Summary: Frequent Namenode Flipover affecting user Jobs.
Key: HADOOP-15449
URL: https://issues.apache.org/jira/browse/HADOOP-15449
Project: Hadoop Common
Issue Type: Wish
Components: common
Affects Versions: 2.7.4
Reporter: Karthik Palanisamy


We observed from several users regarding Namenode flip-over is due to either 
zookeeper disk slowness (higher fsync cost) or network issue. We would need to 
avoid flip-over issue to some extent by increasing HA session timeout, 
ha.zookeeper.session-timeout.ms.

Default value 5000 ms, seems very low in any production environment.  I would 
suggest 10000 ms as default session timeout.


