[jira] [Commented] (HDFS-14967) TestWebHDFS - Many test cases are failing in Windows

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973088#comment-16973088
 ] 

Hadoop QA commented on HDFS-14967:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 57s{color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs generated 2 new + 578 unchanged - 
2 fixed = 580 total (was 580) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 4 new + 29 unchanged - 19 fixed = 33 total (was 48) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 21s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 5s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14967 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985678/HDFS-14967.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fa65fa4d31e1 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / df6b316 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28305/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28305/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973087#comment-16973087
 ] 

Hadoop QA commented on HDFS-14442:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 46s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
54s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHDFSAcl |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.TestClientProtocolForPipelineRecovery |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14442 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985677/HDFS-14442.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux be0ef97105b3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / df6b316 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28303/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28303/testReport/ 

[jira] [Created] (HDDS-2466) Split OM Key into a Prefix Part and a Name Part

2019-11-12 Thread Supratim Deka (Jira)
Supratim Deka created HDDS-2466:
---

 Summary: Split OM Key into a Prefix Part and a Name Part
 Key: HDDS-2466
 URL: https://issues.apache.org/jira/browse/HDDS-2466
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Manager
Reporter: Supratim Deka
Assignee: Supratim Deka


OM stores every key in a key table, which maps the key to a KeyInfo.

Splitting the key into a prefix part and a name part, which are then stored in 
separate tables, serves 2 purposes:
1. OzoneFS operations can be made efficient by deriving a prefix tree 
representation of the pathnames (prefixes). The details of this are outside the 
current scope. Also, the prefix table can get preferential treatment when it 
comes to caching.
2. PutKey is not penalised by having to parse the key into each path component. 
This matters for cases where the dataset is a pure object store. Splitting into 
a prefix and a name is the minimal work to be done inline during the putKey 
operation.
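
As an illustration only, the minimal inline split described above could look 
like the sketch below. It assumes the prefix is everything up to and including 
the last '/' and the name is the remainder. The issue does not specify the 
split rule, and KeySplitSketch and split() are hypothetical names, not Ozone 
APIs.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

final class KeySplitSketch {
    private KeySplitSketch() { }

    // Hypothetical helper (not part of Ozone): a single scan for the last '/'
    // with no per-component parsing of the path.
    // Assumption: a key with no '/' has an empty prefix.
    static Map.Entry<String, String> split(String key) {
        int idx = key.lastIndexOf('/');
        String prefix = idx < 0 ? "" : key.substring(0, idx + 1);
        String name = key.substring(idx + 1);
        return new SimpleImmutableEntry<>(prefix, name);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> parts = split("a/b/c/file.txt");
        System.out.println("prefix=" + parts.getKey() + " name=" + parts.getValue());
    }
}
```

The prefix and the name would then be written to their respective tables. How 
those tables are keyed is left open here, as it is in the description.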




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973050#comment-16973050
 ] 

Hadoop QA commented on HDFS-14283:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 59s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
68 unchanged - 0 fixed = 69 total (was 68) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
55s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}175m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14283 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973039#comment-16973039
 ] 

Hadoop QA commented on HDFS-14648:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
57s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} hadoop-hdfs-project: The patch generated 0 new + 112 
unchanged - 1 fixed = 112 total (was 113) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
1s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 48s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}199m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14648 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985668/HDFS-14648.010.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 22f81004446f 4.15.0-66-generic #75-Ubuntu SMP Tue 

[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option

2019-11-12 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-14983:
-
Description: NameNode can update proxyuser config by 
-refreshSuperUserGroupsConfiguration without restarting but DFSRouter cannot. 
It would be better for DFSRouter to have such functionality to be compatible 
with NameNode.  (was: NameNode can update proxyuser config by 
-refreshSuperUserGroupsConfiguration without restarting but DFSRouter cannot.)

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---
>
> Key: HDFS-14983
> URL: https://issues.apache.org/jira/browse/HDFS-14983
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Akira Ajisaka
>Priority: Minor
>
> NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration 
> without restarting but DFSRouter cannot. It would be better for DFSRouter to 
> have such functionality to be compatible with NameNode.






[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option

2019-11-12 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973036#comment-16973036
 ] 

Akira Ajisaka commented on HDFS-14983:
--

The cost of restarting DFSRouter is much lower than that of restarting NameNode, 
so I think the priority is not high (just restarting DFSRouter is okay).

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---
>
> Key: HDFS-14983
> URL: https://issues.apache.org/jira/browse/HDFS-14983
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Akira Ajisaka
>Priority: Minor
>
> NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration 
> without restarting but DFSRouter cannot.






[jira] [Commented] (HDFS-14612) SlowDiskReport won't update when SlowDisks is always empty in heartbeat

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973035#comment-16973035
 ] 

Hadoop QA commented on HDFS-14612:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m  2s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}184m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14612 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985669/HDFS-14612-005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0a23c6e0b1c9 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / df6b316 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28300/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28300/testReport/ |
| Max. process+thread count | 3281 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28300/console |

[jira] [Created] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option

2019-11-12 Thread Akira Ajisaka (Jira)
Akira Ajisaka created HDFS-14983:


 Summary: RBF: Add dfsrouteradmin 
-refreshSuperUserGroupsConfiguration command option
 Key: HDFS-14983
 URL: https://issues.apache.org/jira/browse/HDFS-14983
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: rbf
Reporter: Akira Ajisaka


NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration 
without restarting but DFSRouter cannot.






[jira] [Assigned] (HDFS-14980) diskbalancer query command always tries to contact to port 9867

2019-11-12 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle reassigned HDFS-14980:
--

Assignee: Siddharth Wagle

> diskbalancer query command always tries to contact to port 9867
> ---
>
> Key: HDFS-14980
> URL: https://issues.apache.org/jira/browse/HDFS-14980
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Nilotpal Nandi
>Assignee: Siddharth Wagle
>Priority: Major
>
> The diskbalancer query command always tries to connect to port 9867 even when 
> the datanode IPC port is different.
> In this setup, the datanode IPC port is set to 20001.
>  
> diskbalancer report command works fine and connects to IPC port 20001
>  
> {noformat}
> hdfs diskbalancer -report -node 172.27.131.193
> 19/11/12 08:58:55 INFO command.Command: Processing report command
> 19/11/12 08:58:57 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 19/11/12 08:58:57 INFO block.BlockTokenSecretManager: Setting block keys
> 19/11/12 08:58:57 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 19/11/12 08:58:58 INFO command.Command: Reporting volume information for 
> DataNode(s). These DataNode(s) are parsed from '172.27.131.193'.
> Processing report command
> Reporting volume information for DataNode(s). These DataNode(s) are parsed 
> from '172.27.131.193'.
> [172.27.131.193:20001] - : 3 
> volumes with node data density 0.05.
> [DISK: volume-/dataroot/ycloud/dfs/NEW_DISK1/] - 0.15 used: 
> 39343871181/259692498944, 0.85 free: 220348627763/259692498944, isFailed: 
> False, isReadOnly: False, isSkip: False, isTransient: False.
> [DISK: volume-/dataroot/ycloud/dfs/NEW_DISK2/] - 0.15 used: 
> 39371179986/259692498944, 0.85 free: 220321318958/259692498944, isFailed: 
> False, isReadOnly: False, isSkip: False, isTransient: False.
> [DISK: volume-/dataroot/ycloud/dfs/dn/] - 0.19 used: 
> 49934903670/259692498944, 0.81 free: 209757595274/259692498944, isFailed: 
> False, isReadOnly: False, isSkip: False, isTransient: False.
>  
> {noformat}
>  
> However, the diskbalancer query command fails because it tries to connect to 
> port 9867 (the default port):
>  
> {noformat}
> hdfs diskbalancer -query 172.27.131.193
> 19/11/12 06:37:15 INFO command.Command: Executing "query plan" command.
> 19/11/12 06:37:16 INFO ipc.Client: Retrying connect to server: 
> /172.27.131.193:9867. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 19/11/12 06:37:17 INFO ipc.Client: Retrying connect to server: 
> /172.27.131.193:9867. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> ..
> ..
> ..
> 19/11/12 06:37:25 ERROR tools.DiskBalancerCLI: Exception thrown while running 
> DiskBalancerCLI.
> {noformat}
>  
>  
> Expectation:
> The diskbalancer query command should work without the datanode IPC port 
> being specified explicitly.
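
The expected behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the DiskBalancer code or the eventual fix: the class name DiskBalancerAddressSketch, the method resolve, and the configuredIpcPort parameter are hypothetical, and the sketch handles only "host" and "host:port" strings (no IPv6 literals).

```java
import java.net.InetSocketAddress;

final class DiskBalancerAddressSketch {
  /**
   * Resolve "host" or "host:port" to an unresolved socket address.
   * An explicit port wins; otherwise the configured IPC port is used
   * instead of a hard-coded default.
   */
  static InetSocketAddress resolve(String node, int configuredIpcPort) {
    int colon = node.lastIndexOf(':');
    if (colon >= 0) {
      String host = node.substring(0, colon);
      int port = Integer.parseInt(node.substring(colon + 1));
      return InetSocketAddress.createUnresolved(host, port);
    }
    // No port given by the user: fall back to the configured IPC port.
    return InetSocketAddress.createUnresolved(node, configuredIpcPort);
  }

  public static void main(String[] args) {
    // 20001 is the IPC port from the setup described in the report.
    InetSocketAddress addr = resolve("172.27.131.193", 20001);
    System.out.println(addr.getHostString() + ":" + addr.getPort());
  }
}
```

With this shape, a query for a bare IP address would target the port the cluster is actually configured with, and a user could still override it by passing host:port.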



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973016#comment-16973016
 ] 

Lisheng Sun commented on HDFS-14648:


I updated the patch and uploaded it as v011. Would you mind continuing to 
review it? Thank you. [~linyiqun]

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch, HDFS-14648.011.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. It implements 
> the following:
>  # When a DFSInputStream is opened, a BlockReader is opened. If some DataNode 
> of the block is found to be inaccessible, the DataNode is put into 
> DeadNodeDetector#deadnode. HDFS-14649 will optimize this part: because when a 
> DataNode is not accessible it is likely that the replica has been removed 
> from the DataNode, this needs to be confirmed by re-probing and requires 
> higher-priority processing.
>  # DeadNodeDetector periodically probes the nodes in 
> DeadNodeDetector#deadnode. If the access is successful, the node is 
> removed from DeadNodeDetector#deadnode. Continuous probing of dead nodes 
> is necessary because a DataNode may rejoin the cluster after a service 
> restart or machine repair. Without a probe mechanism, the DataNode could be 
> permanently excluded.
>  # DeadNodeDetector#dfsInputStreamNodes records which DataNodes each 
> DFSInputStream is using. When a DFSInputStream is closed, it is removed from 
> DeadNodeDetector#dfsInputStreamNodes.
>  # Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is 
> updated. The new DeadNodeDetector#deadnode equals the intersection of the old 
> DeadNodeDetector#deadnode and the DataNodes referenced by 
> DeadNodeDetector#dfsInputStreamNodes.
>  # DeadNodeDetector has a switch that is turned off by default. When it is 
> off, each DFSInputStream still uses its own local dead nodes.
>  # This feature has been used in the XIAOMI production environment for a long 
> time. It has reduced HBase reads getting stuck due to node hangs.
>  # To use DeadNodeDetector, just turn the switch on. There are no 
> other restrictions. If you don't want to use DeadNodeDetector, just turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
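
The update rule in item 4 of the description (new dead-node set = old dead-node set intersected with the DataNodes still referenced by open streams) can be sketched as follows. This is a minimal illustration, not the patch itself: the class DeadNodeSetSketch, the method refreshDeadNodes, and the use of plain strings for stream ids and DataNode names are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DeadNodeSetSketch {
  /**
   * Keep only the dead nodes that some open stream still references.
   * Stream ids and node names are plain strings here for illustration.
   */
  static Set<String> refreshDeadNodes(Set<String> oldDeadNodes,
      Map<String, Set<String>> nodesByStream) {
    Set<String> inUse = new HashSet<>();
    for (Set<String> nodes : nodesByStream.values()) {
      inUse.addAll(nodes);
    }
    Set<String> refreshed = new HashSet<>(oldDeadNodes);
    refreshed.retainAll(inUse); // set intersection; the input set is not modified
    return refreshed;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> nodesByStream = new HashMap<>();
    nodesByStream.put("stream-1", new HashSet<>(Arrays.asList("dn1", "dn2")));
    nodesByStream.put("stream-2", new HashSet<>(Arrays.asList("dn3")));
    Set<String> oldDead = new HashSet<>(Arrays.asList("dn2", "dn4"));
    // dn4 is no longer used by any open stream, so it drops out.
    System.out.println(refreshDeadNodes(oldDead, nodesByStream)); // prints [dn2]
  }
}
```

The effect is that a node nobody is reading from any more stops being tracked, which keeps the shared set from growing without bound.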






[jira] [Updated] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14648:
---
Attachment: HDFS-14648.011.patch

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch, HDFS-14648.011.patch
>
>






[jira] [Updated] (HDFS-14967) TestWebHDFS - Many test cases are failing in Windows

2019-11-12 Thread Renukaprasad C (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renukaprasad C updated HDFS-14967:
--
Attachment: HDFS-14967.002.patch

> TestWebHDFS - Many test cases are failing in Windows 
> -
>
> Key: HDFS-14967
> URL: https://issues.apache.org/jira/browse/HDFS-14967
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14967.001.patch, HDFS-14967.002.patch
>
>
> In the TestWebHDFS test class, a few test cases do not close the 
> MiniDFSCluster, which causes the remaining tests to fail on Windows. While a 
> cluster is left open, all subsequent test cases fail to get the lock on the 
> data dir, which results in test case failures.
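
The failure mode described above, and the usual remedy of always closing the cluster, can be sketched without Hadoop on the classpath. FakeCluster below is a hypothetical stand-in for a mini cluster that locks its data dir; it is not MiniDFSCluster, and the sketch does not show what the attached patches change.

```java
final class ClusterCleanupSketch {
  /** Hypothetical stand-in for a mini cluster that holds a lock on its data dir. */
  static final class FakeCluster implements AutoCloseable {
    static boolean dataDirLocked = false;

    FakeCluster() {
      if (dataDirLocked) {
        throw new IllegalStateException("data dir is still locked");
      }
      dataDirLocked = true;
    }

    @Override
    public void close() {
      dataDirLocked = false; // release the lock for the next test
    }
  }

  /** A test body that fails midway but still releases the cluster. */
  static void testThatFails() {
    try (FakeCluster cluster = new FakeCluster()) {
      throw new RuntimeException("simulated assertion failure");
    }
  }

  public static void main(String[] args) {
    try {
      testThatFails();
    } catch (RuntimeException expected) {
      // The test failed, but close() has already run.
    }
    // The next "test" can start its own cluster because the lock was released.
    try (FakeCluster next = new FakeCluster()) {
      System.out.println("second cluster started");
    }
  }
}
```

A try/finally block that shuts the cluster down in the finally clause gives the same guarantee where try-with-resources is not convenient.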






[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14442:
--
Attachment: HDFS-14442.003.patch

> Disagreement between HAUtil.getAddressOfActive and 
> RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, 
> HDFS-14442.003.patch
>
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling 
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
>   /**
>    * Returns the connection id associated with the InvocationHandler instance.
>    * @return ConnectionId
>    */
>   ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an 
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
>   /**
>    * Get the internet address of the currently-active NN. This should rarely be
>    * used, since callers of this method who connect directly to the NN using the
>    * resulting InetSocketAddress will not be able to connect to the active NN if
>    * a failover were to occur after this method has been called.
>    * 
>    * @param fs the file system to get the active address of.
>    * @return the internet address of the currently-active NN.
>    * @throws IOException if an error occurs while resolving the active NN.
>    */
>   public static InetSocketAddress getAddressOfActive(FileSystem fs)
>       throws IOException {
>     if (!(fs instanceof DistributedFileSystem)) {
>       throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>     }
>     // force client address resolution.
>     fs.exists(new Path("/"));
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     DFSClient dfsClient = dfs.getClient();
>     return RPC.getServerAddress(dfsClient.getNamenode());
>   }
> {code}
> Where the call {{RPC.getServerAddress()}} eventually terminates into 
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> 
> {{RPC.getConnectionIdForProxy()}} -> 
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making 
> an incorrect assumption that {{RpcInvocationHandler}} will necessarily return 
> an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a 
> counter-example to this, since the current connection ID may be pointing at, 
> for example, an Observer NameNode.
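
The discrepancy described above can be shown with a toy model. Everything in this sketch is hypothetical (the Proxy class, the HaState enum, the addresses, and both lookup methods); it only illustrates why "the connection the handler currently points at" and "the active NameNode" are different questions, and it is not taken from the attached patches.

```java
import java.util.Arrays;
import java.util.List;

final class ActiveAddressSketch {
  enum HaState { ACTIVE, STANDBY, OBSERVER }

  /** Hypothetical stand-in for a NameNode proxy: an address plus its HA state. */
  static final class Proxy {
    final String address;
    final HaState state;

    Proxy(String address, HaState state) {
      this.address = address;
      this.state = state;
    }
  }

  /** Naive lookup: trusts whatever proxy the handler is currently pointing at. */
  static String addressOfCurrent(List<Proxy> proxies, int currentIndex) {
    return proxies.get(currentIndex).address;
  }

  /** Explicit lookup: scans for the proxy whose state is ACTIVE. */
  static String addressOfActive(List<Proxy> proxies) {
    for (Proxy p : proxies) {
      if (p.state == HaState.ACTIVE) {
        return p.address;
      }
    }
    throw new IllegalStateException("no active NameNode found");
  }

  public static void main(String[] args) {
    List<Proxy> proxies = Arrays.asList(
        new Proxy("nn1:8020", HaState.ACTIVE),
        new Proxy("nn2:8020", HaState.STANDBY),
        new Proxy("nn3:8020", HaState.OBSERVER));
    int currentIndex = 2; // a read was last served by the observer
    System.out.println(addressOfCurrent(proxies, currentIndex)); // prints nn3:8020
    System.out.println(addressOfActive(proxies));                // prints nn1:8020
  }
}
```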






[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972994#comment-16972994
 ] 

Yiqun Lin commented on HDFS-14648:
--

[~leosun08], sorry for the confusion, you are right. Please remove this change 
in DFSStripedInputStream and address the other comments. Thanks.

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch
>
>






[jira] [Comment Edited] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972990#comment-16972990
 ] 

Lisheng Sun edited comment on HDFS-14648 at 11/13/19 3:52 AM:
--

hi [~linyiqun]
{quote}
DFSInputStream.java
I haven't seen the call dfsClient.addNodeToDeadNodeDetector added in method 
createBlockReader under this class
{quote}
I cannot find the method DFSInputStream#createBlockReader. createBlockReader 
should be in DFSStripedInputStream.
Please correct me if I am wrong. Thank you.


was (Author: leosun08):
hi [~linyiqun]
{quote}
DFSInputStream.java
I haven't seen the call dfsClient.addNodeToDeadNodeDetector added in method 
createBlockReader under this class
{quote}
I cannot find the method DFSInputStream#createBlockReader. createBlockReader 
should be in DFSStripedInputStream.

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch
>
>






[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972990#comment-16972990
 ] 

Lisheng Sun commented on HDFS-14648:


hi [~linyiqun]
{quote}
DFSInputStream.java
I haven't seen the call dfsClient.addNodeToDeadNodeDetector added in method 
createBlockReader under this class
{quote}
I cannot find the method DFSInputStream#createBlockReader. createBlockReader 
should be in DFSStripedInputStream.

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch
>
>






[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972980#comment-16972980
 ] 

Yiqun Lin commented on HDFS-14648:
--

The latest patch looks great, some more comments:

*ClientContext.java*
 We need a method to stop the dead node detector thread, and it should be 
called in DFSClient#close.
{code:java}
  /**
   * Close dead node detector thread.
   */
  public void stopDeadNodeDetectorThread() {
    if (deadNodeDetectorThr != null) {
      deadNodeDetectorThr.interrupt();
      try {
        deadNodeDetectorThr.join(3000);
      } catch (InterruptedException e) {
        LOG.warn("Encountered exception while waiting to join on dead "
            + "node detector thread.", e);
      }
    }
  }

.
  public synchronized void close() throws IOException {
    if (clientRunning) {
      ...
      // close dead node detector thread
      clientContext.stopDeadNodeDetectorThread();
    }
  }
{code}
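
For readers who want to try the interrupt-then-join pattern suggested above in isolation, here is a small self-contained sketch. The class StopThreadSketch and its stop method are hypothetical names, the worker thread merely sleeps in a loop to stand in for a periodic probe, and restoring the interrupt flag in the catch block is an addition of this sketch rather than part of the review suggestion.

```java
final class StopThreadSketch {
  /** Interrupt the worker and wait up to timeoutMillis for it to exit. */
  static boolean stop(Thread worker, long timeoutMillis) {
    if (worker == null) {
      return true;
    }
    worker.interrupt();
    try {
      worker.join(timeoutMillis);
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status instead of swallowing it.
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    Thread worker = new Thread(() -> {
      try {
        while (true) {
          Thread.sleep(100); // stands in for a periodic probe
        }
      } catch (InterruptedException e) {
        // Interrupted: fall through and let the thread end.
      }
    });
    worker.setDaemon(true);
    worker.start();
    System.out.println("stopped: " + stop(worker, 3000));
  }
}
```

The bounded join keeps close() from hanging if the worker ignores the interrupt, at the cost of possibly returning while the thread is still alive.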

 *DFSInputStream.java*
 I haven't seen the call {{dfsClient.addNodeToDeadNodeDetector}} added in 
method {{createBlockReader}} under this class.

 *DFSStripedInputStream.java*
 Can we remove dfsClient.addNodeToDeadNodeDetector in this class? Dead node 
detection is not expected to be enabled in EC mode.
{code:java}
   fetchBlockAt(block.getStartOffset());
-  addToDeadNodes(dnInfo.info);
+  addToLocalDeadNodes(dnInfo.info);
+  dfsClient.addNodeToDeadNodeDetector(this, dnInfo.info);   <=== to be removed
 }
{code}

Can we also fix this whitespace warning?
{noformat}
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDeadNodeDetection.java:113:
  public void testDeadNodeDetectionInMultipleDFSInputStream() 
{noformat}
Everything else looks good to me now.
  

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch
>
>






[jira] [Comment Edited] (HDFS-14283) DFSInputStream to prefer cached replica

2019-11-12 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972968#comment-16972968
 ] 

Lisheng Sun edited comment on HDFS-14283 at 11/13/19 3:05 AM:
--

Thanks [~smeng] [~weichiu] [~ayushtkn] for the good suggestions.

I updated the patch and uploaded it as v005. Would you mind reviewing it? 
Thanks a lot. [~weichiu] [~ayushtkn] [~smeng]


was (Author: leosun08):
Thanks [~smeng] [~weichiu] for the good suggestions.

I updated the patch and uploaded it as v005. Would you mind reviewing it? 
Thanks a lot. [~weichiu] [~smeng]

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch, HDFS-14283.004.patch, HDFS-14283.005.patch
>
>
> HDFS Caching offers performance benefits. However, currently NameNode does 
> not treat cached replica with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let NameNode give higher priority to cached replica. 
> Changing a logic in NameNode is always tricky so that didn't get much 
> traction. Here I propose a different approach: let client (DFSInputStream) 
> prefer cached replica.
> A {{LocatedBlock}} object already contains cached replica location so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
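
The client-side preference proposed above can be sketched as a simple selection rule. This is an assumption-laden illustration, not the patch: CachedReplicaSketch and chooseNode are hypothetical names, nodes are plain strings, and real selection would also have to skip dead or ignored nodes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CachedReplicaSketch {
  /**
   * Return the first candidate that holds a cached replica, or the first
   * candidate overall if none is cached. Returns null for an empty list.
   */
  static String chooseNode(List<String> candidates, Set<String> cachedLocations) {
    for (String node : candidates) {
      if (cachedLocations.contains(node)) {
        return node;
      }
    }
    return candidates.isEmpty() ? null : candidates.get(0);
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("dn1", "dn2", "dn3");
    Set<String> cached = new HashSet<>(Collections.singletonList("dn2"));
    System.out.println(chooseNode(candidates, cached));                         // prints dn2
    System.out.println(chooseNode(candidates, Collections.<String>emptySet())); // prints dn1
  }
}
```

Because the fallback keeps the existing ordering, blocks with no cached replica behave exactly as before.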






[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-11-12 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972968#comment-16972968
 ] 

Lisheng Sun commented on HDFS-14283:


Thanks [~smeng] [~weichiu] for the good suggestions.

I updated the patch and uploaded it as v005. Would you mind reviewing it? 
Thanks a lot. [~weichiu] [~smeng]

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch, HDFS-14283.004.patch, HDFS-14283.005.patch
>
>






[jira] [Updated] (HDFS-14283) DFSInputStream to prefer cached replica

2019-11-12 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14283:
---
Attachment: HDFS-14283.005.patch

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch, HDFS-14283.004.patch, HDFS-14283.005.patch
>
>






[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-12 Thread Xudong Cao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972962#comment-16972962
 ] 

Xudong Cao edited comment on HDFS-14969 at 11/13/19 2:34 AM:
-

cc [~xkrogen] [~vagarychen]  [~shv] [~weichiu] I feel it's not good to remove 
the entire log. A more appropriate way is to update the logic to be aware of 
how many NNs are configured. We may need to add a new method to the 
FailoverProxyProvider interface, such as getProxiesCount(), and implement it in 
all subclasses. Then we can compare the current failover count with the total 
number of NNs in RetryInvocationHandler to determine whether to print the 
failover log. What do you think?

However, once HDFS-14963 is merged, I feel that this problem 
will be greatly alleviated.
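
The comparison proposed in this comment could look roughly like the sketch below. It is only an illustration: `ProxyCountProvider` and `FailoverLogPolicy` are hypothetical names, not Hadoop classes, and the threshold (log only once every configured NN has been tried) is an assumption, since the comment does not state the exact rule.

```java
/** Hypothetical slice of a proxy provider, reduced to the proposed method. */
interface ProxyCountProvider {
  /** Number of configured NameNode proxies (the getProxiesCount() idea). */
  int getProxiesCount();
}

/** Hypothetical helper; a real change would live in RetryInvocationHandler. */
class FailoverLogPolicy {
  /**
   * @param failoverCount failovers already performed for the current call
   * @return true once every configured NN has been tried (assumed rule)
   */
  static boolean shouldLogFailover(int failoverCount, ProxyCountProvider provider) {
    // With N NameNodes, the first N-1 failovers are expected while the
    // client is still searching for the active one, so stay silent.
    return failoverCount >= provider.getProxiesCount() - 1;
  }
}
```

With 3 NNs this stays silent for the 1st and 2nd standby responses and logs only if the 3rd NN also rejects the call.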


was (Author: xudongcao):
cc [~xkrogen] [~vagarychen]  [~shv] [~weichiu] I feel it's not good to remove 
the entire log. The more appropriate way is to update the logic to be aware of 
how many NNs are configured. We may need to add a new method to the 
FailoverProxyProvider interface such as getProxiesCount() , and then implement 
it in all subclasses. What do you think?

However, after the HDFS-14963 is merged in the future, I feel that this problem 
will be greatly alleviated.

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In a multi-NameNode scenario, suppose there are 3 NNs and the 3rd is the ANN. 
> When a client starts RPC with the 1st NN, it is silent on failover 
> from the 1st NN to the 2nd NN, but on failover from the 2nd NN to the 3rd 
> NN it prints some unnecessary logs. In some scenarios these logs can be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
>  at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}






[jira] [Updated] (HDFS-14612) SlowDiskReport won't update when SlowDisks is always empty in heartbeat

2019-11-12 Thread Haibin Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibin Huang updated HDFS-14612:

Attachment: HDFS-14612-005.patch

> SlowDiskReport won't update when SlowDisks is always empty in heartbeat
> ---
>
> Key: HDFS-14612
> URL: https://issues.apache.org/jira/browse/HDFS-14612
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haibin Huang
>Assignee: Haibin Huang
>Priority: Major
> Attachments: HDFS-14612-001.patch, HDFS-14612-002.patch, 
> HDFS-14612-003.patch, HDFS-14612-004.patch, HDFS-14612-005.patch, 
> HDFS-14612.patch
>
>
> I found that SlowDiskReport won't update when slowDisks is always empty in 
> org.apache.hadoop.hdfs.server.blockmanagement.*handleHeartbeat*. This may 
> leave an outdated SlowDiskReport in the NameNode's JMX until the next 
> time slowDisks isn't empty. So I think the method 
> *checkAndUpdateReportIfNecessary()* should be called first whenever we want to 
> get the JMX information about SlowDiskReport, so that the 
> SlowDiskReport in JMX is always valid.
>  
> There is also an incorrect object reference in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.
> *DataNodeVolumeMetrics*
> {code:java}
> // Based on writeIoRate
> public long getWriteIoSampleCount() {
>   return syncIoRate.lastStat().numSamples();
> }
> public double getWriteIoMean() {
>   return syncIoRate.lastStat().mean();
> }
> public double getWriteIoStdDev() {
>   return syncIoRate.lastStat().stddev();
> }
> {code}
>  
>  
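
For illustration, the getters above would presumably read from a write-specific rate rather than `syncIoRate`. The sketch below is self-contained and hypothetical: `Stat` and `VolumeMetricsSketch` stand in for the real metrics classes, and the field name `writeIoRate` is taken only from the "Based on writeIoRate" comment in the snippet, not from the attached patch.

```java
/** Hypothetical stand-in for the sampled statistics a rate metric exposes. */
class Stat {
  final long numSamples;
  final double mean;
  final double stddev;

  Stat(long numSamples, double mean, double stddev) {
    this.numSamples = numSamples;
    this.mean = mean;
    this.stddev = stddev;
  }
}

/** Hypothetical reduction of the volume metrics to the two rates involved. */
class VolumeMetricsSketch {
  private final Stat syncIoRate;
  private final Stat writeIoRate;

  VolumeMetricsSketch(Stat syncIoRate, Stat writeIoRate) {
    this.syncIoRate = syncIoRate;
    this.writeIoRate = writeIoRate;
  }

  // Sync getter shown for contrast: it reads the sync rate.
  long getSyncIoSampleCount() { return syncIoRate.numSamples; }

  // Write getters read the write rate, not the sync rate (assumed fix).
  long getWriteIoSampleCount() { return writeIoRate.numSamples; }
  double getWriteIoMean() { return writeIoRate.mean; }
  double getWriteIoStdDev() { return writeIoRate.stddev; }
}
```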






[jira] [Commented] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-12 Thread Xudong Cao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972962#comment-16972962
 ] 

Xudong Cao commented on HDFS-14969:
---

cc [~xkrogen] [~vagarychen]  [~shv] [~weichiu] I feel it's not good to remove 
the entire log. A more appropriate way is to update the logic to be aware of 
how many NNs are configured. We may need to add a new method to the 
FailoverProxyProvider interface, such as getProxiesCount(), and then implement 
it in all subclasses. What do you think?

However, once HDFS-14963 is merged, I feel that this problem 
will be greatly alleviated.

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In a multi-NameNode scenario, suppose there are 3 NNs and the 3rd is the ANN. 
> When a client starts RPC with the 1st NN, it is silent on failover 
> from the 1st NN to the 2nd NN, but on failover from the 2nd NN to the 3rd 
> NN it prints some unnecessary logs. In some scenarios these logs can be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
>  at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}






[jira] [Updated] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14648:
---
Attachment: HDFS-14648.010.patch

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. It implements 
> the following functions:
>  # When a DFSInputStream is opened, a BlockReader is opened. If some DataNode 
> of the block is found to be inaccessible, the DataNode is put into 
> DeadNodeDetector#deadnode. HDFS-14649 will optimize this part. Because when a 
> DataNode is not accessible it is likely that the replica has been removed 
> from the DataNode, this needs to be confirmed by re-probing and 
> requires higher-priority processing.
>  # DeadNodeDetector periodically probes the nodes in 
> DeadNodeDetector#deadnode. If the access succeeds, the node is 
> removed from DeadNodeDetector#deadnode. Continuous probing of dead nodes 
> is necessary because a DataNode may need to rejoin the cluster after a service 
> restart or machine repair. Without a probe mechanism the DataNode could be 
> excluded permanently.
>  # DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each 
> DFSInputStream. When a DFSInputStream is closed, it is removed from 
> DeadNodeDetector#dfsInputStreamNodes.
>  # Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is 
> updated. The new DeadNodeDetector#deadnode equals the intersection of the old 
> DeadNodeDetector#deadnode and the DataNodes referenced by 
> DeadNodeDetector#dfsInputStreamNodes.
>  # DeadNodeDetector has a switch that is turned off by default. When it is 
> off, each DFSInputStream still uses its own local deadnode.
>  # This feature has been used in the XIAOMI production environment for a long 
> time. It has reduced HBase read stalls caused by node hangs.
>  # Just turn on the DeadNodeDetector switch to use it directly. There are no 
> other restrictions. If you don't want to use DeadNodeDetector, just turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
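
The bookkeeping described in the numbered list can be pictured with a small model. This is a simplified, hypothetical sketch and not the code in the attached patches: `DeadNodeModel`, its method names, and the use of plain strings for DataNodes and streams are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model of the shared dead-node state; not the patch's class. */
class DeadNodeModel {
  // Stands in for DeadNodeDetector#deadnode.
  private final Set<String> deadNodes = new HashSet<>();
  // Stands in for DeadNodeDetector#dfsInputStreamNodes: stream -> DataNodes it uses.
  private final Map<String, Set<String>> streamNodes = new HashMap<>();

  synchronized void addStreamNode(String streamId, String dataNode) {
    streamNodes.computeIfAbsent(streamId, k -> new HashSet<>()).add(dataNode);
  }

  /** Called when a stream finds a DataNode inaccessible. */
  synchronized void markDead(String dataNode) {
    deadNodes.add(dataNode);
  }

  /** Called when a stream is closed: its nodes are no longer tracked. */
  synchronized void closeStream(String streamId) {
    streamNodes.remove(streamId);
  }

  /** Periodic probe: nodes that answer again leave the dead set. */
  synchronized void reprobe(Predicate<String> isReachable) {
    deadNodes.removeIf(isReachable);
  }

  /** New dead set = old dead set intersected with nodes still used by open streams. */
  synchronized Set<String> getDeadNodes() {
    Set<String> inUse = new HashSet<>();
    for (Set<String> nodes : streamNodes.values()) {
      inUse.addAll(nodes);
    }
    deadNodes.retainAll(inUse);
    return new HashSet<>(deadNodes);
  }
}
```

Returning a copy from `getDeadNodes()` keeps callers from mutating the shared set outside the lock.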






[jira] [Commented] (HDDS-2465) S3 Multipart upload failing

2019-11-12 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972951#comment-16972951
 ] 

Bharat Viswanadham commented on HDDS-2465:
--

cc [~elek]

> S3 Multipart upload failing
> ---
>
> Key: HDDS-2465
> URL: https://issues.apache.org/jira/browse/HDDS-2465
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Priority: Major
> Attachments: MPU.java
>
>
> When I run the attached Java program, I get the error below during 
> completeMultipartUpload.
> {code:java}
> ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
> ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
> Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), S3 Extended Request ID: 7tnVbqgc4bgb
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
>  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
>  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
>  at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464)
>  at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96)
> {code}
> When I debug, the request does not appear to be received by S3Gateway, and I 
> don't see any trace of it in the audit log.
>  
>  






[jira] [Updated] (HDDS-2465) S3 Multipart upload failing

2019-11-12 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2465:
-
Attachment: MPU.java

> S3 Multipart upload failing
> ---
>
> Key: HDDS-2465
> URL: https://issues.apache.org/jira/browse/HDDS-2465
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Priority: Major
> Attachments: MPU.java
>
>
> When I run the attached Java program, I get the error below during 
> completeMultipartUpload.
> {code:java}
> ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
> ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
> Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), S3 Extended Request ID: 7tnVbqgc4bgb
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
>  at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
>  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
>  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
>  at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464)
>  at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96)
> {code}
> When I debug, the request does not appear to be received by S3Gateway, and I 
> don't see any trace of it in the audit log.
>  
>  






[jira] [Created] (HDDS-2465) S3 Multipart upload failing

2019-11-12 Thread Bharat Viswanadham (Jira)
Bharat Viswanadham created HDDS-2465:


 Summary: S3 Multipart upload failing
 Key: HDDS-2465
 URL: https://issues.apache.org/jira/browse/HDDS-2465
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Bharat Viswanadham


When I run the attached Java program, I get the error below during 
completeMultipartUpload.
{code:java}
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), S3 Extended Request ID: 7tnVbqgc4bgb
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
 at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
 at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
 at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
 at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
 at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464)
 at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96)
{code}
When I debug, the request does not appear to be received by S3Gateway, and I 
don't see any trace of it in the audit log.

 

 






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-12 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972874#comment-16972874
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:22 PM:
-

Hi [~timmylicheng]

Thanks for sharing the logs.

I see an abort multipart upload request for the key plc_1570863541668_9278 once 
complete multipart upload failed.

 

 
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | 
op=COMPLETE_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationIn       fo=[], multipartList=[partNumber: 1  
 5626 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"
   5627 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"
     . .   5911 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"
  5912 ]} | ret=FAILURE | INVALID_PART 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: 
plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO  | OMAudit | user=root | ip=9.134.50.210 | 
op=ABORT_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=       []} 
{code}
 

After that, allocateBlock still continues for the key, because the entry 
in openKeyTable is not removed by the abortMultipartUpload request. (Abort removes 
only the entry that was created during the initiateMPU request, which is the 
reason that after some time you see the NO_SUCH_MULTIPART_UPLOAD error during 
commitMultipartUploadKey, as we removed the entry from the MultipartInfo table.) 
The strange thing I have observed is that the clientID does not match any of 
the names in the part list, even though the last part of partName is the clientID.
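
The sequence described above can be pictured with a toy model of the two tables. This is a deliberately simplified, hypothetical sketch: `MpuTablesSketch`, its method names, and the string key formats are illustrative assumptions and do not reflect the actual Ozone Manager table layout or request classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the OM state involved; names and key formats are assumptions. */
class MpuTablesSketch {
  final Map<String, String> multipartInfoTable = new HashMap<>();
  final Map<String, String> openKeyTable = new HashMap<>();

  /** initiateMPU: creates the upload-level entry. */
  void initiate(String key, String uploadId) {
    multipartInfoTable.put(key + "/" + uploadId, "initiated");
  }

  /** A part being written holds its own open-key entry, keyed by clientID here. */
  void openPart(String key, String clientId) {
    openKeyTable.put(key + "/" + clientId, "open");
  }

  /** Abort as described in the comment: only the upload-level entry goes away. */
  void abort(String key, String uploadId) {
    multipartInfoTable.remove(key + "/" + uploadId);
  }

  /** Committing a part after abort finds no upload entry. */
  String commitPart(String key, String uploadId, String clientId) {
    if (!multipartInfoTable.containsKey(key + "/" + uploadId)) {
      return "NO_SUCH_MULTIPART_UPLOAD";
    }
    openKeyTable.remove(key + "/" + clientId);
    return "OK";
  }
}
```

In this model a part opened before the abort survives in `openKeyTable`, so block allocation can continue for it, and the later commit fails.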

 

From the OM audit log, I see partNumber 1 and a list of multipart names. 
I am not sure whether some of the log is truncated here, as it should show each 
part name with its partNumber.
 # If you can, confirm which parts exist in OM for this key. You can get this from 
listParts (but this should be done before the abort request).
 # Check in the OM audit log what part list we get for this key. I am not sure 
whether it is truncated in the uploaded log. 

 

On my cluster the audit logs look like below, where on completeMultipartUpload I 
can see partNumber and partName. (In the uploaded log, I don't see anything like 
the below.)

 

 
{code:java}
2019-11-12 14:57:18,580 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,967 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,974 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 
2019-11-12 14:57:54,154 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 6
    localID: 103127415126327331
  }
  blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-12 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972874#comment-16972874
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:21 PM:
-

Hi [~timmylicheng]

Thanks for sharing the logs.

I see an abort multipart upload request for the key plc_1570863541668_9278 once 
complete multipart upload failed.

 

 
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | 
op=COMPLETE_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationIn       fo=[], multipartList=[partNumber: 1  
 5626 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"
   5627 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"
     . .   5911 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"
  5912 ]} | ret=FAILURE | INVALID_PART 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: 
plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO  | OMAudit | user=root | ip=9.134.50.210 | 
op=ABORT_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=       []} 
{code}
 

After that, allocateBlock still continues for the key, because the entry 
in openKeyTable is not removed by the abortMultipartUpload request. (Abort removes 
only the entry that was created during the initiateMPU request, which is the 
reason that after some time you see the NO_SUCH_MULTIPART_UPLOAD error during 
commitMultipartUploadKey, as we removed the entry from the MultipartInfo table.) 
The strange thing I have observed is that the clientID does not match any of 
the names in the part list, even though the last part of partName is the clientID.

 

From the OM audit log, I see partNumber 1 and a list of multipart names. 
I am not sure whether some of the log is truncated here, as it should show each 
part name with its partNumber.
 # If you can, confirm which parts exist in OM for this key. You can get this from 
listParts (but this should be done before the abort request).
 # Check in the OM audit log what part list we get for this key. I am not sure 
whether it is truncated in the uploaded log. 

 

On my cluster the audit logs look like below, where on completeMultipartUpload I 
can see partNumber and partName. (In the uploaded log, I don't see anything like 
the below.)

 

 
{code:java}
2019-11-12 14:57:18,580 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,967 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,974 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 
2019-11-12 14:57:54,154 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 6
    localID: 103127415126327331
  }
  blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      

[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-11-12 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972895#comment-16972895
 ] 

Siyao Meng commented on HDFS-14283:
---

[~leosun08] I discussed with [~weichiu]. I'm fine with ditching the sorting 
logic on the server side so that we don't need to make any server-side changes 
in this patch. One reason is that in most cases there will only be one cached 
replica for a block.

We will simply allow the client to prefer the cached replica with a 
configuration option then.
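
A client-side preference of this kind might look like the sketch below. It is only an illustration under stated assumptions: `CachedReplicaChooser`, the `preferCached` flag, and the use of plain strings for DataNodes are hypothetical and are not the code in the attached patches or the real `DFSInputStream#getBestNodeDNAddrPair()`.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical chooser; DataNodes are plain strings for brevity. */
class CachedReplicaChooser {
  /**
   * @param locations       all replica locations of the block, in NameNode order
   * @param cachedLocations locations that hold the block in cache
   * @param deadNodes       nodes the client currently avoids
   * @param preferCached    stands in for the proposed configuration option
   * @return the chosen node, or null if every location is dead
   */
  static String chooseNode(List<String> locations, List<String> cachedLocations,
                           Set<String> deadNodes, boolean preferCached) {
    if (preferCached) {
      // Try a live cached replica first.
      for (String dn : cachedLocations) {
        if (locations.contains(dn) && !deadNodes.contains(dn)) {
          return dn;
        }
      }
    }
    // Otherwise fall back to the first live location.
    for (String dn : locations) {
      if (!deadNodes.contains(dn)) {
        return dn;
      }
    }
    return null;
  }
}
```

Because the preference is applied entirely on the client, no server-side sorting change is needed, which matches the direction agreed in this comment.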

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch, HDFS-14283.004.patch
>
>
> HDFS Caching offers performance benefits. However, the NameNode currently does 
> not give cached replicas higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, when all replicas are cached in 
> memory and a client cannot randomly pick an uncached replica.
> HDFS-6846 proposed letting the NameNode give higher priority to cached replicas. 
> Changing NameNode logic is always tricky, so that didn't get much 
> traction. Here I propose a different approach: let the client (DFSInputStream) 
> prefer cached replicas.
> A {{LocatedBlock}} object already contains the cached replica locations, so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.






[jira] [Comment Edited] (HDFS-14283) DFSInputStream to prefer cached replica

2019-11-12 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972895#comment-16972895
 ] 

Siyao Meng edited comment on HDFS-14283 at 11/12/19 11:20 PM:
--

[~leosun08] I discussed with [~weichiu]. I'm fine with ditching the sorting 
logic on the server side so that we don't need to make any server side changes 
in this patch. One reason is that in most cases there will only be one cached 
replica for a block.

We will simply allow the client to prefer the cached replica with a 
configuration option then.


was (Author: smeng):
[~leosun08] I discussed with [~weichiu]. I'm fine with ditching the sorting 
logic on the server side so that we don't need to make any server side changed 
in this patch. One reason is that in most cases there will only be one cached 
replica for a block.

We will simply allow the client to prefer the cached replica with a 
configuration option then.

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch, HDFS-14283.004.patch
>
>
> HDFS Caching offers performance benefits. However, the NameNode currently does 
> not give cached replicas higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, when all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let the NameNode give higher priority to cached replicas. 
> Changing logic in the NameNode is always tricky, so that didn't get much 
> traction. Here I propose a different approach: let the client (DFSInputStream) 
> prefer cached replicas.
> A {{LocatedBlock}} object already contains the cached replica locations, so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
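
The client-side preference described in the issue could be sketched roughly as 
follows. This is not the attached patch and it does not use the real HDFS 
classes: {{CachedReplicaChooser}}, {{chooseReplica}} and the {{preferCached}} 
flag are hypothetical names, and datanodes are represented as plain strings 
standing in for the location and cached-location lists of a {{LocatedBlock}}.

```java
import java.util.List;
import java.util.Set;

/** Illustrative only: hypothetical stand-in, not the HDFS client API. */
final class CachedReplicaChooser {

  /**
   * Returns the first usable datanode that holds a cached replica when the
   * preference is enabled; otherwise falls back to the first usable datanode
   * in the order supplied by the NameNode. Returns null if none is usable.
   */
  static String chooseReplica(List<String> locations,
                              Set<String> cachedLocations,
                              Set<String> deadNodes,
                              boolean preferCached) {
    String fallback = null;
    for (String datanode : locations) {
      if (deadNodes.contains(datanode)) {
        continue; // skip nodes the client has already given up on
      }
      if (preferCached && cachedLocations.contains(datanode)) {
        return datanode; // a cached replica wins over NameNode ordering
      }
      if (fallback == null) {
        fallback = datanode; // remember the NameNode's first usable choice
      }
    }
    return fallback;
  }
}
```

With the flag off, the sketch keeps the NameNode's ordering, which matches the 
idea of guarding the behaviour with a configuration option.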






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-12 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972874#comment-16972874
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:17 PM:
-

Hi [~timmylicheng]

Thanks for sharing the logs.

I see an abort multipart upload request for the key plc_1570863541668_9278 once 
complete multipart upload failed.

 

 
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | 
op=COMPLETE_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[], multipartList=[partNumber: 1  
 5626 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"
   5627 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"
     . .   5911 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"
  5912 ]} | ret=FAILURE | INVALID_PART 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: 
plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO  | OMAudit | user=root | ip=9.134.50.210 | 
op=ABORT_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[]} 
{code}
 

Even after that, allocateBlock continues for the key, because the 
abortMultipartUpload request does not remove the entry from openKeyTable. 
(Abort removes only the entry that was created during the initiateMPU request.) 
That is why, after some time, you see the NO_SUCH_MULTIPART_UPLOAD error during 
commitMultipartUploadKey: the entry has already been removed from the 
MultipartInfo table. (The strange thing I have observed is that the clientID 
does not match any of the names in the part list, even though the last part of 
a partName is the clientID.)
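
The clientID comparison mentioned above can be checked mechanically. The sketch 
below assumes, based only on the audit log quoted in this comment, that a 
partName has the form "/volume/bucket/key" immediately followed by the 
clientID; {{PartNameCheck}} and its methods are hypothetical helpers, not Ozone 
code.

```java
import java.util.List;

/** Illustrative helper; names are hypothetical and not part of Ozone. */
final class PartNameCheck {

  /**
   * Returns whatever follows "/volume/bucket/key" in partName, which under the
   * stated assumption is the clientID, or null if the prefix does not match.
   */
  static String clientIdSuffix(String volume, String bucket, String key,
                               String partName) {
    String prefix = "/" + volume + "/" + bucket + "/" + key;
    return partName.startsWith(prefix)
        ? partName.substring(prefix.length())
        : null;
  }

  /** True if at least one part name carries the given clientID as its suffix. */
  static boolean anyPartHasClientId(String volume, String bucket, String key,
                                    List<String> partNames, String clientId) {
    for (String partName : partNames) {
      if (clientId.equals(clientIdSuffix(volume, bucket, key, partName))) {
        return true;
      }
    }
    return false;
  }
}
```

Running {{anyPartHasClientId}} over the part list from the audit log, with the 
clientID taken from the failing commit request, would show whether the mismatch 
is real or an artifact of a truncated log.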

 

In the OM audit log I see partNumber 1 followed by a list of part names. I am 
not sure whether the log is truncated here, since each entry should show a 
partNumber together with its partName.
 # Can you confirm which parts OM has for this key? You can get this from 
listParts (but this must be done before the abort request).
 # Check in the OM audit log which part list we receive for this key; I am not 
sure whether it is truncated in the uploaded log.

On my cluster the audit logs look like the ones below: for 
completeMultipartUpload I can see both partNumber and partName (whereas in the 
uploaded log I don't see this).

 

 
{code:java}
2019-11-12 14:57:18,580 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,967 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,974 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 
2019-11-12 14:57:54,154 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 6
    localID: 103127415126327331
  }
  blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      

[jira] [Resolved] (HDFS-14792) [SBN read] StanbyNode does not come out of safemode while adding new blocks.

2019-11-12 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14792.

Fix Version/s: 2.10.1
   Resolution: Fixed

This turned out to be related to the same race condition between edits 
{{OP_ADD_BLOCK}} and IBRs of HDFS-14941. We do not see any delays in leaving 
safemode on StandbyNode after the HDFS-14941 fix.
Closing this as fixed.

> [SBN read] StanbyNode does not come out of safemode while adding new blocks.
> 
>
> Key: HDFS-14792
> URL: https://issues.apache.org/jira/browse/HDFS-14792
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
> Fix For: 2.10.1
>
>
> During startup StandbyNode reports that it needs additional X blocks to reach 
> the threshold 1.. Where X is changing up and down.
> This is because with fast tailing SBN adds new blocks from edits while DNs 
> have not reported replicas yet. Being in SafeMode SBN counts new blocks 
> towards the threshold and can stay in SafeMode for a long time.
> By design, the purpose of startup SafeMode is to disallow modifications of 
> the namespace and blocks map until all DN replicas are reported.






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-12 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972874#comment-16972874
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:08 PM:
-

Hi [~timmylicheng]

Thanks for sharing the logs.

I see an abort multipart upload request for the key plc_1570863541668_9278 once 
complete multipart upload failed.

 

2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | 
op=COMPLETE_MULTIPART_UPLOAD

{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[], multipartList=[partNumber: 1  
 5626 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"
   5627 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"
     . .   5911 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"
   5912 ]}

| ret=FAILURE | INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: 
Complete Multipart Upload Failed: volume: 
s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278

2019-11-08 20:08:24,963 | INFO  | OMAudit | user=root | ip=9.134.50.210 | 
op=ABORT_MULTIPART_UPLOAD

{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[]}

| ret=SUCCESS |

 

Even after that, allocateBlock continues for the key, because the 
abortMultipartUpload request does not remove the entry from openKeyTable. 
(Abort removes only the entry that was created during the initiateMPU request.) 
That is why, after some time, you see the NO_SUCH_MULTIPART_UPLOAD error during 
commitMultipartUploadKey: the entry has already been removed from the 
MultipartInfo table. (The strange thing I have observed is that the clientID 
does not match any of the names in the part list, even though the last part of 
a partName is the clientID.)

In the OM audit log I see partNumber 1 followed by a list of part names. I am 
not sure whether the log is truncated here, since each entry should show a 
partNumber together with its partName.
 # Can you confirm which parts OM has for this key? You can get this from 
listParts (but this must be done before the abort request).
 # Check in the OM audit log which part list we receive for this key; I am not 
sure whether it is truncated in the uploaded log.

On my cluster the audit logs look like the ones below: for 
completeMultipartUpload I can see both partNumber and partName (whereas in the 
uploaded log I don't see this).

 

 
{code:java}
2019-11-12 14:57:18,580 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,967 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,974 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 
2019-11-12 14:57:54,154 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 6
    localID: 103127415126327331
  }
  blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      

[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-12 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972874#comment-16972874
 ] 

Bharat Viswanadham commented on HDDS-2356:
--

Hi [~timmylicheng]

Thanks for sharing the logs.

I see an abort multipart upload request for the key plc_1570863541668_9278 once 
complete multipart upload failed.

 

2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | 
op=COMPLETE_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, 
bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, 
replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[], 
multipartList=[partNumber: 1

  5626 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"

  5627 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"

 

 

.

.

  5911 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"

  5912 ]} | ret=FAILURE | INVALID_PART 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: 
plc_1570863541668_9278

2019-11-08 20:08:24,963 | INFO  | OMAudit | user=root | ip=9.134.50.210 | 
op=ABORT_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, 
bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, 
replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | 
ret=SUCCESS |

 

Even after that, allocateBlock continues for the key, because the 
abortMultipartUpload request does not remove the entry from openKeyTable. 
(Abort removes only the entry that was created during the initiateMPU request.) 
That is why, after some time, you see the NO_SUCH_MULTIPART_UPLOAD error during 
commitMultipartUploadKey: the entry has already been removed from the 
MultipartInfo table. (The strange thing I have observed is that the clientID 
does not match any of the names in the part list, even though the last part of 
a partName is the clientID.)

In the OM audit log I see partNumber 1 followed by a list of part names. I am 
not sure whether the log is truncated here, since each entry should show a 
partNumber together with its partName.
 # Can you confirm which parts OM has for this key? You can get this from 
listParts (but this must be done before the abort request).
 # Check in the OM audit log which part list we receive for this key; I am not 
sure whether it is truncated in the uploaded log.

On my cluster the audit logs look like the ones below.

 

 
{code:java}
2019-11-12 14:57:18,580 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,967 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,974 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 
2019-11-12 14:57:54,154 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 6
    localID: 103127415126327331
  }
  blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    networkLocation: "/default-rack"
  }
  state: PIPELINE_OPEN
  type: RATIS
  factor: THREE
  id {
    

[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972872#comment-16972872
 ] 

Hadoop QA commented on HDFS-14528:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 36s{color} | {color:orange} root: The patch generated 3 new + 36 unchanged - 
0 fixed = 39 total (was 36) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
56s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 41s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}230m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestObserverManualFailover |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14528 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985649/HDFS-14528.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5c78eec29cb6 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-11-12 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972859#comment-16972859
 ] 

Wei-Chiu Chuang commented on HDFS-14884:


Sorry missed it. I'll review it for sure.

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Affects Versions: 2.11.0
>Reporter: Mukul Kumar Singh
>Assignee: Yuval Degani
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.11.0
>
> Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, 
> HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This jira will add a precondition check before 
> setting it.
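
A precondition of the kind described above might look like the sketch below. 
It is an assumption about the shape of the check, not the committed patch: 
{{ZoneKeyCheck}} and {{checkKeyNamesMatch}} are hypothetical names, and the 
two key names are passed in as plain strings.

```java
/** Illustrative only: hypothetical helper, not the HDFS implementation. */
final class ZoneKeyCheck {

  /**
   * Rejects the operation when the encryption zone's key name differs from
   * the key name recorded in the file encryption info.
   */
  static void checkKeyNamesMatch(String zoneKeyName, String feInfoKeyName) {
    if (zoneKeyName == null || !zoneKeyName.equals(feInfoKeyName)) {
      throw new IllegalArgumentException(
          "Encryption zone key '" + zoneKeyName
              + "' does not match file encryption info key '"
              + feInfoKeyName + "'");
    }
  }
}
```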






[jira] [Created] (HDFS-14982) Backport HADOOP-16152 to branch-3.1

2019-11-12 Thread Siyao Meng (Jira)
Siyao Meng created HDFS-14982:
-

 Summary: Backport HADOOP-16152 to branch-3.1
 Key: HDFS-14982
 URL: https://issues.apache.org/jira/browse/HDFS-14982
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.1.3
Reporter: Siyao Meng
Assignee: Siyao Meng


HADOOP-16152. Upgrade Eclipse Jetty version to 9.4.x






[jira] [Updated] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call

2019-11-12 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2464:
---
Status: Patch Available  (was: Open)

> Avoid unnecessary allocations for FileChannel.open call
> ---
>
> Key: HDDS-2464
> URL: https://issues.apache.org/jira/browse/HDDS-2464
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}.  Vararg array 
> elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, 
> Set<? extends OpenOption>, FileAttribute<?>...)}}.  We can call the latter 
> directly instead.
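
As a minimal sketch of the idea, the option set can be built once and passed 
straight to the {{Set}}-based overload, so no per-call {{HashSet}} is created. 
{{DirectOpenSketch}}, {{READ_OPTIONS}}, {{NO_ATTRIBUTES}} and {{readAll}} are 
hypothetical names for illustration; this is not the {{ChunkUtils}} code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileAttribute;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: hypothetical class, not the Ozone datanode code. */
final class DirectOpenSketch {

  // Built once and shared by every call, instead of one HashSet per open().
  private static final Set<? extends OpenOption> READ_OPTIONS =
      Collections.unmodifiableSet(EnumSet.of(StandardOpenOption.READ));

  // Empty attribute array reused for the varargs parameter.
  private static final FileAttribute<?>[] NO_ATTRIBUTES = new FileAttribute<?>[0];

  /** Reads the whole file as UTF-8 through the Set-based open() overload. */
  static String readAll(Path file) throws IOException {
    try (FileChannel channel =
             FileChannel.open(file, READ_OPTIONS, NO_ATTRIBUTES)) {
      ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
      while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
        // keep reading until the buffer is full or end of file is reached
      }
      buffer.flip();
      return StandardCharsets.UTF_8.decode(buffer).toString();
    }
  }
}
```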






[jira] [Work started] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions

2019-11-12 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2105 started by Siyao Meng.

> Merge OzoneClientFactory#getRpcClient functions
> ---
>
> Key: HDDS-2105
> URL: https://issues.apache.org/jira/browse/HDDS-2105
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214
> There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when 
> HDDS-2007 is committed). They contain some redundant logic and unnecessarily 
> increase the number of code paths.
> Goal: Merge those functions into fewer functions.
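
One common way to reduce such overloads is to keep a single method that holds 
the logic and let the few remaining overloads delegate to it. The sketch below 
only illustrates that pattern: {{RpcClientFactorySketch}}, its parameters, the 
placeholder defaults and the string return value are all hypothetical and do 
not reflect the actual {{OzoneClientFactory}} signatures.

```java
/** Illustrative only: hypothetical factory, not OzoneClientFactory. */
final class RpcClientFactorySketch {

  // Placeholder defaults for the sketch; real values would come from config.
  static final String DEFAULT_HOST = "localhost";
  static final int DEFAULT_PORT = 1234;

  /** Convenience overload: delegates instead of duplicating logic. */
  static String getRpcClient() {
    return getRpcClient(DEFAULT_HOST, DEFAULT_PORT);
  }

  /** Convenience overload: delegates instead of duplicating logic. */
  static String getRpcClient(String host) {
    return getRpcClient(host, DEFAULT_PORT);
  }

  /** The single code path that validates arguments and builds the client. */
  static String getRpcClient(String host, int port) {
    if (host == null || host.isEmpty()) {
      throw new IllegalArgumentException("host must not be empty");
    }
    if (port <= 0 || port > 65535) {
      throw new IllegalArgumentException("port out of range: " + port);
    }
    // A real factory would construct a client here; the sketch returns the
    // resolved endpoint so the delegation is easy to observe.
    return host + ":" + port;
  }
}
```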






[jira] [Work logged] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2105?focusedWorklogId=342234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342234
 ]

ASF GitHub Bot logged work on HDDS-2105:


Author: ASF GitHub Bot
Created on: 12/Nov/19 22:14
Start Date: 12/Nov/19 22:14
Worklog Time Spent: 10m 
  Work Description: smengcl commented on pull request #148: HDDS-2105. 
Merge OzoneClientFactory#getRpcClient functions
URL: https://github.com/apache/hadoop-ozone/pull/148
 
 
   ## What changes were proposed in this pull request?
   
   There are now 6 overloaded `OzoneClientFactory#getRpcClient` functions in 
total. Some of them are unused or used only once, so this change removes or 
merges some of them. (It should be fine to simply remove public functions 
without deprecating them at this point, since Ozone is still in alpha?)
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2105
   
   ## How was this patch tested?
   
   Rerun all existing tests, since this is just a straightforward refactoring.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 342234)
Remaining Estimate: 0h
Time Spent: 10m

> Merge OzoneClientFactory#getRpcClient functions
> ---
>
> Key: HDDS-2105
> URL: https://issues.apache.org/jira/browse/HDDS-2105
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214
> There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when 
> HDDS-2007 is committed). They contain some redundant logic and unnecessarily 
> increase the number of code paths.
> Goal: Merge those functions into fewer functions.






[jira] [Work logged] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2464?focusedWorklogId=342233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342233
 ]

ASF GitHub Bot logged work on HDDS-2464:


Author: ASF GitHub Bot
Created on: 12/Nov/19 22:14
Start Date: 12/Nov/19 22:14
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #147: HDDS-2464. 
Avoid unnecessary allocations for FileChannel.open call
URL: https://github.com/apache/hadoop-ozone/pull/147
 
 
   ## What changes were proposed in this pull request?
   
   `ChunkUtils` calls `FileChannel#open(Path, OpenOption...)`.  Vararg array 
elements are then added to a new `HashSet` to be passed to 
`FileChannel#open(Path, Set<? extends OpenOption>, FileAttribute<?>...)`.  We 
can call the latter directly instead.
   
   https://issues.apache.org/jira/browse/HDDS-2464
   
   ## How was this patch tested?
   
   Ran `TestChunkUtils`.
 



Issue Time Tracking
---

Worklog Id: (was: 342233)
Remaining Estimate: 0h
Time Spent: 10m

> Avoid unnecessary allocations for FileChannel.open call
> ---
>
> Key: HDDS-2464
> URL: https://issues.apache.org/jira/browse/HDDS-2464
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}.  Vararg array 
> elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, 
> Set<? extends OpenOption>, FileAttribute<?>...)}}.  We can call the latter 
> directly instead.






[jira] [Updated] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2464:
-
Labels: pull-request-available  (was: )

> Avoid unnecessary allocations for FileChannel.open call
> ---
>
> Key: HDDS-2464
> URL: https://issues.apache.org/jira/browse/HDDS-2464
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>
> {{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}.  Vararg array 
> elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, 
> Set<? extends OpenOption>, FileAttribute<?>...)}}.  We can call the latter 
> directly instead.






[jira] [Updated] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2105:
-
Labels: pull-request-available  (was: )

> Merge OzoneClientFactory#getRpcClient functions
> ---
>
> Key: HDDS-2105
> URL: https://issues.apache.org/jira/browse/HDDS-2105
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214
> There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when 
> HDDS-2007 is committed). They contain some redundant logic and unnecessarily 
> increase code paths.
> Goal: Merge those functions into fewer.






[jira] [Updated] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions

2019-11-12 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-2105:
-
Description: 
Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214

There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when 
HDDS-2007 is committed). They contain some redundant logic and unnecessarily 
increase code paths.

Goal: Merge those functions into fewer.

  was:
Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214

There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when 
HDDS-2007 is committed). They contain some redundant logic and unnecessarily 
increase code paths.

Goal: Merge those functions into one or two.

Work will begin after HDDS-2007 is committed.


> Merge OzoneClientFactory#getRpcClient functions
> ---
>
> Key: HDDS-2105
> URL: https://issues.apache.org/jira/browse/HDDS-2105
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214
> There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when 
> HDDS-2007 is committed). They contain some redundant logic and unnecessarily 
> increase code paths.
> Goal: Merge those functions into fewer.






[jira] [Created] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call

2019-11-12 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2464:
--

 Summary: Avoid unnecessary allocations for FileChannel.open call
 Key: HDDS-2464
 URL: https://issues.apache.org/jira/browse/HDDS-2464
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


{{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}.  Vararg array 
elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, 
Set<? extends OpenOption>, FileAttribute<?>...)}}.  We can call the latter 
directly instead.
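
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual {{ChunkUtils}} change: the class name {{OpenOptionsSketch}}, the constants {{WRITE_OPTIONS}} and {{NO_ATTRIBUTES}}, and the method {{writeChunk}} are all hypothetical. It assumes the options are fixed, so the set can be built once and handed straight to the {{Set}}-based overload of {{FileChannel.open}}.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileAttribute;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

final class OpenOptionsSketch {
  // Hypothetical constants: built once and shared by every open call.
  static final Set<? extends OpenOption> WRITE_OPTIONS =
      Collections.unmodifiableSet(
          EnumSet.of(StandardOpenOption.CREATE, StandardOpenOption.WRITE));
  static final FileAttribute<?>[] NO_ATTRIBUTES = {};

  // Calls the Set-based overload directly, so no per-call varargs array
  // or HashSet is needed just to carry the open options.
  static int writeChunk(Path file, byte[] data) throws IOException {
    try (FileChannel channel =
        FileChannel.open(file, WRITE_OPTIONS, NO_ATTRIBUTES)) {
      return channel.write(ByteBuffer.wrap(data));
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("chunk", ".bin");
    try {
      int written = writeChunk(file, new byte[] {1, 2, 3});
      System.out.println("bytes written: " + written);
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

Passing a shared empty {{FileAttribute}} array is a design choice of this sketch; omitting the third argument also compiles, at the cost of a fresh empty varargs array per call.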






[jira] [Updated] (HDDS-2463) Reduce unnecessary getServiceInfo calls

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2463:
-
Labels: pull-request-available  (was: )

> Reduce unnecessary getServiceInfo calls
> ---
>
> Key: HDDS-2463
> URL: https://issues.apache.org/jira/browse/HDDS-2463
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>
> OzoneManagerProtocolClientSideTranslatorPB.java Line 766-772 has multiple 
> impl.getServiceInfo() calls, which can be reduced by adding a local variable. 
> {code:java}
> resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream()
>     .map(ServiceInfo::getProtobuf)
>     .collect(Collectors.toList()));
> if (impl.getServiceInfo().getCaCertificate() != null) {
>   resp.setCaCertificate(impl.getServiceInfo().getCaCertificate());
> }
> {code}






[jira] [Work logged] (HDDS-2463) Reduce unnecessary getServiceInfo calls

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2463?focusedWorklogId=342180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342180
 ]

ASF GitHub Bot logged work on HDDS-2463:


Author: ASF GitHub Bot
Created on: 12/Nov/19 21:32
Start Date: 12/Nov/19 21:32
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #146: HDDS-2463. 
Reduce unnecessary getServiceInfo calls. Contributed by Xi…
URL: https://github.com/apache/hadoop-ozone/pull/146
 
 
   …aoyu Yao.
   
   ## What changes were proposed in this pull request?
   
   Reduce unnecessary getServiceInfo calls.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2463
   
   ## How was this patch tested?
   
   Ran Ozone RPC-related unit tests and acceptance tests.
 



Issue Time Tracking
---

Worklog Id: (was: 342180)
Remaining Estimate: 0h
Time Spent: 10m

> Reduce unnecessary getServiceInfo calls
> ---
>
> Key: HDDS-2463
> URL: https://issues.apache.org/jira/browse/HDDS-2463
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> OzoneManagerProtocolClientSideTranslatorPB.java Line 766-772 has multiple 
> impl.getServiceInfo() calls, which can be reduced by adding a local variable. 
> {code:java}
> resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream()
>     .map(ServiceInfo::getProtobuf)
>     .collect(Collectors.toList()));
> if (impl.getServiceInfo().getCaCertificate() != null) {
>   resp.setCaCertificate(impl.getServiceInfo().getCaCertificate());
> }
> {code}






[jira] [Updated] (HDDS-2463) Reduce unnecessary getServiceInfo calls

2019-11-12 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2463:
-
Summary: Reduce unnecessary getServiceInfo calls  (was: Remove unnecessary 
getServiceInfo calls)

> Reduce unnecessary getServiceInfo calls
> ---
>
> Key: HDDS-2463
> URL: https://issues.apache.org/jira/browse/HDDS-2463
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>
> OzoneManagerProtocolClientSideTranslatorPB.java Line 766-772 has multiple 
> impl.getServiceInfo() calls, which can be reduced by adding a local variable. 
> {code:java}
> resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream()
>     .map(ServiceInfo::getProtobuf)
>     .collect(Collectors.toList()));
> if (impl.getServiceInfo().getCaCertificate() != null) {
>   resp.setCaCertificate(impl.getServiceInfo().getCaCertificate());
> }
> {code}






[jira] [Resolved] (HDDS-2462) Add jq dependency in Contribution guideline

2019-11-12 Thread Anu Engineer (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDDS-2462.

Fix Version/s: 0.5.0
   Resolution: Fixed

Committed to Master branch.

> Add jq dependency in Contribution guideline
> ---
>
> Key: HDDS-2462
> URL: https://issues.apache.org/jira/browse/HDDS-2462
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Docker based tests are using JQ to parse JMX pages of different processes, 
> but the documentation does not mention it as a dependency.
> Add it to CONTRIBUTION.MD in the "Additional requirements to execute 
> different type of tests" section.






[jira] [Work logged] (HDDS-2462) Add jq dependency in Contribution guideline

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2462?focusedWorklogId=342146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342146
 ]

ASF GitHub Bot logged work on HDDS-2462:


Author: ASF GitHub Bot
Created on: 12/Nov/19 20:57
Start Date: 12/Nov/19 20:57
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #145: HDDS-2462. 
Add jq dependency in Contribution guideline
URL: https://github.com/apache/hadoop-ozone/pull/145
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 342146)
Time Spent: 20m  (was: 10m)

> Add jq dependency in Contribution guideline
> ---
>
> Key: HDDS-2462
> URL: https://issues.apache.org/jira/browse/HDDS-2462
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Docker based tests are using JQ to parse JMX pages of different processes, 
> but the documentation does not mention it as a dependency.
> Add it to CONTRIBUTION.MD in the "Additional requirements to execute 
> different type of tests" section.






[jira] [Created] (HDDS-2463) Remove unnecessary getServiceInfo calls

2019-11-12 Thread Xiaoyu Yao (Jira)
Xiaoyu Yao created HDDS-2463:


 Summary: Remove unnecessary getServiceInfo calls
 Key: HDDS-2463
 URL: https://issues.apache.org/jira/browse/HDDS-2463
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.4.1
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


OzoneManagerProtocolClientSideTranslatorPB.java Line 766-772 has multiple 
impl.getServiceInfo() calls, which can be reduced by adding a local variable. 
{code:java}
resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream()
    .map(ServiceInfo::getProtobuf)
    .collect(Collectors.toList()));
if (impl.getServiceInfo().getCaCertificate() != null) {
  resp.setCaCertificate(impl.getServiceInfo().getCaCertificate());
}
{code}
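
The local-variable refactoring suggested above might look roughly like the sketch here. It is only an illustration under stated assumptions: {{ServiceInfoSketch}}, {{Impl}}, {{ServiceInfoHolder}} and {{Response}} are hypothetical stand-ins for the real Ozone and protobuf types, and the call counter exists purely to show that {{getServiceInfo()}} is invoked once instead of three times.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ServiceInfoSketch {
  /** Hypothetical stand-in for the value returned by getServiceInfo(). */
  static final class ServiceInfoHolder {
    final List<String> serviceInfoList;
    final String caCertificate;

    ServiceInfoHolder(List<String> serviceInfoList, String caCertificate) {
      this.serviceInfoList = serviceInfoList;
      this.caCertificate = caCertificate;
    }
  }

  /** Hypothetical stand-in for the implementation; counts how often it is asked. */
  static final class Impl {
    int calls;

    ServiceInfoHolder getServiceInfo() {
      calls++;
      return new ServiceInfoHolder(Arrays.asList("om", "scm"), "cert");
    }
  }

  /** Hypothetical stand-in for the response builder. */
  static final class Response {
    List<String> serviceInfo;
    String caCertificate;
  }

  static Response buildResponse(Impl impl) {
    Response resp = new Response();
    // Fetch once and reuse, instead of calling getServiceInfo() three times.
    ServiceInfoHolder info = impl.getServiceInfo();
    resp.serviceInfo = info.serviceInfoList.stream()
        .map(String::toUpperCase) // placeholder for ServiceInfo::getProtobuf
        .collect(Collectors.toList());
    if (info.caCertificate != null) {
      resp.caCertificate = info.caCertificate;
    }
    return resp;
  }

  public static void main(String[] args) {
    Impl impl = new Impl();
    Response resp = buildResponse(impl);
    System.out.println(resp.serviceInfo + ", getServiceInfo() calls: " + impl.calls);
  }
}
```

Besides saving calls, reading the certificate from the same snapshot keeps the null check and the setter consistent with each other.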






[jira] [Commented] (HDFS-14959) [SBNN read] access time should be turned off

2019-11-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972736#comment-16972736
 ] 

Hudson commented on HDFS-14959:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17636 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17636/])
HDFS-14959: [SBNN read] access time should be turned off (#1706) (weichiu: rev 
97ec34e117af71e1a9950b8002131c45754009c7)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ObserverNameNode.md


> [SBNN read] access time should be turned off
> 
>
> Key: HDFS-14959
> URL: https://issues.apache.org/jira/browse/HDFS-14959
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation
>Reporter: Wei-Chiu Chuang
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> Both Uber and Didi shared that access time has to be switched off to avoid 
> spiky NameNode RPC process time. If access time is not off entirely, 
> getBlockLocations RPCs have to update access time and must access the active 
> NameNode. (that's my understanding. haven't checked the code)
> We should record this as a best practice in our doc.
> (If you are on the ASF slack, check out this thread
> https://the-asf.slack.com/archives/CAD7C52Q3/p1572033324008600)






[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972735#comment-16972735
 ] 

Hadoop QA commented on HDFS-14648:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} hadoop-hdfs-project: The patch generated 0 new + 112 
unchanged - 1 fixed = 112 total (was 113) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
51s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 28s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy 
|
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14648 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985638/HDFS-14648.009.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 40af05af70d8 4.15.0-66-generic 

[jira] [Resolved] (HDFS-14959) [SBNN read] access time should be turned off

2019-11-12 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14959.

Resolution: Fixed

Merged the PR to trunk and cherry-picked the commit to branch-3.2 and branch-3.1.
Thanks [~csun]!

> [SBNN read] access time should be turned off
> 
>
> Key: HDFS-14959
> URL: https://issues.apache.org/jira/browse/HDFS-14959
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation
>Reporter: Wei-Chiu Chuang
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> Both Uber and Didi shared that access time has to be switched off to avoid 
> spiky NameNode RPC process time. If access time is not off entirely, 
> getBlockLocations RPCs have to update access time and must access the active 
> NameNode. (that's my understanding. haven't checked the code)
> We should record this as a best practice in our doc.
> (If you are on the ASF slack, check out this thread
> https://the-asf.slack.com/archives/CAD7C52Q3/p1572033324008600)






[jira] [Updated] (HDFS-14959) [SBNN read] access time should be turned off

2019-11-12 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14959:
---
Fix Version/s: 3.2.2
   3.1.4
   3.3.0

> [SBNN read] access time should be turned off
> 
>
> Key: HDFS-14959
> URL: https://issues.apache.org/jira/browse/HDFS-14959
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation
>Reporter: Wei-Chiu Chuang
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> Both Uber and Didi shared that access time has to be switched off to avoid 
> spiky NameNode RPC process time. If access time is not off entirely, 
> getBlockLocations RPCs have to update access time and must access the active 
> NameNode. (that's my understanding. haven't checked the code)
> We should record this as a best practice in our doc.
> (If you are on the ASF slack, check out this thread
> https://the-asf.slack.com/archives/CAD7C52Q3/p1572033324008600)






[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14442:
--
Attachment: (was: HDFS-14442.003.PATCH)

> Disagreement between HAUtil.getAddressOfActive and 
> RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch
>
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling 
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
>   /**
>    * Returns the connection id associated with the InvocationHandler instance.
>    * @return ConnectionId
>    */
>   ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an 
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
>   /**
>    * Get the internet address of the currently-active NN. This should rarely be
>    * used, since callers of this method who connect directly to the NN using the
>    * resulting InetSocketAddress will not be able to connect to the active NN if
>    * a failover were to occur after this method has been called.
>    *
>    * @param fs the file system to get the active address of.
>    * @return the internet address of the currently-active NN.
>    * @throws IOException if an error occurs while resolving the active NN.
>    */
>   public static InetSocketAddress getAddressOfActive(FileSystem fs)
>       throws IOException {
>     if (!(fs instanceof DistributedFileSystem)) {
>       throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>     }
>     // force client address resolution.
>     fs.exists(new Path("/"));
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     DFSClient dfsClient = dfs.getClient();
>     return RPC.getServerAddress(dfsClient.getNamenode());
>   }
> {code}
> Where the call {{RPC.getServerAddress()}} eventually terminates into 
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> 
> {{RPC.getConnectionIdForProxy()}} -> 
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making 
> an incorrect assumption that {{RpcInvocationHandler}} will necessarily return 
> an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a 
> counter-example to this, since the current connection ID may be pointing at, 
> for example, an Observer NameNode.
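
The mismatch described in the issue above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and greatly simplified: {{ConnectionSource}} stands in for {{RpcInvocationHandler}}, {{ObserverAwareSource}} for an observer-aware proxy, and {{addressAssumedActive}} for the assumption made by {{HAUtil.getAddressOfActive}}; the host names are placeholders and none of this is Hadoop code.

```java
import java.net.InetSocketAddress;

final class ActiveAddressSketch {
  /** Hypothetical, simplified stand-in for RpcInvocationHandler. */
  interface ConnectionSource {
    /** Address of whichever server the proxy currently points at. */
    InetSocketAddress currentAddress();
  }

  /** Stand-in for an observer-aware proxy: reads may be served by an observer. */
  static final class ObserverAwareSource implements ConnectionSource {
    private final InetSocketAddress active;
    private final InetSocketAddress observer;
    private InetSocketAddress current;

    ObserverAwareSource(InetSocketAddress active, InetSocketAddress observer) {
      this.active = active;
      this.observer = observer;
      this.current = active;
    }

    /** Simulates a read that is routed to the observer. */
    void read() {
      current = observer;
    }

    /** Simulates a write, which must go to the active node. */
    void write() {
      current = active;
    }

    @Override
    public InetSocketAddress currentAddress() {
      return current;
    }
  }

  /** Mirrors the assumption under discussion: "current" is treated as "active". */
  static InetSocketAddress addressAssumedActive(ConnectionSource source) {
    return source.currentAddress();
  }

  public static void main(String[] args) {
    InetSocketAddress active = InetSocketAddress.createUnresolved("nn-active.example", 8020);
    InetSocketAddress observer = InetSocketAddress.createUnresolved("nn-observer.example", 8020);
    ObserverAwareSource source = new ObserverAwareSource(active, observer);
    source.read(); // comparable to the forced fs.exists(...) call, which is a read
    System.out.println("assumed active: " + addressAssumedActive(source));
  }
}
```

In this sketch a read issued just before the lookup leaves the proxy pointing at the observer, so the address treated as "active" is the observer's.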






[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972714#comment-16972714
 ] 

Hadoop QA commented on HDFS-14442:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
5s{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-14442 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14442 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985651/HDFS-14442.003.PATCH |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28299/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Disagreement between HAUtil.getAddressOfActive and 
> RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, 
> HDFS-14442.003.PATCH
>
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling 
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
>   /**
>    * Returns the connection id associated with the InvocationHandler instance.
>    * @return ConnectionId
>    */
>   ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an 
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
>   /**
>    * Get the internet address of the currently-active NN. This should rarely be
>    * used, since callers of this method who connect directly to the NN using the
>    * resulting InetSocketAddress will not be able to connect to the active NN if
>    * a failover were to occur after this method has been called.
>    *
>    * @param fs the file system to get the active address of.
>    * @return the internet address of the currently-active NN.
>    * @throws IOException if an error occurs while resolving the active NN.
>    */
>   public static InetSocketAddress getAddressOfActive(FileSystem fs)
>       throws IOException {
>     if (!(fs instanceof DistributedFileSystem)) {
>       throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>     }
>     // force client address resolution.
>     fs.exists(new Path("/"));
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     DFSClient dfsClient = dfs.getClient();
>     return RPC.getServerAddress(dfsClient.getNamenode());
>   }
> {code}
> Where the call {{RPC.getServerAddress()}} eventually terminates into 
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> 
> {{RPC.getConnectionIdForProxy()}} -> 
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making 
> an incorrect assumption that {{RpcInvocationHandler}} will necessarily return 
> an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a 
> counter-example to this, since the current connection ID may be pointing at, 
> for example, an Observer NameNode.






[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14442:
--
Attachment: HDFS-14442.003.PATCH

> Disagreement between HAUtil.getAddressOfActive and 
> RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, 
> HDFS-14442.003.PATCH
>
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling 
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
>   /**
>    * Returns the connection id associated with the InvocationHandler instance.
>    * @return ConnectionId
>    */
>   ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an 
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
>   /**
>    * Get the internet address of the currently-active NN. This should rarely be
>    * used, since callers of this method who connect directly to the NN using the
>    * resulting InetSocketAddress will not be able to connect to the active NN if
>    * a failover were to occur after this method has been called.
>    *
>    * @param fs the file system to get the active address of.
>    * @return the internet address of the currently-active NN.
>    * @throws IOException if an error occurs while resolving the active NN.
>    */
>   public static InetSocketAddress getAddressOfActive(FileSystem fs)
>       throws IOException {
>     if (!(fs instanceof DistributedFileSystem)) {
>       throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>     }
>     // force client address resolution.
>     fs.exists(new Path("/"));
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     DFSClient dfsClient = dfs.getClient();
>     return RPC.getServerAddress(dfsClient.getNamenode());
>   }
> {code}
> Where the call {{RPC.getServerAddress()}} eventually terminates into 
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> 
> {{RPC.getConnectionIdForProxy()}} -> 
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making 
> an incorrect assumption that {{RpcInvocationHandler}} will necessarily return 
> an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a 
> counter-example to this, since the current connection ID may be pointing at, 
> for example, an Observer NameNode.
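
The mismatch described above can be illustrated with a small self-contained sketch. It does not use any Hadoop classes: ConnectionSource, ObserverReadingProxy and addressOfActive are hypothetical stand-ins for RpcInvocationHandler, an observer-read style proxy, and HAUtil.getAddressOfActive, and the host names and port are made up. It only shows that "the current connection" and "the active NameNode" can differ.

```java
import java.net.InetSocketAddress;

// All names below are hypothetical stand-ins; none of this is Hadoop code.
class ActiveAddressSketch {
  // Mirrors the assumption under discussion: whatever address the proxy
  // currently reports is treated as the address of the active NameNode.
  static InetSocketAddress addressOfActive(ConnectionSource proxy) {
    return proxy.currentAddress();
  }

  public static void main(String[] args) {
    InetSocketAddress active =
        InetSocketAddress.createUnresolved("nn-active.example.com", 8020);
    InetSocketAddress observer =
        InetSocketAddress.createUnresolved("nn-observer.example.com", 8020);
    ObserverReadingProxy proxy = new ObserverReadingProxy(active, observer);

    proxy.write();
    System.out.println("after write: " + addressOfActive(proxy));
    proxy.read();
    System.out.println("after read:  " + addressOfActive(proxy));
  }
}

// Stand-in for the handler interface: it promises a connection address,
// but says nothing about which NameNode role that address belongs to.
interface ConnectionSource {
  InetSocketAddress currentAddress();
}

// Stand-in for a proxy that sends reads to an observer and writes to the active.
class ObserverReadingProxy implements ConnectionSource {
  private final InetSocketAddress active;
  private final InetSocketAddress observer;
  private boolean lastCallWasRead = false;

  ObserverReadingProxy(InetSocketAddress active, InetSocketAddress observer) {
    this.active = active;
    this.observer = observer;
  }

  void read() { lastCallWasRead = true; }

  void write() { lastCallWasRead = false; }

  @Override
  public InetSocketAddress currentAddress() {
    // The "current" connection follows the most recent call.
    return lastCallWasRead ? observer : active;
  }
}
```

In this toy model the caller gets the observer's address whenever the last call was a read, which is the kind of counter-example the description attributes to ObserverReadProxyProvider.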



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14442:
--
Attachment: (was: HDFS-14442.003.patch)

> Disagreement between HAUtil.getAddressOfActive and 
> RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, 
> HDFS-14442.003.PATCH
>
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling 
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
>   /**
>    * Returns the connection id associated with the InvocationHandler instance.
>    * @return ConnectionId
>    */
>   ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an 
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
>   /**
>    * Get the internet address of the currently-active NN. This should rarely be
>    * used, since callers of this method who connect directly to the NN using the
>    * resulting InetSocketAddress will not be able to connect to the active NN if
>    * a failover were to occur after this method has been called.
>    *
>    * @param fs the file system to get the active address of.
>    * @return the internet address of the currently-active NN.
>    * @throws IOException if an error occurs while resolving the active NN.
>    */
>   public static InetSocketAddress getAddressOfActive(FileSystem fs)
>       throws IOException {
>     if (!(fs instanceof DistributedFileSystem)) {
>       throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>     }
>     // force client address resolution.
>     fs.exists(new Path("/"));
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     DFSClient dfsClient = dfs.getClient();
>     return RPC.getServerAddress(dfsClient.getNamenode());
>   }
> {code}
> Where the call {{RPC.getServerAddress()}} eventually terminates into 
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> 
> {{RPC.getConnectionIdForProxy()}} -> 
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making 
> an incorrect assumption that {{RpcInvocationHandler}} will necessarily return 
> an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a 
> counter-example to this, since the current connection ID may be pointing at, 
> for example, an Observer NameNode.






[jira] [Commented] (HDFS-14922) Prevent snapshot modification time got change on startup

2019-11-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972707#comment-16972707
 ] 

Hudson commented on HDFS-14922:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17634 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17634/])
HDFS-14922. Prevent snapshot modification time got change on startup. 
(inigoiri: rev 40150da1e12a41c2e774fe2a277ddc3988bed239)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSnapshotOp.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshot.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectorySnapshottableFeature.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java


> Prevent snapshot modification time got change on startup
> 
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch
>
>
> Snapshot modification time got changed on namenode restart






[jira] [Commented] (HDFS-14955) RBF: getQuotaUsage() on mount point should return global quota.

2019-11-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972704#comment-16972704
 ] 

Íñigo Goiri commented on HDFS-14955:


Thanks [~LiJinglun] for the patch.
* Update the javadoc for {{aggregateQuota()}}.
* I think we can skip most of the for loop right before if this is a mount 
point.

> RBF: getQuotaUsage() on mount point should return global quota.
> ---
>
> Key: HDFS-14955
> URL: https://issues.apache.org/jira/browse/HDFS-14955
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14955.001.patch
>
>
> When getQuotaUsage() is called on a mount point path, the quota part of the 
> result should be the global quota.






[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-11-12 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972701#comment-16972701
 ] 

Chen Liang commented on HDFS-14655:
---

We have this fix in our deployment. One thing I found is that it prints a ton 
of WARN {{java.util.concurrent.CancellationException}} messages in the NN logs. 
Can we make a fix to suppress the warnings?
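
One way such warnings can be kept out of the log is sketched below with plain java.util.concurrent; this is not the Hadoop code path. The class QuietCancellation, the method awaitQuietly and the warnings counter are hypothetical names. The sketch relies only on the documented behaviour that Future.get() throws CancellationException for a cancelled task and ExecutionException for a failed one, and it treats cancellation as an expected outcome rather than something to report at WARN.

```java
import java.util.Optional;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch; names are illustrative and not taken from Hadoop.
class QuietCancellation {
  // Counts the failures that would have been reported at WARN level.
  static int warnings = 0;

  // Waits for the future, treating cancellation as a normal, silent outcome.
  static <T> Optional<T> awaitQuietly(Future<T> future) {
    try {
      return Optional.ofNullable(future.get());
    } catch (CancellationException e) {
      // The caller gave up on this task on purpose, so nothing is reported.
      return Optional.empty();
    } catch (ExecutionException e) {
      // A real failure is still counted as a warning.
      warnings++;
      return Optional.empty();
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the caller.
      Thread.currentThread().interrupt();
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    CompletableFuture<String> cancelled = new CompletableFuture<>();
    cancelled.cancel(true);

    CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("example failure"));

    System.out.println("cancelled present: " + awaitQuietly(cancelled).isPresent());
    System.out.println("failed present:    " + awaitQuietly(failed).isPresent());
    System.out.println("warnings counted:  " + warnings);
  }
}
```

Whether the real fix should drop the message entirely or only lower its level is a design choice for the patch; the sketch only shows the two cases being separated.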

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655-05.patch, 
> HDFS-14655-06.patch, HDFS-14655-07.patch, HDFS-14655-08.patch, 
> HDFS-14655-branch-2-01.patch, HDFS-14655-branch-2-02.patch, 
> HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: java.lang.OutOfMemoryError: unable to create new native thread | ExitUtil.java:210
> {noformat}






[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14528:
--
Attachment: HDFS-14528.006.patch

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.2.Patch, 
> ZKFC_issue.patch
>
>
> *In a cluster with more than one Standby NameNode, manual failover throws 
> an exception in some cases.*
> *When trying to execute the failover command from active to standby,* 
> *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
> Scenario 1:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the 
> exception is thrown.
> Scenario 2:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> ZKFCs     - ZKFC1,        ZKFC2,         ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, the exception is thrown.






[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14528:
--
Attachment: (was: HDFS-14528.006.patch)

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.005.patch, HDFS-14528.2.Patch, ZKFC_issue.patch
>
>
> *In a cluster with more than one Standby NameNode, manual failover throws 
> an exception in some cases.*
> *When trying to execute the failover command from active to standby,* 
> *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
> Scenario 1:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the 
> exception is thrown.
> Scenario 2:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> ZKFCs     - ZKFC1,        ZKFC2,         ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, the exception is thrown.






[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14528:
--
Attachment: HDFS-14528.006.patch

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.2.Patch, 
> ZKFC_issue.patch
>
>
> *In a cluster with more than one Standby NameNode, manual failover throws 
> an exception in some cases.*
> *When trying to execute the failover command from active to standby,* 
> *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
> Scenario 1:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the 
> exception is thrown.
> Scenario 2:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> ZKFCs     - ZKFC1,        ZKFC2,         ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, the exception is thrown.






[jira] [Commented] (HDFS-14922) Prevent snapshot modification time got change on startup

2019-11-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972694#comment-16972694
 ] 

Íñigo Goiri commented on HDFS-14922:


Thanks [~hemanthboyina] for the patch and [~virajith] for checking.
Committed to trunk.

> Prevent snapshot modification time got change on startup
> 
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch
>
>
> Snapshot modification time got changed on namenode restart






[jira] [Updated] (HDFS-14922) Prevent snapshot modification time got change on startup

2019-11-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14922:
---
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Prevent snapshot modification time got change on startup
> 
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch
>
>
> Snapshot modification time got changed on namenode restart






[jira] [Updated] (HDFS-14922) Prevent snapshot modification time got change on startup

2019-11-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14922:
---
Summary: Prevent snapshot modification time got change on startup  (was: On 
StartUp, snapshot modification time got changed)

> Prevent snapshot modification time got change on startup
> 
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch
>
>
> Snapshot modification time got changed on namenode restart






[jira] [Updated] (HDFS-14922) On StartUp, snapshot modification time got changed

2019-11-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14922:
---
Summary: On StartUp, snapshot modification time got changed  (was: On 
StartUp , Snapshot modification time got changed)

> On StartUp, snapshot modification time got changed
> --
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch
>
>
> Snapshot modification time got changed on namenode restart






[jira] [Resolved] (HDFS-14981) BlockStateChange logging is exceedingly verbose

2019-11-12 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HDFS-14981.
-
Resolution: Duplicate

Yep, I think you're right. Thanks for the pointer [~weichiu].

> BlockStateChange logging is exceedingly verbose
> ---
>
> Key: HDFS-14981
> URL: https://issues.apache.org/jira/browse/HDFS-14981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging
>Reporter: Nick Dimiduk
>Priority: Major
>
> On a moderately loaded cluster, name node logs are flooded with entries of 
> {{INFO BlockStateChange...}}, to the tune of ~30 lines per millisecond. This 
> provides operators with little to no usable information. I suggest reducing 
> this log message to {{DEBUG}}. Perhaps this information (and other logging 
> related to it) should be directed to a dedicated block-audit.log file that 
> can be queried, rotated on a separate schedule from the log of the main 
> process.






[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14528:
--
Attachment: (was: HDFS-14528.006.patch)

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.005.patch, HDFS-14528.2.Patch, ZKFC_issue.patch
>
>
> *In a cluster with more than one Standby NameNode, manual failover throws 
> an exception in some cases.*
> *When trying to execute the failover command from active to standby,* 
> *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
> Scenario 1:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the 
> exception is thrown.
> Scenario 2:
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> ZKFCs     - ZKFC1,        ZKFC2,         ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, the exception is thrown.






[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed

2019-11-12 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972642#comment-16972642
 ] 

hemanthboyina commented on HDFS-14922:
--

[~elgoiri], can you push the patch forward?

> On StartUp , Snapshot modification time got changed
> ---
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch
>
>
> Snapshot modification time got changed on namenode restart






[jira] [Commented] (HDFS-14981) BlockStateChange logging is exceedingly verbose

2019-11-12 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972640#comment-16972640
 ] 

Wei-Chiu Chuang commented on HDFS-14981:


I think this is done by HDFS-6860.

> BlockStateChange logging is exceedingly verbose
> ---
>
> Key: HDFS-14981
> URL: https://issues.apache.org/jira/browse/HDFS-14981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging
>Reporter: Nick Dimiduk
>Priority: Major
>
> On a moderately loaded cluster, name node logs are flooded with entries of 
> {{INFO BlockStateChange...}}, to the tune of ~30 lines per millisecond. This 
> provides operators with little to no usable information. I suggest reducing 
> this log message to {{DEBUG}}. Perhaps this information (and other logging 
> related to it) should be directed to a dedicated block-audit.log file that 
> can be queried, rotated on a separate schedule from the log of the main 
> process.






[jira] [Created] (HDFS-14981) BlockStateChange logging is exceedingly verbose

2019-11-12 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HDFS-14981:
---

 Summary: BlockStateChange logging is exceedingly verbose
 Key: HDFS-14981
 URL: https://issues.apache.org/jira/browse/HDFS-14981
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: logging
Reporter: Nick Dimiduk


On a moderately loaded cluster, name node logs are flooded with entries of 
{{INFO BlockStateChange...}}, to the tune of ~30 lines per millisecond. This 
provides operators with little to no usable information. I suggest reducing 
this log message to {{DEBUG}}. Perhaps this information (and other logging 
related to it) should be directed to a dedicated block-audit.log file that can 
be queried, rotated on a separate schedule from the log of the main process.
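
The two suggestions above (a lower level for the message, and a dedicated destination for it) can be sketched with java.util.logging as a stand-in. This is an analogy only: it is not Hadoop's logging setup, JUL has no DEBUG level so FINE is used in its place, and BlockStateChangeLogSketch, CollectingHandler, configure and logStateChange are hypothetical names. The in-memory handler plays the role of the proposed block-audit.log file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch using java.util.logging as a stand-in logging API.
class BlockStateChangeLogSketch {
  // In-memory handler standing in for a dedicated audit log file.
  static class CollectingHandler extends Handler {
    final List<String> lines = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      if (isLoggable(record)) {
        lines.add(record.getLevel() + " " + record.getMessage());
      }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }
  }

  // Routes the logger to its own handler and keeps it out of the parent log.
  static CollectingHandler configure(Logger logger, Level threshold) {
    CollectingHandler audit = new CollectingHandler();
    audit.setLevel(Level.ALL);
    logger.setUseParentHandlers(false);
    logger.addHandler(audit);
    logger.setLevel(threshold);
    return audit;
  }

  // The call site logs at FINE (the JUL analogue of DEBUG) instead of INFO.
  static void logStateChange(Logger logger, String message) {
    logger.log(Level.FINE, message);
  }

  public static void main(String[] args) {
    Logger log = Logger.getLogger("BlockStateChange");
    CollectingHandler audit = configure(log, Level.INFO);

    logStateChange(log, "example message 1"); // dropped at the INFO threshold
    log.setLevel(Level.FINE);                  // operator opts in to the detail
    logStateChange(log, "example message 2"); // now reaches the audit handler

    System.out.println(audit.lines);
  }
}
```

With the threshold at INFO the per-block messages never reach any handler, and lowering the threshold sends them only to the dedicated destination.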






[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972624#comment-16972624
 ] 

Lisheng Sun commented on HDFS-14648:


Thanks [~linyiqun] for the good comments.
I updated the patch per your comments and uploaded the v009 patch. Thank you.

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. The functions 
> it implements are as follows:
>  # When a DFSInputStream is opened, a BlockReader is opened. If some DataNode 
> of the block is found to be inaccessible, the DataNode is put into 
> DeadNodeDetector#deadnode. HDFS-14649 will optimize this part, because when a 
> DataNode is not accessible, it is likely that the replica has been removed 
> from the DataNode. Therefore, it needs to be confirmed by re-probing and 
> requires higher-priority processing.
>  # DeadNodeDetector periodically probes the nodes in 
> DeadNodeDetector#deadnode. If the access is successful, the node is 
> removed from DeadNodeDetector#deadnode. Continuous detection of dead nodes 
> is necessary because a DataNode needs to rejoin the cluster after a service 
> restart or machine repair. Without such a probe mechanism, the DataNode may 
> be permanently excluded.
>  # DeadNodeDetector#dfsInputStreamNodes records which DFSInputStreams are 
> using which DataNodes. When a DFSInputStream is closed, it is removed from 
> DeadNodeDetector#dfsInputStreamNodes.
>  # Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is 
> updated. The new DeadNodeDetector#deadnode equals the intersection of the old 
> DeadNodeDetector#deadnode and the DataNodes recorded in 
> DeadNodeDetector#dfsInputStreamNodes.
>  # DeadNodeDetector has a switch that is turned off by default. When it is 
> off, each DFSInputStream still uses its own local dead nodes.
>  # This feature has been used in the XIAOMI production environment for a long 
> time. It has reduced HBase reads getting stuck due to node hangs.
>  # Just turn on the DeadNodeDetector switch and you can use it directly; there 
> are no other restrictions. If you don't want to use DeadNodeDetector, just 
> turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
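
The bookkeeping in the list above can be modelled in a few lines. The following is a simplified sketch and not the patch itself: DeadNodeModelSketch and its method names are hypothetical, DataNodes and streams are represented by plain strings, and the periodic probe is reduced to a predicate supplied by the caller.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, simplified model of the dead-node bookkeeping described above.
class DeadNodeModelSketch {
  // Shared set of DataNodes currently considered dead.
  private final Set<String> deadNodes = new HashSet<>();
  // Open stream id -> DataNodes that the stream is using.
  private final Map<String, Set<String>> streamNodes = new HashMap<>();

  void openStream(String streamId, Collection<String> dataNodes) {
    streamNodes.put(streamId, new HashSet<>(dataNodes));
  }

  void closeStream(String streamId) {
    streamNodes.remove(streamId);
  }

  // Called when a read finds a DataNode inaccessible.
  void reportUnreachable(String dataNode) {
    deadNodes.add(dataNode);
  }

  // One probe round: nodes that answer again leave the dead set.
  void probe(Predicate<String> isReachable) {
    deadNodes.removeIf(isReachable);
  }

  // Fetching the dead nodes keeps only those still used by an open stream.
  Set<String> getDeadNodes() {
    Set<String> inUse = new HashSet<>();
    for (Set<String> nodes : streamNodes.values()) {
      inUse.addAll(nodes);
    }
    deadNodes.retainAll(inUse);
    return new HashSet<>(deadNodes);
  }

  public static void main(String[] args) {
    DeadNodeModelSketch model = new DeadNodeModelSketch();
    model.openStream("s1", Arrays.asList("dn1", "dn2"));
    model.openStream("s2", Arrays.asList("dn3"));
    model.reportUnreachable("dn1");
    model.reportUnreachable("dn3");
    System.out.println("dead after reports: " + model.getDeadNodes());

    model.probe(dn -> dn.equals("dn1")); // dn1 is reachable again
    System.out.println("dead after probe:   " + model.getDeadNodes());

    model.closeStream("s2");             // dn3 is no longer used by any stream
    System.out.println("dead after close:   " + model.getDeadNodes());
  }
}
```

The real detector runs its probing on a daemon thread and has to handle concurrency; the sketch leaves both out to keep the set operations visible.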






[jira] [Updated] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14648:
---
Attachment: HDFS-14648.009.patch

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. The functions 
> it implements are as follows:
>  # When a DFSInputStream is opened, a BlockReader is opened. If some DataNode 
> of the block is found to be inaccessible, the DataNode is put into 
> DeadNodeDetector#deadnode. HDFS-14649 will optimize this part, because when a 
> DataNode is not accessible, it is likely that the replica has been removed 
> from the DataNode. Therefore, it needs to be confirmed by re-probing and 
> requires higher-priority processing.
>  # DeadNodeDetector periodically probes the nodes in 
> DeadNodeDetector#deadnode. If the access is successful, the node is 
> removed from DeadNodeDetector#deadnode. Continuous detection of dead nodes 
> is necessary because a DataNode needs to rejoin the cluster after a service 
> restart or machine repair. Without such a probe mechanism, the DataNode may 
> be permanently excluded.
>  # DeadNodeDetector#dfsInputStreamNodes records which DFSInputStreams are 
> using which DataNodes. When a DFSInputStream is closed, it is removed from 
> DeadNodeDetector#dfsInputStreamNodes.
>  # Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is 
> updated. The new DeadNodeDetector#deadnode equals the intersection of the old 
> DeadNodeDetector#deadnode and the DataNodes recorded in 
> DeadNodeDetector#dfsInputStreamNodes.
>  # DeadNodeDetector has a switch that is turned off by default. When it is 
> off, each DFSInputStream still uses its own local dead nodes.
>  # This feature has been used in the XIAOMI production environment for a long 
> time. It has reduced HBase reads getting stuck due to node hangs.
>  # Just turn on the DeadNodeDetector switch and you can use it directly; there 
> are no other restrictions. If you don't want to use DeadNodeDetector, just 
> turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
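
The model in the issue description can be sketched roughly as follows. This is a simplified illustration and not the code from the attached patches: the class name DeadNodeModelSketch, its method names, and the use of plain strings for DataNode and stream identifiers are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical, simplified model; names and types are illustrative only.
class DeadNodeModelSketch {
  // Shared dead nodes (the description's DeadNodeDetector#deadnode).
  private final Set<String> deadNodes = ConcurrentHashMap.<String>newKeySet();
  // DataNodes in use per open stream (DeadNodeDetector#dfsInputStreamNodes).
  private final Map<String, Set<String>> streamNodes = new ConcurrentHashMap<>();

  // A stream registers a DataNode it reads from.
  void addStreamNode(String streamId, String dataNode) {
    streamNodes
        .computeIfAbsent(streamId, k -> ConcurrentHashMap.<String>newKeySet())
        .add(dataNode);
  }

  // A stream found the DataNode inaccessible.
  void reportDead(String streamId, String dataNode) {
    addStreamNode(streamId, dataNode);
    deadNodes.add(dataNode);
  }

  // Closing a stream removes it from the registry.
  void closeStream(String streamId) {
    streamNodes.remove(streamId);
  }

  // New dead set = old dead set intersected with the nodes that are
  // still referenced by open streams.
  Set<String> refreshAndGetDeadNodes() {
    Set<String> inUse = new HashSet<>();
    for (Set<String> nodes : streamNodes.values()) {
      inUse.addAll(nodes);
    }
    deadNodes.retainAll(inUse);
    return new HashSet<>(deadNodes);
  }

  // Periodic probe: nodes that respond again leave the dead set.
  void probe(Predicate<String> isReachable) {
    deadNodes.removeIf(isReachable);
  }
}
```

In this sketch, closing a stream shrinks the set of referenced nodes, so a dead node that no open stream uses any longer drops out on the next refresh, which is the intersection rule from the list above.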






[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972622#comment-16972622
 ] 

Hadoop QA commented on HDFS-14528:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 18m 28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  2m 57s{color} | {color:orange} root: The patch generated 11 new + 36 unchanged - 0 fixed = 47 total (was 36) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 46s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 22s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 57s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}214m 58s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.TestErasureCodingPolicies |
|   | hadoop.hdfs.TestDecommissionWithStriped |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestFileAppend2 |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14528 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985622/HDFS-14528.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  

[jira] [Work logged] (HDDS-2456) Add explicit base image version for images derived from ozone-runner

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2456?focusedWorklogId=342014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342014
 ]

ASF GitHub Bot logged work on HDDS-2456:


Author: ASF GitHub Bot
Created on: 12/Nov/19 16:23
Start Date: 12/Nov/19 16:23
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #139: HDDS-2456. Add 
explicit base image version for images derived from ozone-runner
URL: https://github.com/apache/hadoop-ozone/pull/139
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 342014)
Time Spent: 20m  (was: 10m)

> Add explicit base image version for images derived from ozone-runner
> 
>
> Key: HDDS-2456
> URL: https://issues.apache.org/jira/browse/HDDS-2456
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: docker
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{ozone-om-ha}} and {{ozonescripts}} build images based on 
> {{apache/ozone-runner}}.
> Problem: They do not specify a base image version, so it defaults to 
> {{latest}}.  If a new {{ozone-runner}} image is published on Docker Hub, 
> developers need to manually pull the {{latest}} image for it to take effect 
> on these derived images.
> Solution: Use an explicit base image version (defined by the 
> {{OZONE_RUNNER_VERSION}} variable in the {{.env}} file).






[jira] [Assigned] (HDDS-2462) Add jq dependency in Contribution guideline

2019-11-12 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth reassigned HDDS-2462:
--

Assignee: Istvan Fajth

> Add jq dependency in Contribution guideline
> ---
>
> Key: HDDS-2462
> URL: https://issues.apache.org/jira/browse/HDDS-2462
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Docker based tests are using JQ to parse JMX pages of different processes, 
> but the documentation does not mention it as a dependency.
> Add it to CONTRIBUTION.MD in the "Additional requirements to execute 
> different type of tests" section.






[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972577#comment-16972577
 ] 

Hadoop QA commented on HDFS-14442:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 51s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 22 unchanged - 0 fixed = 24 total (was 22) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 30s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m 14s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN |
|   | hadoop.hdfs.server.datanode.TestDataNodeReconfiguration |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.datanode.checker.TestDatasetVolumeCheckerTimeout |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.server.mover.TestMover |
|   | hadoop.hdfs.server.mover.TestStorageMover |
|   | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
|   | hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped |
|   | hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting |
|   | hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.blockmanagement.TestPendingReconstruction |
|   | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy |
|   | hadoop.hdfs.server.datanode.TestBatchIbr |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14442 |
| JIRA Patch URL | 

[jira] [Updated] (HDDS-2456) Add explicit base image version for images derived from ozone-runner

2019-11-12 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2456:
--
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add explicit base image version for images derived from ozone-runner
> 
>
> Key: HDDS-2456
> URL: https://issues.apache.org/jira/browse/HDDS-2456
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: docker
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ozone-om-ha}} and {{ozonescripts}} build images based on 
> {{apache/ozone-runner}}.
> Problem: They do not specify a base image version, so it defaults to 
> {{latest}}.  If a new {{ozone-runner}} image is published on Docker Hub, 
> developers need to manually pull the {{latest}} image for it to take effect 
> on these derived images.
> Solution: Use an explicit base image version (defined by the 
> {{OZONE_RUNNER_VERSION}} variable in the {{.env}} file).






[jira] [Updated] (HDDS-2462) Add jq dependency in Contribution guideline

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2462:
-
Labels: pull-request-available  (was: )

> Add jq dependency in Contribution guideline
> ---
>
> Key: HDDS-2462
> URL: https://issues.apache.org/jira/browse/HDDS-2462
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>
> Docker based tests are using JQ to parse JMX pages of different processes, 
> but the documentation does not mention it as a dependency.
> Add it to CONTRIBUTION.MD in the "Additional requirements to execute 
> different type of tests" section.






[jira] [Work logged] (HDDS-2462) Add jq dependency in Contribution guideline

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2462?focusedWorklogId=342001=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342001
 ]

ASF GitHub Bot logged work on HDDS-2462:


Author: ASF GitHub Bot
Created on: 12/Nov/19 16:08
Start Date: 12/Nov/19 16:08
Worklog Time Spent: 10m 
  Work Description: fapifta commented on pull request #145: HDDS-2462. Add 
jq dependency in Contribution guideline
URL: https://github.com/apache/hadoop-ozone/pull/145
 
 
   ## What changes were proposed in this pull request?
   Documentation update, add jq dependency into the Contribution Guideline in 
the "Additional requirements to execute different type of tests" section
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-2462
   
   ## How was this patch tested?
   Doc change, no tests needed as far as I can tell.
   
 



Issue Time Tracking
---

Worklog Id: (was: 342001)
Remaining Estimate: 0h
Time Spent: 10m

> Add jq dependency in Contribution guideline
> ---
>
> Key: HDDS-2462
> URL: https://issues.apache.org/jira/browse/HDDS-2462
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Docker based tests are using JQ to parse JMX pages of different processes, 
> but the documentation does not mention it as a dependency.
> Add it to CONTRIBUTION.MD in the "Additional requirements to execute 
> different type of tests" section.






[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972571#comment-16972571
 ] 

Yiqun Lin commented on HDFS-14648:
--

Thanks [~leosun08], the patch looks almost good now; only some minor comments:

*DFSInputStream.java*
 1. There is one additional place where we can add the addNodeToDeadNodeDetector 
call, in {{createBlockReader}}:
{code:java}
  boolean createBlockReader(LocatedBlock block, long offsetInBlock,
      LocatedBlock[] targetBlocks, BlockReaderInfo[] readerInfos,
     ...
    } else {
      //TODO: handles connection issues
      DFSClient.LOG.warn("Failed to connect to " + dnInfo.addr + " for " +
          "block" + block.getBlock(), e);
      // re-fetch the block in case the block has been moved
      fetchBlockAt(block.getStartOffset());
      addToLocalDeadNodes(dnInfo.info);
      //  <
    }
  }
{code}
*DeadNodeDetector.java*
 1. Can you address this comment that was missed?
{quote}1. Can we comment the name as Client context name

+ /**
 + * Client context name.
 + */
 + private String name;
{quote}
2. We can use containsKey to check:
{code:java}
  public boolean isDeadNode(DatanodeInfo datanodeInfo) {
    return deadNodes.containsKey(datanodeInfo.getDatanodeUuid());
  }
{code}
Also, we can use the key to remove entries in the method clearAndGetDetectedDeadNodes:
{code:java}
for (DatanodeInfo datanodeInfo : deadNodes.values()) {
  if (!newDeadNodes.contains(datanodeInfo)) {
    deadNodes.remove(datanodeInfo.getDatanodeUuid());
  }
}
{code}
3. We can periodically call clearAndGetDetectedDeadNodes to keep the deadNodes 
list refreshed. I think the deadNodes list can become a little stale when the 
local dead nodes are cleared in the DFS input stream.
{code:java}
  public void run() {
    while (true) {
      clearAndGetDetectedDeadNodes();
      LOG.debug("Current detector state {}, the detected nodes: {}.", state,
          deadNodes.values());
      switch (state) {
{code}
4. I don't fully get this. Why do we still call this in the latest patch? Can 
you explain it?
{noformat}
newDeadNodes.retainAll(deadNodes.values());
{noformat}
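
The suggestions in point 2 can be sketched as below. This is only an illustration under stated assumptions: NodeInfo is a hypothetical stand-in for DatanodeInfo, DeadNodeMapSketch and retainOnly are invented names, and the map is assumed to be a ConcurrentHashMap keyed by DataNode UUID, which the comment above does not state.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for DatanodeInfo; only the UUID matters here.
final class NodeInfo {
  private final String uuid;

  NodeInfo(String uuid) {
    this.uuid = uuid;
  }

  String getDatanodeUuid() {
    return uuid;
  }
}

class DeadNodeMapSketch {
  // Assumed to be keyed by DataNode UUID. A ConcurrentHashMap tolerates
  // removal while its views are being traversed.
  private final Map<String, NodeInfo> deadNodes = new ConcurrentHashMap<>();

  void addDeadNode(NodeInfo node) {
    deadNodes.put(node.getDatanodeUuid(), node);
  }

  // Membership check by key instead of scanning the values.
  boolean isDeadNode(NodeInfo node) {
    return deadNodes.containsKey(node.getDatanodeUuid());
  }

  // Drop every entry whose UUID is not in the freshly computed set.
  void retainOnly(Set<String> stillDeadUuids) {
    deadNodes.keySet().removeIf(uuid -> !stillDeadUuids.contains(uuid));
  }

  int size() {
    return deadNodes.size();
  }
}
```

Removing through the key set keeps both the lookup and the removal on the String key, so the argument type always matches the map's key type.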

*TestDFSClientDetectDeadNodes.java*
 1. Can you rename the unit test from {{TestDFSClientDetectDeadNodes}} to 
{{TestDeadNodeDetection}}, and simplify the comment to this:
{noformat}
+/**
+ * Tests for dead node detection in DFSClient.
+ */
+public class TestDeadNodeDetection {
{noformat}
Two other names to update:
 * testDetectDeadNodeInBackground --> testDeadNodeDetectionInBackground
 * testDeadNodeMultipleDFSInputStream --> 
testDeadNodeDetectionInMultipleDFSInputStream

2. There is no need to call {{ThreadUtil.sleepAtLeastIgnoreInterrupts(10 * 1000L);}}, 
I think.
 3. Can we extract the DFSClient here? I see we call getDFSClient() many times.
{code:java}
assertEquals(1, din1.getDFSClient().getDeadNodes(din1).size());
assertEquals(1, din1.getDFSClient().getClientContext()
{code}
 

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. It implements 
> the following:
>  # When a DFSInputStream is opened, a BlockReader is opened. If some DataNode 
> of the block is found to be inaccessible, the DataNode is put into 
> DeadNodeDetector#deadnode. HDFS-14649 will optimize this part: when a 
> DataNode is not accessible, it is likely that the replica has been removed 
> from the DataNode, so this needs to be confirmed by re-probing and requires 
> higher-priority processing.
>  # DeadNodeDetector periodically probes the nodes in 
> DeadNodeDetector#deadnode. If the access succeeds, the node is removed from 
> DeadNodeDetector#deadnode. Continuous detection of dead nodes is necessary: 
> a DataNode may need to rejoin the cluster after a service restart or machine 
> repair, and it could be permanently excluded if there were no probe 
> mechanism.
>  # DeadNodeDetector#dfsInputStreamNodes records which DataNodes each 
> DFSInputStream is using. When a DFSInputStream is closed, it is removed from 
> DeadNodeDetector#dfsInputStreamNodes.
>  # Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is 
> updated. The new DeadNodeDetector#deadnode equals the intersection of the old 
> DeadNodeDetector#deadnode and the DataNodes referenced by 
> DeadNodeDetector#dfsInputStreamNodes.
>  # DeadNodeDetector has a switch that is turned off by default. When it is 
> off, each 

[jira] [Updated] (HDDS-2462) Add jq dependency in Contribution guideline

2019-11-12 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDDS-2462:
---
Summary: Add jq dependency in Contribution guideline  (was: Add jq 
dependency in how to contribute docs)

> Add jq dependency in Contribution guideline
> ---
>
> Key: HDDS-2462
> URL: https://issues.apache.org/jira/browse/HDDS-2462
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Priority: Major
>
> Docker based tests are using JQ to parse JMX pages of different processes, 
> but the documentation does not mention it as a dependency.
> Add it to CONTRIBUTION.MD in the "Additional requirements to execute 
> different type of tests" section.






[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972566#comment-16972566
 ] 

Hadoop QA commented on HDFS-14648:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m  4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 10s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 52s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 112 unchanged - 1 fixed = 113 total (was 113) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 13s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  2s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 54s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 42s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-client |
|  |  org.apache.hadoop.hdfs.protocol.DatanodeInfo is incompatible with expected argument type String in org.apache.hadoop.hdfs.DeadNodeDetector.clearAndGetDetectedDeadNodes()  At DeadNodeDetector.java:[line 165] |
| Failed junit tests | hadoop.hdfs.server.namenode.TestReencryption |
|   | 
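
A warning of the kind FindBugs reports above usually arises when a collection method that accepts Object is handed an argument unrelated to the collection's declared type. The following sketch shows the general pitfall only; it does not reproduce line 165 of the patch, and DnInfo and RemoveByWrongTypeSketch are hypothetical names introduced for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DatanodeInfo.
final class DnInfo {
  final String uuid;

  DnInfo(String uuid) {
    this.uuid = uuid;
  }
}

class RemoveByWrongTypeSketch {
  static Map<String, DnInfo> sample() {
    Map<String, DnInfo> deadNodes = new HashMap<>();
    deadNodes.put("uuid-1", new DnInfo("uuid-1"));
    return deadNodes;
  }

  // Compiles because Map.remove takes Object, but the key type is String,
  // so a DnInfo argument can never match a key and nothing is removed.
  static int sizeAfterRemoveByValueObject() {
    Map<String, DnInfo> deadNodes = sample();
    DnInfo node = deadNodes.get("uuid-1");
    deadNodes.remove(node);
    return deadNodes.size();
  }

  // Removing by the String key works as intended.
  static int sizeAfterRemoveByKey() {
    Map<String, DnInfo> deadNodes = sample();
    DnInfo node = deadNodes.get("uuid-1");
    deadNodes.remove(node.uuid);
    return deadNodes.size();
  }
}
```

Because the compiler accepts both calls, static analysis is often the first place such a mismatch shows up.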

[jira] [Created] (HDDS-2462) Add jq dependency in how to contribute docs

2019-11-12 Thread Istvan Fajth (Jira)
Istvan Fajth created HDDS-2462:
--

 Summary: Add jq dependency in how to contribute docs
 Key: HDDS-2462
 URL: https://issues.apache.org/jira/browse/HDDS-2462
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Istvan Fajth


Docker based tests are using JQ to parse JMX pages of different processes, but 
the documentation does not mention it as a dependency.

Add it to CONTRIBUTION.MD in the "Additional requirements to execute different 
type of tests" section.






[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-11-12 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1868:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, 
> HDDS-1868.03.patch, HDDS-1868.04.patch, HDDS-1868.05.patch, HDDS-1868.06.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Ozone pipelines start in the allocated state on creation and on restart. They 
> are moved into the open state after all the datanodes in the pipeline have 
> reported. However, this can potentially lead to an issue where the pipeline 
> is still not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.
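
The readiness rule described above can be sketched as a small state check. This is only an illustration under stated assumptions: PipelineSketch, onReport and the string datanode IDs are hypothetical names, not Ozone's actual classes, and leader election is reduced to "some member has reported a leader ID".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical pipeline model; names are illustrative, not Ozone's classes. */
class PipelineSketch {
  enum State { ALLOCATED, OPEN }

  private final Set<String> members;
  private final Set<String> reported = new HashSet<>();
  private String leader;                  // null until a leader is known
  private State state = State.ALLOCATED;  // pipelines start as ALLOCATED

  PipelineSketch(Set<String> members) {
    this.members = new HashSet<>(members);
  }

  /** Records a report from a datanode; reportedLeader may be null. */
  void onReport(String datanode, String reportedLeader) {
    if (!members.contains(datanode)) {
      return; // ignore reports from nodes outside the pipeline
    }
    reported.add(datanode);
    if (reportedLeader != null) {
      leader = reportedLeader;
    }
    // Open only when every member has reported AND a leader is known.
    if (state == State.ALLOCATED
        && reported.containsAll(members)
        && leader != null) {
      state = State.OPEN;
    }
  }

  State getState() {
    return state;
  }

  public static void main(String[] args) {
    PipelineSketch p =
        new PipelineSketch(new HashSet<>(Arrays.asList("dn1", "dn2", "dn3")));
    p.onReport("dn1", null);
    p.onReport("dn2", null);
    p.onReport("dn3", null);
    System.out.println(p.getState()); // ALLOCATED: all reported, no leader yet
    p.onReport("dn1", "dn1");
    System.out.println(p.getState()); // OPEN
  }
}
```

The point of the sketch is the extra condition: having all reports is not enough on its own to open the pipeline.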






[jira] [Work logged] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1868?focusedWorklogId=341992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341992
 ]

ASF GitHub Bot logged work on HDDS-1868:


Author: ASF GitHub Bot
Created on: 12/Nov/19 15:54
Start Date: 12/Nov/19 15:54
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #23: HDDS-1868. 
Ozone pipelines should be marked as ready only after the leader election is 
complete.
URL: https://github.com/apache/hadoop-ozone/pull/23
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341992)
Time Spent: 3h 50m  (was: 3h 40m)







[jira] [Work logged] (HDDS-2415) Completely disable tracer if hdds.tracing.enabled=false

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2415?focusedWorklogId=341985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341985
 ]

ASF GitHub Bot logged work on HDDS-2415:


Author: ASF GitHub Bot
Created on: 12/Nov/19 15:46
Start Date: 12/Nov/19 15:46
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #128: HDDS-2415. 
Completely disable tracer if hdds.tracing.enabled=false
URL: https://github.com/apache/hadoop-ozone/pull/128
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 341985)
Time Spent: 20m  (was: 10m)

> Completely disable tracer if hdds.tracing.enabled=false
> ---
>
> Key: HDDS-2415
> URL: https://issues.apache.org/jira/browse/HDDS-2415
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: perfomance, pull-request-available
> Fix For: 0.5.0
>
> Attachments: allocations.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a config setting to enable/disable OpenTracing-based distributed 
> tracing in Ozone ({{hdds.tracing.enabled}}).  However, setting it to false 
> does not prevent tracer initialization, which causes unnecessary object 
> allocations.
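
One way to picture the requested behavior is a factory that builds the real tracer only when the flag is set. This is a minimal sketch, not the actual patch: Tracer, NOOP and create are hypothetical stand-ins and do not reflect the OpenTracing or Ozone APIs.

```java
import java.util.function.Supplier;

/** Hypothetical stand-ins; not the OpenTracing or Ozone API. */
class TracingSketch {
  interface Tracer {
    void trace(String operation);
  }

  /** Shared no-op tracer: one instance, no work per call. */
  static final Tracer NOOP = operation -> { };

  /** Counts how often the expensive tracer is built (for the demo only). */
  static int initCount = 0;

  /** Builds the real tracer only when tracing is enabled. */
  static Tracer create(boolean enabled, Supplier<Tracer> realTracer) {
    return enabled ? realTracer.get() : NOOP;
  }

  public static void main(String[] args) {
    Supplier<Tracer> expensive = () -> {
      initCount++;
      return operation -> System.out.println("trace: " + operation);
    };

    Tracer off = create(false, expensive);
    off.trace("writeChunk");        // no output
    System.out.println(initCount);  // 0: the real tracer was never built

    Tracer on = create(true, expensive);
    on.trace("writeChunk");         // prints "trace: writeChunk"
    System.out.println(initCount);  // 1
  }
}
```

Passing a Supplier rather than a ready-made tracer is what keeps the disabled path free of initialization and allocations.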






[jira] [Updated] (HDDS-2415) Completely disable tracer if hdds.tracing.enabled=false

2019-11-12 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-2415:
--
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)







[jira] [Work logged] (HDDS-2461) Logging by ChunkUtils is misleading

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2461?focusedWorklogId=341966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341966
 ]

ASF GitHub Bot logged work on HDDS-2461:


Author: ASF GitHub Bot
Created on: 12/Nov/19 15:22
Start Date: 12/Nov/19 15:22
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #144: HDDS-2461. Logging 
by ChunkUtils is misleading
URL: https://github.com/apache/hadoop-ozone/pull/144
 
 
   ## What changes were proposed in this pull request?
   
   During a k8s-based test I found a lot of log messages like:
   ```
   2019-11-12 14:27:13 WARN  ChunkManagerImpl:209 - Duplicate write chunk 
request. Chunk overwrite without explicit request. 
ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} 
   ```
   
   I was very surprised, as there were no related lines at `ChunkManagerImpl:209`.
   
   It turned out that the message is logged by `ChunkUtils`, but it uses the logger of 
`ChunkManagerImpl`.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2461
   
   ## How was this patch tested?
   
   I deployed a new version of Ozone to the Kubernetes cluster. I also 
added a new test method, TestChunkUtil, so that at least one unit test method 
uses the logger. 
   
 



Issue Time Tracking
---

Worklog Id: (was: 341966)
Remaining Estimate: 0h
Time Spent: 10m

> Logging by ChunkUtils is misleading
> ---
>
> Key: HDDS-2461
> URL: https://issues.apache.org/jira/browse/HDDS-2461
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During a k8s-based test I found a lot of log messages like:
> {code:java}
> 2019-11-12 14:27:13 WARN  ChunkManagerImpl:209 - Duplicate write chunk 
> request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} 
> {code}
> I was very surprised, as there were no similar lines at ChunkManagerImpl:209.
> It turned out that the message is logged by ChunkUtils, but it uses the logger of 
> ChunkManagerImpl.






[jira] [Updated] (HDDS-2461) Logging by ChunkUtils is misleading

2019-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2461:
-
Labels: pull-request-available  (was: )

> Logging by ChunkUtils is misleading
> ---
>
> Key: HDDS-2461
> URL: https://issues.apache.org/jira/browse/HDDS-2461
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> During a k8s-based test I found a lot of log messages like:
> {code:java}
> 2019-11-12 14:27:13 WARN  ChunkManagerImpl:209 - Duplicate write chunk 
> request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} 
> {code}
> I was very surprised, as there were no similar lines at ChunkManagerImpl:209.
> It turned out that the message is logged by ChunkUtils, but it uses the logger of 
> ChunkManagerImpl.






[jira] [Created] (HDDS-2461) Logging by ChunkUtils is misleading

2019-11-12 Thread Marton Elek (Jira)
Marton Elek created HDDS-2461:
-

 Summary: Logging by ChunkUtils is misleading
 Key: HDDS-2461
 URL: https://issues.apache.org/jira/browse/HDDS-2461
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Marton Elek


During a k8s-based test I found a lot of log messages like:
{code:java}
2019-11-12 14:27:13 WARN  ChunkManagerImpl:209 - Duplicate write chunk request. 
Chunk overwrite without explicit request. 
ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} 
{code}
I was very surprised, as there were no similar lines at ChunkManagerImpl:209.

It turned out that the message is logged by ChunkUtils, but it uses the logger of 
ChunkManagerImpl.
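
The effect can be reproduced with a small sketch. It uses java.util.logging purely as a stand-in (which logging API the affected classes actually use is not stated here), and ChunkUtilsSketch and ChunkManagerSketch are hypothetical names. A log line is attributed to the name of the logger that emitted it, so a utility class that borrows another class's logger appears in the log under that other class.

```java
import java.util.logging.Logger;

/** Hypothetical utility class, standing in for the class that really logs. */
class ChunkUtilsSketch {
  // Misleading: messages logged here are attributed to ChunkManagerSketch.
  static final Logger BORROWED = ChunkManagerSketch.LOG;

  // Clearer: the utility class owns a logger carrying its own name.
  static final Logger LOG =
      Logger.getLogger(ChunkUtilsSketch.class.getSimpleName());

  public static void main(String[] args) {
    System.out.println(BORROWED.getName()); // ChunkManagerSketch
    System.out.println(LOG.getName());      // ChunkUtilsSketch
  }
}

/** Hypothetical manager class whose logger is being borrowed. */
class ChunkManagerSketch {
  static final Logger LOG =
      Logger.getLogger(ChunkManagerSketch.class.getSimpleName());
}
```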






[jira] [Assigned] (HDDS-2461) Logging by ChunkUtils is misleading

2019-11-12 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek reassigned HDDS-2461:
-

Assignee: Marton Elek

> Logging by ChunkUtils is misleading
> ---
>
> Key: HDDS-2461
> URL: https://issues.apache.org/jira/browse/HDDS-2461
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>
> During a k8s-based test I found a lot of log messages like:
> {code:java}
> 2019-11-12 14:27:13 WARN  ChunkManagerImpl:209 - Duplicate write chunk 
> request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} 
> {code}
> I was very surprised, as there were no similar lines at ChunkManagerImpl:209.
> It turned out that the message is logged by ChunkUtils, but it uses the logger of 
> ChunkManagerImpl.






[jira] [Commented] (HDFS-14612) SlowDiskReport won't update when SlowDisks is always empty in heartbeat

2019-11-12 Thread Haibin Huang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972486#comment-16972486
 ] 

Haibin Huang commented on HDFS-14612:
-

[~weichiu], I have updated this patch. Would you help review it? Thank you.

> SlowDiskReport won't update when SlowDisks is always empty in heartbeat
> ---
>
> Key: HDFS-14612
> URL: https://issues.apache.org/jira/browse/HDFS-14612
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haibin Huang
>Assignee: Haibin Huang
>Priority: Major
> Attachments: HDFS-14612-001.patch, HDFS-14612-002.patch, 
> HDFS-14612-003.patch, HDFS-14612-004.patch, HDFS-14612.patch
>
>
> I found that SlowDiskReport is not updated when slowDisks is always empty in 
> org.apache.hadoop.hdfs.server.blockmanagement.*handleHeartbeat*. This may 
> leave an outdated SlowDiskReport in the NameNode's JMX until the next 
> time slowDisks is not empty. So I think the method 
> *checkAndUpdateReportIfNecessary()* should be called first whenever we want to 
> get the JMX information about SlowDiskReport, so that the 
> SlowDiskReport in JMX is always valid.
>  
> There are also some incorrect object references in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.
> *DataNodeVolumeMetrics*:
> {code:java}
> // Based on writeIoRate
> public long getWriteIoSampleCount() {
>   return syncIoRate.lastStat().numSamples();
> }
> public double getWriteIoMean() {
>   return syncIoRate.lastStat().mean();
> }
> public double getWriteIoStdDev() {
>   return syncIoRate.lastStat().stddev();
> }
> {code}
>  
>  
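
For comparison, here is a minimal sketch of what the getters presumably should read, going by the "Based on writeIoRate" comment in the quoted code. RateSketch and VolumeMetricsSketch are hypothetical stand-ins and not Hadoop's metrics classes; the standard-deviation getter would follow the same pattern and is left out.

```java
/** Hypothetical stand-in for the volume metrics class in the report. */
class VolumeMetricsSketch {
  final RateSketch syncIoRate = new RateSketch();
  final RateSketch writeIoRate = new RateSketch();

  // Based on writeIoRate (the quoted code reads syncIoRate here instead).
  long getWriteIoSampleCount() {
    return writeIoRate.numSamples();
  }

  double getWriteIoMean() {
    return writeIoRate.mean();
  }

  public static void main(String[] args) {
    VolumeMetricsSketch m = new VolumeMetricsSketch();
    m.syncIoRate.add(5.0);    // one sync sample
    m.writeIoRate.add(10.0);  // two write samples
    m.writeIoRate.add(20.0);
    System.out.println(m.getWriteIoSampleCount()); // 2
    System.out.println(m.getWriteIoMean());        // 15.0
  }
}

/** Minimal running-statistics holder; an assumption for this sketch only. */
class RateSketch {
  private long count;
  private double sum;

  void add(double value) {
    count++;
    sum += value;
  }

  long numSamples() {
    return count;
  }

  double mean() {
    return count == 0 ? 0.0 : sum / count;
  }
}
```

With the getters reading syncIoRate, as in the quoted code, the same demo would report 1 sample and a mean of 5.0, which is how the wrong reference shows up in the metrics.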






[jira] [Commented] (HDFS-14978) In-place Erasure Coding Conversion

2019-11-12 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972384#comment-16972384
 ] 

Wei-Chiu Chuang commented on HDFS-14978:


bq. What is the client behavior during the CAS operation OP_SWAP_BLOCK_LIST
This operation is atomic. Semantically, it is similar to truncating the file to 
zero length and then appending the erasure-coded blocks to it. 
Assume that neither file is open. A getBlockLocations() call for the $src prior 
to swapBlockList() gets the replicated block list. Once a client has the 
located blocks list, it has the block tokens too, and it should be able to read 
without problems, even though the namespace has changed.
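
The reader-side behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the NameNode implementation: FileSketch is hypothetical, block lists are plain lists of strings, and the atomic operation is modeled with an AtomicReference swap (getAndSet) rather than a real compare-and-swap on the namespace.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class SwapBlockListSketch {
  /** Hypothetical file model: an atomically replaceable block list. */
  static final class FileSketch {
    private final AtomicReference<List<String>> blocks;

    FileSketch(List<String> initial) {
      blocks = new AtomicReference<>(Collections.unmodifiableList(initial));
    }

    /** Returns the block list as of the time of the call. */
    List<String> getBlockLocations() {
      return blocks.get();
    }

    /** Replaces the whole list in one atomic step; returns the old list. */
    List<String> swapBlockList(List<String> replacement) {
      return blocks.getAndSet(Collections.unmodifiableList(replacement));
    }
  }

  public static void main(String[] args) {
    FileSketch src = new FileSketch(Arrays.asList("blk_1", "blk_2"));

    // A reader fetches the locations before the swap and keeps that view.
    List<String> readerView = src.getBlockLocations();

    src.swapBlockList(Arrays.asList("ec_group_1"));

    System.out.println(readerView);              // [blk_1, blk_2]
    System.out.println(src.getBlockLocations()); // [ec_group_1]
  }
}
```

The model only shows that a list obtained before the swap is unaffected by it; whether the old blocks remain readable afterwards depends on details the sketch does not cover.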

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where all of 
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure codes everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.






[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed

2019-11-12 Thread Ravuri Sushma sree (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14528:
--
Attachment: HDFS-14528.006.patch

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.2.Patch, 
> ZKFC_issue.patch
>
>
>  *In a cluster with more than one Standby NameNode, manual failover throws an 
> exception in some cases.*
> *When trying to execute the failover command from active to standby,* 
> *_./hdfs haadmin -failover nn1 nn2_, the exception below is thrown:*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases :
>  Scenario 1 : 
> Namenodes - NN1(Active) , NN2(Standby), NN3(Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, an 
> exception is thrown.
> Scenario 2 :
>  Namenodes - NN1(Active) , NN2(Standby), NN3(Standby)
> ZKFC's -              ZKFC1,            ZKFC2,            ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, an exception is thrown.





