[jira] [Commented] (HDFS-14483) Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869202#comment-16869202
 ] 

Hadoop QA commented on HDFS-14483:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
39s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 4s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
26s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
17s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
41s{color} | {color:green} branch-2.8 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-hdfs-project/hadoop-hdfs-native-client {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
26s{color} | {color:red} hadoop-common-project/hadoop-common in branch-2.8 has 
1 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
25s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in branch-2.8 
has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
41s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
3s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
42s{color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  1m 42s{color} | 
{color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 42s{color} 
| {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
30s{color} | {color:red} root in the patch failed with JDK v1.8.0_212. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  1m 30s{color} | 
{color:red} root in the patch failed with JDK v1.8.0_212. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 30s{color} 
| {color:red} root in the patch failed with JDK v1.8.0_212. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} root: The patch generated 0 new + 116 unchanged - 1 
fixed = 116 total (was 117) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
27s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) with tabs. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-hdfs-project/hadoop-hdfs-native-client {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
17s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
53s{color} | {color:green} 

[jira] [Updated] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk

2019-06-20 Thread Masatake Iwasaki (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14135:

Attachment: HDFS-14135.013.patch

> TestWebHdfsTimeouts Fails intermittently in trunk
> -
>
> Key: HDFS-14135
> URL: https://issues.apache.org/jira/browse/HDFS-14135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, 
> HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, 
> HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, 
> HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, 
> HDFS-14135.012.patch, HDFS-14135.013.patch
>
>
> Reference to failure
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/






[jira] [Commented] (HDFS-14573) Backport Standby Read to branch-3

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869194#comment-16869194
 ] 

Hadoop QA commented on HDFS-14573:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 27 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  5m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
39s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
10s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
51s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
15s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-hdfs-project/hadoop-hdfs-native-client {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
57s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
3s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 14m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 14m 19s{color} 
| {color:red} root generated 170 new + 1158 unchanged - 170 fixed = 1328 total 
(was 1328) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 48s{color} | {color:orange} root: The patch generated 27 new + 2571 
unchanged - 11 fixed = 2598 total (was 2582) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-hdfs-project/hadoop-hdfs-native-client {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
52s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m  3s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  5m 36s{color} 
| {color:red} hadoop-hdfs-native-client in the patch failed. {color} |
| {color:green}+1{color} | 

[jira] [Assigned] (HDDS-1691) RDBTable#isExist should use Rocksdb#keyMayExist

2019-06-20 Thread Siddharth Wagle (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle reassigned HDDS-1691:
-

Assignee: Aravindan Vijayan  (was: Nanda kumar)

> RDBTable#isExist should use Rocksdb#keyMayExist
> ---
>
> Key: HDDS-1691
> URL: https://issues.apache.org/jira/browse/HDDS-1691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>
> RDBTable#isExist can use Rocksdb#keyMayExist, which avoids the cost of reading 
> the value for the key.
> Please refer to 
> https://github.com/facebook/rocksdb/blob/7a8d7358bb40b13a06c2c6adc62e80295d89ed05/java/src/main/java/org/rocksdb/RocksDB.java#L2184
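A minimal sketch of the suggested check, assuming a RocksDB 6.x-style keyMayExist(ColumnFamilyHandle, byte[], Holder) overload (older releases expose a StringBuilder variant instead); the class and method names below are illustrative, not the actual RDBTable code:

{code:java}
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.Holder;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Illustrative helper: keyMayExist() consults the memtable and bloom filters
// first, so a definite "no" avoids reading the value from disk at all.
public class KeyExistsSketch {
  static boolean isExist(RocksDB db, ColumnFamilyHandle handle, byte[] key)
      throws RocksDBException {
    Holder<byte[]> value = new Holder<>();
    if (!db.keyMayExist(handle, key, value)) {
      return false;                          // definitely absent, no value read
    }
    if (value.getValue() != null) {
      return true;                           // value already found in memory
    }
    return db.get(handle, key) != null;      // possible false positive, confirm with get()
  }
}
{code}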






[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed

2019-06-20 Thread Ravuri Sushma sree (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14528:
--
Description: 
*Started an HA cluster with three nodes [ _Active, Standby, Observer_ ]*

*When trying to execute the failover command from active to standby,*

*_./hdfs haadmin -failover nn1 nn2_, the below exception is thrown:*

  Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
connection exception: java.net.ConnectException: Connection refused

This is encountered in two cases: when any other standby NameNode is down or 
when any other ZKFC is down.

  was:
*Started an HA Cluster with three nodes [ _Active ,Standby ,Observer_ ]*

*When trying to exectue the failover command from active to standby* 

*._/hdfs haadmin  -failover nn1 nn2, below Exception is thrown_*

  Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see: [http://wiki.apache.org/hadoop/ConnectionRefused]
 at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) 


> [SBN Read]Failover from Active to Standby Failed  
> --
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: ZKFC_issue.patch
>
>
> *Started an HA cluster with three nodes [ _Active, Standby, Observer_ ]*
> *When trying to execute the failover command from active to standby,*
> *_./hdfs haadmin -failover nn1 nn2_, the below exception is thrown:*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in two cases: when any other standby NameNode is down or 
> when any other ZKFC is down.






[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869187#comment-16869187
 ] 

Wei-Chiu Chuang commented on HDFS-14585:


Filed HADOOP-16386 for the findbugs warning.

> Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.8.v1.patch
>
>







[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869186#comment-16869186
 ] 

Wei-Chiu Chuang commented on HDFS-14585:


Hi [~leosun08], we typically backport from the newer branches to the older branches.
branch-2.8 is the oldest active branch. Would you please rebase on branch-2?

The asflicense warning is a false alarm. There are minor checkstyle warnings that 
can be cleaned up easily. The findbugs warning looks fishy. Looking back in my 
mailbox, the same warning came up a few times for branch-2, so it is probably 
unrelated, but we should file a jira for it.

I suggest updating the jira summary, as this is essentially a reimplementation 
of HDFS-8901 for replicated blocks. Hadoop 2 does not support EC, so 
there's no need to support striped blocks. How about "Reimplement HDFS-8901 for 
replicated block positional read in branch-2"?

Do we need any tests? Or is this purely a performance improvement that existing 
tests cover?
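For reference, a minimal sketch of the ByteBuffer positional-read call that the related HDFS-3246/HDFS-14483 work exposes on FSDataInputStream; it assumes a release where that interface is present (the underlying stream must support ByteBufferPositionedReadable), and the path and buffer size are illustrative only:

{code:java}
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ByteBufferPreadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(new Path("/tmp/example"))) { // illustrative path
      ByteBuffer buf = ByteBuffer.allocate(4096);
      // Positional read into a ByteBuffer: does not move the stream's current
      // offset and avoids an extra byte[] copy on streams that support it.
      int read = in.read(0L, buf);
      buf.flip();
      System.out.println("read " + read + " bytes at offset 0");
    }
  }
}
{code}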

> Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.8.v1.patch
>
>







[jira] [Commented] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold

2019-06-20 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869184#comment-16869184
 ] 

He Xiaoqiao commented on HDFS-14587:


Thanks [~jojochuang] for the helpful information. I will check whether that JIRA 
could solve this issue. Thanks again.

> Support fail fast when client wait ACK by pipeline over threshold
> -
>
> Key: HDFS-14587
> URL: https://issues.apache.org/jira/browse/HDFS-14587
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>
> Recently I met a corner case where the client waited for data to be acknowledged 
> by the pipeline for over 9 hours. After checking branch trunk, I think this issue 
> still exists. So I propose adding a threshold for the wait time and failing fast 
> once it is exceeded.
> {code:java}
> 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: 
> Slow waitForAckedSeqno took 35560718ms (threshold=3ms)
> {code}
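A minimal sketch of the kind of fail-fast check being proposed; the class, fields, and timeout parameter below are hypothetical illustrations, not the actual DFSOutputStream code or any existing configuration key:

{code:java}
// Hypothetical sketch: bound the time a client waits for pipeline ACKs and
// fail fast instead of blocking indefinitely. Names (ackTimeoutMs,
// lastAckedSeqno, seqnoToWaitFor) are illustrative only.
import java.io.IOException;
import java.io.InterruptedIOException;

class AckWaitSketch {
  private final Object ackMonitor = new Object();
  private volatile long lastAckedSeqno = -1;

  void waitForAckedSeqno(long seqnoToWaitFor, long ackTimeoutMs) throws IOException {
    long deadline = System.currentTimeMillis() + ackTimeoutMs;
    synchronized (ackMonitor) {
      while (lastAckedSeqno < seqnoToWaitFor) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          // Fail fast: surface an error instead of waiting for hours.
          throw new IOException("Timed out waiting " + ackTimeoutMs
              + "ms for ack of seqno " + seqnoToWaitFor);
        }
        try {
          ackMonitor.wait(remaining);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Interrupted while waiting for acks");
        }
      }
    }
  }
}
{code}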






[jira] [Commented] (HDFS-12564) Add the documents of swebhdfs configurations on the client side

2019-06-20 Thread Takanobu Asanuma (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869180#comment-16869180
 ] 

Takanobu Asanuma commented on HDFS-12564:
-

Thank you very much, [~jojochuang]!

> Add the documents of swebhdfs configurations on the client side
> ---
>
> Key: HDFS-12564
> URL: https://issues.apache.org/jira/browse/HDFS-12564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch, 
> HDFS-12564.3.patch, HDFS-12564.4.patch
>
>
> Documentation does not cover the swebhdfs configurations on the client side. 
> We can reuse the hftp/hsftp documents, which were removed from Hadoop 3.0 in 
> HDFS-5570 and HDFS-9640.






[jira] [Commented] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869174#comment-16869174
 ] 

Wei-Chiu Chuang commented on HDFS-14591:


I think what Jinglun wants is a more advanced version, one that detects the 
temperature of data and moves files around according to the temperature. That 
is, something similar to what's proposed in HDFS-7343.

Frankly speaking, SPS looks interesting, but it is not built for my customers' 
use cases. I've been wanting to have SSM but it's not on the top of my list to 
implement...

> NameNode should move the replicas to the correct storages after the storage 
> policy is changed.
> --
>
> Key: HDFS-14591
> URL: https://issues.apache.org/jira/browse/HDFS-14591
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a 
> background process searching all the files to find those that are not accessed 
> for a period of time. Then we set them to COLD and start a mover to move the 
> replicas. After moving, all the replicas are consistent with the storage 
> policy.
> It's a natural idea to let the NameNode handle the move.






[jira] [Updated] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-06-20 Thread qiang Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiang Liu updated HDFS-14303:
-
Affects Version/s: 3.2.0

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 3.2.0, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-branch-2.005.patch, 
> HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, 
> HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, 
> HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, 
> HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, 
> HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, 
> HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, 
> HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, 
> HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68






[jira] [Commented] (HDFS-13893) DiskBalancer: no validations for Disk balancer commands

2019-06-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869162#comment-16869162
 ] 

Hudson commented on HDFS-13893:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16800 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16800/])
HDFS-13893. DiskBalancer: no validations for Disk balancer commands. (weichiu: 
rev 272b96d243383d9f50241d48cb070f638243bc9c)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DiskBalancerCLI.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/diskbalancer/command/TestDiskBalancerCommand.java


> DiskBalancer: no validations for Disk balancer commands 
> 
>
> Key: HDFS-13893
> URL: https://issues.apache.org/jira/browse/HDFS-13893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Harshakiran Reddy
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-13893.001.patch, HDFS-13893.002.patch, 
> HDFS-13893.003.patch
>
>
> {{Scenario:-}}
>  
>  1. Run the Disk Balancer commands passing extra arguments:
> {noformat} 
> hadoopclient> hdfs diskbalancer -plan hostname --thresholdPercentage 2 
> *sgfsdgfs*
> 2018-08-31 14:57:35,454 INFO planner.GreedyPlanner: Starting plan for Node : 
> hostname:50077
> 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Disk Volume set 
> fb67f00c-e333-4f38-a3a6-846a30d4205a Type : DISK plan completed.
> 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Compute Plan for Node : 
> hostname:50077 took 23 ms
> 2018-08-31 14:57:35,457 INFO command.Command: Writing plan to:
> 2018-08-31 14:57:35,457 INFO command.Command: 
> /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json
> Writing plan to:
> /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json
> {noformat} 
> Expected Output:- 
> =
> Disk balancer commands should fail if we pass any invalid or extra arguments.






[jira] [Updated] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14303:
---
Fix Version/s: 3.1.3
   2.9.3
   3.2.1
   2.8.6
   3.3.0
   3.0.4
   2.10.0

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-branch-2.005.patch, 
> HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, 
> HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, 
> HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, 
> HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, 
> HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, 
> HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, 
> HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, 
> HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68






[jira] [Updated] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14303:
---
  Resolution: Fixed
Target Version/s: 2.9.2, 3.2.0  (was: 3.2.0, 2.9.2)
  Status: Resolved  (was: Patch Available)

Pushed the patches all the way through to branch-2.8. Thanks [~iamgd67]!

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-branch-2.005.patch, 
> HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, 
> HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, 
> HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, 
> HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, 
> HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, 
> HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, 
> HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, 
> HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68






[jira] [Commented] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-06-20 Thread qiang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869155#comment-16869155
 ] 

qiang Liu commented on HDFS-14303:
--

[~jojochuang] The patch for 3.2 failed because of a timeout waiting for the 
minicluster to become active; should it be retriggered?

And is [^HDFS-14303-branch-2.017.patch] ready to be pushed to branch-2?

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Attachments: HDFS-14303-branch-2.005.patch, 
> HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, 
> HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, 
> HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, 
> HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, 
> HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, 
> HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, 
> HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, 
> HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68






[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869147#comment-16869147
 ] 

Lisheng Sun commented on HDFS-14585:


[~jojochuang] Could you please take a look? HDFS-14483 is blocked by this 
issue. Thanks.

> Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.8.v1.patch
>
>







[jira] [Commented] (HDFS-13359) DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869145#comment-16869145
 ] 

Hadoop QA commented on HDFS-13359:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-13359 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13359 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27027/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream
> -
>
> Key: HDFS-13359
> URL: https://issues.apache.org/jira/browse/HDFS-13359
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13359.001.patch, stack.jpg
>
>
> DataXceiver hung due to the lock taken by 
>  {{FsDatasetImpl#getBlockInputStream}} (stack trace attached).
> {code:java}
>   @Override // FsDatasetSpi
>   public InputStream getBlockInputStream(ExtendedBlock b,
>   long seekOffset) throws IOException {
> ReplicaInfo info;
> synchronized(this) {
>   info = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
> }
> ...
>   }
> {code}
> The {{synchronized(this)}} lock used here is expensive; there is already an 
> {{AutoCloseableLock}}-type lock defined for {{ReplicaMap}}. We can use it 
> instead.






[jira] [Commented] (HDFS-13359) DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869141#comment-16869141
 ] 

Wei-Chiu Chuang commented on HDFS-13359:


I would love to improve datanode lock contention, especially in the context of 
dense DataNodes.
That said, it would be really nice to have a benchmark to compare the 
performance before and after the change.

> DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream
> -
>
> Key: HDFS-13359
> URL: https://issues.apache.org/jira/browse/HDFS-13359
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13359.001.patch, stack.jpg
>
>
> DataXceiver hung due to the lock taken by 
>  {{FsDatasetImpl#getBlockInputStream}} (stack trace attached).
> {code:java}
>   @Override // FsDatasetSpi
>   public InputStream getBlockInputStream(ExtendedBlock b,
>   long seekOffset) throws IOException {
> ReplicaInfo info;
> synchronized(this) {
>   info = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
> }
> ...
>   }
> {code}
> The {{synchronized(this)}} lock used here is expensive; there is already an 
> {{AutoCloseableLock}}-type lock defined for {{ReplicaMap}}. We can use it 
> instead.
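A minimal sketch of the suggested change, assuming the existing org.apache.hadoop.util.AutoCloseableLock; the class and field names are illustrative, not the actual FsDatasetImpl code:

{code:java}
import java.util.Map;
import org.apache.hadoop.util.AutoCloseableLock;

// Illustrative only: replace a coarse synchronized(this) region with the
// existing AutoCloseableLock so the critical section is explicit and the
// same lock instance can be shared with ReplicaMap.
class DatasetLockSketch {
  private final AutoCloseableLock datasetLock = new AutoCloseableLock();

  Object lookupReplica(Map<String, Object> volumeMap, String key) {
    try (AutoCloseableLock lock = datasetLock.acquire()) {
      return volumeMap.get(key);   // short critical section, released automatically
    }
  }
}
{code}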






[jira] [Commented] (HDFS-11298) Add storage policy info in FileStatus

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869139#comment-16869139
 ] 

Hadoop QA commented on HDFS-11298:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-11298 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11298 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846064/HDFS-11298.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27026/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add storage policy info in FileStatus
> -
>
> Key: HDFS-11298
> URL: https://issues.apache.org/jira/browse/HDFS-11298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-11298.001.patch
>
>
> It would be good to add a storagePolicy field in FileStatus. Then we would not 
> need to call the getStoragePolicy() API to get the policy.






[jira] [Created] (HDFS-14592) Support NIO transferTo semantics in HDFS

2019-06-20 Thread Chenzhao Guo (JIRA)
Chenzhao Guo created HDFS-14592:
---

 Summary: Support NIO transferTo semantics in HDFS
 Key: HDFS-14592
 URL: https://issues.apache.org/jira/browse/HDFS-14592
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Chenzhao Guo


I'm currently developing a Spark shuffle manager based on HDFS. I need to merge 
some spill files on HDFS into one, or rearrange some HDFS files.

An API similar to NIO transferTo, which bypasses user-space memory, would be more 
efficient than manually reading and writing bytes (the method I'm using at present).

So could HDFS implement something like NIO transferTo, making 
path.transferTo(pathDestination) possible?
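For reference, a minimal sketch of the requested semantics using java.nio FileChannel.transferTo on local files; the paths are illustrative, and HDFS itself does not currently offer an equivalent call:

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Local-file analogue of the requested API: the kernel moves bytes between
// channels without copying them through user-space buffers.
public class TransferToSketch {
  public static void main(String[] args) throws IOException {
    Path src = Paths.get("/tmp/spill-0.data");   // illustrative paths
    Path dst = Paths.get("/tmp/merged.data");
    try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
         FileChannel out = FileChannel.open(dst,
             StandardOpenOption.CREATE, StandardOpenOption.WRITE,
             StandardOpenOption.APPEND)) {
      long pos = 0, size = in.size();
      while (pos < size) {
        pos += in.transferTo(pos, size - pos, out);  // zero-copy on most platforms
      }
    }
  }
}
{code}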






[jira] [Updated] (HDFS-6937) Another issue in handling checksum errors in write pipeline

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-6937:
--
Resolution: Duplicate
  Assignee: (was: Wei-Chiu Chuang)
Status: Resolved  (was: Patch Available)

We spent several years removing all kinds of data corruption bugs, and we no 
longer see data corruption incidents any more. There is a good chance that 
this is fixed by HDFS-4660 (or HDFS-11160, HDFS-11056, or others).

So, with that, I'll resolve this one as a dup.

> Another issue in handling checksum errors in write pipeline
> ---
>
> Key: HDFS-6937
> URL: https://issues.apache.org/jira/browse/HDFS-6937
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-6937.001.patch, HDFS-6937.002.patch
>
>
> Given a write pipeline:
> DN1 -> DN2 -> DN3
> DN3 detected a checksum error and terminated, and DN2 truncated its replica to the 
> ACKed size. Then a new pipeline is attempted as
> DN1 -> DN2 -> DN4
> DN4 detects a checksum error again. Later, when DN4 was replaced with DN5 (and so 
> on), it failed for the same reason. This led to the observation that DN2's 
> data is corrupted. 
> Found that the software currently truncates DN2's replica to the ACKed size 
> after DN3 terminates, but it doesn't check the correctness of the data 
> already written to disk.
> So intuitively, a solution would be: when the downstream DN (DN3 here) finds a 
> checksum error, propagate this info back to the upstream DN (DN2 here); DN2 
> checks the correctness of the data already written to disk and truncates the 
> replica to MIN(correctDataSize, ACKedSize).
> This issue is similar to what was reported in HDFS-3875, and the 
> truncation at DN2 was actually introduced as part of the HDFS-3875 solution. 
> Filing this jira for the issue reported here. HDFS-3875 was filed by 
> [~tlipcon], who proposed something similar there.
> {quote}
> if the tail node in the pipeline detects a checksum error, then it returns a 
> special error code back up the pipeline indicating this (rather than just 
> disconnecting)
> if a non-tail node receives this error code, then it immediately scans its 
> own block on disk (from the beginning up through the last acked length). If 
> it detects a corruption on its local copy, then it should assume that it is 
> the faulty one, rather than the downstream neighbor. If it detects no 
> corruption, then the faulty node is either the downstream mirror or the 
> network link between the two, and the current behavior is reasonable.
> {quote}
> Thanks.
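A minimal sketch of the truncation rule proposed above; the class and the verifyChecksumUpTo helper are hypothetical, for illustration only, and not actual DataNode code:

{code:java}
// Illustrative only: when a downstream DN reports a checksum error, the
// upstream DN re-verifies its own on-disk data and keeps only the prefix
// that is both checksum-clean and acknowledged.
class PipelineRecoverySketch {
  long chooseTruncateLength(long ackedSize, long onDiskSize) {
    long correctDataSize = verifyChecksumUpTo(onDiskSize); // hypothetical helper
    return Math.min(correctDataSize, ackedSize);
  }

  /** Hypothetical: returns the length of the longest checksum-valid prefix. */
  long verifyChecksumUpTo(long length) {
    return length; // placeholder; a real implementation would re-read and verify chunks
  }
}
{code}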






[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869135#comment-16869135
 ] 

Hadoop QA commented on HDFS-14585:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  7m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
28s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
46s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
52s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} branch-2.8 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
48s{color} | {color:red} hadoop-common-project/hadoop-common in branch-2.8 has 
1 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
50s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in branch-2.8 
has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
43s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
23s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 10s{color} | {color:orange} root: The patch generated 3 new + 202 unchanged 
- 6 fixed = 205 total (was 208) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
12s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
26s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
29s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 39s{color} | 
{color:black} 

[jira] [Commented] (HDFS-11298) Add storage policy info in FileStatus

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869133#comment-16869133
 ] 

Wei-Chiu Chuang commented on HDFS-11298:


Storage policy is an HDFS-only feature. FileStatus is meant to support all file 
system abstractions, so I don't feel strongly about having this.

> Add storage policy info in FileStatus
> -
>
> Key: HDFS-11298
> URL: https://issues.apache.org/jira/browse/HDFS-11298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-11298.001.patch
>
>
> It would be good to add a storagePolicy field in FileStatus. Then we would not 
> need to call the getStoragePolicy() API to get the policy.
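For context, a minimal sketch of the two-call pattern clients use today, which the proposed field would fold into the status lookup; the path is illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockStoragePolicySpi;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePolicyLookupSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/user/example/data");           // illustrative path

    // Today: two calls, one for the status and one for the policy.
    FileStatus status = fs.getFileStatus(p);
    BlockStoragePolicySpi policy = fs.getStoragePolicy(p);
    System.out.println(status.getLen() + " bytes, policy " + policy.getName());

    // The proposal would expose the policy directly on the returned status,
    // so the second call would not be needed.
  }
}
{code}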






[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk

2019-06-20 Thread Masatake Iwasaki (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869132#comment-16869132
 ] 

Masatake Iwasaki commented on HDFS-14135:
-

Failed tests call {{consumeConnectionBacklog}} in another thread. I'm going to 
update the patch to cover this.

> TestWebHdfsTimeouts Fails intermittently in trunk
> -
>
> Key: HDFS-14135
> URL: https://issues.apache.org/jira/browse/HDFS-14135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, 
> HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, 
> HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, 
> HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, 
> HDFS-14135.012.patch
>
>
> Reference to failure
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/






[jira] [Comment Edited] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.

2019-06-20 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869129#comment-16869129
 ] 

Ayush Saxena edited comment on HDFS-14591 at 6/21/19 3:31 AM:
--

Can SPS help? Take a look at HDFS-10285.
External SPS is done; I guess internal SPS is still in progress.


was (Author: ayushtkn):
can SPS help? Give a check to HDFS-10285

> NameNode should move the replicas to the correct storages after the storage 
> policy is changed.
> --
>
> Key: HDFS-14591
> URL: https://issues.apache.org/jira/browse/HDFS-14591
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a 
> background process searching all the files to find those that are not accessed 
> for a period of time. Then we set them to COLD and start a mover to move the 
> replicas. After moving, all the replicas are consistent with the storage 
> policy.
> It's a natural idea to let the NameNode handle the move.






[jira] [Commented] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.

2019-06-20 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869129#comment-16869129
 ] 

Ayush Saxena commented on HDFS-14591:
-

Can SPS help? Take a look at HDFS-10285.

> NameNode should move the replicas to the correct storages after the storage 
> policy is changed.
> --
>
> Key: HDFS-14591
> URL: https://issues.apache.org/jira/browse/HDFS-14591
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
>
> Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a 
> background process searching all the files to find those that are not accessed 
> for a period of time. Then we set them to COLD and start a mover to move the 
> replicas. After moving, all the replicas are consistent with the storage 
> policy.
> It's a natural idea to let the NameNode handle the move.






[jira] [Updated] (HDFS-13893) DiskBalancer: no validations for Disk balancer commands

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13893:
---
   Resolution: Fixed
Fix Version/s: 3.2.1
   3.3.0
   Status: Resolved  (was: Patch Available)

+1 patch still applies.
Pushed to trunk, branch-3.2.

There's a trivial conflict in branch-3.1 so I'll stop here. Feel free to reopen 
and work on it.

> DiskBalancer: no validations for Disk balancer commands 
> 
>
> Key: HDFS-13893
> URL: https://issues.apache.org/jira/browse/HDFS-13893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Harshakiran Reddy
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-13893.001.patch, HDFS-13893.002.patch, 
> HDFS-13893.003.patch
>
>
> {{Scenario:-}}
>  
>  1. Run the Disk Balancer commands passing extra arguments:
> {noformat} 
> hadoopclient> hdfs diskbalancer -plan hostname --thresholdPercentage 2 
> *sgfsdgfs*
> 2018-08-31 14:57:35,454 INFO planner.GreedyPlanner: Starting plan for Node : 
> hostname:50077
> 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Disk Volume set 
> fb67f00c-e333-4f38-a3a6-846a30d4205a Type : DISK plan completed.
> 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Compute Plan for Node : 
> hostname:50077 took 23 ms
> 2018-08-31 14:57:35,457 INFO command.Command: Writing plan to:
> 2018-08-31 14:57:35,457 INFO command.Command: 
> /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json
> Writing plan to:
> /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json
> {noformat} 
> Expected Output:- 
> =
> Disk balancer commands should fail if we pass any invalid or extra arguments.






[jira] [Created] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.

2019-06-20 Thread Jinglun (JIRA)
Jinglun created HDFS-14591:
--

 Summary: NameNode should move the replicas to the correct storages 
after the storage policy is changed.
 Key: HDFS-14591
 URL: https://issues.apache.org/jira/browse/HDFS-14591
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jinglun
Assignee: Jinglun


Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a 
background process searching all the files to find those that are not accessed 
for a period of time. Then we set them to COLD and start a mover to move the 
replicas. After moving, all the replicas are consistent with the storage policy.
It's a natural idea to let the NameNode handle the move.






[jira] [Updated] (HDFS-14568) The quota and consume of the file's ancestors are not handled when the storage policy of the file is changed.

2019-06-20 Thread Jinglun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-14568:
---
Description: 
The quota and consume of the file's ancestors are not handled when the storage 
policy of the file is changed. For example:
 1. Set an SSD (StorageType.SSD) quota of fileSpace-1 on the parent dir;
 2. Create a file of size fileSpace with storage policy \{DISK,DISK,DISK} under 
it;
 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and 
expect a QuotaByStorageTypeExceededException.

Because the quota and consume are not handled, the expected exception is not 
thrown.

 

There are 3 reasons why we should handle the consume and the quota.
1. Replication uses the new storage policy. Consider a file with BlockType 
CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". Now 
we change the policy to "ONE_SSD". If a DN goes down and the file needs 
replication, the NN will choose storages according to policy "ONE_SSD" and 
replicate the block to an SSD storage.
2. We actually have a cluster storing both HOT and COLD data. We have a 
background process that scans all the files to find those that have not been 
accessed for a period of time. We then set them to COLD and start a mover to 
move the replicas. After the move, all the replicas are consistent with the 
storage policy.
3. The NameNode manages the global state of the cluster. If there is any 
inconsistency, such as replicas that don't match the storage policy of the 
file, we should take the NameNode as the standard and make the cluster match 
the NameNode. Block replication is a good example of this rule: when we count 
the consume of a file (CONTIGUOUS), we multiply the replication factor by the 
file's length, no matter whether the file is under- or over-replicated. The 
same should apply to the storage type quota and consume.

  was:
The quota and consume of the file's ancestors are not handled when the storage 
policy of the file is changed. For example:
 1. Set an SSD (StorageType.SSD) quota of fileSpace-1 on the parent dir;
 2. Create a file of size fileSpace with storage policy \{DISK,DISK,DISK} under 
it;
 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and 
expect a QuotaByStorageTypeExceededException.

Because the quota and consume are not handled, the expected exception is not 
thrown.


> The quota and consume of the file's ancestors are not handled when the 
> storage policy of the file is changed.
> -
>
> Key: HDFS-14568
> URL: https://issues.apache.org/jira/browse/HDFS-14568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch
>
>
> The quota and consume of the file's ancestors are not handled when the 
> storage policy of the file is changed. For example:
>  1. Set an SSD (StorageType.SSD) quota of fileSpace-1 on the parent dir;
>  2. Create a file of size fileSpace with storage policy \{DISK,DISK,DISK} 
> under it;
>  3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and 
> expect a QuotaByStorageTypeExceededException.
> Because the quota and consume are not handled, the expected exception is not 
> thrown.
>  
> There are 3 reasons why we should handle the consume and the quota.
> 1. Replication uses the new storage policy. Consider a file with BlockType 
> CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". 
> Now we change the policy to "ONE_SSD". If a DN goes down and the file needs 
> replication, the NN will choose storages according to policy "ONE_SSD" and 
> replicate the block to an SSD storage.
> 2. We actually have a cluster storing both HOT and COLD data. We have a 
> background process that scans all the files to find those that have not been 
> accessed for a period of time. We then set them to COLD and start a mover to 
> move the replicas. After the move, all the replicas are consistent with the 
> storage policy.
> 3. The NameNode manages the global state of the cluster. If there is any 
> inconsistency, such as replicas that don't match the storage policy of the 
> file, we should take the NameNode as the standard and make the cluster match 
> the NameNode. Block replication is a good example of this rule: when we count 
> the consume of a file (CONTIGUOUS), we multiply the replication factor by the 
> file's length, no matter whether the file is under- or over-replicated. The 
> same should apply to the storage type quota and consume.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For 

[jira] [Updated] (HDFS-12564) Add the documents of swebhdfs configurations on the client side

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-12564:
---
   Resolution: Fixed
Fix Version/s: 3.1.3
   3.2.1
   3.3.0
   Status: Resolved  (was: Patch Available)

+1 patch still applies.
Pushed to trunk, branch-3.2 and branch-3.1. Thanks [~tasanuma]!

> Add the documents of swebhdfs configurations on the client side
> ---
>
> Key: HDFS-12564
> URL: https://issues.apache.org/jira/browse/HDFS-12564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch, 
> HDFS-12564.3.patch, HDFS-12564.4.patch
>
>
> Documentation does not cover the swebhdfs configurations on the client side. 
> We can reuse the hftp/hsftp documents, which were removed from Hadoop 3.0 in 
> HDFS-5570 and HDFS-9640.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14568) The quota and consume of the file's ancestors are not handled when the storage policy of the file is changed.

2019-06-20 Thread Jinglun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869122#comment-16869122
 ] 

Jinglun commented on HDFS-14568:


Hi [~ayushtkn], thanks for your comments. There are 3 reasons why we should 
handle the consume and the quota.
1. Replication uses the new storage policy. Consider a file with BlockType 
CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". Now 
we change the policy to "ONE_SSD". If a DN goes down and the file needs 
replication, the NN will choose storages according to policy "ONE_SSD" and 
replicate the block to an SSD storage.
2. We actually have a cluster storing both HOT and COLD data. We have a 
background process that scans all the files to find those that have not been 
accessed for a period of time. We then set them to COLD and start a mover to 
move the replicas. After the move, all the replicas are consistent with the 
storage policy.
3. The NameNode manages the global state of the cluster. If there is any 
inconsistency, such as replicas that don't match the storage policy of the 
file, we should take the NameNode as the standard and make the cluster match 
the NameNode. Block replication is a good example of this rule: when we count 
the consume of a file (CONTIGUOUS), we multiply the replication factor by the 
file's length, no matter whether the file is under- or over-replicated. The 
same should apply to the storage type quota and consume.

After the storage policy is changed, the replicas should be moved to the right 
storages automatically by the NameNode. Let's start a new jira for that.
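
To make the repro concrete, a minimal sketch of the three steps; the paths, sizes, and surrounding test setup are illustrative assumptions, not the attached unit test.
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StorageType;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.QuotaByStorageTypeExceededException;

class QuotaPolicyChangeRepro {
  // 'file' is assumed to already exist under 'dir' with length 'fileSpace'
  // and the default HOT (all-DISK) storage policy.
  void repro(DistributedFileSystem dfs, Path dir, Path file, long fileSpace)
      throws Exception {
    // 1. SSD quota on the parent is one byte less than the file size.
    dfs.setQuotaByStorageType(dir, StorageType.SSD, fileSpace - 1);
    try {
      // 2.-3. Switching the file to ALL_SSD should now exceed the SSD quota,
      // because its consumed space would count against StorageType.SSD.
      dfs.setStoragePolicy(file, "ALL_SSD");
    } catch (QuotaByStorageTypeExceededException expected) {
      // With the quota/consume handling in place we should land here.
    }
  }
}
{code}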

> The quota and consume of the file's ancestors are not handled when the 
> storage policy of the file is changed.
> -
>
> Key: HDFS-14568
> URL: https://issues.apache.org/jira/browse/HDFS-14568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch
>
>
> The quota and consume of the file's ancestors are not handled when the 
> storage policy of the file is changed. For example:
>  1. Set an SSD (StorageType.SSD) quota of fileSpace-1 on the parent dir;
>  2. Create a file of size fileSpace with storage policy \{DISK,DISK,DISK} 
> under it;
>  3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and 
> expect a QuotaByStorageTypeExceededException.
> Because the quota and consume are not handled, the expected exception is not 
> thrown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12564) Add the documents of swebhdfs configurations on the client side

2019-06-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869123#comment-16869123
 ] 

Hudson commented on HDFS-12564:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16799 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16799/])
HDFS-12564. Add the documents of swebhdfs configurations on the client 
(weichiu: rev 98d20656433cdec76c2108d24ff3b935657c1e80)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/WebHDFS.md
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/ServerSetup.md.vm
* (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm


> Add the documents of swebhdfs configurations on the client side
> ---
>
> Key: HDFS-12564
> URL: https://issues.apache.org/jira/browse/HDFS-12564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch, 
> HDFS-12564.3.patch, HDFS-12564.4.patch
>
>
> Documentation does not cover the swebhdfs configurations on the client side. 
> We can reuse the hftp/hsftp documents, which were removed from Hadoop 3.0 in 
> HDFS-5570 and HDFS-9640.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14483) Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x

2019-06-20 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14483:
---
Attachment: HDFS-14483.branch-2.8.v1.patch
Status: Patch Available  (was: Open)

> Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x
> -
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14465) When the Block expected replications is larger than the number of DataNodes, entering maintenance will never exit.

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14465:
---
   Resolution: Fixed
Fix Version/s: 2.9.3
   2.10.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2 and branch-2.9.
Thanks!

> When the Block expected replications is larger than the number of DataNodes, 
> entering maintenance will never exit.
> --
>
> Key: HDFS-14465
> URL: https://issues.apache.org/jira/browse/HDFS-14465
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14465.01.patch, HDFS-14465.02.patch, 
> HDFS-14465.branch-2.9.01.patch
>
>
> Scenario:
> There is a small HDFS cluster with 5 DataNodes. One of them is put into 
> maintenance: it is added to the maintenance list, and 
> dfs.namenode.maintenance.replication.min is set to 1.
> When the nodes are refreshed, the NameNode starts checking whether the blocks 
> on the node require a new replication.
> The replication factor of the MapReduce job file is 10 by default, 
> isNeededReplicationForMaintenance will evaluate to false, and 
> isSufficientlyReplicated will evaluate to false, so the block of the job 
> file needs an additional replica.
> When adding a replica, since the cluster has only 5 DataNodes and all of the 
> nodes already hold a replica of the block, chooseTargetInOrder will throw a 
> NotEnoughReplicasException, so the replication cannot be increased and the 
> Entering Maintenance state never ends.
> This issue makes it impossible for an independent small cluster to use the 
> maintenance mode.
>  
> {panel:title=chooseTarget exception log}
> 2019-05-03 23:42:31,008 [31545331] - WARN  
> [ReplicationMonitor:BlockPlacementPolicyDefault@431] - Failed to place enough 
> replicas, still in need of 1 to reach 5 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and 
> org.apache.hadoop.net.NetworkTopology
> {panel}
>  
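
A rough worked illustration of why placement fails in this scenario; the numbers simply mirror the description above and this is not the actual BlockManager logic.
{code}
// All 5 DataNodes already hold a replica of the job file's block, so there is
// no remaining node on which to place the extra replica the maintenance check
// asks for, and chooseTargetInOrder throws NotEnoughReplicasException.
int dataNodes = 5;                        // cluster size
int nodesAlreadyHoldingReplica = 5;       // the block is on every DataNode
int candidateTargets = dataNodes - nodesAlreadyHoldingReplica;   // = 0
// 0 candidates -> NotEnoughReplicasException -> the replication is never
// increased -> the node stays in Entering Maintenance forever.
{code}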



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869109#comment-16869109
 ] 

Hadoop QA commented on HDFS-14590:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
27m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14590 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972394/HDFS-14590.001.patch |
| Optional Tests |  dupname  asflicense  mvnsite  xml  |
| uname | Linux c6f774a85d5d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d9a9e99 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 436 (vs. ulimit of 1) |
| modules | C: hadoop-project U: hadoop-project |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27023/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-14590.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14586) Trash missing delete the folder which near timeout checkpoint

2019-06-20 Thread maobaolong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869107#comment-16869107
 ] 

maobaolong commented on HDFS-14586:
---

[~huyongfa] This is exactly what our company wants most. Thank you for your contribution!

> Trash missing delete the folder which near timeout checkpoint
> -
>
> Key: HDFS-14586
> URL: https://issues.apache.org/jira/browse/HDFS-14586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hu yongfa
>Priority: Major
> Attachments: HDFS-14586.001.patch
>
>
> When the trash checkpoint timeout arrives, trash deletes the old folder first 
> and then creates a new checkpoint folder.
> As the delete action may take a long time, such as 2 minutes, the new 
> checkpoint folder is created late.
> At the next checkpoint timeout, trash skips deleting the new checkpoint 
> folder, because the new checkpoint folder is younger than one checkpoint 
> interval.
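
A small timeline sketch of the race being described; the one-hour interval and two-minute delete time are illustrative values only.
{code}
// Why the late checkpoint is skipped at the next emptier run:
long interval    = 60L * 60 * 1000;    // checkpoint interval, e.g. 1 hour
long deleteTime  = 2L  * 60 * 1000;    // deleting the old checkpoint takes ~2 min
long t0          = 0;                  // emptier wakes up
long newCkptTime = t0 + deleteTime;    // new checkpoint is created "late"

long ageAtNextRun = (t0 + interval) - newCkptTime;   // = interval - 2 minutes
boolean expired   = ageAtNextRun >= interval;        // false -> checkpoint skipped,
                                                     // it survives one more interval
{code}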



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk

2019-06-20 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869104#comment-16869104
 ] 

Ayush Saxena commented on HDFS-14135:
-

The test failed in this build.
https://builds.apache.org/job/PreCommit-HDFS-Build/27020/testReport/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/


> TestWebHdfsTimeouts Fails intermittently in trunk
> -
>
> Key: HDFS-14135
> URL: https://issues.apache.org/jira/browse/HDFS-14135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, 
> HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, 
> HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, 
> HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, 
> HDFS-14135.012.patch
>
>
> Reference to failure
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14586) Trash missing delete the folder which near timeout checkpoint

2019-06-20 Thread Wu Weiwei (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869102#comment-16869102
 ] 

Wu Weiwei commented on HDFS-14586:
--

Great! 

This problem has also occurred in our production environment. Checkpoint 
directories older than one day were retained, triggering an annoying alarm 
call, and we had to delete the expired directories manually.

This patch can solve our problem.

> Trash missing delete the folder which near timeout checkpoint
> -
>
> Key: HDFS-14586
> URL: https://issues.apache.org/jira/browse/HDFS-14586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hu yongfa
>Priority: Major
> Attachments: HDFS-14586.001.patch
>
>
> When the trash checkpoint timeout arrives, trash deletes the old folder first 
> and then creates a new checkpoint folder.
> As the delete action may take a long time, such as 2 minutes, the new 
> checkpoint folder is created late.
> At the next checkpoint timeout, trash skips deleting the new checkpoint 
> folder, because the new checkpoint folder is younger than one checkpoint 
> interval.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14585:
---
Attachment: HDFS-14585.branch-2.8.v1.patch
Status: Patch Available  (was: Open)

> Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.8.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14585:
---
Attachment: (was: HDFS-14585.branch-2.8.5.v1.patch)

> Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14585:
---
Status: Open  (was: Patch Available)

> Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x

2019-06-20 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14585:
---
Attachment: (was: HDFS-14585.branch285.000.patch)

> Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869092#comment-16869092
 ] 

Hadoop QA commented on HDFS-12914:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} HDFS-12914 does not apply to branch-3.1. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12914 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972354/HDFS-12914.branch-3.1.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27022/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected because of an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues, possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DN's 
> next FBR is sent and/or forced.
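
For clarity, the described flow paraphrased as code; the variable names are illustrative and this is not the actual NameNode source.
{code}
// Inside the full block report handling (paraphrased):
boolean leaseValid = blockReportLeaseManager.checkLease(datanode, now, leaseId);
if (!leaseValid) {
  // No exception is thrown here. The 'false' simply propagates up to
  // NameNodeRpcServer#blockReport, where it is read as "no stale storages",
  // so the re-registered DN's report is effectively dropped and the NN
  // believes the node holds no blocks until the next (or a forced) FBR.
}
{code}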



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14589) RPC fairness for Datanode data transfers

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869090#comment-16869090
 ] 

Wei-Chiu Chuang commented on HDFS-14589:


If this capability exists, we can avoid the situation found in HDFS-12737.

> RPC fairness for Datanode data transfers
> 
>
> Key: HDFS-14589
> URL: https://issues.apache.org/jira/browse/HDFS-14589
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Xue Liu
>Priority: Major
>
> Currently, the Datanode just replies to the data transfers from the clients 
> as soon as they come.
> Eventually, when the {{DataXceiverServer}} runs out of threads, it just 
> refuses:
> {code}
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>   + " exceeds the limit of concurrent xcievers: "
>   + maxXceiverCount);
> }
> {code}
> We had a situation where a user had many containers accessing the same block, 
> which ended up saturating the 3 Datanodes and interfering with the other users.
> Ideally, the Namenode should manage this situation to some degree, but we can 
> still get into it.
> We should have some logic in the DN to track this and apply some fairness to 
> the number of requests per user.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-20 Thread Takanobu Asanuma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-14590:

Status: Patch Available  (was: Open)

> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-14590.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-20 Thread Takanobu Asanuma (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869087#comment-16869087
 ] 

Takanobu Asanuma commented on HDFS-14590:
-

Uploaded the 1st patch.

> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-14590.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-20 Thread Takanobu Asanuma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-14590:

Attachment: HDFS-14590.001.patch

> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-14590.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-20 Thread Takanobu Asanuma (JIRA)
Takanobu Asanuma created HDFS-14590:
---

 Summary: [SBN Read] Add the document link to the top page
 Key: HDFS-14590
 URL: https://issues.apache.org/jira/browse/HDFS-14590
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869073#comment-16869073
 ] 

Hadoop QA commented on HDFS-14135:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 13s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14135 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972380/HDFS-14135.012.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a1d3cfafc8ff 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d9a9e99 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27020/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27020/testReport/ |
| Max. process+thread count | 4225 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDDS-1667) Docker compose file may referring to incorrect docker image name

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869053#comment-16869053
 ] 

Hadoop QA commented on HDDS-1667:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} yamllint {color} | {color:blue}  0m  
0s{color} | {color:blue} yamllint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
32m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
56s{color} | {color:green} hadoop-hdds in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 59s{color} 
| {color:red} hadoop-ozone in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 83m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.client.rpc.TestOzoneRpcClient |
|   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
|   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
|   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
|   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
|   | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
|   | hadoop.hdds.scm.safemode.TestSCMSafeModeWithPipelineRules |
|   | hadoop.ozone.om.TestOzoneManagerHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/PreCommit-HDDS-Build/2735/artifact/out/Dockerfile 
|
| JIRA Issue | HDDS-1667 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972383/HDDS-1667.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient xml yamllint |
| uname | Linux 9b17e4d12cb2 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 

[jira] [Updated] (HDFS-14573) Backport Standby Read to branch-3

2019-06-20 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14573:
--
Attachment: HDFS-14573-branch-3.2.004.patch

> Backport Standby Read to branch-3
> -
>
> Key: HDFS-14573
> URL: https://issues.apache.org/jira/browse/HDFS-14573
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14573-branch-3.0.001.patch, 
> HDFS-14573-branch-3.1.001.patch, HDFS-14573-branch-3.2.001.patch, 
> HDFS-14573-branch-3.2.002.patch, HDFS-14573-branch-3.2.003.patch, 
> HDFS-14573-branch-3.2.004.patch
>
>
> This Jira tracks backporting the feature consistent read from standby 
> (HDFS-12943) to branch-3.x, including 3.0, 3.1, 3.2. This is required for 
> backporting to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue

2019-06-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869039#comment-16869039
 ] 

Íñigo Goiri commented on HDFS-14403:


I opened a JIRA for the DN to use this kind of fairness (HDFS-14589).
I wanted to double check that this was not already in the air.
From the code in DataXceiverServer and the DataNode it doesn't look like this 
is done at all.

Regarding this JIRA itself, I have a couple minor comments:
* The location of {{applyWeights()}} in {{ProcessingDetails}} is a little 
dangerous; should we make sure that the weight and timing arrays are the same 
size? I'm not sure the {{applyWeights()}} naming is the most intuitive either; 
it looks more like a {{getCost()}}. I would actually just move the full code to 
{{WeightedTimeCostProvider}}; I'm not sure it adds much value in 
{{ProcessingDetails}} (a rough sketch of that shape follows after this list).
* Do we want to add a couple unit tests with corner cases? Like having no 
requests, etc?
* Use logger format for the logs (using {}).
* Use lambda for the {{GenericTestUtils#waitFor()}}.
* I would add some high level text about the cost based approach not only 
describing the fields but describing what the goal is and how one would use it 
and set it up "end to end"-ish.
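
A rough sketch of that shape, for illustration only; the names and fields below are assumptions, not the committed HDFS-14403 code.
{code}
// Illustrative only: cost computation owned by the cost provider rather than
// by ProcessingDetails.
class WeightedTimeCostProvider {
  private final long[] weights;            // one weight per timing category

  WeightedTimeCostProvider(long[] weights) {
    this.weights = weights;
  }

  long getCost(long[] timings) {
    // Guard against the "same size" concern mentioned above.
    if (timings.length != weights.length) {
      throw new IllegalArgumentException("timings and weights differ in length");
    }
    long cost = 0;
    for (int i = 0; i < timings.length; i++) {
      cost += timings[i] * weights[i];
    }
    return cost;
  }
}
{code}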

> Cost-Based RPC FairCallQueue
> 
>
> Key: HDFS-14403
> URL: https://issues.apache.org/jira/browse/HDFS-14403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc, namenode
>Reporter: Erik Krogen
>Assignee: Christopher Gregorian
>Priority: Major
>  Labels: qos, rpc
> Attachments: CostBasedFairCallQueueDesign_v0.pdf, 
> HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, 
> HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, 
> HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, 
> HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, 
> HDFS-14403.branch-2.8.patch
>
>
> HADOOP-15016 initially described extensions to the Hadoop FairCallQueue 
> encompassing both cost-based analysis of incoming RPCs and support for 
> reservations of RPC capacity for system/platform users. This JIRA tracks 
> the former, as HADOOP-15016 was repurposed to focus more specifically 
> on the reservation portion of the work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14589) RPC fairness for Datanode data transfers

2019-06-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869036#comment-16869036
 ] 

Íñigo Goiri commented on HDFS-14589:


Not sure if we should use something similar to the cost-based RPC FairCallQueue 
in HDFS-14403, as the requests might all be the same size, but it is worth 
experimenting.

> RPC fairness for Datanode data transfers
> 
>
> Key: HDFS-14589
> URL: https://issues.apache.org/jira/browse/HDFS-14589
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Xue Liu
>Priority: Major
>
> Currently, the Datanode just replies to the data transfers from the clients 
> as soon as they come.
> Eventually, when the {{DataXceiverServer}} runs out of threads, it just 
> refuses:
> {code}
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>   + " exceeds the limit of concurrent xcievers: "
>   + maxXceiverCount);
> }
> {code}
> We had a situation where a user had many containers accessing the same block, 
> which ended up saturating the 3 Datanodes and interfering with the other users.
> Ideally, the Namenode should manage this situation to some degree, but we can 
> still get into it.
> We should have some logic in the DN to track this and apply some fairness to 
> the number of requests per user.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14589) RPC fairness for Datanode data transfers

2019-06-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869036#comment-16869036
 ] 

Íñigo Goiri edited comment on HDFS-14589 at 6/20/19 11:52 PM:
--

Not sure if we should use something similar to the cost-based RPC FairCallQueue 
in HDFS-14403, as the requests might all be the same size, but it is worth 
experimenting.


was (Author: elgoiri):
Not sure if we should use similat to the cost-based RPC FairCallQueue in 
HDFS-14403 as the requests might all be the same size but it is worth 
experimenting.

> RPC fairness for Datanode data transfers
> 
>
> Key: HDFS-14589
> URL: https://issues.apache.org/jira/browse/HDFS-14589
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Xue Liu
>Priority: Major
>
> Currently, the Datanode just replies to the data transfers from the clients 
> as soon as they come.
> Eventually, when the {{DataXceiverServer}} runs out of threads, it just 
> refuses:
> {code}
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>   + " exceeds the limit of concurrent xcievers: "
>   + maxXceiverCount);
> }
> {code}
> We had a situation where a user had many containers accessing the same block, 
> which ended up saturating the 3 Datanodes and interfering with the other users.
> Ideally, the Namenode should manage this situation to some degree, but we can 
> still get into it.
> We should have some logic in the DN to track this and apply some fairness to 
> the number of requests per user.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14589) RPC fairness for Datanode data transfers

2019-06-20 Thread JIRA
Íñigo Goiri created HDFS-14589:
--

 Summary: RPC fairness for Datanode data transfers
 Key: HDFS-14589
 URL: https://issues.apache.org/jira/browse/HDFS-14589
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Íñigo Goiri
Assignee: Xue Liu


Currently, the Datanode just replies to the data transfers from the clients as 
soon as they come.
Eventually, when the {{DataXceiverServer}} runs out of threads, it just refuses:
{code}
// Make sure the xceiver count is not exceeded
int curXceiverCount = datanode.getXceiverCount();
if (curXceiverCount > maxXceiverCount) {
  throw new IOException("Xceiver count " + curXceiverCount
  + " exceeds the limit of concurrent xcievers: "
  + maxXceiverCount);
}
{code}
We had a situation where a user had many containers accessing the same block, 
which ended up saturating the 3 Datanodes and interfering with the other users.
Ideally, the Namenode should manage this situation to some degree, but we can 
still get into it.
We should have some logic in the DN to track this and apply some fairness to 
the number of requests per user.
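
One possible shape for that per-user fairness, purely as an illustration; this is not DataXceiverServer code, and capping concurrent streams per user is just one option.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Illustrative per-user limiter: each user gets at most 'maxPerUser' concurrent
// xceiver slots; requests beyond that are rejected (or could be queued).
class PerUserXceiverLimiter {
  private final int maxPerUser;
  private final Map<String, Semaphore> slots = new ConcurrentHashMap<>();

  PerUserXceiverLimiter(int maxPerUser) {
    this.maxPerUser = maxPerUser;
  }

  boolean tryAcquire(String user) {
    return slots.computeIfAbsent(user, u -> new Semaphore(maxPerUser)).tryAcquire();
  }

  void release(String user) {
    Semaphore s = slots.get(user);
    if (s != null) {
      s.release();
    }
  }
}
{code}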



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1711) Set a global reference property for Ozone image name

2019-06-20 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1711:

Attachment: HDDS-1711.001.patch

> Set a global reference property for Ozone image name
> 
>
> Key: HDDS-1711
> URL: https://issues.apache.org/jira/browse/HDDS-1711
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: HDDS-1711.001.patch
>
>
> Ozone Kubernetes templates use the docker.image property to control which 
> image the Kubernetes examples run.  It would be best to rename the property 
> to ozone.docker.image to prevent conflicts between Ozone and other Hadoop 
> sub-projects.
> There are also a few typos in the existing Kubernetes templates that reference 
> elek/ozone.  These look like they need to match the default Ozone image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads (Dynamometer)

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869026#comment-16869026
 ] 

Hadoop QA commented on HDFS-12345:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 24m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 17m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-assemblies hadoop-tools hadoop-dist . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
31s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 
15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 27s{color} | {color:orange} root: The patch generated 18 new + 0 unchanged - 
0 fixed = 18 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 16m 
58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} shellcheck {color} | {color:red}  0m  
1s{color} | {color:red} The patch generated 5 new + 1 unchanged - 0 fixed = 6 
total (was 1) {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
18s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 6 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
22s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-assemblies hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-dist 
hadoop-tools/hadoop-dynamometer hadoop-tools hadoop-dist . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}175m 26s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 2s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}355m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed 

[jira] [Commented] (HDDS-1667) Docker compose file may referring to incorrect docker image name

2019-06-20 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869024#comment-16869024
 ] 

Eric Yang commented on HDDS-1667:
-

[~elek] Patch 006 ensures docker.image can continue to work for Kubernetes 
templates until HDDS-1711 is committed.  What should be the default value for 
HADOOP_IMAGE?

> Docker compose file may referring to incorrect docker image name
> 
>
> Key: HDDS-1667
> URL: https://issues.apache.org/jira/browse/HDDS-1667
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Trivial
> Fix For: 0.4.1
>
> Attachments: HDDS-1667.001.patch, HDDS-1667.002.patch, 
> HDDS-1667.003.patch, HDDS-1667.004.patch, HDDS-1667.005.patch, 
> HDDS-1667.006.patch
>
>
> In fault injection test, the docker compose file is templated using:
> ${user.name}/ozone:${project.version}
> If the user passes in the -Ddocker.image parameter, the docker build generates a 
> different image name.  This can cause the fault injection test to fail or get stuck 
> because it cannot find the required docker image.  The fix is simply to use the 
> docker.image token when filtering the docker compose file.
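To illustrate the templating described above, here is a minimal, hypothetical compose fragment; the service name, command, and the exact ${docker.image} token are assumptions for illustration, not the actual HDDS compose files.

{code:yaml}
# Hypothetical docker-compose.yaml fragment (illustrative only)
version: "3"
services:
  datanode:
    # Before: hard-wired naming scheme, breaks when -Ddocker.image overrides the image name.
    # image: ${user.name}/ozone:${project.version}
    # After: filtered with the docker.image token so the test always matches the built image.
    image: ${docker.image}
    command: ["/opt/hadoop/bin/ozone", "datanode"]
{code}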



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1667) Docker compose file may referring to incorrect docker image name

2019-06-20 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1667:

Attachment: HDDS-1667.006.patch

> Docker compose file may referring to incorrect docker image name
> 
>
> Key: HDDS-1667
> URL: https://issues.apache.org/jira/browse/HDDS-1667
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Trivial
> Fix For: 0.4.1
>
> Attachments: HDDS-1667.001.patch, HDDS-1667.002.patch, 
> HDDS-1667.003.patch, HDDS-1667.004.patch, HDDS-1667.005.patch, 
> HDDS-1667.006.patch
>
>
> In fault injection test, the docker compose file is templated using:
> ${user.name}/ozone:${project.version}
> If the user passes in the -Ddocker.image parameter, the docker build generates a 
> different image name.  This can cause the fault injection test to fail or get stuck 
> because it cannot find the required docker image.  The fix is simply to use the 
> docker.image token when filtering the docker compose file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869010#comment-16869010
 ] 

Wei-Chiu Chuang commented on HDFS-14587:


This one: HDFS-8311.
Note that although the summary suggests it's for data transfer, the timeout 
applies to clients as well. See the analysis in HDFS-13103.

So I guess you're on 2.7? HDFS-8311 is in 2.8.0

> Support fail fast when client wait ACK by pipeline over threshold
> -
>
> Key: HDFS-14587
> URL: https://issues.apache.org/jira/browse/HDFS-14587
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>
> Recently, I met a corner case where the client waited for data to be acknowledged by 
> the pipeline for over 9 hours. After checking branch trunk, I think this issue still 
> exists. So I propose to add a threshold for the wait timeout and then fail fast.
> {code:java}
> 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: 
> Slow waitForAckedSeqno took 35560718ms (threshold=3ms)
> {code}
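For illustration only, a minimal, self-contained sketch of the kind of fail-fast check being proposed; the class, field, and parameter names below (including ackTimeoutMs) are assumptions, not the actual DFSOutputStream code or an existing HDFS configuration.

{code:java}
import java.io.IOException;

/** Minimal sketch of a bounded wait for a pipeline ACK; not HDFS client code. */
class AckWaitSketch {
  private final Object dataQueue = new Object();
  private volatile long lastAckedSeqno = -1;
  private volatile boolean streamerClosed = false;

  void waitForAckedSeqno(long seqno, long ackTimeoutMs)
      throws IOException, InterruptedException {
    final long start = System.currentTimeMillis();
    synchronized (dataQueue) {
      while (!streamerClosed && lastAckedSeqno < seqno) {
        long elapsed = System.currentTimeMillis() - start;
        if (ackTimeoutMs > 0 && elapsed > ackTimeoutMs) {
          // Fail fast instead of waiting (potentially for hours) on a stuck pipeline.
          throw new IOException("ACK for seqno " + seqno + " not received within "
              + ackTimeoutMs + " ms, failing fast");
        }
        dataQueue.wait(1000);   // re-check roughly once per second
      }
    }
  }
}
{code}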



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1690) ContainerController should provide a way to retrieve containers per volume

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1690?focusedWorklogId=264190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264190
 ]

ASF GitHub Bot logged work on HDDS-1690:


Author: ASF GitHub Bot
Created on: 20/Jun/19 22:22
Start Date: 20/Jun/19 22:22
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #986: [HDDS-1690] 
ContainerController should provide a way to retrieve cont…
URL: https://github.com/apache/hadoop/pull/986#issuecomment-504220709
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 29 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 485 | trunk passed |
   | +1 | compile | 260 | trunk passed |
   | +1 | checkstyle | 73 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 866 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 163 | trunk passed |
   | 0 | spotbugs | 313 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 503 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 444 | the patch passed |
   | +1 | compile | 265 | the patch passed |
   | +1 | javac | 265 | the patch passed |
   | +1 | checkstyle | 78 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 1 | The patch has no whitespace issues. |
   | +1 | shadedclient | 675 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 153 | the patch passed |
   | +1 | findbugs | 520 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 235 | hadoop-hdds in the patch passed. |
   | -1 | unit | 1076 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 35 | The patch does not generate ASF License warnings. |
   | | | 6027 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/986 |
   | JIRA Issue | HDDS-1690 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 722a02f0507a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d9a9e99 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/testReport/ |
   | Max. process+thread count | 4643 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service U: 
hadoop-hdds/container-service |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264190)
Time Spent: 2h 10m  (was: 2h)

> ContainerController should provide a way to retrieve containers per volume
> --
>
> Key: HDDS-1690
> URL: https://issues.apache.org/jira/browse/HDDS-1690
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Hrishikesh Gadre
>Assignee: Hrishikesh Gadre
>Priority: Major
>  Labels: pull-request-available
> 

[jira] [Commented] (HDDS-1690) ContainerController should provide a way to retrieve containers per volume

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868978#comment-16868978
 ] 

Hadoop QA commented on HDDS-1690:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
43s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  5m 
13s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
55s{color} | {color:green} hadoop-hdds in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 56s{color} 
| {color:red} hadoop-ozone in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}100m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
|   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
|   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
|   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
|   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/986 |
| JIRA Issue | HDDS-1690 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 722a02f0507a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| 

[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk

2019-06-20 Thread Masatake Iwasaki (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868977#comment-16868977
 ] 

Masatake Iwasaki commented on HDFS-14135:
-

Attached 012 to address the checkstyle errors and javac warnings.

> TestWebHdfsTimeouts Fails intermittently in trunk
> -
>
> Key: HDFS-14135
> URL: https://issues.apache.org/jira/browse/HDFS-14135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, 
> HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, 
> HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, 
> HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, 
> HDFS-14135.012.patch
>
>
> Reference to failure
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk

2019-06-20 Thread Masatake Iwasaki (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14135:

Attachment: HDFS-14135.012.patch

> TestWebHdfsTimeouts Fails intermittently in trunk
> -
>
> Key: HDFS-14135
> URL: https://issues.apache.org/jira/browse/HDFS-14135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, 
> HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, 
> HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, 
> HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, 
> HDFS-14135.012.patch
>
>
> Reference to failure
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1685) Recon: Add support for "start" query param to containers and containers/{id} endpoints

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1685?focusedWorklogId=264183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264183
 ]

ASF GitHub Bot logged work on HDDS-1685:


Author: ASF GitHub Bot
Created on: 20/Jun/19 22:15
Start Date: 20/Jun/19 22:15
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #987: HDDS-1685. 
Recon: Add support for 'start' query param to containers…
URL: https://github.com/apache/hadoop/pull/987#discussion_r296032332
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/ContainerDBServiceProviderImpl.java
 ##
 @@ -164,38 +194,42 @@ public Integer getCountForForContainerKeyPrefix(
 return prefixes;
   }
 
-  /**
-   * Get all the containers.
-   *
-   * @return Map of containerID -> containerMetadata.
-   * @throws IOException
-   */
-  @Override
-  public Map getContainers() throws IOException {
-// Set a negative limit to get all the containers.
-return getContainers(-1);
 
 Review comment:
   Can we name the -1 as something like Container.ALL?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264183)
Time Spent: 1h  (was: 50m)

> Recon: Add support for "start" query param to containers and containers/{id} 
> endpoints
> --
>
> Key: HDDS-1685
> URL: https://issues.apache.org/jira/browse/HDDS-1685
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> * Support "start" query param to seek to the given key in RocksDB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1685) Recon: Add support for "start" query param to containers and containers/{id} endpoints

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1685?focusedWorklogId=264185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264185
 ]

ASF GitHub Bot logged work on HDDS-1685:


Author: ASF GitHub Bot
Created on: 20/Jun/19 22:15
Start Date: 20/Jun/19 22:15
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #987: HDDS-1685. 
Recon: Add support for 'start' query param to containers…
URL: https://github.com/apache/hadoop/pull/987#discussion_r296030739
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/ContainerDBServiceProviderImpl.java
 ##
 @@ -128,23 +128,53 @@ public Integer getCountForForContainerKeyPrefix(
   }
 
   /**
-   * Use the DB's prefix seek iterator to start the scan from the given
-   * container ID prefix.
+   * Get key prefixes for the given container ID.
*
* @param containerId the given containerId.
* @return Map of (Key-Prefix,Count of Keys).
*/
   @Override
   public Map getKeyPrefixesForContainer(
   long containerId) throws IOException {
+// set the default startKeyPrefix to empty string
+return getKeyPrefixesForContainer(containerId, "");
 
 Review comment:
   Maybe we can use StringUtils.EMPTY.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264185)
Time Spent: 1h 20m  (was: 1h 10m)

> Recon: Add support for "start" query param to containers and containers/{id} 
> endpoints
> --
>
> Key: HDDS-1685
> URL: https://issues.apache.org/jira/browse/HDDS-1685
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> * Support "start" query param to seek to the given key in RocksDB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1685) Recon: Add support for "start" query param to containers and containers/{id} endpoints

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1685?focusedWorklogId=264184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264184
 ]

ASF GitHub Bot logged work on HDDS-1685:


Author: ASF GitHub Bot
Created on: 20/Jun/19 22:15
Start Date: 20/Jun/19 22:15
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #987: HDDS-1685. 
Recon: Add support for 'start' query param to containers…
URL: https://github.com/apache/hadoop/pull/987#discussion_r296032651
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/spi/ContainerDBServiceProvider.java
 ##
 @@ -85,7 +86,8 @@ Integer getCountForForContainerKeyPrefix(
* @return Map of containerID -> containerMetadata.
* @throws IOException
*/
-  Map getContainers(int limit) throws IOException;
+  Map getContainers(int limit, long start)
 
 Review comment:
   According to the implementation, the start key will be skipped if present in 
the seek. This is to support a pagination-style API. Maybe we can change the 
param name to reflect this. Something like previous instead of start?
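To illustrate the pagination semantics described in the comment above, here is a small, self-contained sketch that uses a NavigableMap as a stand-in for the RocksDB seek; Recon's real iterator and metadata types are not used, so treat the names as assumptions.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;

/** Sketch of "skip the start key" pagination; not Recon's actual API. */
class PaginationSketch {
  static <V> Map<Long, V> page(NavigableMap<Long, V> db, long start, int limit) {
    Map<Long, V> out = new LinkedHashMap<>();
    // tailMap(start, false) mirrors seeking to 'start' and then excluding it,
    // so a caller can pass the last key of the previous page to fetch the next page.
    for (Map.Entry<Long, V> e : db.tailMap(start, false).entrySet()) {
      if (out.size() >= limit) {
        break;
      }
      out.put(e.getKey(), e.getValue());
    }
    return out;
  }
}
{code}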
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264184)
Time Spent: 1h 10m  (was: 1h)

> Recon: Add support for "start" query param to containers and containers/{id} 
> endpoints
> --
>
> Key: HDDS-1685
> URL: https://issues.apache.org/jira/browse/HDDS-1685
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> * Support "start" query param to seek to the given key in RocksDB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)

2019-06-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868973#comment-16868973
 ] 

Íñigo Goiri commented on HDFS-14588:


I'm guessing the solution is to throw the standby exception and that's it?
I would expect this to happen already.
Can you put a unit test showing this behavior?

In the last couple of months we had an issue with active/standby with WebHDFS; it 
might be worth mentioning.
The client connects to the NN asking to write a file, say (reading should be 
pretty straightforward).
The NN replies with the address of a DN and a parameter called "namenoderpcaddress" 
(this is the tricky one).
When the DN receives the write request, it creates a regular RPC client 
(DFSClient to be specific) which connects to the NN again and does the write.
The issue we had in the past was the namenoderpcaddress being the address of the 
active NN.
When the NN failed over to some other NN, the DN couldn't find the NN to 
complete the write, etc.
Bottom line: for active/standby, the namenoderpcaddress can be a source of issues.
Not sure it's the same, but worth bringing it up.
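To make the flow above concrete, an illustrative two-step WebHDFS write; the hostnames, ports, and exact query strings are made up for this sketch, only the namenoderpcaddress parameter itself is the point.

{code}
# Step 1: client asks the NN to create a file; the NN redirects to a DN
PUT http://nn1.example.com:9870/webhdfs/v1/tmp/file?op=CREATE
-> HTTP/1.1 307 Temporary Redirect
   Location: http://dn3.example.com:9864/webhdfs/v1/tmp/file?op=CREATE&namenoderpcaddress=nn1.example.com:8020

# Step 2: client sends the data to the DN, which uses namenoderpcaddress to call the NN back
PUT http://dn3.example.com:9864/webhdfs/v1/tmp/file?op=CREATE&namenoderpcaddress=nn1.example.com:8020
{code}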

> Client retries Standby NN continuously even if Active NN is available 
> (WebHDFS)
> ---
>
> Key: HDFS-14588
> URL: https://issues.apache.org/jira/browse/HDFS-14588
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Priority: Major
>
> This is a behavior we have observed in our HA setup of HDFS.
>  # Active NN is up and serving traffic.
>  # Stand By NN is restarted for maintenance.
>  # After step 2 all new clients (webhdfs only) which connect to Stand By keep 
> seeing Retriable Exception as Stand By NN is not yet started (Rpc server is 
> yet to come up as FS image is loading) but http server is started and ready 
> to accept traffic. This keeps happening till rpcserver is up and SNN knows 
> that it's truly standby. This behavior can continue for the duration of start-up, 
> which is long (many minutes) for big clusters.
> This above behavior is causing low availability of HDFS when HDFS is actually 
> still available.
> Ideally webhdfs should throw standby exception (if HA is enabled) and let 
> clients connect to active following that. If active is also not available 
> clients will bounce and automatically connect to the right active.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)

2019-06-20 Thread CR Hota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868970#comment-16868970
 ] 

CR Hota commented on HDFS-14588:


[~xkrogen] Thanks for the review.

Yes to throw StandbyException but ONLY if HA is enabled.
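A minimal sketch of that idea, assuming a hypothetical startup check in the WebHDFS request path; the class, method, and field names here are illustrative, not the actual NameNode code.

{code:java}
import java.io.IOException;
import org.apache.hadoop.ipc.RetriableException;
import org.apache.hadoop.ipc.StandbyException;

/** Illustrative only: fail WebHDFS calls differently during startup depending on HA. */
class StartupCheckSketch {
  private final boolean haEnabled;
  private volatile boolean rpcServerUp;   // stand-in for "FSImage finished loading"

  StartupCheckSketch(boolean haEnabled) {
    this.haEnabled = haEnabled;
  }

  void checkStartup() throws IOException {
    if (rpcServerUp) {
      return;
    }
    if (haEnabled) {
      // Lets the client fail over to the active NN instead of retrying this one.
      throw new StandbyException("NameNode still starting up");
    }
    throw new RetriableException("NameNode still starting up");
  }
}
{code}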

> Client retries Standby NN continuously even if Active NN is available 
> (WebHDFS)
> ---
>
> Key: HDFS-14588
> URL: https://issues.apache.org/jira/browse/HDFS-14588
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Priority: Major
>
> This is a behavior we have observed in our HA setup of HDFS.
>  # Active NN is up and serving traffic.
>  # Stand By NN is restarted for maintenance.
>  # After step 2 all new clients (webhdfs only) which connect to Stand By keep 
> seeing Retriable Exception as Stand By NN is not yet started (Rpc server is 
> yet to come up as FS image is loading) but http server is started and ready 
> to accept traffic. This keeps happening till rpcserver is up and SNN knows 
> that it's truly standby. This behavior can continue for the duration of start-up, 
> which is long (many minutes) for big clusters.
> This above behavior is causing low availability of HDFS when HDFS is actually 
> still available.
> Ideally webhdfs should throw standby exception (if HA is enabled) and let 
> clients connect to active following that. If active is also not available 
> clients will bounce and automatically connect to the right active.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11021) Add FSNamesystemLock metrics for BlockManager operations

2019-06-20 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868969#comment-16868969
 ] 

Konstantin Shvachko commented on HDFS-11021:


I think separating IBR and FBR lock metrics out of OTHER will be valuable for 
monitoring cluster health. This are internal (not client-facing) operations, so 
increase in latency can be treated as an alert that something is going wrong on 
the cluster.
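As a sketch of what separating IBR and FBR out of "OTHER" could look like, here is a small, self-contained example of per-operation lock-hold accounting; it is not the real FSNamesystemLock, and the operation names ("IBR", "FBR") are only labels for illustration.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Self-contained sketch of named lock-hold metrics; not HDFS code. */
class NamedLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ConcurrentHashMap<String, LongAdder> holdTimeMs = new ConcurrentHashMap<>();
  private long lockedAt;

  void writeLock() {
    lock.writeLock().lock();
    lockedAt = System.currentTimeMillis();
  }

  /** Unlock and attribute the hold time to the named operation, e.g. "IBR" or "FBR". */
  void writeUnlock(String opName) {
    long held = System.currentTimeMillis() - lockedAt;
    lock.writeLock().unlock();
    holdTimeMs.computeIfAbsent(opName, k -> new LongAdder()).add(held);
  }
}
{code}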

> Add FSNamesystemLock metrics for BlockManager operations
> 
>
> Key: HDFS-11021
> URL: https://issues.apache.org/jira/browse/HDFS-11021
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> Right now the operations which the {{BlockManager}} issues to the 
> {{Namesystem}} will not emit metrics about which operation caused the 
> {{FSNamesystemLock}} to be held; they are all grouped under "OTHER". We 
> should fix this since the {{BlockManager}} creates many acquisitions of both 
> the read and write locks. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11021) Add FSNamesystemLock metrics for BlockManager operations

2019-06-20 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868969#comment-16868969
 ] 

Konstantin Shvachko edited comment on HDFS-11021 at 6/20/19 10:09 PM:
--

I think separating IBR and FBR lock metrics out of OTHER will be valuable for 
monitoring cluster health. These are internal (not client-facing) operations, 
so increase in latency can be treated as an alert that something is going wrong 
on the cluster.


was (Author: shv):
I think separating IBR and FBR lock metrics out of OTHER will be valuable for 
monitoring cluster health. This are internal (not client-facing) operations, so 
increase in latency can be treated as an alert that something is going wrong on 
the cluster.

> Add FSNamesystemLock metrics for BlockManager operations
> 
>
> Key: HDFS-11021
> URL: https://issues.apache.org/jira/browse/HDFS-11021
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> Right now the operations which the {{BlockManager}} issues to the 
> {{Namesystem}} will not emit metrics about which operation caused the 
> {{FSNamesystemLock}} to be held; they are all grouped under "OTHER". We 
> should fix this since the {{BlockManager}} creates many acquisitions of both 
> the read and write locks. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)

2019-06-20 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868968#comment-16868968
 ] 

Erik Krogen commented on HDFS-14588:


Seems like bad behavior. Am I correct in saying that your proposed fix is to 
have WebHDFS throw a {{StandbyException}} when the FSImage is in a loading 
state?

> Client retries Standby NN continuously even if Active NN is available 
> (WebHDFS)
> ---
>
> Key: HDFS-14588
> URL: https://issues.apache.org/jira/browse/HDFS-14588
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Priority: Major
>
> This is a behavior we have observed in our HA setup of HDFS.
>  # Active NN is up and serving traffic.
>  # Stand By NN is restarted for maintenance.
>  # After step 2 all new clients (webhdfs only) which connect to Stand By keep 
> seeing Retriable Exception as Stand By NN is not yet started (Rpc server is 
> yet to come up as FS image is loading) but http server is started and ready 
> to accept traffic. This keeps happening till rpcserver is up and SNN knows 
> that it's truly standby. This behavior can continue for the duration of start-up, 
> which is long (many minutes) for big clusters.
> This above behavior is causing low availability of HDFS when HDFS is actually 
> still available.
> Ideally webhdfs should throw standby exception (if HA is enabled) and let 
> clients connect to active following that. If active is also not available 
> clients will bounce and automatically connect to the right active.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)

2019-06-20 Thread CR Hota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868966#comment-16868966
 ] 

CR Hota commented on HDFS-14588:


[~xkrogen] [~elgoiri] [~jojochuang] Thoughts on this ?

> Client retries Standby NN continuously even if Active NN is available 
> (WebHDFS)
> ---
>
> Key: HDFS-14588
> URL: https://issues.apache.org/jira/browse/HDFS-14588
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Priority: Major
>
> This is a behavior we have observed in our HA setup of HDFS.
>  # Active NN is up and serving traffic.
>  # Stand By NN is restarted for maintenance.
>  # After step 2 all new clients (webhdfs only) which connect to Stand By keep 
> seeing Retriable Exception as Stand By NN is not yet started (Rpc server is 
> yet to come up as FS image is loading) but http server is started and ready 
> to accept traffic. This keeps happening till rpcserver is up and SNN knows 
> that it's truly standby. This behavior can continue for the duration of start-up, 
> which is long (many minutes) for big clusters.
> This above behavior is causing low availability of HDFS when HDFS is actually 
> still available.
> Ideally webhdfs should throw standby exception (if HA is enabled) and let 
> clients connect to active following that. If active is also not available 
> clients will bounce and automatically connect to the right active.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)

2019-06-20 Thread CR Hota (JIRA)
CR Hota created HDFS-14588:
--

 Summary: Client retries Standby NN continuously even if Active NN 
is available (WebHDFS)
 Key: HDFS-14588
 URL: https://issues.apache.org/jira/browse/HDFS-14588
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: CR Hota


This is a behavior we have observed in our HA setup of HDFS.
 # Active NN is up and serving traffic.
 # Stand By NN is restarted for maintenance.
 # After step 2 all new clients (webhdfs only) which connect to Stand By keep 
seeing Retriable Exception as Stand By NN is not yet started (Rpc server is yet 
to come up as FS image is loading) but http server is started and ready to 
accept traffic. This keeps happening till rpcserver is up and SNN knows that 
it's truly standby. This behavior can continue for the duration of start-up, 
which is long (many minutes) for big clusters.

This above behavior is causing low availability of HDFS when HDFS is actually 
still available.

Ideally webhdfs should throw standby exception (if HA is enabled) and let 
clients connect to active following that. If active is also not available 
clients will bounce and automatically connect to the right active.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868956#comment-16868956
 ] 

Wei-Chiu Chuang commented on HDFS-14403:


I would really love to review this one, looks interesting. But CDH didn't 
support fair call queue historically, so I am afraid I may not be able to offer 
the best opinions. 
That said, if Chao +1 it, I am willing to rubber-stamp the commit :)

> Cost-Based RPC FairCallQueue
> 
>
> Key: HDFS-14403
> URL: https://issues.apache.org/jira/browse/HDFS-14403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc, namenode
>Reporter: Erik Krogen
>Assignee: Christopher Gregorian
>Priority: Major
>  Labels: qos, rpc
> Attachments: CostBasedFairCallQueueDesign_v0.pdf, 
> HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, 
> HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, 
> HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, 
> HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, 
> HDFS-14403.branch-2.8.patch
>
>
> HADOOP-15016 initially described extensions to the Hadoop FairCallQueue 
> encompassing both cost-based analysis of incoming RPCs, as well as support 
> for reservations of RPC capacity for system/platform users. This JIRA intends 
> to track the former, as HADOOP-15016 was repurposed to more specifically 
> focus on the reservation portion of the work.
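For readers unfamiliar with FairCallQueue, a minimal configuration sketch for the NameNode RPC port follows. The first two keys are the standard FairCallQueue/DecayRpcScheduler settings; the cost-provider key and class are only an assumption about what this JIRA introduces, so check the committed patch for the real property name.

{code:xml}
<!-- core-site.xml fragment; 8020 is the NameNode RPC port in this example -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
<property>
  <name>ipc.8020.scheduler.impl</name>
  <value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
</property>
<!-- Assumed/illustrative key: weight calls by processing time instead of counting
     each call as 1. Verify the actual property and class names in the patch. -->
<property>
  <name>ipc.8020.cost-provider.impl</name>
  <value>org.apache.hadoop.ipc.WeightedTimeCostProvider</value>
</property>
{code}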



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes

2019-06-20 Thread Salvatore LaMendola (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868951#comment-16868951
 ] 

Salvatore LaMendola commented on HDDS-1700:
---

Will do. I'll join the mailing lists now and get ready to send my non-binding 
+1 when needed. :)

> RPC Payload too large on datanode startup in kubernetes
> ---
>
> Key: HDDS-1700
> URL: https://issues.apache.org/jira/browse/HDDS-1700
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, Ozone Datanode, SCM
>Affects Versions: 0.4.0
> Environment: datanode pod's ozone-site.xml
> {code:java}
> 
> ozone.scm.block.client.addressozone-managers-service:9876
> ozone.enabledTrue
> ozone.scm.datanode.id/tmp/datanode.id
> ozone.scm.client.addressozone-managers-service:9876
> ozone.metadata.dirs/tmp/metadata
> ozone.scm.namesozone-managers-service:9876
> ozone.om.addressozone-managers-service:9874
> ozone.handler.typedistributed
> ozone.scm.datanode.addressozone-managers-service:9876
> 
> {code}
> OM/SCM pod's ozone-site.xml
> {code:java}
> 
> ozone.scm.block.client.addresslocalhost
> ozone.enabledTrue
> ozone.scm.datanode.id/tmp/datanode.id
> ozone.scm.client.addresslocalhost
> ozone.metadata.dirs/tmp/metadata
> ozone.scm.nameslocalhost
> ozone.om.addresslocalhost
> ozone.handler.typedistributed
> ozone.scm.datanode.addresslocalhost
> 
> {code}
>  
>  
>Reporter: Josh Siegel
>Priority: Minor
>
> When starting the datanode on a separate kubernetes pod from the SCM and OM, 
> the below error appears in the datanode's {{ozone.log}}. We verified basic 
> connectivity between the datanode pod and the OM/SCM pod.
> {code:java}
> 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR 
> (EndpointStateMachine.java:207) - Unable to communicate to SCM server at 
> ozone-managers-service:9876 for past 31800 seconds.
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "ozone-datanode/10.244.84.187"; destination 
> host is: "ozone-managers-service":9876;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
> at org.apache.hadoop.ipc.Client.call(Client.java:1457)
> at org.apache.hadoop.ipc.Client.call(Client.java:1367)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy88.getVersion(Unknown Source)
> at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum 
> data length
> at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code}
>  
> cc [~slamendola2_bloomberg]
> [~anu]
> [~elek]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes

2019-06-20 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868948#comment-16868948
 ] 

Anu Engineer commented on HDDS-1700:


Thank you for root causing this issue.  Appreciate the effort. It would be nice 
if you can vote when the 0.4.1 release comes up, since you would have already 
tested the yet-to-be-released k8s packages.

> RPC Payload too large on datanode startup in kubernetes
> ---
>
> Key: HDDS-1700
> URL: https://issues.apache.org/jira/browse/HDDS-1700
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, Ozone Datanode, SCM
>Affects Versions: 0.4.0
> Environment: datanode pod's ozone-site.xml
> {code:java}
> 
> ozone.scm.block.client.addressozone-managers-service:9876
> ozone.enabledTrue
> ozone.scm.datanode.id/tmp/datanode.id
> ozone.scm.client.addressozone-managers-service:9876
> ozone.metadata.dirs/tmp/metadata
> ozone.scm.namesozone-managers-service:9876
> ozone.om.addressozone-managers-service:9874
> ozone.handler.typedistributed
> ozone.scm.datanode.addressozone-managers-service:9876
> 
> {code}
> OM/SCM pod's ozone-site.xml
> {code:java}
> 
> ozone.scm.block.client.addresslocalhost
> ozone.enabledTrue
> ozone.scm.datanode.id/tmp/datanode.id
> ozone.scm.client.addresslocalhost
> ozone.metadata.dirs/tmp/metadata
> ozone.scm.nameslocalhost
> ozone.om.addresslocalhost
> ozone.handler.typedistributed
> ozone.scm.datanode.addresslocalhost
> 
> {code}
>  
>  
>Reporter: Josh Siegel
>Priority: Minor
>
> When starting the datanode on a separate kubernetes pod from the SCM and OM, 
> the below error appears in the datanode's {{ozone.log}}. We verified basic 
> connectivity between the datanode pod and the OM/SCM pod.
> {code:java}
> 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR 
> (EndpointStateMachine.java:207) - Unable to communicate to SCM server at 
> ozone-managers-service:9876 for past 31800 seconds.
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "ozone-datanode/10.244.84.187"; destination 
> host is: "ozone-managers-service":9876;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
> at org.apache.hadoop.ipc.Client.call(Client.java:1457)
> at org.apache.hadoop.ipc.Client.call(Client.java:1367)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy88.getVersion(Unknown Source)
> at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum 
> data length
> at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code}
>  
> cc [~slamendola2_bloomberg]
> [~anu]
> [~elek]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes

2019-06-20 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDDS-1700.

Resolution: Cannot Reproduce

> RPC Payload too large on datanode startup in kubernetes
> ---
>
> Key: HDDS-1700
> URL: https://issues.apache.org/jira/browse/HDDS-1700
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, Ozone Datanode, SCM
>Affects Versions: 0.4.0
> Environment: datanode pod's ozone-site.xml
> {code:java}
> 
> ozone.scm.block.client.addressozone-managers-service:9876
> ozone.enabledTrue
> ozone.scm.datanode.id/tmp/datanode.id
> ozone.scm.client.addressozone-managers-service:9876
> ozone.metadata.dirs/tmp/metadata
> ozone.scm.namesozone-managers-service:9876
> ozone.om.addressozone-managers-service:9874
> ozone.handler.typedistributed
> ozone.scm.datanode.addressozone-managers-service:9876
> 
> {code}
> OM/SCM pod's ozone-site.xml
> {code:java}
> 
> ozone.scm.block.client.addresslocalhost
> ozone.enabledTrue
> ozone.scm.datanode.id/tmp/datanode.id
> ozone.scm.client.addresslocalhost
> ozone.metadata.dirs/tmp/metadata
> ozone.scm.nameslocalhost
> ozone.om.addresslocalhost
> ozone.handler.typedistributed
> ozone.scm.datanode.addresslocalhost
> 
> {code}
>  
>  
>Reporter: Josh Siegel
>Priority: Minor
>
> When starting the datanode on a separate kubernetes pod from the SCM and OM, 
> the below error appears in the datanode's {{ozone.log}}. We verified basic 
> connectivity between the datanode pod and the OM/SCM pod.
> {code:java}
> 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR 
> (EndpointStateMachine.java:207) - Unable to communicate to SCM server at 
> ozone-managers-service:9876 for past 31800 seconds.
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "ozone-datanode/10.244.84.187"; destination 
> host is: "ozone-managers-service":9876;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
> at org.apache.hadoop.ipc.Client.call(Client.java:1457)
> at org.apache.hadoop.ipc.Client.call(Client.java:1367)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy88.getVersion(Unknown Source)
> at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum 
> data length
> at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code}
>  
> cc [~slamendola2_bloomberg]
> [~anu]
> [~elek]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1495) Create hadoop/ozone docker images with inline build process

2019-06-20 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868943#comment-16868943
 ] 

Eric Yang commented on HDDS-1495:
-

{quote}Care to explain why my core build path is slower with this patch? I am 
telling you the command that I use regularly to build, and my concern is really 
for the commands that I use.{quote}

The current trunk takes a shortcut: it keeps binary tarball packaging as a 
secondary step, invoked via the -Pdist profile.  I think calling "mvn package" and 
not creating the package is a bit misleading; patch 005 was created before 
trunk made tarball creation optional.  Patch 005 kept the tarball packaging 
inline with mvn package.  The extra time was spent on making the tarball.  It is 
possible to move the tarball creation to the dist profile, and it would result in 
the same time spent.
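As a rough illustration of the difference being discussed; the exact profiles and flags depend on the build being compared, so treat this as a sketch rather than the project's documented invocations.

{code:bash}
# Current trunk behaviour: plain package, tarball creation deferred to the dist profile
mvn package -DskipTests
mvn package -Pdist -DskipTests   # opt in to building the binary tarball

# What patch 005 effectively did: tarball packaging runs inline with every "mvn package"
mvn package -DskipTests          # slower, because the tarball is always assembled
{code}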

> Create hadoop/ozone docker images with inline build process
> ---
>
> Key: HDDS-1495
> URL: https://issues.apache.org/jira/browse/HDDS-1495
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Eric Yang
>Priority: Major
> Attachments: HADOOP-16091.001.patch, HADOOP-16091.002.patch, 
> HDDS-1495.003.patch, HDDS-1495.004.patch, HDDS-1495.005.patch, 
> HDDS-1495.006.patch, HDDS-1495.007.patch, HDDS-1495.008.patch, Hadoop Docker 
> Image inline build process.pdf
>
>
> This is proposed by [~eyang] in 
> [this|https://lists.apache.org/thread.html/33ac54bdeacb4beb023ebd452464603aaffa095bd104cb43c22f484e@%3Chdfs-dev.hadoop.apache.org%3E]
>  mailing thread.
> {quote}1, 3. There are 38 Apache projects hosting docker images on Docker hub 
> using Apache Organization. By browsing Apache github mirror. There are only 7 
> projects using a separate repository for docker image build. Popular projects 
> official images are not from Apache organization, such as zookeeper, tomcat, 
> httpd. We may not disrupt what other Apache projects are doing, but it looks 
> like inline build process is widely employed by majority of projects such as 
> Nifi, Brooklyn, thrift, karaf, syncope and others. The situation seems a bit 
> chaotic for Apache as a whole. However, Hadoop community can decide what is 
> best for Hadoop. My preference is to remove ozone from source tree naming, if 
> Ozone is intended to be subproject of Hadoop for long period of time. This 
> enables Hadoop community to host docker images for various subproject without 
> having to check out several source tree to trigger a grand build. However, 
> inline build process seems more popular than separated process. Hence, I 
> highly recommend making docker build inline if possible.
> {quote}
> The main challenges are also discussed in the thread:
> {code:java}
> 3. Technically it would be possible to add the Dockerfile to the source
> tree and publish the docker image together with the release by the
> release manager but it's also problematic:
> {code}
> a) there is no easy way to stage the images for the vote
>  c) it couldn't be flagged as automated on dockerhub
>  d) It couldn't support the critical updates.
>  * Updating existing images (for example in case of an ssl bug, rebuild
>  all the existing images with exactly the same payload but updated base
>  image/os environment)
>  * Creating image for older releases (We would like to provide images,
>  for hadoop 2.6/2.7/2.7/2.8/2.9. Especially for doing automatic testing
>  with different versions).
> {code:java}
>  {code}
> The a) can be solved (as [~eyang] suggested) with using a personal docker 
> image during the vote and publish it to the dockerhub after the vote (in case 
> the permission can be set by the INFRA)
> Note: based on LEGAL-270 and linked discussion both approaches (inline build 
> process / external build process) are compatible with the apache release.
> Note: HDDS-851 and HADOOP-14898 contains more information about these 
> problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue

2019-06-20 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868939#comment-16868939
 ] 

Erik Krogen commented on HDFS-14403:


Hey [~elgoiri], interesting question... I haven't heard of any such work and we 
don't have any plans for that from our side. I don't think we've experienced 
issues with that, or at least, if we have we haven't noticed.

> Cost-Based RPC FairCallQueue
> 
>
> Key: HDFS-14403
> URL: https://issues.apache.org/jira/browse/HDFS-14403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc, namenode
>Reporter: Erik Krogen
>Assignee: Christopher Gregorian
>Priority: Major
>  Labels: qos, rpc
> Attachments: CostBasedFairCallQueueDesign_v0.pdf, 
> HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, 
> HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, 
> HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, 
> HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, 
> HDFS-14403.branch-2.8.patch
>
>
> HADOOP-15016 initially described extensions to the Hadoop FairCallQueue 
> encompassing both cost-based analysis of incoming RPCs and support 
> for reservations of RPC capacity for system/platform users. This JIRA intends 
> to track the former, as HADOOP-15016 was repurposed to more specifically 
> focus on the reservation portion of the work.
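For illustration only, here is a minimal sketch of the cost-based idea. It is not the 
HADOOP-15016/HDFS-14403 implementation, and the class and method names below are made 
up; it just shows one common way to realize "cost-based" weighting, charging callers by 
measured processing time rather than by raw call count and mapping heavier callers to 
lower-priority levels.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: track per-user cost as accumulated processing time
// and use it to pick a priority level, instead of counting calls.
public class CostTracker {
  private final Map<String, AtomicLong> costByUser = new ConcurrentHashMap<>();

  // Charge the caller for the time (in nanoseconds) the server spent on the RPC.
  public void charge(String user, long processingNanos) {
    costByUser.computeIfAbsent(user, u -> new AtomicLong()).addAndGet(processingNanos);
  }

  // Heavier users land in lower-priority levels; the thresholds are made up.
  public int priorityLevel(String user, long[] thresholds) {
    long cost = costByUser.getOrDefault(user, new AtomicLong()).get();
    int level = 0;
    for (long t : thresholds) {
      if (cost > t) {
        level++;
      }
    }
    return level;
  }
}
{code}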



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue

2019-06-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868925#comment-16868925
 ] 

Íñigo Goiri commented on HDFS-14403:


Are you guys aware of any work to do something similar for the Datanodes?
We used to be pretty bad with the Namenodes, but with the fair queue and a 
couple of improvements, we are in pretty good shape there now.
However, we now have a couple of users triggering thousands of reads from a single 
block, and they overload the DNs.
Is there any effort on doing some fairness for the xceivers in a DN?
The architecture of the xceivers is not as clean as the regular Hadoop RPC 
server used by the NN.

> Cost-Based RPC FairCallQueue
> 
>
> Key: HDFS-14403
> URL: https://issues.apache.org/jira/browse/HDFS-14403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc, namenode
>Reporter: Erik Krogen
>Assignee: Christopher Gregorian
>Priority: Major
>  Labels: qos, rpc
> Attachments: CostBasedFairCallQueueDesign_v0.pdf, 
> HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, 
> HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, 
> HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, 
> HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, 
> HDFS-14403.branch-2.8.patch
>
>
> HADOOP-15016 initially described extensions to the Hadoop FairCallQueue 
> encompassing both cost-based analysis of incoming RPCs and support 
> for reservations of RPC capacity for system/platform users. This JIRA intends 
> to track the former, as HADOOP-15016 was repurposed to more specifically 
> focus on the reservation portion of the work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test

2019-06-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868908#comment-16868908
 ] 

Hadoop QA commented on HDDS-1554:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  
0s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:blue}0{color} | {color:blue} yamllint {color} | {color:blue}  0m  
0s{color} | {color:blue} yamllint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 30 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
52s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  5m 
23s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} hadolint {color} | {color:red}  0m  
2s{color} | {color:red} The patch generated 2 new + 4 unchanged - 0 fixed = 6 
total (was 4) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
13s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  9m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
28s{color} | {color:green} hadoop-hdds in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 18s{color} 
| {color:red} hadoop-ozone in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
53s{color} | 

[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264049
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:12
Start Date: 20/Jun/19 19:12
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295956276
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) {
   }
 
   /**
-   * Acquires S3 Bucket lock on the given resource.
+   * Acquires bucket lock on the given resource.
*
* If the lock is not available then the current thread becomes
-   * disabled for thread scheduling purposes and lies dormant until the lock 
has
-   * been acquired.
+   * disabled for thread scheduling purposes and lies dormant until the
+   * lock has been acquired.
*
-   * @param s3BucketName S3Bucket Name on which the lock has to be acquired
+   * @param bucket Bucket on which the lock has to be acquired
*/
-  public void acquireS3Lock(String s3BucketName) {
-// Calling thread should not hold any bucket lock.
-// You can take an Volume while holding S3 bucket lock, since
-// semantically an S3 bucket maps to the ozone volume. So we check here
-// only if ozone bucket lock is taken.
-if (hasAnyBucketLock()) {
+  public void acquireBucketLock(String volume, String bucket) {
+if (hasAnyUserLock()) {
   throw new RuntimeException(
   "Thread '" + Thread.currentThread().getName() +
-  "' cannot acquire S3 bucket lock while holding Ozone bucket " +
-  "lock(s).");
+  "' cannot acquire bucket lock while holding User lock.");
 }
-manager.lock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet();
+manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).incrementAndGet();
   }
 
   /**
-   * Releases the volume lock on given resource.
+   * Releases the bucket lock on given resource.
*/
-  public void releaseS3Lock(String s3BucketName) {
-manager.unlock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).decrementAndGet();
+  public void releaseBucketLock(String volume, String bucket) {
+manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).decrementAndGet();
   }
 
   /**
-   * Acquires bucket lock on the given resource.
+   * Acquires user lock on the given resource.
*
* If the lock is not available then the current thread becomes
* disabled for thread scheduling purposes and lies dormant until the
* lock has been acquired.
*
-   * @param bucket Bucket on which the lock has to be acquired
+   * @param user User on which the lock has to be acquired
*/
-  public void acquireBucketLock(String volume, String bucket) {
-manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).incrementAndGet();
+  public void acquireUserLock(String user) {
+// In order to not maintain username's on which we have acquired lock,
+// just checking have we acquired userLock before. If user want's to
+// acquire user lock on multiple user's they should use
+// acquireMultiUserLock. This is just a protection logic, to let not users
+// use this if acquiring lock on multiple users. As currently, we have only
+// use case we have for this is during setOwner operation in VolumeManager.
+if (hasAnyUserLock()) {
+  LOG.error("Already have userLock");
+  throw new RuntimeException("For acquiring lock on multiple users, use " +
+  "acquireMultiLock method");
+}
+manager.lock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).incrementAndGet();
   }
 
   /**
-   * Releases the bucket lock on given resource.
+   * Releases the user lock on given resource.
*/
-  public void releaseBucketLock(String volume, String bucket) {
-manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).decrementAndGet();
+  public void releaseUserLock(String user) {
+manager.unlock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).decrementAndGet();
   }
 
   /**
-   * Returns true if the current thread holds any volume lock.
-   * @return true if current thread holds volume lock, else false
+   * Acquire user lock on 2 users. In this case, we compare 2 strings
+   * lexicographically, and acquire the locks according to the sorted order of
+   * the user names. In this way, when acquiring locks on multiple user's, we
+   * can avoid dead locks. This 

[jira] [Commented] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes

2019-06-20 Thread Salvatore LaMendola (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868859#comment-16868859
 ] 

Salvatore LaMendola commented on HDDS-1700:
---

We will need to follow up on this later, as we've determined that our 
configuration was the cause.

After rebuilding using {{0.5.0-SNAPSHOT}} and [~elek]'s configuration files 
with very minor modifications, the issue no longer persists, though we _can_ 
still reproduce it on that version using our own deployment configurations, 
which we'll look into further at a later date.

> RPC Payload too large on datanode startup in kubernetes
> ---
>
> Key: HDDS-1700
> URL: https://issues.apache.org/jira/browse/HDDS-1700
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, Ozone Datanode, SCM
>Affects Versions: 0.4.0
> Environment: datanode pod's ozone-site.xml
> {code:java}
> <configuration>
>   <property><name>ozone.scm.block.client.address</name><value>ozone-managers-service:9876</value></property>
>   <property><name>ozone.enabled</name><value>True</value></property>
>   <property><name>ozone.scm.datanode.id</name><value>/tmp/datanode.id</value></property>
>   <property><name>ozone.scm.client.address</name><value>ozone-managers-service:9876</value></property>
>   <property><name>ozone.metadata.dirs</name><value>/tmp/metadata</value></property>
>   <property><name>ozone.scm.names</name><value>ozone-managers-service:9876</value></property>
>   <property><name>ozone.om.address</name><value>ozone-managers-service:9874</value></property>
>   <property><name>ozone.handler.type</name><value>distributed</value></property>
>   <property><name>ozone.scm.datanode.address</name><value>ozone-managers-service:9876</value></property>
> </configuration>
> {code}
> OM/SCM pod's ozone-site.xml
> {code:java}
> <configuration>
>   <property><name>ozone.scm.block.client.address</name><value>localhost</value></property>
>   <property><name>ozone.enabled</name><value>True</value></property>
>   <property><name>ozone.scm.datanode.id</name><value>/tmp/datanode.id</value></property>
>   <property><name>ozone.scm.client.address</name><value>localhost</value></property>
>   <property><name>ozone.metadata.dirs</name><value>/tmp/metadata</value></property>
>   <property><name>ozone.scm.names</name><value>localhost</value></property>
>   <property><name>ozone.om.address</name><value>localhost</value></property>
>   <property><name>ozone.handler.type</name><value>distributed</value></property>
>   <property><name>ozone.scm.datanode.address</name><value>localhost</value></property>
> </configuration>
> {code}
>  
>  
>Reporter: Josh Siegel
>Priority: Minor
>
> When starting the datanode in a separate kubernetes pod from the SCM and OM, 
> the below error appears in the datanode's {{ozone.log}}. We verified basic 
> connectivity between the datanode pod and the OM/SCM pod.
> {code:java}
> 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR 
> (EndpointStateMachine.java:207) - Unable to communicate to SCM server at 
> ozone-managers-service:9876 for past 31800 seconds.
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "ozone-datanode/10.244.84.187"; destination 
> host is: "ozone-managers-service":9876;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
> at org.apache.hadoop.ipc.Client.call(Client.java:1457)
> at org.apache.hadoop.ipc.Client.call(Client.java:1367)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy88.getVersion(Unknown Source)
> at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum 
> data length
> at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code}
>  
> cc [~slamendola2_bloomberg]
> [~anu]
> [~elek]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264046
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:09
Start Date: 20/Jun/19 19:09
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295955274
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -132,14 +157,13 @@ public void releaseUserLock(String user) {
* @param volume Volume on which the lock has to be acquired
*/
   public void acquireVolumeLock(String volume) {
-// Calling thread should not hold any bucket lock.
+// Calling thread should not hold any bucket/user lock.
 // You can take an Volume while holding S3 bucket lock, since
-// semantically an S3 bucket maps to the ozone volume. So we check here
-// only if ozone bucket lock is taken.
-if (hasAnyBucketLock()) {
+// semantically an S3 bucket maps to the ozone volume.
+if (hasAnyBucketLock() || hasAnyUserLock()) {
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264046)
Time Spent: 7h  (was: 6h 50m)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume 
> requests we can solve the issue of acquire/release/reacquire problem. And few 
> bugs in the current implementation of S3Bucket/Volume operations.
>  
> Currently after acquiring volume lock, we cannot acquire user lock. 
> This is causing an issue in Volume request implementation, 
> acquire/release/reacquire volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get Volume Info from DB
>  # Release Volume lock. (We are releasing the lock, because while acquiring 
> volume lock, we cannot acquire the user lock.)
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Acquire volume lock
>  # Do delete logic
>  # release volume lock
>  # release user lock
>  
> We can avoid this acquire/release/reacquire lock issue by making volume lock 
> as low weight. 
>  
> In this way, the above deleteVolume request will change as below
>  # Acquire volume lock
>  # Get Volume Info from DB
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Do delete logic
>  # release owner lock
>  # release volume lock. 
> Same issue is seen with SetOwner for Volume request also.
> During HDDS-1620 [~arp] brought up this issue. 
> I am proposing the above solution to solve this issue. Any other 
> idea/suggestions are welcome.
> This also resolves a bug in setOwner for Volume request.
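For readers following the proposed ordering above, here is a minimal sketch of how the 
delete-volume flow becomes a single nested acquire/release pair. This is illustrative 
only: OmLock, VolumeStore and VolumeInfo are placeholder names, not the real OM classes. 
The intent, as described above, is to avoid releasing and reacquiring the volume lock 
between reading the volume info and performing the delete.

{code:java}
// Sketch of the proposed nested ordering; OmLock, VolumeStore and VolumeInfo
// are placeholders, not the real OM classes.
interface VolumeInfo { String getOwnerName(); }
interface VolumeStore {
  VolumeInfo readVolumeInfo(String volume);
  void deleteVolume(String volume, String owner);
}
interface OmLock {
  void acquireVolumeLock(String volume);
  void releaseVolumeLock(String volume);
  void acquireUserLock(String user);
  void releaseUserLock(String user);
}

final class DeleteVolumeFlow {
  static void deleteVolume(OmLock lock, VolumeStore store, String volume) {
    lock.acquireVolumeLock(volume);                  // 1. acquire volume lock
    try {
      String owner = store.readVolumeInfo(volume).getOwnerName(); // 2-3. read owner from DB
      lock.acquireUserLock(owner);                   // 4. user lock taken while holding volume lock
      try {
        store.deleteVolume(volume, owner);           // 5. do delete logic
      } finally {
        lock.releaseUserLock(owner);                 // 6. release owner lock
      }
    } finally {
      lock.releaseVolumeLock(volume);                // 7. release volume lock
    }
  }
}
{code}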



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264043
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:08
Start Date: 20/Jun/19 19:08
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295955075
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -96,30 +112,39 @@ public OzoneManagerLock(Configuration conf) {
   }
 
   /**
-   * Acquires user lock on the given resource.
+   * Acquires S3 Bucket lock on the given resource.
*
* If the lock is not available then the current thread becomes
-   * disabled for thread scheduling purposes and lies dormant until the
-   * lock has been acquired.
+   * disabled for thread scheduling purposes and lies dormant until the lock 
has
+   * been acquired.
*
-   * @param user User on which the lock has to be acquired
+   * @param s3BucketName S3Bucket Name on which the lock has to be acquired
*/
-  public void acquireUserLock(String user) {
-// Calling thread should not hold any volume or bucket lock.
-if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyS3Lock()) {
+  public void acquireS3BucketLock(String s3BucketName) {
+// Calling thread should not hold any volume/bucket/user lock.
+
+// Not added checks for prefix/s3 secret lock, as they will never be
+// taken with s3Bucket Lock. In this way, we can avoid 2 checks every
+// time we acquire s3Bucket lock.
+
+// Or do we need to add this for future safe?
+
+if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyUserLock()) {
   throw new RuntimeException(
   "Thread '" + Thread.currentThread().getName() +
-  "' cannot acquire user lock" +
-  " while holding volume, bucket or S3 bucket lock(s).");
+  "' cannot acquire S3 bucket lock while holding Ozone " +
+  "Volume/Bucket/User lock(s).");
 }
-manager.lock(OM_USER_PREFIX + user);
+manager.lock(OM_S3_PREFIX + s3BucketName);
 
 Review comment:
   This will be done in a new jira.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264043)
Time Spent: 6h 50m  (was: 6h 40m)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume 
> requests we can solve the issue of acquire/release/reacquire problem. And few 
> bugs in the current implementation of S3Bucket/Volume operations.
>  
> Currently after acquiring volume lock, we cannot acquire user lock. 
> This is causing an issue in Volume request implementation, 
> acquire/release/reacquire volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get Volume Info from DB
>  # Release Volume lock. (We are releasing the lock, because while acquiring 
> volume lock, we cannot acquire the user lock.)
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Acquire volume lock
>  # Do delete logic
>  # release volume lock
>  # release user lock
>  
> We can avoid this acquire/release/reacquire lock issue by making volume lock 
> as low weight. 
>  
> In this way, the above deleteVolume request will change as below
>  # Acquire volume lock
>  # Get Volume Info from DB
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Do delete logic
>  # release owner lock
>  # release volume lock. 
> Same issue is seen with SetOwner for Volume request also.
> During HDDS-1620 [~arp] brought up this issue. 
> I am proposing the above solution to solve this issue. Any other 
> idea/suggestions are welcome.
> This also resolves a bug in setOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264041
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:08
Start Date: 20/Jun/19 19:08
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295954991
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -96,30 +112,39 @@ public OzoneManagerLock(Configuration conf) {
   }
 
   /**
-   * Acquires user lock on the given resource.
+   * Acquires S3 Bucket lock on the given resource.
*
* If the lock is not available then the current thread becomes
-   * disabled for thread scheduling purposes and lies dormant until the
-   * lock has been acquired.
+   * disabled for thread scheduling purposes and lies dormant until the lock 
has
+   * been acquired.
*
-   * @param user User on which the lock has to be acquired
+   * @param s3BucketName S3Bucket Name on which the lock has to be acquired
*/
-  public void acquireUserLock(String user) {
-// Calling thread should not hold any volume or bucket lock.
-if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyS3Lock()) {
+  public void acquireS3BucketLock(String s3BucketName) {
+// Calling thread should not hold any volume/bucket/user lock.
+
+// Not added checks for prefix/s3 secret lock, as they will never be
+// taken with s3Bucket Lock. In this way, we can avoid 2 checks every
+// time we acquire s3Bucket lock.
+
+// Or do we need to add this for future safe?
+
+if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyUserLock()) {
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264041)
Time Spent: 6h 40m  (was: 6.5h)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume 
> requests we can solve the issue of acquire/release/reacquire problem. And few 
> bugs in the current implementation of S3Bucket/Volume operations.
>  
> Currently after acquiring volume lock, we cannot acquire user lock. 
> This is causing an issue in Volume request implementation, 
> acquire/release/reacquire volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get Volume Info from DB
>  # Release Volume lock. (We are releasing the lock, because while acquiring 
> volume lock, we cannot acquire the user lock.)
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Acquire volume lock
>  # Do delete logic
>  # release volume lock
>  # release user lock
>  
> We can avoid this acquire/release/reacquire lock issue by making volume lock 
> as low weight. 
>  
> In this way, the above deleteVolume request will change as below
>  # Acquire volume lock
>  # Get Volume Info from DB
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Do delete logic
>  # release owner lock
>  # release volume lock. 
> Same issue is seen with SetOwner for Volume request also.
> During HDDS-1620 [~arp] brought up this issue. 
> I am proposing the above solution to solve this issue. Any other 
> idea/suggestions are welcome.
> This also resolves a bug in setOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264040
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:08
Start Date: 20/Jun/19 19:08
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295954759
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -59,32 +68,39 @@
  * 
  * {@literal ->} acquireVolumeLock (will work)
  *   {@literal +->} acquireBucketLock (will work)
- * {@literal +-->} acquireUserLock (will throw Exception)
+ * {@literal +-->} acquireS3BucketLock (will throw Exception)
  * 
  * 
- * To acquire a user lock you should not hold any Volume/Bucket lock. Similarly
- * to acquire a Volume lock you should not hold any Bucket lock.
+ * To acquire a S3 lock you should not hold any Volume/Bucket lock. Similarly
+ * to acquire a Volume lock you should not hold any Bucket/User/S3
+ * Secret/Prefix lock.
  */
 public final class OzoneManagerLock {
 
+  private static final Logger LOG =
+  LoggerFactory.getLogger(OzoneManagerLock.class);
+
+  private static final String S3_BUCKET_LOCK = "s3BucketLock";
   private static final String VOLUME_LOCK = "volumeLock";
   private static final String BUCKET_LOCK = "bucketLock";
-  private static final String PREFIX_LOCK = "prefixLock";
-  private static final String S3_BUCKET_LOCK = "s3BucketLock";
+  private static final String USER_LOCK = "userLock";
   private static final String S3_SECRET_LOCK = "s3SecretetLock";
+  private static final String PREFIX_LOCK = "prefixLock";
+
 
   private final LockManager manager;
 
   // To maintain locks held by current thread.
   private final ThreadLocal> myLocks =
   ThreadLocal.withInitial(
-  () -> ImmutableMap.of(
-  VOLUME_LOCK, new AtomicInteger(0),
-  BUCKET_LOCK, new AtomicInteger(0),
-  PREFIX_LOCK, new AtomicInteger(0),
-  S3_BUCKET_LOCK, new AtomicInteger(0),
-  S3_SECRET_LOCK, new AtomicInteger(0)
-  )
+  () -> ImmutableMap.builder()
+  .put(S3_BUCKET_LOCK, new AtomicInteger(0))
+  .put(VOLUME_LOCK, new AtomicInteger(0))
+  .put(BUCKET_LOCK, new AtomicInteger(0))
+  .put(USER_LOCK, new AtomicInteger(0))
+  .put(S3_SECRET_LOCK, new AtomicInteger(0))
+  .put(PREFIX_LOCK, new AtomicInteger(0))
+  .build()
 
 Review comment:
   This will be taken up in a new jira.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264040)
Time Spent: 6.5h  (was: 6h 20m)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume 
> requests we can solve the issue of acquire/release/reacquire problem. And few 
> bugs in the current implementation of S3Bucket/Volume operations.
>  
> Currently after acquiring volume lock, we cannot acquire user lock. 
> This is causing an issue in Volume request implementation, 
> acquire/release/reacquire volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get Volume Info from DB
>  # Release Volume lock. (We are releasing the lock, because while acquiring 
> volume lock, we cannot acquire the user lock.)
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Acquire volume lock
>  # Do delete logic
>  # release volume lock
>  # release user lock
>  
> We can avoid this acquire/release/reacquire lock issue by making volume lock 
> as low weight. 
>  
> In this way, the above deleteVolume request will change as below
>  # Acquire volume lock
>  # Get Volume Info from DB
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Do delete logic
>  # release owner lock
>  # 

[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264029
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:05
Start Date: 20/Jun/19 19:05
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295953686
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) {
   }
 
   /**
-   * Acquires S3 Bucket lock on the given resource.
+   * Acquires bucket lock on the given resource.
*
* If the lock is not available then the current thread becomes
-   * disabled for thread scheduling purposes and lies dormant until the lock 
has
-   * been acquired.
+   * disabled for thread scheduling purposes and lies dormant until the
+   * lock has been acquired.
*
-   * @param s3BucketName S3Bucket Name on which the lock has to be acquired
+   * @param bucket Bucket on which the lock has to be acquired
*/
-  public void acquireS3Lock(String s3BucketName) {
-// Calling thread should not hold any bucket lock.
-// You can take an Volume while holding S3 bucket lock, since
-// semantically an S3 bucket maps to the ozone volume. So we check here
-// only if ozone bucket lock is taken.
-if (hasAnyBucketLock()) {
+  public void acquireBucketLock(String volume, String bucket) {
+if (hasAnyUserLock()) {
   throw new RuntimeException(
   "Thread '" + Thread.currentThread().getName() +
-  "' cannot acquire S3 bucket lock while holding Ozone bucket " +
-  "lock(s).");
+  "' cannot acquire bucket lock while holding User lock.");
 }
-manager.lock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet();
+manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).incrementAndGet();
   }
 
   /**
-   * Releases the volume lock on given resource.
+   * Releases the bucket lock on given resource.
*/
-  public void releaseS3Lock(String s3BucketName) {
-manager.unlock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).decrementAndGet();
+  public void releaseBucketLock(String volume, String bucket) {
+manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).decrementAndGet();
   }
 
   /**
-   * Acquires bucket lock on the given resource.
+   * Acquires user lock on the given resource.
*
* If the lock is not available then the current thread becomes
* disabled for thread scheduling purposes and lies dormant until the
* lock has been acquired.
*
-   * @param bucket Bucket on which the lock has to be acquired
+   * @param user User on which the lock has to be acquired
*/
-  public void acquireBucketLock(String volume, String bucket) {
-manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).incrementAndGet();
+  public void acquireUserLock(String user) {
+// In order to not maintain username's on which we have acquired lock,
+// just checking have we acquired userLock before. If user want's to
+// acquire user lock on multiple user's they should use
+// acquireMultiUserLock. This is just a protection logic, to let not users
+// use this if acquiring lock on multiple users. As currently, we have only
+// use case we have for this is during setOwner operation in VolumeManager.
+if (hasAnyUserLock()) {
+  LOG.error("Already have userLock");
+  throw new RuntimeException("For acquiring lock on multiple users, use " +
+  "acquireMultiLock method");
+}
+manager.lock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).incrementAndGet();
   }
 
   /**
-   * Releases the bucket lock on given resource.
+   * Releases the user lock on given resource.
*/
-  public void releaseBucketLock(String volume, String bucket) {
-manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).decrementAndGet();
+  public void releaseUserLock(String user) {
+manager.unlock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).decrementAndGet();
   }
 
   /**
-   * Returns true if the current thread holds any volume lock.
-   * @return true if current thread holds volume lock, else false
+   * Acquire user lock on 2 users. In this case, we compare 2 strings
+   * lexicographically, and acquire the locks according to the sorted order of
+   * the user names. In this way, when acquiring locks on multiple user's, we
+   * can avoid dead locks. This 

[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264025=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264025
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:02
Start Date: 20/Jun/19 19:02
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295952709
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -258,12 +348,61 @@ public void acquirePrefixLock(String prefixPath) {
 myLocks.get().get(PREFIX_LOCK).incrementAndGet();
   }
 
-  private boolean hasAnyPrefixLock() {
-return myLocks.get().get(PREFIX_LOCK).get() != 0;
-  }
-
+  /**
+   * Releases the prefix lock on given resource.
+   */
   public void releasePrefixLock(String prefixPath) {
 manager.unlock(prefixPath);
 myLocks.get().get(PREFIX_LOCK).decrementAndGet();
   }
+
+  /**
+   * Returns true if the current thread holds any volume lock.
+   * @return true if current thread holds volume lock, else false
+   */
+  private boolean hasAnyVolumeLock() {
+return myLocks.get().get(VOLUME_LOCK).get() != 0;
+  }
+
+  /**
+   * Returns true if the current thread holds any bucket lock.
+   * @return true if current thread holds bucket lock, else false
+   */
+  private boolean hasAnyBucketLock() {
+return myLocks.get().get(BUCKET_LOCK).get() != 0;
+  }
+
+  /**
+   * Returns true if the current thread holds any s3 bucket lock.
+   * @return true if current thread holds s3 bucket lock, else false
+   */
+  private boolean hasAnyS3BucketLock() {
+return myLocks.get().get(S3_BUCKET_LOCK).get() != 0;
+  }
+
+  /**
+   * Returns true if the current thread holds any user lock.
+   * @return true if current thread holds user lock, else false
+   */
+  private boolean hasAnyUserLock() {
+return myLocks.get().get(USER_LOCK).get() != 0;
 
 Review comment:
   Yes, I added a call to hasAnyUserLock() in acquireUserLock() so that if someone 
is trying to acquire multiple user locks, they will immediately fail with a 
RuntimeException. As said in the code comments in acquireUserLock(), this is 
protection logic to prevent users from doing that.
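
A simplified sketch of that guard, plus the multi-user variant acquired in a fixed 
(lexicographic) order to avoid deadlock. This is illustrative only and not the actual 
OzoneManagerLock code; the class and field names are made up.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch (not the real OzoneManagerLock): one lock per user name,
// plus a per-thread counter used as the "already holding a user lock" guard.
public class UserLockSketch {
  private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
  private final ThreadLocal<Integer> held = ThreadLocal.withInitial(() -> 0);

  private ReentrantLock lockFor(String user) {
    return locks.computeIfAbsent(user, u -> new ReentrantLock());
  }

  public void acquireUserLock(String user) {
    if (held.get() != 0) {
      // Protection logic: a second single-user acquire must go through
      // acquireMultiUserLock instead.
      throw new RuntimeException("Use acquireMultiUserLock for multiple users");
    }
    lockFor(user).lock();
    held.set(held.get() + 1);
  }

  public void acquireMultiUserLock(String first, String second) {
    // Always acquire in lexicographic order so two threads locking the same
    // pair of users cannot deadlock against each other.
    String low = first.compareTo(second) <= 0 ? first : second;
    String high = first.compareTo(second) <= 0 ? second : first;
    lockFor(low).lock();
    lockFor(high).lock();
    held.set(held.get() + 2);
  }
}
{code}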
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264025)
Time Spent: 6h 10m  (was: 6h)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume 
> requests we can solve the issue of acquire/release/reacquire problem. And few 
> bugs in the current implementation of S3Bucket/Volume operations.
>  
> Currently after acquiring volume lock, we cannot acquire user lock. 
> This is causing an issue in Volume request implementation, 
> acquire/release/reacquire volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get Volume Info from DB
>  # Release Volume lock. (We are releasing the lock, because while acquiring 
> volume lock, we cannot acquire the user lock.)
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Acquire volume lock
>  # Do delete logic
>  # release volume lock
>  # release user lock
>  
> We can avoid this acquire/release/reacquire lock issue by making volume lock 
> as low weight. 
>  
> In this way, the above deleteVolume request will change as below
>  # Acquire volume lock
>  # Get Volume Info from DB
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Do delete logic
>  # release owner lock
>  # release volume lock. 
> Same issue is seen with SetOwner for Volume request also.
> During HDDS-1620 [~arp] brought up this issue. 
> I am proposing the above solution to solve this issue. Any other 
> idea/suggestions are welcome.
> This also resolves a bug in setOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold

2019-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868853#comment-16868853
 ] 

Wei-Chiu Chuang commented on HDFS-14587:


Shouldn't it simply get a read timeout? I think we added a client-side read 
timeout not too long ago.
The only possibility I can imagine is the client getting into a full GC. But even 
then, 9 hours seems like a stretch.

> Support fail fast when client wait ACK by pipeline over threshold
> -
>
> Key: HDFS-14587
> URL: https://issues.apache.org/jira/browse/HDFS-14587
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>
> Recently I hit a corner case where the client waited for data to be acknowledged by 
> the pipeline for over 9 hours. After checking branch trunk, I think this issue still 
> exists. So I propose to add a threshold on the wait time and then fail fast.
> {code:java}
> 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: 
> Slow waitForAckedSeqno took 35560718ms (threshold=3ms)
> {code}
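As a rough illustration of the proposed fail-fast (the names and structure below are 
illustrative, not the actual DFSOutputStream/DataStreamer code), the wait for the acked 
sequence number could be bounded by a deadline instead of blocking indefinitely:

{code:java}
import java.io.IOException;

// Illustrative sketch only, not the actual DFSOutputStream/DataStreamer code.
class AckWaiter {
  private final Object ackLock = new Object();
  private long lastAckedSeqno = -1;

  // Called by the ack-receiving side when packets up to seqno are acknowledged.
  void ackUpTo(long seqno) {
    synchronized (ackLock) {
      lastAckedSeqno = seqno;
      ackLock.notifyAll();
    }
  }

  // Waits for the given seqno to be acked, failing fast once timeoutMs elapses.
  void waitForAckedSeqno(long seqno, long timeoutMs) throws IOException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (ackLock) {
      while (lastAckedSeqno < seqno) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          // Fail fast instead of blocking for hours on a dead pipeline.
          throw new IOException("No pipeline ack for seqno " + seqno
              + " within " + timeoutMs + " ms");
        }
        try {
          ackLock.wait(remaining);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted waiting for ack", ie);
        }
      }
    }
  }
}
{code}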



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264024=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264024
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 19:00
Start Date: 20/Jun/19 19:00
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295951701
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) {
   }
 
   /**
-   * Acquires S3 Bucket lock on the given resource.
+   * Acquires bucket lock on the given resource.
*
* If the lock is not available then the current thread becomes
-   * disabled for thread scheduling purposes and lies dormant until the lock 
has
-   * been acquired.
+   * disabled for thread scheduling purposes and lies dormant until the
+   * lock has been acquired.
*
-   * @param s3BucketName S3Bucket Name on which the lock has to be acquired
+   * @param bucket Bucket on which the lock has to be acquired
*/
-  public void acquireS3Lock(String s3BucketName) {
-// Calling thread should not hold any bucket lock.
-// You can take an Volume while holding S3 bucket lock, since
-// semantically an S3 bucket maps to the ozone volume. So we check here
-// only if ozone bucket lock is taken.
-if (hasAnyBucketLock()) {
+  public void acquireBucketLock(String volume, String bucket) {
+if (hasAnyUserLock()) {
   throw new RuntimeException(
   "Thread '" + Thread.currentThread().getName() +
-  "' cannot acquire S3 bucket lock while holding Ozone bucket " +
-  "lock(s).");
+  "' cannot acquire bucket lock while holding User lock.");
 }
-manager.lock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet();
+manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
 
 Review comment:
   Yes, we prefix volumeName with /(OM_KEY_PREFIX) and bucketName with 
/(OM_KEY_PREFIX)
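
A tiny illustrative helper (the class name is made up) showing the lock key this 
produces, matching the OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket expression in 
the diff above, e.g. volume "vol1" and bucket "buck1" lock on "/vol1/buck1":

{code:java}
// Sketch of how the bucket lock key is composed: both names are prefixed
// with "/" (OM_KEY_PREFIX).
final class BucketLockKey {
  static final String OM_KEY_PREFIX = "/";

  static String of(String volume, String bucket) {
    return OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket;
  }
}
{code}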
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264024)
Time Spent: 6h  (was: 5h 50m)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume 
> requests we can solve the issue of acquire/release/reacquire problem. And few 
> bugs in the current implementation of S3Bucket/Volume operations.
>  
> Currently after acquiring volume lock, we cannot acquire user lock. 
> This is causing an issue in Volume request implementation, 
> acquire/release/reacquire volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get Volume Info from DB
>  # Release Volume lock. (We are releasing the lock, because while acquiring 
> volume lock, we cannot acquire the user lock.)
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Acquire volume lock
>  # Do delete logic
>  # release volume lock
>  # release user lock
>  
> We can avoid this acquire/release/reacquire lock issue by making volume lock 
> as low weight. 
>  
> In this way, the above deleteVolume request will change as below
>  # Acquire volume lock
>  # Get Volume Info from DB
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Do delete logic
>  # release owner lock
>  # release volume lock. 
> Same issue is seen with SetOwner for Volume request also.
> During HDDS-1620 [~arp] brought up this issue. 
> I am proposing the above solution to solve this issue. Any other 
> idea/suggestions are welcome.
> This also resolves a bug in setOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264021
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 18:51
Start Date: 20/Jun/19 18:51
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295948310
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/OMVolumeDeleteRequest.java
 ##
 @@ -104,37 +104,23 @@ public OMClientResponse 
validateAndUpdateCache(OzoneManager ozoneManager,
 
 OmVolumeArgs omVolumeArgs = null;
 String owner = null;
-
+IOException exception = null;
+OzoneManagerProtocolProtos.VolumeList newVolumeList = null;
 omMetadataManager.getLock().acquireVolumeLock(volume);
 try {
   owner = getVolumeInfo(omMetadataManager, volume).getOwnerName();
-} catch (IOException ex) {
-  LOG.error("Volume deletion failed for volume:{}", volume, ex);
-  omMetrics.incNumVolumeDeleteFails();
-  auditLog(auditLogger, buildAuditMessage(OMAction.DELETE_VOLUME,
-  buildVolumeAuditMap(volume), ex, userInfo));
-  return new OMVolumeDeleteResponse(null, null, null,
-  createErrorOMResponse(omResponse, ex));
-} finally {
-  omMetadataManager.getLock().releaseVolumeLock(volume);
-}
 
-// Release and reacquire lock for now it will not be a problem for now, as
-// applyTransaction serializes the operation's.
-// TODO: Revisit this logic once HDDS-1672 checks in.
+  // Release and reacquire lock for now it will not be a problem for now, 
as
+  // applyTransaction serializes the operation's.
 
-// We cannot acquire user lock holding volume lock, so released volume
-// lock, and acquiring user and volume lock.
+  // We cannot acquire user lock holding volume lock, so released volume
+  // lock, and acquiring user and volume lock.
 
-omMetadataManager.getLock().acquireUserLock(owner);
-omMetadataManager.getLock().acquireVolumeLock(volume);
+  omMetadataManager.getLock().acquireUserLock(owner);
 
 Review comment:
   That is why we check owner != null in the finally block, and only release the 
lock there when it is non-null.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264021)
Time Spent: 5h 50m  (was: 5h 40m)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume 
> requests we can solve the issue of acquire/release/reacquire problem. And few 
> bugs in the current implementation of S3Bucket/Volume operations.
>  
> Currently after acquiring volume lock, we cannot acquire user lock. 
> This is causing an issue in Volume request implementation, 
> acquire/release/reacquire volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get Volume Info from DB
>  # Release Volume lock. (We are releasing the lock, because while acquiring 
> volume lock, we cannot acquire the user lock.)
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Acquire volume lock
>  # Do delete logic
>  # release volume lock
>  # release user lock
>  
> We can avoid this acquire/release/reacquire lock issue by making volume lock 
> as low weight. 
>  
> In this way, the above deleteVolume request will change as below
>  # Acquire volume lock
>  # Get Volume Info from DB
>  # Get owner from volume Info read from DB
>  # Acquire owner lock
>  # Do delete logic
>  # release owner lock
>  # release volume lock. 
> Same issue is seen with SetOwner for Volume request also.
> During HDDS-1620 [~arp] brought up this issue. 
> I am proposing the above solution to solve this issue. Any other 
> idea/suggestions are welcome.
> This also resolves a bug in setOwner for Volume request.



--
This message was sent by Atlassian JIRA

[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264020
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 18:49
Start Date: 20/Jun/19 18:49
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295947682
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/S3BucketManagerImpl.java
 ##
 @@ -101,34 +101,26 @@ public void createS3Bucket(String userName, String 
bucketName)
 // anonymous access to bucket where the user name is absent.
 String ozoneVolumeName = formatOzoneVolumeName(userName);
 
-omMetadataManager.getLock().acquireS3Lock(bucketName);
-try {
-  String bucket =
-  omMetadataManager.getS3Table().get(bucketName);
-
-  if (bucket != null) {
-LOG.debug("Bucket already exists. {}", bucketName);
-throw new OMException(
-"Unable to create S3 bucket. " + bucketName + " already exists.",
-OMException.ResultCodes.S3_BUCKET_ALREADY_EXISTS);
-  }
-  String ozoneBucketName = bucketName;
-  createOzoneBucket(ozoneVolumeName, ozoneBucketName);
-  String finalName = String.format("%s/%s", ozoneVolumeName,
-  ozoneBucketName);
+String bucket = omMetadataManager.getS3Table().get(bucketName);
 
-  omMetadataManager.getS3Table().put(bucketName, finalName);
-} finally {
-  omMetadataManager.getLock().releaseS3Lock(bucketName);
 
 Review comment:
   Do you mean s3 bucket lock, as we need to acquire that before creating 
volume. So, that is acquired in caller in OzoneManager.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264020)
Time Spent: 5h 40m  (was: 5.5h)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, volume 
> requests avoid the acquire/release/reacquire problem, and a few bugs in the 
> current implementation of the S3Bucket/Volume operations are also fixed.
>  
> Currently, after acquiring the volume lock, we cannot acquire the user lock. 
> This forces the Volume request implementation into an 
> acquire/release/reacquire pattern on the volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Release volume lock. (We release the lock because, while holding the 
> volume lock, we cannot acquire the user lock.)
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Acquire volume lock.
>  # Do delete logic.
>  # Release volume lock.
>  # Release user lock.
>  
> We can avoid this acquire/release/reacquire issue by making the volume lock a 
> low-weight lock in the lock ordering.
>  
> In this way, the above deleteVolume request changes as below:
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Do delete logic.
>  # Release owner lock.
>  # Release volume lock.
> The same issue is seen with the SetOwner for Volume request; [~arp] brought 
> up this issue during HDDS-1620.
> I am proposing the above solution; any other ideas/suggestions are welcome.
> This also resolves a bug in the SetOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264019
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 18:47
Start Date: 20/Jun/19 18:47
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295946812
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -2564,6 +2564,9 @@ public void createS3Bucket(String userName, String 
s3BucketName)
   }
   metrics.incNumBucketCreates();
   try {
+metadataManager.getLock().acquireS3BucketLock(s3BucketName);
+metadataManager.getLock().acquireVolumeLock(
 
 Review comment:
   On a side note: once we use the new HA code this will be cleaned up, so I 
have not done much refactoring here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264019)
Time Spent: 5.5h  (was: 5h 20m)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, volume 
> requests avoid the acquire/release/reacquire problem, and a few bugs in the 
> current implementation of the S3Bucket/Volume operations are also fixed.
>  
> Currently, after acquiring the volume lock, we cannot acquire the user lock. 
> This forces the Volume request implementation into an 
> acquire/release/reacquire pattern on the volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Release volume lock. (We release the lock because, while holding the 
> volume lock, we cannot acquire the user lock.)
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Acquire volume lock.
>  # Do delete logic.
>  # Release volume lock.
>  # Release user lock.
>  
> We can avoid this acquire/release/reacquire issue by making the volume lock a 
> low-weight lock in the lock ordering.
>  
> In this way, the above deleteVolume request changes as below:
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Do delete logic.
>  # Release owner lock.
>  # Release volume lock.
> The same issue is seen with the SetOwner for Volume request; [~arp] brought 
> up this issue during HDDS-1620.
> I am proposing the above solution; any other ideas/suggestions are welcome.
> This also resolves a bug in the SetOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264017=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264017
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 18:46
Start Date: 20/Jun/19 18:46
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #949: 
HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295946327
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -2564,6 +2564,9 @@ public void createS3Bucket(String userName, String 
s3BucketName)
   }
   metrics.incNumBucketCreates();
   try {
+metadataManager.getLock().acquireS3BucketLock(s3BucketName);
+metadataManager.getLock().acquireVolumeLock(
 
 Review comment:
   I see that the only failure case is a RuntimeException. So, do we still 
need the flags?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264017)
Time Spent: 5h 20m  (was: 5h 10m)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, volume 
> requests avoid the acquire/release/reacquire problem, and a few bugs in the 
> current implementation of the S3Bucket/Volume operations are also fixed.
>  
> Currently, after acquiring the volume lock, we cannot acquire the user lock. 
> This forces the Volume request implementation into an 
> acquire/release/reacquire pattern on the volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Release volume lock. (We release the lock because, while holding the 
> volume lock, we cannot acquire the user lock.)
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Acquire volume lock.
>  # Do delete logic.
>  # Release volume lock.
>  # Release user lock.
>  
> We can avoid this acquire/release/reacquire issue by making the volume lock a 
> low-weight lock in the lock ordering.
>  
> In this way, the above deleteVolume request changes as below:
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Do delete logic.
>  # Release owner lock.
>  # Release volume lock.
> The same issue is seen with the SetOwner for Volume request; [~arp] brought 
> up this issue during HDDS-1620.
> I am proposing the above solution; any other ideas/suggestions are welcome.
> This also resolves a bug in the SetOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264012
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 18:37
Start Date: 20/Jun/19 18:37
Worklog Time Spent: 10m 
  Work Description: arp7 commented on pull request #949: HDDS-1672. Improve 
locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295942564
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/S3BucketManagerImpl.java
 ##
 @@ -101,34 +101,26 @@ public void createS3Bucket(String userName, String 
bucketName)
 // anonymous access to bucket where the user name is absent.
 String ozoneVolumeName = formatOzoneVolumeName(userName);
 
-omMetadataManager.getLock().acquireS3Lock(bucketName);
-try {
-  String bucket =
-  omMetadataManager.getS3Table().get(bucketName);
-
-  if (bucket != null) {
-LOG.debug("Bucket already exists. {}", bucketName);
-throw new OMException(
-"Unable to create S3 bucket. " + bucketName + " already exists.",
-OMException.ResultCodes.S3_BUCKET_ALREADY_EXISTS);
-  }
-  String ozoneBucketName = bucketName;
-  createOzoneBucket(ozoneVolumeName, ozoneBucketName);
-  String finalName = String.format("%s/%s", ozoneVolumeName,
-  ozoneBucketName);
+String bucket = omMetadataManager.getS3Table().get(bucketName);
 
-  omMetadataManager.getS3Table().put(bucketName, finalName);
-} finally {
-  omMetadataManager.getLock().releaseS3Lock(bucketName);
 
 Review comment:
   Sorry I didn't get why we removed the acquire/release bucket lock. Is the 
caller now supposed to get the lock?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 264012)
Time Spent: 5h 10m  (was: 5h)

> Improve locking in OzoneManager
> ---
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, volume 
> requests avoid the acquire/release/reacquire problem, and a few bugs in the 
> current implementation of the S3Bucket/Volume operations are also fixed.
>  
> Currently, after acquiring the volume lock, we cannot acquire the user lock. 
> This forces the Volume request implementation into an 
> acquire/release/reacquire pattern on the volume lock.
>  
> Case of Delete Volume Request: 
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Release volume lock. (We release the lock because, while holding the 
> volume lock, we cannot acquire the user lock.)
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Acquire volume lock.
>  # Do delete logic.
>  # Release volume lock.
>  # Release user lock.
>  
> We can avoid this acquire/release/reacquire issue by making the volume lock a 
> low-weight lock in the lock ordering.
>  
> In this way, the above deleteVolume request changes as below:
>  # Acquire volume lock.
>  # Get volume info from DB.
>  # Get the owner from the volume info read from DB.
>  # Acquire owner lock.
>  # Do delete logic.
>  # Release owner lock.
>  # Release volume lock.
> The same issue is seen with the SetOwner for Volume request; [~arp] brought 
> up this issue during HDDS-1620.
> I am proposing the above solution; any other ideas/suggestions are welcome.
> This also resolves a bug in the SetOwner for Volume request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1713) ReplicationManager fails to find proper node topology based on Datanode details from heartbeat

2019-06-20 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDDS-1713:


 Summary: ReplicationManager fails to find proper node topology 
based on Datanode details from heartbeat
 Key: HDDS-1713
 URL: https://issues.apache.org/jira/browse/HDDS-1713
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


The DN does not include topology info in the heartbeat message that carries 
its container report/pipeline report.

SCM is where the topology information is available. While processing a 
heartbeat, we should not rely on the DatanodeDetails from the report to choose 
datanodes for closing a container. Otherwise, the datanode locations of all 
existing container replicas fall back to /default-rack.

The fix is to retrieve the corresponding datanode locations from the SCM 
NodeManager, which has the authoritative network topology information. 
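
A minimal sketch of that idea follows. It uses simplified placeholder types 
rather than the real HDDS NodeManager/DatanodeDetails API; the getNodeByUuid 
lookup and the field names are assumptions made for illustration only.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch only: the report-supplied datanode details lack topology, so
     * placement/replication decisions should prefer SCM's registered view.
     */
    class ReplicaLocationSketch {

      static class DatanodeDetails {
        final String uuid;
        final String networkLocation;   // e.g. "/rack1", or "/default-rack" if unknown
        DatanodeDetails(String uuid, String networkLocation) {
          this.uuid = uuid;
          this.networkLocation = networkLocation;
        }
      }

      static class NodeManager {
        private final Map<String, DatanodeDetails> registered = new HashMap<>();
        void register(DatanodeDetails dn) { registered.put(dn.uuid, dn); }
        DatanodeDetails getNodeByUuid(String uuid) { return registered.get(uuid); }
      }

      /** Prefer the authoritative, topology-aware record held by SCM. */
      static DatanodeDetails resolve(NodeManager nodeManager, DatanodeDetails fromReport) {
        DatanodeDetails authoritative = nodeManager.getNodeByUuid(fromReport.uuid);
        return authoritative != null ? authoritative : fromReport;
      }
    }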



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test

2019-06-20 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868831#comment-16868831
 ] 

Eric Yang commented on HDDS-1554:
-

Patch 005 fixes the hard-coded uid:gid issues and uses a read-only mount for 
/data. Disk tests will supply the -u flag to ensure the mount location does 
not create a filesystem uid/gid inconsistency problem. Other smoke tests are 
also recommended to use the -u flag to prevent containers from writing data 
under another user's uid/gid to the host filesystem; HDDS-1609 may be a good 
place to start applying the -u flag to tests outside of the fault-injection 
tests.

> Create disk tests for fault injection test
> --
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - Read/Only test
>  ## Repeat Scenario 1
>  ## Mount data disk as read only
>  ## Try to write data to Ozone cluster
>  ## Validate error message is correct
>  ## Shutdown cluster
>  # Scenario 3 - Corruption test
>  ## Repeat Scenario 2
>  ## Shutdown cluster
>  ## Modify data disk data
>  ## Restart cluster
>  ## Validate error message for read from corrupted data
>  ## Validate error message for write to corrupted volume



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager

2019-06-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264007
 ]

ASF GitHub Bot logged work on HDDS-1672:


Author: ASF GitHub Bot
Created on: 20/Jun/19 18:27
Start Date: 20/Jun/19 18:27
Worklog Time Spent: 10m 
  Work Description: arp7 commented on pull request #949: HDDS-1672. Improve 
locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295930273
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
 ##
 @@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) {
   }
 
   /**
-   * Acquires S3 Bucket lock on the given resource.
+   * Acquires bucket lock on the given resource.
*
* If the lock is not available then the current thread becomes
-   * disabled for thread scheduling purposes and lies dormant until the lock 
has
-   * been acquired.
+   * disabled for thread scheduling purposes and lies dormant until the
+   * lock has been acquired.
*
-   * @param s3BucketName S3Bucket Name on which the lock has to be acquired
+   * @param bucket Bucket on which the lock has to be acquired
*/
-  public void acquireS3Lock(String s3BucketName) {
-// Calling thread should not hold any bucket lock.
-// You can take an Volume while holding S3 bucket lock, since
-// semantically an S3 bucket maps to the ozone volume. So we check here
-// only if ozone bucket lock is taken.
-if (hasAnyBucketLock()) {
+  public void acquireBucketLock(String volume, String bucket) {
+if (hasAnyUserLock()) {
   throw new RuntimeException(
   "Thread '" + Thread.currentThread().getName() +
-  "' cannot acquire S3 bucket lock while holding Ozone bucket " +
-  "lock(s).");
+  "' cannot acquire bucket lock while holding User lock.");
 }
-manager.lock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet();
+manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).incrementAndGet();
   }
 
   /**
-   * Releases the volume lock on given resource.
+   * Releases the bucket lock on given resource.
*/
-  public void releaseS3Lock(String s3BucketName) {
-manager.unlock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).decrementAndGet();
+  public void releaseBucketLock(String volume, String bucket) {
+manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).decrementAndGet();
   }
 
   /**
-   * Acquires bucket lock on the given resource.
+   * Acquires user lock on the given resource.
*
* If the lock is not available then the current thread becomes
* disabled for thread scheduling purposes and lies dormant until the
* lock has been acquired.
*
-   * @param bucket Bucket on which the lock has to be acquired
+   * @param user User on which the lock has to be acquired
*/
-  public void acquireBucketLock(String volume, String bucket) {
-manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).incrementAndGet();
+  public void acquireUserLock(String user) {
+// To avoid maintaining the set of usernames on which we have acquired
+// locks, we just check whether a user lock has already been acquired. If a
+// caller wants to acquire user locks on multiple users, they should use
+// acquireMultiUserLock. This is protection logic so that this method is not
+// used when locking multiple users; currently, the only use case for this is
+// the setOwner operation in VolumeManager.
+if (hasAnyUserLock()) {
+  LOG.error("Already have userLock");
+  throw new RuntimeException("For acquiring lock on multiple users, use " +
+  "acquireMultiLock method");
+}
+manager.lock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).incrementAndGet();
   }
 
   /**
-   * Releases the bucket lock on given resource.
+   * Releases the user lock on given resource.
*/
-  public void releaseBucketLock(String volume, String bucket) {
-manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).decrementAndGet();
+  public void releaseUserLock(String user) {
+manager.unlock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).decrementAndGet();
   }
 
   /**
-   * Returns true if the current thread holds any volume lock.
-   * @return true if current thread holds volume lock, else false
+   * Acquire user locks on 2 users. In this case, we compare the 2 strings
+   * lexicographically and acquire the locks in the sorted order of the
+   * user names. In this way, when acquiring locks on multiple users, we
+   * can avoid deadlocks. This method 

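The quoted javadoc is cut off above, but the lexicographic-order idea it 
describes can be sketched as follows. This uses ReentrantLocks keyed by user 
name as a simplified stand-in for OzoneManagerLock's multi-user locking; the 
method and parameter names are assumptions, not the actual implementation.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    /**
     * Sketch only: acquire locks on two users in lexicographic (sorted)
     * order so that two threads locking the same pair of users always take
     * the locks in the same order and therefore cannot deadlock.
     */
    class MultiUserLockSketch {
      private final Map<String, ReentrantLock> userLocks = new ConcurrentHashMap<>();

      private ReentrantLock lockFor(String user) {
        return userLocks.computeIfAbsent(user, k -> new ReentrantLock());
      }

      void acquireMultiUserLock(String oldOwner, String newOwner) {
        // Sort the two names so every caller uses the same acquisition order.
        String first = oldOwner.compareTo(newOwner) <= 0 ? oldOwner : newOwner;
        String second = oldOwner.compareTo(newOwner) <= 0 ? newOwner : oldOwner;
        lockFor(first).lock();
        if (!first.equals(second)) {      // same user twice: one lock is enough
          lockFor(second).lock();
        }
      }

      void releaseMultiUserLock(String oldOwner, String newOwner) {
        // Release in the reverse of the acquisition order.
        String first = oldOwner.compareTo(newOwner) <= 0 ? oldOwner : newOwner;
        String second = oldOwner.compareTo(newOwner) <= 0 ? newOwner : oldOwner;
        if (!first.equals(second)) {
          lockFor(second).unlock();
        }
        lockFor(first).unlock();
      }
    }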