[jira] [Commented] (HDFS-14483) Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869202#comment-16869202 ] Hadoop QA commented on HDFS-14483: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} branch-2.8 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 39s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 4s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 26s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 17s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 41s{color} | {color:green} branch-2.8 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-hdfs-project/hadoop-hdfs-native-client {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 26s{color} | {color:red} 
hadoop-common-project/hadoop-common in branch-2.8 has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 25s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in branch-2.8 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 41s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 25s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 42s{color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 1m 42s{color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 42s{color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 30s{color} | {color:red} root in the patch failed with JDK v1.8.0_212. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 1m 30s{color} | {color:red} root in the patch failed with JDK v1.8.0_212. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 30s{color} | {color:red} root in the patch failed with JDK v1.8.0_212. 
{color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} root: The patch generated 0 new + 116 unchanged - 1 fixed = 116 total (was 117) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 27s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 6 line(s) with tabs. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-hdfs-project/hadoop-hdfs-native-client {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 17s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s{color} | {color:green}
[jira] [Updated] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-14135: Attachment: HDFS-14135.013.patch > TestWebHdfsTimeouts Fails intermittently in trunk > - > > Key: HDFS-14135 > URL: https://issues.apache.org/jira/browse/HDFS-14135 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, > HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, > HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, > HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, > HDFS-14135.012.patch, HDFS-14135.013.patch > > > Reference to failure > https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14573) Backport Standby Read to branch-3
[ https://issues.apache.org/jira/browse/HDFS-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869194#comment-16869194 ] Hadoop QA commented on HDFS-14573: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 27 new or modified test files. {color} | || || || || {color:brown} branch-3.2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 5m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 39s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 10s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 51s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 15s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 20m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-hdfs-project/hadoop-hdfs-native-client {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 57s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 3s{color} | {color:green} branch-3.2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 14m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 14m 19s{color} | {color:red} root generated 170 new + 1158 unchanged - 170 fixed = 1328 total (was 1328) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 48s{color} | {color:orange} root: The patch generated 27 new + 2571 unchanged - 11 fixed = 2598 total (was 2582) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-hdfs-project/hadoop-hdfs-native-client {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 52s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 3s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 36s{color} | {color:red} hadoop-hdfs-native-client in the patch failed. {color} | | {color:green}+1{color} |
[jira] [Assigned] (HDDS-1691) RDBTable#isExist should use Rocksdb#keyMayExist
[ https://issues.apache.org/jira/browse/HDDS-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle reassigned HDDS-1691: - Assignee: Aravindan Vijayan (was: Nanda kumar) > RDBTable#isExist should use Rocksdb#keyMayExist > --- > > Key: HDDS-1691 > URL: https://issues.apache.org/jira/browse/HDDS-1691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > > RDBTable#isExist can use Rocksdb#keyMayExist, which avoids the cost of reading > the value for the key. > Please refer, > https://github.com/facebook/rocksdb/blob/7a8d7358bb40b13a06c2c6adc62e80295d89ed05/java/src/main/java/org/rocksdb/RocksDB.java#L2184
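The suggestion above rests on the keyMayExist contract: a "no" answer is definitive and skips the value read entirely, while a "yes" answer only means the key *might* exist and must be confirmed. The real change would call the RocksDB Java API (org.rocksdb.RocksDB#keyMayExist); the following is a stdlib-only toy model of that contract — the class and method names are illustrative, not the Ozone or RocksDB API:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the keyMayExist() contract (stdlib only, illustrative names):
 *   mayExist == false  =>  key is definitely absent (fast path, no value read)
 *   mayExist == true   =>  key might be present; a real lookup must confirm.
 */
public class MayExistTable {
  private final Map<String, byte[]> store = new HashMap<>(); // the full table
  private final BitSet filter = new BitSet(1024);            // cheap may-exist filter

  private int bucket(String key) {
    return Math.floorMod(key.hashCode(), 1024);
  }

  public void put(String key, byte[] value) {
    filter.set(bucket(key)); // bits are only ever set -> no false negatives
    store.put(key, value);
  }

  /** Existence check that avoids reading the value when the filter says "no". */
  public boolean isExist(String key) {
    if (!filter.get(bucket(key))) {
      return false;                  // definitely absent: value never read
    }
    return store.containsKey(key);   // "maybe": confirm with a real lookup
  }
}
```

Because the filter can return false positives but never false negatives, isExist stays correct while the common negative case becomes cheap — which is exactly the saving the issue describes.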
[jira] [Updated] (HDFS-14528) [SBN Read]Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Description: *Started an HA Cluster with three nodes [ _Active ,Standby ,Observer_ ]* *When trying to execute the failover command from active to standby* *_./hdfs haadmin -failover nn1 nn2, below Exception is thrown_* Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused This is encountered in two cases : When any other standby namenode is down or when any other zkfc is down was: *Started an HA Cluster with three nodes [ _Active ,Standby ,Observer_ ]* *When trying to execute the failover command from active to standby* *_./hdfs haadmin -failover nn1 nn2, below Exception is thrown_* Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: [http://wiki.apache.org/hadoop/ConnectionRefused] at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) > [SBN Read]Failover from Active to Standby Failed > -- > > Key: HDFS-14528 > URL: https://issues.apache.org/jira/browse/HDFS-14528 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: ZKFC_issue.patch > > > *Started an HA Cluster with three nodes [ _Active ,Standby ,Observer_ ]* > *When trying to execute the failover command from active to standby* > *_./hdfs haadmin -failover nn1 nn2, below Exception is thrown_* > Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed 
on > connection exception: java.net.ConnectException: Connection refused > This is encountered in two cases : When any other standby namenode is down or > when any other zkfc is down
[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869187#comment-16869187 ] Wei-Chiu Chuang commented on HDFS-14585: Filed HADOOP-16386 for the findbugs warning. > Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.8.v1.patch > >
[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869186#comment-16869186 ] Wei-Chiu Chuang commented on HDFS-14585: Hi [~leosun08] We typically backport from the newer branches to older branches. The branch-2.8 is the oldest active branch. Would you please rebase on branch-2? The asflicense warning is a false alarm. There are minor checkstyle warnings that can be cleaned up easily. The findbugs warning looks fishy. Looking back in my mailbox, the same warning came out a few times for branch-2. So it is probably unrelated, but we should file a jira for that. I suggest updating the jira summary, as this is essentially a reimplementation of HDFS-8901 for replicated blocks. Hadoop 2 does not support EC, and so there's no need to support striped blocks. How about "Reimplement HDFS-8901 for replicated block positional read in branch-2"? Do we need any test? Or is this purely a performance improvement and existing tests cover it all? > Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.8.v1.patch > >
[jira] [Commented] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold
[ https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869184#comment-16869184 ] He Xiaoqiao commented on HDFS-14587: Thanks [~jojochuang] for the helpful information, I will check if this JIRA could solve this issue. Thanks again. > Support fail fast when client wait ACK by pipeline over threshold > - > > Key: HDFS-14587 > URL: https://issues.apache.org/jira/browse/HDFS-14587 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > > Recently, I met a corner case where the client waited for data to be acknowledged by > the pipeline for over 9 hours. After checking branch trunk, I think this issue still > exists. So I propose to add a threshold for the wait timeout and then fail fast. > {code:java} > 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: > Slow waitForAckedSeqno took 35560718ms (threshold=3ms) > {code}
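The proposal above — bound the time a writer waits for pipeline acks instead of blocking indefinitely — can be sketched as a deadline-driven wait. This is a self-contained, stdlib-only sketch; the class and method names are illustrative and are not the actual DFSClient API:

```java
import java.io.IOException;

/**
 * Sketch of fail-fast ack waiting (illustrative names, not DFSClient):
 * instead of waiting forever for the pipeline to ack a sequence number,
 * give up and throw after a configurable threshold.
 */
public class AckWaiter {
  private final Object ackLock = new Object();
  private long lastAckedSeqno = -1;

  /** Called by the response-processing thread when an ack arrives. */
  public void acked(long seqno) {
    synchronized (ackLock) {
      lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
      ackLock.notifyAll();
    }
  }

  /** Waits until seqno is acked, or fails fast after timeoutMs. */
  public void waitForAckedSeqno(long seqno, long timeoutMs) throws IOException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (ackLock) {
      while (lastAckedSeqno < seqno) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          // fail fast instead of hanging for hours, as the issue describes
          throw new IOException("Ack for seqno " + seqno
              + " not received within " + timeoutMs + " ms");
        }
        try {
          ackLock.wait(remaining); // re-check the condition after every wakeup
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting for ack", e);
        }
      }
    }
  }
}
```

The deadline is recomputed on every loop iteration so spurious wakeups cannot extend the total wait beyond the threshold.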
[jira] [Commented] (HDFS-12564) Add the documents of swebhdfs configurations on the client side
[ https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869180#comment-16869180 ] Takanobu Asanuma commented on HDFS-12564: - Thank you very much, [~jojochuang]! > Add the documents of swebhdfs configurations on the client side > --- > > Key: HDFS-12564 > URL: https://issues.apache.org/jira/browse/HDFS-12564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch, > HDFS-12564.3.patch, HDFS-12564.4.patch > > > Documentation does not cover the swebhdfs configurations on the client side. > We can reuse the hftp/hsftp documents which were removed from Hadoop-3.0 in > HDFS-5570, HDFS-9640.
[jira] [Commented] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.
[ https://issues.apache.org/jira/browse/HDFS-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869174#comment-16869174 ] Wei-Chiu Chuang commented on HDFS-14591: I think what Jinglun wants is a more advanced version, one that detects the temperature of data and moves files around according to the temperature. That is, something similar to what's proposed in HDFS-7343. Frankly speaking, SPS looks interesting, but it is not built for my customers' use cases. I've been wanting to have SSM but it's not on the top of my list to implement... > NameNode should move the replicas to the correct storages after the storage > policy is changed. > -- > > Key: HDFS-14591 > URL: https://issues.apache.org/jira/browse/HDFS-14591 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a > background process searching all the files to find those that are not accessed > for a period of time. Then we set them to COLD and start a mover to move the > replicas. After moving, all the replicas are consistent with the storage > policy. > It's a natural idea to let the NameNode handle the move.
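The workflow quoted above (scan for files idle beyond some period, mark them COLD, then run a mover) boils down to a selection step plus a policy change. In real HDFS the policy change is made with FileSystem#setStoragePolicy(path, "COLD") and the data movement is done by the Mover tool; the selection step can be sketched stdlib-only, with illustrative names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Stdlib-only sketch of the "temperature" scan described in the issue:
 * pick files whose last access time is older than a threshold, so a
 * follow-up step can set their storage policy to COLD and run the mover.
 * (Illustrative names; the real policy change is
 * FileSystem#setStoragePolicy(path, "COLD").)
 */
public class ColdFileSelector {
  /**
   * @param lastAccess map of file path to last access time (epoch millis)
   * @param now        current time (epoch millis)
   * @param maxIdleMs  idle period after which a file counts as COLD
   * @return paths that should be transitioned to the COLD policy
   */
  public static List<String> selectCold(Map<String, Long> lastAccess,
                                        long now, long maxIdleMs) {
    List<String> cold = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastAccess.entrySet()) {
      if (now - e.getValue() > maxIdleMs) {
        cold.add(e.getKey()); // idle too long: candidate for COLD + mover
      }
    }
    return cold;
  }
}
```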
[jira] [Updated] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log
[ https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qiang Liu updated HDFS-14303: - Affects Version/s: 3.2.0 > check block directory logic not correct when there is only meta file, print > no meaning warn log > --- > > Key: HDFS-14303 > URL: https://issues.apache.org/jira/browse/HDFS-14303 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.7.3, 3.2.0, 2.9.2, 2.8.5 > Environment: env free >Reporter: qiang Liu >Assignee: qiang Liu >Priority: Minor > Labels: easy-fix > Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-14303-branch-2.005.patch, > HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, > HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, > HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, > HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, > HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, > HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, > HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, > HDFS-14303.branch-3.2.017.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > check block directory logic not correct when there is only meta file, print no > meaning warn log, eg: > WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block > ID-based layout. Actual block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68, > expected block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68
[jira] [Commented] (HDFS-13893) DiskBalancer: no validations for Disk balancer commands
[ https://issues.apache.org/jira/browse/HDFS-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869162#comment-16869162 ] Hudson commented on HDFS-13893: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16800 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16800/]) HDFS-13893. DiskBalancer: no validations for Disk balancer commands. (weichiu: rev 272b96d243383d9f50241d48cb070f638243bc9c) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DiskBalancerCLI.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/diskbalancer/command/TestDiskBalancerCommand.java > DiskBalancer: no validations for Disk balancer commands > > > Key: HDFS-13893 > URL: https://issues.apache.org/jira/browse/HDFS-13893 > Project: Hadoop HDFS > Issue Type: Bug > Components: diskbalancer >Reporter: Harshakiran Reddy >Assignee: Lokesh Jain >Priority: Major > Labels: newbie > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-13893.001.patch, HDFS-13893.002.patch, > HDFS-13893.003.patch > > > {{Scenario:-}} > > 1 Run the Disk Balancer commands with extra arguments passing > {noformat} > hadoopclient> hdfs diskbalancer -plan hostname --thresholdPercentage 2 > *sgfsdgfs* > 2018-08-31 14:57:35,454 INFO planner.GreedyPlanner: Starting plan for Node : > hostname:50077 > 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Disk Volume set > fb67f00c-e333-4f38-a3a6-846a30d4205a Type : DISK plan completed. > 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Compute Plan for Node : > hostname:50077 took 23 ms > 2018-08-31 14:57:35,457 INFO command.Command: Writing plan to: > 2018-08-31 14:57:35,457 INFO command.Command: > /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json > Writing plan to: > /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json > {noformat} > Expected Output:- > = > Disk balancer commands should fail if we pass any invalid arguments or > extra arguments. 
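The expected behaviour above — reject invalid or extra arguments instead of silently ignoring them — is a standard leftover-token check after option parsing. The real DiskBalancerCLI fix works through Apache Commons CLI; this is a stdlib-only sketch of the validation idea with illustrative names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Stdlib-only sketch of strict CLI validation (illustrative names, not the
 * actual DiskBalancerCLI code): after consuming the options a command knows
 * about, any leftover token is an error rather than being ignored.
 */
public class StrictArgs {
  // flags that consume a following value, e.g. "-plan <hostname>"
  private static final Set<String> FLAGS_WITH_VALUE =
      new HashSet<>(Arrays.asList("-plan", "--thresholdPercentage"));

  public static void validate(List<String> args) {
    for (int i = 0; i < args.size(); i++) {
      String tok = args.get(i);
      if (FLAGS_WITH_VALUE.contains(tok)) {
        if (i + 1 >= args.size()) {
          throw new IllegalArgumentException("Missing value for " + tok);
        }
        i++; // skip the flag's value
      } else {
        // anything unrecognized (e.g. the stray "sgfsdgfs") fails the command
        throw new IllegalArgumentException("Unexpected argument: " + tok);
      }
    }
  }
}
```

With Commons CLI the same effect comes from checking that the parsed command line's leftover argument list is empty before executing the command.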
[jira] [Updated] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log
[ https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14303: --- Fix Version/s: 3.1.3 2.9.3 3.2.1 2.8.6 3.3.0 3.0.4 2.10.0 > check block directory logic not correct when there is only meta file, print > no meaning warn log > --- > > Key: HDFS-14303 > URL: https://issues.apache.org/jira/browse/HDFS-14303 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.7.3, 2.9.2, 2.8.5 > Environment: env free >Reporter: qiang Liu >Assignee: qiang Liu >Priority: Minor > Labels: easy-fix > Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-14303-branch-2.005.patch, > HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, > HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, > HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, > HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, > HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, > HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, > HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, > HDFS-14303.branch-3.2.017.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > check block directory logic not correct when there is only meta file, print no > meaning warn log, eg: > WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block > ID-based layout. Actual block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68, > expected block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68
[jira] [Updated] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log
[ https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14303: --- Resolution: Fixed Target Version/s: 2.9.2, 3.2.0 (was: 3.2.0, 2.9.2) Status: Resolved (was: Patch Available) Pushed the patches all the way through to branch-2.8. Thanks [~iamgd67]! > check block directory logic not correct when there is only meta file, print > no meaning warn log > --- > > Key: HDFS-14303 > URL: https://issues.apache.org/jira/browse/HDFS-14303 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.7.3, 2.9.2, 2.8.5 > Environment: env free >Reporter: qiang Liu >Assignee: qiang Liu >Priority: Minor > Labels: easy-fix > Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-14303-branch-2.005.patch, > HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, > HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, > HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, > HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, > HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, > HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, > HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, > HDFS-14303.branch-3.2.017.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > check block directory logic not correct when there is only meta file, print no > meaning warn log, eg: > WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block > ID-based layout. 
Actual block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68, > expected block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68
[jira] [Commented] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log
[ https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869155#comment-16869155 ] qiang Liu commented on HDFS-14303: -- [~jojochuang] patch for 3.2 failed because of a timeout waiting for the minicluster to become active; should it be retriggered? And is [^HDFS-14303-branch-2.017.patch] ready to be pushed to branch-2? > check block directory logic not correct when there is only meta file, print > no meaning warn log > --- > > Key: HDFS-14303 > URL: https://issues.apache.org/jira/browse/HDFS-14303 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.7.3, 2.9.2, 2.8.5 > Environment: env free >Reporter: qiang Liu >Assignee: qiang Liu >Priority: Minor > Labels: easy-fix > Attachments: HDFS-14303-branch-2.005.patch, > HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, > HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, > HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, > HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, > HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, > HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, > HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, > HDFS-14303.branch-3.2.017.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > check block directory logic not correct when there is only meta file, print no > meaning warn log, eg: > WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block > ID-based layout. 
Actual block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68, > expected block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68
[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869147#comment-16869147 ] Lisheng Sun commented on HDFS-14585: [~jojochuang] Could you please take a look? HDFS-14483 is blocked by this issue. Thanks. > Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.8.v1.patch > >
[jira] [Commented] (HDFS-13359) DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream
[ https://issues.apache.org/jira/browse/HDFS-13359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869145#comment-16869145 ] Hadoop QA commented on HDFS-13359: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-13359 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13359 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27027/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream > - > > Key: HDFS-13359 > URL: https://issues.apache.org/jira/browse/HDFS-13359 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13359.001.patch, stack.jpg > > > DataXceiver hung due to the lock held by > {{FsDatasetImpl#getBlockInputStream}} (stack trace attached). > {code:java} > @Override // FsDatasetSpi > public InputStream getBlockInputStream(ExtendedBlock b, > long seekOffset) throws IOException { > ReplicaInfo info; > synchronized(this) { > info = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); > } > ... > } > {code} > The {{synchronized(this)}} lock used here is expensive; there is already an > {{AutoCloseableLock}} defined for {{ReplicaMap}}, and we can use it > instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13359) DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream
[ https://issues.apache.org/jira/browse/HDFS-13359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869141#comment-16869141 ] Wei-Chiu Chuang commented on HDFS-13359: I would love to improve datanode lock contention, especially in the context of dense DataNodes. That said, it would be really nice to have a performance benchmark comparing performance before and after the change. > DataXceiver hung due to the lock in FsDatasetImpl#getBlockInputStream > - > > Key: HDFS-13359 > URL: https://issues.apache.org/jira/browse/HDFS-13359 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13359.001.patch, stack.jpg > > > DataXceiver hung due to the lock held by > {{FsDatasetImpl#getBlockInputStream}} (stack trace attached). > {code:java} > @Override // FsDatasetSpi > public InputStream getBlockInputStream(ExtendedBlock b, > long seekOffset) throws IOException { > ReplicaInfo info; > synchronized(this) { > info = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); > } > ... > } > {code} > The {{synchronized(this)}} lock used here is expensive; there is already an > {{AutoCloseableLock}} defined for {{ReplicaMap}}, and we can use it > instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
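The direction the description suggests — taking the narrow lock that already guards {{ReplicaMap}} instead of synchronizing on the whole dataset object — can be sketched as follows. This is a simplified, self-contained illustration, not the actual FsDatasetImpl code: the AutoCloseableLock here is a minimal stand-in for Hadoop's org.apache.hadoop.util.AutoCloseableLock, and the map is a plain map rather than the real ReplicaMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Minimal stand-in for Hadoop's AutoCloseableLock: a ReentrantLock that can
// be released automatically via try-with-resources.
class AutoCloseableLock implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();

    AutoCloseableLock acquire() {
        lock.lock();
        return this;
    }

    @Override
    public void close() {
        lock.unlock();
    }
}

public class GetBlockDemo {
    // A dedicated lock guarding only the replica map, so lookups do not
    // contend with every other method that synchronizes on the dataset object.
    private final AutoCloseableLock datasetLock = new AutoCloseableLock();
    private final Map<String, String> replicaMap = new ConcurrentHashMap<>();

    void addReplica(String blockId, String path) {
        try (AutoCloseableLock l = datasetLock.acquire()) {
            replicaMap.put(blockId, path);
        }
    }

    // Analogue of getBlockInputStream's lookup: take the narrow lock,
    // fetch the replica info, release, then do the slow I/O outside the lock.
    String getReplicaPath(String blockId) {
        try (AutoCloseableLock l = datasetLock.acquire()) {
            return replicaMap.get(blockId);
        }
    }
}
```

The key point of the pattern is that the lock's scope is exactly the map lookup; the expensive disk I/O that follows in the real method happens after the lock is released.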
[jira] [Commented] (HDFS-11298) Add storage policy info in FileStatus
[ https://issues.apache.org/jira/browse/HDFS-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869139#comment-16869139 ] Hadoop QA commented on HDFS-11298: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-11298 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-11298 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846064/HDFS-11298.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27026/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add storage policy info in FileStatus > - > > Key: HDFS-11298 > URL: https://issues.apache.org/jira/browse/HDFS-11298 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.7.2 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-11298.001.patch > > > It would be good to add a storagePolicy field in FileStatus, so we would not > need to call the getStoragePolicy() API to get the policy. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14592) Support NIO transferTo semantics in HDFS
Chenzhao Guo created HDFS-14592: --- Summary: Support NIO transferTo semantics in HDFS Key: HDFS-14592 URL: https://issues.apache.org/jira/browse/HDFS-14592 Project: Hadoop HDFS Issue Type: New Feature Reporter: Chenzhao Guo I'm currently developing a Spark shuffle manager based on HDFS. I need to merge some spill files on HDFS into one, or rearrange some HDFS files. An API similar to NIO transferTo, which bypasses user-space memory, would be more efficient than manually reading and writing bytes (the method I'm using at present). So could HDFS implement something like NIO transferTo, making path.transferTo(pathDestination) possible? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
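For reference, the local-filesystem semantics being requested are those of java.nio.channels.FileChannel#transferTo, which lets the kernel move bytes between channels without staging them in user-space buffers. A small sketch of merging one local file into another with it (plain java.nio on local files; HDFS today offers no such path-to-path API, which is exactly the gap this issue describes):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.*;

public class TransferToDemo {
    // Appends the contents of src to dst using transferTo, avoiding a manual
    // read/write loop through byte[] buffers.
    static void appendTo(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, READ);
             FileChannel out = FileChannel.open(dst, WRITE, CREATE)) {
            out.position(out.size());          // start writing at the end of dst
            long pos = 0, size = in.size();
            while (pos < size) {               // transferTo may move fewer bytes than requested
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}
```

Note the loop: transferTo is allowed to transfer fewer bytes than asked, so callers must retry from the updated position until everything has moved.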
[jira] [Updated] (HDFS-6937) Another issue in handling checksum errors in write pipeline
[ https://issues.apache.org/jira/browse/HDFS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-6937: -- Resolution: Duplicate Assignee: (was: Wei-Chiu Chuang) Status: Resolved (was: Patch Available) We spent several years removing all kinds of data corruption bugs, and we no longer see data corruption incidents. There is a good chance that this was fixed by HDFS-4660 (or HDFS-11160, HDFS-11056, or other ones). So, with that, I'll resolve this one as a dup. > Another issue in handling checksum errors in write pipeline > --- > > Key: HDFS-6937 > URL: https://issues.apache.org/jira/browse/HDFS-6937 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Priority: Major > Attachments: HDFS-6937.001.patch, HDFS-6937.002.patch > > > Given a write pipeline: > DN1 -> DN2 -> DN3 > DN3 detected a checksum error and terminated; DN2 truncated its replica to the > ACKed size. Then a new pipeline was attempted as > DN1 -> DN2 -> DN4 > DN4 detected a checksum error again. Later, when DN4 was replaced with DN5 (and so > on), the attempt failed for the same reason. This led to the observation that DN2's > data is corrupted. > Found that the software currently truncates DN2's replica to the ACKed size > after DN3 terminates, but it doesn't check the correctness of the data > already written to disk. > So intuitively, a solution would be: when the downstream DN (DN3 here) finds a > checksum error, it propagates this info back to the upstream DN (DN2 here); DN2 > checks the correctness of the data already written to disk and truncates the > replica to MIN(correctDataSize, ACKedSize). > Found this issue is similar to what was reported in HDFS-3875, and the > truncation at DN2 was actually introduced as part of the HDFS-3875 solution. > Filing this jira for the issue reported here. HDFS-3875 was filed by > [~tlipcon], > and he proposed something similar there. 
> {quote} > if the tail node in the pipeline detects a checksum error, then it returns a > special error code back up the pipeline indicating this (rather than just > disconnecting) > if a non-tail node receives this error code, then it immediately scans its > own block on disk (from the beginning up through the last acked length). If > it detects a corruption on its local copy, then it should assume that it is > the faulty one, rather than the downstream neighbor. If it detects no > corruption, then the faulty node is either the downstream mirror or the > network link between the two, and the current behavior is reasonable. > {quote} > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
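The quoted proposal boils down to two steps on the upstream node: find the longest on-disk prefix whose per-chunk checksums verify, then truncate to MIN(correctDataSize, ACKedSize). A simplified, self-contained sketch of that rule is below; it uses CRC32 over 512-byte chunks purely for illustration, whereas the real DataNode has its own checksum format and block metadata files.

```java
import java.util.zip.CRC32;

public class PipelineRecoverySketch {
    static final int BYTES_PER_CHECKSUM = 512; // illustrative chunk size

    // Length of the longest prefix of data whose per-chunk CRC matches the
    // stored checksum; scanning stops at the first corrupt chunk.
    static long verifiedPrefix(byte[] data, long[] storedCrcs) {
        long good = 0;
        for (int i = 0; i * BYTES_PER_CHECKSUM < data.length; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            if (crc.getValue() != storedCrcs[i]) {
                break; // first corrupt chunk: nothing after it is trustworthy
            }
            good += len;
        }
        return good;
    }

    // The truncation rule from the description: keep only data that is both
    // locally verified and acknowledged by the client.
    static long safeTruncateLength(long correctDataSize, long ackedSize) {
        return Math.min(correctDataSize, ackedSize);
    }
}
```

If the verified prefix is shorter than the ACKed size, the local copy is the faulty one (the first case in the quote); if the whole ACKed prefix verifies, the fault lies downstream and the current behavior is reasonable.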
[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869135#comment-16869135 ] Hadoop QA commented on HDFS-14585: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 7m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.8 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 40s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 28s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 46s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 52s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} branch-2.8 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 48s{color} | {color:red} hadoop-common-project/hadoop-common in branch-2.8 has 1 extant Findbugs warnings. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 50s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in branch-2.8 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 43s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 23s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 10s{color} | {color:orange} root: The patch generated 3 new + 202 unchanged - 6 fixed = 205 total (was 208) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 12s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 26s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 29s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 39s{color} | {color:black}
[jira] [Commented] (HDFS-11298) Add storage policy info in FileStatus
[ https://issues.apache.org/jira/browse/HDFS-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869133#comment-16869133 ] Wei-Chiu Chuang commented on HDFS-11298: Storage policy is an HDFS-only feature. FileStatus is meant to support all file system abstractions, so I don't feel strongly about having this. > Add storage policy info in FileStatus > - > > Key: HDFS-11298 > URL: https://issues.apache.org/jira/browse/HDFS-11298 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.7.2 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-11298.001.patch > > > It would be good to add a storagePolicy field in FileStatus, so we would not > need to call the getStoragePolicy() API to get the policy. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869132#comment-16869132 ] Masatake Iwasaki commented on HDFS-14135: - Failed tests call {{consumeConnectionBacklog}} in another thread. I'm going to update the patch to cover this. > TestWebHdfsTimeouts Fails intermittently in trunk > - > > Key: HDFS-14135 > URL: https://issues.apache.org/jira/browse/HDFS-14135 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, > HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, > HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, > HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, > HDFS-14135.012.patch > > > Reference to failure > https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.
[ https://issues.apache.org/jira/browse/HDFS-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869129#comment-16869129 ] Ayush Saxena edited comment on HDFS-14591 at 6/21/19 3:31 AM: -- Can SPS help? Take a look at HDFS-10285. External SPS is done; I guess internal SPS is still in progress. was (Author: ayushtkn): can SPS help? Give a check to HDFS-10285 > NameNode should move the replicas to the correct storages after the storage > policy is changed. > -- > > Key: HDFS-14591 > URL: https://issues.apache.org/jira/browse/HDFS-14591 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a > background process searching all the files to find those that have not been accessed > for a period of time. Then we set them to COLD and start a mover to move the > replicas. After moving, all the replicas are consistent with the storage > policy. > It's a natural idea to let the NameNode handle the move. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.
[ https://issues.apache.org/jira/browse/HDFS-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869129#comment-16869129 ] Ayush Saxena commented on HDFS-14591: - Can SPS help? Take a look at HDFS-10285. > NameNode should move the replicas to the correct storages after the storage > policy is changed. > -- > > Key: HDFS-14591 > URL: https://issues.apache.org/jira/browse/HDFS-14591 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a > background process searching all the files to find those that have not been accessed > for a period of time. Then we set them to COLD and start a mover to move the > replicas. After moving, all the replicas are consistent with the storage > policy. > It's a natural idea to let the NameNode handle the move. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13893) DiskBalancer: no validations for Disk balancer commands
[ https://issues.apache.org/jira/browse/HDFS-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13893: --- Resolution: Fixed Fix Version/s: 3.2.1 3.3.0 Status: Resolved (was: Patch Available) +1 patch still applies. Pushed to trunk, branch-3.2. There's a trivial conflict in branch-3.1, so I'll stop here. Feel free to reopen and work on it. > DiskBalancer: no validations for Disk balancer commands > > > Key: HDFS-13893 > URL: https://issues.apache.org/jira/browse/HDFS-13893 > Project: Hadoop HDFS > Issue Type: Bug > Components: diskbalancer >Reporter: Harshakiran Reddy >Assignee: Lokesh Jain >Priority: Major > Labels: newbie > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-13893.001.patch, HDFS-13893.002.patch, > HDFS-13893.003.patch > > > {{Scenario:-}} > > 1. Run a Disk Balancer command, passing extra arguments: > {noformat} > hadoopclient> hdfs diskbalancer -plan hostname --thresholdPercentage 2 > *sgfsdgfs* > 2018-08-31 14:57:35,454 INFO planner.GreedyPlanner: Starting plan for Node : > hostname:50077 > 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Disk Volume set > fb67f00c-e333-4f38-a3a6-846a30d4205a Type : DISK plan completed. > 2018-08-31 14:57:35,457 INFO planner.GreedyPlanner: Compute Plan for Node : > hostname:50077 took 23 ms > 2018-08-31 14:57:35,457 INFO command.Command: Writing plan to: > 2018-08-31 14:57:35,457 INFO command.Command: > /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json > Writing plan to: > /system/diskbalancer/2018-Aug-31-14-57-35/hostname.plan.json > {noformat} > Expected Output:- > = > Disk balancer commands should fail if we pass any invalid or extra > arguments. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14591) NameNode should move the replicas to the correct storages after the storage policy is changed.
Jinglun created HDFS-14591: -- Summary: NameNode should move the replicas to the correct storages after the storage policy is changed. Key: HDFS-14591 URL: https://issues.apache.org/jira/browse/HDFS-14591 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jinglun Assignee: Jinglun Our Xiaomi HDFS has a cluster storing both HOT and COLD data. We have a background process searching all the files to find those that have not been accessed for a period of time. Then we set them to COLD and start a mover to move the replicas. After moving, all the replicas are consistent with the storage policy. It's a natural idea to let the NameNode handle the move. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14568) The quota and consume of the file's ancestors are not handled when the storage policy of the file is changed.
[ https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun updated HDFS-14568: --- Description: The quota and consume of the file's ancestors are not handled when the storage policy of the file is changed. For example: 1. Set quota StorageType.SSD fileSpace-1 to the parent dir; 2. Create a file size of fileSpace with storage policy \{DISK,DISK,DISK} under it; 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and expect a QuotaByStorageTypeExceededException. Because the quota and consume are not handled, the expected exception is not thrown. There are 3 reasons why we should handle the consume and the quota. 1. Replication uses the new storage policy. Consider a file with BlockType CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". Now we change the policy to "ONE_SSD". If a DN goes down and the file needs replication, the NN will choose storages in policy "ONE_SSD" and replicate the block to an SSD storage. 2. We actually have a cluster storing both HOT and COLD data. We have a background process searching all the files to find those that have not been accessed for a period of time. Then we set them to COLD and start a mover to move the replicas. After moving, all the replicas are consistent with the storage policy. 3. The NameNode manages the global state of the cluster. If there is any inconsistent situation, such as replicas that don't match the storage policy of the file, we should take the NameNode as the standard and make the cluster match the NameNode. Block replication is a good example of this rule: when we count the consume of a file (CONTIGUOUS), we multiply the replication factor by the file's length, no matter whether the file is under- or over-replicated. The same applies to the storage type quota and consume. was: The quota and consume of the file's ancestors are not handled when the storage policy of the file is changed. For example: 1. 
Set quota StorageType.SSD fileSpace-1 to the parent dir; 2. Create a file size of fileSpace with storage policy \{DISK,DISK,DISK} under it; 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and expect a QuotaByStorageTypeExceededException. Because the quota and consume is not handled, the expected exception is not threw out. > The quota and consume of the file's ancestors are not handled when the > storage policy of the file is changed. > - > > Key: HDFS-14568 > URL: https://issues.apache.org/jira/browse/HDFS-14568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.1.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch > > > The quota and consume of the file's ancestors are not handled when the > storage policy of the file is changed. For example: > 1. Set quota StorageType.SSD fileSpace-1 to the parent dir; > 2. Create a file size of fileSpace with storage policy \{DISK,DISK,DISK} > under it; > 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and > expect a QuotaByStorageTypeExceededException. > Because the quota and consume are not handled, the expected exception is not > thrown. > > There are 3 reasons why we should handle the consume and the quota. > 1. Replication uses the new storage policy. Consider a file with BlockType > CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". > Now we change the policy to "ONE_SSD". If a DN goes down and the file needs > replication, the NN will choose storages in policy "ONE_SSD" and replicate > the block to an SSD storage. > 2. We actually have a cluster storing both HOT and COLD data. We have a > background process searching all the files to find those that have not been accessed > for a period of time. Then we set them to COLD and start a mover to move the > replicas. After moving, all the replicas are consistent with the storage > policy. > 3. 
The NameNode manages the global state of the cluster. If there is any > inconsistent situation, such as replicas that don't match the storage policy > of the file, we should take the NameNode as the standard and make the cluster > match the NameNode. Block replication is a good example of this rule: > when we count the consume of a file (CONTIGUOUS), we multiply the replication > factor by the file's length, no matter whether the file is under- or > over-replicated. The same applies to the storage type quota and consume. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For
[jira] [Updated] (HDFS-12564) Add the documents of swebhdfs configurations on the client side
[ https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12564: --- Resolution: Fixed Fix Version/s: 3.1.3 3.2.1 3.3.0 Status: Resolved (was: Patch Available) +1 patch still applies. Pushed to trunk, branch-3.2 and branch-3.1. Thanks [~tasanuma]! > Add the documents of swebhdfs configurations on the client side > --- > > Key: HDFS-12564 > URL: https://issues.apache.org/jira/browse/HDFS-12564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch, > HDFS-12564.3.patch, HDFS-12564.4.patch > > > Documentation does not cover the swebhdfs configurations on the client side. > We can reuse the hftp/hsftp documents which was removed from Hadoop-3.0 in > HDFS-5570, HDFS-9640. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14568) The quota and consume of the file's ancestors are not handled when the storage policy of the file is changed.
[ https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869122#comment-16869122 ] Jinglun commented on HDFS-14568: Hi [~ayushtkn], thanks for your comments. There are 3 reasons why we should handle the consume and the quota. 1. Replication uses the new storage policy. Consider a file with BlockType CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". Now we change the policy to "ONE_SSD". If a DN goes down and the file needs replication, the NN will choose storages in policy "ONE_SSD" and replicate the block to an SSD storage. 2. We actually have a cluster storing both HOT and COLD data. We have a background process searching all the files to find those that have not been accessed for a period of time. Then we set them to COLD and start a mover to move the replicas. After moving, all the replicas are consistent with the storage policy. 3. The NameNode manages the global state of the cluster. If there is any inconsistent situation, such as replicas that don't match the storage policy of the file, we should take the NameNode as the standard and make the cluster match the NameNode. Block replication is a good example of this rule: when we count the consume of a file (CONTIGUOUS), we multiply the replication factor by the file's length, no matter whether the file is under- or over-replicated. The same applies to the storage type quota and consume. After the change of the storage type policy, the replicas should be moved to the right storages automatically by the NameNode. Let's start a new jira for that. > The quota and consume of the file's ancestors are not handled when the > storage policy of the file is changed. 
> - > > Key: HDFS-14568 > URL: https://issues.apache.org/jira/browse/HDFS-14568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.1.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch > > > The quota and consume of the file's ancestors are not handled when the > storage policy of the file is changed. For example: > 1. Set quota StorageType.SSD fileSpace-1 to the parent dir; > 2. Create a file size of fileSpace with storage policy \{DISK,DISK,DISK} > under it; > 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and > expect a QuotaByStorageTypeExceededException. > Because the quota and consume are not handled, the expected exception is not > thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
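The accounting rule described in the comment above — charge replication × length per storage type regardless of how many replicas actually exist on disk, and fail the policy change if the new charge exceeds the type quota — can be modeled in a few lines. This is a toy model of the bookkeeping, not the NameNode's actual quota implementation; the SSD replica counts per policy are illustrative (ALL_SSD places all 3 replicas on SSD, HOT places none).

```java
public class StorageTypeQuotaSketch {
    // Consumption charged for a CONTIGUOUS file on one storage type:
    // file length times the number of replicas the policy places on that type,
    // independent of the replicas currently on disk.
    static long typeConsumed(long fileLength, int replicasOnType) {
        return fileLength * replicasOnType;
    }

    // Re-charging on a policy change: moving a 3-replica file from a policy
    // with oldSsdReplicas on SSD to one with newSsdReplicas on SSD must leave
    // the directory's SSD consumption within its SSD quota.
    static boolean fitsAfterPolicyChange(long ssdQuota, long ssdConsumed,
                                         long fileLength, int oldSsdReplicas,
                                         int newSsdReplicas) {
        long delta = typeConsumed(fileLength, newSsdReplicas)
                   - typeConsumed(fileLength, oldSsdReplicas);
        return ssdConsumed + delta <= ssdQuota;
    }
}
```

In the reporter's test case, the parent dir's SSD quota is set just below the file's full SSD charge, so the check must fail (a QuotaByStorageTypeExceededException in the real code) — which is exactly the check that is currently skipped.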
[jira] [Commented] (HDFS-12564) Add the documents of swebhdfs configurations on the client side
[ https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869123#comment-16869123 ] Hudson commented on HDFS-12564: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16799 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16799/]) HDFS-12564. Add the documents of swebhdfs configurations on the client (weichiu: rev 98d20656433cdec76c2108d24ff3b935657c1e80) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/WebHDFS.md * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/ServerSetup.md.vm * (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm > Add the documents of swebhdfs configurations on the client side > --- > > Key: HDFS-12564 > URL: https://issues.apache.org/jira/browse/HDFS-12564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch, > HDFS-12564.3.patch, HDFS-12564.4.patch > > > Documentation does not cover the swebhdfs configurations on the client side. > We can reuse the hftp/hsftp documents which was removed from Hadoop-3.0 in > HDFS-5570, HDFS-9640. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14483: --- Attachment: HDFS-14483.branch-2.8.v1.patch Status: Patch Available (was: Open) > Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x > - > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14465) When the Block expected replications is larger than the number of DataNodes, entering maintenance will never exit.
[ https://issues.apache.org/jira/browse/HDFS-14465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14465: --- Resolution: Fixed Fix Version/s: 2.9.3 2.10.0 Status: Resolved (was: Patch Available) Pushed to branch-2 and branch-2.9. Thanks! > When the Block expected replications is larger than the number of DataNodes, > entering maintenance will never exit. > -- > > Key: HDFS-14465 > URL: https://issues.apache.org/jira/browse/HDFS-14465 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.2 >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-14465.01.patch, HDFS-14465.02.patch, > HDFS-14465.branch-2.9.01.patch > > > Scenario: > A small HDFS cluster has 5 DataNodes; one of them is put into maintenance: > it is added to the maintenance list, and > dfs.namenode.maintenance.replication.min is set to 1. > On refreshNodes, the NameNode starts checking whether the blocks on the > node require new replication. > The replication of a MapReduce job file is 10 by default, so > isNeededReplicationForMaintenance evaluates to false, and > isSufficientlyReplicated evaluates to false, so the block of the job > file needs additional replicas. > When adding a replica, since the cluster has only 5 DataNodes and all of > them already hold a replica of the block, chooseTargetInOrder throws a > NotEnoughReplicasException, so the replication cannot be increased and > Entering Maintenance never ends. > This issue makes maintenance mode unusable on such small clusters. 
> > {panel:title=chooseTarget exception log} > 2019-05-03 23:42:31,008 [31545331] - WARN > [ReplicationMonitor:BlockPlacementPolicyDefault@431] - Failed to place enough > replicas, still in need of 1 to reach 5 (unavailableStorages=[], > storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For > more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and > org.apache.hadoop.net.NetworkTopology > {panel} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
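The failure mode above comes down to comparing live replicas against an uncapped replication target; a minimal, self-contained sketch of the check (the helper names are illustrative, not the actual BlockManager methods):

```java
// Illustrative sketch (not actual BlockManager code) of why entering
// maintenance never finishes: with replication 10 on a 5-DataNode cluster,
// an uncapped sufficiency check can never be satisfied, while a check
// capped by the cluster size can.
public class MaintenanceSketch {
  /** Uncapped check, as described in the issue: 5 >= 10 is never true. */
  public static boolean sufficientNaive(int liveReplicas,
      int expectedReplication) {
    return liveReplicas >= expectedReplication;
  }

  /** Capped check: a cluster can never hold more replicas than DataNodes. */
  public static boolean sufficientCapped(int liveReplicas,
      int expectedReplication, int numDataNodes) {
    return liveReplicas >= Math.min(expectedReplication, numDataNodes);
  }
}
```

Under the uncapped check, the ReplicationMonitor keeps asking chooseTarget for a sixth replica that has nowhere to go, which matches the NotEnoughReplicasException log in the panel above.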
[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869109#comment-16869109 ] Hadoop QA commented on HDFS-14590: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 27m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 40m 18s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14590 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972394/HDFS-14590.001.patch | | Optional Tests | dupname asflicense mvnsite xml | | uname | Linux c6f774a85d5d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d9a9e99 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 436 (vs. ulimit of 1) | | modules | C: hadoop-project U: hadoop-project | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27023/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-14590.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14586) Trash missing delete the folder which near timeout checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869107#comment-16869107 ] maobaolong commented on HDFS-14586: --- [~huyongfa] This is what our company wants most! Thank you for your contribution! > Trash missing delete the folder which near timeout checkpoint > - > > Key: HDFS-14586 > URL: https://issues.apache.org/jira/browse/HDFS-14586 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hu yongfa >Priority: Major > Attachments: HDFS-14586.001.patch > > > When the trash checkpoint timeout arrives, trash deletes the old folder first, > then creates a new checkpoint folder. > As the delete action may take a long time, such as 2 minutes, the new > checkpoint folder is created late. > At the next checkpoint timeout, trash skips deleting the new > checkpoint folder, because its age is > less than a checkpoint interval. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
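The timing in the description can be checked with simple arithmetic (a self-contained sketch with hypothetical names, not Hadoop's actual TrashPolicyDefault code): a checkpoint created two minutes into a one-hour interval is only 58 minutes old at the next emptier run, so an age-based check skips it.

```java
// Illustrative sketch of the trash emptier timing bug described above
// (names are hypothetical, not Hadoop's TrashPolicyDefault):
public class TrashTimingSketch {
  /** The emptier deletes a checkpoint only once it is a full interval old. */
  public static boolean shouldDelete(long checkpointTimeMs, long nowMs,
      long intervalMs) {
    return nowMs - checkpointTimeMs >= intervalMs;
  }

  /**
   * Age of a checkpoint at the next emptier run when its creation was
   * delayed by the slow deletion of the previous checkpoint: always one
   * deletionDelay short of a full interval, so it gets skipped.
   */
  public static long ageAtNextRunMs(long intervalMs, long deletionDelayMs) {
    return intervalMs - deletionDelayMs;
  }
}
```

With a 60-minute interval and a 2-minute deletion, the checkpoint survives until the run after next, i.e. nearly two intervals, which is the stale-directory symptom reported below.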
[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869104#comment-16869104 ] Ayush Saxena commented on HDFS-14135: - The test failed in this build. https://builds.apache.org/job/PreCommit-HDFS-Build/27020/testReport/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/ > TestWebHdfsTimeouts Fails intermittently in trunk > - > > Key: HDFS-14135 > URL: https://issues.apache.org/jira/browse/HDFS-14135 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, > HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, > HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, > HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, > HDFS-14135.012.patch > > > Reference to failure > https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14586) Trash missing delete the folder which near timeout checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869102#comment-16869102 ] Wu Weiwei commented on HDFS-14586: -- Great! This problem has also occurred in our production environment. Checkpoint directories more than one day old were retained, triggering annoying alarm calls, and we had to delete the expired directories manually. This patch can solve our problem. > Trash missing delete the folder which near timeout checkpoint > - > > Key: HDFS-14586 > URL: https://issues.apache.org/jira/browse/HDFS-14586 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hu yongfa >Priority: Major > Attachments: HDFS-14586.001.patch > > > When the trash checkpoint timeout arrives, trash deletes the old folder first, > then creates a new checkpoint folder. > As the delete action may take a long time, such as 2 minutes, the new > checkpoint folder is created late. > At the next checkpoint timeout, trash skips deleting the new > checkpoint folder, because its age is > less than a checkpoint interval. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14585: --- Attachment: HDFS-14585.branch-2.8.v1.patch Status: Patch Available (was: Open) > Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.8.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14585: --- Attachment: (was: HDFS-14585.branch-2.8.5.v1.patch) > Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14585: --- Status: Open (was: Patch Available) > Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14585: --- Attachment: (was: HDFS-14585.branch285.000.patch) > Backport HDFS-8901 Use ByteBuffer in striping positional read to branch-2.8.x > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869092#comment-16869092 ] Hadoop QA commented on HDFS-12914: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s{color} | {color:red} HDFS-12914 does not apply to branch-3.1. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12914 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972354/HDFS-12914.branch-3.1.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27022/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. 
> A re-registering node whose FBR is rejected due to an invalid lease becomes > active with _no blocks_. A replication storm ensues, possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DN's > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14589) RPC fairness for Datanode data transfers
[ https://issues.apache.org/jira/browse/HDFS-14589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869090#comment-16869090 ] Wei-Chiu Chuang commented on HDFS-14589: If this capability exists, we can avoid the situation found in HDFS-12737. > RPC fairness for Datanode data transfers > > > Key: HDFS-14589 > URL: https://issues.apache.org/jira/browse/HDFS-14589 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Íñigo Goiri >Assignee: Xue Liu >Priority: Major > > Currently, the Datanode just serves the data transfers from the clients > as soon as they arrive. > Eventually, when the {{DataXceiverServer}} runs out of threads, it just > refuses: > {code} > // Make sure the xceiver count is not exceeded > int curXceiverCount = datanode.getXceiverCount(); > if (curXceiverCount > maxXceiverCount) { > throw new IOException("Xceiver count " + curXceiverCount > + " exceeds the limit of concurrent xcievers: " > + maxXceiverCount); > } > {code} > We had a situation where a user had many containers accessing the same block, > ending up saturating the 3 Datanodes and messing with the other users. > Ideally, the Namenode should manage this to some degree, but we can > still get into this situation. > We should have some logic in the DN to track this and apply some fairness to > the number of requests per user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
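One shape per-user fairness for data transfers could take is admission control on top of the global xceiver limit; a minimal, self-contained sketch (the class, its policy, and the per-user cap are assumptions for illustration, not the DataNode's actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap how many xceiver slots a single user may hold,
// so one hot block/user cannot saturate all of a DataNode's threads.
public class XceiverFairnessSketch {
  private final int maxPerUser;
  private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

  public XceiverFairnessSketch(int maxPerUser) {
    this.maxPerUser = maxPerUser;
  }

  /** Try to admit a transfer for this user; false means "throttle". */
  public boolean tryAcquire(String user) {
    AtomicInteger count = inFlight.computeIfAbsent(user, u -> new AtomicInteger());
    if (count.incrementAndGet() > maxPerUser) {
      count.decrementAndGet(); // roll back the optimistic increment
      return false;
    }
    return true;
  }

  /** Release the slot when the transfer finishes. */
  public void release(String user) {
    AtomicInteger count = inFlight.get(user);
    if (count != null) {
      count.decrementAndGet();
    }
  }
}
```

A rejected request could be queued or failed with a retriable error rather than counted against the global xceiver limit; weighting by recent usage (as FairCallQueue does for RPCs) would be the natural next step.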
[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-14590: Status: Patch Available (was: Open) > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-14590.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869087#comment-16869087 ] Takanobu Asanuma commented on HDFS-14590: - Uploaded the 1st patch. > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-14590.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-14590: Attachment: HDFS-14590.001.patch > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-14590.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14590) [SBN Read] Add the document link to the top page
Takanobu Asanuma created HDFS-14590: --- Summary: [SBN Read] Add the document link to the top page Key: HDFS-14590 URL: https://issues.apache.org/jira/browse/HDFS-14590 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869073#comment-16869073 ] Hadoop QA commented on HDFS-14135: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 13s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}139m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14135 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972380/HDFS-14135.012.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a1d3cfafc8ff 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d9a9e99 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27020/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27020/testReport/ | | Max. process+thread count | 4225 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDDS-1667) Docker compose file may referring to incorrect docker image name
[ https://issues.apache.org/jira/browse/HDDS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869053#comment-16869053 ] Hadoop QA commented on HDDS-1667: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} yamllint {color} | {color:blue} 0m 0s{color} | {color:blue} yamllint was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 32s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 32m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 36s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 56s{color} | {color:green} hadoop-hdds in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 59s{color} | {color:red} hadoop-ozone in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 83m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | | hadoop.ozone.client.rpc.TestFailureHandlingByClient | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis | | | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider | | | hadoop.hdds.scm.safemode.TestSCMSafeModeWithPipelineRules | | | hadoop.ozone.om.TestOzoneManagerHA | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HDDS-Build/2735/artifact/out/Dockerfile | | JIRA Issue | HDDS-1667 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972383/HDDS-1667.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml yamllint | | uname | Linux 9b17e4d12cb2 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018
[jira] [Updated] (HDFS-14573) Backport Standby Read to branch-3
[ https://issues.apache.org/jira/browse/HDFS-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14573: -- Attachment: HDFS-14573-branch-3.2.004.patch > Backport Standby Read to branch-3 > - > > Key: HDFS-14573 > URL: https://issues.apache.org/jira/browse/HDFS-14573 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14573-branch-3.0.001.patch, > HDFS-14573-branch-3.1.001.patch, HDFS-14573-branch-3.2.001.patch, > HDFS-14573-branch-3.2.002.patch, HDFS-14573-branch-3.2.003.patch, > HDFS-14573-branch-3.2.004.patch > > > This Jira tracks backporting the feature consistent read from standby > (HDFS-12943) to branch-3.x, including 3.0, 3.1, 3.2. This is required for > backporting to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869039#comment-16869039 ] Íñigo Goiri commented on HDFS-14403: I opened a JIRA for the DN to use this kind of fairness (HDFS-14589). I wanted to double check that this was not already in the air. From the code in DataXceiverServer and the DataNode it doesn't look like this is done at all. Regarding this JIRA itself, I have a couple of minor comments:
* The location of {{applyWeights()}} in {{ProcessingDetails}} is a little dangerous; should we make sure that weight and timing are the same size? I'm not sure the {{applyWeights()}} naming is the most intuitive either. It looks more like a {{getCost()}}. I would actually just move the full code to {{WeightedTimeCostProvider}}. Not sure it adds much value to add it to {{ProcessingDetails}}.
* Do we want to add a couple of unit tests with corner cases? Like having no requests, etc.?
* Use logger format for the logs (using {}).
* Use lambda for the {{GenericTestUtils#waitFor()}}.
* I would add some high-level text about the cost-based approach, not only describing the fields but describing what the goal is and how one would use it and set it up "end to end"-ish. 
> Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode >Reporter: Erik Krogen >Assignee: Christopher Gregorian >Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, > HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, > HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, > HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, > HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, > HDFS-14403.branch-2.8.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
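Íñigo's first review bullet above (keeping weight and timing the same size, and {{applyWeights()}} reading more like a {{getCost()}} inside the cost provider) can be sketched roughly as follows. This is an illustrative sketch only; the class and method names are not the actual HDFS-14403 patch API.

```java
// A minimal sketch of the getCost() shape suggested in the review: the
// weighted sum lives in the cost provider, with one weight per timing
// phase by construction. Names here are illustrative, not the real API.
public class WeightedTimeCostSketch {

  // Stand-in for per-phase processing times (e.g. queue, lock-free,
  // shared-lock, exclusive-lock). One weight per phase keeps the
  // weight/timing arrays the same size, addressing the review concern.
  static long getCost(long[] nanosByTiming, long[] weights) {
    if (nanosByTiming.length != weights.length) {
      throw new IllegalArgumentException("weight/timing size mismatch");
    }
    long cost = 0;
    for (int i = 0; i < nanosByTiming.length; i++) {
      cost += nanosByTiming[i] * weights[i];
    }
    return cost;
  }

  public static void main(String[] args) {
    // Illustrative weighting: exclusive-lock time counts 100x, shared 10x.
    long[] timings = {5, 5, 2, 1};
    long[] weights = {1, 1, 10, 100};
    System.out.println(getCost(timings, weights)); // 5 + 5 + 20 + 100 = 130
  }
}
```

The point of the sketch is only the shape: the provider owns both the weights and the summation, so {{ProcessingDetails}} stays a plain timing container.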
[jira] [Commented] (HDFS-14589) RPC fairness for Datanode data transfers
[ https://issues.apache.org/jira/browse/HDFS-14589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869036#comment-16869036 ] Íñigo Goiri commented on HDFS-14589: Not sure if we should use similar to the cost-based RPC FairCallQueue in HDFS-14403 as the requests might all be the same size but it is worth experimenting. > RPC fairness for Datanode data transfers > > > Key: HDFS-14589 > URL: https://issues.apache.org/jira/browse/HDFS-14589 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Íñigo Goiri >Assignee: Xue Liu >Priority: Major > > Currently, the Datanode just replies to the data transfers from the clients > as soon as they come. > Eventually, when the {{DataXceiverServer}} runs out of threads, it just > refuses: > {code} > // Make sure the xceiver count is not exceeded > int curXceiverCount = datanode.getXceiverCount(); > if (curXceiverCount > maxXceiverCount) { > throw new IOException("Xceiver count " + curXceiverCount > + " exceeds the limit of concurrent xcievers: " > + maxXceiverCount); > } > {code} > We had a situation where a user had many containers accessing the same block > and ending up saturating the 3 Datanodes and messing with the other users. > Ideally, the Namenode should manage this situation some degree but we can > still get into this situation. > We should have some smart in the DN to track this and apply some fairness to > the number of requests per user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14589) RPC fairness for Datanode data transfers
[ https://issues.apache.org/jira/browse/HDFS-14589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869036#comment-16869036 ] Íñigo Goiri edited comment on HDFS-14589 at 6/20/19 11:52 PM: -- Not sure if we should use similar to the cost-based RPC FairCallQueue in HDFS-14403 as the requests might all be the same size but it is worth experimenting. was (Author: elgoiri): Not sure if we should use similat to the cost-based RPC FairCallQueue in HDFS-14403 as the requests might all be the same size but it is worth experimenting. > RPC fairness for Datanode data transfers > > > Key: HDFS-14589 > URL: https://issues.apache.org/jira/browse/HDFS-14589 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Íñigo Goiri >Assignee: Xue Liu >Priority: Major > > Currently, the Datanode just replies to the data transfers from the clients > as soon as they come. > Eventually, when the {{DataXceiverServer}} runs out of threads, it just > refuses: > {code} > // Make sure the xceiver count is not exceeded > int curXceiverCount = datanode.getXceiverCount(); > if (curXceiverCount > maxXceiverCount) { > throw new IOException("Xceiver count " + curXceiverCount > + " exceeds the limit of concurrent xcievers: " > + maxXceiverCount); > } > {code} > We had a situation where a user had many containers accessing the same block > and ending up saturating the 3 Datanodes and messing with the other users. > Ideally, the Namenode should manage this situation some degree but we can > still get into this situation. > We should have some smart in the DN to track this and apply some fairness to > the number of requests per user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14589) RPC fairness for Datanode data transfers
Íñigo Goiri created HDFS-14589: -- Summary: RPC fairness for Datanode data transfers Key: HDFS-14589 URL: https://issues.apache.org/jira/browse/HDFS-14589 Project: Hadoop HDFS Issue Type: Bug Reporter: Íñigo Goiri Assignee: Xue Liu Currently, the Datanode just replies to the data transfers from the clients as soon as they come. Eventually, when the {{DataXceiverServer}} runs out of threads, it just refuses:
{code}
// Make sure the xceiver count is not exceeded
int curXceiverCount = datanode.getXceiverCount();
if (curXceiverCount > maxXceiverCount) {
  throw new IOException("Xceiver count " + curXceiverCount
      + " exceeds the limit of concurrent xcievers: "
      + maxXceiverCount);
}
{code}
We had a situation where a user had many containers accessing the same block and ending up saturating the 3 Datanodes and messing with the other users. Ideally, the Namenode should manage this situation to some degree but we can still get into this situation. We should have some smarts in the DN to track this and apply some fairness to the number of requests per user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
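The per-user fairness idea described above could start from something as simple as a per-user cap on concurrent transfers, instead of only the global xceiver limit. The sketch below is purely hypothetical (class and field names like UserFairnessLimiter and maxPerUser are not DataXceiverServer code), just to make the proposal concrete.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap concurrent data transfers per user so one user
// saturating the Datanode cannot starve the others.
public class UserFairnessLimiter {
  private final int maxPerUser;
  private final Map<String, AtomicInteger> active = new ConcurrentHashMap<>();

  public UserFairnessLimiter(int maxPerUser) {
    this.maxPerUser = maxPerUser;
  }

  /** Try to admit a transfer for this user; false means back off or queue. */
  public boolean tryAcquire(String user) {
    AtomicInteger count = active.computeIfAbsent(user, u -> new AtomicInteger());
    if (count.incrementAndGet() > maxPerUser) {
      count.decrementAndGet(); // over the cap: undo and reject
      return false;
    }
    return true;
  }

  /** Release a slot when the transfer finishes. */
  public void release(String user) {
    AtomicInteger count = active.get(user);
    if (count != null) {
      count.decrementAndGet();
    }
  }

  public static void main(String[] args) {
    UserFairnessLimiter limiter = new UserFairnessLimiter(2);
    System.out.println(limiter.tryAcquire("alice")); // true
    System.out.println(limiter.tryAcquire("alice")); // true
    System.out.println(limiter.tryAcquire("alice")); // false: cap of 2 hit
    limiter.release("alice");
    System.out.println(limiter.tryAcquire("alice")); // true again
  }
}
```

A real implementation would of course have to decide what "user" means for unauthenticated transfers and whether to queue or reject, which is exactly the design discussion this JIRA opens.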
[jira] [Updated] (HDDS-1711) Set a global reference property for Ozone image name
[ https://issues.apache.org/jira/browse/HDDS-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDDS-1711: Attachment: HDDS-1711.001.patch > Set a global reference property for Ozone image name > > > Key: HDDS-1711 > URL: https://issues.apache.org/jira/browse/HDDS-1711 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: HDDS-1711.001.patch > > > Ozone Kubernetes templates are using the docker.image property for controlling > which image to run in the Kubernetes examples. It would be best to rename the > property to ozone.docker.image to prevent conflicts between Ozone and other > Hadoop sub-projects. > There are also a few typos in the existing Kubernetes templates that reference > elek/ozone. These look like they need to match the default Ozone image. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads (Dynamometer)
[ https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869026#comment-16869026 ] Hadoop QA commented on HDFS-12345: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 19 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 24m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 17m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-assemblies hadoop-tools hadoop-dist . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 15s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 27s{color} | {color:orange} root: The patch generated 18 new + 0 unchanged - 0 fixed = 18 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 16m 58s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} shellcheck {color} | {color:red} 0m 1s{color} | {color:red} The patch generated 5 new + 1 unchanged - 0 fixed = 6 total (was 1) {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 18s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 6 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 22s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-assemblies hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-dist hadoop-tools/hadoop-dynamometer hadoop-tools hadoop-dist . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}175m 26s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 2s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}355m 4s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed
[jira] [Commented] (HDDS-1667) Docker compose file may referring to incorrect docker image name
[ https://issues.apache.org/jira/browse/HDDS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869024#comment-16869024 ] Eric Yang commented on HDDS-1667: - [~elek] Patch 006 ensures docker.image can continue to work for Kubernetes templates until HDDS-1711 is committed. What should be the default value for HADOOP_IMAGE? > Docker compose file may referring to incorrect docker image name > > > Key: HDDS-1667 > URL: https://issues.apache.org/jira/browse/HDDS-1667 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Trivial > Fix For: 0.4.1 > > Attachments: HDDS-1667.001.patch, HDDS-1667.002.patch, > HDDS-1667.003.patch, HDDS-1667.004.patch, HDDS-1667.005.patch, > HDDS-1667.006.patch > > > In the fault injection test, the docker compose file is templated using: > ${user.name}/ozone:${project.version} > If a user passes in the parameter -Ddocker.image, the docker build generates a > different image name. This can cause the fault injection test to fail or get stuck > because it cannot find the required docker image. The fix is simply to use the > docker.image token to filter the docker compose file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
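The fix described in the issue ("use docker.image token to filter docker compose file") might look roughly like the fragment below, assuming Maven resource filtering with @...@ delimiters; the service name and the default value are illustrative, not taken from the actual patch.

```yaml
# Hypothetical compose fragment: Maven resource filtering replaces the
# @docker.image@ token at build time, so a build run with
# -Ddocker.image=myrepo/ozone:0.4.1 is the image the fault injection test
# actually starts. Without the override, the property would default to
# ${user.name}/ozone:${project.version}.
version: "3"
services:
  datanode:
    image: "@docker.image@"
```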
[jira] [Updated] (HDDS-1667) Docker compose file may referring to incorrect docker image name
[ https://issues.apache.org/jira/browse/HDDS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDDS-1667: Attachment: HDDS-1667.006.patch > Docker compose file may referring to incorrect docker image name > > > Key: HDDS-1667 > URL: https://issues.apache.org/jira/browse/HDDS-1667 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Trivial > Fix For: 0.4.1 > > Attachments: HDDS-1667.001.patch, HDDS-1667.002.patch, > HDDS-1667.003.patch, HDDS-1667.004.patch, HDDS-1667.005.patch, > HDDS-1667.006.patch > > > In the fault injection test, the docker compose file is templated using: > ${user.name}/ozone:${project.version} > If a user passes in the parameter -Ddocker.image, the docker build generates a > different image name. This can cause the fault injection test to fail or get stuck > because it cannot find the required docker image. The fix is simply to use the > docker.image token to filter the docker compose file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold
[ https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869010#comment-16869010 ] Wei-Chiu Chuang commented on HDFS-14587: This one: HDFS-8311. Note that although the summary suggests it's for data transfer, the timeout applies to clients as well. See the analysis in HDFS-13103. So I guess you're on 2.7? HDFS-8311 is in 2.8.0. > Support fail fast when client wait ACK by pipeline over threshold > - > > Key: HDFS-14587 > URL: https://issues.apache.org/jira/browse/HDFS-14587 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > > Recently, I met a corner case where a client waited over 9 hours for data to be > acknowledged by the pipeline. After checking branch trunk, I think this issue still > exists. So I propose to add a threshold for the wait timeout and then fail fast. > {code:java} > 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: > Slow waitForAckedSeqno took 35560718ms (threshold=3ms) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
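The proposed fail-fast could look roughly like the bounded wait below: block until the sequence number is acked, but give up after a configurable timeout instead of waiting indefinitely. This is a sketch with illustrative names, not the actual DataStreamer/DFSOutputStream internals.

```java
import java.io.IOException;

// Sketch of a fail-fast waitForAckedSeqno: the same wait loop, but with a
// deadline, so a stuck pipeline throws instead of hanging for hours.
public class AckWaitSketch {
  private final Object dataQueue = new Object();
  volatile long lastAckedSeqno = -1; // updated by the (hypothetical) ack path

  void waitForAckedSeqno(long seqno, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (dataQueue) {
      while (lastAckedSeqno < seqno) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          // Fail fast: surface the stuck pipeline to the caller.
          throw new IOException("Timed out after " + timeoutMs
              + "ms waiting for ack of seqno " + seqno);
        }
        dataQueue.wait(remaining);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    AckWaitSketch streamer = new AckWaitSketch();
    streamer.lastAckedSeqno = 5;
    streamer.waitForAckedSeqno(3, 1000); // already acked: returns at once
    try {
      streamer.waitForAckedSeqno(9, 50); // never acked: fail fast
      System.out.println("no timeout");
    } catch (IOException e) {
      System.out.println("timed out");
    }
  }
}
```

The interesting design question the JIRA raises is what the default threshold should be, since a too-low value would turn slow-but-recoverable pipelines into client errors.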
[jira] [Work logged] (HDDS-1690) ContainerController should provide a way to retrieve containers per volume
[ https://issues.apache.org/jira/browse/HDDS-1690?focusedWorklogId=264190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264190 ] ASF GitHub Bot logged work on HDDS-1690: Author: ASF GitHub Bot Created on: 20/Jun/19 22:22 Start Date: 20/Jun/19 22:22 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #986: [HDDS-1690] ContainerController should provide a way to retrieve cont… URL: https://github.com/apache/hadoop/pull/986#issuecomment-504220709 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 29 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | +1 | mvninstall | 485 | trunk passed | | +1 | compile | 260 | trunk passed | | +1 | checkstyle | 73 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 866 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 163 | trunk passed | | 0 | spotbugs | 313 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 503 | trunk passed | ||| _ Patch Compile Tests _ | | +1 | mvninstall | 444 | the patch passed | | +1 | compile | 265 | the patch passed | | +1 | javac | 265 | the patch passed | | +1 | checkstyle | 78 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 1 | The patch has no whitespace issues. | | +1 | shadedclient | 675 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 153 | the patch passed | | +1 | findbugs | 520 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 235 | hadoop-hdds in the patch passed. | | -1 | unit | 1076 | hadoop-ozone in the patch failed. 
| | +1 | asflicense | 35 | The patch does not generate ASF License warnings. | | | | 6027 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/986 | | JIRA Issue | HDDS-1690 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 722a02f0507a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / d9a9e99 | | Default Java | 1.8.0_212 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/testReport/ | | Max. process+thread count | 4643 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service U: hadoop-hdds/container-service | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 264190) Time Spent: 2h 10m (was: 2h) > ContainerController should provide a way to retrieve containers per volume > -- > > Key: HDDS-1690 > URL: https://issues.apache.org/jira/browse/HDDS-1690 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Hrishikesh Gadre >Assignee: Hrishikesh Gadre >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDDS-1690) ContainerController should provide a way to retrieve containers per volume
[ https://issues.apache.org/jira/browse/HDDS-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868978#comment-16868978 ] Hadoop QA commented on HDDS-1690: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 5m 13s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 55s{color} | {color:green} hadoop-hdds in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 56s{color} | {color:red} hadoop-ozone in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}100m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-986/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/986 | | JIRA Issue | HDDS-1690 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 722a02f0507a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | |
[jira] [Commented] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868977#comment-16868977 ] Masatake Iwasaki commented on HDFS-14135: - 012 for addressing checkstyle errors and javac warnings. > TestWebHdfsTimeouts Fails intermittently in trunk > - > > Key: HDFS-14135 > URL: https://issues.apache.org/jira/browse/HDFS-14135 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, > HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, > HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, > HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, > HDFS-14135.012.patch > > > Reference to failure > https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14135) TestWebHdfsTimeouts Fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-14135: Attachment: HDFS-14135.012.patch > TestWebHdfsTimeouts Fails intermittently in trunk > - > > Key: HDFS-14135 > URL: https://issues.apache.org/jira/browse/HDFS-14135 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14135-01.patch, HDFS-14135-02.patch, > HDFS-14135-03.patch, HDFS-14135-04.patch, HDFS-14135-05.patch, > HDFS-14135-06.patch, HDFS-14135-07.patch, HDFS-14135-08.patch, > HDFS-14135.009.patch, HDFS-14135.010.patch, HDFS-14135.011.patch, > HDFS-14135.012.patch > > > Reference to failure > https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/982/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsTimeouts/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1685) Recon: Add support for "start" query param to containers and containers/{id} endpoints
[ https://issues.apache.org/jira/browse/HDDS-1685?focusedWorklogId=264183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264183 ] ASF GitHub Bot logged work on HDDS-1685: Author: ASF GitHub Bot Created on: 20/Jun/19 22:15 Start Date: 20/Jun/19 22:15 Worklog Time Spent: 10m Work Description: avijayanhwx commented on pull request #987: HDDS-1685. Recon: Add support for 'start' query param to containers… URL: https://github.com/apache/hadoop/pull/987#discussion_r296032332 ## File path: hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/ContainerDBServiceProviderImpl.java ## @@ -164,38 +194,42 @@ public Integer getCountForForContainerKeyPrefix( return prefixes; } - /** - * Get all the containers. - * - * @return Map of containerID -> containerMetadata. - * @throws IOException - */ - @Override - public Map getContainers() throws IOException { -// Set a negative limit to get all the containers. -return getContainers(-1); Review comment: Can we name the -1 as something like Container.ALL? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 264183) Time Spent: 1h (was: 50m) > Recon: Add support for "start" query param to containers and containers/{id} > endpoints > -- > > Key: HDDS-1685 > URL: https://issues.apache.org/jira/browse/HDDS-1685 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.4.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > * Support "start" query param to seek to the given key in RocksDB. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1685) Recon: Add support for "start" query param to containers and containers/{id} endpoints
[ https://issues.apache.org/jira/browse/HDDS-1685?focusedWorklogId=264185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264185 ] ASF GitHub Bot logged work on HDDS-1685: Author: ASF GitHub Bot Created on: 20/Jun/19 22:15 Start Date: 20/Jun/19 22:15 Worklog Time Spent: 10m Work Description: avijayanhwx commented on pull request #987: HDDS-1685. Recon: Add support for 'start' query param to containers… URL: https://github.com/apache/hadoop/pull/987#discussion_r296030739 ## File path: hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/ContainerDBServiceProviderImpl.java ## @@ -128,23 +128,53 @@ public Integer getCountForForContainerKeyPrefix( } /** - * Use the DB's prefix seek iterator to start the scan from the given - * container ID prefix. + * Get key prefixes for the given container ID. * * @param containerId the given containerId. * @return Map of (Key-Prefix,Count of Keys). */ @Override public Map getKeyPrefixesForContainer( long containerId) throws IOException { +// set the default startKeyPrefix to empty string +return getKeyPrefixesForContainer(containerId, ""); Review comment: Maybe we can use StringUtils.EMPTY. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 264185) Time Spent: 1h 20m (was: 1h 10m) > Recon: Add support for "start" query param to containers and containers/{id} > endpoints > -- > > Key: HDDS-1685 > URL: https://issues.apache.org/jira/browse/HDDS-1685 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.4.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > * Support "start" query param to seek to the given key in RocksDB. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
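The overload pattern under review delegates the no-prefix call to the parameterized one with an empty start prefix. StringUtils.EMPTY (Apache Commons Lang) is just a named "" constant, so the two spellings behave identically; the suggestion is purely about readability. A simplified stand-in for the Recon DB provider, assuming a TreeMap in place of the RocksDB iterator:

```java
// Sketch of the delegating-overload pattern; names are illustrative.
import java.util.Map;
import java.util.TreeMap;

public class PrefixScanner {
    private final TreeMap<String, Integer> keyCounts = new TreeMap<>();

    void put(String keyPrefix, int count) {
        keyCounts.put(keyPrefix, count);
    }

    /** Scan from the beginning: delegate with an empty start prefix. */
    Map<String, Integer> getKeyPrefixes() {
        return getKeyPrefixes("");  // StringUtils.EMPTY in the reviewed code
    }

    /** Return prefixes at or after {@code startKeyPrefix}, as a sorted view. */
    Map<String, Integer> getKeyPrefixes(String startKeyPrefix) {
        return keyCounts.tailMap(startKeyPrefix, true);
    }
}
```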
[jira] [Work logged] (HDDS-1685) Recon: Add support for "start" query param to containers and containers/{id} endpoints
[ https://issues.apache.org/jira/browse/HDDS-1685?focusedWorklogId=264184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264184 ] ASF GitHub Bot logged work on HDDS-1685: Author: ASF GitHub Bot Created on: 20/Jun/19 22:15 Start Date: 20/Jun/19 22:15 Worklog Time Spent: 10m Work Description: avijayanhwx commented on pull request #987: HDDS-1685. Recon: Add support for 'start' query param to containers… URL: https://github.com/apache/hadoop/pull/987#discussion_r296032651 ## File path: hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/spi/ContainerDBServiceProvider.java ## @@ -85,7 +86,8 @@ Integer getCountForForContainerKeyPrefix( * @return Map of containerID -> containerMetadata. * @throws IOException */ - Map getContainers(int limit) throws IOException; + Map getContainers(int limit, long start) Review comment: According to the implementation, the start key will be skipped if present in the seek. This is to support a pagination kind of API. Maybe, we can change the param name to reflect this. Something like previous instead of start? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 264184) Time Spent: 1h 10m (was: 1h) > Recon: Add support for "start" query param to containers and containers/{id} > endpoints > -- > > Key: HDDS-1685 > URL: https://issues.apache.org/jira/browse/HDDS-1685 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.4.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > * Support "start" query param to seek to the given key in RocksDB. 
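The pagination semantics described in this review (the "start" value is the last key of the previous page and is itself skipped, hence the suggestion to rename it "previous") can be sketched as follows, with a TreeMap standing in for the RocksDB seek-then-next iteration; names are illustrative, not the actual Recon code:

```java
// Exclusive-start pagination sketch: seek strictly past the caller's key.
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ContainerPager {
    private final TreeMap<Long, String> containers = new TreeMap<>();

    void add(long id, String metadata) {
        containers.put(id, metadata);
    }

    /** Return up to {@code limit} container IDs strictly after {@code prev}. */
    List<Long> getContainers(int limit, long prev) {
        List<Long> page = new ArrayList<>();
        // Exclusive start: skip past the previously returned key.
        Long id = containers.higherKey(prev);
        while (id != null && page.size() < limit) {
            page.add(id);
            id = containers.higherKey(id);
        }
        return page;
    }
}
```

A client pages through by passing the last ID it received, so no container is returned twice across pages.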
[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)
[ https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868973#comment-16868973 ] Íñigo Goiri commented on HDFS-14588: I'm guessing the solution is to throw the standby exception and that's it? I would expect this to happen already. Can you add a unit test showing this behavior? In the last couple of months we had an issue with active/standby with WebHDFS; it might be worth mentioning. The client connects to the NN asking, say, to write a file (reading should be pretty straightforward). The NN replies with the address of a DN and a parameter called "namenoderpcaddress" (this is the tricky one). When the DN receives the write request, it creates a regular RPC client (a DFSClient, to be specific) which connects to the NN again and does the write. The issue we had in the past was the namenoderpcaddress being the address of the active NN: when the NN failed over to some other NN, the DN couldn't find the NN to complete the write. Bottom line, for active/standby the namenoderpcaddress can be a source of issues. Not sure if it's the same, but worth bringing it up. > Client retries Standby NN continuously even if Active NN is available > (WebHDFS) > --- > > Key: HDFS-14588 > URL: https://issues.apache.org/jira/browse/HDFS-14588 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Priority: Major > > This is a behavior we have observed in our HA setup of HDFS. > # Active NN is up and serving traffic. > # Stand By NN is restarted for maintenance. > # After step 2 all new clients (webhdfs only) which connect to Stand By keep > seeing Retriable Exception as Stand By NN is not yet started (Rpc server is > yet to come up as FS image is loading) but http server is started and ready > to accept traffic. This keeps happening till rpcserver is up and SNN knows > that it's truly standby. Based on start up time this behavior can continue > based on start-up times which is high (many minutes) for big clusters. 
> This above behavior is causing low availability of HDFS when HDFS is actually > still available. > Ideally webhdfs should throw standby exception (if HA is enabled) and let > clients connect to active following that. If active is also not available > clients will bounce and automatically connect to the right active. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)
[ https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868970#comment-16868970 ] CR Hota commented on HDFS-14588: [~xkrogen] Thanks for the review. Yes to throw StandbyException but ONLY if HA is enabled. > Client retries Standby NN continuously even if Active NN is available > (WebHDFS) > --- > > Key: HDFS-14588 > URL: https://issues.apache.org/jira/browse/HDFS-14588 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Priority: Major > > This is a behavior we have observed in our HA setup of HDFS. > # Active NN is up and serving traffic. > # Stand By NN is restarted for maintenance. > # After step 2 all new clients (webhdfs only) which connect to Stand By keep > seeing Retriable Exception as Stand By NN is not yet started (Rpc server is > yet to come up as FS image is loading) but http server is started and ready > to accept traffic. This keeps happening till rpcserver is up and SNN knows > that it's truely standby. Based on start up time this behavior can continue > based on start-up times which is high (many minutes) for big clusters. > This above behavior is causing low availability of HDFS when HDFS is actually > still available. > Ideally webhdfs should throw standby exception (if HA is enabled) and let > clients connect to active following that. If active is also not available > clients will bounce and automatically connect to the right active. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11021) Add FSNamesystemLock metrics for BlockManager operations
[ https://issues.apache.org/jira/browse/HDFS-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868969#comment-16868969 ] Konstantin Shvachko commented on HDFS-11021: I think separating IBR and FBR lock metrics out of OTHER will be valuable for monitoring cluster health. These are internal (not client-facing) operations, so an increase in latency can be treated as an alert that something is going wrong on the cluster. > Add FSNamesystemLock metrics for BlockManager operations > > > Key: HDFS-11021 > URL: https://issues.apache.org/jira/browse/HDFS-11021 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > Right now the operations which the {{BlockManager}} issues to the > {{Namesystem}} will not emit metrics about which operation caused the > {{FSNamesystemLock}} to be held; they are all grouped under "OTHER". We > should fix this since the {{BlockManager}} creates many acquisitions of both > the read and write locks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
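The metric split discussed here amounts to attributing lock hold time to a named operation (e.g. "IBR", "FBR") instead of folding BlockManager-driven acquisitions into a single "OTHER" bucket. A toy illustration of that bookkeeping, assuming hypothetical names (this is not the FSNamesystemLock implementation):

```java
// Per-operation lock-hold accounting sketch: each acquisition reports its
// operation name, so IBR/FBR latency can be alerted on separately.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class LockHoldMetrics {
    private final Map<String, LongAdder> heldNanosByOp = new ConcurrentHashMap<>();

    /** Record {@code nanos} of lock hold time against {@code opName}. */
    void addHoldTime(String opName, long nanos) {
        heldNanosByOp.computeIfAbsent(opName, k -> new LongAdder()).add(nanos);
    }

    long totalNanos(String opName) {
        LongAdder a = heldNanosByOp.get(opName);
        return a == null ? 0L : a.sum();
    }
}
```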
[jira] [Comment Edited] (HDFS-11021) Add FSNamesystemLock metrics for BlockManager operations
[ https://issues.apache.org/jira/browse/HDFS-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868969#comment-16868969 ] Konstantin Shvachko edited comment on HDFS-11021 at 6/20/19 10:09 PM: -- I think separating IBR and FBR lock metrics out of OTHER will be valuable for monitoring cluster health. These are internal (not client-facing) operations, so increase in latency can be treated as an alert that something is going wrong on the cluster. was (Author: shv): I think separating IBR and FBR lock metrics out of OTHER will be valuable for monitoring cluster health. This are internal (not client-facing) operations, so increase in latency can be treated as an alert that something is going wrong on the cluster. > Add FSNamesystemLock metrics for BlockManager operations > > > Key: HDFS-11021 > URL: https://issues.apache.org/jira/browse/HDFS-11021 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > Right now the operations which the {{BlockManager}} issues to the > {{Namesystem}} will not emit metrics about which operation caused the > {{FSNamesystemLock}} to be held; they are all grouped under "OTHER". We > should fix this since the {{BlockManager}} creates many acquisitions of both > the read and write locks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)
[ https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868968#comment-16868968 ] Erik Krogen commented on HDFS-14588: Seems like bad behavior. Am I correct in saying that your proposed fix is to have WebHDFS throw a {{StandbyException}} when the FSImage is in a loading state? > Client retries Standby NN continuously even if Active NN is available > (WebHDFS) > --- > > Key: HDFS-14588 > URL: https://issues.apache.org/jira/browse/HDFS-14588 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Priority: Major > > This is a behavior we have observed in our HA setup of HDFS. > # Active NN is up and serving traffic. > # Stand By NN is restarted for maintenance. > # After step 2 all new clients (webhdfs only) which connect to Stand By keep > seeing Retriable Exception as Stand By NN is not yet started (Rpc server is > yet to come up as FS image is loading) but http server is started and ready > to accept traffic. This keeps happening till rpcserver is up and SNN knows > that it's truely standby. Based on start up time this behavior can continue > based on start-up times which is high (many minutes) for big clusters. > This above behavior is causing low availability of HDFS when HDFS is actually > still available. > Ideally webhdfs should throw standby exception (if HA is enabled) and let > clients connect to active following that. If active is also not available > clients will bounce and automatically connect to the right active. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)
[ https://issues.apache.org/jira/browse/HDFS-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868966#comment-16868966 ] CR Hota commented on HDFS-14588: [~xkrogen] [~elgoiri] [~jojochuang] Thoughts on this ? > Client retries Standby NN continuously even if Active NN is available > (WebHDFS) > --- > > Key: HDFS-14588 > URL: https://issues.apache.org/jira/browse/HDFS-14588 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Priority: Major > > This is a behavior we have observed in our HA setup of HDFS. > # Active NN is up and serving traffic. > # Stand By NN is restarted for maintenance. > # After step 2 all new clients (webhdfs only) which connect to Stand By keep > seeing Retriable Exception as Stand By NN is not yet started (Rpc server is > yet to come up as FS image is loading) but http server is started and ready > to accept traffic. This keeps happening till rpcserver is up and SNN knows > that it's truely standby. Based on start up time this behavior can continue > based on start-up times which is high (many minutes) for big clusters. > This above behavior is causing low availability of HDFS when HDFS is actually > still available. > Ideally webhdfs should throw standby exception (if HA is enabled) and let > clients connect to active following that. If active is also not available > clients will bounce and automatically connect to the right active. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14588) Client retries Standby NN continuously even if Active NN is available (WebHDFS)
CR Hota created HDFS-14588: -- Summary: Client retries Standby NN continuously even if Active NN is available (WebHDFS) Key: HDFS-14588 URL: https://issues.apache.org/jira/browse/HDFS-14588 Project: Hadoop HDFS Issue Type: Bug Reporter: CR Hota This is a behavior we have observed in our HA setup of HDFS. # Active NN is up and serving traffic. # Stand By NN is restarted for maintenance. # After step 2 all new clients (webhdfs only) which connect to Stand By keep seeing Retriable Exception as Stand By NN is not yet started (Rpc server is yet to come up as FS image is loading) but http server is started and ready to accept traffic. This keeps happening till the rpc server is up and the SNN knows that it's truly standby. This behavior can continue for the whole start-up, and start-up times are high (many minutes) for big clusters. The above behavior is causing low availability of HDFS when HDFS is actually still available. Ideally webhdfs should throw a standby exception (if HA is enabled) and let clients connect to the active following that. If the active is also not available, clients will bounce and automatically connect to the right active. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
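The fix proposed in this thread can be reduced to a choice of exception type. A minimal sketch (not the actual NameNode or WebHDFS code; class and method names are hypothetical): while the RPC server is still coming up, surface a StandbyException when HA is enabled, so the client's failover logic moves on to the active NN, rather than a RetriableException that keeps it retrying the same standby:

```java
// Sketch of the proposed startup behavior for WebHDFS requests.
import java.io.IOException;

public class StartupResponse {
    static class RetriableException extends IOException {}
    static class StandbyException extends IOException {}

    /** Pick the exception a not-yet-ready NN should surface over WebHDFS. */
    static IOException startupFailure(boolean haEnabled) {
        // With HA, signal "go ask another NN"; without HA there is no other
        // NN to fail over to, so retrying this one is the only option.
        return haEnabled ? new StandbyException() : new RetriableException();
    }
}
```

This matches the comment above that the StandbyException should be thrown ONLY if HA is enabled.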
[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868956#comment-16868956 ] Wei-Chiu Chuang commented on HDFS-14403: I would really love to review this one, looks interesting. But CDH didn't support fair call queue historically, so I am afraid I may not be able to offer the best opinions. That said, if Chao +1 it, I am willing to rubber-stamp the commit :) > Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode >Reporter: Erik Krogen >Assignee: Christopher Gregorian >Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, > HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, > HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, > HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, > HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, > HDFS-14403.branch-2.8.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes
[ https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868951#comment-16868951 ] Salvatore LaMendola commented on HDDS-1700: --- Will do. I'll join the mailing lists now and get ready to send my non-binding +1 when needed. :) > RPC Payload too large on datanode startup in kubernetes > --- > > Key: HDDS-1700 > URL: https://issues.apache.org/jira/browse/HDDS-1700 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: docker, Ozone Datanode, SCM >Affects Versions: 0.4.0 > Environment: datanode pod's ozone-site.xml > {code:java} > > ozone.scm.block.client.addressozone-managers-service:9876 > ozone.enabledTrue > ozone.scm.datanode.id/tmp/datanode.id > ozone.scm.client.addressozone-managers-service:9876 > ozone.metadata.dirs/tmp/metadata > ozone.scm.namesozone-managers-service:9876 > ozone.om.addressozone-managers-service:9874 > ozone.handler.typedistributed > ozone.scm.datanode.addressozone-managers-service:9876 > > {code} > OM/SCM pod's ozone-site.xml > {code:java} > > ozone.scm.block.client.addresslocalhost > ozone.enabledTrue > ozone.scm.datanode.id/tmp/datanode.id > ozone.scm.client.addresslocalhost > ozone.metadata.dirs/tmp/metadata > ozone.scm.nameslocalhost > ozone.om.addresslocalhost > ozone.handler.typedistributed > ozone.scm.datanode.addresslocalhost > > {code} > > >Reporter: Josh Siegel >Priority: Minor > > When starting the datanode on a separate kubernetes pod from the SCM and OM, > the below error appears in the datanode's {{ozone.log}}. We verified basic > connectivity between the datanode pod and the OM/SCM pod. > {code:java} > 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR > (EndpointStateMachine.java:207) - Unable to communicate to SCM server at > ozone-managers-service:9876 for past 31800 seconds. 
> java.io.IOException: Failed on local exception: > org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; > Host Details : local host is: "ozone-datanode/10.244.84.187"; destination > host is: "ozone-managers-service":9876; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy88.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum > data length > at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code} > > cc [~slamendola2_bloomberg] > [~anu] > [~elek] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To 
unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes
[ https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868948#comment-16868948 ] Anu Engineer commented on HDDS-1700: Thank you for root causing this issue. Appreciate the effort. It would be nice if you can vote when the 0.4.1 release comes up since you would have already tested the yet-to-release the k8s packages. > RPC Payload too large on datanode startup in kubernetes > --- > > Key: HDDS-1700 > URL: https://issues.apache.org/jira/browse/HDDS-1700 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: docker, Ozone Datanode, SCM >Affects Versions: 0.4.0 > Environment: datanode pod's ozone-site.xml > {code:java} > > ozone.scm.block.client.addressozone-managers-service:9876 > ozone.enabledTrue > ozone.scm.datanode.id/tmp/datanode.id > ozone.scm.client.addressozone-managers-service:9876 > ozone.metadata.dirs/tmp/metadata > ozone.scm.namesozone-managers-service:9876 > ozone.om.addressozone-managers-service:9874 > ozone.handler.typedistributed > ozone.scm.datanode.addressozone-managers-service:9876 > > {code} > OM/SCM pod's ozone-site.xml > {code:java} > > ozone.scm.block.client.addresslocalhost > ozone.enabledTrue > ozone.scm.datanode.id/tmp/datanode.id > ozone.scm.client.addresslocalhost > ozone.metadata.dirs/tmp/metadata > ozone.scm.nameslocalhost > ozone.om.addresslocalhost > ozone.handler.typedistributed > ozone.scm.datanode.addresslocalhost > > {code} > > >Reporter: Josh Siegel >Priority: Minor > > When starting the datanode on a seperate kubernetes pod than the SCM and OM, > the below error appears in the datanode's {{ozone.log}}. We verified basic > connectivity between the datanode pod and the OM/SCM pod. > {code:java} > 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR > (EndpointStateMachine.java:207) - Unable to communicate to SCM server at > ozone-managers-service:9876 for past 31800 seconds. 
> java.io.IOException: Failed on local exception: > org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; > Host Details : local host is: "ozone-datanode/10.244.84.187"; destination > host is: "ozone-managers-service":9876; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy88.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum > data length > at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code} > > cc [~slamendola2_bloomberg] > [~anu] > [~elek] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To 
unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes
[ https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer resolved HDDS-1700. Resolution: Cannot Reproduce > RPC Payload too large on datanode startup in kubernetes > --- > > Key: HDDS-1700 > URL: https://issues.apache.org/jira/browse/HDDS-1700 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: docker, Ozone Datanode, SCM >Affects Versions: 0.4.0 > Environment: datanode pod's ozone-site.xml > {code:java} > > ozone.scm.block.client.addressozone-managers-service:9876 > ozone.enabledTrue > ozone.scm.datanode.id/tmp/datanode.id > ozone.scm.client.addressozone-managers-service:9876 > ozone.metadata.dirs/tmp/metadata > ozone.scm.namesozone-managers-service:9876 > ozone.om.addressozone-managers-service:9874 > ozone.handler.typedistributed > ozone.scm.datanode.addressozone-managers-service:9876 > > {code} > OM/SCM pod's ozone-site.xml > {code:java} > > ozone.scm.block.client.addresslocalhost > ozone.enabledTrue > ozone.scm.datanode.id/tmp/datanode.id > ozone.scm.client.addresslocalhost > ozone.metadata.dirs/tmp/metadata > ozone.scm.nameslocalhost > ozone.om.addresslocalhost > ozone.handler.typedistributed > ozone.scm.datanode.addresslocalhost > > {code} > > >Reporter: Josh Siegel >Priority: Minor > > When starting the datanode on a seperate kubernetes pod than the SCM and OM, > the below error appears in the datanode's {{ozone.log}}. We verified basic > connectivity between the datanode pod and the OM/SCM pod. > {code:java} > 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR > (EndpointStateMachine.java:207) - Unable to communicate to SCM server at > ozone-managers-service:9876 for past 31800 seconds. 
> java.io.IOException: Failed on local exception: > org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; > Host Details : local host is: "ozone-datanode/10.244.84.187"; destination > host is: "ozone-managers-service":9876; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy88.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum > data length > at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code} > > cc [~slamendola2_bloomberg] > [~anu] > [~elek] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To 
unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1495) Create hadoop/ozone docker images with inline build process
[ https://issues.apache.org/jira/browse/HDDS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868943#comment-16868943 ] Eric Yang commented on HDDS-1495: - {quote}Care to explain why my core build path is slower with this patch? I am telling you the command that I use regularly to build, and my concern is really for the commands that I use.{quote} The current trunk takes a shortcut: it keeps binary tarball packaging as a secondary step, invoked via the -Pdist profile. I think calling "mvn package" and not creating the package is a bit misleading; patch 005 was created before trunk made tarball creation optional. Patch 005 kept the tarball packaging inline with mvn package, and the extra time was spent making the tarball. It is possible to move the tarball creation to the dist profile, and it would result in the same time spent. > Create hadoop/ozone docker images with inline build process > --- > > Key: HDDS-1495 > URL: https://issues.apache.org/jira/browse/HDDS-1495 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Eric Yang >Priority: Major > Attachments: HADOOP-16091.001.patch, HADOOP-16091.002.patch, > HDDS-1495.003.patch, HDDS-1495.004.patch, HDDS-1495.005.patch, > HDDS-1495.006.patch, HDDS-1495.007.patch, HDDS-1495.008.patch, Hadoop Docker > Image inline build process.pdf > > > This is proposed by [~eyang] in > [this|https://lists.apache.org/thread.html/33ac54bdeacb4beb023ebd452464603aaffa095bd104cb43c22f484e@%3Chdfs-dev.hadoop.apache.org%3E] > mailing thread. > {quote}1, 3. There are 38 Apache projects hosting docker images on Docker hub > using Apache Organization. By browsing Apache github mirror. There are only 7 > projects using a separate repository for docker image build. Popular projects > official images are not from Apache organization, such as zookeeper, tomcat, > httpd. 
We may not disrupt what other Apache projects are doing, but it looks > like inline build process is widely employed by majority of projects such as > Nifi, Brooklyn, thrift, karaf, syncope and others. The situation seems a bit > chaotic for Apache as a whole. However, Hadoop community can decide what is > best for Hadoop. My preference is to remove ozone from source tree naming, if > Ozone is intended to be subproject of Hadoop for long period of time. This > enables Hadoop community to host docker images for various subproject without > having to check out several source tree to trigger a grand build. However, > inline build process seems more popular than separated process. Hence, I > highly recommend making docker build inline if possible. > {quote} > The main challenges are also discussed in the thread: > {code:java} > 3. Technically it would be possible to add the Dockerfile to the source > tree and publish the docker image together with the release by the > release manager but it's also problematic: > {code} > a) there is no easy way to stage the images for the vote > c) it couldn't be flagged as automated on dockerhub > d) It couldn't support the critical updates. > * Updating existing images (for example in case of an ssl bug, rebuild > all the existing images with exactly the same payload but updated base > image/os environment) > * Creating image for older releases (We would like to provide images, > for hadoop 2.6/2.7/2.7/2.8/2.9. Especially for doing automatic testing > with different versions). > {code:java} > {code} > The a) can be solved (as [~eyang] suggested) with using a personal docker > image during the vote and publish it to the dockerhub after the vote (in case > the permission can be set by the INFRA) > Note: based on LEGAL-270 and linked discussion both approaches (inline build > process / external build process) are compatible with the apache release. > Note: HDDS-851 and HADOOP-14898 contains more information about these > problems. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868939#comment-16868939 ] Erik Krogen commented on HDFS-14403: Hey [~elgoiri], interesting question... I haven't heard of any such work and we don't have any plans for that from our side. I don't think we've experienced issues with that, or at least, if we have we haven't noticed. > Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode >Reporter: Erik Krogen >Assignee: Christopher Gregorian >Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, > HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, > HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, > HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, > HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, > HDFS-14403.branch-2.8.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868925#comment-16868925 ] Íñigo Goiri commented on HDFS-14403: Are you guys aware of any work to do something similar for the Datanodes? We used to be pretty bad with the Namenodes but with the fair queue and a couple improvements, we are in pretty good shape there now. However, now we have a couple users triggering thousands of reads from a single block and they overload the DNs. Is there any effort on doing some fairness for the xceivers in a DN? The architecture of the xceivers is not as clean as the regular Hadoop RPC server used by the NN. > Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode >Reporter: Erik Krogen >Assignee: Christopher Gregorian >Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, > HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, > HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, > HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, > HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, > HDFS-14403.branch-2.8.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
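The cost-based idea behind HDFS-14403 — and what fairness for DN xceivers would also need — is that callers should be weighted by the measured cost of their calls rather than by a flat count of one per call. Below is a toy standalone sketch of that idea; all class and method names (`CostBasedScheduler`, `charge`, `priorityOf`) and the share thresholds are invented for illustration and are not the actual FairCallQueue/DecayRpcScheduler API, which additionally decays accumulated costs over time and maps priorities onto separate sub-queues.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sketch of cost-based fairness: each completed RPC is charged its
 *  processing time instead of a flat count of 1, and callers holding a
 *  large share of the total cost are demoted to lower priorities. */
public class CostBasedScheduler {
    private final Map<String, Double> costByUser = new HashMap<>();
    private double totalCost = 0.0;

    /** Charge a completed RPC to its caller, weighted by processing time. */
    public void charge(String user, double processingMillis) {
        costByUser.merge(user, processingMillis, Double::sum);
        totalCost += processingMillis;
    }

    /** Priority 0 is highest; callers above 50% of total cost drop to 1, above 75% to 2. */
    public int priorityOf(String user) {
        double share = totalCost == 0 ? 0 : costByUser.getOrDefault(user, 0.0) / totalCost;
        if (share > 0.75) {
            return 2;
        }
        if (share > 0.50) {
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        CostBasedScheduler s = new CostBasedScheduler();
        // "heavy" issues one expensive call; "light" issues two cheap ones.
        s.charge("heavy", 900);
        s.charge("light", 50);
        s.charge("light", 50);
        System.out.println(s.priorityOf("heavy")); // prints 2: ~90% of total cost
        System.out.println(s.priorityOf("light")); // prints 0: stays at top priority
    }
}
```

Under a flat call count, "light" (two calls) would look busier than "heavy" (one call); weighting by processing time inverts that, which is exactly the asymmetry that matters for expensive single-block read storms on a DN as well.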
[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test
[ https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868908#comment-16868908 ] Hadoop QA commented on HDDS-1554: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 0s{color} | {color:blue} Shelldocs was not available. {color} | | {color:blue}0{color} | {color:blue} yamllint {color} | {color:blue} 0m 0s{color} | {color:blue} yamllint was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 30 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 5m 23s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 43s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} hadolint {color} | {color:red} 0m 2s{color} | {color:red} The patch generated 2 new + 4 unchanged - 0 fixed = 6 total (was 4) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 13s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 28s{color} | {color:green} hadoop-hdds in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 18s{color} | {color:red} hadoop-ozone in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 53s{color} |
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264049 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:12 Start Date: 20/Jun/19 19:12 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295956276
## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ##
@@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) {
 }
 /**
- * Acquires S3 Bucket lock on the given resource.
+ * Acquires bucket lock on the given resource.
 *
 * If the lock is not available then the current thread becomes
- * disabled for thread scheduling purposes and lies dormant until the lock has
- * been acquired.
+ * disabled for thread scheduling purposes and lies dormant until the
+ * lock has been acquired.
 *
- * @param s3BucketName S3Bucket Name on which the lock has to be acquired
+ * @param bucket Bucket on which the lock has to be acquired
 */
- public void acquireS3Lock(String s3BucketName) {
-// Calling thread should not hold any bucket lock.
-// You can take an Volume while holding S3 bucket lock, since
-// semantically an S3 bucket maps to the ozone volume. So we check here
-// only if ozone bucket lock is taken.
-if (hasAnyBucketLock()) {
+ public void acquireBucketLock(String volume, String bucket) {
+if (hasAnyUserLock()) {
 throw new RuntimeException(
 "Thread '" + Thread.currentThread().getName() +
- "' cannot acquire S3 bucket lock while holding Ozone bucket " +
- "lock(s).");
+ "' cannot acquire bucket lock while holding User lock.");
 }
-manager.lock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet();
+manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).incrementAndGet();
 }
 /**
- * Releases the volume lock on given resource.
+ * Releases the bucket lock on given resource.
 */
- public void releaseS3Lock(String s3BucketName) {
-manager.unlock(OM_S3_PREFIX + s3BucketName);
-myLocks.get().get(S3_BUCKET_LOCK).decrementAndGet();
+ public void releaseBucketLock(String volume, String bucket) {
+manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+myLocks.get().get(BUCKET_LOCK).decrementAndGet();
 }
 /**
- * Acquires bucket lock on the given resource.
+ * Acquires user lock on the given resource.
 *
 * If the lock is not available then the current thread becomes
 * disabled for thread scheduling purposes and lies dormant until the
 * lock has been acquired.
 *
- * @param bucket Bucket on which the lock has to be acquired
+ * @param user User on which the lock has to be acquired
 */
- public void acquireBucketLock(String volume, String bucket) {
-manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).incrementAndGet();
+ public void acquireUserLock(String user) {
+// In order to not maintain username's on which we have acquired lock,
+// just checking have we acquired userLock before. If user want's to
+// acquire user lock on multiple user's they should use
+// acquireMultiUserLock. This is just a protection logic, to let not users
+// use this if acquiring lock on multiple users. As currently, we have only
+// use case we have for this is during setOwner operation in VolumeManager.
+if (hasAnyUserLock()) {
+ LOG.error("Already have userLock");
+ throw new RuntimeException("For acquiring lock on multiple users, use " +
+ "acquireMultiLock method");
+}
+manager.lock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).incrementAndGet();
 }
 /**
- * Releases the bucket lock on given resource.
+ * Releases the user lock on given resource.
 */
- public void releaseBucketLock(String volume, String bucket) {
-manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-myLocks.get().get(BUCKET_LOCK).decrementAndGet();
+ public void releaseUserLock(String user) {
+manager.unlock(OM_USER_PREFIX + user);
+myLocks.get().get(USER_LOCK).decrementAndGet();
 }
 /**
- * Returns true if the current thread holds any volume lock.
- * @return true if current thread holds volume lock, else false
+ * Acquire user lock on 2 users. In this case, we compare 2 strings
+ * lexicographically, and acquire the locks according to the sorted order of
+ * the user names. In this way, when acquiring locks on multiple user's, we
+ * can avoid dead locks. This
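The multi-user locking rule in the diff above — sort the user names lexicographically before acquiring — is the classic total-ordering trick for deadlock avoidance. A minimal standalone sketch with invented names (not the real OzoneManagerLock API, which delegates to a LockManager and tracks per-type hold counters):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch of lexicographic lock ordering: when locking two users at once,
 *  always acquire in sorted name order, so two threads can never take the
 *  same pair of locks in opposite orders. Single-threaded demo; a real
 *  implementation would need a concurrent/striped lock table. */
public class MultiUserLockSketch {
    private final Map<String, ReentrantLock> locks = new TreeMap<>();

    private ReentrantLock lockFor(String user) {
        return locks.computeIfAbsent(user, u -> new ReentrantLock());
    }

    /** Acquires both user locks in lexicographic order; returns that order. */
    public String[] acquireMultiUserLock(String a, String b) {
        // Both (a, b) and (b, a) callers take the same "first" lock, so
        // neither can hold one half of the pair while waiting on the other.
        String first = a.compareTo(b) <= 0 ? a : b;
        String second = a.compareTo(b) <= 0 ? b : a;
        lockFor(first).lock();
        lockFor(second).lock();
        return new String[] {first, second};
    }

    /** Releases in reverse acquisition order. */
    public void releaseMultiUserLock(String a, String b) {
        String first = a.compareTo(b) <= 0 ? a : b;
        String second = a.compareTo(b) <= 0 ? b : a;
        lockFor(second).unlock();
        lockFor(first).unlock();
    }

    public static void main(String[] args) {
        MultiUserLockSketch l = new MultiUserLockSketch();
        String[] order = l.acquireMultiUserLock("zelda", "alice");
        System.out.println(order[0] + " before " + order[1]); // prints: alice before zelda
        l.releaseMultiUserLock("zelda", "alice");
    }
}
```

Two threads calling `acquireMultiUserLock("a", "b")` and `acquireMultiUserLock("b", "a")` both take "a" first, so the circular-wait condition required for deadlock can never form.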
[jira] [Commented] (HDDS-1700) RPC Payload too large on datanode startup in kubernetes
[ https://issues.apache.org/jira/browse/HDDS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868859#comment-16868859 ] Salvatore LaMendola commented on HDDS-1700: --- We will need to follow up with this later, as we've determined our configuration was the cause. After rebuilding using {{0.5.0-SNAPSHOT}} and [~elek]'s configuration files with very minor modifications, the issue no longer occurs, though we _can_ still reproduce it on that version using our own deployment configurations, which we'll look into further at a later date.
> RPC Payload too large on datanode startup in kubernetes > --- > > Key: HDDS-1700 > URL: https://issues.apache.org/jira/browse/HDDS-1700 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: docker, Ozone Datanode, SCM >Affects Versions: 0.4.0
> Environment: datanode pod's ozone-site.xml
> {code:java}
> ozone.scm.block.client.address = ozone-managers-service:9876
> ozone.enabled = True
> ozone.scm.datanode.id = /tmp/datanode.id
> ozone.scm.client.address = ozone-managers-service:9876
> ozone.metadata.dirs = /tmp/metadata
> ozone.scm.names = ozone-managers-service:9876
> ozone.om.address = ozone-managers-service:9874
> ozone.handler.type = distributed
> ozone.scm.datanode.address = ozone-managers-service:9876
> {code}
> OM/SCM pod's ozone-site.xml
> {code:java}
> ozone.scm.block.client.address = localhost
> ozone.enabled = True
> ozone.scm.datanode.id = /tmp/datanode.id
> ozone.scm.client.address = localhost
> ozone.metadata.dirs = /tmp/metadata
> ozone.scm.names = localhost
> ozone.om.address = localhost
> ozone.handler.type = distributed
> ozone.scm.datanode.address = localhost
> {code}
> >Reporter: Josh Siegel >Priority: Minor
> > When starting the datanode in a separate kubernetes pod from the SCM and OM, > the below error appears in the datanode's {{ozone.log}}. We verified basic > connectivity between the datanode pod and the OM/SCM pod. 
> {code:java} > 2019-06-17 17:14:16,449 [Datanode State Machine Thread - 0] ERROR > (EndpointStateMachine.java:207) - Unable to communicate to SCM server at > ozone-managers-service:9876 for past 31800 seconds. > java.io.IOException: Failed on local exception: > org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; > Host Details : local host is: "ozone-datanode/10.244.84.187"; destination > host is: "ozone-managers-service":9876; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy88.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum > data length > at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1830) > at > 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069){code} > > cc [~slamendola2_bloomberg] > [~anu] > [~elek] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
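The stack trace above comes out of Hadoop's length-prefixed RPC framing: the client first reads a declared response length and rejects anything beyond a sanity cap. When every address in a config points at the same port, a service speaking a different protocol can answer, and its bytes decode to a huge bogus length. Below is a standalone illustration of that mechanism — not Hadoop's actual `Client` code; the class name, method, and the 64 MB cap are assumptions for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Illustrates why talking to a wrong-protocol endpoint surfaces as
 *  "RPC response exceeds maximum data length": arbitrary reply bytes,
 *  interpreted as a 4-byte length prefix, trip the sanity cap. */
public class FramedReader {
    static final int MAX_LENGTH = 64 * 1024 * 1024; // sanity cap (assumed value)

    static int readFrameLength(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        int length = in.readInt(); // first 4 bytes = declared payload size
        if (length < 0 || length > MAX_LENGTH) {
            throw new IOException("RPC response exceeds maximum data length");
        }
        return length;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed frame: declared length 5, then 5 payload bytes.
        byte[] ok = {0, 0, 0, 5, 'h', 'e', 'l', 'l', 'o'};
        System.out.println(readFrameLength(ok)); // prints 5

        // Text from e.g. an HTTP endpoint read as a frame: "HTTP" = 0x48545450,
        // about 1.2 billion, far above the cap -> the familiar error.
        byte[] http = "HTTP/1.1 200 OK".getBytes(StandardCharsets.US_ASCII);
        try {
            readFrameLength(http);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is consistent with the finding above that the deployment configuration was at fault: every address in the datanode's ozone-site.xml pointed at port 9876, so some component was likely answering with a protocol the caller did not expect.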
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264046 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:09 Start Date: 20/Jun/19 19:09 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295955274
## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ##
@@ -132,14 +157,13 @@ public void releaseUserLock(String user) {
 * @param volume Volume on which the lock has to be acquired
 */
 public void acquireVolumeLock(String volume) {
-// Calling thread should not hold any bucket lock.
+// Calling thread should not hold any bucket/user lock.
 // You can take an Volume while holding S3 bucket lock, since
-// semantically an S3 bucket maps to the ozone volume. So we check here
-// only if ozone bucket lock is taken.
-if (hasAnyBucketLock()) {
+// semantically an S3 bucket maps to the ozone volume.
+if (hasAnyBucketLock() || hasAnyUserLock()) {
Review comment: Done.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 264046) Time Spent: 7h (was: 6h 50m) > Improve locking in OzoneManager > --- > > Key: HDDS-1672 > URL: https://issues.apache.org/jira/browse/HDDS-1672 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Attachments: Ozone Locks in OM.pdf > > Time Spent: 7h > Remaining Estimate: 0h > > In this Jira, we shall follow the new lock ordering. In this way, in volume > requests we can solve the issue of acquire/release/reacquire problem. And few > bugs in the current implementation of S3Bucket/Volume operations. > > Currently after acquiring volume lock, we cannot acquire user lock. > This is causing an issue in Volume request implementation, > acquire/release/reacquire volume lock. > > Case of Delete Volume Request: > # Acquire volume lock. > # Get Volume Info from DB > # Release Volume lock. (We are releasing the lock, because while acquiring > volume lock, we cannot acquire user lock0 > # Get owner from volume Info read from DB > # Acquire owner lock > # Acquire volume lock > # Do delete logic > # release volume lock > # release user lock > > We can avoid this acquire/release/reacquire lock issue by making volume lock > as low weight. > > In this way, the above deleteVolume request will change as below > # Acquire volume lock > # Get Volume Info from DB > # Get owner from volume Info read from DB > # Acquire owner lock > # Do delete logic > # release owner lock > # release volume lock. > Same issue is seen with SetOwner for Volume request also. > During HDDS-1620 [~arp] brought up this issue. > I am proposing the above solution to solve this issue. Any other > idea/suggestions are welcome. > This also resolves a bug in setOwner for Volume request. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264043 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:08 Start Date: 20/Jun/19 19:08 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295955075
## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ##
@@ -96,30 +112,39 @@ public OzoneManagerLock(Configuration conf) {
 }
 /**
- * Acquires user lock on the given resource.
+ * Acquires S3 Bucket lock on the given resource.
 *
 * If the lock is not available then the current thread becomes
- * disabled for thread scheduling purposes and lies dormant until the
- * lock has been acquired.
+ * disabled for thread scheduling purposes and lies dormant until the lock has
+ * been acquired.
 *
- * @param user User on which the lock has to be acquired
+ * @param s3BucketName S3Bucket Name on which the lock has to be acquired
 */
- public void acquireUserLock(String user) {
-// Calling thread should not hold any volume or bucket lock.
-if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyS3Lock()) {
+ public void acquireS3BucketLock(String s3BucketName) {
+// Calling thread should not hold any volume/bucket/user lock.
+
+// Not added checks for prefix/s3 secret lock, as they will never be
+// taken with s3Bucket Lock. In this way, we can avoid 2 checks every
+// time we acquire s3Bucket lock.
+
+// Or do we need to add this for future safe?
+
+if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyUserLock()) {
 throw new RuntimeException(
 "Thread '" + Thread.currentThread().getName() +
- "' cannot acquire user lock" +
- " while holding volume, bucket or S3 bucket lock(s).");
+ "' cannot acquire S3 bucket lock while holding Ozone " +
+ "Volume/Bucket/User lock(s).");
 }
-manager.lock(OM_USER_PREFIX + user);
+manager.lock(OM_S3_PREFIX + s3BucketName);
Review comment: This will be done in a new jira.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking --- Worklog Id: (was: 264043) Time Spent: 6h 50m (was: 6h 40m)
> Improve locking in OzoneManager > --- > > Key: HDDS-1672 > URL: https://issues.apache.org/jira/browse/HDDS-1672 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Attachments: Ozone Locks in OM.pdf > > Time Spent: 6h 50m > Remaining Estimate: 0h > > In this Jira, we shall follow the new lock ordering. In this way, in volume > requests we can solve the issue of acquire/release/reacquire problem. And few > bugs in the current implementation of S3Bucket/Volume operations. > > Currently after acquiring volume lock, we cannot acquire user lock. > This is causing an issue in Volume request implementation, > acquire/release/reacquire volume lock. > > Case of Delete Volume Request: > # Acquire volume lock. > # Get Volume Info from DB > # Release Volume lock. 
(We are releasing the lock, because while acquiring > volume lock, we cannot acquire user lock0 > # Get owner from volume Info read from DB > # Acquire owner lock > # Acquire volume lock > # Do delete logic > # release volume lock > # release user lock > > We can avoid this acquire/release/reacquire lock issue by making volume lock > as low weight. > > In this way, the above deleteVolume request will change as below > # Acquire volume lock > # Get Volume Info from DB > # Get owner from volume Info read from DB > # Acquire owner lock > # Do delete logic > # release owner lock > # release volume lock. > Same issue is seen with SetOwner for Volume request also. > During HDDS-1620 [~arp] brought up this issue. > I am proposing the above solution to solve this issue. Any other > idea/suggestions are welcome. > This also resolves a bug in setOwner for Volume request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
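The revised deleteVolume flow described above amounts to a fixed lock hierarchy: the volume lock is made "low weight" (ranked below the owner/user lock), so the owner lock can be taken while the volume lock is still held, and the release/reacquire dance disappears. A minimal sketch of such a ranked-ordering check, with invented names — the real OzoneManagerLock tracks per-lock-type hold counters rather than a stack:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of a ranked lock hierarchy: a thread may only acquire a lock whose
 *  rank is >= the highest rank it already holds. With VOLUME ranked below
 *  USER, "acquire volume, then owner" is legal in one pass. */
public class LockOrdering {
    public enum Level {
        VOLUME(1), BUCKET(2), USER(3);
        final int rank;
        Level(int r) { rank = r; }
    }

    private final Deque<Level> held = new ArrayDeque<>();

    /** Rejects acquisitions that go "backwards" to a lower-ranked lock. */
    public boolean tryAcquire(Level level) {
        if (!held.isEmpty() && level.rank < held.peek().rank) {
            return false; // e.g. user lock held, bucket lock requested
        }
        held.push(level);
        return true;
    }

    public void release(Level level) {
        held.remove(level);
    }

    public static void main(String[] args) {
        LockOrdering t = new LockOrdering();
        // Revised deleteVolume flow: volume first, then owner -- one pass.
        System.out.println(t.tryAcquire(Level.VOLUME)); // prints true
        System.out.println(t.tryAcquire(Level.USER));   // prints true
        t.release(Level.USER);
        t.release(Level.VOLUME);
        // Going backwards is rejected, e.g. bucket lock while holding user lock.
        t.tryAcquire(Level.USER);
        System.out.println(t.tryAcquire(Level.BUCKET)); // prints false
    }
}
```

Under the old ordering the volume lock outranked the user lock, so step 3's "Release Volume lock" was forced; ranking volume below user is what lets the revised flow hold both locks simultaneously.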
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264041 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:08 Start Date: 20/Jun/19 19:08 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295954991
## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ##
@@ -96,30 +112,39 @@ public OzoneManagerLock(Configuration conf) {
 }
 /**
- * Acquires user lock on the given resource.
+ * Acquires S3 Bucket lock on the given resource.
 *
 * If the lock is not available then the current thread becomes
- * disabled for thread scheduling purposes and lies dormant until the
- * lock has been acquired.
+ * disabled for thread scheduling purposes and lies dormant until the lock has
+ * been acquired.
 *
- * @param user User on which the lock has to be acquired
+ * @param s3BucketName S3Bucket Name on which the lock has to be acquired
 */
- public void acquireUserLock(String user) {
-// Calling thread should not hold any volume or bucket lock.
-if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyS3Lock()) {
+ public void acquireS3BucketLock(String s3BucketName) {
+// Calling thread should not hold any volume/bucket/user lock.
+
+// Not added checks for prefix/s3 secret lock, as they will never be
+// taken with s3Bucket Lock. In this way, we can avoid 2 checks every
+// time we acquire s3Bucket lock.
+
+// Or do we need to add this for future safe?
+
+if (hasAnyVolumeLock() || hasAnyBucketLock() || hasAnyUserLock()) {
Review comment: Done.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 264041) Time Spent: 6h 40m (was: 6.5h) > Improve locking in OzoneManager > --- > > Key: HDDS-1672 > URL: https://issues.apache.org/jira/browse/HDDS-1672 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Attachments: Ozone Locks in OM.pdf > > Time Spent: 6h 40m > Remaining Estimate: 0h > > In this Jira, we shall follow the new lock ordering. In this way, in volume > requests we can solve the issue of acquire/release/reacquire problem. And few > bugs in the current implementation of S3Bucket/Volume operations. > > Currently after acquiring volume lock, we cannot acquire user lock. > This is causing an issue in Volume request implementation, > acquire/release/reacquire volume lock. > > Case of Delete Volume Request: > # Acquire volume lock. > # Get Volume Info from DB > # Release Volume lock. (We are releasing the lock, because while acquiring > volume lock, we cannot acquire user lock0 > # Get owner from volume Info read from DB > # Acquire owner lock > # Acquire volume lock > # Do delete logic > # release volume lock > # release user lock > > We can avoid this acquire/release/reacquire lock issue by making volume lock > as low weight. > > In this way, the above deleteVolume request will change as below > # Acquire volume lock > # Get Volume Info from DB > # Get owner from volume Info read from DB > # Acquire owner lock > # Do delete logic > # release owner lock > # release volume lock. > Same issue is seen with SetOwner for Volume request also. > During HDDS-1620 [~arp] brought up this issue. > I am proposing the above solution to solve this issue. Any other > idea/suggestions are welcome. > This also resolves a bug in setOwner for Volume request. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264040 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:08 Start Date: 20/Jun/19 19:08 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295954759
## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ##
@@ -59,32 +68,39 @@
 *
 * {@literal ->} acquireVolumeLock (will work)
 * {@literal +->} acquireBucketLock (will work)
- * {@literal +-->} acquireUserLock (will throw Exception)
+ * {@literal +-->} acquireS3BucketLock (will throw Exception)
 *
 *
- * To acquire a user lock you should not hold any Volume/Bucket lock. Similarly
- * to acquire a Volume lock you should not hold any Bucket lock.
+ * To acquire a S3 lock you should not hold any Volume/Bucket lock. Similarly
+ * to acquire a Volume lock you should not hold any Bucket/User/S3
+ * Secret/Prefix lock.
 */
 public final class OzoneManagerLock {
+ private static final Logger LOG =
+ LoggerFactory.getLogger(OzoneManagerLock.class);
+
+ private static final String S3_BUCKET_LOCK = "s3BucketLock";
 private static final String VOLUME_LOCK = "volumeLock";
 private static final String BUCKET_LOCK = "bucketLock";
- private static final String PREFIX_LOCK = "prefixLock";
- private static final String S3_BUCKET_LOCK = "s3BucketLock";
+ private static final String USER_LOCK = "userLock";
 private static final String S3_SECRET_LOCK = "s3SecretetLock";
+ private static final String PREFIX_LOCK = "prefixLock";
+
 private final LockManager manager;
 // To maintain locks held by current thread.
 private final ThreadLocal<Map<String, AtomicInteger>> myLocks = ThreadLocal.withInitial(
- () -> ImmutableMap.of(
- VOLUME_LOCK, new AtomicInteger(0),
- BUCKET_LOCK, new AtomicInteger(0),
- PREFIX_LOCK, new AtomicInteger(0),
- S3_BUCKET_LOCK, new AtomicInteger(0),
- S3_SECRET_LOCK, new AtomicInteger(0)
- )
+ () -> ImmutableMap.builder()
+ .put(S3_BUCKET_LOCK, new AtomicInteger(0))
+ .put(VOLUME_LOCK, new AtomicInteger(0))
+ .put(BUCKET_LOCK, new AtomicInteger(0))
+ .put(USER_LOCK, new AtomicInteger(0))
+ .put(S3_SECRET_LOCK, new AtomicInteger(0))
+ .put(PREFIX_LOCK, new AtomicInteger(0))
+ .build()
Review comment: This will be taken up in a new jira.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking --- Worklog Id: (was: 264040) Time Spent: 6.5h (was: 6h 20m)
> Improve locking in OzoneManager > --- > > Key: HDDS-1672 > URL: https://issues.apache.org/jira/browse/HDDS-1672 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Attachments: Ozone Locks in OM.pdf > > Time Spent: 6.5h > Remaining Estimate: 0h > > In this Jira, we shall follow the new lock ordering. In this way, in volume > requests we can solve the issue of acquire/release/reacquire problem. And few > bugs in the current implementation of S3Bucket/Volume operations. > > Currently after acquiring volume lock, we cannot acquire user lock. > This is causing an issue in Volume request implementation, > acquire/release/reacquire volume lock. > > Case of Delete Volume Request: > # Acquire volume lock. > # Get Volume Info from DB > # Release Volume lock. 
(We are releasing the lock, because while acquiring > volume lock, we cannot acquire user lock) > # Get owner from volume Info read from DB > # Acquire owner lock > # Acquire volume lock > # Do delete logic > # release volume lock > # release user lock > > We can avoid this acquire/release/reacquire lock issue by making volume lock > as low weight. > > In this way, the above deleteVolume request will change as below > # Acquire volume lock > # Get Volume Info from DB > # Get owner from volume Info read from DB > # Acquire owner lock > # Do delete logic > # release owner lock > # release volume lock. > Same issue is seen with SetOwner for Volume request also. > During HDDS-1620 [~arp] brought up this issue. > I am proposing the above solution to solve this issue. Any other > idea/suggestions are welcome. > This also resolves a bug in setOwner for Volume request.
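The patch above orders the lock levels (S3 bucket, volume, bucket, user, S3 secret, prefix) and tracks per-thread hold counts in a ThreadLocal map of AtomicIntegers. A minimal, hypothetical sketch of that scheme (class and method names are illustrative, not the actual OzoneManagerLock API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of an ordered lock hierarchy with per-thread counts. */
public class LockOrderSketch {
  // Lock levels in the order in which they must be acquired.
  private static final String[] LEVELS =
      {"s3Bucket", "volume", "bucket", "user", "s3Secret", "prefix"};

  // One ReentrantLock per (level, resource) pair, created on demand.
  private final ConcurrentHashMap<String, ReentrantLock> locks =
      new ConcurrentHashMap<>();

  // Per-thread count of locks held at each level, mirroring the
  // ThreadLocal<Map<String, AtomicInteger>> in the patch.
  private final ThreadLocal<Map<String, AtomicInteger>> held =
      ThreadLocal.withInitial(() -> {
        Map<String, AtomicInteger> m = new LinkedHashMap<>();
        for (String level : LEVELS) {
          m.put(level, new AtomicInteger(0));
        }
        return m;
      });

  private int levelIndex(String level) {
    for (int i = 0; i < LEVELS.length; i++) {
      if (LEVELS[i].equals(level)) {
        return i;
      }
    }
    throw new IllegalArgumentException("Unknown level: " + level);
  }

  /** Acquires a lock, refusing if any later (finer-grained) level is held. */
  public void acquire(String level, String resource) {
    int idx = levelIndex(level);
    Map<String, AtomicInteger> counts = held.get();
    for (int i = idx + 1; i < LEVELS.length; i++) {
      if (counts.get(LEVELS[i]).get() != 0) {
        throw new RuntimeException("Thread '" + Thread.currentThread().getName()
            + "' cannot acquire " + level + " lock while holding "
            + LEVELS[i] + " lock(s).");
      }
    }
    locks.computeIfAbsent(level + "/" + resource, k -> new ReentrantLock())
        .lock();
    counts.get(level).incrementAndGet();
  }

  /** Releases a lock and decrements the per-thread count for its level. */
  public void release(String level, String resource) {
    locks.get(level + "/" + resource).unlock();
    held.get().get(level).decrementAndGet();
  }
}
```

Descending the hierarchy (volume, then bucket) succeeds, while climbing back up (for example, taking the S3 bucket lock while a bucket lock is held) fails fast with a RuntimeException; this ordering is what lets the revised deleteVolume flow take the user lock while still holding the volume lock.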
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264029 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:05 Start Date: 20/Jun/19 19:05 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295953686 ## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ## @@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) { } /** - * Acquires S3 Bucket lock on the given resource. + * Acquires bucket lock on the given resource. * * If the lock is not available then the current thread becomes - * disabled for thread scheduling purposes and lies dormant until the lock has - * been acquired. + * disabled for thread scheduling purposes and lies dormant until the + * lock has been acquired. * - * @param s3BucketName S3Bucket Name on which the lock has to be acquired + * @param bucket Bucket on which the lock has to be acquired */ - public void acquireS3Lock(String s3BucketName) { -// Calling thread should not hold any bucket lock. -// You can take an Volume while holding S3 bucket lock, since -// semantically an S3 bucket maps to the ozone volume. So we check here -// only if ozone bucket lock is taken. 
-if (hasAnyBucketLock()) { + public void acquireBucketLock(String volume, String bucket) { +if (hasAnyUserLock()) { throw new RuntimeException( "Thread '" + Thread.currentThread().getName() + - "' cannot acquire S3 bucket lock while holding Ozone bucket " + - "lock(s)."); + "' cannot acquire bucket lock while holding User lock."); } -manager.lock(OM_S3_PREFIX + s3BucketName); -myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet(); +manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket); +myLocks.get().get(BUCKET_LOCK).incrementAndGet(); } /** - * Releases the volume lock on given resource. + * Releases the bucket lock on given resource. */ - public void releaseS3Lock(String s3BucketName) { -manager.unlock(OM_S3_PREFIX + s3BucketName); -myLocks.get().get(S3_BUCKET_LOCK).decrementAndGet(); + public void releaseBucketLock(String volume, String bucket) { +manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket); +myLocks.get().get(BUCKET_LOCK).decrementAndGet(); } /** - * Acquires bucket lock on the given resource. + * Acquires user lock on the given resource. * * If the lock is not available then the current thread becomes * disabled for thread scheduling purposes and lies dormant until the * lock has been acquired. * - * @param bucket Bucket on which the lock has to be acquired + * @param user User on which the lock has to be acquired */ - public void acquireBucketLock(String volume, String bucket) { -manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket); -myLocks.get().get(BUCKET_LOCK).incrementAndGet(); + public void acquireUserLock(String user) { +// In order to not maintain username's on which we have acquired lock, +// just checking have we acquired userLock before. If user want's to +// acquire user lock on multiple user's they should use +// acquireMultiUserLock. This is just a protection logic, to let not users +// use this if acquiring lock on multiple users. 
As currently, we have only +// use case we have for this is during setOwner operation in VolumeManager. +if (hasAnyUserLock()) { + LOG.error("Already have userLock"); + throw new RuntimeException("For acquiring lock on multiple users, use " + + "acquireMultiLock method"); +} +manager.lock(OM_USER_PREFIX + user); +myLocks.get().get(USER_LOCK).incrementAndGet(); } /** - * Releases the bucket lock on given resource. + * Releases the user lock on given resource. */ - public void releaseBucketLock(String volume, String bucket) { -manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket); -myLocks.get().get(BUCKET_LOCK).decrementAndGet(); + public void releaseUserLock(String user) { +manager.unlock(OM_USER_PREFIX + user); +myLocks.get().get(USER_LOCK).decrementAndGet(); } /** - * Returns true if the current thread holds any volume lock. - * @return true if current thread holds volume lock, else false + * Acquire user lock on 2 users. In this case, we compare 2 strings + * lexicographically, and acquire the locks according to the sorted order of + * the user names. In this way, when acquiring locks on multiple user's, we + * can avoid dead locks. This
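The truncated javadoc above describes acquiring user locks on two users at once (as setOwner must, for the old and new owner): comparing the two names lexicographically and locking in sorted order means two threads locking the same pair in opposite argument order can never deadlock. A hypothetical sketch of that idea (names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: deadlock-free acquisition of two user locks. */
public class MultiUserLockSketch {
  private final ConcurrentHashMap<String, ReentrantLock> userLocks =
      new ConcurrentHashMap<>();

  private ReentrantLock lockFor(String user) {
    return userLocks.computeIfAbsent(user, k -> new ReentrantLock());
  }

  /**
   * Locks both users, lexicographically smaller name first, so every thread
   * agrees on the acquisition order regardless of argument order.
   */
  public void acquireMultiUserLock(String oldUser, String newUser) {
    String first = oldUser.compareTo(newUser) < 0 ? oldUser : newUser;
    String second = first.equals(oldUser) ? newUser : oldUser;
    lockFor(first).lock();
    if (!first.equals(second)) {
      // Same user passed twice needs only one lock.
      lockFor(second).lock();
    }
  }

  /** Releases in the reverse of the acquisition order. */
  public void releaseMultiUserLock(String oldUser, String newUser) {
    String first = oldUser.compareTo(newUser) < 0 ? oldUser : newUser;
    String second = first.equals(oldUser) ? newUser : oldUser;
    if (!first.equals(second)) {
      lockFor(second).unlock();
    }
    lockFor(first).unlock();
  }
}
```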
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264025=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264025 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:02 Start Date: 20/Jun/19 19:02 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295952709 ## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ## @@ -258,12 +348,61 @@ public void acquirePrefixLock(String prefixPath) { myLocks.get().get(PREFIX_LOCK).incrementAndGet(); } - private boolean hasAnyPrefixLock() { -return myLocks.get().get(PREFIX_LOCK).get() != 0; - } - + /** + * Releases the prefix lock on given resource. + */ public void releasePrefixLock(String prefixPath) { manager.unlock(prefixPath); myLocks.get().get(PREFIX_LOCK).decrementAndGet(); } + + /** + * Returns true if the current thread holds any volume lock. + * @return true if current thread holds volume lock, else false + */ + private boolean hasAnyVolumeLock() { +return myLocks.get().get(VOLUME_LOCK).get() != 0; + } + + /** + * Returns true if the current thread holds any bucket lock. + * @return true if current thread holds bucket lock, else false + */ + private boolean hasAnyBucketLock() { +return myLocks.get().get(BUCKET_LOCK).get() != 0; + } + + /** + * Returns true if the current thread holds any s3 bucket lock. + * @return true if current thread holds s3 bucket lock, else false + */ + private boolean hasAnyS3BucketLock() { +return myLocks.get().get(S3_BUCKET_LOCK).get() != 0; + } + + /** + * Returns true if the current thread holds any user lock. 
+ * @return true if current thread holds user lock, else false + */ + private boolean hasAnyUserLock() { +return myLocks.get().get(USER_LOCK).get() != 0; Review comment: Yes, I added a call to hasAnyUserLock() in acquireUserLock() so that anyone trying to acquire multiple user locks fails immediately with a RuntimeException. As noted in the code comments in acquireUserLock(), this is protection logic to keep callers from doing that. Issue Time Tracking --- Worklog Id: (was: 264025) Time Spent: 6h 10m (was: 6h) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold
[ https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868853#comment-16868853 ] Wei-Chiu Chuang commented on HDFS-14587: shouldn't it simply get a read timeout? I think we added a client-side read timeout not too long ago. The only possibility I can imagine is the client gets into a full GC. But even that, 9 hours seems like a stretch. > Support fail fast when client wait ACK by pipeline over threshold > - > > Key: HDFS-14587 > URL: https://issues.apache.org/jira/browse/HDFS-14587 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > > Recently I met a corner case where a client waited over 9 hours for data to > be acknowledged by the pipeline. After checking branch trunk, I think this > issue still exists. So I propose adding a threshold on the wait time and > failing fast once it is exceeded. > {code:java} > 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: > Slow waitForAckedSeqno took 35560718ms (threshold=3ms) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
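HDFS-14587 proposes bounding the client's wait for pipeline ACKs instead of waiting indefinitely. The fail-fast idea can be sketched as a monitor wait with a deadline; AckWaiter and its methods are hypothetical names for illustration, not actual HDFS client APIs:

```java
import java.io.IOException;

/** Hypothetical sketch: fail fast when an ACK does not arrive in time. */
public class AckWaiter {
  private final Object ackLock = new Object();
  private long lastAckedSeqno = -1;

  /** Called by the response-processor thread when an ACK arrives. */
  public void acked(long seqno) {
    synchronized (ackLock) {
      lastAckedSeqno = seqno;
      ackLock.notifyAll();
    }
  }

  /** Waits until seqno is acked or timeoutMs elapses, then fails fast. */
  public void waitForAckedSeqno(long seqno, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (ackLock) {
      while (lastAckedSeqno < seqno) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          // Instead of blocking for hours, surface the stall to the caller.
          throw new IOException("Timed out after " + timeoutMs
              + "ms waiting for ack of seqno " + seqno);
        }
        ackLock.wait(remaining);
      }
    }
  }
}
```

With an unbounded wait the caller only learns of a stalled pipeline from the "Slow waitForAckedSeqno" warning; with a deadline the stall becomes an exception the client can act on.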
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264024=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264024 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 19:00 Start Date: 20/Jun/19 19:00 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295951701 ## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java ## @@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) { } /** - * Acquires S3 Bucket lock on the given resource. + * Acquires bucket lock on the given resource. * * If the lock is not available then the current thread becomes - * disabled for thread scheduling purposes and lies dormant until the lock has - * been acquired. + * disabled for thread scheduling purposes and lies dormant until the + * lock has been acquired. * - * @param s3BucketName S3Bucket Name on which the lock has to be acquired + * @param bucket Bucket on which the lock has to be acquired */ - public void acquireS3Lock(String s3BucketName) { -// Calling thread should not hold any bucket lock. -// You can take an Volume while holding S3 bucket lock, since -// semantically an S3 bucket maps to the ozone volume. So we check here -// only if ozone bucket lock is taken. 
-if (hasAnyBucketLock()) { + public void acquireBucketLock(String volume, String bucket) { +if (hasAnyUserLock()) { throw new RuntimeException( "Thread '" + Thread.currentThread().getName() + - "' cannot acquire S3 bucket lock while holding Ozone bucket " + - "lock(s)."); + "' cannot acquire bucket lock while holding User lock."); } -manager.lock(OM_S3_PREFIX + s3BucketName); -myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet(); +manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket); Review comment: Yes, we prefix both the volume name and the bucket name with / (OM_KEY_PREFIX). Issue Time Tracking --- Worklog Id: (was: 264024) Time Spent: 6h (was: 5h 50m)
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264021 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 18:51 Start Date: 20/Jun/19 18:51 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295948310 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/OMVolumeDeleteRequest.java ## @@ -104,37 +104,23 @@ public OMClientResponse validateAndUpdateCache(OzoneManager ozoneManager, OmVolumeArgs omVolumeArgs = null; String owner = null; - +IOException exception = null; +OzoneManagerProtocolProtos.VolumeList newVolumeList = null; omMetadataManager.getLock().acquireVolumeLock(volume); try { owner = getVolumeInfo(omMetadataManager, volume).getOwnerName(); -} catch (IOException ex) { - LOG.error("Volume deletion failed for volume:{}", volume, ex); - omMetrics.incNumVolumeDeleteFails(); - auditLog(auditLogger, buildAuditMessage(OMAction.DELETE_VOLUME, - buildVolumeAuditMap(volume), ex, userInfo)); - return new OMVolumeDeleteResponse(null, null, null, - createErrorOMResponse(omResponse, ex)); -} finally { - omMetadataManager.getLock().releaseVolumeLock(volume); -} -// Release and reacquire lock for now it will not be a problem for now, as -// applyTransaction serializes the operation's. -// TODO: Revisit this logic once HDDS-1672 checks in. + // Release and reacquire lock for now it will not be a problem for now, as + // applyTransaction serializes the operation's. -// We cannot acquire user lock holding volume lock, so released volume -// lock, and acquiring user and volume lock. + // We cannot acquire user lock holding volume lock, so released volume + // lock, and acquiring user and volume lock. 
-omMetadataManager.getLock().acquireUserLock(owner); -omMetadataManager.getLock().acquireVolumeLock(volume); + omMetadataManager.getLock().acquireUserLock(owner); Review comment: That is why we check owner != null in the finally block, and only release the user lock there when it is non-null. Issue Time Tracking --- Worklog Id: (was: 264021) Time Spent: 5h 50m (was: 5h 40m)
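The exchange above is about releasing the user lock in a finally block only when the owner was successfully read, since the user lock is taken only after the owner is looked up under the volume lock. A hypothetical, self-contained sketch of that delete-volume flow under the new ordering (names are illustrative, not the OM request classes):

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the revised deleteVolume locking flow. */
public class DeleteVolumeSketch {
  private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
  private final Map<String, String> volumeOwners = new ConcurrentHashMap<>();

  private ReentrantLock lockFor(String name) {
    return locks.computeIfAbsent(name, k -> new ReentrantLock());
  }

  public void createVolume(String volume, String owner) {
    volumeOwners.put(volume, owner);
  }

  public void deleteVolume(String volume) throws IOException {
    String owner = null;
    lockFor("/vol/" + volume).lock();
    try {
      // Read the owner under the volume lock; this can fail before the
      // user lock is ever taken.
      owner = volumeOwners.get(volume);
      if (owner == null) {
        throw new IOException("Volume not found: " + volume);
      }
      // New ordering: user lock may be taken while holding the volume lock.
      lockFor("/user/" + owner).lock();
      volumeOwners.remove(volume); // "do delete logic"
    } finally {
      if (owner != null) {
        // The user lock was acquired only if the owner lookup succeeded.
        lockFor("/user/" + owner).unlock();
      }
      lockFor("/vol/" + volume).unlock();
    }
  }
}
```

Without the owner != null guard, a failed owner lookup would make the finally block unlock a user lock that was never acquired.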
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264020 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 18:49 Start Date: 20/Jun/19 18:49 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295947682 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/S3BucketManagerImpl.java ## @@ -101,34 +101,26 @@ public void createS3Bucket(String userName, String bucketName) // anonymous access to bucket where the user name is absent. String ozoneVolumeName = formatOzoneVolumeName(userName); -omMetadataManager.getLock().acquireS3Lock(bucketName); -try { - String bucket = - omMetadataManager.getS3Table().get(bucketName); - - if (bucket != null) { -LOG.debug("Bucket already exists. {}", bucketName); -throw new OMException( -"Unable to create S3 bucket. " + bucketName + " already exists.", -OMException.ResultCodes.S3_BUCKET_ALREADY_EXISTS); - } - String ozoneBucketName = bucketName; - createOzoneBucket(ozoneVolumeName, ozoneBucketName); - String finalName = String.format("%s/%s", ozoneVolumeName, - ozoneBucketName); +String bucket = omMetadataManager.getS3Table().get(bucketName); - omMetadataManager.getS3Table().put(bucketName, finalName); -} finally { - omMetadataManager.getLock().releaseS3Lock(bucketName); Review comment: Do you mean the S3 bucket lock? We need to acquire that before creating the volume, so it is acquired by the caller in OzoneManager. Issue Time Tracking --- Worklog Id: (was: 264020) Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264019 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 18:47 Start Date: 20/Jun/19 18:47 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295946812 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -2564,6 +2564,9 @@ public void createS3Bucket(String userName, String s3BucketName) } metrics.incNumBucketCreates(); try { +metadataManager.getLock().acquireS3BucketLock(s3BucketName); +metadataManager.getLock().acquireVolumeLock( Review comment: On a side note: once we move to the new HA code this will be cleaned up, so I did not do much refactoring here. Issue Time Tracking --- Worklog Id: (was: 264019) Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264017 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 18:46 Start Date: 20/Jun/19 18:46 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295946327 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -2564,6 +2564,9 @@ public void createS3Bucket(String userName, String s3BucketName) } metrics.incNumBucketCreates(); try { +metadataManager.getLock().acquireS3BucketLock(s3BucketName); +metadataManager.getLock().acquireVolumeLock( Review comment: I see that the only failure case is a RuntimeException. So, do we still need the flags? Issue Time Tracking --- Worklog Id: (was: 264017) Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264012 ] ASF GitHub Bot logged work on HDDS-1672: Author: ASF GitHub Bot Created on: 20/Jun/19 18:37 Start Date: 20/Jun/19 18:37 Worklog Time Spent: 10m Work Description: arp7 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager. URL: https://github.com/apache/hadoop/pull/949#discussion_r295942564 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/S3BucketManagerImpl.java ## @@ -101,34 +101,26 @@ public void createS3Bucket(String userName, String bucketName) // anonymous access to bucket where the user name is absent. String ozoneVolumeName = formatOzoneVolumeName(userName); -omMetadataManager.getLock().acquireS3Lock(bucketName); -try { - String bucket = - omMetadataManager.getS3Table().get(bucketName); - - if (bucket != null) { -LOG.debug("Bucket already exists. {}", bucketName); -throw new OMException( -"Unable to create S3 bucket. " + bucketName + " already exists.", -OMException.ResultCodes.S3_BUCKET_ALREADY_EXISTS); - } - String ozoneBucketName = bucketName; - createOzoneBucket(ozoneVolumeName, ozoneBucketName); - String finalName = String.format("%s/%s", ozoneVolumeName, - ozoneBucketName); +String bucket = omMetadataManager.getS3Table().get(bucketName); - omMetadataManager.getS3Table().put(bucketName, finalName); -} finally { - omMetadataManager.getLock().releaseS3Lock(bucketName); Review comment: Sorry I didn't get why we removed the acquire/release bucket lock. Is the caller now supposed to get the lock? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 264012)
Time Spent: 5h 10m (was: 5h)

> Improve locking in OzoneManager
> -------------------------------
>
> Key: HDDS-1672
> URL: https://issues.apache.org/jira/browse/HDDS-1672
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Manager
> Affects Versions: 0.4.0
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
> Labels: pull-request-available
> Attachments: Ozone Locks in OM.pdf
>
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> In this Jira, we shall follow the new lock ordering. In this way, in volume
> requests, we can solve the acquire/release/reacquire problem, as well as a
> few bugs in the current implementation of S3Bucket/Volume operations.
>
> Currently, after acquiring the volume lock, we cannot acquire the user lock.
> This is causing an issue in the Volume request implementation:
> acquire/release/reacquire of the volume lock.
>
> Case of Delete Volume Request:
> # Acquire volume lock.
> # Get Volume Info from DB
> # Release Volume lock. (We release the lock because, after acquiring the
> volume lock, we cannot acquire the user lock.)
> # Get owner from volume Info read from DB
> # Acquire owner lock
> # Acquire volume lock
> # Do delete logic
> # Release volume lock
> # Release user lock
>
> We can avoid this acquire/release/reacquire issue by making the volume lock
> lightweight.
>
> With that change, the above deleteVolume request becomes:
> # Acquire volume lock
> # Get Volume Info from DB
> # Get owner from volume Info read from DB
> # Acquire owner lock
> # Do delete logic
> # Release owner lock
> # Release volume lock
>
> The same issue is seen with the SetOwner for Volume request.
> [~arp] brought up this issue during HDDS-1620.
> I am proposing the above solution to solve this issue; any other
> ideas/suggestions are welcome.
> This also resolves a bug in setOwner for Volume request.
[jira] [Created] (HDDS-1713) ReplicationManager fails to find proper node topology based on Datanode details from heartbeat
Xiaoyu Yao created HDDS-1713:

Summary: ReplicationManager fails to find proper node topology based on Datanode details from heartbeat
Key: HDDS-1713
URL: https://issues.apache.org/jira/browse/HDDS-1713
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao

The DN heartbeat message does not include topology info for its container and pipeline reports; the topology information is only available in SCM. When processing a heartbeat, we should not rely on the DatanodeDetails from the report to choose datanodes for closing a container. Otherwise, the locations of all existing container replicas fall back to /default-rack. The fix is to retrieve the corresponding datanode locations from the SCM NodeManager, which has the authoritative network topology information.
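The fix described above amounts to resolving the heartbeat-supplied node identity against an SCM-side registry that holds the real network location. A minimal sketch follows; `NodeRegistry`, the `DatanodeDetails` fields, and the `/rack1` value are illustrative assumptions, not the actual SCM NodeManager API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch: look up the registered (authoritative) DatanodeDetails by UUID
// instead of trusting the topology-free copy carried in a heartbeat report.
public class TopologyLookupSketch {
  static class DatanodeDetails {
    final UUID uuid;
    final String networkLocation;
    DatanodeDetails(UUID uuid, String networkLocation) {
      this.uuid = uuid;
      this.networkLocation = networkLocation;
    }
  }

  static class NodeRegistry {
    private final Map<UUID, DatanodeDetails> registered = new HashMap<>();

    void register(DatanodeDetails dn) {
      registered.put(dn.uuid, dn);
    }

    // Resolve the heartbeat-supplied details to the registered copy, which
    // carries the real topology; otherwise keep the report copy as-is.
    DatanodeDetails resolve(DatanodeDetails fromHeartbeat) {
      DatanodeDetails known = registered.get(fromHeartbeat.uuid);
      return known != null ? known : fromHeartbeat;
    }
  }

  public static void main(String[] args) {
    UUID id = UUID.randomUUID();
    NodeRegistry registry = new NodeRegistry();
    registry.register(new DatanodeDetails(id, "/rack1"));
    // Heartbeat reports carry no topology, so they show /default-rack.
    DatanodeDetails report = new DatanodeDetails(id, "/default-rack");
    System.out.println(registry.resolve(report).networkLocation);
  }
}
```

Resolving by UUID keeps placement decisions (e.g. choosing targets when closing a container) on the authoritative topology rather than the /default-rack fallback.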
[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test
[ https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868831#comment-16868831 ]

Eric Yang commented on HDDS-1554:
---

Patch 005 fixes the hard-coded uid:gid issues and uses a read-only mount for /data. Disk tests will supply the -u flag to ensure the mount location does not create a filesystem uid/gid inconsistency problem. Other smoke tests are recommended to use the -u flag as well, to prevent containers from writing data under another user's uid/gid to the host-level filesystem; HDDS-1609 may be a good place to start applying the -u flag to tests outside the fault-injection tests.

> Create disk tests for fault injection test
> ------------------------------------------
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch,
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The current plan for the fault injection disk tests is:
> # Scenario 1 - Read/Write test
> ## Run docker-compose to bring up a cluster
> ## Initialize scm and om
> ## Upload data to Ozone cluster
> ## Verify data is correct
> ## Shutdown cluster
> # Scenario 2 - Read-Only test
> ## Repeat Scenario 1
> ## Mount data disk as read only
> ## Try to write data to Ozone cluster
> ## Validate error message is correct
> ## Shutdown cluster
> # Scenario 3 - Corruption test
> ## Repeat Scenario 2
> ## Shutdown cluster
> ## Modify data disk data
> ## Restart cluster
> ## Validate error message for read from corrupted data
> ## Validate error message for write to corrupted volume
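The `-u` technique mentioned in the comment above can be sketched as a shell snippet that builds (but does not run) a docker invocation as the invoking user, so files written to bind-mounted host paths keep the host uid/gid. The image name `example/ozone-test` and the mount path are placeholders, not the real test fixtures.

```shell
# Run the test container as the current host user (uid:gid), with the data
# directory mounted read-only, matching the read-only-mount scenario above.
RUN_AS="$(id -u):$(id -g)"
CMD="docker run -u ${RUN_AS} -v /tmp/ozone-data:/data:ro example/ozone-test"
# Print the command instead of executing it, so the sketch needs no docker.
echo "${CMD}"
```

`docker run -u uid:gid` and `-v host:container:ro` are standard docker flags; only the image and paths here are assumptions.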
[jira] [Work logged] (HDDS-1672) Improve locking in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-1672?focusedWorklogId=264007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264007 ]

ASF GitHub Bot logged work on HDDS-1672:

Author: ASF GitHub Bot
Created on: 20/Jun/19 18:27
Start Date: 20/Jun/19 18:27
Worklog Time Spent: 10m

Work Description: arp7 commented on pull request #949: HDDS-1672. Improve locking in OzoneManager.
URL: https://github.com/apache/hadoop/pull/949#discussion_r295930273

## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerLock.java
##

@@ -154,79 +178,137 @@ public void releaseVolumeLock(String volume) {
   }

   /**
-   * Acquires S3 Bucket lock on the given resource.
+   * Acquires bucket lock on the given resource.
    *
    * If the lock is not available then the current thread becomes
-   * disabled for thread scheduling purposes and lies dormant until the lock has
-   * been acquired.
+   * disabled for thread scheduling purposes and lies dormant until the
+   * lock has been acquired.
    *
-   * @param s3BucketName S3Bucket Name on which the lock has to be acquired
+   * @param bucket Bucket on which the lock has to be acquired
    */
-  public void acquireS3Lock(String s3BucketName) {
-    // Calling thread should not hold any bucket lock.
-    // You can take a Volume lock while holding an S3 bucket lock, since
-    // semantically an S3 bucket maps to the ozone volume. So we check here
-    // only if the ozone bucket lock is taken.
-    if (hasAnyBucketLock()) {
+  public void acquireBucketLock(String volume, String bucket) {
+    if (hasAnyUserLock()) {
       throw new RuntimeException(
           "Thread '" + Thread.currentThread().getName() +
-          "' cannot acquire S3 bucket lock while holding Ozone bucket " +
-          "lock(s).");
+          "' cannot acquire bucket lock while holding User lock.");
     }
-    manager.lock(OM_S3_PREFIX + s3BucketName);
-    myLocks.get().get(S3_BUCKET_LOCK).incrementAndGet();
+    manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+    myLocks.get().get(BUCKET_LOCK).incrementAndGet();
   }

   /**
-   * Releases the volume lock on given resource.
+   * Releases the bucket lock on given resource.
    */
-  public void releaseS3Lock(String s3BucketName) {
-    manager.unlock(OM_S3_PREFIX + s3BucketName);
-    myLocks.get().get(S3_BUCKET_LOCK).decrementAndGet();
+  public void releaseBucketLock(String volume, String bucket) {
+    manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
+    myLocks.get().get(BUCKET_LOCK).decrementAndGet();
   }

   /**
-   * Acquires bucket lock on the given resource.
+   * Acquires user lock on the given resource.
    *
    * If the lock is not available then the current thread becomes
    * disabled for thread scheduling purposes and lies dormant until the
    * lock has been acquired.
    *
-   * @param bucket Bucket on which the lock has to be acquired
+   * @param user User on which the lock has to be acquired
    */
-  public void acquireBucketLock(String volume, String bucket) {
-    manager.lock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-    myLocks.get().get(BUCKET_LOCK).incrementAndGet();
+  public void acquireUserLock(String user) {
+    // To avoid maintaining the user names on which we have acquired a lock,
+    // we just check whether a user lock was acquired before. A caller that
+    // wants to lock multiple users should use acquireMultiUserLock. This is
+    // just protection logic, so callers do not use this method when locking
+    // multiple users. Currently, the only use case for this is the setOwner
+    // operation in VolumeManager.
+    if (hasAnyUserLock()) {
+      LOG.error("Already have userLock");
+      throw new RuntimeException("For acquiring lock on multiple users, use " +
+          "acquireMultiLock method");
+    }
+    manager.lock(OM_USER_PREFIX + user);
+    myLocks.get().get(USER_LOCK).incrementAndGet();
   }

   /**
-   * Releases the bucket lock on given resource.
+   * Releases the user lock on given resource.
    */
-  public void releaseBucketLock(String volume, String bucket) {
-    manager.unlock(OM_KEY_PREFIX + volume + OM_KEY_PREFIX + bucket);
-    myLocks.get().get(BUCKET_LOCK).decrementAndGet();
+  public void releaseUserLock(String user) {
+    manager.unlock(OM_USER_PREFIX + user);
+    myLocks.get().get(USER_LOCK).decrementAndGet();
   }

   /**
-   * Returns true if the current thread holds any volume lock.
-   * @return true if current thread holds volume lock, else false
+   * Acquire user lock on 2 users. In this case, we compare 2 strings
+   * lexicographically, and acquire the locks according to the sorted order of
+   * the user names. In this way, when acquiring locks on multiple users, we
+   * can avoid dead locks. This method
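The lexicographic ordering described in the javadoc above (always lock the smaller user name first, so two threads locking the same pair can never deadlock) can be sketched as follows. `MultiUserLockSketch` and its in-memory lock table are illustrative assumptions, not the actual OzoneManagerLock code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of deadlock-free two-user locking via lexicographic ordering.
public class MultiUserLockSketch {
  private static final ConcurrentHashMap<String, ReentrantLock> LOCKS =
      new ConcurrentHashMap<>();

  private static ReentrantLock lockFor(String user) {
    return LOCKS.computeIfAbsent(user, k -> new ReentrantLock());
  }

  // Locks both users in sorted order and returns that order, so the caller
  // can see that the order is independent of the argument order.
  public static String acquireMultiUserLock(String oldOwner, String newOwner) {
    String first = oldOwner.compareTo(newOwner) <= 0 ? oldOwner : newOwner;
    String second = first.equals(oldOwner) ? newOwner : oldOwner;
    lockFor(first).lock();   // smaller name first, on every thread
    lockFor(second).lock();  // ReentrantLock tolerates first.equals(second)
    return first + "," + second;
  }

  public static void releaseMultiUserLock(String oldOwner, String newOwner) {
    String first = oldOwner.compareTo(newOwner) <= 0 ? oldOwner : newOwner;
    String second = first.equals(oldOwner) ? newOwner : oldOwner;
    lockFor(second).unlock(); // release in reverse acquisition order
    lockFor(first).unlock();
  }

  public static void main(String[] args) {
    // Regardless of argument order, "alice" is always locked before "bob".
    System.out.println(acquireMultiUserLock("bob", "alice"));
    releaseMultiUserLock("bob", "alice");
  }
}
```

Because every thread agrees on the same global order, two concurrent setOwner operations on the pair (alice, bob) can never hold one lock each while waiting on the other.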