[jira] [Updated] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HDFS-15266:
    Attachment: HDFS-15266-02.patch

> Add missing DFSOps Statistics in WebHDFS
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-15266-01.patch, HDFS-15266-02.patch
>
> A couple of operations, such as getStoragePolicy, don't increment the
> read/write op counts or DFSOpsCountStatistics.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082808#comment-17082808 ]

Xiaoqiao He commented on HDFS-15274:

Hi [~marvelrock], thanks for involving me here. I triggered Jenkins manually; please refer to https://builds.apache.org/job/PreCommit-HDFS-Build/29155/
Sorry, I am not very familiar with this feature. cc [~cnauroth], [~weichiu], [~elgoiri]: would you like to review? Thanks.

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
> In our federation cluster, we found that the failed volumes were inconsistent
> between two namespaces. The following logs are from the two namespaces (NS)
> respectively.
> NS1 received the failed storage info and removed the blocks associated with
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs failed.
> [INFO] [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3] : Removed blocks associated with storage [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs from DataNode X.X.X.X:50010{code}
> NS2 only received the failed storage info.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes from 0 to 1 {code}
>
> After digging into the code and simulating a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found that the root cause is an inconsistency between StorageReport and
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
>     dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
> ..
> // the disk may fail before the next line executes
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
>     .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
>     volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code}
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve
> this issue.
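To make the race described in HDFS-15274 concrete, here is a minimal, self-contained sketch. It is not Hadoop code: ToyDataset, Heartbeat, buildHeartbeat and isConsistent are hypothetical names chosen for illustration, and the "disk failure" is simulated by a callback that runs between the two reads. It only shows that two separate reads of the same dataset can disagree when a volume fails in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the heartbeat race. Every name here is hypothetical, not a Hadoop class. */
final class HeartbeatRaceSketch {

  /** Stand-in for the DataNode dataset: healthy volumes plus a failed-volume counter. */
  static final class ToyDataset {
    private final List<String> healthy = new ArrayList<>();
    private int failed = 0;

    ToyDataset(List<String> volumes) {
      healthy.addAll(volumes);
    }

    /** Snapshot of the volumes that are currently healthy. */
    synchronized List<String> storageReports() {
      return new ArrayList<>(healthy);
    }

    synchronized int failedVolumes() {
      return failed;
    }

    /** Moves a volume from the healthy set to the failed count. */
    synchronized void failVolume(String volume) {
      if (healthy.remove(volume)) {
        failed++;
      }
    }
  }

  /** One heartbeat payload; the two fields come from two separate reads. */
  static final class Heartbeat {
    final List<String> reportedVolumes;
    final int numFailedVolumes;

    Heartbeat(List<String> reportedVolumes, int numFailedVolumes) {
      this.reportedVolumes = reportedVolumes;
      this.numFailedVolumes = numFailedVolumes;
    }
  }

  /** Builds a heartbeat; betweenReads simulates something happening between the reads. */
  static Heartbeat buildHeartbeat(ToyDataset ds, Runnable betweenReads) {
    List<String> reports = ds.storageReports(); // first read
    betweenReads.run();                         // the disk may fail here
    int failed = ds.failedVolumes();            // second read
    return new Heartbeat(reports, failed);
  }

  /** A heartbeat is consistent if reported + failed equals the configured volume count. */
  static boolean isConsistent(Heartbeat hb, int configuredVolumes) {
    return hb.reportedVolumes.size() + hb.numFailedVolumes == configuredVolumes;
  }

  public static void main(String[] args) {
    ToyDataset ds = new ToyDataset(Arrays.asList("/data0/dfs", "/data1/dfs"));
    Heartbeat hb = buildHeartbeat(ds, () -> ds.failVolume("/data0/dfs"));
    System.out.println("reported=" + hb.reportedVolumes
        + " failed=" + hb.numFailedVolumes
        + " consistent=" + isConsistent(hb, 2));
  }
}
```

In this toy model the receiver sees a failed-volume count of 1 while the report still lists both volumes, which is the kind of mismatch the issue says the NameNode side needs to tolerate.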
[jira] [Comment Edited] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081309#comment-17081309 ]

Toshihiro Suzuki edited comment on HDFS-15217 at 4/14/20, 1:10 AM:

I created a PR for this. After applying this patch, we can see additional information in the lock report message as follows:
{code:java}
2020-04-11 23:04:36,020 [IPC Server handler 5 on default port 62641] INFO namenode.FSNamesystem (FSNamesystemLock.java:writeUnlock(321)) - Number of suppressed write-lock reports: 0
Longest write-lock held at 2020-04-11 23:04:36,020+0900 for 3ms by delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null) via java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:302)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1746)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3274)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1130)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:724)
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1016)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:944)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2948)
Total suppressed write-lock held time: 0.0
{code}
This patch adds the additional information *"by delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null)"* which is similar to the audit log format.

was (Author: brfrn169):
I created a PR for this. After this patch, we can see additional information in the lock report message as follows:
{code:java}
2020-04-11 23:04:36,020 [IPC Server handler 5 on default port 62641] INFO namenode.FSNamesystem (FSNamesystemLock.java:writeUnlock(321)) - Number of suppressed write-lock reports: 0
Longest write-lock held at 2020-04-11 23:04:36,020+0900 for 3ms by delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null) via java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:302)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1746)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3274)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1130)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:724)
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1016)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:944)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2948)
Total suppressed write-lock held time: 0.0
{code}
This patch adds the additional information *"by delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null)"* which is similar to the audit log format.

> Add more information to longest write/read lock held log
>
> Key: HDFS-15217
> URL: https://issues.apache.org/jira/browse/HDFS-15217
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Toshihiro Suzuki
> Assignee: Toshihiro Suzuki
> Priority: Major
>
> Currently, we can see the stack trace in the longest write/read lock held
> log, but
[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082766#comment-17082766 ]

HuangTao commented on HDFS-15274:

cc [~hexiaoqiao] could you help me trigger Jenkins again?

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
[jira] [Commented] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082762#comment-17082762 ]

Hadoop QA commented on HDFS-15266:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 0s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 53s | trunk passed |
| +1 | compile | 4m 12s | trunk passed |
| +1 | checkstyle | 1m 1s | trunk passed |
| +1 | mvnsite | 2m 3s | trunk passed |
| +1 | shadedclient | 17m 13s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 59s | trunk passed |
| +1 | javadoc | 1m 9s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 43s | the patch passed |
| +1 | compile | 3m 14s | the patch passed |
| -1 | javac | 3m 14s | hadoop-hdfs-project generated 5 new + 743 unchanged - 5 fixed = 748 total (was 748) |
| -0 | checkstyle | 0m 51s | hadoop-hdfs-project: The patch generated 5 new + 155 unchanged - 0 fixed = 160 total (was 155) |
| +1 | mvnsite | 1m 42s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 12s | the patch passed |
| +1 | javadoc | 1m 4s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 1m 52s | hadoop-hdfs-client in the patch failed. |
| +1 | unit | 105m 12s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 188m 31s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15266 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12999828/HDFS-15266-01.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a2c7f27832d0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3edbe87 |
| maven |
[jira] [Updated] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HDFS-15266:
    Status: Patch Available (was: Open)

> Add missing DFSOps Statistics in WebHDFS
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-15266-01.patch
>
> A couple of operations, such as getStoragePolicy, don't increment the
> read/write op counts or DFSOpsCountStatistics.
[jira] [Commented] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082643#comment-17082643 ]

Ayush Saxena commented on HDFS-15266:

I was trying to sync with {{DistributedFileSystem}}. It seems DFS was missing some ops that are present in {{WebHdfsFileSystem}}, so I added them there too.
Hopefully I haven't missed any.

> Add missing DFSOps Statistics in WebHDFS
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-15266-01.patch
>
> A couple of operations, such as getStoragePolicy, don't increment the
> read/write op counts or DFSOpsCountStatistics.
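As a rough illustration of what HDFS-15266 is about, the sketch below shows the general pattern of bumping both an aggregate read/write counter and a per-operation counter at the start of every client call. It is a toy, not the Hadoop implementation: OpStatsSketch, its OpType enum, record() and the hard-coded "HOT" return value are all assumptions made for this example.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Toy client that counts operations. All names are hypothetical, not Hadoop APIs. */
final class OpStatsSketch {

  /** Illustrative operation types; the real list is much longer. */
  enum OpType { GET_STORAGE_POLICY, SET_STORAGE_POLICY }

  private final AtomicLong readOps = new AtomicLong();
  private final AtomicLong writeOps = new AtomicLong();
  private final Map<OpType, AtomicLong> perOp = new EnumMap<>(OpType.class);

  OpStatsSketch() {
    for (OpType type : OpType.values()) {
      perOp.put(type, new AtomicLong());
    }
  }

  /** Single place that updates both the aggregate and the per-operation counter. */
  private void record(OpType op, boolean isWrite) {
    (isWrite ? writeOps : readOps).incrementAndGet();
    perOp.get(op).incrementAndGet();
  }

  /** A read operation: forgetting the record() call here is the kind of gap the issue describes. */
  String getStoragePolicy(String path) {
    record(OpType.GET_STORAGE_POLICY, false);
    return "HOT"; // placeholder result for the sketch
  }

  /** A write operation, counted the same way. */
  void setStoragePolicy(String path, String policy) {
    record(OpType.SET_STORAGE_POLICY, true);
  }

  long readOps() {
    return readOps.get();
  }

  long writeOps() {
    return writeOps.get();
  }

  long count(OpType op) {
    return perOp.get(op).get();
  }

  public static void main(String[] args) {
    OpStatsSketch fs = new OpStatsSketch();
    fs.getStoragePolicy("/file");
    fs.setStoragePolicy("/file", "COLD");
    System.out.println("readOps=" + fs.readOps() + " writeOps=" + fs.writeOps());
  }
}
```

Routing every operation through one helper such as record() makes a missing counter easier to spot in review than scattered increments would.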
[jira] [Updated] (HDFS-15266) Add missing DFSOps Statistics in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HDFS-15266:
    Attachment: HDFS-15266-01.patch

> Add missing DFSOps Statistics in WebHDFS
>
> Key: HDFS-15266
> URL: https://issues.apache.org/jira/browse/HDFS-15266
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-15266-01.patch
>
> A couple of operations, such as getStoragePolicy, don't increment the
> read/write op counts or DFSOpsCountStatistics.
[jira] [Commented] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true
[ https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082597#comment-17082597 ]

Íñigo Goiri commented on HDFS-15275:

Can you make the indentation consistent with what was there before?
Everything else makes sense.

> HttpFS: Response of Create was not correct with noredirect and data are true
>
> Key: HDFS-15275
> URL: https://issues.apache.org/jira/browse/HDFS-15275
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Major
> Attachments: HDFS-15275.001.patch
[jira] [Commented] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true
[ https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082497#comment-17082497 ]

Hadoop QA commented on HDFS-15275:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 48s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 2s | trunk passed |
| +1 | compile | 0m 22s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 31s | trunk passed |
| +1 | shadedclient | 15m 5s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 32s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 21s | the patch passed |
| +1 | compile | 0m 17s | the patch passed |
| +1 | javac | 0m 17s | the patch passed |
| -0 | checkstyle | 0m 18s | hadoop-hdfs-project/hadoop-hdfs-httpfs: The patch generated 2 new + 296 unchanged - 46 fixed = 298 total (was 342) |
| +1 | mvnsite | 0m 25s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 17s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 36s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 4m 28s | hadoop-hdfs-httpfs in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 60m 37s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15275 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12999804/HDFS-15275.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 3958ef638ea7 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/29153/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29153/testReport/ |
| Max. process+thread count | 632 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-httpfs U: hadoop-hdfs-project/hadoop-hdfs-httpfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29153/console |
| Powered by | Apache
[jira] [Updated] (HDFS-15276) Concat on INodeRefernce fails with illegal state exception
[ https://issues.apache.org/jira/browse/HDFS-15276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hemanthboyina updated HDFS-15276:
    Description:
Performing a concat operation on an INodeReference throws an IllegalStateException.

In verifySrcFiles, the source inode is converted to an INodeFile:
{code:java}
final INode srcINode = iip.getLastINode();
final INodeFile srcINodeFile = INodeFile.valueOf(srcINode, src);{code}
If this INode is an INodeReference, Preconditions.checkState fails in INodeDirectory#removeChild, because the child held by the directory is the reference, whereas the object passed in is the INodeFile it was converted to:
{code:java}
// INodeDirectory#removeChild
final INode removed = children.remove(i);
Preconditions.checkState(removed == child); {code}

> Concat on INodeRefernce fails with illegal state exception
>
> Key: HDFS-15276
> URL: https://issues.apache.org/jira/browse/HDFS-15276
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Major
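The identity check described in HDFS-15276 can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not the NameNode code: Node, FileNode, RefNode, asFile and removeChild are hypothetical stand-ins for INode, INodeFile, INodeReference, INodeFile.valueOf and INodeDirectory#removeChild, and a plain IllegalStateException replaces the Guava Preconditions.checkState call.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the removeChild identity check. Names are hypothetical, not HDFS classes. */
final class ReferenceRemovalSketch {

  /** Minimal inode: just a name. */
  static class Node {
    final String name;

    Node(String name) {
      this.name = name;
    }
  }

  /** Stand-in for a plain file inode. */
  static final class FileNode extends Node {
    FileNode(String name) {
      super(name);
    }
  }

  /** Stand-in for a reference inode that wraps a file and shares its name. */
  static final class RefNode extends Node {
    final FileNode referred;

    RefNode(FileNode referred) {
      super(referred.name);
      this.referred = referred;
    }
  }

  /** Unwraps a reference to the file it points at, as a valueOf-style conversion would. */
  static FileNode asFile(Node node) {
    return node instanceof RefNode ? ((RefNode) node).referred : (FileNode) node;
  }

  /** Looks the child up by name, removes it, then insists it is the very same object. */
  static void removeChild(List<Node> children, Node child) {
    int index = -1;
    for (int i = 0; i < children.size(); i++) {
      if (children.get(i).name.equals(child.name)) {
        index = i;
        break;
      }
    }
    if (index < 0) {
      return; // nothing with that name in this directory
    }
    Node removed = children.remove(index);
    if (removed != child) {
      throw new IllegalStateException("removed node is not the node that was passed in");
    }
  }

  public static void main(String[] args) {
    List<Node> children = new ArrayList<>();
    RefNode ref = new RefNode(new FileNode("f"));
    children.add(ref);
    try {
      removeChild(children, asFile(ref)); // passes the unwrapped file, not the reference
    } catch (IllegalStateException e) {
      System.out.println("failed as described: " + e.getMessage());
    }
  }
}
```

Passing the reference itself (the object the directory actually holds) satisfies the check in this model; passing the unwrapped file does not, because the lookup is by name but the check is by identity.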
[jira] [Created] (HDFS-15276) Concat on INodeRefernce fails with illegal state exception
hemanthboyina created HDFS-15276:

    Summary: Concat on INodeRefernce fails with illegal state exception
    Key: HDFS-15276
    URL: https://issues.apache.org/jira/browse/HDFS-15276
    Project: Hadoop HDFS
    Issue Type: Bug
    Reporter: hemanthboyina
    Assignee: hemanthboyina
[jira] [Updated] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true
[ https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hemanthboyina updated HDFS-15275:
    Attachment: HDFS-15275.001.patch
    Status: Patch Available (was: Open)

> HttpFS: Response of Create was not correct with noredirect and data are true
>
> Key: HDFS-15275
> URL: https://issues.apache.org/jira/browse/HDFS-15275
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Major
> Attachments: HDFS-15275.001.patch
[jira] [Created] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true
hemanthboyina created HDFS-15275:

    Summary: HttpFS: Response of Create was not correct with noredirect and data are true
    Key: HDFS-15275
    URL: https://issues.apache.org/jira/browse/HDFS-15275
    Project: Hadoop HDFS
    Issue Type: Bug
    Reporter: hemanthboyina
    Assignee: hemanthboyina
[jira] [Commented] (HDFS-1820) FTPFileSystem attempts to close the outputstream even when it is not initialised
[ https://issues.apache.org/jira/browse/HDFS-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082355#comment-17082355 ]

Hadoop QA commented on HDFS-1820:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 42s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 32s | trunk passed |
| +1 | compile | 15m 20s | trunk passed |
| +1 | checkstyle | 0m 47s | trunk passed |
| +1 | mvnsite | 1m 19s | trunk passed |
| +1 | shadedclient | 15m 27s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 59s | trunk passed |
| +1 | javadoc | 1m 0s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 50s | the patch passed |
| +1 | compile | 14m 22s | the patch passed |
| +1 | javac | 14m 22s | the patch passed |
| +1 | checkstyle | 0m 46s | the patch passed |
| +1 | mvnsite | 1m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 13m 8s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 4s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 9m 7s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 49s | The patch does not generate ASF License warnings. |
| | | 102m 40s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-1820 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12999777/HDFS-1820.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle |
| uname | Linux 3aa101c0a22d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29152/testReport/ |
| Max. process+thread count | 1519 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29152/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> FTPFileSystem attempts
[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082348#comment-17082348 ]

HuangTao commented on HDFS-15274:

All of the failed test cases pass locally.

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
> In our federation cluster, we found that the set of failed volumes was
> inconsistent between two namespaces. The following logs are from the two NSs
> respectively.
> NS1 received the failed storage info and removed the blocks associated with
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs failed.
> [INFO] [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3] : Removed blocks associated with storage [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs from DataNode X.X.X.X:50010{code}
> NS2 only received the failed storage count.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes from 0 to 1 {code}
>
> After digging into the code and simulating a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found that the root cause is the inconsistency between StorageReport and
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
>     dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
> ..
> // the disk may fail before executing the next line
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
>     .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
>     volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code}
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve
> this issue.
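The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the attached patch and not Hadoop code: the class `HeartbeatRaceSketch`, its method `onHeartbeat`, and the `tolerant` flag are all hypothetical, and the model assumes the NameNode only looks for vanished storages when the failed-volume count rises. Under that assumption, a heartbeat whose storage report was read before the disk failed but whose failure count was read after it "uses up" the count change, and the next heartbeat never triggers the cleanup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the heartbeat race; every name here is hypothetical, not a Hadoop API. */
class HeartbeatRaceSketch {
  private final Set<String> knownStorages = new HashSet<>();
  private final boolean tolerant;
  private int lastFailedCount = 0;

  HeartbeatRaceSketch(boolean tolerant, String... storages) {
    this.tolerant = tolerant;
    knownStorages.addAll(Arrays.asList(storages));
  }

  /** Processes one heartbeat and returns the storages pruned as failed. */
  Set<String> onHeartbeat(Set<String> reported, int failedCount) {
    // Assumed baseline rule: only look for failed storages when the count rises.
    boolean check = failedCount > lastFailedCount;
    // One possible form of tolerance (an assumption, not the patch's logic):
    // also look when a known storage is missing from the report.
    if (tolerant && !reported.containsAll(knownStorages)) {
      check = true;
    }
    lastFailedCount = failedCount;

    Set<String> pruned = new HashSet<>();
    if (check) {
      pruned.addAll(knownStorages);
      pruned.removeAll(reported);      // anything known but not reported is failed
      knownStorages.removeAll(pruned);
    }
    return pruned;
  }
}
```

Feeding the model a first heartbeat of `{A, B}` with a failure count of 1, then `{B}` with a count of 1, leaves storage `A` in place when `tolerant` is false and prunes it on the second heartbeat when `tolerant` is true, which matches the NS2 symptom in the logs.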
[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082347#comment-17082347 ]

HuangTao commented on HDFS-15274:

retrigger QA
[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082307#comment-17082307 ]

Hadoop QA commented on HDFS-15274:

| -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 34s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 42s | trunk passed |
| +1 | compile | 1m 3s | trunk passed |
| +1 | checkstyle | 0m 45s | trunk passed |
| +1 | mvnsite | 1m 10s | trunk passed |
| +1 | shadedclient | 15m 5s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 39s | trunk passed |
| +1 | javadoc | 0m 45s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| +1 | javac | 0m 54s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 12s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 34s | the patch passed |
| +1 | javadoc | 0m 42s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 95m 54s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 157m 37s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.hdfs.TestFileCreation |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15274 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12999761/HDFS-15274.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1e1f35842a9d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/29151/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29151/testReport/ |
| Max. process+thread count | 4358 (vs. ulimit of 5500) |
| modules | C:
[jira] [Updated] (HDFS-1820) FTPFileSystem attempts to close the outputstream even when it is not initialised
[ https://issues.apache.org/jira/browse/HDFS-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Pryakhin updated HDFS-1820:
Attachment: HDFS-1820.002.patch
Status: Patch Available (was: In Progress)

> FTPFileSystem attempts to close the outputstream even when it is not initialised
>
> Key: HDFS-1820
> URL: https://issues.apache.org/jira/browse/HDFS-1820
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 0.20.1
> Environment: occurs on all platforms
> Reporter: Sudharsan Sampath
> Assignee: Mikhail Pryakhin
> Priority: Major
> Labels: hadoop
> Attachments: HDFS-1820.001.patch, HDFS-1820.002.patch
>
> FTPFileSystem's create method attempts to close the output stream even when it
> is not initialized, causing a null pointer exception. In our case the Apache
> Commons FTPClient was not able to create the destination file due to a
> permissions issue. The FTPClient promptly reported a 553 (permissions issue),
> but it was overlooked in the FTPFileSystem create method.
> The following code fails
> if (!FTPReply.isPositivePreliminary(client.getReplyCode())) {
>   // The ftpClient is an inconsistent state. Must close the stream
>   // which in turn will logout and disconnect from FTP server
>   fos.close();
>   throw new IOException("Unable to create file: " + file + ", Aborting");
> }
> because 'fos' is null. As a result, a null pointer exception is reported
> instead of the proper error message "Unable to create file XXX".
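The failure mode in the issue text can be reproduced without an FTP server. The sketch below is a hypothetical stand-in, not the attached patch: `NullSafeCloseSketch` and `checkCreated` are made-up names, and the boolean `positivePreliminary` stands in for `FTPReply.isPositivePreliminary(client.getReplyCode())`. It only shows that guarding the `close()` call lets the intended `IOException` surface when the stream was never opened.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for the failing branch of the create method. */
class NullSafeCloseSketch {
  /**
   * Returns the stream if the server accepted the request; otherwise closes
   * the stream when there is one and reports the failure.
   */
  static OutputStream checkCreated(OutputStream fos, boolean positivePreliminary,
      String file) throws IOException {
    if (!positivePreliminary) {
      // fos can be null when the server refused to open the file (e.g. a 553 reply),
      // so only close it if it exists.
      if (fos != null) {
        fos.close();
      }
      throw new IOException("Unable to create file: " + file + ", Aborting");
    }
    return fos;
  }
}
```

Since the original comment says that closing the stream is what logs out and disconnects from the server, a real fix would presumably also need to disconnect the client on the null path; that part is not modeled here.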
[jira] [Updated] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

HuangTao updated HDFS-15274:
Attachment: HDFS-15274.002.patch
[jira] [Work started] (HDFS-1820) FTPFileSystem attempts to close the outputstream even when it is not initialised
[ https://issues.apache.org/jira/browse/HDFS-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDFS-1820 started by Mikhail Pryakhin.
[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082151#comment-17082151 ]

Hadoop QA commented on HDFS-15274:

| -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 47s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 58s | trunk passed |
| +1 | compile | 1m 3s | trunk passed |
| +1 | checkstyle | 0m 43s | trunk passed |
| +1 | mvnsite | 1m 7s | trunk passed |
| +1 | shadedclient | 16m 10s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 49s | trunk passed |
| +1 | javadoc | 0m 40s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 3s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -0 | checkstyle | 0m 38s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 7 new + 121 unchanged - 0 fixed = 128 total (was 121) |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 15s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 56s | the patch passed |
| +1 | javadoc | 0m 36s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 105m 49s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 171m 13s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | HDFS-15274 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12999743/HDFS-15274.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 94502884026d 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d49229 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/29150/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/29150/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29150/testReport/ |
| Max. process+thread count | 3280 (vs. ulimit of 5500) |
|
[jira] [Commented] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082132#comment-17082132 ]

HuangTao commented on HDFS-15274:

[~daryn] please help to take a look.
[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-14476:
Target Version/s: (was: 3.4.0)
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to branch-2.10. Thanks, [~seanlook]. We should set "Fix Version/s" after the patch is committed.

> lock too long when fix inconsistent blocks between disk and in-memory
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.6.0, 2.7.0, 3.0.3
> Reporter: Sean Chow
> Assignee: Sean Chow
> Priority: Major
> Fix For: 3.3.0, 2.10.1
> Attachments: HDFS-14476-branch-2.01.patch,
> HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch,
> HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch,
> HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
> When DirectoryScanner finds differences between on-disk and in-memory
> blocks, it runs {{checkAndUpdate}} to fix them. However,
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> I have about 6 million blocks on every datanode, and each 6-hour scan finds
> about 25000 abnormal blocks to fix. That leads to the lock on the
> FsDatasetImpl object being held for a long time.
> Assuming every block needs 10 ms to fix (because of SAS disk latency), the
> whole run takes 250 seconds. That means all reads and writes on that datanode
> are blocked for about 4 minutes.
>
> {code:java}
> 2019-05-06 08:06:51,704 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing metadata files:23574, missing block files:23574, missing blocks in memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are
> blocked, and the namenode sees a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *How to fix:*
> Just as invalidate commands from the namenode are processed in batches of
> 1000, these abnormal blocks should be fixed in batches too, sleeping 2 seconds
> between batches to allow normal block reads and writes.
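The batching idea proposed above can be sketched as follows. This is an illustration under assumptions, not the committed patch: `BatchedReconcileSketch`, its `datasetLock`, and the counting `checkAndUpdate` stub are hypothetical. With the issue's own numbers (25000 blocks at an assumed 10 ms each), a single lock hold lasts about 250 s; with batches of 1000 each hold drops to about 10 s, at the cost of 24 pauses of 2 s (48 s) added to the total run.

```java
import java.util.List;

/** Hypothetical sketch of fixing scan differences in batches with pauses in between. */
class BatchedReconcileSketch {
  private final Object datasetLock = new Object(); // stands in for the dataset-wide lock
  int fixedBlocks = 0;
  int pauses = 0;

  void reconcile(List<String> diffs, int batchSize, long pauseMillis)
      throws InterruptedException {
    for (int start = 0; start < diffs.size(); start += batchSize) {
      int end = Math.min(start + batchSize, diffs.size());
      // Hold the lock only for one batch, not for the whole diff list.
      synchronized (datasetLock) {
        for (String blockId : diffs.subList(start, end)) {
          checkAndUpdate(blockId);
        }
      }
      // Pause with the lock released so readers and writers can make progress.
      if (end < diffs.size()) {
        pauses++;
        Thread.sleep(pauseMillis);
      }
    }
  }

  /** Stub: the real method would repair one block; here it only counts. */
  private void checkAndUpdate(String blockId) {
    fixedBlocks++;
  }
}
```

Calling `reconcile(diffs, 1000, 2000L)` on a list of 2500 entries would process three batches and pause twice. The batch size and pause length are the values suggested in the issue text; whether they are configurable in the real fix is not stated here.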
[jira] [Commented] (HDFS-15048) Fix findbug in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082108#comment-17082108 ]

Masatake Iwasaki commented on HDFS-15048:

+1 on the HDFS-15048-branch-2.10.002.patch.

> Fix findbug in DirectoryScanner
>
> Key: HDFS-15048
> URL: https://issues.apache.org/jira/browse/HDFS-15048
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Takanobu Asanuma
> Assignee: Masatake Iwasaki
> Priority: Major
> Fix For: 3.3.0, 2.10.1
> Attachments: HDFS-15048-branch-2.10.002.patch, HDFS-15048.001.patch
>
> There is a FindBugs warning in DirectoryScanner.
> {noformat}
> Multithreaded correctness Warnings
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls Thread.sleep() with a lock held
> Bug type SWL_SLEEP_WITH_LOCK_HELD
> In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner
> In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile()
> At DirectoryScanner.java:[line 441]
> {noformat}
> https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
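The warning text above describes a thread sleeping while it still holds a lock. A minimal sketch of that shape and of one way to avoid it is given below. The class and method names are hypothetical, a `ReentrantLock` is used only because it lets the sketch observe whether the lock is held at sleep time, and nothing here is taken from `DirectoryScanner` itself or from the attached patch.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical contrast between sleeping inside and outside a critical section. */
class SleepOutsideLockSketch {
  private final ReentrantLock lock = new ReentrantLock();
  boolean heldWhileSleeping = false;

  /** The shape the warning describes: the pause happens before the lock is released. */
  void sleepsWithLock(long millis) throws InterruptedException {
    lock.lock();
    try {
      doWork();
      pause(millis);
    } finally {
      lock.unlock();
    }
  }

  /** Restructured: release the lock first, then pause. */
  void sleepsWithoutLock(long millis) throws InterruptedException {
    lock.lock();
    try {
      doWork();
    } finally {
      lock.unlock();
    }
    pause(millis);
  }

  private void pause(long millis) throws InterruptedException {
    // Record whether the calling thread still owns the lock while it sleeps.
    heldWhileSleeping = lock.isHeldByCurrentThread();
    Thread.sleep(millis);
  }

  private void doWork() {
    // Placeholder for the work that actually needs the lock.
  }
}
```

After `sleepsWithLock(1)` the `heldWhileSleeping` field is true, and after `sleepsWithoutLock(1)` it is false; other threads waiting on the lock are only delayed by the sleep in the first case.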
[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082107#comment-17082107 ]

Masatake Iwasaki commented on HDFS-14476:
-----------------------------------------

+1 on HDFS-14476-branch-2.10.02.patch. Committing this and HDFS-15048-branch-2.10.002.patch to branch-2.10.

> lock too long when fix inconsistent blocks between disk and in-memory
> ---------------------------------------------------------------------
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.6.0, 2.7.0, 3.0.3
> Reporter: Sean Chow
> Assignee: Sean Chow
> Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner finds differences between on-disk and in-memory blocks, it runs {{checkAndUpdate}} to fix them. However, {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> I have about 6 million blocks on every datanode, and each 6-hour scan finds about 25000 abnormal blocks to fix. That means the lock on the FsDatasetImpl object is held for a long time.
> Assume every block needs 10 ms to fix (because of SAS disk latency): the whole run then takes 250 seconds. That means all reads and writes on that datanode are blocked for about 4 minutes.
>
> {code:java}
> 2019-05-06 08:06:51,704 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing metadata files:23574, missing block files:23574, missing blocks in memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 588402ms to process 1 commands from NN
> {code}
> Processing the command from the NN takes a long time because threads are blocked, and the namenode sees a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed with a batch size of 1000, fixing these abnormal blocks should be done in batches too, sleeping 2 seconds between batches to allow normal block reads and writes.
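The batching proposed in the "how to fix" paragraph above could look roughly like the following. This is a minimal sketch, not the code from any attached patch: the class `BatchedReconcile`, its `datasetLock` field and the `fixOne` callback are hypothetical stand-ins for the FsDatasetImpl monitor and {{checkAndUpdate}}. The point it illustrates is that the lock is taken once per batch and released before the pause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch only; not the real DirectoryScanner or FsDatasetImpl code. */
final class BatchedReconcile {
    // Stand-in for the FsDatasetImpl monitor that checkAndUpdate synchronizes on.
    private final Object datasetLock = new Object();
    private final int batchSize;
    private final long pauseMillis;

    BatchedReconcile(int batchSize, long pauseMillis) {
        this.batchSize = batchSize;
        this.pauseMillis = pauseMillis;
    }

    /**
     * Applies fixOne to every entry, holding the lock for one batch at a time.
     * Returns the number of batches that were processed.
     */
    <T> int reconcile(List<T> diffs, Consumer<T> fixOne) {
        int batches = 0;
        for (int start = 0; start < diffs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, diffs.size());
            synchronized (datasetLock) {
                for (T diff : diffs.subList(start, end)) {
                    fixOne.accept(diff);
                }
            }
            batches++;
            if (end < diffs.size()) {
                try {
                    // The lock is NOT held here, so readers and writers can proceed.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return batches; // stop early, leaving the remaining diffs unfixed
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> diffs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            diffs.add(i);
        }
        List<Integer> fixed = new ArrayList<>();
        int batches = new BatchedReconcile(1000, 1).reconcile(diffs, fixed::add);
        System.out.println(batches + " batches, " + fixed.size() + " blocks fixed");
    }
}
```

With the figures from the description (10 ms per block, batches of 1000), each lock hold would last about 10 seconds instead of 250, at the cost of 24 pauses of 2 seconds over the 25 batches needed for 25000 blocks.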
[jira] [Commented] (HDFS-15048) Fix findbug in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082085#comment-17082085 ]

Masatake Iwasaki commented on HDFS-15048:
-----------------------------------------

[~seanlook] I'm going to commit both HDFS-14476-branch-2.10.02.patch and HDFS-15048-branch-2.10.002.patch after testing them locally. Thanks.

> Fix findbug in DirectoryScanner
> -------------------------------
>
> Key: HDFS-15048
> URL: https://issues.apache.org/jira/browse/HDFS-15048
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Takanobu Asanuma
> Assignee: Masatake Iwasaki
> Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-15048-branch-2.10.002.patch, HDFS-15048.001.patch
>
>
> There is a FindBugs warning in DirectoryScanner.
> {noformat}
> Multithreaded correctness Warnings
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls Thread.sleep() with a lock held
> Bug type SWL_SLEEP_WITH_LOCK_HELD
> In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner
> In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile()
> At DirectoryScanner.java:[line 441]
> {noformat}
> https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
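For readers unfamiliar with SWL_SLEEP_WITH_LOCK_HELD, the sketch below contrasts the shape that the warning describes with one common way to avoid it. It is an illustration under assumptions, not the change made in either HDFS-15048 patch: the class `SleepOutsideLock` and its methods are hypothetical, and `Thread.holdsLock` is used only to make the difference observable.

```java
/** Hypothetical illustration of SWL_SLEEP_WITH_LOCK_HELD; not DirectoryScanner code. */
final class SleepOutsideLock {
    private final Object lock = new Object();
    private int processed;

    /** Flagged shape: the pause happens while the monitor is still held. */
    boolean stepSleepingInsideLock(long pauseMillis) {
        synchronized (lock) {
            processed++;
            boolean heldDuringPause = Thread.holdsLock(lock);
            pause(pauseMillis); // other threads cannot enter the lock while we sleep
            return heldDuringPause;
        }
    }

    /** One way to avoid the warning: finish the guarded work, release, then pause. */
    boolean stepSleepingOutsideLock(long pauseMillis) {
        synchronized (lock) {
            processed++;
        }
        boolean heldDuringPause = Thread.holdsLock(lock);
        pause(pauseMillis); // the lock is free for other threads during the sleep
        return heldDuringPause;
    }

    int processed() {
        synchronized (lock) {
            return processed;
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        SleepOutsideLock s = new SleepOutsideLock();
        System.out.println("inside: " + s.stepSleepingInsideLock(1)
            + ", outside: " + s.stepSleepingOutsideLock(1));
        // prints "inside: true, outside: false"
    }
}
```

Throttling by sleeping is only useful to other threads if the lock is released first, which is why the two issues are being committed to branch-2.10 together.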