[jira] [Commented] (HDDS-339) Add block length and blockId in PutKeyResponse
[ https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574324#comment-16574324 ] Mukul Kumar Singh commented on HDDS-339: Thanks for working on this [~shashikant]. Please find my review comments below. 1) DatanodeContainerProtocol.proto:311, getCommittedBlockLength -> committedBlockLength 2) KeyValueHandler:43, unused import 3) KeyValueHandler:438, I feel the length can be returned as part of commitKey; once the key has been committed successfully, the length can be returned. 4) KeyUtils:156, there is an unused variable here; I feel this can be removed 5) KeyUtils:140, putKeyResposne -> putKeyResponse 6) KeyUtils:130, returns putkey response success 7) KeyUtils:134, getPutKeyResponseSuccess -> putKeyResponseSuccess 8) KeyUtils:194, msg is an unused field > Add block length and blockId in PutKeyResponse > -- > > Key: HDDS-339 > URL: https://issues.apache.org/jira/browse/HDDS-339 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode > Reporter: Shashikant Banerjee > Assignee: Shashikant Banerjee > Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-339.00.patch > > > The putKey response will include the blockId as well as the committed block > length. This will be extended to include the blockCommitSequenceId as well, > all of which will be updated on the Ozone Master. This will all be required > to add validation as well as to handle 2-node failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
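The shape of the enriched response discussed above can be pictured with a small, self-contained sketch. This is hypothetical code: PutKeyResponse and commitKey here are illustrative stand-ins, not the actual Ozone protobuf messages or handler API.

```java
// Hypothetical sketch of what the enriched PutKey response could carry,
// per this discussion: the blockId and the committed block length
// (later to be extended with a blockCommitSequenceId). All names here
// are illustrative, not the real DatanodeContainerProtocol.proto types.
public class PutKeyResponseSketch {
    static class PutKeyResponse {
        final long blockId;
        final long committedBlockLength;
        PutKeyResponse(long blockId, long committedBlockLength) {
            this.blockId = blockId;
            this.committedBlockLength = committedBlockLength;
        }
    }

    // Per review comment 3: report the length as part of committing the
    // key, once the commit has succeeded, rather than via a separate call.
    static PutKeyResponse commitKey(long blockId, long bytesWritten) {
        // ... commit the key durably, then report what was written ...
        return new PutKeyResponse(blockId, bytesWritten);
    }

    public static void main(String[] args) {
        PutKeyResponse r = commitKey(7L, 4096L);
        System.out.println(r.blockId + " " + r.committedBlockLength);
    }
}
```

The point of returning both fields together is that the Ozone Master can validate what the datanode actually committed without a second round trip.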
[jira] [Commented] (HDFS-13668) FSPermissionChecker may throws AIOOE when check if inode has permission
[ https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574300#comment-16574300 ] He Xiaoqiao commented on HDFS-13668: Thanks [~drankye], [~shashikant] for your suggestions; I have submitted v002 following your advice and triggered Jenkins. > FSPermissionChecker may throws AIOOE when check if inode has permission > --- > > Key: HDFS-13668 > URL: https://issues.apache.org/jira/browse/HDFS-13668 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 3.1.0, 2.10.0, 2.7.7 > Reporter: He Xiaoqiao > Assignee: He Xiaoqiao > Priority: Major > Attachments: HDFS-13668-trunk.001.patch, HDFS-13668-trunk.002.patch > > > {{FSPermissionChecker}} may throw {{ArrayIndexOutOfBoundsException: 0}} when > checking permissions, since it only checks whether the inode's {{aclFeature}} > is null but does not check its entry size. When {{aclFeature}} is not null > but its entry size is 0, it throws the AIOOE. > {code:java} > private boolean hasPermission(INodeAttributes inode, FsAction access) { > .. > final AclFeature aclFeature = inode.getAclFeature(); > if (aclFeature != null) { > // It's possible that the inode has a default ACL but no access ACL. > int firstEntry = aclFeature.getEntryAt(0); > if (AclEntryStatusFormat.getScope(firstEntry) == AclEntryScope.ACCESS) { > return hasAclPermission(inode, access, mode, aclFeature); > } > } > .. > } > {code} > With the default {{INodeAttributeProvider}} it is guaranteed that when an > {{inode}}'s aclFeature is not null its entry size is greater than 0, but > {{INodeAttributeProvider}} is a public interface, so we cannot ensure that > external implementations (e.g. Apache Sentry, Apache Ranger) enforce the > similar constraint.
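The fix this issue asks for amounts to one extra guard before reading entry 0. A minimal sketch, assuming an illustrative AclFeature stand-in (class and method names here are not Hadoop's real ones):

```java
// Hypothetical reconstruction of the guarded check discussed in this
// issue: before reading entry 0 of an AclFeature, verify the feature
// actually contains entries. An external INodeAttributeProvider may
// return a non-null AclFeature with zero entries.
public class AclGuardSketch {
    // Stand-in for the namenode's AclFeature (entries packed as ints).
    static class AclFeature {
        private final int[] entries;
        AclFeature(int[] entries) { this.entries = entries; }
        int getEntriesSize() { return entries.length; }
        int getEntryAt(int i) { return entries[i]; }
    }

    // Returns true when ACL-based checking should be used. The extra
    // getEntriesSize() > 0 guard is what prevents the
    // ArrayIndexOutOfBoundsException on an empty AclFeature.
    static boolean shouldUseAclCheck(AclFeature aclFeature) {
        if (aclFeature != null && aclFeature.getEntriesSize() > 0) {
            int firstEntry = aclFeature.getEntryAt(0);
            return firstEntry >= 0; // placeholder for the ACCESS-scope test
        }
        return false; // fall back to plain permission-mode checking
    }

    public static void main(String[] args) {
        // Empty AclFeature: previously the getEntryAt(0) read threw AIOOE.
        System.out.println(shouldUseAclCheck(new AclFeature(new int[0])));
        System.out.println(shouldUseAclCheck(new AclFeature(new int[]{7})));
        System.out.println(shouldUseAclCheck(null));
    }
}
```

With the guard in place, an empty AclFeature simply falls back to the mode-bits check instead of throwing.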
[jira] [Updated] (HDFS-13668) FSPermissionChecker may throws AIOOE when check if inode has permission
[ https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-13668: --- Attachment: HDFS-13668-trunk.002.patch
[jira] [Updated] (HDDS-314) ozoneShell putKey command overwrites the existing key having same name
[ https://issues.apache.org/jira/browse/HDDS-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-314: Attachment: HDDS-314.002.patch > ozoneShell putKey command overwrites the existing key having same name > -- > > Key: HDDS-314 > URL: https://issues.apache.org/jira/browse/HDDS-314 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Critical > Fix For: 0.2.1 > > Attachments: HDDS-314.001.patch, HDDS-314.002.patch > > > steps taken : > 1) created a volume root-volume and a bucket root-bucket. > 2) Ran following command to put a key with name 'passwd' > > {noformat} > hadoop@08315aa4b367:~/bin$ ./ozone oz -putKey /root-volume/root-bucket/passwd > -file /etc/services -v > 2018-08-02 09:20:17 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Volume Name : root-volume > Bucket Name : root-bucket > Key Name : passwd > File Hash : 567c100888518c1163b3462993de7d47 > 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.rpc.type = GRPC (default) > 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 > ms (default) > 2018-08-02 09:20:18 INFO ConfUtils:41 - > raft.client.async.outstanding-requests.max = 100 (default) > 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.async.scheduler-threads = > 3 (default) > 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB > (=1048576) (default) > 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.rpc.request.timeout = > 3000 ms (default) > Aug 02, 2018 9:20:18 AM > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy > > {noformat} > 3) Ran following command to put a key with 
name 'passwd' again. > {noformat} > hadoop@08315aa4b367:~/bin$ ./ozone oz -putKey /root-volume/root-bucket/passwd > -file /etc/passwd -v > 2018-08-02 09:20:41 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Volume Name : root-volume > Bucket Name : root-bucket > Key Name : passwd > File Hash : b056233571cc80d6879212911cb8e500 > 2018-08-02 09:20:41 INFO ConfUtils:41 - raft.rpc.type = GRPC (default) > 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 > ms (default) > 2018-08-02 09:20:42 INFO ConfUtils:41 - > raft.client.async.outstanding-requests.max = 100 (default) > 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.async.scheduler-threads = > 3 (default) > 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB > (=1048576) (default) > 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.rpc.request.timeout = > 3000 ms (default) > Aug 02, 2018 9:20:42 AM > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl > detectProxy{noformat} > > Key 'passwd' was overwritten with the new content, and no error was thrown > saying that the key is already present. > Expectation : > --- > Overwriting an existing key with the same name should not be allowed.
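The expected behavior can be sketched with a tiny stand-in key store. This is hypothetical code: KeyStore and putKey here are illustrative, not the actual Ozone client or KeyManager API, and the exception type is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the behavior this issue asks for: putKey should
// fail when a key with the same name already exists, instead of silently
// overwriting it.
public class PutKeySketch {
    static class KeyStore {
        private final Map<String, byte[]> keys = new HashMap<>();

        void putKey(String name, byte[] data) {
            // Guard: reject the write rather than overwrite in place.
            if (keys.containsKey(name)) {
                throw new IllegalStateException("Key already exists: " + name);
            }
            keys.put(name, data);
        }
    }

    public static void main(String[] args) {
        KeyStore store = new KeyStore();
        store.putKey("passwd", new byte[]{1});
        try {
            store.putKey("passwd", new byte[]{2}); // second write must fail
            System.out.println("overwrite allowed");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A real fix would likely also want an explicit overwrite/force flag so intentional replacement stays possible, but the default should surface the conflict.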
[jira] [Created] (HDFS-13808) [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function
Rakesh R created HDFS-13808: --- Summary: [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function Key: HDFS-13808 URL: https://issues.apache.org/jira/browse/HDFS-13808 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R
[jira] [Commented] (HDFS-13668) FSPermissionChecker may throws AIOOE when check if inode has permission
[ https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574260#comment-16574260 ] Shashikant Banerjee commented on HDFS-13668: Thanks [~hexiaoqiao] for reporting and working on this. The patch looks good to me overall. Some minor comments: # There are some whitespace issues reported. Please fix them. # In TestINodeAttributeProvider#testAclFeature, we can remove this code as it may not be required: {code:java} fs.rename(aclChildDir, aclChildDirTarget{code} I am +1 after that.
[jira] [Commented] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574255#comment-16574255 ] chencan commented on HDFS-13758: Hi [~jojochuang], I have submitted the branch-2 patch. Thanks! > DatanodeManager should throw exception if it has BlockRecoveryCommand but the > block is not under construction > - > > Key: HDFS-13758 > URL: https://issues.apache.org/jira/browse/HDFS-13758 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Affects Versions: 3.0.0-alpha1 > Reporter: Wei-Chiu Chuang > Assignee: chencan > Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, > HDFS-13758.branch-2.patch > > > In Hadoop 3, HDFS-8909 added an assertion assuming that if a > BlockRecoveryCommand exists for a block, the block is under construction. > > {code:title=DatanodeManager#getBlockRecoveryCommand()} > BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length); > for (BlockInfo b : blocks) { > BlockUnderConstructionFeature uc = b.getUnderConstructionFeature(); > assert uc != null; > ... > {code} > This assertion accidentally fixed one of the possible scenarios of HDFS-10240 > data corruption: a recoverLease() immediately followed by a > close(), before DataNodes have had a chance to heartbeat. 
> In a unit test you'll get: > {noformat} > 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN ipc.Server > (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call > Call#41 Retry#0 > org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from > 127.0.0.1:57903 > java.lang.AssertionError > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {noformat} > I propose to change this assertion even though it addresses the data > corruption, because: > # We should throw a more meaningful exception than an NPE > # On a production cluster, the assert is ignored, and you'll get a more > noticeable NPE. Future HDFS developers might fix this NPE, causing a > regression. 
An NPE is typically not captured and handled, so there's a chance > it results in internal state inconsistency. > # It doesn't address all possible scenarios of HDFS-10240. A proper fix > should reject close() if the block is being recovered.
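The proposed change — replacing the silent assert with a descriptive exception — can be sketched as follows. This is hypothetical code: BlockInfo here is a minimal stand-in, not Hadoop's real class, and the method name is illustrative.

```java
import java.io.IOException;

// Hypothetical sketch of the proposal above: instead of `assert uc != null`
// (a no-op in production unless -ea is set, later surfacing as an NPE),
// throw a descriptive IOException when a block handed to block recovery
// is not under construction.
public class RecoveryCheckSketch {
    // Minimal stand-in for the namenode's BlockInfo.
    static class BlockInfo {
        private final Object ucFeature; // null when not under construction
        private final long blockId;
        BlockInfo(long blockId, Object ucFeature) {
            this.blockId = blockId;
            this.ucFeature = ucFeature;
        }
        Object getUnderConstructionFeature() { return ucFeature; }
        long getBlockId() { return blockId; }
    }

    static void checkRecoverable(BlockInfo b) throws IOException {
        Object uc = b.getUnderConstructionFeature();
        if (uc == null) {
            // Meaningful, always-on failure instead of assert/NPE.
            throw new IOException("Cannot recover block " + b.getBlockId()
                + ": it is not under construction");
        }
    }

    public static void main(String[] args) {
        try {
            checkRecoverable(new BlockInfo(42L, null));
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Unlike `assert`, the explicit check fires identically in tests and in production, so the inconsistent state is reported at the point it is detected.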
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Attachment: (was: HDFS-13758.branch-2.patch)
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Attachment: HDFS-13758.branch-2.patch
[jira] [Commented] (HDFS-13668) FSPermissionChecker may throws AIOOE when check if inode has permission
[ https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574231#comment-16574231 ] Kai Zheng commented on HDFS-13668: -- This looks like a good catch and fix. +1 from me. Thanks Xiaoqiao. [~jojochuang], would you mind also giving it a look? Thanks.
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Status: Open (was: Patch Available) > DatanodeManager should throw exception if it has BlockRecoveryCommand but the > block is not under construction > - > > Key: HDFS-13758 > URL: https://issues.apache.org/jira/browse/HDFS-13758 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Assignee: chencan >Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, > HDFS-13758.branch-2.patch > > > In Hadoop 3, HDFS-8909 added an assertion that if a > BlockRecoveryCommand exists for a block, the block is under construction. > > {code:title=DatanodeManager#getBlockRecoveryCommand()} > BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length); > for (BlockInfo b : blocks) { > BlockUnderConstructionFeature uc = b.getUnderConstructionFeature(); > assert uc != null; > ... > {code} > This assertion accidentally fixed one of the possible scenarios of HDFS-10240 > data corruption, where a recoverLease() is immediately followed by a > close(), before DataNodes have a chance to heartbeat.
> In a unit test you'll get: > {noformat} > 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN ipc.Server > (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call > Call#41 Retry#0 > org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from > 127.0.0.1:57903 > java.lang.AssertionError > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {noformat} > I propose to change this assertion even though it addresses the data > corruption, because: > # We should throw a more meaningful exception than an NPE > # On a production cluster, the assert is ignored, and you'll get a more > noticeable NPE. Future HDFS developers might fix this NPE, causing a > regression.
An NPE is typically not captured and handled, so there's a chance > it results in internal state inconsistency. > # It doesn't address all possible scenarios of HDFS-10240. A proper fix > should reject close() if the block is being recovered. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
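The first point of the proposal above — replace the silent {{assert}} with a descriptive exception, since asserts are disabled on production clusters — can be sketched with a toy model. The names below are illustrative stand-ins, and an unchecked exception is used for brevity where the real fix would presumably throw an {{IOException}}:

```java
// Toy model of rejecting a recovery command for a block that is not
// under construction, instead of relying on `assert uc != null;`.
public class RecoverySketch {
    // Stand-in for a block that may or may not carry the
    // under-construction feature. Names are illustrative.
    static class BlockInfo {
        private final Object ucFeature;
        BlockInfo(Object ucFeature) { this.ucFeature = ucFeature; }
        Object getUnderConstructionFeature() { return ucFeature; }
    }

    // An assert compiles to a no-op without -ea; an explicit exception
    // that names the offending block always fires and is catchable.
    static void checkUnderConstruction(BlockInfo b) {
        if (b.getUnderConstructionFeature() == null) {
            throw new IllegalStateException(
                "Block " + b + " has a BlockRecoveryCommand"
                + " but is not under construction");
        }
    }

    public static void main(String[] args) {
        checkUnderConstruction(new BlockInfo(new Object())); // ok, no throw
        try {
            checkUnderConstruction(new BlockInfo(null));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```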
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Attachment: (was: HDFS-13758.branch-2.patch)
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Attachment: (was: HDFS-13758.branch-2.patch)
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Attachment: HDFS-13758.branch-2.patch
[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chencan updated HDFS-13758: --- Attachment: HDFS-13758.branch-2.patch
[jira] [Commented] (HDFS-13805) Journal Nodes should allow to format non-empty directories with "-force" option
[ https://issues.apache.org/jira/browse/HDFS-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574224#comment-16574224 ] Surendra Singh Lilhore commented on HDFS-13805: --- Hi [~arpitagarwal], when we want to recover the NN with an old backed-up fsimage, we need to reinitialize the shared edits to avoid a mismatch between the fsimage and the edit logs. > Journal Nodes should allow to format non-empty directories with "-force" > option > --- > > Key: HDFS-13805 > URL: https://issues.apache.org/jira/browse/HDFS-13805 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 3.0.0-alpha4 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > > HDFS-2 completely restricted re-formatting a journalnode, but it should be > allowed when the *"-force"* option is given. If the user feels the force option can > accidentally delete data, he can disable it by configuring > "*dfs.reformat.disabled*" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
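The gating this issue proposes — refuse to format a non-empty directory unless "-force" was passed, and let the proposed "dfs.reformat.disabled" key veto even that — reduces to a small predicate. This is only a sketch of that decision logic under those assumptions, not the actual JournalNode code:

```java
// Sketch of the reformat gating proposed in HDFS-13805. The
// "dfs.reformat.disabled" switch is the key proposed in the issue;
// everything else here is an illustrative stand-in.
public class ReformatGuard {
    static boolean canFormat(boolean dirEmpty, boolean force,
                             boolean reformatDisabled) {
        if (dirEmpty) {
            return true; // formatting an empty directory is always allowed
        }
        // Non-empty directory: require -force, unless the admin has
        // disabled destructive reformatting entirely.
        return force && !reformatDisabled;
    }

    public static void main(String[] args) {
        System.out.println(canFormat(false, true, false));  // -force given
        System.out.println(canFormat(false, true, true));   // vetoed by config
        System.out.println(canFormat(false, false, false)); // no -force
    }
}
```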
[jira] [Created] (HDFS-13807) Large overhead when seek and read only a small piece from a file
Jack Fan created HDFS-13807: --- Summary: Large overhead when seek and read only a small piece from a file Key: HDFS-13807 URL: https://issues.apache.org/jira/browse/HDFS-13807 Project: Hadoop HDFS Issue Type: Bug Components: datanode, hdfs-client Affects Versions: 2.7.6, 2.8.4 Environment: HDFS server is 2.8.2 HDFS client is 2.7.1 I use `pyarrow` with both `libhdfs` and `libhdfs3`, and I observe the same behavior with both drivers. Reporter: Jack Fan I'm storing small files (~500KB in size) in big file chunks (256MB~2GB) in HDFS. I then maintain a separate index file recording the offset and length of the small files within those file chunks. When I randomly read those small files, for each small file I open the corresponding file chunk, seek to the `offset`, and read `length` bytes. However, I noticed that when I read a small piece of data (say, 500KB), the datanode transfers more data (~4MB) than that to the HDFS client. I originally thought this was the readahead feature on the datanode, which sends more data to the client in advance to speed up streaming of a file. However, I tried setting `dfs.client.cache.readahead` to 0 in the client configuration but the behavior still persists. I also used `tcpdump` to capture packets and discovered that the datanode keeps sending data after the HDFS client closes the TCP connection for the rpc (I observed a bunch of RST packets sent out by the HDFS client). It seems the datanode spontaneously sends more data than requested to the HDFS client; I want to know how to stop this behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
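The access pattern the reporter describes — open the chunk file, seek to the indexed offset, read exactly `length` bytes — can be modeled locally with a plain file. This sketch only mirrors the client-side read pattern (file names and sizes are made up); it says nothing about the datanode's readahead behavior, which is what the issue is actually about:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PreadSketch {
    // Write a throwaway "chunk" file for the demo.
    static File writeTempChunk(byte[] data) {
        try {
            File f = File.createTempFile("chunk", ".bin");
            f.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(data);
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read exactly `length` bytes at `offset` from a big chunk file,
    // mirroring the index-driven small-file reads described above.
    static byte[] readPiece(File chunk, long offset, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(chunk, "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 4 filler bytes, a 7-byte "small file", 4 filler bytes.
        File chunk = writeTempChunk(
            "....payload....".getBytes(StandardCharsets.US_ASCII));
        byte[] piece = readPiece(chunk, 4, 7);
        System.out.println(new String(piece, StandardCharsets.US_ASCII));
    }
}
```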
[jira] [Commented] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
[ https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574175#comment-16574175 ] Fei Hui commented on HDFS-13802: [~elgoiri] Thanks. I will try to implement the router fsck > RBF: Remove FSCK from Router Web UI, because fsck is not supported currently > > > Key: HDFS-13802 > URL: https://issues.apache.org/jira/browse/HDFS-13802 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.1, 3.0.3 >Reporter: Fei Hui >Priority: Major > Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch > > > When I click FSCK under Utilities on the Router Web UI, I get errors > {quote} > HTTP ERROR 404 > Problem accessing /fsck. Reason: > NOT_FOUND > Powered by Jetty:// > {quote} > I dug into the source code and found that fsck is not supported currently, so > I think we should remove FSCK from the Router Web UI -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer
[ https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574163#comment-16574163 ] Virajith Jalaparti commented on HDFS-13795: --- Failed tests in the last run seem unrelated. [~ehiggs] can you look at [^HDFS-13795.004.patch] ? Thanks! > Fix potential NPE in InMemoryLevelDBAliasMapServer > -- > > Key: HDFS-13795 > URL: https://issues.apache.org/jira/browse/HDFS-13795 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Major > Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, > HDFS-13795.003.patch, HDFS-13795.004.patch > > > Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when > it is configured incorrectly. > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.
[ https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574081#comment-16574081 ] Erik Krogen commented on HDFS-13779: Attaching a WIP patch which covers this and HDFS-13780, and relies on some more refactoring in ConfiguredFailoverProxyProvider. The idea is basically to continue allowing the CFPP layer to manage only the Active/Standby NameNodes, and ORPP to manage the observers (then fall back to CFPP's non-observer proxies). _Failover_ refers only to switching Active NameNodes, _not_ switching between observers. * Refactor CFPP a bit to create a separate {{getProxies()}} method where the initialization can be done (solving HDFS-13780), and also where it can be overridden by ORPP. * On proxy initialization, also fetch all of the NameNode states. When CFPP requests proxies, filter to only non-observers. * Method invocation, on read methods, tries all of the NNs thought to currently be in observer state. If any throws a StandbyException, mark it as non-observer. Unfortunately we have no way to tell here if one of the thought-to-be-observers has actually become Active, but in this case failover will happen soon (at the next write request) and the situation will be fixed (see below). * If all observer NNs fail, or it is a write method, pass the request up to CFPP, which will try the current Active. This may trigger failover. If so, before picking a new node from the list of non-observers, refresh the states of all of the NameNodes. This handles the case where one of the previous observers is now active. [~shv], [~chliang], let me know your thoughts on the above. > Implement performFailover logic for ObserverReadProxyProvider. 
> -- > > Key: HDFS-13779 > URL: https://issues.apache.org/jira/browse/HDFS-13779 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Konstantin Shvachko >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-13779-HDFS-12943.WIP00.patch > > > Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method > from {{ConfiguredFailoverProxyProvider}}, which simply increments the index > and switches over to another NameNode. The logic for ORPP should be smart > enough to choose another observer, otherwise it can switch to a SBN, where > reads are disallowed, or to an ANN, which defeats the purpose of reads from > standby. > This was discussed in HDFS-12976. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
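The invocation order described in the bullets above can be sketched as a small loop; this is a hypothetical simplification (the enum, names, and exception type are assumptions, not the real ObserverReadProxyProvider code): try each proxy currently believed to be an observer, demote it on a standby-style failure, and fall back to the active proxy when all observers fail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the read path described above. The real ORPP works
// on RPC proxies and StandbyException; here a plain Function and
// IllegalStateException stand in for them.
public class ObserverReadSketch {
    enum HAState { ACTIVE, STANDBY, OBSERVER }

    final Map<String, HAState> states = new LinkedHashMap<>();

    String invokeRead(Function<String, String> rpc) {
        // 1. Try every NN currently believed to be an observer.
        for (Map.Entry<String, HAState> e : states.entrySet()) {
            if (e.getValue() != HAState.OBSERVER) continue;
            try {
                return rpc.apply(e.getKey());
            } catch (IllegalStateException standbyLike) {
                e.setValue(HAState.STANDBY); // demote: no longer an observer
            }
        }
        // 2. All observers failed: fall through to the active NN (CFPP layer),
        //    which may in turn trigger a real failover.
        for (Map.Entry<String, HAState> e : states.entrySet()) {
            if (e.getValue() == HAState.ACTIVE) return rpc.apply(e.getKey());
        }
        throw new IllegalStateException("no usable NameNode proxy");
    }
}
```

The refresh-states-on-failover step from the last bullet is omitted here; it would re-populate the map before step 2 picks a new active.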
[jira] [Updated] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.
[ https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-13779: --- Attachment: HDFS-13779-HDFS-12943.WIP00.patch > Implement performFailover logic for ObserverReadProxyProvider. > -- > > Key: HDFS-13779 > URL: https://issues.apache.org/jira/browse/HDFS-13779 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Konstantin Shvachko >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-13779-HDFS-12943.WIP00.patch > > > Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method > from {{ConfiguredFailoverProxyProvider}}, which simply increments the index > and switches over to another NameNode. The logic for ORPP should be smart > enough to choose another observer, otherwise it can switch to a SBN, where > reads are disallowed, or to an ANN, which defeats the purpose of reads from > standby. > This was discussed in HDFS-12976. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574055#comment-16574055 ] Hudson commented on HDDS-267: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14733 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14733/]) HDDS-267. Handle consistency issues during container update/close. (hanishakoneru: rev d81cd3611a449bcd7970ff2f1392a5e868e28f7e) * (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestContainerPersistence.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java * (edit) hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java * (edit) hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainer.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerData.java > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During update operation, if the on-disk update of .container file fails, then > the in-memory container metadata is also reset to the old value. 
> During close operation, if the on-disk update of .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574046#comment-16574046 ] Hanisha Koneru commented on HDDS-267: - Test failures are unrelated. Committed to trunk. Thanks [~bharatviswa] and [~arpitagarwal] for the reviews. > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During update operation, if the on-disk update of .container file fails, then > the in-memory container metadata is also reset to the old value. > During close operation, if the on-disk update of .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-267: Resolution: Fixed Status: Resolved (was: Patch Available) > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During update operation, if the on-disk update of .container file fails, then > the in-memory container metadata is also reset to the old value. > During close operation, if the on-disk update of .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574012#comment-16574012 ] genericqa commented on HDDS-267: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 6s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 36s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} container-service in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 0s{color} | {color:red} integration-test in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}129m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.ozone.scm.TestXceiverClientManager | | | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.web.client.TestKeys | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HDDS-267 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934874/HDDS-267.005.patch | | Optional Tests | asflicense
[jira] [Commented] (HDFS-13749) Implement a new client protocol method to get NameNode state
[ https://issues.apache.org/jira/browse/HDFS-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574011#comment-16574011 ] Chao Sun commented on HDFS-13749: - OK, I'll file a separate JIRA for this then. > Implement a new client protocol method to get NameNode state > > > Key: HDFS-13749 > URL: https://issues.apache.org/jira/browse/HDFS-13749 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13749-HDFS-12943.000.patch > > > Currently {{HAServiceProtocol#getServiceStatus}} requires super user > privilege. Therefore, as a temporary solution, in HDFS-12976 we discover > NameNode state by calling {{reportBadBlocks}}. Here, we'll properly implement > this by adding a new method in client protocol to get the NameNode state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13749) Implement a new client protocol method to get NameNode state
[ https://issues.apache.org/jira/browse/HDFS-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574008#comment-16574008 ] Konstantin Shvachko commented on HDFS-13749: We could have reused this jira, but it probably makes sense to create a new one to obtain wider visibility to the change. > Implement a new client protocol method to get NameNode state > > > Key: HDFS-13749 > URL: https://issues.apache.org/jira/browse/HDFS-13749 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13749-HDFS-12943.000.patch > > > Currently {{HAServiceProtocol#getServiceStatus}} requires super user > privilege. Therefore, as a temporary solution, in HDFS-12976 we discover > NameNode state by calling {{reportBadBlocks}}. Here, we'll properly implement > this by adding a new method in client protocol to get the NameNode state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
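For illustration, the new client protocol call proposed above might take roughly this shape — a hedged sketch only, since the actual method name, enum, and signature were settled in the follow-up JIRA and live in the Hadoop HA APIs:

```java
import java.io.IOException;

// Hypothetical shape of the proposed ClientProtocol addition: an unprivileged
// call that lets a proxy provider discover whether a NameNode is currently
// active, standby, or observer. All names here are assumptions.
interface HAStateProtocolSketch {
    enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    /** Unlike HAServiceProtocol#getServiceStatus, requires no superuser privilege. */
    HAServiceState getHAServiceState() throws IOException;
}
```

The key property is on the access-control side: the method carries no state-changing semantics, so it can be exposed to ordinary clients, removing the need for the reportBadBlocks workaround from HDFS-12976.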
[jira] [Updated] (HDFS-12416) BlockPlacementPolicyDefault will cause NN shutdown if log level is changed
[ https://issues.apache.org/jira/browse/HDFS-12416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12416: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Thanks [~brahmareddy], resolving this as a dup. Thanks [~smao]. > BlockPlacementPolicyDefault will cause NN shutdown if log level is changed > -- > > Key: HDFS-12416 > URL: https://issues.apache.org/jira/browse/HDFS-12416 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.7.4, 3.0.0-alpha3 >Reporter: Suhan Mao >Priority: Major > Attachments: HDFS-12416.001.patch, HDFS-12416.patch > > Original Estimate: 5h > Remaining Estimate: 5h > > In the BlockPlacementPolicyDefault.chooseRandom method, > the code has the following structure: > {code:java} > StringBuilder builder = null; > if (LOG.isDebugEnabled()) { > builder = debugLoggingBuilder.get(); > builder.setLength(0); > builder.append("["); > } > while(numOfReplicas > 0){ > . > chooseDataNode(scope, excludedNodes) > . > if (LOG.isDebugEnabled()) { > builder.append("\nNode ").append(NodeBase.getPath(chosenNode)) > .append(" ["); > } > } > {code} > There's a possibility that the log level is INFO before entering the while loop, > but is changed to DEBUG inside the loop through the web UI. > In that case, builder is never initialized, a > NullPointerException is thrown, and the NameNode exits. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
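A common fix for this class of bug is to read LOG.isDebugEnabled() once and use that snapshot for both the initialization and the appends, so a mid-loop level change can never leave builder null on the append path. A self-contained sketch under that assumption (the volatile flag stands in for the live log level; this is not the actual HDFS patch):

```java
// Illustrative fix sketch: snapshot the debug flag once so the
// "initialize builder" check and the "append" check can never disagree,
// even if the log level flips from INFO to DEBUG mid-loop via the web UI.
public class DebugSnapshotSketch {
    static volatile boolean debugEnabled = false; // stands in for LOG.isDebugEnabled()

    static String chooseRandom(int numOfReplicas) {
        final boolean debug = debugEnabled; // read the level exactly once
        StringBuilder builder = null;
        if (debug) {
            builder = new StringBuilder("[");
        }
        while (numOfReplicas > 0) {
            debugEnabled = true; // simulate the web UI enabling DEBUG mid-loop
            if (debug) { // uses the snapshot, so builder is non-null whenever this runs
                builder.append("\nNode node-").append(numOfReplicas).append(" [");
            }
            numOfReplicas--;
        }
        return builder == null ? "" : builder.toString();
    }
}
```

With the snapshot, the mid-loop flip is simply ignored until the next call, which is the usual accepted trade-off for guarded-logging patterns.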
[jira] [Commented] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release
[ https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573943#comment-16573943 ] Arpit Agarwal commented on HDDS-341: Makes sense. > HDDS/Ozone bits are leaking into Hadoop release > --- > > Key: HDDS-341 > URL: https://issues.apache.org/jira/browse/HDDS-341 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Anu Engineer >Priority: Blocker > Fix For: 0.2.1 > > > [~aw] in the Ozone release discussion reported that Ozone is leaking bits > into Hadoop. This has to be fixed before Hadoop 3.2 or Ozone 0.2.1 release. > I will make this a release blocker for Ozone. > > {noformat} > >Has anyone verified that a Hadoop release doesn't have _any_ of the extra > >ozone bits that are sprinkled outside the maven modules? > [aengineer] : As far as I know that is the state, we have had multiple Hadoop > releases after ozone has been merged. So far no one has reported Ozone bits > leaking into Hadoop. If we find something like that, it would be a bug. > [aw]: There hasn't been a release from a branch where Ozone has been merged > yet. The first one will be 3.2.0. Running create-release off of trunk > presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in > the Hadoop source tar ball. > So, consider this as a report. IMHO, cutting an Ozone release prior to > a Hadoop release ill-advised given the distribution impact and the > requirements of the merge vote. > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release
[ https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573942#comment-16573942 ] Anu Engineer commented on HDDS-341: --- [~arpitagarwal] You are right, but I filed it here since it is a work item in the HDDS/Ozone. I am hoping that if we do this work, then it will not be a blocker for Hadoop. > HDDS/Ozone bits are leaking into Hadoop release > --- > > Key: HDDS-341 > URL: https://issues.apache.org/jira/browse/HDDS-341 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Anu Engineer >Priority: Blocker > Fix For: 0.2.1 > > > [~aw] in the Ozone release discussion reported that Ozone is leaking bits > into Hadoop. This has to be fixed before Hadoop 3.2 or Ozone 0.2.1 release. > I will make this a release blocker for Ozone. > > {noformat} > >Has anyone verified that a Hadoop release doesn't have _any_ of the extra > >ozone bits that are sprinkled outside the maven modules? > [aengineer] : As far as I know that is the state, we have had multiple Hadoop > releases after ozone has been merged. So far no one has reported Ozone bits > leaking into Hadoop. If we find something like that, it would be a bug. > [aw]: There hasn't been a release from a branch where Ozone has been merged > yet. The first one will be 3.2.0. Running create-release off of trunk > presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in > the Hadoop source tar ball. > So, consider this as a report. IMHO, cutting an Ozone release prior to > a Hadoop release ill-advised given the distribution impact and the > requirements of the merge vote. > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release
[ https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573940#comment-16573940 ] Arpit Agarwal commented on HDDS-341: This should probably be moved to the Hadoop project and tagged as a blocker for Apache Hadoop 3.2.0. > HDDS/Ozone bits are leaking into Hadoop release > --- > > Key: HDDS-341 > URL: https://issues.apache.org/jira/browse/HDDS-341 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Anu Engineer >Priority: Blocker > Fix For: 0.2.1 > > > [~aw] in the Ozone release discussion reported that Ozone is leaking bits > into Hadoop. This has to be fixed before Hadoop 3.2 or Ozone 0.2.1 release. > I will make this a release blocker for Ozone. > > {noformat} > >Has anyone verified that a Hadoop release doesn't have _any_ of the extra > >ozone bits that are sprinkled outside the maven modules? > [aengineer] : As far as I know that is the state, we have had multiple Hadoop > releases after ozone has been merged. So far no one has reported Ozone bits > leaking into Hadoop. If we find something like that, it would be a bug. > [aw]: There hasn't been a release from a branch where Ozone has been merged > yet. The first one will be 3.2.0. Running create-release off of trunk > presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in > the Hadoop source tar ball. > So, consider this as a report. IMHO, cutting an Ozone release prior to > a Hadoop release ill-advised given the distribution impact and the > requirements of the merge vote. > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release
[ https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-341: --- Fix Version/s: 0.2.1 > HDDS/Ozone bits are leaking into Hadoop release > --- > > Key: HDDS-341 > URL: https://issues.apache.org/jira/browse/HDDS-341 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Anu Engineer >Priority: Blocker > Fix For: 0.2.1 > > > [~aw] in the Ozone release discussion reported that Ozone is leaking bits > into Hadoop. This has to be fixed before Hadoop 3.2 or Ozone 0.2.1 release. > I will make this a release blocker for Ozone. > > {noformat} > >Has anyone verified that a Hadoop release doesn't have _any_ of the extra > >ozone bits that are sprinkled outside the maven modules? > [aengineer] : As far as I know that is the state, we have had multiple Hadoop > releases after ozone has been merged. So far no one has reported Ozone bits > leaking into Hadoop. If we find something like that, it would be a bug. > [aw]: There hasn't been a release from a branch where Ozone has been merged > yet. The first one will be 3.2.0. Running create-release off of trunk > presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in > the Hadoop source tar ball. > So, consider this as a report. IMHO, cutting an Ozone release prior to > a Hadoop release ill-advised given the distribution impact and the > requirements of the merge vote. > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573922#comment-16573922 ] Bharat Viswanadham commented on HDDS-267: - +1, Pending Jenkins. > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During update operation, if the on-disk update of .container file fails, then > the in-memory container metadata is also reset to the old value. > During close operation, if the on-disk update of .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release
[ https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDDS-341: -- Priority: Blocker (was: Major) > HDDS/Ozone bits are leaking into Hadoop release > --- > > Key: HDDS-341 > URL: https://issues.apache.org/jira/browse/HDDS-341 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Anu Engineer >Priority: Blocker > > [~aw] in the Ozone release discussion reported that Ozone is leaking bits > into Hadoop. This has to be fixed before Hadoop 3.2 or Ozone 0.2.1 release. > I will make this a release blocker for Ozone. > > {noformat} > >Has anyone verified that a Hadoop release doesn't have _any_ of the extra > >ozone bits that are sprinkled outside the maven modules? > [aengineer] : As far as I know that is the state, we have had multiple Hadoop > releases after ozone has been merged. So far no one has reported Ozone bits > leaking into Hadoop. If we find something like that, it would be a bug. > [aw]: There hasn't been a release from a branch where Ozone has been merged > yet. The first one will be 3.2.0. Running create-release off of trunk > presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in > the Hadoop source tar ball. > So, consider this as a report. IMHO, cutting an Ozone release prior to > a Hadoop release ill-advised given the distribution impact and the > requirements of the merge vote. > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-284) CRC for ChunksData
[ https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573908#comment-16573908 ] genericqa commented on HDDS-284: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 35m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 35m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 35m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 35m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s{color} | {color:green} common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} container-service in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 35s{color} | {color:green} ozone-manager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green}
[jira] [Commented] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction
[ https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573907#comment-16573907 ] Wei-Chiu Chuang commented on HDFS-13758: LGTM. Would you also contribute a branch-2 fix? > DatanodeManager should throw exception if it has BlockRecoveryCommand but the > block is not under construction > - > > Key: HDFS-13758 > URL: https://issues.apache.org/jira/browse/HDFS-13758 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Assignee: chencan >Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch > > > In Hadoop 3, HDFS-8909 added an assertion assuming that if a > BlockRecoveryCommand exists for a block, the block is under construction. > > {code:title=DatanodeManager#getBlockRecoveryCommand()} > BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length); > for (BlockInfo b : blocks) { > BlockUnderConstructionFeature uc = b.getUnderConstructionFeature(); > assert uc != null; > ... > {code} > This assertion accidentally fixed one of the possible scenarios of HDFS-10240 > data corruption, where a recoverLease() is immediately followed by a > close(), before DataNodes have the chance to heartbeat. 
> In a unit test you'll get: > {noformat} > 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN ipc.Server > (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call > Call#41 Retry#0 > org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from > 127.0.0.1:57903 > java.lang.AssertionError > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {noformat} > I propose to change this assertion even though it addresses the data > corruption, because: > # We should throw a more meaningful exception than an NPE > # on a production cluster, the assert is ignored, and you'll get a more > noticeable NPE. Future HDFS developers might fix this NPE, causing > regression. 
An NPE is typically not captured and handled, so there's a chance > it will result in internal state inconsistency. > # It doesn't address all possible scenarios of HDFS-10240. A proper fix > should reject close() if the block is being recovered. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
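The change proposed above, replacing the bare assert with an explicit check that throws a meaningful exception, can be sketched in isolation. The class and method names below are hypothetical stand-ins, not the actual DatanodeManager code or the HDFS-13758 patch:

```java
// Hypothetical sketch of the guard pattern: an explicit check that fails the
// same way whether or not the JVM runs with -ea, unlike a bare `assert`.
import java.io.IOException;

public class RecoveryGuard {
    // Stand-in for getUnderConstructionFeature(); null means the block is
    // not under construction.
    static Object getUnderConstructionFeature(boolean underConstruction) {
        return underConstruction ? new Object() : null;
    }

    static void checkUnderConstruction(boolean underConstruction) throws IOException {
        Object uc = getUnderConstructionFeature(underConstruction);
        // Before: `assert uc != null;` -- silently skipped in production.
        // After: an exception that names the invariant being violated.
        if (uc == null) {
            throw new IOException("Skipping recovery: block is not under construction");
        }
    }

    public static void main(String[] args) throws IOException {
        checkUnderConstruction(true);   // passes: block is under construction
        try {
            checkUnderConstruction(false);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The point of the pattern is that the heartbeat handler can catch and log the IOException instead of letting an AssertionError (or, with asserts disabled, a later NPE) escape.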
[jira] [Created] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release
Anu Engineer created HDDS-341: - Summary: HDDS/Ozone bits are leaking into Hadoop release Key: HDDS-341 URL: https://issues.apache.org/jira/browse/HDDS-341 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Manager Reporter: Anu Engineer [~aw] in the Ozone release discussion reported that Ozone is leaking bits into Hadoop. This has to be fixed before Hadoop 3.2 or Ozone 0.2.1 release. I will make this a release blocker for Ozone. {noformat} >Has anyone verified that a Hadoop release doesn't have _any_ of the extra >ozone bits that are sprinkled outside the maven modules? [aengineer] : As far as I know that is the state, we have had multiple Hadoop releases after ozone has been merged. So far no one has reported Ozone bits leaking into Hadoop. If we find something like that, it would be a bug. [aw]: There hasn't been a release from a branch where Ozone has been merged yet. The first one will be 3.2.0. Running create-release off of trunk presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in the Hadoop source tar ball. So, consider this as a report. IMHO, cutting an Ozone release prior to a Hadoop release ill-advised given the distribution impact and the requirements of the merge vote. {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block
[ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573905#comment-16573905 ] Wei-Chiu Chuang commented on HDFS-10240: Additionally I found through HDFS-13757 that the test should also disable IBR report so it doesn't become flaky. This test file https://issues.apache.org/jira/secure/attachment/12932515/HDFS-13757.test.02.patch has an example to disable IBR. > Race between close/recoverLease leads to missing block > -- > > Key: HDFS-10240 > URL: https://issues.apache.org/jira/browse/HDFS-10240 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: Jinglun >Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, > HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240.test.patch > > > We got a missing block in our cluster, and logs related to the missing block > are as follows: > 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 > blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > recovery started, > primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] > 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* > 
NameSystem.internalReleaseLease: File XX has not been closed. Lease > recovery is in progress. RecoveryId = 153006357 for block > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,248 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, > primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > has not reached minimal replication 1 > 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.53:11402 is added to > blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > size 139 > 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:08,808 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, > newgenerationstamp=153006357, 
newlength=139, newtargets=[10.114.6.14:11402, > 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false) > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK >
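The corrupt-replica messages in the log above come down to a generation-stamp comparison: once the block is COMPLETE, a replica reported with a genstamp that differs from the one in the block map is added to the corrupt-replicas map. A minimal, hypothetical sketch of that decision (not the actual BlockManager code):

```java
// Hypothetical simplification of the decision behind the log line
// "block is COMPLETE and reported genstamp X does not match genstamp
// in block map Y". Only the comparison is modeled here.
public class GenStampCheck {
    static boolean isCorrupt(long reportedGenStamp, long storedGenStamp, boolean complete) {
        // A mismatch only marks the replica corrupt once the block is COMPLETE.
        return complete && reportedGenStamp != storedGenStamp;
    }

    public static void main(String[] args) {
        // The genstamps from the log above: 153006357 reported vs 153006345 stored.
        System.out.println(isCorrupt(153006357L, 153006345L, true));   // flagged corrupt
        System.out.println(isCorrupt(153006345L, 153006345L, true));   // matches, not corrupt
    }
}
```

In the race described in the issue, all three replicas carry the recovery's new genstamp while the block map still holds the old one, so every replica gets flagged and the block goes missing.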
[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer
[ https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573903#comment-16573903 ] genericqa commented on HDFS-13795: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}139m 5s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy | | | hadoop.hdfs.web.TestWebHdfsFileSystemContract | | | hadoop.hdfs.TestLeaseRecovery2 | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HDFS-13795 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934864/HDFS-13795.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4effdb0c0fbd 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9499df7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24730/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24730/testReport/ | | Max. process+thread count | 3461 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output |
[jira] [Comment Edited] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573841#comment-16573841 ] Hanisha Koneru edited comment on HDDS-267 at 8/8/18 9:04 PM: - Thanks [~arpitagarwal] and [~bharatviswa] for reviews. I have updated patch v05 to handle the create and update container file cases separately. Thanks Bharat for catching it. The test failures are unrelated to this patch and pass locally. was (Author: hanishakoneru): Thanks [~arpitagarwal] and [~bharatviswa] for reviews. I have updated patch v05 to handle the create and update container file cases separately. Thanks Bharat for catching it. > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During an update operation, if the on-disk update of the .container file fails, then > the in-memory container metadata is also reset to the old value. > During a close operation, if the on-disk update of the .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
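The rollback behavior described in the issue (take a write lock, persist to disk, restore the old in-memory value when the disk write fails) can be sketched independently of Ozone. All names below are illustrative, not the actual ContainerData API:

```java
// Hedged sketch of the update-with-rollback pattern from HDDS-267: the
// in-memory state is only allowed to diverge from disk inside the write
// lock, and is rolled back if persisting the .container file fails.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ContainerState {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String metadata = "v1";

    interface DiskWriter { void persist(String value) throws Exception; }

    // Returns true if the update was applied, false if it was rolled back.
    boolean update(String newValue, DiskWriter writer) {
        lock.writeLock().lock();
        try {
            String old = metadata;
            metadata = newValue;
            try {
                writer.persist(newValue);   // update the on-disk file
                return true;
            } catch (Exception e) {
                metadata = old;             // on-disk update failed: roll back
                return false;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    String metadata() { return metadata; }

    public static void main(String[] args) {
        ContainerState c = new ContainerState();
        System.out.println(c.update("v2", v -> {}));   // persisted, applied
        System.out.println(c.update("v3", v -> { throw new Exception("disk full"); }));
        System.out.println(c.metadata());              // still the last persisted value
    }
}
```

Holding the write lock across both the in-memory mutation and the disk write is what keeps readers from ever observing a value that never made it to disk.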
[jira] [Commented] (HDFS-11398) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573860#comment-16573860 ] Wei-Chiu Chuang commented on HDFS-11398: Found the same test failure in HDFS-10240 precommit job. > TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails > intermittently > > > Key: HDFS-11398 > URL: https://issues.apache.org/jira/browse/HDFS-11398 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-11398-reproduce.patch, HDFS-11398.001.patch, > HDFS-11398.002.patch, failure.log > > > The test {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} > still fails intermittently in trunk after HDFS-11316. The stack trace: > {code} > testUnderReplicationAfterVolFailure(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure) > Time elapsed: 95.021 sec <<< ERROR! > java.util.concurrent.TimeoutException: Timed out waiting for condition. > Thread diagnostics: > Timestamp: 2017-02-07 07:00:34,193 > > java.lang.Thread.State: RUNNABLE > at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native > Method) > at > org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52) > at > org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:511) > at java.lang.Thread.run(Thread.java:745) > at > org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:276) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure(TestDataNodeVolumeFailure.java:412) > {code} > I looked into this and found there is a chance that the value > {{UnderReplicatedBlocksCount}} will no longer be > 0. The following is my > analysis: > In test {{TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure}}, it > creates files to trigger the disk error checking. 
The related code: > {code} > Path file1 = new Path("/test1"); > DFSTestUtil.createFile(fs, file1, 1024, (short)3, 1L); > DFSTestUtil.waitReplication(fs, file1, (short)3); > // Fail the first volume on both datanodes > File dn1Vol1 = new File(dataDir, "data"+(2*0+1)); > File dn2Vol1 = new File(dataDir, "data"+(2*1+1)); > DataNodeTestUtils.injectDataDirFailure(dn1Vol1, dn2Vol1); > Path file2 = new Path("/test2"); > DFSTestUtil.createFile(fs, file2, 1024, (short)3, 1L); > DFSTestUtil.waitReplication(fs, file2, (short)3); > {code} > This leads to one problem: if the cluster is busy, it can take a long time for the > replication of file2 to reach the desired value. During this time, the > under-replicated blocks of file1 can also be re-replicated in the cluster. If this > happens, the condition {{underReplicatedBlocks > 0}} will never be satisfied. > And this can be reproduced in my local env. > Actually, we can use an easier approach, {{DataNodeTestUtils.waitForDiskError}}, to > replace this; it runs fast and is more reliable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
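The TimeoutException in the stack trace above comes from a poll-until-true helper: when the waited-on condition (here, underReplicatedBlocks > 0) can silently become permanently false because the blocks were re-replicated, the helper has no choice but to time out, and the test turns flaky. A minimal sketch of that wait pattern, with illustrative names rather than the real GenericTestUtils API:

```java
// Sketch of the waitFor(condition, interval, timeout) pattern used by the
// test harness above. Names and signatures are illustrative, not the
// actual Hadoop GenericTestUtils API.
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitFor {
    static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                // This is the failure mode seen in the stack trace: the
                // condition never became (or stopped being) true.
                throw new TimeoutException("Timed out waiting for condition.");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        waitFor(() -> true, 10, 100);       // condition already holds: returns at once
        try {
            waitFor(() -> false, 10, 50);   // condition can never hold: times out
        } catch (TimeoutException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The fix discussed in the comment avoids the race entirely by waiting on a condition that cannot flip back (the disk error itself) instead of a transient replication count.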
[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block
[ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573855#comment-16573855 ] Wei-Chiu Chuang commented on HDFS-10240: HDFS-11398 tracks the test failure. > Race between close/recoverLease leads to missing block > -- > > Key: HDFS-10240 > URL: https://issues.apache.org/jira/browse/HDFS-10240 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: Jinglun >Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, > HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240.test.patch > > > We got a missing block in our cluster, and logs related to the missing block > are as follows: > 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 > blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > recovery started, > primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] > 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: File XX has not been closed. Lease > recovery is in progress. 
RecoveryId = 153006357 for block > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,248 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, > primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > has not reached minimal replication 1 > 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.53:11402 is added to > blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > size 139 > 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:08,808 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, > newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, > 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, 
deleteBlock=false) > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > From the
[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block
[ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573854#comment-16573854 ] Wei-Chiu Chuang commented on HDFS-10240: [~LiJinglun] thanks for the patch. Test failure is unrelated. As for the TestDataNodeVolumeFailure failure, let's file another jira to deal with that. Please refrain from incorporating unrelated changes in the patch :) > Race between close/recoverLease leads to missing block > -- > > Key: HDFS-10240 > URL: https://issues.apache.org/jira/browse/HDFS-10240 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: Jinglun >Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, > HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240.test.patch > > > We got a missing block in our cluster, and logs related to the missing block > are as follows: > 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 > blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > recovery started, > primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] > 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: 
File XX has not been closed. Lease > recovery is in progress. RecoveryId = 153006357 for block > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,248 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, > primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > has not reached minimal replication 1 > 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.53:11402 is added to > blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > size 139 > 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:08,808 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, > newgenerationstamp=153006357, newlength=139, 
newtargets=[10.114.6.14:11402, > 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false) > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap:
[jira] [Comment Edited] (HDFS-13769) Namenode gets stuck when deleting large dir in trash
[ https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573844#comment-16573844 ] Wei-Chiu Chuang edited comment on HDFS-13769 at 8/8/18 8:46 PM: Thanks for the new revision. * IMO, this jira should be converted to a HADOOP jira. Trash is not a HDFS-specific feature and this could also be used in other file systems as well (S3A for example) * Instead of calling fs.listStatus(), would you please use fs.listStatusIterator()? ** The former gets *everything* under a path, so you would see a bump in JVM heap usage for a large dir. * I am still not satisfied with FileSystem#contentSummary(). The closest I could find is FileSystem#getQuotaUsage() which would return the number of objects in a directory, but quota is not enabled by default. * Nits: {code} import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.*; {code} TrashPolicyWithSafeDelete should not do wildcard imports * Nits2: {code} LOG.debug("DIR "+ path + " in trash is too large, try safe delete."); {code} This is not necessarily true if skipCheckLimit is true. was (Author: jojochuang): Thanks for the new revision. IMO, this jira should convert to a HADOOP jira. Trash is not a HDFS-specific feature and this could also be used in other file systems as well (S3A for example) Instead of calling fs.listStatus(), would you please use fs.listStatusIterator()? The former gets *everything* under a path, so you would see a bump in JVM heap usage for a large dir. I am still not satisfied with FileSystem#contentSummary(). The closest I could find is FileSystem#getQuotaUsage() which would return number of objects in a directory. but quota is not enabled by default. Nits: {code} import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.*; {code} TrashPolicyWithSafeDelete should not do wildcard import {code} LOG.debug("DIR "+ path + " in trash is too large, try safe delete."); {code} This is not necessarily true, if skipCheckLimit is true. 
> Namenode gets stuck when deleting large dir in trash > > > Key: HDFS-13769 > URL: https://issues.apache.org/jira/browse/HDFS-13769 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.8.2, 3.1.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Major > Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch, > HDFS-13769.003.patch, HDFS-13769.004.patch > > > Similar to the situation discussed in HDFS-13671, Namenode gets stuck for a > long time when deleting a trash dir with a large amount of data. We found this log in > the namenode: > {quote} > 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem > (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for > 23018 ms via > java.lang.Thread.getStackTrace(Thread.java:1552) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033) > org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254) > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567) > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820) > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047) > {quote} > One simple solution is to avoid deleting large data in one delete RPC call. > We implemented a trashPolicy that divides the delete operation into several > delete RPCs, so that each single deletion does not delete too many files. > Any thought? [~linyiqun] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
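The memory argument behind the fs.listStatusIterator() suggestion above can be sketched with a simplified analogue. The actual API is Java (FileSystem#listStatusIterator returns a RemoteIterator<FileStatus>); the Python below only illustrates the eager-list versus lazy-iterator trade-off with local-filesystem stand-ins:

```python
import os
import tempfile

def list_all(path):
    # Analogous to fs.listStatus(): materializes *every* entry at once,
    # so a huge trash directory becomes a huge in-memory list.
    return sorted(os.path.join(path, name) for name in os.listdir(path))

def iter_entries(path):
    # Analogous to fs.listStatusIterator(): yields entries one at a time,
    # keeping heap usage flat regardless of directory size.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.path

# Tiny demo directory; both approaches see the same entries.
tmp = tempfile.mkdtemp()
for i in range(5):
    open(os.path.join(tmp, "f%d" % i), "w").close()

eager = list_all(tmp)
lazy = sorted(iter_entries(tmp))
assert eager == lazy
```

The iterator form matters here because the trash policy may walk directories with millions of entries; only the lazy form bounds the listing's memory footprint.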
[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573841#comment-16573841 ] Hanisha Koneru commented on HDDS-267: - Thanks [~arpitagarwal] and [~bharatviswa] for the reviews. I have updated patch v05 to handle the create and update container file cases separately. Thanks, Bharat, for catching it. > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During the update operation, if the on-disk update of the .container file fails, then > the in-memory container metadata is also reset to the old value. > During the close operation, if the on-disk update of the .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-267: Attachment: HDDS-267.005.patch > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During the update operation, if the on-disk update of the .container file fails, then > the in-memory container metadata is also reset to the old value. > During the close operation, if the on-disk update of the .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
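The update/close consistency rules in the HDDS-267 description can be sketched as a small state machine. This is a simplified Python model with hypothetical names, not the actual KeyValueContainer code; the real implementation is Java and persists a .container file:

```python
import threading

class Container:
    # Toy model of the HDDS-267 pattern: take a write lock, attempt the
    # on-disk .container file update, and keep in-memory and on-disk
    # state consistent when the write fails.
    def __init__(self, metadata):
        self.lock = threading.RLock()
        self.metadata = dict(metadata)
        self.state = "OPEN"

    def _persist(self, fail):
        # Stand-in for rewriting the .container file on disk.
        if fail:
            raise IOError(".container file write failed")

    def update(self, new_metadata, fail=False):
        with self.lock:
            old = dict(self.metadata)
            self.metadata.update(new_metadata)
            try:
                self._persist(fail)
            except IOError:
                self.metadata = old  # reset in-memory state to old value
                raise

    def close(self, fail=False):
        with self.lock:
            self.state = "CLOSING"  # no new operations permitted
            self._persist(fail)
            self.state = "CLOSED"

c = Container({"owner": "ozone"})
try:
    c.update({"owner": "other"}, fail=True)
except IOError:
    pass  # in-memory metadata was rolled back
try:
    c.close(fail=True)
except IOError:
    pass  # container is left in CLOSING, blocking new operations
```

The key property is that a failed disk write never leaves the in-memory view ahead of the on-disk view: updates roll back, and a failed close parks the container in CLOSING.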
[jira] [Commented] (HDDS-308) SCM should identify a container with pending deletes using container reports
[ https://issues.apache.org/jira/browse/HDDS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573823#comment-16573823 ] genericqa commented on HDDS-308: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 32m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s{color} | {color:green} container-service in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 36s{color} | {color:green} server-scm in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 18s{color} | {color:red} integration-test in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline | | | hadoop.ozone.web.client.TestBuckets | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HDDS-308 | | JIRA Patch URL |
[jira] [Commented] (HDFS-13749) Implement a new client protocol method to get NameNode state
[ https://issues.apache.org/jira/browse/HDFS-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573814#comment-16573814 ] Chao Sun commented on HDFS-13749: - Thanks both [~zero45] and [~shv]. I agree with both of you and don't see why this restriction can't be dropped. Shall we file a JIRA for trunk to remove this restriction? > Implement a new client protocol method to get NameNode state > > > Key: HDFS-13749 > URL: https://issues.apache.org/jira/browse/HDFS-13749 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13749-HDFS-12943.000.patch > > > Currently {{HAServiceProtocol#getServiceStatus}} requires super user > privilege. Therefore, as a temporary solution, in HDFS-12976 we discover > NameNode state by calling {{reportBadBlocks}}. Here, we'll properly implement > this by adding a new method in client protocol to get the NameNode state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13789) Reduce logging frequency of QuorumJournalManager#selectInputStreams
[ https://issues.apache.org/jira/browse/HDFS-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13789: Resolution: Fixed Fix Version/s: HDFS-12943 Status: Resolved (was: Patch Available) Committed to the branch. Thanks [~xkrogen]. > Reduce logging frequency of QuorumJournalManager#selectInputStreams > --- > > Key: HDFS-13789 > URL: https://issues.apache.org/jira/browse/HDFS-13789 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, qjm >Affects Versions: HDFS-12943 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Trivial > Fix For: HDFS-12943 > > Attachments: HDFS-13789-HDFS-12943.000.patch > > > As part of HDFS-13150, a logging statement was added to indicate whenever an > edit tail is performed via the RPC mechanism. To enable low latency tailing, > the tail frequency must be set very low, so this log statement gets printed > much too frequently at an INFO level. We should decrease to DEBUG. Note that > if there are actually edits available to tail, other log messages will get > printed; this is just targeting the case when it attempts to tail and there > are no new edits. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
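The HDFS-13789 change is just a log-level demotion for the hot "nothing to tail" path. A Python logging sketch of the pattern (the actual change is in the Java QuorumJournalManager; names below are illustrative):

```python
import logging

LOG = logging.getLogger("QuorumJournalManager")
LOG.setLevel(logging.INFO)  # typical production default

def on_tail_attempt(num_new_edits):
    # With low-latency tailing the no-op case fires constantly, so it is
    # logged at DEBUG and disappears under the default INFO level; real
    # edit tailing still produces visible INFO output.
    if num_new_edits > 0:
        LOG.info("Tailing %d new edits via the RPC mechanism", num_new_edits)
    else:
        LOG.debug("No new edits available to tail")

on_tail_attempt(0)   # suppressed at INFO level
on_tail_attempt(3)   # visible
```

This keeps the log useful: the frequent attempt-with-no-edits case is silent by default but recoverable by enabling DEBUG.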
[jira] [Commented] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
[ https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573772#comment-16573772 ] genericqa commented on HDFS-13802: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-rbf generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}101m 19s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 35s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}213m 22s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.federation.router.RouterFsck.remoteFsck(MembershipState):in org.apache.hadoop.hdfs.server.federation.router.RouterFsck.remoteFsck(MembershipState): new java.io.InputStreamReader(InputStream) At RouterFsck.java:[line 130] | | | org.apache.hadoop.hdfs.server.federation.router.RouterFsck.remoteFsck(MembershipState) may fail to close stream At RouterFsck.java:stream At RouterFsck.java:[line 131] | | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce
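The two findbugs warnings in the report above (reliance on the default encoding, and a stream that may not be closed) point at one general pattern. A minimal Python analogue of the fix (the real change would be in the Java RouterFsck code, e.g. an explicit charset plus try-with-resources; the helper name below is hypothetical):

```python
import io

def read_fsck_output(raw_bytes):
    # Two habits address both warnings: (1) name the encoding explicitly
    # instead of relying on the platform default, and (2) use a context
    # manager (Java: try-with-resources) so the stream is always closed,
    # even if reading throws.
    with io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8") as reader:
        return reader.read()
```

Relying on the platform default charset makes output differ across JVMs/locales; the explicit "utf-8" makes decoding deterministic.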
[jira] [Commented] (HDDS-263) Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception
[ https://issues.apache.org/jira/browse/HDDS-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573755#comment-16573755 ] Shashikant Banerjee commented on HDDS-263: -- Patch v0 is blocked on HDDS-247. Not submitting it for now. > Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception > --- > > Key: HDDS-263 > URL: https://issues.apache.org/jira/browse/HDDS-263 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Fix For: 0.2.1 > > Attachments: HDDS-263.00.patch > > > While Ozone client writes are going on, a container on a datanode can get > closed because of node failures, disk out of space, etc. In such situations, > a client write will fail with CLOSED_CONTAINER_IO. In this case, the ozone > client should try to get the committed block length for the pending open > blocks and update the OzoneManager. While trying to get the committed block > length, it may fail with a BLOCK_NOT_COMMITTED exception because, as part of > the transition from the CLOSING to CLOSED state, the container commits all > open blocks one by one. In such cases, the client needs to retry getting the > committed block length for a fixed number of attempts and eventually throw the > exception to the application if it is not able to successfully get and update > the length in the OzoneManager. This Jira aims to address this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-263) Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception
[ https://issues.apache.org/jira/browse/HDDS-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-263: - Attachment: HDDS-263.00.patch > Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception > --- > > Key: HDDS-263 > URL: https://issues.apache.org/jira/browse/HDDS-263 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Fix For: 0.2.1 > > Attachments: HDDS-263.00.patch > > > While Ozone client writes are going on, a container on a datanode can get > closed because of node failures, disk out of space, etc. In such situations, > a client write will fail with CLOSED_CONTAINER_IO. In this case, the ozone > client should try to get the committed block length for the pending open > blocks and update the OzoneManager. While trying to get the committed block > length, it may fail with a BLOCK_NOT_COMMITTED exception because, as part of > the transition from the CLOSING to CLOSED state, the container commits all > open blocks one by one. In such cases, the client needs to retry getting the > committed block length for a fixed number of attempts and eventually throw the > exception to the application if it is not able to successfully get and update > the length in the OzoneManager. This Jira aims to address this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
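The retry behavior the HDDS-263 description asks for can be sketched as follows. This is a simplified Python sketch with hypothetical names; the real client is Java and, on success, also updates the OzoneManager with the returned length:

```python
import time

class BlockNotCommittedError(Exception):
    pass

def get_committed_block_length(responses):
    # Stand-in for the datanode RPC; raises until the block is committed.
    result = responses.pop(0)
    if isinstance(result, Exception):
        raise result
    return result

def get_length_with_retries(responses, max_attempts=5, backoff_s=0.0):
    # Retry a fixed number of times on BLOCK_NOT_COMMITTED, then rethrow
    # to the application, mirroring the behavior described above. The
    # datanode commits open blocks one by one during CLOSING -> CLOSED,
    # so a later attempt can succeed where an earlier one failed.
    for attempt in range(1, max_attempts + 1):
        try:
            return get_committed_block_length(responses)
        except BlockNotCommittedError:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s)

# Two failures, then the datanode has committed the block.
responses = [BlockNotCommittedError(), BlockNotCommittedError(), 4096]
length = get_length_with_retries(responses)
```

A bounded attempt count is the point: the client should make progress when the container finishes closing, but must not spin forever when it never does.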
[jira] [Commented] (HDDS-339) Add block length and blockId in PutKeyResponse
[ https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573718#comment-16573718 ] genericqa commented on HDDS-339: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 58s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s{color} | {color:red} hadoop-hdds/container-service generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s{color} | {color:green} common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} container-service in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 42s{color} | {color:red} integration-test in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 28s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdds/container-service | | | Dead store to builder in
[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer
[ https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573707#comment-16573707 ] Virajith Jalaparti commented on HDFS-13795: --- Thanks [~elgoiri]. [^HDFS-13795.004.patch] fixes the failed test {{TestInMemoryLevelDBAliasMapClient}} > Fix potential NPE in InMemoryLevelDBAliasMapServer > -- > > Key: HDFS-13795 > URL: https://issues.apache.org/jira/browse/HDFS-13795 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Major > Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, > HDFS-13795.003.patch, HDFS-13795.004.patch > > > Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when > it is configured incorrectly. > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
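The NPE above comes from close() touching state that is only initialized when the server actually started. The defensive pattern can be sketched as follows (an illustration only; the field names are assumptions, not the actual InMemoryLevelDBAliasMapServer members):

```java
// Minimal sketch of a null-safe close() for a server whose start() may
// never have run (e.g. because it was misconfigured). The fields here
// are illustrative, not the real InMemoryLevelDBAliasMapServer members.
public class SafeCloseServer implements AutoCloseable {
    private Object rpcServer;   // remains null if start() was never called
    private boolean started;

    public void start() {
        rpcServer = new Object();   // stands in for the real RPC server
        started = true;
    }

    @Override
    public void close() {
        // Guard every field that is only initialized in start(): closing a
        // never-started server must be a no-op, not a NullPointerException.
        if (rpcServer != null) {
            rpcServer = null;       // the real code would stop the server here
        }
        started = false;
    }

    public boolean isStarted() {
        return started;
    }
}
```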
[jira] [Updated] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer
[ https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HDFS-13795: -- Status: Open (was: Patch Available) > Fix potential NPE in InMemoryLevelDBAliasMapServer > -- > > Key: HDFS-13795 > URL: https://issues.apache.org/jira/browse/HDFS-13795 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Major > Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, > HDFS-13795.003.patch, HDFS-13795.004.patch > > > Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when > it is configured incorrectly. > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer
[ https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HDFS-13795: -- Status: Patch Available (was: Open) > Fix potential NPE in InMemoryLevelDBAliasMapServer > -- > > Key: HDFS-13795 > URL: https://issues.apache.org/jira/browse/HDFS-13795 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Major > Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, > HDFS-13795.003.patch, HDFS-13795.004.patch > > > Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when > it is configured incorrectly. > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer
[ https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HDFS-13795: -- Attachment: HDFS-13795.004.patch > Fix potential NPE in InMemoryLevelDBAliasMapServer > -- > > Key: HDFS-13795 > URL: https://issues.apache.org/jira/browse/HDFS-13795 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Major > Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, > HDFS-13795.003.patch, HDFS-13795.004.patch > > > Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when > it is configured incorrectly. > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-333) Create an Ozone Logo
[ https://issues.apache.org/jira/browse/HDDS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Priyanka Nagwekar updated HDDS-333: --- Attachment: Ozone-Logo-Options.png > Create an Ozone Logo > > > Key: HDDS-333 > URL: https://issues.apache.org/jira/browse/HDDS-333 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Reporter: Anu Engineer >Assignee: Priyanka Nagwekar >Priority: Major > Fix For: 0.2.1 > > Attachments: Logo Final.zip, Logo-Ozone-Transparent-Bg.png, > Ozone-Logo-Options.png > > > As part of developing Ozone Website and Documentation, It would be nice to > have an Ozone Logo. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573689#comment-16573689 ] Bharat Viswanadham commented on HDDS-267: - Hi [~hanishakoneru], I have a question here: writeToContainerFile is called from create, and from close/update via updateContainerFile. In the close/update cases the .container file already exists, so the rename will fail because the destination file is already present. > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During update operation, if the on-disk update of .container file fails, then > the in-memory container metadata is also reset to the old value. > During close operation, if the on-disk update of .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
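One way to make the rename succeed in the close/update cases is to write the new metadata to a temp file and move it over the existing .container file with REPLACE_EXISTING. This is a hedged sketch using java.nio, not necessarily how updateContainerFile resolves it in the patch; the class and method names here are illustrative:

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of an update-safe write: write the new container metadata to a
// temp file, then move it over the .container file. With REPLACE_EXISTING
// the move succeeds even when the destination already exists, which is
// the failing close/update case described in the comment above.
public class ContainerFileWriter {
    public static void writeContainerFile(Path containerFile, String contents)
            throws IOException {
        Path tmp = containerFile.resolveSibling(
                containerFile.getFileName() + ".tmp");
        Files.write(tmp, contents.getBytes());
        try {
            Files.move(tmp, containerFile,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fall back to a plain replace on filesystems that cannot
            // rename atomically.
            Files.move(tmp, containerFile,
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The atomic move also means a reader never observes a half-written .container file: it sees either the old metadata or the new, never a mix.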
[jira] [Comment Edited] (HDDS-284) CRC for ChunksData
[ https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573676#comment-16573676 ] Bharat Viswanadham edited comment on HDDS-284 at 8/8/18 6:32 PM: - Attached patch v04, which fixes the findbugs issues. The failed test TestBuckets passes locally, whereas TestKeys failed once in the three times I ran it. The test flakiness needs a closer look, but I don't think it is related to this patch. was (Author: bharatviswa): Attached patch v04. Fixed findbug issues. And the failed testcase's are passing locally. TestKeys, in 3 times, I have seen randomly failing one time. Need to look into this test issue. But I don't think that is related to this patch. > CRC for ChunksData > -- > > Key: HDDS-284 > URL: https://issues.apache.org/jira/browse/HDDS-284 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-284.00.patch, HDDS-284.01.patch, HDDS-284.02.patch, > HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and Error Detection > for Containers.pdf > > > This Jira is to add CRC for chunks data. 
> > > Right now a ChunkInfo structure looks like this: > > {code} > message ChunkInfo { > required string chunkName = 1; > required uint64 offset = 2; > required uint64 len = 3; > optional string checksum = 4; > repeated KeyValue metadata = 5; > } > {code} > > Proposal is to change the ChunkInfo structure as below: > > {code} > message ChunkInfo { > required string chunkName = 1; > required uint64 offset = 2; > required uint64 len = 3; > optional bytes checksum = 4; > optional CRCType checksumType = 5; > optional string legacyMetadata = 6; > optional string legacyData = 7; > repeated KeyValue metadata = 8; > } > {code} > > Instead of changing the disk format, we put the checksum, checksumType and legacy data fields into ChunkInfo. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-284) CRC for ChunksData
[ https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573676#comment-16573676 ] Bharat Viswanadham commented on HDDS-284: - Attached patch v04. Fixed findbug issues. And the failed testcase's are passing locally. TestKeys, in 3 times, I have seen randomly failing one time. Need to look into this test issue. But I don't think that is related to this patch. > CRC for ChunksData > -- > > Key: HDDS-284 > URL: https://issues.apache.org/jira/browse/HDDS-284 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-284.00.patch, HDDS-284.01.patch, HDDS-284.02.patch, > HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and Error Detection > for Containers.pdf > > > This Jira is to add CRC for chunks data. > > > Right now a ChunkInfo structure looks like this: > > {code} > message ChunkInfo { > required string chunkName = 1; > required uint64 offset = 2; > required uint64 len = 3; > optional string checksum = 4; > repeated KeyValue metadata = 5; > } > {code} > > Proposal is to change the ChunkInfo structure as below: > > {code} > message ChunkInfo { > required string chunkName = 1; > required uint64 offset = 2; > required uint64 len = 3; > optional bytes checksum = 4; > optional CRCType checksumType = 5; > optional string legacyMetadata = 6; > optional string legacyData = 7; > repeated KeyValue metadata = 8; > } > {code} > > Instead of changing the disk format, we put the checksum, checksumType and legacy data fields into ChunkInfo. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
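The checksum side of the proposal can be illustrated with java.util.zip.CRC32. This is a sketch only; the checksumType field in the proposed ChunkInfo implies the algorithm is pluggable, so CRC32 is just one possible choice, not necessarily what HDDS-284 ships:

```java
import java.util.zip.CRC32;

// Illustrative chunk checksum computation using java.util.zip.CRC32.
// The proposed ChunkInfo carries a checksumType, so CRC32 here is one
// possible algorithm, not a statement of what the patch implements.
public class ChunkChecksum {
    public static long crc32(byte[] chunkData) {
        CRC32 crc = new CRC32();
        crc.update(chunkData, 0, chunkData.length);
        return crc.getValue();
    }

    // Recompute and compare on read, to catch on-disk corruption.
    public static boolean verify(byte[] chunkData, long expected) {
        return crc32(chunkData) == expected;
    }
}
```

On write, the datanode would store crc32(chunkData) in the checksum field; on read, verify() detects any corruption of the chunk bytes.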
[jira] [Updated] (HDDS-335) Fix logging for scm events
[ https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDDS-335: Resolution: Not A Problem Status: Resolved (was: Patch Available) > Fix logging for scm events > -- > > Key: HDDS-335 > URL: https://issues.apache.org/jira/browse/HDDS-335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Attachments: HDDS-335.00.patch > > > Logs should print event type, classname and object currently logged are not > very useful. > > {code:java} > java.lang.IllegalArgumentException: No event handler registered for event > org.apache.hadoop.hdds.server.events.TypedEvent@69464649 > at > org.apache.hadoop.hdds.server.events.EventQueue.fireEvent(EventQueue.java:116) > at > org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher.dispatch(SCMDatanodeHeartbeatDispatcher.java:66) > at > org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.sendHeartbeat(SCMDatanodeProtocolServer.java:219) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolServerSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolServerSideTranslatorPB.java:90) > at > org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos$StorageContainerDatanodeProtocolService$2.callBlockingMethod(StorageContainerDatanodeProtocolProtos.java:19310){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-284) CRC for ChunksData
[ https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-284: Attachment: HDDS-284.04.patch > CRC for ChunksData > -- > > Key: HDDS-284 > URL: https://issues.apache.org/jira/browse/HDDS-284 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-284.00.patch, HDDS-284.01.patch, HDDS-284.02.patch, > HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and Error Detection > for Containers.pdf > > > This Jira is to add CRC for chunks data. > > > Right now a ChunkInfo structure looks like this: > > {code} > message ChunkInfo { > required string chunkName = 1; > required uint64 offset = 2; > required uint64 len = 3; > optional string checksum = 4; > repeated KeyValue metadata = 5; > } > {code} > > Proposal is to change the ChunkInfo structure as below: > > {code} > message ChunkInfo { > required string chunkName = 1; > required uint64 offset = 2; > required uint64 len = 3; > optional bytes checksum = 4; > optional CRCType checksumType = 5; > optional string legacyMetadata = 6; > optional string legacyData = 7; > repeated KeyValue metadata = 8; > } > {code} > > Instead of changing the disk format, we put the checksum, checksumType and legacy data fields into ChunkInfo. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-335) Fix logging for scm events
[ https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573673#comment-16573673 ] Ajay Kumar commented on HDDS-335: - [~nandakumar131] thanks for checking this. Ya, that should handle it. Resolving ticket. > Fix logging for scm events > -- > > Key: HDDS-335 > URL: https://issues.apache.org/jira/browse/HDDS-335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Attachments: HDDS-335.00.patch > > > Logs should print the event type and class name; the objects currently logged are not > very useful. > > {code:java} > java.lang.IllegalArgumentException: No event handler registered for event > org.apache.hadoop.hdds.server.events.TypedEvent@69464649 > at > org.apache.hadoop.hdds.server.events.EventQueue.fireEvent(EventQueue.java:116) > at > org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher.dispatch(SCMDatanodeHeartbeatDispatcher.java:66) > at > org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.sendHeartbeat(SCMDatanodeProtocolServer.java:219) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolServerSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolServerSideTranslatorPB.java:90) > at > org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos$StorageContainerDatanodeProtocolService$2.callBlockingMethod(StorageContainerDatanodeProtocolProtos.java:19310){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close
[ https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573665#comment-16573665 ] Arpit Agarwal commented on HDDS-267: +1 lgtm. Thanks for this improvement [~hanishakoneru]. Are the unit-test failures related to the patch? > Handle consistency issues during container update/close > --- > > Key: HDDS-267 > URL: https://issues.apache.org/jira/browse/HDDS-267 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-267.001.patch, HDDS-267.002.patch, > HDDS-267.003.patch, HDDS-267.004.patch > > > During container update and close, the .container file on disk is modified. > We should make sure that the in-memory state and the on-disk state for a > container are consistent. > A write lock is obtained before updating the container data during close or > update operations. > During update operation, if the on-disk update of .container file fails, then > the in-memory container metadata is also reset to the old value. > During close operation, if the on-disk update of .container file fails, then > the in-memory containerState is set to CLOSING so that no new operations are > permitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-174) Shell error messages are often cryptic
[ https://issues.apache.org/jira/browse/HDDS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573621#comment-16573621 ] Arpit Agarwal commented on HDDS-174: [~xyao], sorry I think I lost this patch :( I will see if I can rewrite my changes. > Shell error messages are often cryptic > -- > > Key: HDDS-174 > URL: https://issues.apache.org/jira/browse/HDDS-174 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Nanda kumar >Priority: Critical > Labels: newbie > Fix For: 0.2.1 > > > Error messages in the Ozone shell are often too cryptic. e.g. > {code} > $ ozone oz -putKey /vol1/bucket1/key1 -file foo.txt > Command Failed : Create key failed, error:INTERNAL_ERROR > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics
[ https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573610#comment-16573610 ] Hudson commented on HDFS-13658: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14729 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14729/]) HDFS-13658. Expose HighestPriorityLowRedundancy blocks statistics. (xiao: rev 9499df7b81b55b488a32fd59798a543dafef4ef8) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ReplicatedBlockStats.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestLowRedundancyBlockQueues.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/NamenodeBeanMetrics.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ErasureCoding.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeMXBean.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * (edit) hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ECBlockGroupStats.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java > Expose HighestPriorityLowRedundancy blocks statistics > - > > Key: HDFS-13658 > URL: https://issues.apache.org/jira/browse/HDFS-13658 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, > HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, > HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, > HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, > HDFS-13658.012.patch > > > fsck, dfsadmin -report, and NN WebUI should report number of blocks that have > 1 replica. We have had many cases opened in which a customer has lost a disk > or a DN losing files/blocks due to the fact that they had blocks with only 1 > replica. We need to make the customer better aware of this situation and that > they should take action. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-308) SCM should identify a container with pending deletes using container reports
[ https://issues.apache.org/jira/browse/HDDS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573606#comment-16573606 ] Lokesh Jain commented on HDDS-308: -- The v6 patch ensures, in DeletedBlockLogImpl#commitTransactions, that the number of nodes in the pipeline for a container matches the replication factor. > SCM should identify a container with pending deletes using container reports > > > Key: HDDS-308 > URL: https://issues.apache.org/jira/browse/HDDS-308 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-308.001.patch, HDDS-308.002.patch, > HDDS-308.003.patch, HDDS-308.004.patch, HDDS-308.005.patch, HDDS-308.006.patch > > > SCM should fire an event when it finds, via a container report, that a > container's deleteTransactionID does not match SCM's deleteTransactionId. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
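The replication-factor check described above can be sketched like this (a minimal illustration; the class and method names are assumptions, not the actual DeletedBlockLogImpl API):

```java
import java.util.List;

// Sketch of the commitTransactions guard described above: a delete
// transaction for a container is only considered fully committed once
// acknowledgements have been seen from as many distinct datanodes as
// the replication factor requires. Names are illustrative only.
public class DeleteTxnTracker {
    public static boolean fullyCommitted(List<String> ackedDatanodes,
                                         int replicationFactor) {
        // Count each datanode at most once before comparing with the factor,
        // so a duplicate ack from the same node cannot commit early.
        long distinct = ackedDatanodes.stream().distinct().count();
        return distinct >= replicationFactor;
    }
}
```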
[jira] [Updated] (HDDS-308) SCM should identify a container with pending deletes using container reports
[ https://issues.apache.org/jira/browse/HDDS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-308: - Attachment: HDDS-308.006.patch > SCM should identify a container with pending deletes using container reports > > > Key: HDDS-308 > URL: https://issues.apache.org/jira/browse/HDDS-308 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-308.001.patch, HDDS-308.002.patch, > HDDS-308.003.patch, HDDS-308.004.patch, HDDS-308.005.patch, HDDS-308.006.patch > > > SCM should fire an event when it finds using container report that a > container's deleteTransactionID does not match SCM's deleteTransactionId. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-325) Add event watcher for delete blocks command
[ https://issues.apache.org/jira/browse/HDDS-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573588#comment-16573588 ] Lokesh Jain commented on HDDS-325: -- [~elek] Thanks for reviewing the patch! I will try to make the changes clearer below. # I wanted a single RetriableEventWatcher that watches over all the events which need to be retried; that is why I introduced the watchEvents function in the EventWatcher class. I thought we could use this event watcher to watch over CloseContainer as well as ReplicationEvent. # The reason I added the onFinished/onTimeout API to RetriablePayload is to support different events in a single event watcher: different payloads can trigger events according to their requirements while still using the same watcher. # For the replication command we can pass the request as part of the payload; the payload can then fire the required event in its onTimeout function. The patch gives us a single watcher for all the events. With this approach we will not need to fire separate events for tracking the DATANODE_COMMAND; we can easily add RETRIABLE_DATANODE_COMMAND into the watcher and it can watch over all these events. I was also thinking of adding the timeout duration to the RetriablePayload API so that we can have a different timeout for each event type. > Add event watcher for delete blocks command > --- > > Key: HDDS-325 > URL: https://issues.apache.org/jira/browse/HDDS-325 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-325.001.patch, HDDS-325.002.patch > > > This Jira aims to add a watcher for the deleteBlocks command. It removes the > RPC call currently required for the datanode to acknowledge deleteBlocks. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
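The onFinished/onTimeout design discussed in the comment above could look roughly like this (a sketch under stated assumptions; the real RetriablePayload and EventWatcher classes in the HDDS-325 patch may differ):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed design: one watcher tracks many kinds of
// retriable events, and each payload decides what happens on completion
// or on timeout. Interface and method names follow the comment above
// but are otherwise assumptions, not the actual patch.
interface RetriablePayload {
    void onFinished();      // acknowledgement arrived in time
    void onTimeout();       // deadline passed; payload may re-fire its event
    long timeoutMillis();   // per-event-type timeout, as proposed above
}

public class RetriableEventWatcher {
    private final List<RetriablePayload> pending = new ArrayList<>();

    public void watch(RetriablePayload payload) {
        pending.add(payload);
    }

    // Invoked when the acknowledgement event arrives for the payload.
    public void complete(RetriablePayload payload) {
        if (pending.remove(payload)) {
            payload.onFinished();
        }
    }

    // Invoked by a timer once timeoutMillis() has elapsed; driven
    // manually in this sketch. A completed payload never times out.
    public void expire(RetriablePayload payload) {
        if (pending.remove(payload)) {
            payload.onTimeout();
        }
    }

    public int pendingCount() {
        return pending.size();
    }
}
```

Because the retry behavior lives in the payload, CloseContainer, ReplicationEvent, and RETRIABLE_DATANODE_COMMAND can all share this one watcher, as the comment suggests.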
[jira] [Comment Edited] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics
[ https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573566#comment-16573566 ] Xiao Chen edited comment on HDFS-13658 at 8/8/18 5:41 PM: -- +1. Committed to trunk. Thanks for the great work here Kitti, and Gabor / Andrew for the reviews and thoughts! was (Author: xiaochen): +1. Committed to trunk. Thanks for the great work here Kitti, and Andrew for the reviews and thoughts! > Expose HighestPriorityLowRedundancy blocks statistics > - > > Key: HDFS-13658 > URL: https://issues.apache.org/jira/browse/HDFS-13658 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, > HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, > HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, > HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, > HDFS-13658.012.patch > > > fsck, dfsadmin -report, and NN WebUI should report number of blocks that have > 1 replica. We have had many cases opened in which a customer has lost a disk > or a DN losing files/blocks due to the fact that they had blocks with only 1 > replica. We need to make the customer better aware of this situation and that > they should take action. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics
[ https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-13658: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.2.0 Status: Resolved (was: Patch Available) +1. Committed to trunk. Thanks for the great work here Kitti, and Andrew for the reviews and thoughts! > Expose HighestPriorityLowRedundancy blocks statistics > - > > Key: HDFS-13658 > URL: https://issues.apache.org/jira/browse/HDFS-13658 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, > HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, > HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, > HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, > HDFS-13658.012.patch > > > fsck, dfsadmin -report, and NN WebUI should report number of blocks that have > 1 replica. We have had many cases opened in which a customer has lost a disk > or a DN losing files/blocks due to the fact that they had blocks with only 1 > replica. We need to make the customer better aware of this situation and that > they should take action. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics
[ https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-13658: - Summary: Expose HighestPriorityLowRedundancy blocks statistics (was: fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 1 replica) > Expose HighestPriorityLowRedundancy blocks statistics > - > > Key: HDFS-13658 > URL: https://issues.apache.org/jira/browse/HDFS-13658 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, > HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, > HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, > HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, > HDFS-13658.012.patch > > > fsck, dfsadmin -report, and NN WebUI should report number of blocks that have > 1 replica. We have had many cases opened in which a customer has lost a disk > or a DN losing files/blocks due to the fact that they had blocks with only 1 > replica. We need to make the customer better aware of this situation and that > they should take action. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-333) Create an Ozone Logo
[ https://issues.apache.org/jira/browse/HDDS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573532#comment-16573532 ] Tsz Wo Nicholas Sze commented on HDDS-333: -- +1 the logo looks good. > Create an Ozone Logo > > > Key: HDDS-333 > URL: https://issues.apache.org/jira/browse/HDDS-333 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Reporter: Anu Engineer >Assignee: Priyanka Nagwekar >Priority: Major > Fix For: 0.2.1 > > Attachments: Logo Final.zip, Logo-Ozone-Transparent-Bg.png > > > As part of developing Ozone Website and Documentation, It would be nice to > have an Ozone Logo. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-335) Fix logging for scm events
[ https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573517#comment-16573517 ] Nanda kumar commented on HDDS-335: -- [~ajayydv], in HDDS-199 we have added a {{toString}} method to {{TypedEvent}}. After that patch, we should get proper event details in the log instead of the object address. > Fix logging for scm events > -- > > Key: HDDS-335 > URL: https://issues.apache.org/jira/browse/HDDS-335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Attachments: HDDS-335.00.patch > > > Logs should print the event type and classname; the objects currently logged > are not very useful. > > {code:java} > java.lang.IllegalArgumentException: No event handler registered for event > org.apache.hadoop.hdds.server.events.TypedEvent@69464649 > at > org.apache.hadoop.hdds.server.events.EventQueue.fireEvent(EventQueue.java:116) > at > org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher.dispatch(SCMDatanodeHeartbeatDispatcher.java:66) > at > org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.sendHeartbeat(SCMDatanodeProtocolServer.java:219) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolServerSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolServerSideTranslatorPB.java:90) > at > org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos$StorageContainerDatanodeProtocolService$2.callBlockingMethod(StorageContainerDatanodeProtocolProtos.java:19310){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
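The HDDS-199 change referred to above can be illustrated with a minimal sketch. The class below is a simplified stand-in, not the actual {{TypedEvent}} code, and the field names are assumptions:

```java
// Simplified stand-in for the idea in HDDS-199: override toString() so
// that "No event handler registered for event ..." messages name the
// event instead of printing a default object address like
// TypedEvent@69464649.
public class TypedEventSketch {
    private final String name;
    private final Class<?> payloadType;

    public TypedEventSketch(String name, Class<?> payloadType) {
        this.name = name;
        this.payloadType = payloadType;
    }

    @Override
    public String toString() {
        // Name the event and its payload class; this is exactly the
        // information the EventQueue error message needs.
        return "TypedEvent{name=" + name
                + ", payloadType=" + payloadType.getSimpleName() + "}";
    }
}
```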
[jira] [Commented] (HDFS-13738) fsck -list-corruptfileblocks has infinite loop if user is not privileged.
[ https://issues.apache.org/jira/browse/HDFS-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573509#comment-16573509 ] Wei-Chiu Chuang commented on HDFS-13738: :) Sorry, I forgot I crafted a test case previously. The errCode is returned and eventually becomes the exit code. Since the operation fails in this case, should we set errCode to a non-zero value? It looks to me like a -1 should be returned. Additionally, you will also want to make sure to check the return code when calling runFsck(): {code:java} String outStr = runFsck(conf, -1, true, path, "-list-corruptfileblocks");{code} Other than that I am +1. > fsck -list-corruptfileblocks has infinite loop if user is not privileged. > - > > Key: HDFS-13738 > URL: https://issues.apache.org/jira/browse/HDFS-13738 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.6.0, 3.0.0 > Environment: Kerberized Hadoop cluster >Reporter: Wei-Chiu Chuang >Assignee: Yuen-Kuei Hsueh >Priority: Major > Attachments: HDFS-13738.001.patch, HDFS-13738.002.patch, > HDFS-13738.test.patch > > > Found an interesting bug. > Execute the following command as any non-privileged user: > {noformat} > # run fsck > $ hdfs fsck / -list-corruptfileblocks > {noformat} > {noformat} > FSCK ended at Mon Jul 16 15:14:03 PDT 2018 in 1 milliseconds > Access denied for user systest. Superuser privilege is required > Fsck on path '/' FAILED > FSCK ended at Mon Jul 16 15:14:03 PDT 2018 in 0 milliseconds > Access denied for user systest. Superuser privilege is required > Fsck on path '/' FAILED > FSCK ended at Mon Jul 16 15:14:03 PDT 2018 in 1 milliseconds > Access denied for user systest. Superuser privilege is required > Fsck on path '/' FAILED > {noformat} > Reproducible on Hadoop 3.0.0 as well as 2.6.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
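The fix being discussed — return a non-zero errCode on an access-control failure instead of retrying forever — can be sketched as follows. Method and variable names are illustrative, not the actual DFSck code:

```java
// Sketch of the retry guard for the fsck infinite loop: when the server
// answers "Access denied", propagate a -1 errCode so the client loop
// terminates instead of reissuing the same failing request.
public class FsckRetrySketch {
    // Simulated server response for a non-privileged user.
    static String runOnce(String user) {
        return "Access denied for user " + user
                + ". Superuser privilege is required";
    }

    /** Returns the eventual exit code: -1 on failure, 0 on success. */
    static int doWork(String user, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String out = runOnce(user);
            if (out.startsWith("Access denied")) {
                return -1; // fail fast rather than loop forever
            }
        }
        return 0;
    }
}
```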
[jira] [Commented] (HDDS-339) Add block length and blockId in PutKeyResponse
[ https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573482#comment-16573482 ] Shashikant Banerjee commented on HDDS-339: -- Patch v1 adds the GetCommittedBlockLength response to PutKeyResponse. It also fixes a bug in OpenContainerBlockMap where the chunk was being added during the WRITE stage of write chunk, whereas it should be added in the COMMIT stage. > Add block length and blockId in PutKeyResponse > -- > > Key: HDDS-339 > URL: https://issues.apache.org/jira/browse/HDDS-339 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-339.00.patch > > > The putKey response will include blockId as well committed block length in > the PutKey response. This will be extended to include blockCommitSequenceId > as well all of which will be updated on Ozone Master. This all be required to > add validation as well handle 2 node failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
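The OpenContainerBlockMap part of the fix can be sketched like this. The stage names and map layout here are simplified assumptions, not the patch itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the OpenContainerBlockMap fix: a chunk is registered
// against an open block only when its write reaches the COMMIT stage,
// not while the data is still being written.
public class BlockChunkMapSketch {
    enum Stage { WRITE_DATA, COMMIT_DATA }

    private final Map<Long, List<String>> chunksPerBlock = new HashMap<>();

    void handleWriteChunk(long blockId, String chunkName, Stage stage) {
        if (stage != Stage.COMMIT_DATA) {
            return; // data written but not yet committed: do not record it
        }
        chunksPerBlock.computeIfAbsent(blockId, k -> new ArrayList<>())
                .add(chunkName);
    }

    List<String> committedChunks(long blockId) {
        return chunksPerBlock.getOrDefault(blockId, Collections.emptyList());
    }
}
```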
[jira] [Updated] (HDDS-339) Add block length and blockId in PutKeyResponse
[ https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-339: - Attachment: HDDS-339.00.patch > Add block length and blockId in PutKeyResponse > -- > > Key: HDDS-339 > URL: https://issues.apache.org/jira/browse/HDDS-339 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-339.00.patch > > > The putKey response will include blockId as well committed block length in > the PutKey response. This will be extended to include blockCommitSequenceId > as well all of which will be updated on Ozone Master. This all be required to > add validation as well handle 2 node failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-339) Add block length and blockId in PutKeyResponse
[ https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-339: - Status: Patch Available (was: Open) > Add block length and blockId in PutKeyResponse > -- > > Key: HDDS-339 > URL: https://issues.apache.org/jira/browse/HDDS-339 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-339.00.patch > > > The putKey response will include blockId as well committed block length in > the PutKey response. This will be extended to include blockCommitSequenceId > as well all of which will be updated on Ozone Master. This all be required to > add validation as well handle 2 node failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-340) ContainerStateMachine#readStateMachinedata should read from temporary chunk file if the data is not present as committed chunk
Mukul Kumar Singh created HDDS-340: -- Summary: ContainerStateMachine#readStateMachinedata should read from temporary chunk file if the data is not present as committed chunk Key: HDDS-340 URL: https://issues.apache.org/jira/browse/HDDS-340 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Affects Versions: 0.2.1 Reporter: Mukul Kumar Singh Assignee: Mukul Kumar Singh Fix For: 0.2.1 ContainerStateMachine#readStateMachinedata currently reads data only from a committed chunk. However, for the leader, it might be necessary to read the chunk data from the temporary chunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
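A minimal sketch of the proposed fallback, assuming a ".tmp" suffix for temporary chunk files (the real naming convention and read path may differ):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the fallback proposed in HDDS-340: read a chunk from its
// committed file if present, otherwise from its temporary file, since
// on the leader the data may only exist as an uncommitted chunk.
public class ChunkReadSketch {
    static byte[] readChunk(Path committed) throws IOException {
        if (Files.exists(committed)) {
            return Files.readAllBytes(committed);
        }
        // Fall back to the temporary chunk file next to the committed
        // location (".tmp" suffix is an assumption for illustration).
        Path tmp = committed.resolveSibling(
                committed.getFileName() + ".tmp");
        return Files.readAllBytes(tmp);
    }
}
```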
[jira] [Commented] (HDFS-13532) RBF: Adding security
[ https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573454#comment-16573454 ] Ajay Kumar commented on HDFS-13532: --- [~crh], sure, I work out of PST. > RBF: Adding security > > > Key: HDFS-13532 > URL: https://issues.apache.org/jira/browse/HDFS-13532 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Íñigo Goiri >Assignee: Sherwood Zheng >Priority: Major > Attachments: RBF _ Security delegation token thoughts.pdf, > RBF-DelegationToken-Approach1b.pdf, Security_for_Router-based > Federation_design_doc.pdf > > > HDFS Router based federation should support security. This includes > authentication and delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13805) Journal Nodes should allow to format non-empty directories with "-force" option
[ https://issues.apache.org/jira/browse/HDFS-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573447#comment-16573447 ] Arpit Agarwal commented on HDFS-13805: -- Hi [~surendrasingh], just out of curiosity, what is the use case for reformatting JNs? > Journal Nodes should allow to format non-empty directories with "-force" > option > --- > > Key: HDFS-13805 > URL: https://issues.apache.org/jira/browse/HDFS-13805 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 3.0.0-alpha4 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > > HDFS-2 completely restricted re-formatting of the JournalNode, but it should be > allowed when the *"-force"* option is given. If a user feels the force option could > accidentally delete data, they can disable it by configuring > "*dfs.reformat.disabled*" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13566) Add configurable additional RPC listener to NameNode
[ https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573443#comment-16573443 ] Erik Krogen commented on HDFS-13566: Cool, thanks for clarifying. My initial reaction to (3) was that it would be too cumbersome for clients, but I can see some ways to set up configs to make it easy to use: * Set up a separate namespace in the configs, {{namespace2-aux}}, which points to {{namespace2}} except using the auxiliary ports. Clients can specify {{hdfs://namespace2}} for the standard port and {{namespace2-aux}} for auxiliary. * For configs located within DC2 (where {{namespace2}} is located), set up {{namespace2}} with the standard port; for configs located in DC1, set up {{namespace2}} with the auxiliary port. I still wonder if it would be a reasonable addition to have something like {{dfs.namenode.rpc-address.namespace2.use-aux}} to have a single config to change for a client, but I think it is not necessary. > Add configurable additional RPC listener to NameNode > > > Key: HDFS-13566 > URL: https://issues.apache.org/jira/browse/HDFS-13566 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13566.001.patch, HDFS-13566.002.patch, > HDFS-13566.003.patch > > > This Jira aims to add the capability to NameNode to run additional > listener(s). Such that NameNode can be accessed from multiple ports. > Fundamentally, this Jira tries to extend ipc.Server to allow configured with > more listeners, binding to different ports, but sharing the same call queue > and the handlers. Useful when different clients are only allowed to access > certain different ports. Combined with HDFS-13547, this also allows different > ports to have different SASL security levels. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
[ https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13802: --- Attachment: (was: HDFS-13802.000.patch) > RBF: Remove FSCK from Router Web UI, because fsck is not supported currently > > > Key: HDFS-13802 > URL: https://issues.apache.org/jira/browse/HDFS-13802 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.1, 3.0.3 >Reporter: Fei Hui >Priority: Major > Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch > > > When I click FSCK under Utilities on the Router Web UI, I get errors > {quote} > HTTP ERROR 404 > Problem accessing /fsck. Reason: > NOT_FOUND > Powered by Jetty:// > {quote} > I dug into the source code and found that fsck is not currently supported, so > I think we should remove FSCK from the Router Web UI -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
[ https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13802: --- Attachment: HDFS-13802.000.patch > RBF: Remove FSCK from Router Web UI, because fsck is not supported currently > > > Key: HDFS-13802 > URL: https://issues.apache.org/jira/browse/HDFS-13802 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.1, 3.0.3 >Reporter: Fei Hui >Priority: Major > Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch > > > When I click FSCK under Utilities on the Router Web UI, I get errors > {quote} > HTTP ERROR 404 > Problem accessing /fsck. Reason: > NOT_FOUND > Powered by Jetty:// > {quote} > I dug into the source code and found that fsck is not currently supported, so > I think we should remove FSCK from the Router Web UI -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
[ https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13802: --- Attachment: HDFS-13802.002.patch > RBF: Remove FSCK from Router Web UI, because fsck is not supported currently > > > Key: HDFS-13802 > URL: https://issues.apache.org/jira/browse/HDFS-13802 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.1, 3.0.3 >Reporter: Fei Hui >Priority: Major > Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch > > > When I click FSCK under Utilities on the Router Web UI, I get errors > {quote} > HTTP ERROR 404 > Problem accessing /fsck. Reason: > NOT_FOUND > Powered by Jetty:// > {quote} > I dug into the source code and found that fsck is not currently supported, so > I think we should remove FSCK from the Router Web UI -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer
[ https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573429#comment-16573429 ] Íñigo Goiri commented on HDFS-13795: The error in {{TestInMemoryLevelDBAliasMapClient}} seems related. > Fix potential NPE in InMemoryLevelDBAliasMapServer > -- > > Key: HDFS-13795 > URL: https://issues.apache.org/jira/browse/HDFS-13795 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Major > Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, > HDFS-13795.003.patch > > > Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when > it is configured incorrectly. > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
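The NPE fix under review here amounts to a null guard in close(): the server must tolerate a shutdown when it was never started (e.g. due to misconfiguration). A minimal sketch, with illustrative field names rather than the actual InMemoryLevelDBAliasMapServer code:

```java
// Sketch of the null-guard fix for HDFS-13795: close() checks the RPC
// server field before dereferencing it, so NameNode.stop() no longer
// hits an NPE when the alias map server was misconfigured and never
// started.
public class AliasMapServerSketch {
    private AutoCloseable rpcServer; // stays null until start() succeeds

    void start() {
        rpcServer = () -> { }; // stand-in for starting the RPC server
    }

    void close() throws Exception {
        if (rpcServer != null) { // guard prevents the NPE on shutdown
            rpcServer.close();
        }
    }
}
```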
[jira] [Commented] (HDFS-13805) Journal Nodes should allow to format non-empty directories with "-force" option
[ https://issues.apache.org/jira/browse/HDFS-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573426#comment-16573426 ] Íñigo Goiri commented on HDFS-13805: We see this issue in our Windows deployment. I thought it was exclusive to Windows. If we add the force option, I'd like to make sure it works for Windows. > Journal Nodes should allow to format non-empty directories with "-force" > option > --- > > Key: HDFS-13805 > URL: https://issues.apache.org/jira/browse/HDFS-13805 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 3.0.0-alpha4 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > > HDFS-2 completely restricted re-formatting of the JournalNode, but it should be > allowed when the *"-force"* option is given. If a user feels the force option could > accidentally delete data, they can disable it by configuring > "*dfs.reformat.disabled*" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
[ https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573424#comment-16573424 ] Íñigo Goiri commented on HDFS-13802: I would prefer to implement it. Let me post an example. > RBF: Remove FSCK from Router Web UI, because fsck is not supported currently > > > Key: HDFS-13802 > URL: https://issues.apache.org/jira/browse/HDFS-13802 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.1, 3.0.3 >Reporter: Fei Hui >Priority: Major > Attachments: HDFS-13802.001.patch > > > When I click FSCK under Utilities on the Router Web UI, I get errors > {quote} > HTTP ERROR 404 > Problem accessing /fsck. Reason: > NOT_FOUND > Powered by Jetty:// > {quote} > I dug into the source code and found that fsck is not currently supported, so > I think we should remove FSCK from the Router Web UI -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-335) Fix logging for scm events
[ https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573422#comment-16573422 ] genericqa commented on HDDS-335: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} framework in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HDDS-335 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934825/HDDS-335.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9e8a275ba316 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5b898c1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDDS-Build/725/testReport/ | | Max. process+thread count | 336 (vs. ulimit of 1) | | modules | C: hadoop-hdds/framework U: hadoop-hdds/framework | | Console output | https://builds.apache.org/job/PreCommit-HDDS-Build/725/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Fix logging for scm events > -- > > Key: HDDS-335 > URL: https://issues.apache.org/jira/browse/HDDS-335 > Project: Hadoop Distributed Data Store >
[jira] [Commented] (HDFS-13447) Fix Typos - Node Not Chosen
[ https://issues.apache.org/jira/browse/HDFS-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573416#comment-16573416 ] Hudson commented on HDFS-13447: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14727 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14727/]) HDFS-13447. Fix Typos - Node Not Chosen. Contributed by Beluga Behr. (elek: rev 36c0d742d484f8bf01d7cb01c7b1c9e3627625dc) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java > Fix Typos - Node Not Chosen > --- > > Key: HDFS-13447 > URL: https://issues.apache.org/jira/browse/HDFS-13447 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.2.0, 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Fix For: 3.2.0 > > Attachments: HDFS-13447.1.patch > > > Fix typo and improve: > > {code:java} > private enum NodeNotChosenReason { > NOT_IN_SERVICE("the node isn't in service"), > NODE_STALE("the node is stale"), > NODE_TOO_BUSY("the node is too busy"), > TOO_MANY_NODES_ON_RACK("the rack has too many chosen nodes"), > NOT_ENOUGH_STORAGE_SPACE("no enough storage space to place the > block");{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13658) fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 1 replica
[ https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573393#comment-16573393 ] genericqa commented on HDFS-13658: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 21s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 46s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 54s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 47s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}269m 26s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations | | | hadoop.hdfs.client.impl.TestBlockReaderLocal | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.TestMaintenanceState | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Updated] (HDFS-13447) Fix Typos - Node Not Chosen
[ https://issues.apache.org/jira/browse/HDFS-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDFS-13447: Resolution: Fixed Fix Version/s: 3.2.0 Status: Resolved (was: Patch Available) I have just committed it to trunk. Thank you very much for the contribution [~belugabehr] > Fix Typos - Node Not Chosen > --- > > Key: HDFS-13447 > URL: https://issues.apache.org/jira/browse/HDFS-13447 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.2.0, 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Fix For: 3.2.0 > > Attachments: HDFS-13447.1.patch > > > Fix typo and improve: > > {code:java} > private enum NodeNotChosenReason { > NOT_IN_SERVICE("the node isn't in service"), > NODE_STALE("the node is stale"), > NODE_TOO_BUSY("the node is too busy"), > TOO_MANY_NODES_ON_RACK("the rack has too many chosen nodes"), > NOT_ENOUGH_STORAGE_SPACE("no enough storage space to place the > block");{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13447) Fix Typos - Node Not Chosen
[ https://issues.apache.org/jira/browse/HDFS-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573376#comment-16573376 ] Elek, Marton commented on HDFS-13447: - +1. Seems to be reasonable. > Fix Typos - Node Not Chosen > --- > > Key: HDFS-13447 > URL: https://issues.apache.org/jira/browse/HDFS-13447 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.2.0, 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: HDFS-13447.1.patch > > > Fix typo and improve: > > {code:java} > private enum NodeNotChosenReason { > NOT_IN_SERVICE("the node isn't in service"), > NODE_STALE("the node is stale"), > NODE_TOO_BUSY("the node is too busy"), > TOO_MANY_NODES_ON_RACK("the rack has too many chosen nodes"), > NOT_ENOUGH_STORAGE_SPACE("no enough storage space to place the > block");{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
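For reference, after the typo fix the reason strings would read roughly as below (the quoted issue description shows the pre-fix code with "no enough storage space"); the exact committed wording may differ:

```java
// Sketch of the cleaned-up NodeNotChosenReason strings after the typo
// fix tracked by HDFS-13447; wrapped in a holder class so the enum is
// self-contained here.
public class NodeNotChosenSketch {
    enum NodeNotChosenReason {
        NOT_IN_SERVICE("the node is not in service"),
        NODE_STALE("the node is stale"),
        NODE_TOO_BUSY("the node is too busy"),
        TOO_MANY_NODES_ON_RACK("the rack has too many chosen nodes"),
        NOT_ENOUGH_STORAGE_SPACE("not enough storage space to place the block");

        private final String text;

        NodeNotChosenReason(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }
}
```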