[jira] [Commented] (HDDS-339) Add block length and blockId in PutKeyResponse

2018-08-08 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574324#comment-16574324
 ] 

Mukul Kumar Singh commented on HDDS-339:


Thanks for working on this, [~shashikant]. Please find my review comments 
below.

1) DatanodeContainerProtocol.proto:311, getCommittedBlockLength -> 
committedBlockLength
2) KeyValueHandler:43, unused import
3) KeyValueHandler:438, I feel the length can be returned as part of 
commitKey: once the key has been committed successfully, its length can be 
returned (see the sketch below).
4) KeyUtils:156, there is an unused variable here; I feel it can be removed
5) KeyUtils:140, putKeyResposne -> putKeyResponse
6) KeyUtils:130, returns putKey response success
7) KeyUtils:134, getPutKeyResponseSuccess -> putKeyResponseSuccess
8) KeyUtils:194, msg is an unused field
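
For comment 3, a rough sketch of the suggested shape (the helper names are 
assumptions that reuse the rename from comment 7, not the actual patch):
{code:java}
// Hypothetical flow: commit the key first, then return its committed length
// in the same response so the client needs no separate lookup.
long committedLength = keyManager.commitKey(keyData, container); // assumed helper
return putKeyResponseSuccess(request, committedLength);
{code}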


> Add block length and blockId in PutKeyResponse
> --
>
> Key: HDDS-339
> URL: https://issues.apache.org/jira/browse/HDDS-339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-339.00.patch
>
>
> The PutKey response will include the blockId as well as the committed block 
> length. This will be extended to include the blockCommitSequenceId as well, 
> all of which will be updated on the Ozone Master. This will be required to 
> add validation as well as to handle 2-node failures.
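
As a purely illustrative sketch (the field names are assumptions; the real 
message is defined in DatanodeContainerProtocol.proto), the enriched response 
would carry something like:
{code:java}
// Hypothetical holder for the data the extended PutKey response would carry.
public final class CommittedKeyInfo {
  private final long blockId;                // block the key was written to
  private final long committedBlockLength;   // length committed on the datanode
  private final long blockCommitSequenceId;  // planned follow-up field

  public CommittedKeyInfo(long blockId, long committedBlockLength,
      long blockCommitSequenceId) {
    this.blockId = blockId;
    this.committedBlockLength = committedBlockLength;
    this.blockCommitSequenceId = blockCommitSequenceId;
  }
}
{code}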






[jira] [Commented] (HDFS-13668) FSPermissionChecker may throw AIOOE when checking if an inode has permission

2018-08-08 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574300#comment-16574300
 ] 

He Xiaoqiao commented on HDFS-13668:


Thanks [~drankye] and [~shashikant] for your suggestions; I have submitted 
v002 following your advice and triggered Jenkins.

> FSPermissionChecker may throw AIOOE when checking if an inode has permission
> ---
>
> Key: HDFS-13668
> URL: https://issues.apache.org/jira/browse/HDFS-13668
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.0, 2.10.0, 2.7.7
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13668-trunk.001.patch, HDFS-13668-trunk.002.patch
>
>
> {{FSPermissionChecker}} may throw {{ArrayIndexOutOfBoundsException:0}} when 
> checking permissions, since it only checks whether the inode's {{aclFeature}} 
> is null but does not check its entry size. When it encounters an 
> {{aclFeature}} that is not null but whose entry size is 0, it throws an AIOOE.
> {code:java}
> private boolean hasPermission(INodeAttributes inode, FsAction access) {
>   ..
>   final AclFeature aclFeature = inode.getAclFeature();
>   if (aclFeature != null) {
> // It's possible that the inode has a default ACL but no access ACL.
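> // getEntryAt(0) below throws AIOOE when the ACL entry list is empty.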
> int firstEntry = aclFeature.getEntryAt(0);
> if (AclEntryStatusFormat.getScope(firstEntry) == AclEntryScope.ACCESS) {
>   return hasAclPermission(inode, access, mode, aclFeature);
> }
>   }
>   ..
> }
> {code}
> If the default {{INodeAttributeProvider}} is used, it is guaranteed that when 
> an {{inode}}'s aclFeature is not null its entry size is greater than 0, but 
> {{INodeAttributeProvider}} is a public interface, so we cannot ensure that 
> external implementations (e.g. Apache Sentry, Apache Ranger) uphold the same 
> constraint.
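
A minimal sketch of the defensive check (assuming an entry-count accessor 
such as {{AclFeature#getEntriesSize()}}; the actual patch may differ):
{code:java}
// Guard against an empty ACL entry list before reading entry 0.
final AclFeature aclFeature = inode.getAclFeature();
if (aclFeature != null && aclFeature.getEntriesSize() > 0) {
  int firstEntry = aclFeature.getEntryAt(0);
  if (AclEntryStatusFormat.getScope(firstEntry) == AclEntryScope.ACCESS) {
    return hasAclPermission(inode, access, mode, aclFeature);
  }
}
{code}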






[jira] [Updated] (HDFS-13668) FSPermissionChecker may throw AIOOE when checking if an inode has permission

2018-08-08 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-13668:
---
Attachment: HDFS-13668-trunk.002.patch

> FSPermissionChecker may throw AIOOE when checking if an inode has permission
> ---
>
> Key: HDFS-13668
> URL: https://issues.apache.org/jira/browse/HDFS-13668
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.0, 2.10.0, 2.7.7
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13668-trunk.001.patch, HDFS-13668-trunk.002.patch






[jira] [Updated] (HDDS-314) ozoneShell putKey command overwrites the existing key having same name

2018-08-08 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-314:

Attachment: HDDS-314.002.patch

> ozoneShell putKey command overwrites the existing key having same name
> --
>
> Key: HDDS-314
> URL: https://issues.apache.org/jira/browse/HDDS-314
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-314.001.patch, HDDS-314.002.patch
>
>
> Steps taken:
> 1) Created a volume root-volume and a bucket root-bucket.
> 2) Ran the following command to put a key with name 'passwd':
>  
> {noformat}
> hadoop@08315aa4b367:~/bin$ ./ozone oz -putKey /root-volume/root-bucket/passwd 
> -file /etc/services -v
> 2018-08-02 09:20:17 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Volume Name : root-volume
> Bucket Name : root-bucket
> Key Name : passwd
> File Hash : 567c100888518c1163b3462993de7d47
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.rpc.type = GRPC (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 
> ms (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - 
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.async.scheduler-threads = 
> 3 (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB 
> (=1048576) (default)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:18 INFO ConfUtils:41 - raft.client.rpc.request.timeout = 
> 3000 ms (default)
> Aug 02, 2018 9:20:18 AM 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy
>  
> {noformat}
> 3) Ran the following command to put a key with name 'passwd' again:
> {noformat}
> hadoop@08315aa4b367:~/bin$ ./ozone oz -putKey /root-volume/root-bucket/passwd 
> -file /etc/passwd -v
> 2018-08-02 09:20:41 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Volume Name : root-volume
> Bucket Name : root-bucket
> Key Name : passwd
> File Hash : b056233571cc80d6879212911cb8e500
> 2018-08-02 09:20:41 INFO ConfUtils:41 - raft.rpc.type = GRPC (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 
> ms (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - 
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.async.scheduler-threads = 
> 3 (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB 
> (=1048576) (default)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-02 09:20:42 INFO ConfUtils:41 - raft.client.rpc.request.timeout = 
> 3000 ms (default)
> Aug 02, 2018 9:20:42 AM 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl 
> detectProxy{noformat}
>  
> Key 'passwd' was overwritten with the new content, and no error was thrown 
> saying that the key is already present.
> Expectation:
> ---
> Key overwrite with the same name should not be allowed.
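
A hedged sketch of the expected behavior (the method names are illustrative, 
not necessarily the actual Ozone client API):
{code:java}
// Hypothetical client-side existence check; the real fix may enforce this
// on the server side instead.
if (bucket.keyExists(keyName)) {           // assumed helper
  throw new IOException("Key " + keyName + " already exists");
}
OzoneOutputStream out = bucket.createKey(keyName, dataSize); // assumed signature
{code}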






[jira] [Created] (HDFS-13808) [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() functions

2018-08-08 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13808:
---

 Summary: [SPS]: Remove unwanted FSNamesystem 
#isFileOpenedForWrite() and #getFileInfo() functions
 Key: HDFS-13808
 URL: https://issues.apache.org/jira/browse/HDFS-13808
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R









[jira] [Commented] (HDFS-13668) FSPermissionChecker may throw AIOOE when checking if an inode has permission

2018-08-08 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574260#comment-16574260
 ] 

Shashikant Banerjee commented on HDFS-13668:


Thanks [~hexiaoqiao] for reporting and working on this. The patch looks good 
to me overall. Some minor comments:
 # There are some whitespace issues reported. Please fix them.
 # In TestINodeAttributeProvider#testAclFeature, we can remove this code, as 
it may not be required:
{code:java}
fs.rename(aclChildDir, aclChildDirTarget);
{code}

I am +1 after that.

> FSPermissionChecker may throw AIOOE when checking if an inode has permission
> ---
>
> Key: HDFS-13668
> URL: https://issues.apache.org/jira/browse/HDFS-13668
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.0, 2.10.0, 2.7.7
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13668-trunk.001.patch






[jira] [Commented] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574255#comment-16574255
 ] 

chencan commented on HDFS-13758:


Hi [~jojochuang], I have submitted the branch-2 patch. Thanks!

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion that assumes that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
> BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
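> // asserts are disabled on production clusters, so a null uc slips through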
> assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of HDFS-10240 
> data corruption, where a recoverLease() is immediately followed by a close(), 
> before DataNodes have had a chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE.
> # On a production cluster, the assert is ignored, and you'll get a more 
> noticeable NPE. Future HDFS developers might fix this NPE, causing a 
> regression. An NPE is typically not captured and handled, so there's a chance 
> of internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.
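
A minimal sketch of the proposed replacement (the exception type and message 
are assumptions, not the actual patch):
{code:java}
// Fail with a meaningful exception instead of an assert that is disabled in
// production and lets a null uc fall through to an NPE.
BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
if (uc == null) {
  throw new IOException("Block " + b + " has a BlockRecoveryCommand"
      + " but is not under construction");
}
{code}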






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Status: Patch Available  (was: Open)

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Attachment: (was: HDFS-13758.branch-2.patch)

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Attachment: HDFS-13758.branch-2.patch

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Attachment: HDFS-13758.branch-2.patch

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Commented] (HDFS-13668) FSPermissionChecker may throw AIOOE when checking if an inode has permission

2018-08-08 Thread Kai Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574231#comment-16574231
 ] 

Kai Zheng commented on HDFS-13668:
--

This AIOOE looks like a good catch and fix. +1 from me. Thanks, Xiaoqiao.

[~jojochuang], would you mind also giving it a look? Thanks.

> FSPermissionChecker may throw AIOOE when checking if an inode has permission
> ---
>
> Key: HDFS-13668
> URL: https://issues.apache.org/jira/browse/HDFS-13668
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.0, 2.10.0, 2.7.7
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13668-trunk.001.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Status: Open  (was: Patch Available)

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Attachment: (was: HDFS-13758.branch-2.patch)

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Status: Patch Available  (was: Open)

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Attachment: (was: HDFS-13758.branch-2.patch)

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Status: Open  (was: Patch Available)

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch






[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread chencan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chencan updated HDFS-13758:
---
Attachment: HDFS-13758.branch-2.patch

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion assumption that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
> BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
> assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenario of HDFS-10240 
> data corruption, if a recoverLease() is made immediately followed by a 
> close(), before DataNodes have the chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE
> # on a production cluster, the assert is ignored, and you'll get a more 
> noticeable NPE. Future HDFS developers might fix this NPE, causing a 
> regression. An NPE is typically not caught and handled, so there's a chance 
> it will result in internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.
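For illustration, the proposed check could look roughly like this (a sketch, 
not the attached patch):
{code:java}
BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
if (uc == null) {
  // Surface a meaningful exception instead of tripping an assert (ignored
  // in production) or a later NullPointerException.
  throw new IOException("Found a BlockRecoveryCommand for " + b
      + " but the block is not under construction");
}
{code}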



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13805) Journal Nodes should allow to format non-empty directories with "-force" option

2018-08-08 Thread Surendra Singh Lilhore (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574224#comment-16574224
 ] 

Surendra Singh Lilhore commented on HDFS-13805:
---

Hi [~arpitagarwal], when we want to recover the NN with an old backed-up 
fsimage, we need to reinitialize the shared edits to avoid a mismatch between 
the fsimage and the edit logs.
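
As a concrete sketch of that recovery flow (assuming the proposed "-force" 
behavior; steps illustrative):
{noformat}
# 1. Restore the old backed-up fsimage into the NN metadata directory.
# 2. Re-initialize the shared edits on the JournalNodes so they match the
#    restored fsimage -- this is the step that needs a non-empty JN
#    directory to be formattable:
hdfs namenode -initializeSharedEdits -force
{noformat}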

> Journal Nodes should allow to format non-empty directories with "-force" 
> option
> ---
>
> Key: HDFS-13805
> URL: https://issues.apache.org/jira/browse/HDFS-13805
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node
>Affects Versions: 3.0.0-alpha4
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
>
> HDFS-2 completely restricted re-formatting the journalnode, but it should be 
> allowed when the *"-force"* option is given. If the user feels the force 
> option can accidentally delete data, they can disable it by configuring 
> "*dfs.reformat.disabled*".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13807) Large overhead when seek and read only a small piece from a file

2018-08-08 Thread Jack Fan (JIRA)
Jack Fan created HDFS-13807:
---

 Summary: Large overhead when seek and read only a small piece from 
a file
 Key: HDFS-13807
 URL: https://issues.apache.org/jira/browse/HDFS-13807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.7.6, 2.8.4
 Environment: HDFS server is 2.8.2

HDFS client is 2.7.1

I use `pyarrow` with both `libhdfs` and `libhdfs3`, and I observe the same 
behavior with both drivers.
Reporter: Jack Fan


I'm storing small files (~500KB in size) in big file chunks (256MB~2GB) in HDFS.

I then maintain a separate index file recording the offset and length of each 
small file within those chunks.

When I randomly read those small files, for each small file I open the 
corresponding file chunk, seek to the `offset`, and read `length` bytes.

However, I noticed that when I read a small piece of data (say, 500KB), the 
datanode transfers far more data (~4MB) than that to the HDFS client.

I originally thought this was the readahead feature on the datanode, which 
sends extra data to the client in advance to speed up streaming of the file. 
However, I tried setting `dfs.client.cache.readahead` to 0 in the client 
configuration but the behavior still persists.
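
For reference, a minimal sketch of the access pattern and the readahead 
override described above (path, offset and length are illustrative):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallPieceRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Attempt to disable client-side readahead, as tried above.
    conf.setLong("dfs.client.cache.readahead", 0L);
    try (FileSystem fs = FileSystem.get(conf);
         FSDataInputStream in = fs.open(new Path("/chunks/chunk-0001"))) {
      long offset = 123456L;    // from the separate index file
      int length = 500 * 1024;  // ~500KB small file
      byte[] buf = new byte[length];
      // Positioned read: equivalent to seek(offset) then a full read.
      in.readFully(offset, buf, 0, length);
    }
  }
}
{code}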

I also used `tcpdump` to capture packets and discovered that the datanode 
keeps sending data after the HDFS client closes the TCP connection for the 
RPC (I observed a bunch of RST packets sent out by the HDFS client).

 

It seems the datanode spontaneously sends more data than requested to the 
HDFS client; I want to know how to stop this behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently

2018-08-08 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574175#comment-16574175
 ] 

Fei Hui commented on HDFS-13802:


[~elgoiri] Thanks. I will try to implement the router fsck

> RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
> 
>
> Key: HDFS-13802
> URL: https://issues.apache.org/jira/browse/HDFS-13802
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch
>
>
> When I click FSCK under Utilities on the Router Web UI, I get errors
> {quote}
> HTTP ERROR 404
> Problem accessing /fsck. Reason:
> NOT_FOUND
> Powered by Jetty://
> {quote}
> I dug into the source code and found that fsck is not supported currently, so 
> I think we should remove FSCK from the Router Web UI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer

2018-08-08 Thread Virajith Jalaparti (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574163#comment-16574163
 ] 

Virajith Jalaparti commented on HDFS-13795:
---

Failed tests in the last run seem unrelated. [~ehiggs], can you look at 
[^HDFS-13795.004.patch]? Thanks!

> Fix potential NPE in InMemoryLevelDBAliasMapServer
> --
>
> Key: HDFS-13795
> URL: https://issues.apache.org/jira/browse/HDFS-13795
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
> Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, 
> HDFS-13795.003.patch, HDFS-13795.004.patch
>
>
> Namenode fails to stop correctly due to an NPE in InMemoryAliasMapServer 
> when it is configured incorrectly.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023)
> {code}
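
A minimal sketch of the kind of null-guard implied by the stack trace (field 
names are hypothetical; the attached patches carry the actual change):
{code:java}
@Override
public void close() throws IOException {
  if (aliasMapServer != null) {
    aliasMapServer.stop();
  }
  if (aliasMap != null) {
    aliasMap.close();
  }
}
{code}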



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-08 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574081#comment-16574081
 ] 

Erik Krogen commented on HDFS-13779:


Attaching a WIP patch which covers this and HDFS-13780, and relies on some more 
refactoring in ConfiguredFailoverProxyProvider.

The idea is basically to continue allowing the CFPP layer to manage only the 
Active/Standby NameNodes, and ORPP to manage the observers (then fall back to 
CFPP's non-observer proxies). _Failover_ refers only to switching Active 
NameNodes, _not_ switching between observers.
 * Refactor CFPP a bit to create a separate {{getProxies()}} method where the 
initialization can be done (solving HDFS-13780), and also where it can be 
overridden by ORPP.
 * On proxy initialization, also fetch all of the NameNode states. When CFPP 
requests proxies, filter to only non-observers.
 * Method invocation, on read methods, tries all of the NNs thought to 
currently be in observer state. If any throws a StandbyException, mark it as 
non-observer. Unfortunately we have no way to tell here if one of the 
thought-to-be-observers has actually become Active, but in this case failover 
will happen soon (at the next write request) and the situation will be fixed 
(see below).
 * If all observer NNs fail, or it is a write method, pass the request up to 
CFPP, which will try the current Active. This may trigger failover. If so, 
before picking a new node from the list of non-observers, refresh the states of 
all of the NameNodes. This handles the case where one of the previous observers 
is now active.
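
To make the flow concrete, here is a rough sketch of the invocation logic in 
the bullets above. Helper names such as {{isReadMethod()}}, 
{{observerProxies()}}, {{markNonObserver()}} and {{activeProxy()}} are 
illustrative only, not what the WIP patch actually uses:
{code:java}
public Object invoke(Object proxy, Method method, Object[] args)
    throws Throwable {
  if (isReadMethod(method)) {
    // Try every NN currently believed to be an observer.
    for (Object observer : observerProxies()) {
      try {
        return method.invoke(observer, args);
      } catch (InvocationTargetException ite) {
        if (ite.getCause() instanceof StandbyException) {
          // This NN is no longer an observer; stop routing reads to it.
          markNonObserver(observer);
        } else {
          throw ite.getCause();
        }
      }
    }
  }
  // All observers failed, or this is a write: defer to the CFPP-managed
  // Active proxy. A failover here refreshes all NameNode states, which
  // also catches an ex-observer that has become Active.
  return method.invoke(activeProxy(), args);
}
{code}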

[~shv], [~chliang], let me know your thoughts on the above.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-08 Thread Erik Krogen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-13779:
---
Attachment: HDFS-13779-HDFS-12943.WIP00.patch

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574055#comment-16574055
 ] 

Hudson commented on HDDS-267:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14733 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14733/])
HDDS-267. Handle consistency issues during container update/close. 
(hanishakoneru: rev d81cd3611a449bcd7970ff2f1392a5e868e28f7e)
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestContainerPersistence.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
* (edit) 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
* (edit) 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainer.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerData.java


> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During update operation, if the on-disk update of .container file fails, then 
> the in-memory container metadata is also reset to the old value.
> During close operation, if the on-disk update of .container file fails, then 
> the in-memory containerState is set to CLOSING so that no new operations are 
> permitted. 
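
For readers following along, a pseudocode-style sketch of the close path 
described above (names are illustrative, not the actual KeyValueContainer 
code):
{code:java}
writeLock();
try {
  containerData.setState(State.CLOSING);      // reject new operations
  try {
    updateContainerFile(containerFile);       // persist .container on disk
    containerData.setState(State.CLOSED);     // in-memory state follows disk
  } catch (StorageContainerException ex) {
    // On-disk update failed: remain in CLOSING so no new operations are
    // permitted, and propagate the failure to the caller.
    throw ex;
  }
} finally {
  writeUnlock();
}
{code}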



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574046#comment-16574046
 ] 

Hanisha Koneru commented on HDDS-267:
-

Test failures are unrelated.

Committed to trunk. Thanks [~bharatviswa] and [~arpitagarwal] for the reviews.

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During update operation, if the on-disk update of .container file fails, then 
> the in-memory container metadata is also reset to the old value.
> During close operation, if the on-disk update of .container file fails, then 
> the in-memory containerState is set to CLOSING so that no new operations are 
> permitted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-267:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During update operation, if the on-disk update of .container file fails, then 
> the in-memory container metadata is also reset to the old value.
> During close operation, if the on-disk update of .container file fails, then 
> the in-memory containerState is set to CLOSING so that no new operations are 
> permitted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574012#comment-16574012
 ] 

genericqa commented on HDDS-267:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
58s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m  0s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}129m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell |
|   | hadoop.ozone.scm.TestXceiverClientManager |
|   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
|   | hadoop.ozone.web.client.TestKeys |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-267 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934874/HDDS-267.005.patch |
| Optional Tests |  asflicense 

[jira] [Commented] (HDFS-13749) Implement a new client protocol method to get NameNode state

2018-08-08 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574011#comment-16574011
 ] 

Chao Sun commented on HDFS-13749:
-

OK, I'll file a separate JIRA for this then.

> Implement a new client protocol method to get NameNode state
> 
>
> Key: HDFS-13749
> URL: https://issues.apache.org/jira/browse/HDFS-13749
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13749-HDFS-12943.000.patch
>
>
> Currently {{HAServiceProtocol#getServiceStatus}} requires super user 
> privilege. Therefore, as a temporary solution, in HDFS-12976 we discover 
> NameNode state by calling {{reportBadBlocks}}. Here, we'll properly implement 
> this by adding a new method in client protocol to get the NameNode state.
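
For illustration, the new method might look roughly like this (the exact 
name, return type and annotations are assumptions to be settled in the new 
JIRA):
{code:java}
// Hypothetical addition to ClientProtocol; signature illustrative only.
@Idempotent
HAServiceProtocol.HAServiceState getHAServiceState() throws IOException;
{code}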



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13749) Implement a new client protocol method to get NameNode state

2018-08-08 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574008#comment-16574008
 ] 

Konstantin Shvachko commented on HDFS-13749:


We could have reused this jira, but it probably makes sense to create a new one 
to obtain wider visibility to the change.

> Implement a new client protocol method to get NameNode state
> 
>
> Key: HDFS-13749
> URL: https://issues.apache.org/jira/browse/HDFS-13749
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13749-HDFS-12943.000.patch
>
>
> Currently {{HAServiceProtocol#getServiceStatus}} requires super user 
> privilege. Therefore, as a temporary solution, in HDFS-12976 we discover 
> NameNode state by calling {{reportBadBlocks}}. Here, we'll properly implement 
> this by adding a new method in client protocol to get the NameNode state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12416) BlockPlacementPolicyDefault will cause NN shutdown if log level is changed

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-12416:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Thanks [~brahmareddy], resolving this as a dup. Thanks [~smao].

> BlockPlacementPolicyDefault will cause NN shutdown if log level is changed
> --
>
> Key: HDFS-12416
> URL: https://issues.apache.org/jira/browse/HDFS-12416
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement
>Affects Versions: 2.7.4, 3.0.0-alpha3
>Reporter: Suhan Mao
>Priority: Major
> Attachments: HDFS-12416.001.patch, HDFS-12416.patch
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> In BlockPlacementPolicyDefault.chooseRandom method.
> The code are in below structure:
> {code:java}
> StringBuilder builder = null;
> if (LOG.isDebugEnabled()) {
>   builder = debugLoggingBuilder.get();
>   builder.setLength(0);
>   builder.append("[");
> }
> while(numOfReplicas > 0){
> .
> chooseDataNode(scope, excludedNodes)
> .
> if (LOG.isDebugEnabled()) {
> builder.append("\nNode ").append(NodeBase.getPath(chosenNode))
> .append(" [");
>   }
> }
> {code}
> There's a possibility that the log level is INFO before entering the while 
> loop, but is changed to DEBUG inside the loop through the web UI.
> In that case, builder is not initialized at the beginning, a 
> NullPointerException will be thrown, and this will cause the NN to exit.
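
One possible shape of a fix, sketched here for illustration rather than 
taken from the attached patches: latch the debug decision once, so the 
builder and the debug branches cannot disagree when the log level changes 
mid-loop:
{code:java}
StringBuilder builder = null;
final boolean isDebug = LOG.isDebugEnabled();  // evaluated exactly once
if (isDebug) {
  builder = debugLoggingBuilder.get();
  builder.setLength(0);
  builder.append("[");
}
while (numOfReplicas > 0) {
  // ... chooseDataNode(scope, excludedNodes) ...
  if (isDebug) {
    // Safe: isDebug implies builder was initialized above.
    builder.append("\nNode ").append(NodeBase.getPath(chosenNode))
        .append(" [");
  }
}
{code}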



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release

2018-08-08 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573943#comment-16573943
 ] 

Arpit Agarwal commented on HDDS-341:


Makes sense.

> HDDS/Ozone bits are leaking into Hadoop release
> ---
>
> Key: HDDS-341
> URL: https://issues.apache.org/jira/browse/HDDS-341
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Priority: Blocker
> Fix For: 0.2.1
>
>
> [~aw] in the Ozone release discussion reported that Ozone is leaking bits 
> into Hadoop. This has to be fixed before  Hadoop 3.2 or Ozone 0.2.1 release. 
> I will make this a release blocker for Ozone.
>  
> {noformat}
> >Has anyone verified that a Hadoop release doesn't have _any_ of the extra 
> >ozone bits that are sprinkled outside the maven modules?
> [aengineer] : As far as I know that is the state, we have had multiple Hadoop 
> releases after ozone has been merged. So far no one has reported Ozone bits 
> leaking into Hadoop. If we find something like that, it would be a bug.
> [aw]: There hasn't been a release from a branch where Ozone has been merged 
> yet. The first one will be 3.2.0.  Running create-release off of trunk 
> presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in 
> the Hadoop source tar ball.
>   So, consider this as a report. IMHO, cutting an Ozone release prior to 
> a Hadoop release ill-advised given the distribution impact and the 
> requirements of the merge vote.  
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release

2018-08-08 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573942#comment-16573942
 ] 

Anu Engineer commented on HDDS-341:
---

[~arpitagarwal] You are right, but I filed it here since it is a work item in 
HDDS/Ozone. I am hoping that if we do this work, then it will not be a 
blocker for Hadoop.

 

> HDDS/Ozone bits are leaking into Hadoop release
> ---
>
> Key: HDDS-341
> URL: https://issues.apache.org/jira/browse/HDDS-341
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Priority: Blocker
> Fix For: 0.2.1
>
>
> [~aw] in the Ozone release discussion reported that Ozone is leaking bits 
> into Hadoop. This has to be fixed before  Hadoop 3.2 or Ozone 0.2.1 release. 
> I will make this a release blocker for Ozone.
>  
> {noformat}
> >Has anyone verified that a Hadoop release doesn't have _any_ of the extra 
> >ozone bits that are sprinkled outside the maven modules?
> [aengineer] : As far as I know that is the state, we have had multiple Hadoop 
> releases after ozone has been merged. So far no one has reported Ozone bits 
> leaking into Hadoop. If we find something like that, it would be a bug.
> [aw]: There hasn't been a release from a branch where Ozone has been merged 
> yet. The first one will be 3.2.0.  Running create-release off of trunk 
> presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in 
> the Hadoop source tar ball.
>   So, consider this as a report. IMHO, cutting an Ozone release prior to 
> a Hadoop release ill-advised given the distribution impact and the 
> requirements of the merge vote.  
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release

2018-08-08 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573940#comment-16573940
 ] 

Arpit Agarwal commented on HDDS-341:


This should probably be moved to the Hadoop project and tagged as a blocker for 
Apache Hadoop 3.2.0.

> HDDS/Ozone bits are leaking into Hadoop release
> ---
>
> Key: HDDS-341
> URL: https://issues.apache.org/jira/browse/HDDS-341
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Priority: Blocker
> Fix For: 0.2.1
>
>
> [~aw] in the Ozone release discussion reported that Ozone is leaking bits 
> into Hadoop. This has to be fixed before  Hadoop 3.2 or Ozone 0.2.1 release. 
> I will make this a release blocker for Ozone.
>  
> {noformat}
> >Has anyone verified that a Hadoop release doesn't have _any_ of the extra 
> >ozone bits that are sprinkled outside the maven modules?
> [aengineer] : As far as I know that is the state, we have had multiple Hadoop 
> releases after ozone has been merged. So far no one has reported Ozone bits 
> leaking into Hadoop. If we find something like that, it would be a bug.
> [aw]: There hasn't been a release from a branch where Ozone has been merged 
> yet. The first one will be 3.2.0.  Running create-release off of trunk 
> presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in 
> the Hadoop source tar ball.
>   So, consider this as a report. IMHO, cutting an Ozone release prior to 
> a Hadoop release ill-advised given the distribution impact and the 
> requirements of the merge vote.  
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release

2018-08-08 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-341:
---
Fix Version/s: 0.2.1

> HDDS/Ozone bits are leaking into Hadoop release
> ---
>
> Key: HDDS-341
> URL: https://issues.apache.org/jira/browse/HDDS-341
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Priority: Blocker
> Fix For: 0.2.1
>
>
> [~aw] in the Ozone release discussion reported that Ozone is leaking bits 
> into Hadoop. This has to be fixed before  Hadoop 3.2 or Ozone 0.2.1 release. 
> I will make this a release blocker for Ozone.
>  
> {noformat}
> >Has anyone verified that a Hadoop release doesn't have _any_ of the extra 
> >ozone bits that are sprinkled outside the maven modules?
> [aengineer] : As far as I know that is the state, we have had multiple Hadoop 
> releases after ozone has been merged. So far no one has reported Ozone bits 
> leaking into Hadoop. If we find something like that, it would be a bug.
> [aw]: There hasn't been a release from a branch where Ozone has been merged 
> yet. The first one will be 3.2.0.  Running create-release off of trunk 
> presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in 
> the Hadoop source tar ball.
>   So, consider this as a report. IMHO, cutting an Ozone release prior to 
> a Hadoop release ill-advised given the distribution impact and the 
> requirements of the merge vote.  
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573922#comment-16573922
 ] 

Bharat Viswanadham commented on HDDS-267:
-

+1, Pending Jenkins.

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During update operation, if the on-disk update of .container file fails, then 
> the in-memory container metadata is also reset to the old value.
> During close operation, if the on-disk update of .container file fails, then 
> the in-memory containerState is set to CLOSING so that no new operations are 
> permitted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release

2018-08-08 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDDS-341:
--
Priority: Blocker  (was: Major)

> HDDS/Ozone bits are leaking into Hadoop release
> ---
>
> Key: HDDS-341
> URL: https://issues.apache.org/jira/browse/HDDS-341
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Priority: Blocker
>
> [~aw] in the Ozone release discussion reported that Ozone is leaking bits 
> into Hadoop. This has to be fixed before  Hadoop 3.2 or Ozone 0.2.1 release. 
> I will make this a release blocker for Ozone.
>  
> {noformat}
> >Has anyone verified that a Hadoop release doesn't have _any_ of the extra 
> >ozone bits that are sprinkled outside the maven modules?
> [aengineer] : As far as I know that is the state, we have had multiple Hadoop 
> releases after ozone has been merged. So far no one has reported Ozone bits 
> leaking into Hadoop. If we find something like that, it would be a bug.
> [aw]: There hasn't been a release from a branch where Ozone has been merged 
> yet. The first one will be 3.2.0.  Running create-release off of trunk 
> presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in 
> the Hadoop source tar ball.
>   So, consider this as a report. IMHO, cutting an Ozone release prior to 
> a Hadoop release ill-advised given the distribution impact and the 
> requirements of the merge vote.  
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-284) CRC for ChunksData

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573908#comment-16573908
 ] 

genericqa commented on HDDS-284:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 35m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 35m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 35m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 35m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
3s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} ozone-manager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} 

[jira] [Commented] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573907#comment-16573907
 ] 

Wei-Chiu Chuang commented on HDFS-13758:


LGTM. Would you also contribute a branch-2 fix?

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion assumption that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
> BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
> assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of 
> HDFS-10240 data corruption, where a recoverLease() is immediately followed 
> by a close() before DataNodes have the chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE
> # on a production cluster, the assert is ignored, and you'll get a more 
> noticeable NPE. Future HDFS developers might fix this NPE, causing a 
> regression. An NPE is typically not caught and handled, so there's a chance 
> it will result in internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-341) HDDS/Ozone bits are leaking into Hadoop release

2018-08-08 Thread Anu Engineer (JIRA)
Anu Engineer created HDDS-341:
-

 Summary: HDDS/Ozone bits are leaking into Hadoop release
 Key: HDDS-341
 URL: https://issues.apache.org/jira/browse/HDDS-341
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Reporter: Anu Engineer


[~aw] in the Ozone release discussion reported that Ozone is leaking bits into 
Hadoop. This has to be fixed before  Hadoop 3.2 or Ozone 0.2.1 release. I will 
make this a release blocker for Ozone.

 
{noformat}
>Has anyone verified that a Hadoop release doesn't have _any_ of the extra 
>ozone bits that are sprinkled outside the maven modules?

[aengineer] : As far as I know that is the state, we have had multiple Hadoop 
releases after ozone has been merged. So far no one has reported Ozone bits 
leaking into Hadoop. If we find something like that, it would be a bug.


[aw]:   There hasn't been a release from a branch where Ozone has been merged 
yet. The first one will be 3.2.0.  Running create-release off of trunk 
presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in the 
Hadoop source tar ball.

So, consider this as a report. IMHO, cutting an Ozone release prior to 
a Hadoop release ill-advised given the distribution impact and the requirements 
of the merge vote.  

{noformat}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573905#comment-16573905
 ] 

Wei-Chiu Chuang commented on HDFS-10240:


Additionally, I found through HDFS-13757 that the test should also disable IBR 
reporting so it doesn't become flaky.
This test file 
https://issues.apache.org/jira/secure/attachment/12932515/HDFS-13757.test.02.patch
has an example of disabling IBRs.

> Race between close/recoverLease leads to missing block
> --
>
> Key: HDFS-10240
> URL: https://issues.apache.org/jira/browse/HDFS-10240
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, 
> HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240.test.patch
>
>
> We got a missing block in our cluster, and logs related to the missing block 
> are as follows:
> 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 
> blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
>  recovery started, 
> primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]
> 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: File XX has not been closed. Lease 
> recovery is in progress. RecoveryId = 153006357 for block 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,248 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, 
> primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  has not reached minimal replication 1
> 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.53:11402 is added to 
> blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  size 139
> 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:08,808 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345,
>  newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, 
> 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false)
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK 
> 

[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573903#comment-16573903
 ] 

genericqa commented on HDFS-13795:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m  7s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
|   | hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934864/HDFS-13795.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4effdb0c0fbd 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9499df7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24730/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24730/testReport/ |
| Max. process+thread count | 3461 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Comment Edited] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573841#comment-16573841
 ] 

Hanisha Koneru edited comment on HDDS-267 at 8/8/18 9:04 PM:
-

Thanks [~arpitagarwal] and [~bharatviswa] for reviews.
 I have updated patch v05 to handle the create and update container file cases 
separately. Thanks Bharat for catching it.

The test failures are unrelated to this patch and pass locally.


was (Author: hanishakoneru):
Thanks [~arpitagarwal] and [~bharatviswa] for reviews.
I have updated patch v05 to handle the create and update container file cases 
separately. Thanks Bharat for catching it.

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During the update operation, if the on-disk update of the .container file 
> fails, then the in-memory container metadata is also reset to the old value.
> During close operation, if the on-disk update of .container file fails, then 
> the in-memory containerState is set to CLOSING so that no new operations are 
> permitted. 
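
To make the locking and rollback behaviour described above concrete, here is a 
minimal sketch of the update path; every name in it (writeLock, containerData, 
updateContainerFile) is an illustrative assumption, not the actual HDDS-267 
code.

{code:java}
public void updateContainer(Map<String, String> newMetadata)
    throws StorageContainerException {
  writeLock();  // serialize with other readers/writers of container state
  try {
    Map<String, String> oldMetadata = containerData.getMetadata();
    containerData.setMetadata(newMetadata);   // update in-memory state first
    try {
      updateContainerFile(containerFile);     // persist the .container file
    } catch (StorageContainerException ex) {
      // The on-disk update failed: roll back the in-memory state so the
      // two copies stay consistent.
      containerData.setMetadata(oldMetadata);
      throw ex;
    }
  } finally {
    writeUnlock();
  }
}
{code}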



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11398) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails intermittently

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573860#comment-16573860
 ] 

Wei-Chiu Chuang commented on HDFS-11398:


Found the same test failure in the HDFS-10240 precommit job.

> TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails 
> intermittently
> 
>
> Key: HDFS-11398
> URL: https://issues.apache.org/jira/browse/HDFS-11398
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-11398-reproduce.patch, HDFS-11398.001.patch, 
> HDFS-11398.002.patch, failure.log
>
>
> The test {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} 
> still fails intermittently in trunk after HDFS-11316. The stack infos:
> {code}
> testUnderReplicationAfterVolFailure(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure)
>   Time elapsed: 95.021 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics:
> Timestamp: 2017-02-07 07:00:34,193
> 
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:511)
> at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:276)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure(TestDataNodeVolumeFailure.java:412)
> {code}
> I looked into this and found there is a chance that the value 
> {{UnderReplicatedBlocksCount}} will no longer be > 0. The following is my 
> analysis:
> In test {{TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure}}, it 
> uses creating file to trigger the disk error checking. The related codes:
> {code}
> Path file1 = new Path("/test1");
> DFSTestUtil.createFile(fs, file1, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file1, (short)3);
> // Fail the first volume on both datanodes
> File dn1Vol1 = new File(dataDir, "data"+(2*0+1));
> File dn2Vol1 = new File(dataDir, "data"+(2*1+1));
> DataNodeTestUtils.injectDataDirFailure(dn1Vol1, dn2Vol1);
> Path file2 = new Path("/test2");
> DFSTestUtil.createFile(fs, file2, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file2, (short)3);
> {code}
> This leads to one problem: if the cluster is busy, it can take a long time 
> for the replication of file2 to reach the desired value. During this time, 
> the under-replicated blocks of file1 can also be re-replicated in the 
> cluster. Once that happens, the condition {{underReplicatedBlocks > 0}} will 
> never be satisfied.
> And this can be reproduced in my local env.
> Actually, we can use an easier approach, {{DataNodeTestUtils.waitForDiskError}}, 
> to replace this; it runs fast and is more reliable.
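
To make the suggested fix concrete, here is a rough sketch of what the 
replacement could look like; the {{waitForDiskError}} and {{getVolume}} 
signatures are assumptions based on the description, and {{dataDir}} and 
{{cluster}} come from the surrounding test.

{code:java}
// Fail the first volume on both datanodes, as before.
File dn1Vol1 = new File(dataDir, "data" + (2 * 0 + 1));
File dn2Vol1 = new File(dataDir, "data" + (2 * 1 + 1));
DataNodeTestUtils.injectDataDirFailure(dn1Vol1, dn2Vol1);

// Instead of creating file2 and waiting for its replication (which can
// race with the re-replication of file1), wait directly for each datanode
// to detect its failed volume.
List<DataNode> dns = cluster.getDataNodes();
DataNodeTestUtils.waitForDiskError(
    dns.get(0), DataNodeTestUtils.getVolume(dns.get(0), dn1Vol1));
DataNodeTestUtils.waitForDiskError(
    dns.get(1), DataNodeTestUtils.getVolume(dns.get(1), dn2Vol1));
{code}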



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573855#comment-16573855
 ] 

Wei-Chiu Chuang commented on HDFS-10240:


HDFS-11398 tracks the test failure.

> Race between close/recoverLease leads to missing block
> --
>
> Key: HDFS-10240
> URL: https://issues.apache.org/jira/browse/HDFS-10240
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, 
> HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240.test.patch
>
>
> We got a missing block in our cluster, and logs related to the missing block 
> are as follows:
> 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 
> blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
>  recovery started, 
> primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]
> 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: File XX has not been closed. Lease 
> recovery is in progress. RecoveryId = 153006357 for block 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,248 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, 
> primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  has not reached minimal replication 1
> 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.53:11402 is added to 
> blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  size 139
> 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:08,808 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345,
>  newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, 
> 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false)
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> From the 

[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573854#comment-16573854
 ] 

Wei-Chiu Chuang commented on HDFS-10240:


[~LiJinglun] thanks for the patch. The test failure is unrelated.
As for the TestDataNodeVolumeFailure failure, let's file another jira to deal 
with that. Please refrain from incorporating unrelated changes in the patch :)



> Race between close/recoverLease leads to missing block
> --
>
> Key: HDFS-10240
> URL: https://issues.apache.org/jira/browse/HDFS-10240
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, 
> HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240.test.patch
>
>
> We got a missing block in our cluster, and logs related to the missing block 
> are as follows:
> 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 
> blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
>  recovery started, 
> primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]
> 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: File XX has not been closed. Lease 
> recovery is in progress. RecoveryId = 153006357 for block 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,248 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, 
> primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  has not reached minimal replication 1
> 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.53:11402 is added to 
> blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  size 139
> 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:08,808 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345,
>  newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, 
> 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false)
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: 

[jira] [Comment Edited] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573844#comment-16573844
 ] 

Wei-Chiu Chuang edited comment on HDFS-13769 at 8/8/18 8:46 PM:


Thanks for the new revision.

* IMO, this jira should be converted to a HADOOP jira. Trash is not an 
HDFS-specific feature and could be used by other file systems as well (S3A, 
for example).

* Instead of calling fs.listStatus(), would you please use 
fs.listStatusIterator()?

** The former gets *everything* under a path, so you would see a bump in JVM 
heap usage for a large dir.

* I am still not satisfied with FileSystem#contentSummary(). The closest I 
could find is FileSystem#getQuotaUsage(), which would return the number of 
objects in a directory, but quota is not enabled by default.

* Nits:
{code}
import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.*;
{code}
TrashPolicyWithSafeDelete should not use a wildcard import.


* Nits2:
{code}
LOG.debug("DIR "+ path + " in trash is too large, try safe delete.");
{code}
This is not necessarily true, if skipCheckLimit is true.
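
For illustration, a minimal sketch of the iterator-based listing suggested 
above; the deletion loop around it is an assumption for the sketch, not the 
patch itself.

{code:java}
// List the trash directory incrementally instead of materializing the
// whole FileStatus[] in the JVM heap at once.
RemoteIterator<FileStatus> it = fs.listStatusIterator(trashPath);
while (it.hasNext()) {
  FileStatus child = it.next();
  // One delete per child, so no single RPC removes too many files.
  fs.delete(child.getPath(), true);
}
{code}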


was (Author: jojochuang):
Thanks for the new revision.

IMO, this jira should convert to a HADOOP jira. Trash is not a HDFS-specific 
feature and this could also be used in other file systems as well (S3A for 
example)

Instead of calling fs.listStatus(), would you please use 
fs.listStatusIterator()?

The former gets *everything* under a path, so you would see a bump in JVM heap 
usage for a large dir.

I am still not satisfied with FileSystem#contentSummary(). The closest I could 
find is FileSystem#getQuotaUsage() which would return number of objects in a 
directory. but quota is not enabled by default.

Nits:
{code}
import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.*;
{code}
TrashPolicyWithSafeDelete should not do wildcard import


{code}
LOG.debug("DIR "+ path + " in trash is too large, try safe delete.");
{code}
This is not necessarily true, if skipCheckLimit is true.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch, 
> HDFS-13769.003.patch, HDFS-13769.004.patch
>
>
> Similar to the situation discussed in HDFS-13671, Namenode gets stuck for a 
> long time when deleting a trash dir with a large amount of data. We found 
> this log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implemented a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573844#comment-16573844
 ] 

Wei-Chiu Chuang edited comment on HDFS-13769 at 8/8/18 8:45 PM:


Thanks for the new revision.

IMO, this jira should be converted to a HADOOP jira. Trash is not an 
HDFS-specific feature and could be used by other file systems as well (S3A, 
for example).

Instead of calling fs.listStatus(), would you please use 
fs.listStatusIterator()?

The former gets *everything* under a path, so you would see a bump in JVM heap 
usage for a large dir.

I am still not satisfied with FileSystem#contentSummary(). The closest I could 
find is FileSystem#getQuotaUsage(), which would return the number of objects 
in a directory, but quota is not enabled by default.

Nits:
{code}
import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.*;
{code}
TrashPolicyWithSafeDelete should not use a wildcard import.


{code}
LOG.debug("DIR "+ path + " in trash is too large, try safe delete.");
{code}
This is not necessarily true, if skipCheckLimit is true.


was (Author: jojochuang):
Thanks for the new revision.

IMO, this jira should convert to a HADOOP jira. Trash is not a HDFS-specific 
feature and this is
Instead of calling fs.listStatus(), would you please use 
fs.listStatusIterator()?

The former gets *everything* under a path, so you would see a bump in JVM heap 
usage for a large dir.

I am still not satisfied with FileSystem#contentSummary(). The closest I could 
find is FileSystem#getQuotaUsage() which would return number of objects in a 
directory. but quota is not enabled by default.

Nits:
{code}
import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.*;
{code}
TrashPolicyWithSafeDelete should not do wildcard import


{code}
LOG.debug("DIR "+ path + " in trash is too large, try safe delete.");
{code}
This is not necessarily true, if skipCheckLimit is true.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch, 
> HDFS-13769.003.patch, HDFS-13769.004.patch
>
>
> Similar to the situation discussed in HDFS-13671, Namenode gets stuck for a 
> long time when deleting trash dir with a large mount of data. We found log in 
> namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting large data in one delete RPC call. 
> We implement a trashPolicy that divide the delete operation into several 
> delete RPCs, and each single deletion would not delete too many files.
> Any thought? [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573844#comment-16573844
 ] 

Wei-Chiu Chuang commented on HDFS-13769:


Thanks for the new revision.

IMO, this jira should be converted to a HADOOP jira. Trash is not an 
HDFS-specific feature and could be used by other file systems as well (S3A, 
for example).
Instead of calling fs.listStatus(), would you please use 
fs.listStatusIterator()?

The former gets *everything* under a path, so you would see a bump in JVM heap 
usage for a large dir.

I am still not satisfied with FileSystem#contentSummary(). The closest I could 
find is FileSystem#getQuotaUsage(), which would return the number of objects 
in a directory, but quota is not enabled by default.

Nits:
{code}
import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.*;
{code}
TrashPolicyWithSafeDelete should not use a wildcard import.


{code}
LOG.debug("DIR "+ path + " in trash is too large, try safe delete.");
{code}
This is not necessarily true, if skipCheckLimit is true.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch, 
> HDFS-13769.003.patch, HDFS-13769.004.patch
>
>
> Similar to the situation discussed in HDFS-13671, Namenode gets stuck for a 
> long time when deleting trash dir with a large mount of data. We found log in 
> namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting large data in one delete RPC call. 
> We implement a trashPolicy that divide the delete operation into several 
> delete RPCs, and each single deletion would not delete too many files.
> Any thought? [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573841#comment-16573841
 ] 

Hanisha Koneru commented on HDDS-267:
-

Thanks [~arpitagarwal] and [~bharatviswa] for reviews.
I have updated patch v05 to handle the create and update container file cases 
separately. Thanks Bharat for catching it.

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During the update operation, if the on-disk update of the .container file 
> fails, then the in-memory container metadata is also reset to the old value.
> During close operation, if the on-disk update of .container file fails, then 
> the in-memory containerState is set to CLOSING so that no new operations are 
> permitted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-267:

Attachment: HDDS-267.005.patch

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch, HDDS-267.005.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During the update operation, if the on-disk update of the .container file 
> fails, then the in-memory container metadata is also reset to the old value.
> During close operation, if the on-disk update of .container file fails, then 
> the in-memory containerState is set to CLOSING so that no new operations are 
> permitted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-308) SCM should identify a container with pending deletes using container reports

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573823#comment-16573823
 ] 

genericqa commented on HDDS-308:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 32m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
54s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
36s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 18s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
|   | hadoop.ozone.web.client.TestBuckets |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-308 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-13749) Implement a new client protocol method to get NameNode state

2018-08-08 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573814#comment-16573814
 ] 

Chao Sun commented on HDFS-13749:
-

Thanks both [~zero45] and [~shv]. I agree with both of you and don't see why 
this restriction can't be dropped. Shall we file a JIRA for trunk to remove 
this restriction?

> Implement a new client protocol method to get NameNode state
> 
>
> Key: HDFS-13749
> URL: https://issues.apache.org/jira/browse/HDFS-13749
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13749-HDFS-12943.000.patch
>
>
> Currently {{HAServiceProtocol#getServiceStatus}} requires super user 
> privilege. Therefore, as a temporary solution, in HDFS-12976 we discover 
> NameNode state by calling {{reportBadBlocks}}. Here, we'll properly implement 
> this by adding a new method in client protocol to get the NameNode state.
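
For illustration, one possible shape for such a method; this is a sketch under 
assumptions, and the actual signature in the patch may differ.

{code:java}
public interface ClientProtocol {
  // ... existing methods ...

  /**
   * Get the HA state (e.g. ACTIVE or STANDBY) of this NameNode without
   * requiring superuser privilege, unlike HAServiceProtocol#getServiceStatus.
   */
  @Idempotent
  HAServiceProtocol.HAServiceState getHAServiceState() throws IOException;
}
{code}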



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13789) Reduce logging frequency of QuorumJournalManager#selectInputStreams

2018-08-08 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13789:

   Resolution: Fixed
Fix Version/s: HDFS-12943
   Status: Resolved  (was: Patch Available)

Committed to the branch. Thanks [~xkrogen].

> Reduce logging frequency of QuorumJournalManager#selectInputStreams
> ---
>
> Key: HDFS-13789
> URL: https://issues.apache.org/jira/browse/HDFS-13789
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, qjm
>Affects Versions: HDFS-12943
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Trivial
> Fix For: HDFS-12943
>
> Attachments: HDFS-13789-HDFS-12943.000.patch
>
>
> As part of HDFS-13150, a logging statement was added to indicate whenever an 
> edit tail is performed via the RPC mechanism. To enable low latency tailing, 
> the tail frequency must be set very low, so this log statement gets printed 
> much too frequently at an INFO level. We should decrease to DEBUG. Note that 
> if there are actually edits available to tail, other log messages will get 
> printed; this is just targeting the case when it attempts to tail and there 
> are no new edits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573772#comment-16573772
 ] 

genericqa commented on HDFS-13802:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
58s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-rbf generated 2 new + 
0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}101m 19s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
35s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}213m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
|  |  Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.federation.router.RouterFsck.remoteFsck(MembershipState):in
 
org.apache.hadoop.hdfs.server.federation.router.RouterFsck.remoteFsck(MembershipState):
 new java.io.InputStreamReader(InputStream)  At RouterFsck.java:[line 130] |
|  |  
org.apache.hadoop.hdfs.server.federation.router.RouterFsck.remoteFsck(MembershipState)
 may fail to close stream  At RouterFsck.java:stream  At RouterFsck.java:[line 
131] |
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce 

[jira] [Commented] (HDDS-263) Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception

2018-08-08 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573755#comment-16573755
 ] 

Shashikant Banerjee commented on HDDS-263:
--

Patch v0 is blocked on HDDS-247. Not submitting it for now.

> Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception
> ---
>
> Key: HDDS-263
> URL: https://issues.apache.org/jira/browse/HDDS-263
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-263.00.patch
>
>
> While Ozone client writes are going on, a container on a datanode can get 
> closed because of node failures, disk out of space, etc. In such situations, 
> the client write will fail with CLOSED_CONTAINER_IO. In this case, the ozone 
> client should try to get the committed block length for the pending open 
> blocks and update the OzoneManager. While trying to get the committed block 
> length, it may fail with a BLOCK_NOT_COMMITTED exception because, as part of 
> the transition from the CLOSING to the CLOSED state, the container commits 
> all open blocks one by one. In such cases, the client needs to retry getting 
> the committed block length for a fixed number of attempts and eventually 
> throw the exception to the application if it is not able to successfully get 
> and update the length in the OzoneManager. This Jira aims to address this.
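
A rough sketch of the retry loop the description calls for; all names here 
(getCommittedBlockLength, Result.BLOCK_NOT_COMMITTED, blockID, MAX_RETRIES) 
are illustrative assumptions, not the final HDDS-263 code.

{code:java}
final int MAX_RETRIES = 10;   // illustrative bound on retry attempts
long committedLength = -1;
for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
  try {
    committedLength = getCommittedBlockLength(blockID);  // ask the datanode
    break;                                               // block committed
  } catch (StorageContainerException e) {
    if (e.getResult() != Result.BLOCK_NOT_COMMITTED) {
      throw e;   // unrelated failure, surface it immediately
    }
    // The container is still transitioning from CLOSING to CLOSED and has
    // not committed this block yet; retry.
  }
}
if (committedLength < 0) {
  throw new IOException("Block " + blockID + " not committed after "
      + MAX_RETRIES + " attempts");
}
// Finally, update the committed length for this block in the OzoneManager.
{code}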



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-263) Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception

2018-08-08 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-263:
-
Attachment: HDDS-263.00.patch

> Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception
> ---
>
> Key: HDDS-263
> URL: https://issues.apache.org/jira/browse/HDDS-263
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-263.00.patch
>
>
> While Ozone client writes are going on, a container on a datanode can get 
> closed because of node failures, disk out of space, etc. In such situations, 
> the client write will fail with CLOSED_CONTAINER_IO. In this case, the ozone 
> client should try to get the committed block length for the pending open 
> blocks and update the OzoneManager. While trying to get the committed block 
> length, it may fail with a BLOCK_NOT_COMMITTED exception because, as part of 
> the transition from the CLOSING to the CLOSED state, the container commits 
> all open blocks one by one. In such cases, the client needs to retry getting 
> the committed block length for a fixed number of attempts and eventually 
> throw the exception to the application if it is not able to successfully get 
> and update the length in the OzoneManager. This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-339) Add block length and blockId in PutKeyResponse

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573718#comment-16573718
 ] 

genericqa commented on HDDS-339:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
57s{color} | {color:red} hadoop-hdds/container-service generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
3s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
58s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 42s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}136m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdds/container-service |
|  |  Dead store to builder in 

[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer

2018-08-08 Thread Virajith Jalaparti (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573707#comment-16573707
 ] 

Virajith Jalaparti commented on HDFS-13795:
---

Thanks [~elgoiri]. [^HDFS-13795.004.patch] fixes the failed test 
{{TestInMemoryLevelDBAliasMapClient}}.

> Fix potential NPE in InMemoryLevelDBAliasMapServer
> --
>
> Key: HDFS-13795
> URL: https://issues.apache.org/jira/browse/HDFS-13795
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
> Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, 
> HDFS-13795.003.patch, HDFS-13795.004.patch
>
>
> Namenode fails to stop correctly due to an NPE in InMemoryAliasMapServer when 
> it is configured incorrectly.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023)
> {code}
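
A minimal sketch of the kind of null guard that would avoid the NPE above when 
the server never started because of misconfiguration; the field names are 
assumptions.

{code:java}
@Override
public void close() throws IOException {
  // The RPC server and the alias map may never have been created if
  // start() failed or was skipped due to bad configuration.
  if (aliasMapServer != null) {
    aliasMapServer.stop();
  }
  if (aliasMap != null) {
    aliasMap.close();
  }
}
{code}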



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer

2018-08-08 Thread Virajith Jalaparti (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-13795:
--
Status: Open  (was: Patch Available)

> Fix potential NPE in InMemoryLevelDBAliasMapServer
> --
>
> Key: HDFS-13795
> URL: https://issues.apache.org/jira/browse/HDFS-13795
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
> Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, 
> HDFS-13795.003.patch, HDFS-13795.004.patch
>
>
> Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when 
> it is configured incorrectly.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer

2018-08-08 Thread Virajith Jalaparti (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-13795:
--
Status: Patch Available  (was: Open)

> Fix potential NPE in InMemoryLevelDBAliasMapServer
> --
>
> Key: HDFS-13795
> URL: https://issues.apache.org/jira/browse/HDFS-13795
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
> Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, 
> HDFS-13795.003.patch, HDFS-13795.004.patch
>
>
> Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when 
> it is configured incorrectly.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer

2018-08-08 Thread Virajith Jalaparti (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-13795:
--
Attachment: HDFS-13795.004.patch

> Fix potential NPE in InMemoryLevelDBAliasMapServer
> --
>
> Key: HDFS-13795
> URL: https://issues.apache.org/jira/browse/HDFS-13795
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
> Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, 
> HDFS-13795.003.patch, HDFS-13795.004.patch
>
>
> Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when 
> it is configured incorrectly.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-333) Create an Ozone Logo

2018-08-08 Thread Priyanka Nagwekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyanka Nagwekar updated HDDS-333:
---
Attachment: Ozone-Logo-Options.png

> Create an Ozone Logo
> 
>
> Key: HDDS-333
> URL: https://issues.apache.org/jira/browse/HDDS-333
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Assignee: Priyanka Nagwekar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: Logo Final.zip, Logo-Ozone-Transparent-Bg.png, 
> Ozone-Logo-Options.png
>
>
> As part of developing the Ozone Website and Documentation, it would be nice to 
> have an Ozone Logo.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573689#comment-16573689
 ] 

Bharat Viswanadham commented on HDDS-267:
-

Hi [~hanishakoneru]

I have a question here: writeToContainerFile is called from create, and from 
close/update via updateContainerFile. In the close/update cases the .container 
file already exists, so the rename will fail because the destination file is 
already present.
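
To make the concern concrete, here is a minimal sketch of an atomic replace 
that does not fail on an existing destination; the method and path names are 
assumptions, not the actual patch:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hedged sketch: write the new .container file to a temp file first, then
// atomically replace the old one, so an existing destination does not make
// the rename fail. Names are hypothetical.
final class ContainerFileSketch {
  static void replaceContainerFile(Path tmpFile, Path containerFile)
      throws IOException {
    Files.move(tmpFile, containerFile,
        StandardCopyOption.ATOMIC_MOVE,
        StandardCopyOption.REPLACE_EXISTING);
  }
}
{code}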

 

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During the update operation, if the on-disk update of the .container file 
> fails, the in-memory container metadata is also reset to the old value.
> During the close operation, if the on-disk update of the .container file 
> fails, the in-memory containerState is set to CLOSING so that no new 
> operations are permitted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-284) CRC for ChunksData

2018-08-08 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573676#comment-16573676
 ] 

Bharat Viswanadham edited comment on HDDS-284 at 8/8/18 6:32 PM:
-

Attached patch v04.

Fixed findbug issues.

The failed test case TestBuckets passes locally. TestKeys I have run 3 times 
and seen it fail once; that needs a closer look, but I don't think it is 
related to this patch.


was (Author: bharatviswa):
Attached patch v04.

Fixed findbug issues.

And the failed test cases are passing locally.

TestKeys I have run 3 times and have seen it fail randomly once. That needs a 
closer look, but I don't think it is related to this patch.

> CRC for ChunksData
> --
>
> Key: HDDS-284
> URL: https://issues.apache.org/jira/browse/HDDS-284
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: HDDS-284.00.patch, HDDS-284.01.patch, HDDS-284.02.patch, 
> HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and Error Detection 
> for Containers.pdf
>
>
> This Jira is to add CRC for chunks data.
>  
>  
> Right now the ChunkInfo structure looks like this:
> {code}
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional string checksum = 4;
>   repeated KeyValue metadata = 5;
> }
> {code}
> The proposal is to change the ChunkInfo structure as below:
> {code}
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional bytes checksum = 4;
>   optional CRCType checksumType = 5;
>   optional string legacyMetadata = 6;
>   optional string legacyData = 7;
>   repeated KeyValue metadata = 8;
> }
> {code}
> Instead of changing the disk format, we put the checksum, checksumType and 
> legacy data fields into ChunkInfo.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-284) CRC for ChunksData

2018-08-08 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573676#comment-16573676
 ] 

Bharat Viswanadham commented on HDDS-284:
-

Attached patch v04.

Fixed findbug issues.

And the failed test cases are passing locally.

TestKeys I have run 3 times and have seen it fail randomly once. That needs a 
closer look, but I don't think it is related to this patch.

> CRC for ChunksData
> --
>
> Key: HDDS-284
> URL: https://issues.apache.org/jira/browse/HDDS-284
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: HDDS-284.00.patch, HDDS-284.01.patch, HDDS-284.02.patch, 
> HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and Error Detection 
> for Containers.pdf
>
>
> This Jira is to add CRC for chunks data.
>  
>  
> Right now the ChunkInfo structure looks like this:
> {code}
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional string checksum = 4;
>   repeated KeyValue metadata = 5;
> }
> {code}
> The proposal is to change the ChunkInfo structure as below:
> {code}
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional bytes checksum = 4;
>   optional CRCType checksumType = 5;
>   optional string legacyMetadata = 6;
>   optional string legacyData = 7;
>   repeated KeyValue metadata = 8;
> }
> {code}
> Instead of changing the disk format, we put the checksum, checksumType and 
> legacy data fields into ChunkInfo.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-335) Fix logging for scm events

2018-08-08 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-335:

Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

> Fix logging for scm events
> --
>
> Key: HDDS-335
> URL: https://issues.apache.org/jira/browse/HDDS-335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDDS-335.00.patch
>
>
> Logs should print the event type and classname; the object currently logged is 
> not very useful.
>  
> {code:java}
> java.lang.IllegalArgumentException: No event handler registered for event 
> org.apache.hadoop.hdds.server.events.TypedEvent@69464649
>  at 
> org.apache.hadoop.hdds.server.events.EventQueue.fireEvent(EventQueue.java:116)
>  at 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher.dispatch(SCMDatanodeHeartbeatDispatcher.java:66)
>  at 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.sendHeartbeat(SCMDatanodeProtocolServer.java:219)
>  at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolServerSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolServerSideTranslatorPB.java:90)
>  at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos$StorageContainerDatanodeProtocolService$2.callBlockingMethod(StorageContainerDatanodeProtocolProtos.java:19310){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-284) CRC for ChunksData

2018-08-08 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-284:

Attachment: HDDS-284.04.patch

> CRC for ChunksData
> --
>
> Key: HDDS-284
> URL: https://issues.apache.org/jira/browse/HDDS-284
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: HDDS-284.00.patch, HDDS-284.01.patch, HDDS-284.02.patch, 
> HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and Error Detection 
> for Containers.pdf
>
>
> This Jira is to add CRC for chunks data.
>  
>  
> Right now the ChunkInfo structure looks like this:
> {code}
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional string checksum = 4;
>   repeated KeyValue metadata = 5;
> }
> {code}
> The proposal is to change the ChunkInfo structure as below:
> {code}
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional bytes checksum = 4;
>   optional CRCType checksumType = 5;
>   optional string legacyMetadata = 6;
>   optional string legacyData = 7;
>   repeated KeyValue metadata = 8;
> }
> {code}
> Instead of changing the disk format, we put the checksum, checksumType and 
> legacy data fields into ChunkInfo.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-335) Fix logging for scm events

2018-08-08 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573673#comment-16573673
 ] 

Ajay Kumar commented on HDDS-335:
-

[~nandakumar131] thanks for checking this. Ya, that should handle it. Resolving 
ticket.

> Fix logging for scm events
> --
>
> Key: HDDS-335
> URL: https://issues.apache.org/jira/browse/HDDS-335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDDS-335.00.patch
>
>
> Logs should print the event type and classname; the object currently logged is 
> not very useful.
>  
> {code:java}
> java.lang.IllegalArgumentException: No event handler registered for event 
> org.apache.hadoop.hdds.server.events.TypedEvent@69464649
>  at 
> org.apache.hadoop.hdds.server.events.EventQueue.fireEvent(EventQueue.java:116)
>  at 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher.dispatch(SCMDatanodeHeartbeatDispatcher.java:66)
>  at 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.sendHeartbeat(SCMDatanodeProtocolServer.java:219)
>  at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolServerSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolServerSideTranslatorPB.java:90)
>  at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos$StorageContainerDatanodeProtocolService$2.callBlockingMethod(StorageContainerDatanodeProtocolProtos.java:19310){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-267) Handle consistency issues during container update/close

2018-08-08 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573665#comment-16573665
 ] 

Arpit Agarwal commented on HDDS-267:


+1 lgtm. Thanks for this improvement [~hanishakoneru].

Are the unit-test failures related to the patch?

 

> Handle consistency issues during container update/close
> ---
>
> Key: HDDS-267
> URL: https://issues.apache.org/jira/browse/HDDS-267
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-267.001.patch, HDDS-267.002.patch, 
> HDDS-267.003.patch, HDDS-267.004.patch
>
>
> During container update and close, the .container file on disk is modified. 
> We should make sure that the in-memory state and the on-disk state for a 
> container are consistent. 
> A write lock is obtained before updating the container data during close or 
> update operations.
> During the update operation, if the on-disk update of the .container file 
> fails, the in-memory container metadata is also reset to the old value.
> During the close operation, if the on-disk update of the .container file 
> fails, the in-memory containerState is set to CLOSING so that no new 
> operations are permitted. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-174) Shell error messages are often cryptic

2018-08-08 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573621#comment-16573621
 ] 

Arpit Agarwal commented on HDDS-174:


[~xyao], sorry I think I lost this patch :( I will see if I can rewrite my 
changes.

> Shell error messages are often cryptic
> --
>
> Key: HDDS-174
> URL: https://issues.apache.org/jira/browse/HDDS-174
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Nanda kumar
>Priority: Critical
>  Labels: newbie
> Fix For: 0.2.1
>
>
> Error messages in the Ozone shell are often too cryptic. e.g.
> {code}
> $ ozone oz -putKey /vol1/bucket1/key1 -file foo.txt
> Command Failed : Create key failed, error:INTERNAL_ERROR
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics

2018-08-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573610#comment-16573610
 ] 

Hudson commented on HDFS-13658:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14729 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14729/])
HDFS-13658. Expose HighestPriorityLowRedundancy blocks statistics. (xiao: rev 
9499df7b81b55b488a32fd59798a543dafef4ef8)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ReplicatedBlockStats.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestLowRedundancyBlockQueues.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/NamenodeBeanMetrics.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ErasureCoding.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeMXBean.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ECBlockGroupStats.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java


> Expose HighestPriorityLowRedundancy blocks statistics
> -
>
> Key: HDFS-13658
> URL: https://issues.apache.org/jira/browse/HDFS-13658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, 
> HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, 
> HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, 
> HDFS-13658.012.patch
>
>
> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica. We have had many cases opened in which a customer has lost a disk 
> or a DN losing files/blocks due to the fact that they had blocks with only 1 
> replica. We need to make the customer better aware of this situation and that 
> they should take action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-308) SCM should identify a container with pending deletes using container reports

2018-08-08 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573606#comment-16573606
 ] 

Lokesh Jain commented on HDDS-308:
--

The v6 patch makes changes to ensure that the number of nodes in the pipeline 
for a container matches the replication factor in 
DeletedBlockLogImpl#commitTransactions.
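
A rough sketch of that guard, with simplified types standing in for the 
actual HDDS classes:
{code:java}
import java.util.List;

// Hedged sketch of the v6 check: only commit a delete transaction once the
// pipeline is at full replication and every replica has acknowledged it.
// The types and method are simplified stand-ins, not the HDDS API.
final class CommitGuardSketch {
  static boolean canCommit(List<String> pipelineNodes, int replicationFactor,
      List<String> ackedNodes) {
    return pipelineNodes.size() == replicationFactor
        && ackedNodes.containsAll(pipelineNodes);
  }
}
{code}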

> SCM should identify a container with pending deletes using container reports
> 
>
> Key: HDDS-308
> URL: https://issues.apache.org/jira/browse/HDDS-308
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-308.001.patch, HDDS-308.002.patch, 
> HDDS-308.003.patch, HDDS-308.004.patch, HDDS-308.005.patch, HDDS-308.006.patch
>
>
> SCM should fire an event when it finds using container report that a 
> container's deleteTransactionID does not match SCM's deleteTransactionId.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-308) SCM should identify a container with pending deletes using container reports

2018-08-08 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-308:
-
Attachment: HDDS-308.006.patch

> SCM should identify a container with pending deletes using container reports
> 
>
> Key: HDDS-308
> URL: https://issues.apache.org/jira/browse/HDDS-308
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-308.001.patch, HDDS-308.002.patch, 
> HDDS-308.003.patch, HDDS-308.004.patch, HDDS-308.005.patch, HDDS-308.006.patch
>
>
> SCM should fire an event when it finds using container report that a 
> container's deleteTransactionID does not match SCM's deleteTransactionId.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-325) Add event watcher for delete blocks command

2018-08-08 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573588#comment-16573588
 ] 

Lokesh Jain commented on HDDS-325:
--

[~elek] Thanks for reviewing the patch! I will try to make the changes clearer 
below.
 # I wanted to have a single RetriableEventWatcher which watches over all the 
events that need to be retried. That is the reason I introduced the 
watchEvents function in the EventWatcher class. I thought we could use this 
event watcher to watch CloseContainer as well as ReplicationEvent.
 # I added the onFinished/onTimeout API in RetriablePayload to support 
different events in a single event watcher. Different payloads can trigger 
events according to their requirements and still use the same watcher.
 # For the replication command we can pass the request as part of the payload. 
The payload can then fire the required event in its onTimeout function.

The patch gives us a single watcher for all these events. With this approach 
we will not need to fire separate events for tracking DATANODE_COMMAND; we can 
easily add RETRIABLE_DATANODE_COMMAND to the watcher and it will cover all of 
them. I was also thinking of adding the timeout duration to the 
RetriablePayload API so that each event type can have a different timeout (see 
the sketch below). 
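
A minimal sketch of the RetriablePayload shape described above; the method set 
mirrors this comment, but the exact signatures are assumptions, not the patch:
{code:java}
import java.time.Duration;

// Hedged sketch of the proposed API. A single RetriableEventWatcher would
// hold a collection of these and call back into each payload, so different
// event types (CloseContainer, ReplicationEvent, RETRIABLE_DATANODE_COMMAND)
// can share one watcher.
interface RetriablePayloadSketch {
  // Called by the shared watcher when the completion event arrives in time.
  void onFinished();

  // Called by the shared watcher on timeout; the payload decides which
  // retry event to fire.
  void onTimeout();

  // Per-event-type timeout, so each event type can time out differently.
  Duration getTimeout();
}
{code}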

> Add event watcher for delete blocks command
> ---
>
> Key: HDDS-325
> URL: https://issues.apache.org/jira/browse/HDDS-325
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-325.001.patch, HDDS-325.002.patch
>
>
> This Jira aims to add watcher for deleteBlocks command. It removes the 
> current rpc call required for datanode to send the acknowledgement for 
> deleteBlocks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics

2018-08-08 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573566#comment-16573566
 ] 

Xiao Chen edited comment on HDFS-13658 at 8/8/18 5:41 PM:
--

+1. Committed to trunk.

Thanks for the great work here Kitti, and Gabor / Andrew for the reviews and 
thoughts!


was (Author: xiaochen):
+1. Committed to trunk.

 

Thanks for the great work here Kitti, and Andrew for the reviews and thoughts!

> Expose HighestPriorityLowRedundancy blocks statistics
> -
>
> Key: HDFS-13658
> URL: https://issues.apache.org/jira/browse/HDFS-13658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, 
> HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, 
> HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, 
> HDFS-13658.012.patch
>
>
> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica. We have had many cases opened in which a customer has lost a disk 
> or a DN losing files/blocks due to the fact that they had blocks with only 1 
> replica. We need to make the customer better aware of this situation and that 
> they should take action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics

2018-08-08 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13658:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

+1. Committed to trunk.

 

Thanks for the great work here Kitti, and Andrew for the reviews and thoughts!

> Expose HighestPriorityLowRedundancy blocks statistics
> -
>
> Key: HDFS-13658
> URL: https://issues.apache.org/jira/browse/HDFS-13658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, 
> HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, 
> HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, 
> HDFS-13658.012.patch
>
>
> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica. We have had many cases opened in which a customer has lost a disk 
> or a DN losing files/blocks due to the fact that they had blocks with only 1 
> replica. We need to make the customer better aware of this situation and that 
> they should take action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13658) Expose HighestPriorityLowRedundancy blocks statistics

2018-08-08 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13658:
-
Summary: Expose HighestPriorityLowRedundancy blocks statistics  (was: fsck, 
dfsadmin -report, and NN WebUI should report number of blocks that have 1 
replica)

> Expose HighestPriorityLowRedundancy blocks statistics
> -
>
> Key: HDFS-13658
> URL: https://issues.apache.org/jira/browse/HDFS-13658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, 
> HDFS-13658.006.patch, HDFS-13658.007.patch, HDFS-13658.008.patch, 
> HDFS-13658.009.patch, HDFS-13658.010.patch, HDFS-13658.011.patch, 
> HDFS-13658.012.patch
>
>
> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica. We have had many cases opened in which a customer has lost a disk 
> or a DN losing files/blocks due to the fact that they had blocks with only 1 
> replica. We need to make the customer better aware of this situation and that 
> they should take action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-333) Create an Ozone Logo

2018-08-08 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573532#comment-16573532
 ] 

Tsz Wo Nicholas Sze commented on HDDS-333:
--

+1 the logo looks good.

> Create an Ozone Logo
> 
>
> Key: HDDS-333
> URL: https://issues.apache.org/jira/browse/HDDS-333
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Assignee: Priyanka Nagwekar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: Logo Final.zip, Logo-Ozone-Transparent-Bg.png
>
>
> As part of developing the Ozone Website and Documentation, it would be nice to 
> have an Ozone Logo.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-335) Fix logging for scm events

2018-08-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573517#comment-16573517
 ] 

Nanda kumar commented on HDDS-335:
--

[~ajayydv], in HDDS-199 we added a {{toString}} method to {{TypedEvent}}. With 
that patch, we should get proper event details in the log instead of the 
object address.
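
For reference, a minimal sketch of what such a {{toString}} gives you; the 
field names are assumptions, not the HDDS-199 code:
{code:java}
// Hedged sketch: with toString() overridden, the "No event handler
// registered" message prints the event name instead of
// TypedEvent@69464649.
final class TypedEventSketch<T> {
  private final Class<T> payloadType;
  private final String name;

  TypedEventSketch(Class<T> payloadType, String name) {
    this.payloadType = payloadType;
    this.name = name;
  }

  @Override
  public String toString() {
    return "TypedEvent{payloadType=" + payloadType.getSimpleName()
        + ", name='" + name + "'}";
  }
}
{code}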

> Fix logging for scm events
> --
>
> Key: HDDS-335
> URL: https://issues.apache.org/jira/browse/HDDS-335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDDS-335.00.patch
>
>
> Logs should print event type,  classname and object currently logged are not 
> very useful.
>  
> {code:java}
> java.lang.IllegalArgumentException: No event handler registered for event 
> org.apache.hadoop.hdds.server.events.TypedEvent@69464649
>  at 
> org.apache.hadoop.hdds.server.events.EventQueue.fireEvent(EventQueue.java:116)
>  at 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher.dispatch(SCMDatanodeHeartbeatDispatcher.java:66)
>  at 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.sendHeartbeat(SCMDatanodeProtocolServer.java:219)
>  at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolServerSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolServerSideTranslatorPB.java:90)
>  at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos$StorageContainerDatanodeProtocolService$2.callBlockingMethod(StorageContainerDatanodeProtocolProtos.java:19310){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13738) fsck -list-corruptfileblocks has infinite loop if user is not privileged.

2018-08-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573509#comment-16573509
 ] 

Wei-Chiu Chuang commented on HDFS-13738:


:) Sorry I forgot I crafted a test case previously.

The errCode is returned, and eventually becomes the exit code. Since the 
operation fails in this case, should we set errCode to a non-zero value? It 
looks to me like a -1 should be returned.

 

Additionally, you will also want to make sure to check the return code when 
calling runFsck():
{code:java}

String outStr = runFsck(conf, -1, true, path, "-list-corruptfileblocks");{code}
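
A hedged sketch of how that call might be asserted in the test body; runFsck 
is the existing test helper, -1 is the expected exit code for the failed run, 
and the asserted message text (taken from the output in the description) is an 
assumption about what the fixed code prints:
{code:java}
// Hedged sketch for the test body; runFsck(conf, expectedErrCode,
// checkErrorCode, path, ...) is the existing TestFsck helper. The asserted
// message follows the output shown in the description.
String outStr = runFsck(conf, -1, true, path, "-list-corruptfileblocks");
org.junit.Assert.assertTrue(
    "fsck should fail for a non-privileged user",
    outStr.contains("Superuser privilege is required"));
{code}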

Other than that I am +1.

> fsck -list-corruptfileblocks has infinite loop if user is not privileged.
> -
>
> Key: HDFS-13738
> URL: https://issues.apache.org/jira/browse/HDFS-13738
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.6.0, 3.0.0
> Environment: Kerberized Hadoop cluster
>Reporter: Wei-Chiu Chuang
>Assignee: Yuen-Kuei Hsueh
>Priority: Major
> Attachments: HDFS-13738.001.patch, HDFS-13738.002.patch, 
> HDFS-13738.test.patch
>
>
> Found an interesting bug.
> Execute the following command as any non-privileged user:
> {noformat}
> # run fsck
> $ hdfs fsck / -list-corruptfileblocks
> {noformat}
> {noformat}
> FSCK ended at Mon Jul 16 15:14:03 PDT 2018 in 1 milliseconds
> Access denied for user systest. Superuser privilege is required
> Fsck on path '/' FAILED
> FSCK ended at Mon Jul 16 15:14:03 PDT 2018 in 0 milliseconds
> Access denied for user systest. Superuser privilege is required
> Fsck on path '/' FAILED
> FSCK ended at Mon Jul 16 15:14:03 PDT 2018 in 1 milliseconds
> Access denied for user systest. Superuser privilege is required
> Fsck on path '/' FAILED
> {noformat}
> Reproducible on Hadoop 3.0.0 as well as 2.6.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-339) Add block length and blockId in PutKeyResponse

2018-08-08 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573482#comment-16573482
 ] 

Shashikant Banerjee commented on HDDS-339:
--

Patch v1 adds the GetCommittedBlockLength response to PutKeyResponse. It also 
fixes a bug in OpenContainerBlockMap where the chunk was being added during 
the WRITE stage, whereas it should be added in the COMMIT stage of the write 
chunk.
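
As a rough illustration of the datanode-side bookkeeping this implies (the 
names are hypothetical, not the OpenContainerBlockMap API):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: accumulate the committed length per blockId at the COMMIT
// stage of each write chunk (not the WRITE stage, per the bug fix above),
// so putKey can return the committed block length.
final class CommittedLengthSketch {
  private final Map<Long, Long> committedLengthByBlockId =
      new ConcurrentHashMap<>();

  void onChunkCommitted(long blockId, long chunkLen) {
    committedLengthByBlockId.merge(blockId, chunkLen, Long::sum);
  }

  long committedBlockLength(long blockId) {
    return committedLengthByBlockId.getOrDefault(blockId, 0L);
  }
}
{code}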

> Add block length and blockId in PutKeyResponse
> --
>
> Key: HDDS-339
> URL: https://issues.apache.org/jira/browse/HDDS-339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-339.00.patch
>
>
> The putKey response will include the blockId as well as the committed block 
> length. This will be extended to include the blockCommitSequenceId as well, 
> all of which will be updated on the Ozone Master. This will all be required 
> to add validation as well as to handle 2-node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-339) Add block length and blockId in PutKeyResponse

2018-08-08 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-339:
-
Attachment: HDDS-339.00.patch

> Add block length and blockId in PutKeyResponse
> --
>
> Key: HDDS-339
> URL: https://issues.apache.org/jira/browse/HDDS-339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-339.00.patch
>
>
> The putKey response will include the blockId as well as the committed block 
> length. This will be extended to include the blockCommitSequenceId as well, 
> all of which will be updated on the Ozone Master. This will all be required 
> to add validation as well as to handle 2-node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-339) Add block length and blockId in PutKeyResponse

2018-08-08 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-339:
-
Status: Patch Available  (was: Open)

> Add block length and blockId in PutKeyResponse
> --
>
> Key: HDDS-339
> URL: https://issues.apache.org/jira/browse/HDDS-339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-339.00.patch
>
>
> The putKey response will include the blockId as well as the committed block 
> length. This will be extended to include the blockCommitSequenceId as well, 
> all of which will be updated on the Ozone Master. This will all be required 
> to add validation as well as to handle 2-node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-340) ContainerStateMachine#readStateMachinedata should read from temporary chunk file if the data is not present as committed chunk

2018-08-08 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-340:
--

 Summary: ContainerStateMachine#readStateMachinedata should read 
from temporary chunk file if the data is not present as committed chunk
 Key: HDDS-340
 URL: https://issues.apache.org/jira/browse/HDDS-340
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.2.1
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
 Fix For: 0.2.1


ContainerStateMachine#readStateMachineData currently only reads data from a 
committed chunk. However, for the leader it might be necessary to read the 
chunk data from the temporary chunk file.
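
A minimal sketch of the fallback this describes; the committed/temporary file 
layout and the helper name are assumptions:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch: prefer the committed chunk file, but fall back to the
// temporary chunk file when the chunk has not been committed yet (the
// leader case described above).
final class ChunkReadSketch {
  static byte[] readChunk(Path committedFile, Path tmpFile) throws IOException {
    if (Files.exists(committedFile)) {
      return Files.readAllBytes(committedFile);
    }
    return Files.readAllBytes(tmpFile);
  }
}
{code}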



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13532) RBF: Adding security

2018-08-08 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573454#comment-16573454
 ] 

Ajay Kumar commented on HDFS-13532:
---

[~crh], sure, I work out of PST.

> RBF: Adding security
> 
>
> Key: HDFS-13532
> URL: https://issues.apache.org/jira/browse/HDFS-13532
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Íñigo Goiri
>Assignee: Sherwood Zheng
>Priority: Major
> Attachments: RBF _ Security delegation token thoughts.pdf, 
> RBF-DelegationToken-Approach1b.pdf, Security_for_Router-based 
> Federation_design_doc.pdf
>
>
> HDFS Router based federation should support security. This includes 
> authentication and delegation tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13805) Journal Nodes should allow to format non-empty directories with "-force" option

2018-08-08 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573447#comment-16573447
 ] 

Arpit Agarwal commented on HDFS-13805:
--

Hi [~surendrasingh], just out of curiosity, what is the use case for 
reformatting JNs?

 

> Journal Nodes should allow to format non-empty directories with "-force" 
> option
> ---
>
> Key: HDFS-13805
> URL: https://issues.apache.org/jira/browse/HDFS-13805
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node
>Affects Versions: 3.0.0-alpha4
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
>
> HDFS-2 completely restricted re-formatting the JournalNode, but it should be 
> allowed when the *"-force"* option is given. If the user feels the force 
> option can accidentally delete data, he can disable it by configuring 
> "*dfs.reformat.disabled*".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13566) Add configurable additional RPC listener to NameNode

2018-08-08 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573443#comment-16573443
 ] 

Erik Krogen commented on HDFS-13566:


Cool, thanks for clarifying. My initial reaction to (3) was that it would be 
too cumbersome for clients, but I can see some ways to set up configs to make 
it easy to use:
* Set up a separate namespace in the configs, {{namespace2-aux}}, which points 
to {{namespace2}} except using the auxiliary ports. Clients can specify 
{{hdfs://namespace2}} for the standard port and {{namespace2-aux}} for 
auxiliary.
* For configs located within DC2 (where {{namespace2}} is located), set up 
{{namespace2}} with the standard port; for configs located in DC1, set up 
{{namespace2}} with the auxiliary port.

I still wonder if it would be a reasonable addition to have something like 
{{dfs.namenode.rpc-address.namespace2.use-aux}} so that a client has a single 
config to change, but I think it is not necessary.
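
A hedged sketch of the first option above, expressed through the Configuration 
API; the hostnames and ports are made up, and the property keys follow the 
standard federation naming:
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: expose a second logical nameservice, "namespace2-aux",
// that resolves to the same NameNode but on the auxiliary port. Values are
// illustrative only.
final class AuxPortConfSketch {
  static Configuration auxAwareConf() {
    Configuration conf = new Configuration();
    conf.set("dfs.nameservices", "namespace2,namespace2-aux");
    conf.set("dfs.namenode.rpc-address.namespace2",
        "nn.dc2.example.com:8020");
    conf.set("dfs.namenode.rpc-address.namespace2-aux",
        "nn.dc2.example.com:8021");
    return conf;
  }
}
{code}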

> Add configurable additional RPC listener to NameNode
> 
>
> Key: HDFS-13566
> URL: https://issues.apache.org/jira/browse/HDFS-13566
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13566.001.patch, HDFS-13566.002.patch, 
> HDFS-13566.003.patch
>
>
> This Jira aims to add the capability for the NameNode to run additional 
> listener(s), such that the NameNode can be accessed from multiple ports. 
> Fundamentally, this Jira tries to extend ipc.Server to allow it to be 
> configured with more listeners, binding to different ports but sharing the 
> same call queue and handlers. This is useful when different clients are only 
> allowed to access certain ports. Combined with HDFS-13547, this also allows 
> different ports to have different SASL security levels. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently

2018-08-08 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13802:
---
Attachment: (was: HDFS-13802.000.patch)

> RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
> 
>
> Key: HDFS-13802
> URL: https://issues.apache.org/jira/browse/HDFS-13802
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch
>
>
> When I click FSCK under Utilities on the Router Web UI, I get errors:
> {quote}
> HTTP ERROR 404
> Problem accessing /fsck. Reason:
> NOT_FOUND
> Powered by Jetty://
> {quote}
> I dug into the source code and found that fsck is not supported currently, so 
> I think we should remove FSCK from the Router Web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently

2018-08-08 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13802:
---
Attachment: HDFS-13802.000.patch

> RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
> 
>
> Key: HDFS-13802
> URL: https://issues.apache.org/jira/browse/HDFS-13802
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch
>
>
> When I click FSCK under Utilities on the Router Web UI, I get errors:
> {quote}
> HTTP ERROR 404
> Problem accessing /fsck. Reason:
> NOT_FOUND
> Powered by Jetty://
> {quote}
> I dug into the source code and found that fsck is not supported currently, so 
> I think we should remove FSCK from the Router Web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently

2018-08-08 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13802:
---
Attachment: HDFS-13802.002.patch

> RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
> 
>
> Key: HDFS-13802
> URL: https://issues.apache.org/jira/browse/HDFS-13802
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13802.001.patch, HDFS-13802.002.patch
>
>
> When I click FSCK under Utilities on the Router Web UI, I get errors:
> {quote}
> HTTP ERROR 404
> Problem accessing /fsck. Reason:
> NOT_FOUND
> Powered by Jetty://
> {quote}
> I dug into the source code and found that fsck is not supported currently, so 
> I think we should remove FSCK from the Router Web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13795) Fix potential NPE in InMemoryLevelDBAliasMapServer

2018-08-08 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573429#comment-16573429
 ] 

Íñigo Goiri commented on HDFS-13795:


The error in {{TestInMemoryLevelDBAliasMapClient}} seems related.

> Fix potential NPE in InMemoryLevelDBAliasMapServer
> --
>
> Key: HDFS-13795
> URL: https://issues.apache.org/jira/browse/HDFS-13795
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
> Attachments: HDFS-13795.001.patch, HDFS-13795.002.patch, 
> HDFS-13795.003.patch
>
>
> Namenode fails to stop correctly due to NPE in InMemoryAliasMapServer, when 
> it is configured incorrectly.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.close(InMemoryLevelDBAliasMapServer.java:149)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:1023)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13805) Journal Nodes should allow to format non-empty directories with "-force" option

2018-08-08 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573426#comment-16573426
 ] 

Íñigo Goiri commented on HDFS-13805:


We see this issue in our Windows deployment.
I thought it was exclusive to Windows.
If we add the force option, I'd like to make sure it works for Windows.

> Journal Nodes should allow to format non-empty directories with "-force" 
> option
> ---
>
> Key: HDFS-13805
> URL: https://issues.apache.org/jira/browse/HDFS-13805
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node
>Affects Versions: 3.0.0-alpha4
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
>
> HDFS-2 completely restricted re-formatting the JournalNode, but it should be 
> allowed when the *"-force"* option is given. If the user feels the force 
> option can accidentally delete data, he can disable it by configuring 
> "*dfs.reformat.disabled*".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13802) RBF: Remove FSCK from Router Web UI, because fsck is not supported currently

2018-08-08 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573424#comment-16573424
 ] 

Íñigo Goiri commented on HDFS-13802:


I would prefer to implement it.
Let me post an example.

> RBF: Remove FSCK from Router Web UI, because fsck is not supported currently
> 
>
> Key: HDFS-13802
> URL: https://issues.apache.org/jira/browse/HDFS-13802
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13802.001.patch
>
>
> When I click FSCK under Utilities on the Router Web UI, I get errors:
> {quote}
> HTTP ERROR 404
> Problem accessing /fsck. Reason:
> NOT_FOUND
> Powered by Jetty://
> {quote}
> I dug into the source code and found that fsck is not supported currently, so 
> I think we should remove FSCK from the Router Web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-335) Fix logging for scm events

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573422#comment-16573422
 ] 

genericqa commented on HDDS-335:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} framework in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-335 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934825/HDDS-335.00.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9e8a275ba316 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5b898c1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDDS-Build/725/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: hadoop-hdds/framework U: hadoop-hdds/framework |
| Console output | 
https://builds.apache.org/job/PreCommit-HDDS-Build/725/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fix logging for scm events
> --
>
> Key: HDDS-335
> URL: https://issues.apache.org/jira/browse/HDDS-335
> Project: Hadoop Distributed Data Store
>   

[jira] [Commented] (HDFS-13447) Fix Typos - Node Not Chosen

2018-08-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573416#comment-16573416
 ] 

Hudson commented on HDFS-13447:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14727 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14727/])
HDFS-13447. Fix Typos - Node Not Chosen. Contributed by Beluga Behr. (elek: rev 
36c0d742d484f8bf01d7cb01c7b1c9e3627625dc)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java


> Fix Typos - Node Not Chosen
> ---
>
> Key: HDFS-13447
> URL: https://issues.apache.org/jira/browse/HDFS-13447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.2.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.2.0
>
> Attachments: HDFS-13447.1.patch
>
>
> Fix typo and improve:
>  
> {code:java}
> private enum NodeNotChosenReason {
>   NOT_IN_SERVICE("the node isn't in service"),
>   NODE_STALE("the node is stale"),
>   NODE_TOO_BUSY("the node is too busy"),
>   TOO_MANY_NODES_ON_RACK("the rack has too many chosen nodes"),
>   NOT_ENOUGH_STORAGE_SPACE("no enough storage space to place the 
> block");{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13658) fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 1 replica

2018-08-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573393#comment-16573393
 ] 

genericqa commented on HDFS-13658:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
21s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
46s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 54s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
47s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}269m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.TestMaintenanceState |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | 

[jira] [Updated] (HDFS-13447) Fix Typos - Node Not Chosen

2018-08-08 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13447:

   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

 I have just committed it to trunk. Thank you very much for the contribution 
[~belugabehr]

> Fix Typos - Node Not Chosen
> ---
>
> Key: HDFS-13447
> URL: https://issues.apache.org/jira/browse/HDFS-13447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.2.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.2.0
>
> Attachments: HDFS-13447.1.patch
>
>
> Fix typo and improve:
>  
> {code:java}
> private enum NodeNotChosenReason {
>   NOT_IN_SERVICE("the node isn't in service"),
>   NODE_STALE("the node is stale"),
>   NODE_TOO_BUSY("the node is too busy"),
>   TOO_MANY_NODES_ON_RACK("the rack has too many chosen nodes"),
>   NOT_ENOUGH_STORAGE_SPACE("no enough storage space to place the 
> block");{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13447) Fix Typos - Node Not Chosen

2018-08-08 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573376#comment-16573376
 ] 

Elek, Marton commented on HDFS-13447:
-

+1. Seems to be reasonable.


> Fix Typos - Node Not Chosen
> ---
>
> Key: HDFS-13447
> URL: https://issues.apache.org/jira/browse/HDFS-13447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.2.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HDFS-13447.1.patch
>
>
> Fix typo and improve:
>  
> {code:java}
> private enum NodeNotChosenReason {
>   NOT_IN_SERVICE("the node isn't in service"),
>   NODE_STALE("the node is stale"),
>   NODE_TOO_BUSY("the node is too busy"),
>   TOO_MANY_NODES_ON_RACK("the rack has too many chosen nodes"),
>   NOT_ENOUGH_STORAGE_SPACE("no enough storage space to place the 
> block");{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


