[jira] [Updated] (HDFS-13192) change the code order in getFileEncryptionInfo to avoid unnecessary call of assignment

2018-02-23 Thread LiXin Ge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiXin Ge updated HDFS-13192:

Status: Patch Available  (was: Open)

> change the code order in getFileEncryptionInfo to avoid unnecessary call of 
> assignment
> --
>
> Key: HDFS-13192
> URL: https://issues.apache.org/jira/browse/HDFS-13192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption
>Affects Versions: 3.1.0
>Reporter: LiXin Ge
>Assignee: LiXin Ge
>Priority: Minor
> Attachments: HDFS-13192.001.patch
>
>
> The assignment of {{version}}, {{suite}} and {{keyName}} should happen lazily, 
> right before they are used, so that it is skipped when {{fileXAttr}} is *null*.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13192) change the code order in getFileEncryptionInfo to avoid unnecessary call of assignment

2018-02-23 Thread LiXin Ge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiXin Ge reassigned HDFS-13192:
---

Assignee: LiXin Ge




[jira] [Updated] (HDFS-13192) change the code order in getFileEncryptionInfo to avoid unnecessary call of assignment

2018-02-23 Thread LiXin Ge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiXin Ge updated HDFS-13192:

Attachment: HDFS-13192.001.patch




[jira] [Updated] (HDFS-13192) change the code order in getFileEncryptionInfo to avoid unnecessary call of assignment

2018-02-23 Thread LiXin Ge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiXin Ge updated HDFS-13192:

Summary: change the code order in getFileEncryptionInfo to avoid 
unnecessary call of assignment  (was: change the code order to avoid 
unnecessary call of assignment)




[jira] [Created] (HDFS-13192) change the code order to avoid unnecessary call of assignment

2018-02-23 Thread LiXin Ge (JIRA)
LiXin Ge created HDFS-13192:
---

 Summary: change the code order to avoid unnecessary call of 
assignment
 Key: HDFS-13192
 URL: https://issues.apache.org/jira/browse/HDFS-13192
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: encryption
Affects Versions: 3.1.0
Reporter: LiXin Ge


The assignment of {{version}}, {{suite}} and {{keyName}} should happen lazily, right 
before they are used, so that it is skipped when {{fileXAttr}} is *null*.
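
For illustration, here is a minimal sketch of the proposed reordering under simplifying assumptions: the hypothetical {{Zone}} class stands in for the encryption zone, a plain map stands in for the file's xattrs, and the xattr name is made up. It is not the actual {{getFileEncryptionInfo}} code; it only shows that reading the zone fields after the {{fileXAttr}} null check means the null path performs none of those assignments.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class LazyEncryptionInfo {
    // Hypothetical stand-in for an encryption zone; counts accessor calls.
    static final class Zone {
        int reads = 0;
        private final String suite, version, keyName;

        Zone(String suite, String version, String keyName) {
            this.suite = suite;
            this.version = version;
            this.keyName = keyName;
        }

        String getSuite() { reads++; return suite; }
        String getVersion() { reads++; return version; }
        String getKeyName() { reads++; return keyName; }
    }

    // Hypothetical xattr name used only in this sketch.
    static final String FILE_INFO_XATTR = "sketch.file.encryption.info";

    // Returns a description of the file's encryption info, or null if the
    // file has no encryption xattr. The zone fields are read only after
    // the null check, i.e. lazily.
    static String getFileEncryptionInfo(Zone zone, Map<String, byte[]> xattrs) {
        byte[] fileXAttr = xattrs.get(FILE_INFO_XATTR);
        if (fileXAttr == null) {
            return null; // no suite/version/keyName assignment on this path
        }
        String suite = zone.getSuite();
        String version = zone.getVersion();
        String keyName = zone.getKeyName();
        return suite + "/" + version + "/" + keyName;
    }

    public static void main(String[] args) {
        Zone zone = new Zone("AES/CTR/NoPadding", "ENCRYPTION_ZONES", "key1");
        Map<String, byte[]> xattrs = new HashMap<>();
        System.out.println(getFileEncryptionInfo(zone, xattrs)); // null
        System.out.println(zone.reads);                          // 0
        xattrs.put(FILE_INFO_XATTR, new byte[] {1});
        System.out.println(getFileEncryptionInfo(zone, xattrs)); // AES/CTR/NoPadding/ENCRYPTION_ZONES/key1
        System.out.println(zone.reads);                          // 3
    }
}
{code}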






[jira] [Commented] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS

2018-02-23 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375384#comment-16375384
 ] 

Xiao Chen commented on HDFS-13170:
--

Hi [~sodonnell], thanks for reporting the issue and providing the fix! It looks 
pretty good to me. I agree the unit test is unrelated to your change. A few 
minor comments.
 - The {{var = param == -1 ? blah : multi-line blah}} is syntactically correct. 
However, for readability, would you mind changing it to something like:
{code:java}
FsPermission fsPermission = new FsPermission(permission);
if (unmaskedPermission != -1) {
  fsPermission = FsCreateModes.create(...);
}
{code}

 - There is a double line break after class {{UnmaskedPermissionParam}}.
 - It is usually hard to make audit log changes truly compatible. For create, do 
you think adding the new field to the end may be slightly better? (in case 
there are scripts consuming the audit log with awk)
 - Please fix the checkstyle warnings.
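
To illustrate the audit log point: a consumer that splits each line on whitespace, as awk does by default, addresses fields by position, so inserting a field in the middle shifts every later column, while appending it at the end does not. This is a minimal sketch; the field names and line layout below are hypothetical and simplified, not the real HDFS audit log format.
{code:java}
import java.util.Arrays;
import java.util.List;

public class AuditColumns {
    // Splits an audit line into whitespace-separated fields, like awk's default FS.
    static List<String> fields(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String original = "allowed=true ugi=user1 cmd=create src=/f perm=user1:supergroup:rw-r--r--";
        // Hypothetical new field inserted before an existing one...
        String inserted = "allowed=true ugi=user1 cmd=create unmasked=rwxrwx--- src=/f perm=user1:supergroup:rw-r--r--";
        // ...versus the same field appended at the end of the line.
        String appended = original + " unmasked=rwxrwx---";

        int srcColumn = 3; // zero-based index of "src=" in the original layout
        System.out.println(fields(original).get(srcColumn)); // src=/f
        System.out.println(fields(inserted).get(srcColumn)); // unmasked=rwxrwx---
        System.out.println(fields(appended).get(srcColumn)); // src=/f
    }
}
{code}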

> Port webhdfs unmaskedpermission parameter to HTTPFS
> ---
>
> Key: HDFS-13170
> URL: https://issues.apache.org/jira/browse/HDFS-13170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-13170.001.patch, HDFS-13170.002.patch
>
>
> HDFS-6962 fixed a long-standing issue where default ACLs are not correctly 
> applied to files when they are created from the hadoop shell.
> With this change, if you create a file in a parent directory that has default 
> ACLs, with dfs.namenode.posix.acl.inheritance.enabled=false, the result 
> is:
> {code}
> # file: /test_acl/file_from_shell_off
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:r--
> user:user2:rwx    #effective:r--
> group::r-x    #effective:r--
> group:users:rwx    #effective:r--
> mask::r--
> other::r--
> {code}
> If you enable it, which fixes the bug above, the result is as you would 
> expect:
> {code}
> # file: /test_acl/file_from_shell
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:rw-
> user:user2:rwx    #effective:rw-
> group::r-x    #effective:r--
> group:users:rwx    #effective:rw-
> mask::rw-
> other::r--
> {code}
> If I then create a file over HTTPFS or webHDFS, the behaviour is not the same 
> as above:
> {code}
> # file: /test_acl/default_permissions
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx    #effective:r-x
> user:user2:rwx    #effective:r-x
> group::r-x
> group:users:rwx    #effective:r-x
> mask::r-x
> other::r-x
> {code}
> Notice that the mask is set to r-x, which removes the write permission on the 
> new file.
> As part of HDFS-6962, a new parameter, 'unmaskedpermission', was added to 
> webhdfs. Passing it in a webhdfs call results in the same behaviour as when a 
> file is written from the CLI:
> {code}
> curl -i -X PUT -T test.txt --header "Content-Type:application/octet-stream"  
> "http://namenode:50075/webhdfs/v1/test_acl/unmasked__770?op=CREATE=user1=namenode:8020=false=770;
> # file: /test_acl/unmasked__770
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx
> user:user2:rwx
> group::r-x
> group:users:rwx
> mask::rwx
> other::---
> {code}
> However, this parameter was never ported to HTTPFS.
> This Jira is to replicate the same changes to HTTPFS so this parameter is 
> available there too.
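
The #effective annotations in the listings above follow the POSIX ACL rule that named user entries, named group entries and the owning group entry are limited by the mask: the effective bits are the bitwise AND of the entry and the mask. A minimal sketch of that rule (a standalone helper, not a Hadoop API) shows why mask::r-x turns an rwx entry into r-x:
{code:java}
public class AclMask {
    // Parses a symbolic permission such as "r-x" into bits (r=4, w=2, x=1).
    static int parse(String perm) {
        if (perm.length() != 3) {
            throw new IllegalArgumentException("expected 3 characters: " + perm);
        }
        int bits = 0;
        if (perm.charAt(0) == 'r') { bits |= 4; }
        if (perm.charAt(1) == 'w') { bits |= 2; }
        if (perm.charAt(2) == 'x') { bits |= 1; }
        return bits;
    }

    // Renders bits back into symbolic form.
    static String render(int bits) {
        return ((bits & 4) != 0 ? "r" : "-")
            + ((bits & 2) != 0 ? "w" : "-")
            + ((bits & 1) != 0 ? "x" : "-");
    }

    // Effective permission of a mask-limited entry: entry AND mask.
    static String effective(String entry, String mask) {
        return render(parse(entry) & parse(mask));
    }

    public static void main(String[] args) {
        System.out.println(effective("rwx", "r--")); // r--  (file_from_shell_off)
        System.out.println(effective("rwx", "rw-")); // rw-  (file_from_shell)
        System.out.println(effective("rwx", "r-x")); // r-x  (default_permissions)
        System.out.println(effective("r-x", "rw-")); // r--  (group::r-x entry)
    }
}
{code}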






[jira] [Updated] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS

2018-02-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13170:
-
Affects Version/s: (was: 3.2.0)
   3.0.0-alpha2




[jira] [Updated] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13145:

Affects Version/s: 3.0.0

> SBN crash when transition to ANN with in-progress edit tailing enabled
> --
>
> Key: HDFS-13145
> URL: https://issues.apache.org/jira/browse/HDFS-13145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 3.0.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13145.000.patch, HDFS-13145.001.patch
>
>
> With in-progress edit log tailing enabled, {{QuorumOutputStream}} 
> sends two batches to the JNs: a normal edit batch followed by a dummy batch 
> to update the commit ID on the JNs.
> {code}
>   QuorumCall<AsyncLogger, Void> qcall = loggers.sendEdits(
>       segmentTxId, firstTxToFlush,
>       numReadyTxns, data);
>   loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
>
>   // Since we successfully wrote this batch, let the loggers know. Any future
>   // RPCs will thus let the loggers know of the most recent transaction, even
>   // if a logger has fallen behind.
>   loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1);
>   // If we don't have this dummy send, committed TxId might be one-batch
>   // stale on the Journal Nodes
>   if (updateCommittedTxId) {
>     QuorumCall<AsyncLogger, Void> fakeCall = loggers.sendEdits(
>         segmentTxId, firstTxToFlush,
>         0, new byte[0]);
>     loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits");
>   }
> {code}
> After each batch, it waits for a write quorum of JNs. However, if the ANN 
> crashes between the two batches, the SBN will crash while transitioning to ANN:
> {code}
> java.lang.IllegalStateException: Cannot start writing at txid 24312595802 when there is a stream available for read: ..
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490)
> 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> This is because, without the dummy batch, the {{commitTxnId}} lags behind 
> the {{endTxId}}, which causes the check in {{openForWrite}} to fail:
> {code}
> List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
> journalSet.selectInputStreams(streams, segmentTxId, true, false);
> if (!streams.isEmpty()) {
>   String error = String.format("Cannot start writing at txid %s " +
>       "when there is a stream available for read: %s",
>       segmentTxId, streams.get(0));
>   IOUtils.cleanupWithLogger(LOG,
>       streams.toArray(new EditLogInputStream[0]));
>   throw new IllegalStateException(error);
> }
> {code}
> In our environment, this can be reproduced pretty consistently, and it 
> leaves the cluster with no running namenodes. Although we are using a 2.8.2 
> backport, I believe the same issue also exists in 3.0.x.
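
A toy model of the two-batch protocol described above may make the lag easier to see. It assumes, as the description states, that the writer's committed txid reaches the JNs only piggybacked on a later {{sendEdits}} call, so the last real batch is marked committed only by the empty dummy batch. The {{JournalModel}} class, its fields and the {{flush}} helper are hypothetical simplifications, not the QJM classes.
{code:java}
public class CommittedTxIdLag {
    // Hypothetical, simplified view of one JournalNode's state.
    static final class JournalModel {
        long lastWrittenTxId = 0; // highest txid written to the segment
        long committedTxId = 0;   // highest txid known to be committed

        // Models sendEdits: the writer's current committed txid rides along
        // with every call, including an empty (dummy) one.
        void sendEdits(long writerCommittedTxId, long firstTxId, int numTxns) {
            committedTxId = Math.max(committedTxId, writerCommittedTxId);
            if (numTxns > 0) {
                lastWrittenTxId = firstTxId + numTxns - 1;
            }
        }
    }

    // Runs one flush of numTxns edits starting at firstTxId. If
    // crashBeforeDummy is true, the writer dies after the real batch
    // but before the dummy batch is sent.
    static JournalModel flush(long firstTxId, int numTxns, boolean crashBeforeDummy) {
        JournalModel jn = new JournalModel();
        long writerCommitted = firstTxId - 1;
        jn.sendEdits(writerCommitted, firstTxId, numTxns); // real batch
        writerCommitted = firstTxId + numTxns - 1;          // like setCommittedTxId(...)
        if (!crashBeforeDummy) {
            jn.sendEdits(writerCommitted, firstTxId, 0);    // dummy batch
        }
        return jn;
    }

    public static void main(String[] args) {
        JournalModel ok = flush(1, 10, false);
        JournalModel crashed = flush(1, 10, true);
        System.out.println(ok.committedTxId + " / " + ok.lastWrittenTxId);           // 10 / 10
        System.out.println(crashed.committedTxId + " / " + crashed.lastWrittenTxId); // 0 / 10
    }
}
{code}
In the crashed case, edits 1 to 10 are on the JN but not marked committed; per the description above, this gap between {{commitTxnId}} and {{endTxId}} is what the {{openForWrite}} check trips over.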





[jira] [Comment Edited] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375351#comment-16375351
 ] 

Chao Sun edited comment on HDFS-13145 at 2/24/18 6:35 AM:
--

Thank you for the review [~xkrogen]! I've attached patch v1 addressing the 
comments. Also I used {{verifyEdits()}} to check the selected input streams, 
which seems a better choice.

{quote}
Considering that HDFS-10519 is in 3.0, I'm thinking we should target this for 
branch-3.0 and up?
{quote}
Yes, I think we should target this for branch-3.0 and up.


was (Author: csun):
Thank you for the review [~xkrogen]! I've attached patch v1 addressing the 
comments. Also I used {{verifyEdits()}} to check the selected input streams, 
which seems a better choice.


[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375351#comment-16375351
 ] 

Chao Sun commented on HDFS-13145:
-

Thank you for the review [~xkrogen]! I've attached patch v1 addressing the 
comments. Also I used {{verifyEdits()}} to check the selected input streams, 
which seems a better choice.


[jira] [Updated] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13145:

Attachment: HDFS-13145.001.patch



[jira] [Commented] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375346#comment-16375346
 ] 

genericqa commented on HDFS-13166:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m  9s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 48s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  2s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 57s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 53s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 522 unchanged - 0 fixed = 523 total (was 522) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 44s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 31s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13166 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911864/HDFS-13166-HDFS-10285-01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  cc  xml  |
| uname | Linux 81fbc7ff1521 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-10285 / bd77157 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-13191) Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375334#comment-16375334
 ] 

genericqa commented on HDFS-13191:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m  0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 38s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 32s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m  7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}207m 52s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages |
|   | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13191 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911860/HDFS-13191.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c40c49a3aaf7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 

[jira] [Commented] (HDFS-13184) RBF: Improve the unit test TestRouterRPCClientRetries

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375301#comment-16375301
 ] 

genericqa commented on HDFS-13184:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 41s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 42s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13184 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911859/HDFS-13184.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cbb05c2c9a93 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 329a4fd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23184/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23184/testReport/ |
| Max. process+thread count | 4632 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-23 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375299#comment-16375299
 ] 

He Xiaoqiao commented on HDFS-12749:


[~xkrogen] Thanks for your review and comments.
{quote}Are you running without Kerberos? Do you see a relevant WARN statement 
from the log for ipc.Client?
{quote}
a. This is a secure cluster with Kerberos.
b. All the relevant WARN and exception logs are as [~tanyuxin] described above (in the description and the second comment).

Based on the exception logs that [~tanyuxin] provided, I think the {{SocketTimeoutException}} was over-wrapped in an extra {{IOException}} by {{Client#cleanupCalls}}. The following notes are based on branch-2.7 (I may be wrong; if so, please correct me).
a. {{Client#call}} (line 1448) throws an {{IOException}} wrapping the {{SocketTimeoutException}} when {{call.error}} is not null and is not an instance of {{RemoteException}}; in that case the exception is wrapped by {{NetUtils#wrapException}}:
{code:java}
  public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
      ConnectionId remoteId, int serviceClass,
      AtomicBoolean fallbackToSimpleAuth) throws IOException {
    final Call call = createCall(rpcKind, rpcRequest);
    Connection connection = getConnection(remoteId, call, serviceClass,
        fallbackToSimpleAuth);
    try {
      connection.sendRpcRequest(call); // send the rpc request
    } catch (RejectedExecutionException e) {
      throw new IOException("connection has been closed", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      LOG.warn("interrupted waiting to send rpc request to server", e);
      throw new IOException(e);
    }

    synchronized (call) {
      while (!call.done) {
        try {
          call.wait(); // wait for the result
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Call interrupted");
        }
      }

      if (call.error != null) {
        if (call.error instanceof RemoteException) {
          call.error.fillInStackTrace();
          throw call.error;
        } else { // local exception
          InetSocketAddress address = connection.getRemoteAddress();
          throw NetUtils.wrapException(address.getHostName(),
              address.getPort(),
              NetUtils.getHostname(),
              0,
              call.error);
        }
      } else {
        return call.getRpcResponse();
      }
    }
  }
{code}
b. {{NetUtils#wrapException}} can recognize a {{SocketTimeoutException}} if {{call.error}} is an instance of it, but here it is not, so the log shows `Failed on local exception: java.io.IOException: ...`.
c. {{call.error}} is set only by client#setException, which is invoked by {{Client#receiveRpcResponse}} and {{Client#cleanupCalls}}. However, {{Client#receiveRpcResponse}} always sets {{call.error}} to a {{RemoteException}}. So the only remaining possibility is that the {{SocketTimeoutException}} was over-wrapped in an {{IOException}} by {{Client#cleanupCalls}}.
d. The key point in {{Client#cleanupCalls}} is {{#closeException}}, which is set by {{Client#markClosed}}. {{Client#markClosed}} is invoked by {{Client#sendRpcRequest}}, which catches any {{IOException}} and sets {{#closeException}} to it:
{code:java}
    public void sendRpcRequest(final Call call)
        throws InterruptedException, IOException {
      ..
      synchronized (sendRpcRequestLock) {
        Future senderFuture = sendParamsExecutor.submit(new Runnable() {
          @Override
          public void run() {
            try {
              ..
            } catch (IOException e) {
              // exception at this point would leave the connection in an
              // unrecoverable state (eg half a call left on the wire).
              // So, close the connection, killing any outstanding calls
              markClosed(e);
            } finally {
              //the buffer is just an in-memory buffer, but it is still polite to
              // close early
              IOUtils.closeStream(d);
            }
          }
        });
        ..
      }
    }
{code}
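A minimal, self-contained sketch of the failure mode suspected in points a-d. {{WrapDemo}}, {{wrap}} and {{causedByTimeout}} are hypothetical names; {{wrap}} is a simplified stand-in for {{NetUtils#wrapException}}, not Hadoop's code, and only mimics the branch that matters here: an error that is itself a {{SocketTimeoutException}} keeps its type, while a timeout that arrives already wrapped in a plain {{IOException}} is reported as a generic local exception, even though it is still present in the cause chain.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical demo, not Hadoop code. wrap() is a simplified stand-in for
// NetUtils#wrapException that keeps only the branch relevant here.
class WrapDemo {

  // Preserves the timeout type only when the error itself is a
  // SocketTimeoutException; anything else becomes a generic local exception.
  static IOException wrap(String host, IOException error) {
    if (error instanceof SocketTimeoutException) {
      return new SocketTimeoutException(
          "Call to " + host + " failed on socket timeout exception: " + error);
    }
    return new IOException("Failed on local exception: " + error, error);
  }

  // Walks the cause chain to show the timeout is still present, only hidden.
  static boolean causedByTimeout(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof SocketTimeoutException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    SocketTimeoutException timeout = new SocketTimeoutException("read timed out");
    // Timeout stored directly as call.error: the type survives wrapping.
    IOException direct = wrap("nn1:8020", timeout);
    // Timeout already wrapped in a plain IOException before it reaches
    // call.error (the over-wrapping suspected above).
    IOException overWrapped = wrap("nn1:8020", new IOException(timeout));
    System.out.println(direct instanceof SocketTimeoutException);      // true
    System.out.println(overWrapped instanceof SocketTimeoutException); // false
    // Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: read timed out
    System.out.println(overWrapped.getMessage());
    System.out.println(causedByTimeout(overWrapped));                  // true
  }
}
```

The printed message has the same shape as the `Failed on local exception: java.io.IOException: ...` log line in point b, which is consistent with the timeout being wrapped once before {{wrapException}} sees it.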
[~xkrogen], [~kihwal], any suggestions?

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1, 2.8.3, 2.7.5, 3.0.0, 2.9.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749-branch-2.7.002.patch, 
> HDFS-12749-trunk.003.patch, HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When the NN
> restarts, the NN's load is very high.
> After the NN restarts, the DN will call the BPServiceActor#reRegister method to

[jira] [Commented] (HDFS-13052) WebHDFS: Add support for snasphot diff

2018-02-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375294#comment-16375294
 ] 

Hudson commented on HDFS-13052:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13713 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13713/])
HDFS-13052. WebHDFS: Add support for snasphot diff. Contributed by (xyao: rev 
1e84e46f1621fe694f806bfc41d3b2a06c9500b6)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/JsonUtilClient.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHDFS.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/resources/GetOpParam.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOpsCountStatistics.java


> WebHDFS: Add support for snasphot diff
> --
>
> Key: HDFS-13052
> URL: https://issues.apache.org/jira/browse/HDFS-13052
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: snapshot, webhdfs
> Fix For: 3.1.0, 3.0.2, 3.2.0
>
> Attachments: HDFS-13052.001.patch, HDFS-13052.002.patch, 
> HDFS-13052.003.patch, HDFS-13052.004.patch, HDFS-13052.005.patch, 
> HDFS-13052.006.patch, HDFS-13052.007.patch
>
>
> This Jira aims to implement snapshot diff operation for webHdfs filesystem.
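For illustration only, a sketch of how a client might address the snapshot diff operation over REST. It assumes the request shape described in the WebHDFS REST API documentation for Hadoop 3.1 and later (`op=GETSNAPSHOTDIFF` with `oldsnapshotname` and `snapshotname` parameters); verify it against your release. `SnapshotDiffUrl` is a hypothetical helper, and the host, port, directory and snapshot names are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds a WebHDFS snapshot-diff request URI.
// Assumed request shape (Hadoop 3.1+ WebHDFS docs):
//   GET /webhdfs/v1/<PATH>?op=GETSNAPSHOTDIFF&oldsnapshotname=<FROM>&snapshotname=<TO>
class SnapshotDiffUrl {

  static URI build(String host, int port, String dir,
                   String fromSnapshot, String toSnapshot) {
    String query = "op=GETSNAPSHOTDIFF"
        + "&oldsnapshotname=" + fromSnapshot
        + "&snapshotname=" + toSnapshot;
    try {
      // The multi-argument URI constructor percent-encodes characters that
      // are illegal in a path or query (for example, spaces). It does not
      // escape '&' or '=', so snapshot names containing them are not handled.
      return new URI("http", null, host, port, "/webhdfs/v1" + dir, query, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot build WebHDFS URI", e);
    }
  }

  public static void main(String[] args) {
    // Placeholder values; 9870 is the default NameNode HTTP port in Hadoop 3.
    // prints http://namenode.example.com:9870/webhdfs/v1/data/projects?op=GETSNAPSHOTDIFF&oldsnapshotname=s1&snapshotname=s2
    System.out.println(build("namenode.example.com", 9870, "/data/projects", "s1", "s2"));
  }
}
```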



[jira] [Updated] (HDFS-13052) WebHDFS: Add support for snasphot diff

2018-02-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13052:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks [~ljain] for the contribution. I've committed the fix to trunk, 
branch-3.1 and branch-3.0.

> WebHDFS: Add support for snasphot diff
> --
>
> Key: HDFS-13052
> URL: https://issues.apache.org/jira/browse/HDFS-13052
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: snapshot, webhdfs
> Attachments: HDFS-13052.001.patch, HDFS-13052.002.patch, 
> HDFS-13052.003.patch, HDFS-13052.004.patch, HDFS-13052.005.patch, 
> HDFS-13052.006.patch, HDFS-13052.007.patch
>
>
> This Jira aims to implement snapshot diff operation for webHdfs filesystem.



[jira] [Updated] (HDFS-13052) WebHDFS: Add support for snasphot diff

2018-02-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13052:
--
Labels: snapshot webhdfs  (was: )


[jira] [Updated] (HDFS-13052) WebHDFS: Add support for snasphot diff

2018-02-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13052:
--
Fix Version/s: 3.2.0
   3.0.2
   3.1.0


[jira] [Commented] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375287#comment-16375287
 ] 

genericqa commented on HDFS-13188:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}118m 13s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m  8s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13188 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911850/HDFS-13188.02.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ee5acec78c23 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 033f9c6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23183/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23183/testReport/ |
| Max. process+thread count | 2985 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23183/console |
| Powered by | 

[jira] [Commented] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

2018-02-23 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375276#comment-16375276
 ] 

Rakesh R commented on HDFS-13166:
-

Oops, I forgot to add the new file. I've attached another patch that includes it.

> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly 
> getLiveDatanodeStorageReport() calls
> -
>
> Key: HDFS-13166
> URL: https://issues.apache.org/jira/browse/HDFS-13166
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13166-HDFS-10285-00.patch, 
> HDFS-13166-HDFS-10285-01.patch
>
>
> Presently, {{#getLiveDatanodeStorageReport()}} is fetched for every file before
> doing the computation. This Jira sub-task is to discuss and implement a cache
> mechanism that reduces the number of these calls. It could also define a
> configurable refresh interval and periodically refresh the DN cache by fetching
> the latest {{#getLiveDatanodeStorageReport}} at that interval.
>  Following comments taken from HDFS-10285, 
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]
>  Comment-7)
> {quote}Adding getDatanodeStorageReport is concerning. 
> getDatanodeListForReport is already a very bad method that should be avoided 
> for anything but jmx – even then it’s a concern. I eliminated calls to it 
> years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn 
> lock for an excessive length of time. Beyond that, the response is going to 
> be pretty large and tagging all the storage reports is not going to be cheap.
> verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem 
> lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its 
> storageMap?
> Appears to be calling getLiveDatanodeStorageReport for every file. As 
> mentioned earlier, this is NOT cheap. The SPS should be able to operate on a 
> fuzzy/cached state of the world. Then it gets another datanode report to 
> determine the number of live nodes to decide if it should sleep before 
> processing the next path. The number of nodes from the prior cached view of 
> the world should suffice.
> {quote}
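A minimal sketch of the caching idea in the description above, under stated assumptions: {{RefreshingCache}} is a hypothetical class, its {{fetcher}} stands in for an expensive call such as {{#getLiveDatanodeStorageReport()}}, and the clock is injected so the refresh interval can be tested. It is not the implementation in the attached patches.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: caches the result of an expensive "live datanodes"
// call and refetches it only after a configurable refresh interval.
class RefreshingCache<T> {
  private final Supplier<List<T>> fetcher; // stand-in for the costly report call
  private final long refreshIntervalMs;
  private final LongSupplier clockMs;      // injectable clock for testing
  private List<T> cached;                  // last fetched snapshot
  private long lastFetchMs;

  RefreshingCache(Supplier<List<T>> fetcher, long refreshIntervalMs,
                  LongSupplier clockMs) {
    this.fetcher = fetcher;
    this.refreshIntervalMs = refreshIntervalMs;
    this.clockMs = clockMs;
  }

  // Returns the cached view, refetching only when it is missing or stale.
  synchronized List<T> get() {
    long now = clockMs.getAsLong();
    if (cached == null || now - lastFetchMs >= refreshIntervalMs) {
      cached = Collections.unmodifiableList(fetcher.get());
      lastFetchMs = now;
    }
    return cached;
  }

  // Live-node count from the (possibly stale) cached view, so a caller
  // deciding whether to sleep does not need a second report.
  synchronized int liveNodeCount() {
    return get().size();
  }

  public static void main(String[] args) {
    long[] now = {0};
    int[] fetches = {0};
    RefreshingCache<String> cache = new RefreshingCache<String>(
        () -> { fetches[0]++; return Arrays.asList("dn1", "dn2"); },
        30_000, () -> now[0]);
    cache.get();
    cache.get();                    // served from the cache
    now[0] = 30_000;
    cache.get();                    // interval elapsed: refetch
    System.out.println(fetches[0]); // prints 2
  }
}
```

Callers work on a possibly stale snapshot, which matches the quoted suggestion that the SPS can operate on a fuzzy/cached view of the world; the trade-off is that nodes joining or dying within one interval are not seen until the next refresh.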



[jira] [Updated] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

2018-02-23 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13166:

Attachment: HDFS-13166-HDFS-10285-01.patch

> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly 
> getLiveDatanodeStorageReport() calls
> -
>
> Key: HDFS-13166
> URL: https://issues.apache.org/jira/browse/HDFS-13166
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13166-HDFS-10285-00.patch, 
> HDFS-13166-HDFS-10285-01.patch
>
>
> Presently {{#getLiveDatanodeStorageReport()}} is fetched for every file to 
> do the computation. This Jira sub-task is to discuss and implement a caching 
> mechanism that reduces the number of these calls. It could also define a 
> configurable refresh interval and periodically refresh the DN cache by 
> fetching the latest {{#getLiveDatanodeStorageReport}} at that interval.
>  The following comment is taken from HDFS-10285 
> ([here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472],
>  Comment-7):
> {quote}Adding getDatanodeStorageReport is concerning. 
> getDatanodeListForReport is already a very bad method that should be avoided 
> for anything but JMX, and even then it’s a concern. I eliminated calls to it 
> years ago. All it takes is an nscd/DNS hiccup and you’re left holding the fsn 
> lock for an excessive length of time. Beyond that, the response is going to 
> be pretty large and tagging all the storage reports is not going to be cheap.
> Does verifyTargetDatanodeHasSpaceForScheduling really need the namesystem 
> lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its 
> storageMap?
> It appears to be calling getLiveDatanodeStorageReport for every file. As 
> mentioned earlier, this is NOT cheap. The SPS should be able to operate on a 
> fuzzy/cached state of the world. It then gets another datanode report to 
> determine the number of live nodes and decide whether it should sleep before 
> processing the next path. The number of nodes from the prior cached view of 
> the world should suffice.
> {quote}
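The caching idea above (operate on a fuzzy/cached view of the datanodes and refresh it on a configurable interval) can be sketched generically. This is a minimal, hypothetical sketch, not the HDFS-13166 patch: {{RefreshingSnapshotCache}}, its refresh-interval parameter and the injectable clock are illustrative names, and the loader merely stands in for whatever expensive call (such as a live-datanode storage report) is being cached. It refreshes lazily on access once the interval has elapsed; the "periodically refresh" wording in the description could equally be implemented with a background thread.

```java
import java.util.Objects;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: caches the result of an expensive call and reloads it
 * only when the configured refresh interval has elapsed since the last load.
 */
final class RefreshingSnapshotCache<T> {
    private final Supplier<T> loader;       // expensive call, e.g. fetching a datanode report
    private final long refreshIntervalMs;   // assumed configurable refresh interval
    private final LongSupplier clockMs;     // injectable clock so the sketch is testable
    private T snapshot;
    private long lastLoadMs;
    private boolean loaded;

    RefreshingSnapshotCache(Supplier<T> loader, long refreshIntervalMs, LongSupplier clockMs) {
        if (refreshIntervalMs < 0) {
            throw new IllegalArgumentException("refreshIntervalMs must be >= 0");
        }
        this.loader = Objects.requireNonNull(loader);
        this.refreshIntervalMs = refreshIntervalMs;
        this.clockMs = Objects.requireNonNull(clockMs);
    }

    /** Returns the cached snapshot, reloading it first if it is missing or stale. */
    synchronized T get() {
        long now = clockMs.getAsLong();
        if (!loaded || now - lastLoadMs >= refreshIntervalMs) {
            snapshot = loader.get();
            lastLoadMs = now;
            loaded = true;
        }
        return snapshot;
    }

    /** Forces the next get() to reload regardless of the interval. */
    synchronized void invalidate() {
        loaded = false;
    }

    public static void main(String[] args) {
        final long[] now = {0L};
        final int[] loads = {0};
        RefreshingSnapshotCache<Integer> cache =
            new RefreshingSnapshotCache<>(() -> ++loads[0], 1000L, () -> now[0]);
        cache.get();          // first call loads
        now[0] = 500L;
        cache.get();          // still within the interval: served from the cache
        now[0] = 1000L;
        cache.get();          // interval elapsed: reloads
        System.out.println("loads = " + loads[0]); // prints "loads = 2"
    }
}
```

In a scheduler loop, get() would be called per file, and (following the review comment) the size of the cached list could serve as the live-node count instead of requesting a second report. Holding the monitor during a reload keeps the sketch simple; a slow loader would block concurrent readers for that duration.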






[jira] [Updated] (HDFS-13191) Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum

2018-02-23 Thread Dennis Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Huo updated HDFS-13191:
--
Status: Patch Available  (was: Open)

> Internal buffer-sizing details are inadvertently baked into FileChecksum and 
> BlockGroupChecksum
> ---
>
> Key: HDFS-13191
> URL: https://issues.apache.org/jira/browse/HDFS-13191
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client
>Reporter: Dennis Huo
>Priority: Minor
> Attachments: HDFS-13191.001.patch
>
>
> The org.apache.hadoop.io.DataOutputBuffer is used as an "optimization" in 
> many places to allow a reusable form of ByteArrayOutputStream, but requires 
> the caller to be careful to use getLength() instead of getData().length to 
> determine the number of actually valid bytes to consume.
> At least three places in the path of constructing FileChecksums have 
> incorrect usage of DataOutputBuffer:
> [FileChecksumHelper digesting block 
> MD5s|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java#L239]
> [BlockChecksumHelper digesting striped block MD5s to construct block-group 
> checksum|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java#L412]
> [MD5MD5CRC32FileChecksum.getBytes()|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java#L76]
> The net effect is that FileChecksum consumes exact BlockChecksums if there 
> are 1 or 2 blocks (at 16 and 32 bytes respectively), but at 3 blocks will 
> round up to 64 bytes, effectively returning the same FileChecksum as if there 
> were 4 blocks and the 4th block happened to have an MD5 exactly equal to 
> 0x00...00. Similarly, BlockGroupChecksum will behave as if there is a 
> power-of-2 number of bytes from BlockChecksums in the BlockGroup.
> This appears to have been a latent bug for at least 9 years for FileChecksum 
> (and since inception for the implementation of striped files), and works fine 
> as long as HDFS implementations strictly stick to the same internal buffering 
> semantics.
> However, this also makes the implementation extremely brittle unless 
> carefully documented. For example, if code is ever refactored to pass around 
> a MessageDigest that consumes block MD5s as they come rather than writing 
> into a DataOutputBuffer before digesting the entire buffer, then the 
> resulting checksum calculations will change unexpectedly.
> At the same time, "fixing" the bug would also be backwards-incompatible, so 
> the bug might need to stick around. At least for the FileChecksum-level 
> calculation, it seems the bug has been latent for a very long time. Since 
> striped files are fairly new, the BlockChecksumHelper could probably be fixed 
> sooner rather than later to avoid perpetuating a bug. The getBytes() method 
> for FileChecksum is more innocuous, so could likely be fixed or left as-is 
> without too much impact either way.
> The bug can be highlighted by changing the internal buffer-growing semantics 
> of the DataOutputBuffer, or simply returning a randomly-sized byte buffer in 
> getData() while only ensuring the first getLength() bytes are actually 
> present, for example:
>  
> {code:java}
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> index 4c2fa67f8f2..f2df94e898f 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> @@ -103,7 +103,17 @@ private DataOutputBuffer(Buffer buffer) {
>    /** Returns the current contents of the buffer.
>     *  Data is only valid to {@link #getLength()}.
>     */
> -  public byte[] getData() { return buffer.getData(); }
> +  public byte[] getData() {
> +    java.util.Random rand = new java.util.Random();
> +    byte[] bufferData = buffer.getData();
> +    byte[] ret = new byte[rand.nextInt(bufferData.length) + bufferData.length];
> +    System.arraycopy(bufferData, 0, ret, 0, getLength());
> +    return ret;
> +  }
> {code}






[jira] [Commented] (HDFS-13191) Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum

2018-02-23 Thread Dennis Huo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375253#comment-16375253
 ] 

Dennis Huo commented on HDFS-13191:
---

[^HDFS-13191.001.patch] illustrates a basic fix, which is fairly 
straightforward, without adding new tests. Unit tests would be tricky without 
duplicating all the layers of chunk -> block -> file checksum computation in 
the test case, though doing so could be worthwhile as a regression test to 
detect when checksum implementations change, even if they remain internally 
consistent within a fresh set of tests. If I apply the DataOutputBuffer 
randomization change without these fixes, TestFileChecksum fails; with the 
patch, the test passes.
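The randomization trick can also be expressed as a small, self-contained property check that does not need the full chunk -> block -> file pipeline. This is a hypothetical sketch (the class and method names are illustrative and are not taken from the patch or from TestFileChecksum): it hands a checksum consumer backing arrays with the same valid prefix but random extra capacity, and reports whether the result ever changes. Unlike the DataOutputBuffer diff, it sizes the padding relative to the valid length and always adds at least one extra byte, so the buggy consumer is guaranteed to see padding.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical property check: a checksum must not depend on spare buffer capacity. */
final class PaddingInvarianceCheck {
    /** A consumer receives the backing array plus the number of valid bytes in it. */
    interface ChecksumConsumer {
        byte[] apply(byte[] data, int validLength);
    }

    static byte[] md5(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, offset, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Buggy pattern: hashes the whole array, i.e. relies on data.length. */
    static byte[] digestWholeArray(byte[] data, int validLength) {
        return md5(data, 0, data.length);
    }

    /** Fixed pattern: hashes only the valid prefix, i.e. relies on getLength(). */
    static byte[] digestValidPrefix(byte[] data, int validLength) {
        return md5(data, 0, validLength);
    }

    /** Copies the (non-empty) valid bytes into an array with 1..valid.length extra zero bytes. */
    static byte[] randomlyOversized(byte[] valid, Random rand) {
        byte[] ret = new byte[valid.length + 1 + rand.nextInt(valid.length)];
        System.arraycopy(valid, 0, ret, 0, valid.length);
        return ret;
    }

    /** Returns true if the consumer's output never changes across randomized backing arrays. */
    static boolean isStable(ChecksumConsumer consumer, byte[] valid, long seed, int trials) {
        Random rand = new Random(seed);
        byte[] expected = consumer.apply(valid, valid.length);
        for (int i = 0; i < trials; i++) {
            byte[] oversized = randomlyOversized(valid, rand);
            if (!Arrays.equals(expected, consumer.apply(oversized, valid.length))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] blockMd5s = new byte[48];          // stand-in for three 16-byte block MD5s
        Arrays.fill(blockMd5s, (byte) 7);
        System.out.println(isStable(PaddingInvarianceCheck::digestValidPrefix, blockMd5s, 42L, 20)); // true
        System.out.println(isStable(PaddingInvarianceCheck::digestWholeArray, blockMd5s, 42L, 20));  // false
    }
}
```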

> Internal buffer-sizing details are inadvertently baked into FileChecksum and 
> BlockGroupChecksum
> ---
>
> Key: HDFS-13191
> URL: https://issues.apache.org/jira/browse/HDFS-13191
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client
>Reporter: Dennis Huo
>Priority: Minor
> Attachments: HDFS-13191.001.patch
>
>
> The org.apache.hadoop.io.DataOutputBuffer is used as an "optimization" in 
> many places to allow a reusable form of ByteArrayOutputStream, but requires 
> the caller to be careful to use getLength() instead of getData().length to 
> determine the number of actually valid bytes to consume.
> At least three places in the path of constructing FileChecksums have 
> incorrect usage of DataOutputBuffer:
> [FileChecksumHelper digesting block 
> MD5s|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java#L239]
> [BlockChecksumHelper digesting striped block MD5s to construct block-group 
> checksum|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java#L412]
> [MD5MD5CRC32FileChecksum.getBytes()|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java#L76]
> The net effect is that FileChecksum consumes exact BlockChecksums if there 
> are 1 or 2 blocks (at 16 and 32 bytes respectively), but at 3 blocks will 
> round up to 64 bytes, effectively returning the same FileChecksum as if there 
> were 4 blocks and the 4th block happened to have an MD5 exactly equal to 
> 0x00...00. Similarly, BlockGroupChecksum will behave as if there is a 
> power-of-2 number of bytes from BlockChecksums in the BlockGroup.
> This appears to have been a latent bug for at least 9 years for FileChecksum 
> (and since inception for the implementation of striped files), and works fine 
> as long as HDFS implementations strictly stick to the same internal buffering 
> semantics.
> However, this also makes the implementation extremely brittle unless 
> carefully documented. For example, if code is ever refactored to pass around 
> a MessageDigest that consumes block MD5s as they come rather than writing 
> into a DataOutputBuffer before digesting the entire buffer, then the 
> resulting checksum calculations will change unexpectedly.
> At the same time, "fixing" the bug would also be backwards-incompatible, so 
> the bug might need to stick around. At least for the FileChecksum-level 
> calculation, it seems the bug has been latent for a very long time. Since 
> striped files are fairly new, the BlockChecksumHelper could probably be fixed 
> sooner rather than later to avoid perpetuating a bug. The getBytes() method 
> for FileChecksum is more innocuous, so could likely be fixed or left as-is 
> without too much impact either way.
> The bug can be highlighted by changing the internal buffer-growing semantics 
> of the DataOutputBuffer, or simply returning a randomly-sized byte buffer in 
> getData() while only ensuring the first getLength() bytes are actually 
> present, for example:
>  
> {code:java}
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> index 4c2fa67f8f2..f2df94e898f 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> @@ -103,7 +103,17 @@ private DataOutputBuffer(Buffer buffer) {
> /** Returns the current contents of the buffer.
> * Data is only valid to {@link #getLength()}.
> */
> - public byte[] getData() { return buffer.getData(); }
> + public byte[] 

[jira] [Updated] (HDFS-13191) Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum

2018-02-23 Thread Dennis Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Huo updated HDFS-13191:
--
Attachment: HDFS-13191.001.patch

> Internal buffer-sizing details are inadvertently baked into FileChecksum and 
> BlockGroupChecksum
> ---
>
> Key: HDFS-13191
> URL: https://issues.apache.org/jira/browse/HDFS-13191
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client
>Reporter: Dennis Huo
>Priority: Minor
> Attachments: HDFS-13191.001.patch
>
>
> The org.apache.hadoop.io.DataOutputBuffer is used as an "optimization" in 
> many places to allow a reusable form of ByteArrayOutputStream, but requires 
> the caller to be careful to use getLength() instead of getData().length to 
> determine the number of actually valid bytes to consume.
> At least three places in the path of constructing FileChecksums have 
> incorrect usage of DataOutputBuffer:
> [FileChecksumHelper digesting block 
> MD5s|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java#L239]
> [BlockChecksumHelper digesting striped block MD5s to construct block-group 
> checksum|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java#L412]
> [MD5MD5CRC32FileChecksum.getBytes()|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java#L76]
> The net effect is that FileChecksum consumes exact BlockChecksums if there 
> are 1 or 2 blocks (at 16 and 32 bytes respectively), but at 3 blocks will 
> round up to 64 bytes, effectively returning the same FileChecksum as if there 
> were 4 blocks and the 4th block happened to have an MD5 exactly equal to 
> 0x00...00. Similarly, BlockGroupChecksum will behave as if there is a 
> power-of-2 number of bytes from BlockChecksums in the BlockGroup.
> This appears to have been a latent bug for at least 9 years for FileChecksum 
> (and since inception for the implementation of striped files), and works fine 
> as long as HDFS implementations strictly stick to the same internal buffering 
> semantics.
> However, this also makes the implementation extremely brittle unless 
> carefully documented. For example, if code is ever refactored to pass around 
> a MessageDigest that consumes block MD5s as they come rather than writing 
> into a DataOutputBuffer before digesting the entire buffer, then the 
> resulting checksum calculations will change unexpectedly.
> At the same time, "fixing" the bug would also be backwards-incompatible, so 
> the bug might need to stick around. At least for the FileChecksum-level 
> calculation, it seems the bug has been latent for a very long time. Since 
> striped files are fairly new, the BlockChecksumHelper could probably be fixed 
> sooner rather than later to avoid perpetuating a bug. The getBytes() method 
> for FileChecksum is more innocuous, so could likely be fixed or left as-is 
> without too much impact either way.
> The bug can be highlighted by changing the internal buffer-growing semantics 
> of the DataOutputBuffer, or simply returning a randomly-sized byte buffer in 
> getData() while only ensuring the first getLength() bytes are actually 
> present, for example:
>  
> {code:java}
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> index 4c2fa67f8f2..f2df94e898f 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> @@ -103,7 +103,17 @@ private DataOutputBuffer(Buffer buffer) {
>    /** Returns the current contents of the buffer.
>     *  Data is only valid to {@link #getLength()}.
>     */
> -  public byte[] getData() { return buffer.getData(); }
> +  public byte[] getData() {
> +    java.util.Random rand = new java.util.Random();
> +    byte[] bufferData = buffer.getData();
> +    byte[] ret = new byte[rand.nextInt(bufferData.length) + bufferData.length];
> +    System.arraycopy(bufferData, 0, ret, 0, getLength());
> +    return ret;
> +  }
> {code}






[jira] [Updated] (HDFS-13184) RBF: Improve the unit test TestRouterRPCClientRetries

2018-02-23 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13184:
-
Attachment: HDFS-13184.002.patch

> RBF: Improve the unit test TestRouterRPCClientRetries
> -
>
> Key: HDFS-13184
> URL: https://issues.apache.org/jira/browse/HDFS-13184
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-13184.001.patch, HDFS-13184.002.patch
>
>
> Following the proposal in 
> https://issues.apache.org/jira/browse/HDFS-13119?focusedCommentId=16370421=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16370421,
>   this change will speed up the test.






[jira] [Commented] (HDFS-13184) RBF: Improve the unit test TestRouterRPCClientRetries

2018-02-23 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375251#comment-16375251
 ] 

Yiqun Lin commented on HDFS-13184:
--

{quote}Regarding HDFS-13184.001.patch, it seems a little anti intuitive for it 
to be a namenode override when it's the Router client doing this. Shouldn't 
this be a setting for the Router itself?
{quote}
Moving these client settings into the Router conf doesn't seem to make sense, 
since the Router conf is used for building {{RouterContext}} in the unit test.
 Instead, I created new client settings and override them in startcluster. I 
hope this approach makes sense to you, [~elgoiri].

Attaching the updated patch.

> RBF: Improve the unit test TestRouterRPCClientRetries
> -
>
> Key: HDFS-13184
> URL: https://issues.apache.org/jira/browse/HDFS-13184
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-13184.001.patch, HDFS-13184.002.patch
>
>
> Following the proposal in 
> https://issues.apache.org/jira/browse/HDFS-13119?focusedCommentId=16370421=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16370421,
>   this change will speed up the test.






[jira] [Created] (HDFS-13191) Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum

2018-02-23 Thread Dennis Huo (JIRA)
Dennis Huo created HDFS-13191:
-

 Summary: Internal buffer-sizing details are inadvertently baked 
into FileChecksum and BlockGroupChecksum
 Key: HDFS-13191
 URL: https://issues.apache.org/jira/browse/HDFS-13191
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, hdfs-client
Reporter: Dennis Huo


The org.apache.hadoop.io.DataOutputBuffer is used as an "optimization" in many 
places to provide a reusable form of ByteArrayOutputStream, but it requires the 
caller to be careful to use getLength() instead of getData().length to 
determine how many bytes are actually valid.

At least three places in the path of constructing FileChecksums have incorrect 
usage of DataOutputBuffer:

[FileChecksumHelper digesting block 
MD5s|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java#L239]

[BlockChecksumHelper digesting striped block MD5s to construct block-group 
checksum|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java#L412]

[MD5MD5CRC32FileChecksum.getBytes()|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java#L76]

The net effect is that FileChecksum consumes exactly the BlockChecksum bytes 
when there are 1 or 2 blocks (16 and 32 bytes, respectively), but with 3 blocks 
(48 bytes) it rounds up to 64 bytes, effectively returning the same 
FileChecksum as if there were 4 blocks and the 4th block happened to have an 
MD5 exactly equal to 0x00...00. Similarly, BlockGroupChecksum will behave as if 
there were a power-of-2 number of bytes from BlockChecksums in the BlockGroup.
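The padding effect can be reproduced outside HDFS with a toy buffer. This is a hypothetical sketch, not DataOutputBuffer itself: {{GrowableBuffer}} starts at an assumed capacity of 16 bytes and doubles when full, a policy chosen only to reproduce the 16/32/64-byte sizes described above. It shows that digesting the whole backing array after three 16-byte block MD5s gives the same result as digesting four block MD5s whose last one is all zeros, while digesting only getLength() bytes does not.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class PaddedDigestDemo {
    /** Toy reusable buffer: getData() exposes spare capacity, getLength() the valid bytes. */
    static final class GrowableBuffer {
        private byte[] buf = new byte[16];   // assumed initial capacity
        private int count;

        void write(byte[] b) {
            int needed = count + b.length;
            if (needed > buf.length) {
                // Grow by doubling (or to the exact size if larger); new bytes are zero-filled.
                buf = Arrays.copyOf(buf, Math.max(buf.length << 1, needed));
            }
            System.arraycopy(b, 0, buf, count, b.length);
            count = needed;
        }

        byte[] getData() { return buf; }
        int getLength() { return count; }
    }

    static byte[] md5(byte[] data, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, 0, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes n fake 16-byte block MD5s (block i is filled with the byte value i). */
    static GrowableBuffer writeBlocks(int n) {
        GrowableBuffer out = new GrowableBuffer();
        for (int i = 1; i <= n; i++) {
            byte[] blockMd5 = new byte[16];
            Arrays.fill(blockMd5, (byte) i);
            out.write(blockMd5);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            GrowableBuffer out = writeBlocks(n);
            System.out.println("n=" + n + " valid=" + out.getLength()
                + " backing=" + out.getData().length);
        }
        GrowableBuffer three = writeBlocks(3);
        byte[] buggy = md5(three.getData(), three.getData().length);   // uses getData().length
        byte[] correct = md5(three.getData(), three.getLength());      // uses getLength()

        // Four "blocks": the three real MD5s followed by an all-zero 16-byte MD5.
        byte[] fourBlocks = new byte[three.getLength() + 16];
        System.arraycopy(three.getData(), 0, fourBlocks, 0, three.getLength());
        byte[] phantom = md5(fourBlocks, fourBlocks.length);

        System.out.println("buggy == 4 blocks with zero MD5: " + Arrays.equals(buggy, phantom)); // true
        System.out.println("buggy == correct: " + Arrays.equals(buggy, correct));               // false
    }
}
```

Under these assumptions, main reports backing sizes of 16, 32, 64 and 64 for 1 to 4 blocks, which matches the 1-or-2-blocks-exact, 3-blocks-padded pattern in the description.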

This appears to have been a latent bug in FileChecksum for at least 9 years 
(and in the striped-file implementation since its inception); it works fine as 
long as HDFS implementations strictly stick to the same internal buffering 
semantics.

However, this also makes the implementation extremely brittle unless carefully 
documented. For example, if code is ever refactored to pass around a 
MessageDigest that consumes block MD5s as they come rather than writing into a 
DataOutputBuffer before digesting the entire buffer, then the resulting 
checksum calculations will change unexpectedly.
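To see why such a streaming refactor would silently change results, compare feeding each block MD5 straight into a MessageDigest with digesting a zero-padded backing array. This is again a hypothetical sketch: the power-of-two capacity starting at 16 bytes is an assumption that mirrors the sizes in the description, not a statement about DataOutputBuffer's actual growth policy. Under that assumption the two approaches agree for 1, 2 or 4 blocks and diverge for 3.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class StreamingVsBufferedDigest {
    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Refactored style: consume each block MD5 as it arrives. */
    static byte[] streaming(byte[][] blockMd5s) {
        MessageDigest md = newMd5();
        for (byte[] blockMd5 : blockMd5s) {
            md.update(blockMd5);
        }
        return md.digest();
    }

    /** Simulated current style: copy into a zero-filled array whose capacity is the
     *  smallest power of two >= 16 that fits, then digest the whole array. */
    static byte[] bufferedWholeArray(byte[][] blockMd5s) {
        int valid = 0;
        for (byte[] blockMd5 : blockMd5s) {
            valid += blockMd5.length;
        }
        int capacity = 16;                     // assumed initial capacity
        while (capacity < valid) {
            capacity <<= 1;
        }
        byte[] backing = new byte[capacity];
        int offset = 0;
        for (byte[] blockMd5 : blockMd5s) {
            System.arraycopy(blockMd5, 0, backing, offset, blockMd5.length);
            offset += blockMd5.length;
        }
        return newMd5().digest(backing);       // digests the padding too
    }

    /** Builds n fake 16-byte block MD5s (block i is filled with the byte value i + 1). */
    static byte[][] fakeBlockMd5s(int n) {
        byte[][] blocks = new byte[n][16];
        for (int i = 0; i < n; i++) {
            Arrays.fill(blocks[i], (byte) (i + 1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            byte[][] blocks = fakeBlockMd5s(n);
            boolean same = Arrays.equals(streaming(blocks), bufferedWholeArray(blocks));
            System.out.println(n + " block(s): results match = " + same);
        }
    }
}
```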

At the same time, "fixing" the bug would also be backwards-incompatible, so the 
bug might need to stick around. At least for the FileChecksum-level 
calculation, it seems the bug has been latent for a very long time. Since 
striped files are fairly new, the BlockChecksumHelper could probably be fixed 
sooner rather than later to avoid perpetuating a bug. The getBytes() method for 
FileChecksum is more innocuous, so it could likely be fixed or left as-is 
without much impact either way.

The bug can be highlighted by changing the internal buffer-growing semantics of 
the DataOutputBuffer, or simply returning a randomly-sized byte buffer in 
getData() while only ensuring the first getLength() bytes are actually present, 
for example:

 
{code:java}
diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
index 4c2fa67f8f2..f2df94e898f 100644
--- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
+++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
@@ -103,7 +103,17 @@ private DataOutputBuffer(Buffer buffer) {
   /** Returns the current contents of the buffer.
    *  Data is only valid to {@link #getLength()}.
    */
-  public byte[] getData() { return buffer.getData(); }
+  public byte[] getData() {
+    java.util.Random rand = new java.util.Random();
+    byte[] bufferData = buffer.getData();
+    byte[] ret = new byte[rand.nextInt(bufferData.length) + bufferData.length];
+    System.arraycopy(bufferData, 0, ret, 0, getLength());
+    return ret;
+  }

{code}






[jira] [Commented] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375240#comment-16375240
 ] 

genericqa commented on HDFS-13188:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m  3s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  8s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 49s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m  0s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13188 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911825/HDFS-13188.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1e7285035467 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 68ce193 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23182/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23182/testReport/ |
| Max. process+thread count | 4183 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375221#comment-16375221
 ] 

genericqa commented on HDFS-13145:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m  6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 51s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m 20s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestBlocksScheduledCounter |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13145 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911816/HDFS-13145.000.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 212343408dbf 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 68ce193 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23178/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23178/testReport/ |
| Max. process+thread count | 2980 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375217#comment-16375217
 ] 

genericqa commented on HDFS-13145:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 43s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 40s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}172m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13145 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911818/HDFS-13145.000.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f59f4dca3303 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 68ce193 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23179/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23179/testReport/ |
| Max. process+thread count | 3619 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23179/console |
| Powered by | Apache Yetus 

[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375206#comment-16375206
 ] 

genericqa commented on HDFS-13102:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 32s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 18 new + 14 unchanged - 0 fixed = 32 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git 
apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 45s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
8s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m  1s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Bad attempt to compute absolute value of signed random integer in 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryDiffList.random(int)  
At DirectoryDiffList.java:[line 178] |
|  |  
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryDiffList$SkipListNode 
defines compareTo(Object) and uses Object.equals()  At 
DirectoryDiffList.java:[line 74] |
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
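The two FindBugs warnings above match the standard patterns RV_ABSOLUTE_VALUE_OF_RANDOM_INT and EQ_COMPARETO_USE_OBJECT_EQUALS. The DirectoryDiffList code itself is not shown in this report, so the sketch below is illustrative only and its class and method names are hypothetical. It shows why Math.abs(random.nextInt()) % n can still be negative, two ways to stay in [0, n), and a Comparable whose equals()/hashCode() agree with compareTo().

```java
import java.util.Random;

/**
 * Illustration of the two FindBugs patterns reported above. Names are
 * hypothetical; this is not the DirectoryDiffList code.
 */
public class FindBugsPatternsSketch {

  // RV_ABSOLUTE_VALUE_OF_RANDOM_INT: Math.abs(Integer.MIN_VALUE) overflows and
  // stays negative, so Math.abs(rand.nextInt()) % n can be negative.
  static int unsafeIndex(int randomValue, int n) {
    return Math.abs(randomValue) % n;
  }

  // Preferred: a bounded draw, always in [0, n).
  static int boundedRandom(Random rand, int n) {
    return rand.nextInt(n);
  }

  // Alternative when an int is already in hand: floorMod is never negative
  // for a positive n.
  static int safeIndex(int randomValue, int n) {
    return Math.floorMod(randomValue, n);
  }

  // EQ_COMPARETO_USE_OBJECT_EQUALS: override equals()/hashCode() so they agree
  // with compareTo(), instead of inheriting identity-based Object.equals().
  static final class Node implements Comparable<Node> {
    final int snapshotId;

    Node(int snapshotId) {
      this.snapshotId = snapshotId;
    }

    @Override
    public int compareTo(Node other) {
      return Integer.compare(snapshotId, other.snapshotId);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      return o instanceof Node && snapshotId == ((Node) o).snapshotId;
    }

    @Override
    public int hashCode() {
      return Integer.hashCode(snapshotId);
    }
  }

  public static void main(String[] args) {
    System.out.println(unsafeIndex(Integer.MIN_VALUE, 10)); // prints -8
    System.out.println(safeIndex(Integer.MIN_VALUE, 10));   // prints 2
    System.out.println(new Node(3).equals(new Node(3)));    // prints true
  }
}
```

Random.nextInt(bound) is usually the simplest fix when the code draws the random value itself; floorMod helps when the int comes from elsewhere.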
\\
\\
|| Subsystem || Report/Notes ||
| Docker | 

[jira] [Comment Edited] (HDFS-13052) WebHDFS: Add support for snasphot diff

2018-02-23 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374688#comment-16374688
 ] 

Xiaoyu Yao edited comment on HDFS-13052 at 2/24/18 1:04 AM:


[~ljain], the Jenkins run looks good. The test failure does not seem to be 
related, and I can't reproduce it in my environment.

+1 for patch v7; I will commit it shortly.

I opened HDFS-13190 to document this feature in the snapshot section of 
*WebHDFS.md* separately.




was (Author: xyao):
[~ljain], the Jenkins run looks good. The test failure does not seem to be 
related, and I can't reproduce it in my environment.

One last ask:
Can you update the snapshot section of *WebHDFS.md* to include this feature, 
ideally with some curl-based examples?


> WebHDFS: Add support for snasphot diff
> --
>
> Key: HDFS-13052
> URL: https://issues.apache.org/jira/browse/HDFS-13052
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13052.001.patch, HDFS-13052.002.patch, 
> HDFS-13052.003.patch, HDFS-13052.004.patch, HDFS-13052.005.patch, 
> HDFS-13052.006.patch, HDFS-13052.007.patch
>
>
> This JIRA aims to implement the snapshot diff operation for the WebHDFS filesystem.






[jira] [Created] (HDFS-13190) Document WebHDFS support for snasphot diff

2018-02-23 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-13190:
-

 Summary: Document WebHDFS support for snasphot diff
 Key: HDFS-13190
 URL: https://issues.apache.org/jira/browse/HDFS-13190
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation, webhdfs
Reporter: Xiaoyu Yao
Assignee: Lokesh Jain


This ticket is opened to document, in WebHDFS.md, the WebHDFS snapshot diff 
support from HDFS-13052.
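As a starting point for the curl-based examples requested in HDFS-13052, here is a sketch of how a snapshot diff request URL could be assembled. It assumes the operation is exposed as op=GETSNAPSHOTDIFF with oldsnapshotname and snapshotname query parameters; those names should be checked against the final WebHDFS.md text. The host, path and snapshot names are placeholders, and 9870 is the Hadoop 3 default NameNode HTTP port.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Sketch: builds a WebHDFS snapshot diff URL. The op and parameter names are
 * assumptions to be checked against WebHDFS.md; host, port, path and snapshot
 * names are placeholders.
 */
public class SnapshotDiffUrlSketch {

  static URI snapshotDiffUri(String host, int port, String snapshotRoot,
                             String oldSnapshot, String newSnapshot) {
    // Snapshot names containing '&' or '=' would need extra escaping here.
    String query = "op=GETSNAPSHOTDIFF"
        + "&oldsnapshotname=" + oldSnapshot
        + "&snapshotname=" + newSnapshot;
    try {
      // The multi-argument URI constructor quotes illegal characters in the
      // path and query components.
      return new URI("http", null, host, port, "/webhdfs/v1" + snapshotRoot,
          query, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    URI uri = snapshotDiffUri("namenode.example.com", 9870, "/data", "s1", "s2");
    // prints http://namenode.example.com:9870/webhdfs/v1/data?op=GETSNAPSHOTDIFF&oldsnapshotname=s1&snapshotname=s2
    System.out.println(uri);
  }
}
```

The printed URL is what a curl -i GET example in the documentation would target.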






[jira] [Comment Edited] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375172#comment-16375172
 ] 

Erik Krogen edited comment on HDFS-13145 at 2/24/18 12:55 AM:
--

v0 LGTM overall. Simple fix. Verified that the test fails without your change. 
Considering that HDFS-10519 is in 3.0, I'm thinking we should target this for 
branch-3.0 and up?

Minor nit: Could we extend the comment above the if-statement to explain that, 
if the log isn't in progress, all edits are guaranteed to be committed? Maybe 
also add a comment on the test case explaining the real-world situation it 
tests, i.e. replace:
{code}
// Do recovery from a separate QJM.
{code}
with something like
{code}
  // Do recovery from a separate QJM, like during failover
{code}
It wasn't immediately obvious to me what was happening. I don't have strong 
feelings about this one, though. Maybe it's already clear and I just missed it 
:) 

Last nit: Should we add a {{checkRecovery()}} to the test? 
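A minimal model of the invariant the first nit asks to document, assuming a segment is described only by its last txid, an in-progress flag, and the committed txid known to the JournalNodes. The names are hypothetical; this is not the HDFS-13145 patch or Hadoop's journal code, just a simplified model of the failure described in the issue.

```java
/**
 * Hypothetical model of how far a reader may tail one edit log segment.
 * Not Hadoop's Journal/EditLogFile code; names are illustrative.
 */
public class CommittedTxIdSketch {

  // Shape of the failure: always cap at the committed txid, even for a
  // finalized segment, so a one-batch-stale committed txid hides the last batch.
  static long readableEndAlwaysCapped(long lastTxId, long committedTxId) {
    return Math.min(lastTxId, committedTxId);
  }

  // Shape the review describes: if the segment is not in progress, all of
  // its edits are guaranteed to be committed, so it can be read to its end.
  static long readableEnd(long lastTxId, boolean inProgress, long committedTxId) {
    if (!inProgress) {
      return lastTxId;
    }
    return Math.min(lastTxId, committedTxId);
  }

  public static void main(String[] args) {
    // The active NameNode crashed after a batch ending at txid 100 reached a
    // quorum, but before the dummy batch raised the JNs' committed txid from 90.
    // Recovery then finalized the segment at txid 100.
    long lastTxId = 100;
    long staleCommitted = 90;

    long capped = readableEndAlwaysCapped(lastTxId, staleCommitted);
    long fixed = readableEnd(lastTxId, false, staleCommitted);

    // The standby would next try to open for write at (readable end + 1).
    System.out.println("always capped: next write txid " + (capped + 1));
    System.out.println("finalized-aware: next write txid " + (fixed + 1));
  }
}
```

In the capped model the standby would try to start writing at txid 91 while txids 91 to 100 are still readable, which is the condition openForWrite rejects.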


was (Author: xkrogen):
v0 LGTM. Simple fix. Verified that the test fails without your change. 
Considering that HDFS-10519 is in 3.0, I'm thinking we should target this for 
branch-3.0 and up?

> SBN crash when transition to ANN with in-progress edit tailing enabled
> --
>
> Key: HDFS-13145
> URL: https://issues.apache.org/jira/browse/HDFS-13145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13145.000.patch
>
>
> With in-progress edit log tailing enabled, {{QuorumOutputStream}} 
> will send two batches to JNs: a normal edit batch followed by a dummy batch 
> to update the commit ID on JNs.
> {code}
>   QuorumCall<AsyncLogger, Void> qcall = loggers.sendEdits(
>       segmentTxId, firstTxToFlush,
>       numReadyTxns, data);
>   loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
>
>   // Since we successfully wrote this batch, let the loggers know. Any future
>   // RPCs will thus let the loggers know of the most recent transaction, even
>   // if a logger has fallen behind.
>   loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1);
>   // If we don't have this dummy send, committed TxId might be one-batch
>   // stale on the Journal Nodes
>   if (updateCommittedTxId) {
>     QuorumCall<AsyncLogger, Void> fakeCall = loggers.sendEdits(
>         segmentTxId, firstTxToFlush,
>         0, new byte[0]);
>     loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits");
>   }
> {code}
> After each batch, it waits for the JNs to reach a quorum. However, if the 
> ANN crashes between the two batches, the SBN will crash while transitioning 
> to ANN:
> {code}
> java.lang.IllegalStateException: Cannot start writing at txid 24312595802 when there is a stream available for read: ..
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490)
> 2018-02-13 00:43:20,728 INFO 

[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375172#comment-16375172
 ] 

Erik Krogen commented on HDFS-13145:


v0 LGTM. Simple fix. Verified that the test fails without your change. 
Considering that HDFS-10519 is in 3.0, I'm thinking we should target this for 
branch-3.0 and up?

> SBN crash when transition to ANN with in-progress edit tailing enabled
> --
>
> Key: HDFS-13145
> URL: https://issues.apache.org/jira/browse/HDFS-13145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13145.000.patch
>
>
> With in-progress edit log tailing enabled, {{QuorumOutputStream}} 
> will send two batches to JNs: a normal edit batch followed by a dummy batch 
> to update the commit ID on JNs.
> {code}
>   QuorumCall<AsyncLogger, Void> qcall = loggers.sendEdits(
>       segmentTxId, firstTxToFlush,
>       numReadyTxns, data);
>   loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
>
>   // Since we successfully wrote this batch, let the loggers know. Any future
>   // RPCs will thus let the loggers know of the most recent transaction, even
>   // if a logger has fallen behind.
>   loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1);
>   // If we don't have this dummy send, committed TxId might be one-batch
>   // stale on the Journal Nodes
>   if (updateCommittedTxId) {
>     QuorumCall<AsyncLogger, Void> fakeCall = loggers.sendEdits(
>         segmentTxId, firstTxToFlush,
>         0, new byte[0]);
>     loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits");
>   }
> {code}
> After each batch, it waits for the JNs to reach a quorum. However, if the 
> ANN crashes between the two batches, the SBN will crash while transitioning 
> to ANN:
> {code}
> java.lang.IllegalStateException: Cannot start writing at txid 24312595802 when there is a stream available for read: ..
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490)
> 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> This is because, without the dummy batch, the {{commitTxnId}} lags behind 
> the {{endTxId}}, which causes the check in {{openForWrite}} to fail:
> {code}
> List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
> journalSet.selectInputStreams(streams, segmentTxId, true, false);
> if (!streams.isEmpty()) {
>   String error = String.format("Cannot start writing at txid %s " +
>       "when there is a stream available for read: %s",
>       segmentTxId, streams.get(0));
>   IOUtils.cleanupWithLogger(LOG,
>       streams.toArray(new EditLogInputStream[0]));
>   throw new IllegalStateException(error);
> }
> {code}
> In our environment, this can be reproduced fairly consistently, and it 
> leaves the cluster with no running NameNodes. Even though we are using a 2.8.2 
> backport, I believe 

[jira] [Updated] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Attachment: HDFS-13188.02.patch

> Disk Balancer: Support multiple block pools during block move
> -
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch, HDFS-13188.02.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first 
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on source disk to 
> destination disk during balancing.
>  
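A minimal sketch of the requested behavior, assuming each block pool on the source volume can be modeled as a plain list of block IDs. The map, method names, the BP-1/BP-2 pool IDs and the round-robin policy are hypothetical; this is not the DiskBalancer code or the HDFS-13188 patch. The first method reproduces the reported symptom, and the second draws blocks from every pool in turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model: block IDs on a source volume, grouped by block pool.
 * Not the DiskBalancer implementation.
 */
public class BlockPoolSelectionSketch {

  // Reported symptom: only the first block pool is ever drained.
  static List<String> firstPoolOnly(Map<String, List<String>> blocksByPool, int maxBlocks) {
    List<String> picked = new ArrayList<>();
    Iterator<List<String>> pools = blocksByPool.values().iterator();
    if (pools.hasNext()) {
      for (String block : pools.next()) {
        if (picked.size() >= maxBlocks) {
          break;
        }
        picked.add(block);
      }
    }
    return picked;
  }

  // Desired behavior: visit every pool in turn until enough blocks are picked
  // or all pools are exhausted.
  static List<String> allPoolsRoundRobin(Map<String, List<String>> blocksByPool, int maxBlocks) {
    List<Iterator<String>> iterators = new ArrayList<>();
    for (List<String> blocks : blocksByPool.values()) {
      iterators.add(blocks.iterator());
    }
    List<String> picked = new ArrayList<>();
    boolean progressed = true;
    while (picked.size() < maxBlocks && progressed) {
      progressed = false;
      for (Iterator<String> it : iterators) {
        if (picked.size() >= maxBlocks) {
          break;
        }
        if (it.hasNext()) {
          picked.add(it.next());
          progressed = true;
        }
      }
    }
    return picked;
  }

  public static void main(String[] args) {
    Map<String, List<String>> pools = new LinkedHashMap<>();
    pools.put("BP-1", Arrays.asList("a1", "a2", "a3"));
    pools.put("BP-2", Arrays.asList("b1", "b2"));
    System.out.println(firstPoolOnly(pools, 4));      // [a1, a2, a3]
    System.out.println(allPoolsRoundRobin(pools, 4)); // [a1, b1, a2, b2]
  }
}
```

Round-robin is only one way to spread the moves; any policy that keeps drawing from every pool on the source disk would address the symptom.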






[jira] [Comment Edited] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375171#comment-16375171
 ] 

Bharat Viswanadham edited comment on HDFS-13188 at 2/24/18 12:47 AM:
-

[~elgoiri] Attached patch v02, which adds a federated test case for this 
behavior.


was (Author: bharatviswa):
[~elgoiri] Attached patch v01, which adds a federated test case for this 
behavior.

> Disk Balancer: Support multiple block pools during block move
> -
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch, HDFS-13188.02.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first 
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on source disk to 
> destination disk during balancing.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Attachment: (was: HDFS-13188.01.patch)

> Disk Balancer: Support multiple block pools during block move
> -
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first 
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on source disk to 
> destination disk during balancing.
>  






[jira] [Commented] (HDFS-13043) RBF: Expose the state of the Routers in the federation

2018-02-23 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375169#comment-16375169
 ] 

Wei Yan commented on HDFS-13043:


The "i" keyboard shortcut (assign to me) is too easy to hit: I mistyped "i" and assigned this to myself by accident. Reassigned back.

> RBF: Expose the state of the Routers in the federation
> --
>
> Key: HDFS-13043
> URL: https://issues.apache.org/jira/browse/HDFS-13043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HDFS-13043.000.patch, HDFS-13043.001.patch, 
> HDFS-13043.002.patch, HDFS-13043.003.patch, HDFS-13043.004.patch, 
> HDFS-13043.005.patch, HDFS-13043.006.patch, HDFS-13043.007.patch, 
> HDFS-13043.008.patch, HDFS-13043.009.patch, router-info.png
>
>
> The Router should expose the state of the other Routers in the federation 
> through a user UI.






[jira] [Commented] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375171#comment-16375171
 ] 

Bharat Viswanadham commented on HDFS-13188:
---

[~elgoiri] Attached patch v01, which adds a federated test case for this 
behavior.

> Disk Balancer: Support multiple block pools during block move
> -
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first 
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on source disk to 
> destination disk during balancing.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Attachment: HDFS-13188.01.patch

> Disk Balancer: Support multiple block pools during block move
> -
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch, HDFS-13188.01.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first 
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on source disk to 
> destination disk during balancing.
>  






[jira] [Assigned] (HDFS-13043) RBF: Expose the state of the Routers in the federation

2018-02-23 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan reassigned HDFS-13043:
--

Assignee: Wei Yan  (was: Íñigo Goiri)

> RBF: Expose the state of the Routers in the federation
> --
>
> Key: HDFS-13043
> URL: https://issues.apache.org/jira/browse/HDFS-13043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Wei Yan
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HDFS-13043.000.patch, HDFS-13043.001.patch, 
> HDFS-13043.002.patch, HDFS-13043.003.patch, HDFS-13043.004.patch, 
> HDFS-13043.005.patch, HDFS-13043.006.patch, HDFS-13043.007.patch, 
> HDFS-13043.008.patch, HDFS-13043.009.patch, router-info.png
>
>
> The Router should expose the state of the other Routers in the federation 
> through a user UI.






[jira] [Assigned] (HDFS-13043) RBF: Expose the state of the Routers in the federation

2018-02-23 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan reassigned HDFS-13043:
--

Assignee: Íñigo Goiri  (was: Wei Yan)

> RBF: Expose the state of the Routers in the federation
> --
>
> Key: HDFS-13043
> URL: https://issues.apache.org/jira/browse/HDFS-13043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HDFS-13043.000.patch, HDFS-13043.001.patch, 
> HDFS-13043.002.patch, HDFS-13043.003.patch, HDFS-13043.004.patch, 
> HDFS-13043.005.patch, HDFS-13043.006.patch, HDFS-13043.007.patch, 
> HDFS-13043.008.patch, HDFS-13043.009.patch, router-info.png
>
>
> The Router should expose the state of the other Routers in the federation 
> through a user UI.






[jira] [Updated] (HDFS-13189) Standby NameNode should roll active edit log when checkpointing

2018-02-23 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13189:

Issue Type: Sub-task  (was: Bug)
Parent: HDFS-12943

> Standby NameNode should roll active edit log when checkpointing
> ---
>
> Key: HDFS-13189
> URL: https://issues.apache.org/jira/browse/HDFS-13189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chao Sun
>Priority: Minor
>
> When the SBN is checkpointing, it holds the {{cpLock}}. In the current
> implementation, the edit log tailer thread first checks and rolls the active
> edit log, and then tails and applies edits. During checkpointing, the tailer
> thread is blocked on the {{cpLock}} and therefore does not roll the edit log.
> It seems there is no dependency between rolling the edit log and tailing
> edits, so a better approach may be to do these in separate threads. This would
> help people who use the observer feature without in-progress edit log tailing.
> An alternative is to configure
> {{dfs.namenode.edit.log.autoroll.multiplier.threshold}} and
> {{dfs.namenode.edit.log.autoroll.check.interval.ms}} to let the ANN roll its
> own log more frequently in case the SBN is stuck on the lock.






[jira] [Created] (HDFS-13189) Standby NameNode should roll active edit log when checkpointing

2018-02-23 Thread Chao Sun (JIRA)
Chao Sun created HDFS-13189:
---

 Summary: Standby NameNode should roll active edit log when checkpointing
 Key: HDFS-13189
 URL: https://issues.apache.org/jira/browse/HDFS-13189
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chao Sun


When the SBN is checkpointing, it holds the {{cpLock}}. In the current
implementation, the edit log tailer thread first checks and rolls the active
edit log, and then tails and applies edits. During checkpointing, the tailer
thread is blocked on the {{cpLock}} and therefore does not roll the edit log.

It seems there is no dependency between rolling the edit log and tailing edits,
so a better approach may be to do these in separate threads. This would help
people who use the observer feature without in-progress edit log tailing.

An alternative is to configure
{{dfs.namenode.edit.log.autoroll.multiplier.threshold}} and
{{dfs.namenode.edit.log.autoroll.check.interval.ms}} to let the ANN roll its own
log more frequently in case the SBN is stuck on the lock.
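The separate-threads idea can be sketched as follows. This is a minimal, hypothetical Java illustration, not the actual edit log tailer code: the class, field and method names are assumptions, and the roll and tail bodies are placeholders that only count invocations. It shows that a thread which never takes {{cpLock}} keeps rolling while a checkpoint holds the lock, whereas the tailing thread waits.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: rolling and tailing run on separate threads. */
class SplitEditLogTailer {
  // Stand-in for the SBN checkpoint lock ({{cpLock}} in the issue).
  final ReentrantLock cpLock = new ReentrantLock();
  final AtomicInteger rolls = new AtomicInteger();
  final AtomicInteger tails = new AtomicInteger();
  private final ScheduledExecutorService pool =
      Executors.newScheduledThreadPool(2);

  // Placeholder for asking the ANN to roll its edit log; needs no cpLock.
  void rollActiveEditLog() {
    rolls.incrementAndGet();
  }

  // Placeholder for tailing and applying edits; must hold cpLock.
  void tailAndApplyEdits() {
    cpLock.lock();
    try {
      tails.incrementAndGet();
    } finally {
      cpLock.unlock();
    }
  }

  void start(long periodMs) {
    pool.scheduleWithFixedDelay(this::rollActiveEditLog, 0, periodMs, TimeUnit.MILLISECONDS);
    pool.scheduleWithFixedDelay(this::tailAndApplyEdits, 0, periodMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    pool.shutdownNow();
    try {
      pool.awaitTermination(1, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Demo: hold cpLock like a checkpoint and return {rolls, tails} seen meanwhile. */
  static int[] demoWhileCheckpointing(long millis) {
    SplitEditLogTailer t = new SplitEditLogTailer();
    t.cpLock.lock(); // simulate a long-running checkpoint
    int[] counts;
    try {
      t.start(10);
      Thread.sleep(millis);
      counts = new int[] {t.rolls.get(), t.tails.get()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      counts = new int[] {t.rolls.get(), t.tails.get()};
    } finally {
      t.cpLock.unlock(); // checkpoint done; tailing may resume
    }
    t.stop();
    return counts;
  }
}
```

Calling {{demoWhileCheckpointing(200)}} should report at least one roll and zero tails, since the tailing thread cannot enter while the simulated checkpoint holds the lock.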








[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375145#comment-16375145
 ] 

genericqa commented on HDFS-13102:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 32s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 19 new + 14 unchanged - 0 fixed = 33 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 56s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m  6s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 49s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Bad attempt to compute absolute value of signed random integer in org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryDiffList.random(int)  At DirectoryDiffList.java:[line 178] |
|  |  org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryDiffList$SkipListNode defines compareTo(Object) and uses Object.equals()  At DirectoryDiffList.java:[line 74] |
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.federation.router.TestRouterSafemode |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
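Both FindBugs warnings above are well-known patterns. The following minimal Java sketch shows the usual fixes; it is hypothetical and does not reproduce {{DirectoryDiffList}}. It assumes {{random(int)}} is meant to return a value in [0, n) and that nodes are ordered by an integer snapshot id. Because {{Math.abs(Integer.MIN_VALUE)}} is still negative, {{Math.abs(r.nextInt()) % n}} can go negative, while {{Random.nextInt(n)}} cannot. A class that defines {{compareTo}} should also override {{equals}} and {{hashCode}} so they agree with the ordering.

```java
import java.util.Random;

/** Hypothetical sketches of fixes for the two FindBugs warnings. */
class FindBugsFixes {
  private static final Random RANDOM = new Random();

  // Buggy pattern: Math.abs(Integer.MIN_VALUE) is still negative, so this
  // can return a negative value for some random draws.
  static int buggyRandom(int n) {
    return Math.abs(RANDOM.nextInt()) % n;
  }

  // Fix: nextInt(bound) returns a value in [0, bound); n must be positive.
  static int random(int n) {
    return RANDOM.nextInt(n);
  }

  /** Hypothetical node keyed by snapshot id; equals agrees with compareTo. */
  static final class SkipListNode implements Comparable<SkipListNode> {
    final int snapshotId;

    SkipListNode(int snapshotId) {
      this.snapshotId = snapshotId;
    }

    @Override
    public int compareTo(SkipListNode other) {
      return Integer.compare(snapshotId, other.snapshotId);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof SkipListNode)) {
        return false;
      }
      return snapshotId == ((SkipListNode) o).snapshotId;
    }

    @Override
    public int hashCode() {
      return Integer.hashCode(snapshotId);
    }
  }
}
```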
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13102 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911802/HDFS-13102.005.patch |
| 

[jira] [Commented] (HDFS-13181) DiskBalancer: Add an configuration for valid plan hours

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375148#comment-16375148
 ] 

genericqa commented on HDFS-13181:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 33s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 437 unchanged - 0 fixed = 438 total (was 437) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 33s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}172m 22s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13181 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911796/HDFS-13181.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 36affbdebaf5 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 59cf758 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/23175/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | 

[jira] [Commented] (HDFS-13163) Move invalidated blocks to replica-trash with disk layout based on timestamp

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375146#comment-16375146
 ] 

genericqa commented on HDFS-13163:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} HDFS-12996 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 38s{color} | {color:green} HDFS-12996 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} HDFS-12996 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} HDFS-12996 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 18s{color} | {color:green} HDFS-12996 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m  0s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 28s{color} | {color:green} HDFS-12996 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  5s{color} | {color:green} HDFS-12996 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 9 new + 85 unchanged - 0 fixed = 94 total (was 85) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 39s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 29s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 46s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13163 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911798/HDFS-13163-HDFS-12996.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f98593e9fbc6 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-12996 / 85a0ed7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/23176/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | 

[jira] [Commented] (HDFS-13055) Aggregate usage statistics from datanodes

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375115#comment-16375115
 ] 

genericqa commented on HDFS-13055:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 12 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  2m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  1m  8s{color} | {color:orange} hadoop-hdfs-project: The patch generated 5 new + 1265 unchanged - 2 fixed = 1270 total (was 1267) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 46s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}120m 17s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}192m 59s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
|   | hadoop.hdfs.TestAclsEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13055 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911792/HDFS-13055.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 4e3c6a15c038 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-02-23 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-13188:

Summary: Disk Balancer: Support multiple block pools during block move  
(was: Disk Balancer: Bug in DiskBalancer in block move)

> Disk Balancer: Support multiple block pools during block move
> -
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Commented] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375100#comment-16375100
 ] 

Íñigo Goiri commented on HDFS-13188:


Any unit test we can add to make sure we are not skipping the other block pools?

> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Created] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)
Bharat Viswanadham created HDFS-13188:
-

 Summary: Disk Balancer: Bug in DiskBalancer in block move
 Key: HDFS-13188
 URL: https://issues.apache.org/jira/browse/HDFS-13188
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Bharat Viswanadham
Assignee: Bharat Viswanadham


During execute plan:

When there are multiple block pools, it only copies blocks from the first block
pool to the destination disk when balancing.

We want to distribute the blocks from all block pools on the source disk to the
destination disk during balancing.
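One possible way to cover every block pool is to keep one iterator per pool and take blocks from them round-robin until the move limit is reached. The Java sketch below is hypothetical and is not the HDFS-13188.01 patch: block pools are modeled as a map from pool id to a list of block ids, and {{pickBlocks}} is an invented helper, not a Disk Balancer API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: choose blocks from all pools, not only the first. */
class MultiPoolBlockPicker {
  /** Returns up to maxBlocks block ids, taken round-robin across pools. */
  static List<String> pickBlocks(Map<String, List<String>> blocksByPool, int maxBlocks) {
    // One iterator per block pool, in the map's iteration order.
    List<Iterator<String>> iters = new ArrayList<>();
    for (List<String> blocks : blocksByPool.values()) {
      iters.add(blocks.iterator());
    }
    List<String> picked = new ArrayList<>();
    boolean progress = true;
    // Each pass takes at most one block from each pool that still has blocks.
    while (picked.size() < maxBlocks && progress) {
      progress = false;
      for (Iterator<String> it : iters) {
        if (picked.size() >= maxBlocks) {
          break;
        }
        if (it.hasNext()) {
          picked.add(it.next());
          progress = true;
        }
      }
    }
    return picked;
  }
}
```

With an insertion-ordered map holding {{BP-1 = [a1, a2, a3]}} and {{BP-2 = [b1]}} and a limit of 4, it returns {{[a1, b1, a2, a3]}}, so the second pool is no longer skipped. Round-robin is only one design choice; moving blocks in proportion to each pool's usage would be another.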

 






[jira] [Updated] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Attachment: HDFS-13188.01.patch

> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Commented] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375096#comment-16375096
 ] 

Bharat Viswanadham commented on HDFS-13188:
---

Attached the wrong patch earlier; deleted it.

> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13188.01.patch
>
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Attachment: (was: HDFS-13188.00.patch)

> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Description: 
During execute plan:

*Federated setup:*

When there are multiple block pools, it only copies blocks from the first
block pool to the destination disk when balancing.

We want to distribute the blocks from all block pools on the source disk to
the destination disk during balancing.

 

  was:
During execute plan:

When there are multiple block pools, it only copies blocks from the first
block pool to the destination disk when balancing.

We want to distribute the blocks from all block pools on the source disk to
the destination disk during balancing.

 


> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> During execute plan:
> *Federated setup:*
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Status: Patch Available  (was: In Progress)

> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> During plan execution:
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Attachment: HDFS-13188.00.patch

> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> During plan execution:
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Updated] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-23 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-13102:
---
Attachment: HDFS-13102.005.patch

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch, HDFS-13102.004.patch, HDFS-13102.005.patch
>
>
> HDFS-11225 explains an issue where deleting older snapshots can take a very
> long time when the number of snapshot diffs for directories is large. For any
> directory under a snapshot, constructing the children list requires combining
> all the diffs from that particular snapshot up to the last snapshotDiff record
> and then reverseApply-ing them to the directory's current children list on the
> live fs. This can take significant time if the number of snapshot diffs is
> large and each diff contains many changes.
> This Jira proposes storing the DirectoryDiffs in a SnapshotSkipList, which
> holds multi-level DirectoryDiffs. At each level, the DirectoryDiff will be the
> cumulative diff of k snapshot diffs,
> where k is the level of a node in the list.
>  
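To illustrate why cumulative diffs help, here is a minimal sketch under a simplified model in which a diff is just the set of child names created and deleted after a snapshot. The SimpleDiff class and its combine rule are hypothetical; they are not the HDFS DirectoryDiff or SnapshotSkipList API. The sketch shows that two adjacent diffs can be merged into one diff whose single reverseApply gives the same children list as reverse-applying both in turn. That property is what would let a skip list replace many reverse applications with a few.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simplified model, not the HDFS DirectoryDiff/SnapshotSkipList API:
// a diff records the child names created and deleted after a snapshot was taken.
final class SimpleDiff {
    final Set<String> created;
    final Set<String> deleted;

    SimpleDiff(Set<String> created, Set<String> deleted) {
        this.created = new HashSet<>(created);
        this.deleted = new HashSet<>(deleted);
    }

    /** Reconstructs the older children list from a newer one. */
    Set<String> reverseApply(Set<String> newer) {
        Set<String> older = new TreeSet<>(newer);
        older.removeAll(created);
        older.addAll(deleted);
        return older;
    }

    /** Merges two adjacent diffs (older first) into one cumulative diff. */
    static SimpleDiff combine(SimpleDiff older, SimpleDiff newer) {
        Set<String> created = new HashSet<>(older.created);
        created.removeAll(newer.deleted);        // created, then deleted again: cancels
        Set<String> alsoCreated = new HashSet<>(newer.created);
        alsoCreated.removeAll(older.deleted);    // deleted, then re-created: cancels
        created.addAll(alsoCreated);

        Set<String> deleted = new HashSet<>(older.deleted);
        deleted.removeAll(newer.created);
        Set<String> alsoDeleted = new HashSet<>(newer.deleted);
        alsoDeleted.removeAll(older.created);
        deleted.addAll(alsoDeleted);
        return new SimpleDiff(created, deleted);
    }
}

class CumulativeDiffDemo {
    public static void main(String[] args) {
        // Snapshot s0 = {a, b}; then "c" is created and "a" deleted (d0);
        // then "c" is deleted and "d" created (d1), leaving {b, d} on the live fs.
        SimpleDiff d0 = new SimpleDiff(Set.of("c"), Set.of("a"));
        SimpleDiff d1 = new SimpleDiff(Set.of("d"), Set.of("c"));
        Set<String> current = Set.of("b", "d");

        Set<String> stepByStep = d0.reverseApply(d1.reverseApply(current));
        Set<String> cumulative = SimpleDiff.combine(d0, d1).reverseApply(current);
        System.out.println(stepByStep + " " + cumulative);   // [a, b] [a, b]
    }
}
```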






[jira] [Updated] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-23 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-13102:
---
Attachment: (was: HDFS-13102.005.patch)

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch, HDFS-13102.004.patch, HDFS-13102.005.patch
>
>
> HDFS-11225 explains an issue where deleting older snapshots can take a very
> long time when the number of snapshot diffs for directories is large. For any
> directory under a snapshot, constructing the children list requires combining
> all the diffs from that particular snapshot up to the last snapshotDiff record
> and then reverseApply-ing them to the directory's current children list on the
> live fs. This can take significant time if the number of snapshot diffs is
> large and each diff contains many changes.
> This Jira proposes storing the DirectoryDiffs in a SnapshotSkipList, which
> holds multi-level DirectoryDiffs. At each level, the DirectoryDiff will be the
> cumulative diff of k snapshot diffs,
> where k is the level of a node in the list.
>  






[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-23 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375087#comment-16375087
 ] 

Shashikant Banerjee commented on HDFS-13102:


Removed the earlier v5 patch as it was stale. Adding the latest v5 patch.

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch, HDFS-13102.004.patch, HDFS-13102.005.patch
>
>
> HDFS-11225 explains an issue where deleting older snapshots can take a very
> long time when the number of snapshot diffs for directories is large. For any
> directory under a snapshot, constructing the children list requires combining
> all the diffs from that particular snapshot up to the last snapshotDiff record
> and then reverseApply-ing them to the directory's current children list on the
> live fs. This can take significant time if the number of snapshot diffs is
> large and each diff contains many changes.
> This Jira proposes storing the DirectoryDiffs in a SnapshotSkipList, which
> holds multi-level DirectoryDiffs. At each level, the DirectoryDiff will be the
> cumulative diff of k snapshot diffs,
> where k is the level of a node in the list.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Component/s: diskbalancer

> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> During plan execution:
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Updated] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13188:
--
Description: 
During execute plan:

When multiple block pools are there, it will only copy from blocks from first 
block pool to destination disk, when balancing.

We want to distribute the blocks from all block pools on source disk to 
destination disk during balancing.

 

  was:
During execute plan:

When multiple block pools are there, it will only copy from first block pool 
blocks to destination disk, when balancing.

We want to distribute the blocks from all block pools on source disk to 
destination disk during balancing.

 


> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> During plan execution:
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Work started] (HDFS-13188) Disk Balancer: Bug in DiskBalancer in block move

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13188 started by Bharat Viswanadham.
-
> Disk Balancer: Bug in DiskBalancer in block move
> 
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> During plan execution:
> When there are multiple block pools, it only copies blocks from the first
> block pool to the destination disk when balancing.
> We want to distribute the blocks from all block pools on the source disk to
> the destination disk during balancing.
>  






[jira] [Commented] (HDFS-13187) RBF: Fix Routers information shown in the web UI

2018-02-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375074#comment-16375074
 ] 

Íñigo Goiri commented on HDFS-13187:


bq. I'd still prefer moving this part to a separate tab, to distinguish
"Subclusters" from "Routers". We may also add more information to the
"Routers" page.

Sure, feel free to post a patch with this fix and the new tab.

> RBF: Fix Routers information shown in the web UI
> 
>
> Key: HDFS-13187
> URL: https://issues.apache.org/jira/browse/HDFS-13187
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Minor
> Attachments: HDFS-13187.000.patch, image-2018-02-23-09-23-29-495.png
>
>
> HDFS-13043 added a component to the existing web UI to show Router
> information. However, the UI currently does not display it correctly: the new
> table is shown at the bottom of each tab, and it contains no data. Some code
> is missing on the .html and .js side.
>  
> A quick screenshot of what it currently shows (at the bottom of each page
> tab):
> !image-2018-02-23-09-23-29-495.png!






[jira] [Commented] (HDFS-13187) RBF: Fix Routers information shown in the web UI

2018-02-23 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375062#comment-16375062
 ] 

Wei Yan commented on HDFS-13187:


[~elgoiri] Thanks for the quick fix. I tried it in our cluster and it works well.
I'd still prefer moving this part to a separate tab, to distinguish
"Subclusters" from "Routers". We may also add more information to the
"Routers" page.

> RBF: Fix Routers information shown in the web UI
> 
>
> Key: HDFS-13187
> URL: https://issues.apache.org/jira/browse/HDFS-13187
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Minor
> Attachments: HDFS-13187.000.patch, image-2018-02-23-09-23-29-495.png
>
>
> HDFS-13043 added a component to the existing web UI to show Router
> information. However, the UI currently does not display it correctly: the new
> table is shown at the bottom of each tab, and it contains no data. Some code
> is missing on the .html and .js side.
>  
> A quick screenshot of what it currently shows (at the bottom of each page
> tab):
> !image-2018-02-23-09-23-29-495.png!






[jira] [Updated] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13145:

Attachment: HDFS-13145.000.patch

> SBN crash when transition to ANN with in-progress edit tailing enabled
> --
>
> Key: HDFS-13145
> URL: https://issues.apache.org/jira/browse/HDFS-13145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13145.000.patch
>
>
> With in-progress edit log tailing enabled, {{QuorumOutputStream}} sends two
> batches to the JNs: a normal edit batch, followed by a dummy batch that
> updates the committed txid on the JNs.
> {code}
>   QuorumCall<AsyncLogger, Void> qcall = loggers.sendEdits(
>       segmentTxId, firstTxToFlush,
>       numReadyTxns, data);
>   loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
>
>   // Since we successfully wrote this batch, let the loggers know. Any future
>   // RPCs will thus let the loggers know of the most recent transaction, even
>   // if a logger has fallen behind.
>   loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1);
>   // If we don't have this dummy send, committed TxId might be one-batch
>   // stale on the Journal Nodes
>   if (updateCommittedTxId) {
>     QuorumCall<AsyncLogger, Void> fakeCall = loggers.sendEdits(
>         segmentTxId, firstTxToFlush,
>         0, new byte[0]);
>     loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits");
>   }
> {code}
> After each batch, the writer waits for a quorum of JNs to respond. However, if
> the ANN crashes between the two batches, the SBN will crash while
> transitioning to ANN:
> {code}
> java.lang.IllegalStateException: Cannot start writing at txid 24312595802 when there is a stream available for read: ..
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490)
> 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> This is because, without the dummy batch, the {{commitTxnId}} lags behind
> the {{endTxId}}, which causes the check in {{openForWrite}} to fail:
> {code}
> List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
> journalSet.selectInputStreams(streams, segmentTxId, true, false);
> if (!streams.isEmpty()) {
>   String error = String.format("Cannot start writing at txid %s " +
> "when there is a stream available for read: %s",
> segmentTxId, streams.get(0));
>   IOUtils.cleanupWithLogger(LOG,
>   streams.toArray(new EditLogInputStream[0]));
>   throw new IllegalStateException(error);
> }
> {code}
> In our environment this can be reproduced fairly consistently, and it leaves
> the cluster with no running NameNodes. Although we are using a 2.8.2
> backport, I believe the same issue also exists in 3.0.x.
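The one-batch lag described above can be reproduced in a toy model where the committed txid reaches the JNs only by piggybacking on the next RPC, as the comment in the quoted QuorumOutputStream code describes. ToyJournalNode, ToyWriter and CommittedTxIdDemo are hypothetical classes, not the QJM API. The model assumes every JN always acknowledges, so a quorum is always reached. It shows only the lag, not the later openForWrite failure.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical classes, not the QJM API): the writer's committed txid
// reaches the JournalNodes only by piggybacking on the next sendEdits RPC.
class ToyJournalNode {
    long lastWrittenTxId = 0;
    long committedTxId = 0;

    void sendEdits(long firstTxId, int numTxns, long writerCommittedTxId) {
        committedTxId = Math.max(committedTxId, writerCommittedTxId);
        if (numTxns > 0) {
            lastWrittenTxId = firstTxId + numTxns - 1;
        }
    }
}

class ToyWriter {
    private final List<ToyJournalNode> jns;
    private final boolean updateCommittedTxId;
    private long committedTxId = 0;

    ToyWriter(List<ToyJournalNode> jns, boolean updateCommittedTxId) {
        this.jns = jns;
        this.updateCommittedTxId = updateCommittedTxId;
    }

    /** Flushes one batch; crashBeforeDummy simulates the writer dying between the two RPCs. */
    void flush(long firstTxId, int numTxns, boolean crashBeforeDummy) {
        // Real edit batch: it carries the committed txid known *before* this batch.
        for (ToyJournalNode jn : jns) {
            jn.sendEdits(firstTxId, numTxns, committedTxId);
        }
        // Toy model: every JN acks synchronously, so the quorum is always reached.
        committedTxId = firstTxId + numTxns - 1;
        if (crashBeforeDummy) {
            return;
        }
        if (updateCommittedTxId) {
            // Empty "dummy" batch whose only purpose is to publish the new committed txid.
            for (ToyJournalNode jn : jns) {
                jn.sendEdits(firstTxId, 0, committedTxId);
            }
        }
    }
}

class CommittedTxIdDemo {
    public static void main(String[] args) {
        List<ToyJournalNode> jns = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            jns.add(new ToyJournalNode());
        }
        ToyWriter writer = new ToyWriter(jns, true);
        writer.flush(1, 10, false);   // txids 1..10, dummy batch sent
        writer.flush(11, 5, true);    // txids 11..15, writer dies before the dummy batch
        ToyJournalNode jn = jns.get(0);
        System.out.println(jn.lastWrittenTxId + " " + jn.committedTxId);  // 15 10
    }
}
```

After the simulated crash every JN holds edits up to txid 15 but believes only txid 10 is committed, which is the gap between {{endTxId}} and {{commitTxnId}} that the report describes.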




[jira] [Updated] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13145:

Attachment: (was: HDFS-13145.000.patch)

> SBN crash when transition to ANN with in-progress edit tailing enabled
> --
>
> Key: HDFS-13145
> URL: https://issues.apache.org/jira/browse/HDFS-13145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13145.000.patch
>
>
> With in-progress edit log tailing enabled, {{QuorumOutputStream}} sends two
> batches to the JNs: a normal edit batch, followed by a dummy batch that
> updates the committed txid on the JNs.
> {code}
>   QuorumCall<AsyncLogger, Void> qcall = loggers.sendEdits(
>       segmentTxId, firstTxToFlush,
>       numReadyTxns, data);
>   loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
>
>   // Since we successfully wrote this batch, let the loggers know. Any future
>   // RPCs will thus let the loggers know of the most recent transaction, even
>   // if a logger has fallen behind.
>   loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1);
>   // If we don't have this dummy send, committed TxId might be one-batch
>   // stale on the Journal Nodes
>   if (updateCommittedTxId) {
>     QuorumCall<AsyncLogger, Void> fakeCall = loggers.sendEdits(
>         segmentTxId, firstTxToFlush,
>         0, new byte[0]);
>     loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits");
>   }
> {code}
> After each batch, the writer waits for a quorum of JNs to respond. However, if
> the ANN crashes between the two batches, the SBN will crash while
> transitioning to ANN:
> {code}
> java.lang.IllegalStateException: Cannot start writing at txid 24312595802 when there is a stream available for read: ..
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490)
> 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> This is because, without the dummy batch, the {{commitTxnId}} lags behind
> the {{endTxId}}, which causes the check in {{openForWrite}} to fail:
> {code}
> List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
> journalSet.selectInputStreams(streams, segmentTxId, true, false);
> if (!streams.isEmpty()) {
>   String error = String.format("Cannot start writing at txid %s " +
> "when there is a stream available for read: %s",
> segmentTxId, streams.get(0));
>   IOUtils.cleanupWithLogger(LOG,
>   streams.toArray(new EditLogInputStream[0]));
>   throw new IllegalStateException(error);
> }
> {code}
> In our environment this can be reproduced fairly consistently, and it leaves
> the cluster with no running NameNodes. Although we are using a 2.8.2
> backport, I believe the same issue also exists in 3.0.x.




[jira] [Updated] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13145:

Attachment: HDFS-13145.000.patch

> SBN crash when transition to ANN with in-progress edit tailing enabled
> --
>
> Key: HDFS-13145
> URL: https://issues.apache.org/jira/browse/HDFS-13145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13145.000.patch
>
>
> With in-progress edit log tailing enabled, {{QuorumOutputStream}} sends two
> batches to the JNs: a normal edit batch, followed by a dummy batch that
> updates the committed txid on the JNs.
> {code}
>   QuorumCall<AsyncLogger, Void> qcall = loggers.sendEdits(
>       segmentTxId, firstTxToFlush,
>       numReadyTxns, data);
>   loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
>
>   // Since we successfully wrote this batch, let the loggers know. Any future
>   // RPCs will thus let the loggers know of the most recent transaction, even
>   // if a logger has fallen behind.
>   loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1);
>   // If we don't have this dummy send, committed TxId might be one-batch
>   // stale on the Journal Nodes
>   if (updateCommittedTxId) {
>     QuorumCall<AsyncLogger, Void> fakeCall = loggers.sendEdits(
>         segmentTxId, firstTxToFlush,
>         0, new byte[0]);
>     loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits");
>   }
> {code}
> After each batch, the writer waits for a quorum of JNs to respond. However, if
> the ANN crashes between the two batches, the SBN will crash while
> transitioning to ANN:
> {code}
> java.lang.IllegalStateException: Cannot start writing at txid 24312595802 when there is a stream available for read: ..
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490)
> 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> This is because, without the dummy batch, the {{commitTxnId}} lags behind
> the {{endTxId}}, which causes the check in {{openForWrite}} to fail:
> {code}
> List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
> journalSet.selectInputStreams(streams, segmentTxId, true, false);
> if (!streams.isEmpty()) {
>   String error = String.format("Cannot start writing at txid %s " +
> "when there is a stream available for read: %s",
> segmentTxId, streams.get(0));
>   IOUtils.cleanupWithLogger(LOG,
>   streams.toArray(new EditLogInputStream[0]));
>   throw new IllegalStateException(error);
> }
> {code}
> In our environment this can be reproduced fairly consistently, and it leaves
> the cluster with no running NameNodes. Although we are using a 2.8.2
> backport, I believe the same issue also exists in 3.0.x.




[jira] [Updated] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled

2018-02-23 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13145:

Status: Patch Available  (was: Open)

Submit patch v0.

> SBN crash when transition to ANN with in-progress edit tailing enabled
> --
>
> Key: HDFS-13145
> URL: https://issues.apache.org/jira/browse/HDFS-13145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13145.000.patch
>
>
> With in-progress edit log tailing enabled, {{QuorumOutputStream}} sends two
> batches to the JNs: a normal edit batch, followed by a dummy batch that
> updates the committed txid on the JNs.
> {code}
>   QuorumCall<AsyncLogger, Void> qcall = loggers.sendEdits(
>       segmentTxId, firstTxToFlush,
>       numReadyTxns, data);
>   loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
>
>   // Since we successfully wrote this batch, let the loggers know. Any future
>   // RPCs will thus let the loggers know of the most recent transaction, even
>   // if a logger has fallen behind.
>   loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1);
>   // If we don't have this dummy send, committed TxId might be one-batch
>   // stale on the Journal Nodes
>   if (updateCommittedTxId) {
>     QuorumCall<AsyncLogger, Void> fakeCall = loggers.sendEdits(
>         segmentTxId, firstTxToFlush,
>         0, new byte[0]);
>     loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits");
>   }
> {code}
> After each batch, the writer waits for a quorum of JNs to respond. However, if
> the ANN crashes between the two batches, the SBN will crash while
> transitioning to ANN:
> {code}
> java.lang.IllegalStateException: Cannot start writing at txid 24312595802 when there is a stream available for read: ..
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490)
> 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> This is because, without the dummy batch, the {{commitTxnId}} lags behind
> the {{endTxId}}, which causes the check in {{openForWrite}} to fail:
> {code}
> List<EditLogInputStream> streams = new ArrayList<EditLogInputStream>();
> journalSet.selectInputStreams(streams, segmentTxId, true, false);
> if (!streams.isEmpty()) {
>   String error = String.format("Cannot start writing at txid %s " +
> "when there is a stream available for read: %s",
> segmentTxId, streams.get(0));
>   IOUtils.cleanupWithLogger(LOG,
>   streams.toArray(new EditLogInputStream[0]));
>   throw new IllegalStateException(error);
> }
> {code}
> In our environment, this can be reproduced fairly consistently, and it 
> leaves the cluster with no running namenodes. Even though we are using a 2.8.2 
> backport, I believe the same issue also exists in 3.0.x. 
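The safety check quoted above can be modeled in a few lines. This is a toy sketch, not the FSEditLog or JournalSet code: `Segment`, `selectReadable` and `openForWrite` are hypothetical names, and it assumes the node computes its next segment start from the last txid it knows as committed (`lastWrittenTxId`) while the journal already exposes a segment ending at a higher `endTxId`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not the Hadoop API) of the openForWrite()
// safety check: a new segment may start at lastWrittenTxId + 1 only if no
// journal can already serve that txid to a reader.
class OpenForWriteCheck {

  // A readable edit log segment covering [startTxId, endTxId].
  static final class Segment {
    final long startTxId;
    final long endTxId;

    Segment(long startTxId, long endTxId) {
      this.startTxId = startTxId;
      this.endTxId = endTxId;
    }

    @Override
    public String toString() {
      return "Segment[" + startTxId + "-" + endTxId + "]";
    }
  }

  // Segments from which a reader could still fetch fromTxId or later.
  static List<Segment> selectReadable(List<Segment> journal, long fromTxId) {
    List<Segment> streams = new ArrayList<>();
    for (Segment s : journal) {
      if (s.endTxId >= fromTxId) {
        streams.add(s);
      }
    }
    return streams;
  }

  // Refuses to start writing if the txid it would write is already readable.
  static long openForWrite(List<Segment> journal, long lastWrittenTxId) {
    long segmentTxId = lastWrittenTxId + 1;
    List<Segment> streams = selectReadable(journal, segmentTxId);
    if (!streams.isEmpty()) {
      throw new IllegalStateException(String.format(
          "Cannot start writing at txid %s when there is a stream available for read: %s",
          segmentTxId, streams.get(0)));
    }
    return segmentTxId;
  }

  public static void main(String[] args) {
    List<Segment> journal = new ArrayList<>();
    journal.add(new Segment(1, 10)); // the journal holds txids 1..10

    // In sync: the node knows txid 10 as committed, so it starts at 11.
    System.out.println(openForWrite(journal, 10));

    // Lagging: the node only knows txid 9, so txid 10 is still readable
    // and the check fails, which is the failure mode described above.
    try {
      openForWrite(journal, 9);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the check itself is correct; the failure comes entirely from the writer's view of the last txid lagging behind what readers can already see.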





[jira] [Commented] (HDFS-13164) File not closed if streamer fail with DSQuotaExceededException

2018-02-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375039#comment-16375039
 ] 

Hudson commented on HDFS-13164:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13709 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13709/])
HDFS-13164. File not closed if streamer fail with (xiao: rev 
51088d323359587dca7831f74c9d065c2fccc60d)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/LeaseRenewer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java


> File not closed if streamer fail with DSQuotaExceededException
> --
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch
>
>
>  This was found during YARN log aggregation, but it could theoretically happen 
> to any client.
> If the dir's space quota is exceeded, the following would happen when a file 
> is created:
>  - client {{startFile}} rpc to NN, gets a {{DFSOutputStream}}.
>  - writing to the stream would trigger the streamer to {{getAdditionalBlock}} 
> rpc to NN, which would get the DSQuotaExceededException
>  - client closes the stream
>   
>  The fact that this would leave a 0-sized (or whatever size is left in the 
> quota) file in HDFS is beyond the scope of this jira. However, the file would 
> at least be left in openforwrite status (shown in {{fsck -openforwrite}}), 
> and it could potentially leak the leaseRenewer too.
> This is because in the close implementation,
>  # {{isClosed}} is first checked, and the close call will be a no-op if 
> {{isClosed == true}}.
>  # {{flushInternal}} checks {{isClosed}}, and throws the exception right away 
> if true
> {{isClosed}} does this: {{return closed || getStreamer().streamerClosed;}}
> When the disk quota is reached, {{getAdditionalBlock}} throws when the 
> streamer calls it. Because the streamer runs in a separate thread, at the time 
> the client calls close on the stream, the streamer may or may not have 
> hit the quota exception yet. If it has, then due to #1 the close call on the 
> stream is a no-op. If it hasn't yet, the streamer failure surfaces in 
> {{flushInternal}} instead, and due to #2 the {{completeFile}} logic is skipped.
> {code:java}
> protected synchronized void closeImpl() throws IOException {
>   if (isClosed()) {
>     IOException e = lastException.getAndSet(null);
>     if (e == null)
>       return;
>     else
>       throw e;
>   }
>   try {
>     flushBuffer(); // flush from all upper layers
>     ...
>     flushInternal(); // flush all data to Datanodes
>     // get last block before destroying the streamer
>     ExtendedBlock lastBlock = getStreamer().getBlock();
>     try (TraceScope ignored =
>         dfsClient.getTracer().newScope("completeFile")) {
>       completeFile(lastBlock);
>     }
>   } catch (ClosedChannelException ignored) {
>   } finally {
>     closeThreads(true);
>   }
> }
> {code}
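The race can be reproduced deterministically in a self-contained toy. `ToyOutputStream`, `streamerFailed` and `fileCompleted` are hypothetical names; only the control flow of `isClosed()` and `closeImpl()` follows the quoted code, and the streamer thread is joined before `close()` to force the interleaving that hits check #1.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Simplified model of the close() race. ToyOutputStream and its members are
// hypothetical; only the shape of isClosed()/closeImpl() mirrors the quote.
class CloseRaceDemo {

  static final class ToyOutputStream {
    private boolean closed;                  // set once close() runs its finally block
    private volatile boolean streamerClosed; // set by the streamer thread on failure
    private final AtomicReference<IOException> lastException = new AtomicReference<>();
    boolean fileCompleted;                   // stands in for a successful completeFile()

    boolean isClosed() {
      return closed || streamerClosed;
    }

    // Called on the streamer thread when addBlock fails, e.g. on a quota error.
    void streamerFailed(IOException e) {
      lastException.set(e);
      streamerClosed = true;
    }

    // Mirrors check #2: flushing a closed stream throws right away.
    private void flushInternal() throws IOException {
      if (isClosed()) {
        IOException e = lastException.getAndSet(null);
        throw e != null ? e : new IOException("stream closed");
      }
    }

    // Same shape as the quoted closeImpl(): both exits skip completeFile().
    synchronized void closeImpl() throws IOException {
      if (isClosed()) {                      // check #1: return or rethrow early
        IOException e = lastException.getAndSet(null);
        if (e == null) {
          return;
        }
        throw e;
      }
      try {
        flushInternal();
        fileCompleted = true;                // completeFile(lastBlock)
      } finally {
        closed = true;                       // closeThreads(true)
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ToyOutputStream out = new ToyOutputStream();
    // Let the streamer fail before close() to force the check #1 interleaving.
    Thread streamer = new Thread(
        () -> out.streamerFailed(new IOException("simulated DSQuotaExceededException")));
    streamer.start();
    streamer.join();
    try {
      out.closeImpl();
    } catch (IOException e) {
      System.out.println("close() threw: " + e.getMessage());
    }
    // Prints false: the file would stay open-for-write on the NameNode.
    System.out.println("fileCompleted = " + out.fileCompleted);
  }
}
```

Either exit path leaves `fileCompleted` false, which corresponds to the file remaining in openforwrite status.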
> Log snippets:
> {noformat}
> 2018-02-16 15:59:32,916 DEBUG org.apache.hadoop.hdfs.DFSClient: DataStreamer Quota Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /DIR is exceeded: quota = 200 B = 1.91 MB but diskspace consumed = 404139552 B = 385.42 MB
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:149)
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:159)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2124)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1991)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1966)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:463)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3896)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3484)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:686)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
> at 
> 

[jira] [Commented] (HDFS-13187) RBF: Fix Routers information shown in the web UI

2018-02-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375018#comment-16375018
 ] 

Íñigo Goiri commented on HDFS-13187:


Thanks [~ywskycn] for reporting; I found the issue.
I screwed up when copying from the internal branch to the external one.
I attached [^HDFS-13187.000.patch] with the fix for what is broken.
I'm open to moving it to a tab too.

> RBF: Fix Routers information shown in the web UI
> 
>
> Key: HDFS-13187
> URL: https://issues.apache.org/jira/browse/HDFS-13187
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Minor
> Attachments: HDFS-13187.000.patch, image-2018-02-23-09-23-29-495.png
>
>
> HDFS-13043 added a component to the existing web UI to include router 
> information. However, the UI currently doesn't render it correctly: the new 
> table is shown at the bottom of each tab, and it contains no data. Some code 
> is missing on the .html and .js side.
>  
> A quick screenshot of what it currently shows (at the bottom of each page 
> tab):
> !image-2018-02-23-09-23-29-495.png!






[jira] [Updated] (HDFS-13187) RBF: Fix Routers information shown in the web UI

2018-02-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13187:
---
Attachment: HDFS-13187.000.patch




[jira] [Updated] (HDFS-13164) File not closed if streamer fail with DSQuotaExceededException

2018-02-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13164:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk, branch-3.1, branch-3.0, branch-2, branch-2.9, and 
branch-2.8.

Thanks for the review, Yongjun!

> File not closed if streamer fail with DSQuotaExceededException
> --
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch

[jira] [Updated] (HDFS-13164) File not closed if streamer fail with DSQuotaExceededException

2018-02-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13164:
-
Affects Version/s: (was: 2.8.0)

> File not closed if streamer fail with DSQuotaExceededException
> --
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch
>
>
>  This was found during YARN log aggregation, but it could theoretically happen 
> to any client.
> If the dir's space quota is exceeded, the following would happen when a file 
> is created:
>  - client {{startFile}} rpc to NN, gets a {{DFSOutputStream}}.
>  - writing to the stream would trigger the streamer to {{getAdditionalBlock}} 
> rpc to NN, which would get the DSQuotaExceededException
>  - client closes the stream
>   
>  The fact that this would leave a 0-sized (or whatever size is left in the 
> quota) file in HDFS is beyond the scope of this jira. However, the file would 
> at least be left in openforwrite status (shown in {{fsck -openforwrite}}), 
> and it could potentially leak the leaseRenewer too.
> This is because in the close implementation,
>  # {{isClosed}} is first checked, and the close call will be a no-op if 
> {{isClosed == true}}.
>  # {{flushInternal}} checks {{isClosed}}, and throws the exception right away 
> if true
> {{isClosed}} does this: {{return closed || getStreamer().streamerClosed;}}
> When the disk quota is reached, {{getAdditionalBlock}} throws when the 
> streamer calls it. Because the streamer runs in a separate thread, at the time 
> the client calls close on the stream, the streamer may or may not have 
> hit the quota exception yet. If it has, then due to #1 the close call on the 
> stream is a no-op. If it hasn't yet, the streamer failure surfaces in 
> {{flushInternal}} instead, and due to #2 the {{completeFile}} logic is skipped.
> {code:java}
> protected synchronized void closeImpl() throws IOException {
>   if (isClosed()) {
>     IOException e = lastException.getAndSet(null);
>     if (e == null)
>       return;
>     else
>       throw e;
>   }
>   try {
>     flushBuffer(); // flush from all upper layers
>     ...
>     flushInternal(); // flush all data to Datanodes
>     // get last block before destroying the streamer
>     ExtendedBlock lastBlock = getStreamer().getBlock();
>     try (TraceScope ignored =
>         dfsClient.getTracer().newScope("completeFile")) {
>       completeFile(lastBlock);
>     }
>   } catch (ClosedChannelException ignored) {
>   } finally {
>     closeThreads(true);
>   }
> }
> {code}
> Log snippets:
> {noformat}
> 2018-02-16 15:59:32,916 DEBUG org.apache.hadoop.hdfs.DFSClient: DataStreamer Quota Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /DIR is exceeded: quota = 200 B = 1.91 MB but diskspace consumed = 404139552 B = 385.42 MB
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:149)
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:159)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2124)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1991)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1966)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:463)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3896)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3484)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:686)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
>

[jira] [Updated] (HDFS-13164) File not closed if streamer fail with DSQuotaExceededException

2018-02-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13164:
-
Fix Version/s: 2.8.4
   3.0.1
   2.9.1
   2.10.0
   3.1.0

> File not closed if streamer fail with DSQuotaExceededException
> --
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch

[jira] [Commented] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375008#comment-16375008
 ] 

genericqa commented on HDFS-13008:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m  4s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  1s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 50s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  8s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  8s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  3s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 44s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 42s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}136m 18s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}208m 18s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate |
|   | hadoop.hdfs.server.federation.router.TestRouterSafemode |
|   | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13008 |
| JIRA Patch URL | 

[jira] [Updated] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13136:
--
Fix Version/s: 3.0.2

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 3.1.0, 3.0.2, 3.2.0
>
> Attachments: HDFS-13136-branch-3.0.001.patch, 
> HDFS-13136-branch-3.0.002.patch, HDFS-13136.001.patch, HDFS-13136.002.patch
>
>
> The NameNode has an FSN lock and an FSD lock. Most NameNode operations need to 
> take the FSN lock first and then the FSD lock. The permission check is done via 
> FSPermissionChecker at the FSD layer, assuming the FSN lock is already held. 
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD, and an 
> internal cache scheme, for group lookup. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and an 
> unresponsive NameNode.
> Checking the current code, we found that getBlockLocations(..) does this right, 
> but some methods such as getFileInfo(..) and getContentSummary(..) do not. 
> This ticket is open to ensure that the group lookup for the permission checker 
> happens outside the FSN lock.  
>  
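A minimal sketch of the ordering the ticket asks for, assuming a `ReentrantReadWriteLock` as a stand-in for the FSN lock and a `Function` as a stand-in for `callerUgi.getGroups()`; `GroupLookupOutsideLock`, `PermissionChecker` and `canRead` are hypothetical names, not the NameNode code. The point is only that the potentially slow group resolution completes before the lock is acquired, so a cache refresh cannot stall other lock waiters.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical sketch: resolve the caller's groups (possibly slow, e.g. during
// an SSSD cache refresh) before taking the namesystem lock.
class GroupLookupOutsideLock {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private final Function<String, List<String>> groupResolver; // plays callerUgi.getGroups()
  volatile boolean lockHeldDuringLastLookup;

  GroupLookupOutsideLock(Function<String, List<String>> groupResolver) {
    this.groupResolver = groupResolver;
  }

  // Immutable checker that only holds groups resolved ahead of time.
  static final class PermissionChecker {
    private final List<String> groups;

    PermissionChecker(List<String> groups) {
      this.groups = Collections.unmodifiableList(groups);
    }

    boolean inGroup(String group) {
      return groups.contains(group);
    }
  }

  private PermissionChecker newPermissionChecker(String user) {
    // Record whether the (possibly slow) lookup ran while the lock was held.
    lockHeldDuringLastLookup =
        fsnLock.getReadHoldCount() > 0 || fsnLock.isWriteLockedByCurrentThread();
    return new PermissionChecker(groupResolver.apply(user));
  }

  // A read-style operation, loosely modeled on a getFileInfo-like call.
  boolean canRead(String user, String fileGroup) {
    PermissionChecker pc = newPermissionChecker(user); // no lock held yet
    fsnLock.readLock().lock();
    try {
      return pc.inGroup(fileGroup); // only the cheap in-memory check is locked
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    GroupLookupOutsideLock ns = new GroupLookupOutsideLock(user -> {
      try {
        Thread.sleep(100); // simulate a slow lookup during a cache refresh
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return Collections.singletonList("hadoop");
    });
    System.out.println(ns.canRead("alice", "hadoop"));
    System.out.println(ns.lockHeldDuringLastLookup);
  }
}
```

The design choice is simply to move construction of the checker ahead of `lock()`; the locked region then contains no external calls.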






[jira] [Updated] (HDFS-13164) File not closed if streamer fail with DSQuotaExceededException

2018-02-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13164:
-
Summary: File not closed if streamer fail with DSQuotaExceededException  
(was: File not closed if append fail with DSQuotaExceededException)

> File not closed if streamer fail with DSQuotaExceededException
> --
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch
>
>
>  This is found during yarn log aggregation but theoretically could happen to 
> any client.
> If the dir's space quota is exceeded, the following would happen when a file 
> is created:
>  - client {{startFile}} rpc to NN, gets a {{DFSOutputStream}}.
>  - writing to the stream would trigger the streamer to {{getAdditionalBlock}} 
> rpc to NN, which would get the DSQuotaExceededException
>  - client closes the stream
>   
>  The fact that this would leave a 0-sized (or whatever size left in the 
> quota) file in HDFS is beyond the scope of this jira. However, the file would 
> be left in openforwrite status (shown in {{fsck -openforwrite)}} at least, 
> and could potentially leak leaseRenewer too.
> This is because in the close implementation,
>  # {{isClosed}} is first checked, and the close call will be a no-op if 
> {{isClosed == true}}.
>  # {{flushInternal}} checks {{isClosed}}, and throws the exception right away 
> if true
> {{isClosed}} does this: {{return closed || getStreamer().streamerClosed;}}
> When the disk quota is reached, {{getAdditionalBlock}} will throw when the 
> streamer calls. Because the streamer runs in a separate thread, at the time 
> the client calls close on the stream, the streamer may or may not have 
> reached the Quota exception. If it has, then due to #1, the close call on the 
> stream will be no-op. If it hasn't, then due to #2 the {{completeFile}} logic 
> will be skipped.
> {code:java}
> protected synchronized void closeImpl() throws IOException {
> if (isClosed()) {
>   IOException e = lastException.getAndSet(null);
>   if (e == null)
> return;
>   else
> throw e;
> }
>   try {
> flushBuffer(); // flush from all upper layers
> ...
> flushInternal(); // flush all data to Datanodes
> // get last block before destroying the streamer
> ExtendedBlock lastBlock = getStreamer().getBlock();
> try (TraceScope ignored =
>dfsClient.getTracer().newScope("completeFile")) {
>completeFile(lastBlock);
> }
>} catch (ClosedChannelException ignored) {
>} finally {
>  closeThreads(true);
>}
>  }
>  {code}
> Log snippets:
> {noformat}
> 2018-02-16 15:59:32,916 DEBUG org.apache.hadoop.hdfs.DFSClient: DataStreamer 
> Quota Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /DIR is exceeded: quota = 200 B = 1.91 MB but diskspace consumed = 
> 404139552 B = 385.42 MB
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:149)
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:159)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2124)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1991)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1966)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:463)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3896)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3484)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:686)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   
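The close race described above can be reproduced deterministically in a toy model. This is a hypothetical sketch, not `DFSOutputStream` and not the committed patch: `ToyOutputStream`, `streamerFails`, `closeOriginal` and `closeWithCleanup` are illustrative names, and the boolean fields only stand in for the `completeFile` RPC and `closeThreads(true)`. The streamer failure is injected directly rather than from a separate thread, so the outcome does not depend on timing. `closeOriginal` mirrors the early return in the quoted `closeImpl()`; `closeWithCleanup` shows one possible mitigation: when only the streamer has died, still run the cleanup before rethrowing.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the close() race; not DFSOutputStream code.
class ToyOutputStream {
  private boolean closed = false;          // set once cleanup has run
  volatile boolean streamerFailed = false; // stands in for getStreamer().streamerClosed
  final AtomicReference<IOException> lastException = new AtomicReference<>();
  boolean fileCompleted = false;           // stands in for the completeFile RPC
  boolean threadsClosed = false;           // stands in for closeThreads(true)

  // Called by the (simulated) streamer when addBlock fails, e.g. on a quota error.
  void streamerFails(IOException e) {
    lastException.set(e);
    streamerFailed = true;
  }

  boolean isClosed() {
    return closed || streamerFailed;
  }

  // Mirrors the early return in the quoted closeImpl(): no cleanup happens.
  synchronized void closeOriginal() throws IOException {
    if (isClosed()) {
      IOException e = lastException.getAndSet(null);
      if (e == null) {
        return;
      }
      throw e;
    }
    finish();
  }

  // One possible mitigation: if only the streamer died, clean up, then rethrow.
  synchronized void closeWithCleanup() throws IOException {
    if (isClosed()) {
      IOException e = lastException.getAndSet(null);
      if (!closed) {
        finish();
      }
      if (e != null) {
        throw e;
      }
      return;
    }
    finish();
  }

  private void finish() {
    try {
      fileCompleted = true; // completeFile(lastBlock) in the real client
    } finally {
      threadsClosed = true; // closeThreads(true) in the real client
      closed = true;
    }
  }
}
```

Whether completing the file is the right cleanup after a quota failure is a design question for the real client; the model only shows that the early return skips it entirely.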

[jira] [Commented] (HDFS-13180) Implement security for Hadoop Distributed Storage Layer

2018-02-23 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374989#comment-16374989
 ] 

Xiaoyu Yao commented on HDFS-13180:
---

bq. Does it require a KMS service in addition to Kerberos?

No, HDSL won't need a KMS. SCM will act as the CA to manage the certificates 
for DNs. This is covered in sections 3.3.2-3.3.4 of the security doc.

> Implement security for Hadoop Distributed Storage Layer 
> 
>
> Key: HDFS-13180
> URL: https://issues.apache.org/jira/browse/HDFS-13180
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, ozone
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
> Attachments: HadoopStorageLayerSecurity.pdf
>
>
> In HDFS-7240, we have created a scalable block layer that facilitates 
> separation of namespace and block layer. Hadoop Distributed Storage Layer 
> (HDSL) allows us to scale HDFS (HDFS-10419) as well as create Ozone 
> (HDFS-13074).
> This JIRA is an umbrella JIRA that tracks the security-related work items for 
> Hadoop Distributed Storage Layer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-23 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374987#comment-16374987
 ] 

Shashikant Banerjee commented on HDFS-13102:


Thanks [~szetszwo] for the review. As per our offline discussion, I have 
updated the patch.

Please have a look.

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch, HDFS-13102.004.patch, HDFS-13102.005.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time when the number of snapshot diffs for a directory is large. 
> For any directory under a snapshot, to construct the children list, it needs 
> to combine all the diffs from that particular snapshot to the last 
> snapshotDiff record and reverseApply them to the current children list of 
> the directory on the live fs. This can take significant time if the number 
> of snapshot diffs is large and the changes per diff are significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkipList, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be a cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  
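A minimal sketch of the multi-level idea, under stated assumptions: a diff is reduced to the sets of child names created and deleted, and level L stores one cumulative diff per aligned run of k^L consecutive diffs, with a hypothetical skip factor k. This power-of-k layout is a common skip-list-style choice and may differ from the exact level assignment in the patch; `Diff`, `MultiLevelDiffList` and `childrenBefore` are illustrative names, not HDFS classes. Reconstructing an old children list then reverse-applies a few cumulative diffs instead of one diff per snapshot.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a directory diff: child names created/deleted after a snapshot.
final class Diff {
  final Set<String> created = new TreeSet<>();
  final Set<String> deleted = new TreeSet<>();

  Diff(Collection<String> created, Collection<String> deleted) {
    this.created.addAll(created);
    this.deleted.addAll(deleted);
  }

  // Cumulative diff of `older` followed by `newer`: reverse-applying it has the
  // same effect as reverse-applying `newer` and then `older`.
  static Diff combine(Diff older, Diff newer) {
    Diff d = new Diff(older.created, older.deleted);
    d.created.addAll(newer.created);
    for (String name : newer.deleted) {
      if (!older.created.contains(name)) {
        d.deleted.add(name);
      }
    }
    return d;
  }

  // Undo this diff: turn the newer children list into the older one.
  void reverseApply(Set<String> children) {
    children.removeAll(created);
    children.addAll(deleted);
  }
}

// Level L holds one cumulative diff per aligned run of k^L consecutive diffs.
final class MultiLevelDiffList {
  private final int k; // hypothetical skip factor, k >= 2
  private final List<List<Diff>> levels = new ArrayList<>();
  int lastApplyCount; // reverseApply calls made by the most recent query

  MultiLevelDiffList(int k) {
    if (k < 2) {
      throw new IllegalArgumentException("k must be at least 2");
    }
    this.k = k;
    levels.add(new ArrayList<>());
  }

  void append(Diff diff) {
    levels.get(0).add(diff);
    // Whenever a level completes a run of k entries, push their combination up.
    for (int level = 0; levels.get(level).size() % k == 0; level++) {
      List<Diff> current = levels.get(level);
      Diff combined = current.get(current.size() - k);
      for (int i = current.size() - k + 1; i < current.size(); i++) {
        combined = Diff.combine(combined, current.get(i));
      }
      if (levels.size() == level + 1) {
        levels.add(new ArrayList<>());
      }
      levels.get(level + 1).add(combined);
    }
  }

  // Children list as it was before diff number `from` (0-based), given the live list.
  Set<String> childrenBefore(int from, Set<String> live) {
    Set<String> result = new TreeSet<>(live);
    int pos = levels.get(0).size(); // diffs [from, pos) still have to be undone
    lastApplyCount = 0;
    while (pos > from) {
      int level = levels.size() - 1;
      int span = 1;
      for (int i = 0; i < level; i++) {
        span *= k;
      }
      // Highest level whose block ends exactly at pos and starts at or after `from`.
      while (pos % span != 0 || pos - span < from) {
        level--;
        span /= k;
      }
      levels.get(level).get(pos / span - 1).reverseApply(result);
      pos -= span;
      lastApplyCount++;
    }
    return result;
  }
}
```

With k = 2 and eight diffs, rebuilding the children list before the first diff takes a single reverseApply of the top-level cumulative diff rather than eight.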






[jira] [Updated] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-23 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-13102:
---
Attachment: HDFS-13102.005.patch

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch, HDFS-13102.004.patch, HDFS-13102.005.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time when the number of snapshot diffs for a directory is large. 
> For any directory under a snapshot, to construct the children list, it needs 
> to combine all the diffs from that particular snapshot to the last 
> snapshotDiff record and reverseApply them to the current children list of 
> the directory on the live fs. This can take significant time if the number 
> of snapshot diffs is large and the changes per diff are significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkipList, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be a cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  






[jira] [Updated] (HDFS-13163) Move invalidated blocks to replica-trash with disk layout based on timestamp

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13163:
--
Attachment: HDFS-13163-HDFS-12996.01.patch

> Move invalidated blocks to replica-trash with disk layout based on timestamp
> 
>
> Key: HDFS-13163
> URL: https://issues.apache.org/jira/browse/HDFS-13163
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13163-HDFS-12996.00.patch, 
> HDFS-13163-HDFS-12996.01.patch
>
>







[jira] [Commented] (HDFS-13163) Move invalidated blocks to replica-trash with disk layout based on timestamp

2018-02-23 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374973#comment-16374973
 ] 

Bharat Viswanadham commented on HDFS-13163:
---

Rebased the patch to apply to the HDFS-12996 branch.

> Move invalidated blocks to replica-trash with disk layout based on timestamp
> 
>
> Key: HDFS-13163
> URL: https://issues.apache.org/jira/browse/HDFS-13163
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13163-HDFS-12996.00.patch, 
> HDFS-13163-HDFS-12996.01.patch
>
>







[jira] [Updated] (HDFS-13150) Create fast path for SbNN tailing edits from JNs

2018-02-23 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-13150:
---
Attachment: edit-tailing-fast-path-design-v0.pdf

> Create fast path for SbNN tailing edits from JNs
> 
>
> Key: HDFS-13150
> URL: https://issues.apache.org/jira/browse/HDFS-13150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, journal-node, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: edit-tailing-fast-path-design-v0.pdf
>
>
> In the interest of making coordinated/consistent reads easier to complete 
> with low latency, it is advantageous to reduce the time between when a 
> transaction is applied on the ANN and when it is applied on the SbNN. We 
> propose adding a new "fast path" which can be used to tail edits when low 
> latency is desired. We leave the existing tailing logic in place and fall 
> back to it on startup, recovery, and when the fast path encounters 
> unrecoverable errors.
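A rough sketch of the fallback behaviour described above. `EditSource`, `EditTailer` and `editsAfter` are hypothetical stand-ins, not NameNode or JournalNode classes, and edits are reduced to transaction IDs; the design document may structure this differently. The tailer uses the existing path on startup and whenever the fast path throws. When to re-enable the fast path is an assumption of this sketch (here: after one successful catch-up through the existing path).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical source of edit-log transactions, reduced to transaction IDs.
interface EditSource {
  List<Long> editsAfter(long lastAppliedTxId) throws IOException;
}

// Hypothetical tailer: prefers the fast path, falls back to the existing path.
final class EditTailer {
  private final EditSource fastPath;
  private final EditSource existingPath;
  private boolean useFastPath = false; // startup/recovery begins on the existing path
  private long lastAppliedTxId;
  final List<Long> applied = new ArrayList<>();

  EditTailer(EditSource fastPath, EditSource existingPath, long lastAppliedTxId) {
    this.fastPath = fastPath;
    this.existingPath = existingPath;
    this.lastAppliedTxId = lastAppliedTxId;
  }

  void tailOnce() throws IOException {
    List<Long> edits;
    if (useFastPath) {
      try {
        edits = fastPath.editsAfter(lastAppliedTxId);
      } catch (IOException unrecoverable) {
        useFastPath = false; // fall back; retry the fast path after a catch-up
        edits = existingPath.editsAfter(lastAppliedTxId);
      }
    } else {
      edits = existingPath.editsAfter(lastAppliedTxId);
      useFastPath = true; // assumption: try the fast path again next round
    }
    for (long txId : edits) {
      // Apply strictly in txid order; anything else is ignored in this sketch.
      if (txId == lastAppliedTxId + 1) {
        applied.add(txId);
        lastAppliedTxId = txId;
      }
    }
  }
}
```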






[jira] [Work started] (HDFS-13150) Create fast path for SbNN tailing edits from JNs

2018-02-23 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13150 started by Erik Krogen.
--
> Create fast path for SbNN tailing edits from JNs
> 
>
> Key: HDFS-13150
> URL: https://issues.apache.org/jira/browse/HDFS-13150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, journal-node, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: edit-tailing-fast-path-design-v0.pdf
>
>
> In the interest of making coordinated/consistent reads easier to complete 
> with low latency, it is advantageous to reduce the time between when a 
> transaction is applied on the ANN and when it is applied on the SbNN. We 
> propose adding a new "fast path" which can be used to tail edits when low 
> latency is desired. We leave the existing tailing logic in place and fall 
> back to it on startup, recovery, and when the fast path encounters 
> unrecoverable errors.






[jira] [Commented] (HDFS-13150) Create fast path for SbNN tailing edits from JNs

2018-02-23 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374964#comment-16374964
 ] 

Erik Krogen commented on HDFS-13150:


Attached a design document detailing the proposal; comments welcome!

cc [~csun]

> Create fast path for SbNN tailing edits from JNs
> 
>
> Key: HDFS-13150
> URL: https://issues.apache.org/jira/browse/HDFS-13150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, journal-node, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: edit-tailing-fast-path-design-v0.pdf
>
>
> In the interest of making coordinated/consistent reads easier to complete 
> with low latency, it is advantageous to reduce the time between when a 
> transaction is applied on the ANN and when it is applied on the SbNN. We 
> propose adding a new "fast path" which can be used to tail edits when low 
> latency is desired. We leave the existing tailing logic in place and fall 
> back to it on startup, recovery, and when the fast path encounters 
> unrecoverable errors.






[jira] [Commented] (HDFS-13181) DiskBalancer: Add an configuration for valid plan hours

2018-02-23 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374959#comment-16374959
 ] 

Bharat Viswanadham commented on HDFS-13181:
---

Thanks [~arpitagarwal] and [~ajayydv] for the review.

I have addressed the review comments in the v01 patch.

> DiskBalancer: Add an configuration for valid plan hours 
> 
>
> Key: HDFS-13181
> URL: https://issues.apache.org/jira/browse/HDFS-13181
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13181.00.patch, HDFS-13181.01.patch
>
>
> Add a configuration for valid plan hours instead of the 24 hours currently 
> hard-coded.






[jira] [Updated] (HDFS-13181) DiskBalancer: Add an configuration for valid plan hours

2018-02-23 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-13181:
--
Attachment: HDFS-13181.01.patch

> DiskBalancer: Add an configuration for valid plan hours 
> 
>
> Key: HDFS-13181
> URL: https://issues.apache.org/jira/browse/HDFS-13181
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13181.00.patch, HDFS-13181.01.patch
>
>
> Add a configuration for valid plan hours instead of the 24 hours currently 
> hard-coded.






[jira] [Commented] (HDFS-13180) Implement security for Hadoop Distributed Storage Layer

2018-02-23 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374953#comment-16374953
 ] 

Konstantin Shvachko commented on HDFS-13180:


??The main difference between HDFS’s block tokens and HDSL’s block tokens is 
the shift to using public-private key pairs instead of a shared secret. The end 
user will experience no difference in the security model.??
If public-private key pairs replace the current use of shared secrets for block 
tokens, how will key management be handled? Does it require a KMS service in 
addition to Kerberos?
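To illustrate the difference quoted above (a generic sketch, not the HDSL design): with a shared secret, every party that can verify a block token can also mint one, whereas with a key pair the issuer signs with its private key and DataNodes need only the public key. The sketch uses the JDK's `KeyPairGenerator` and `Signature` APIs with RSA and SHA-256; the `BlockTokenSigner` class and the token string format (`blockId`, `user`, `mode`) are hypothetical, and key distribution, which is what the question is about, is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical block-token signer: the issuer keeps the private key,
// verifiers (e.g. DataNodes) only ever hold the public key.
final class BlockTokenSigner {
  static KeyPair newKeyPair() throws GeneralSecurityException {
    KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
    generator.initialize(2048);
    return generator.generateKeyPair();
  }

  // Issuer side: sign the serialized token fields.
  static byte[] sign(String tokenFields, PrivateKey issuerKey) throws GeneralSecurityException {
    Signature signer = Signature.getInstance("SHA256withRSA");
    signer.initSign(issuerKey);
    signer.update(tokenFields.getBytes(StandardCharsets.UTF_8));
    return signer.sign();
  }

  // Verifier side: checking a token cannot be used to forge a new one.
  static boolean verify(String tokenFields, byte[] signature, PublicKey issuerPublicKey)
      throws GeneralSecurityException {
    Signature verifier = Signature.getInstance("SHA256withRSA");
    verifier.initVerify(issuerPublicKey);
    verifier.update(tokenFields.getBytes(StandardCharsets.UTF_8));
    return verifier.verify(signature);
  }
}
```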

> Implement security for Hadoop Distributed Storage Layer 
> 
>
> Key: HDFS-13180
> URL: https://issues.apache.org/jira/browse/HDFS-13180
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, ozone
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
> Attachments: HadoopStorageLayerSecurity.pdf
>
>
> In HDFS-7240, we have created a scalable block layer that facilitates 
> separation of namespace and block layer. Hadoop Distributed Storage Layer 
> (HDSL) allows us to scale HDFS (HDFS-10419) as well as create Ozone 
> (HDFS-13074).
> This JIRA is an umbrella JIRA that tracks the security-related work items for 
> Hadoop Distributed Storage Layer.






[jira] [Updated] (HDFS-13162) Create Replica Trash directory on DN startup

2018-02-23 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13162:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Create Replica Trash directory on DN startup
> 
>
> Key: HDFS-13162
> URL: https://issues.apache.org/jira/browse/HDFS-13162
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13162-HDFS-12996.01.patch, 
> HDFS-13162-HDFS-12996.02.patch, HDFS-13162-HDFS-12996.03.patch, 
> HDFS-13162.00-HDFS-12996.00.patch
>
>







[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2018-02-23 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374932#comment-16374932
 ] 

Ajay Kumar commented on HDFS-13183:
---

[~hexiaoqiao], this is a good start, but it will not redirect {{getBlocks}} 
requests to the ANN. It would be good if, in HA mode, the ANN redirected all such 
calls to the SNN, or signaled the client through an appropriate IOE to direct 
these calls to the SNN.
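A minimal sketch of the second option in the comment above: the Active NameNode rejects `getBlocks` with a dedicated `IOException` subclass and the Balancer retries against the Standby. `BlocksProtocol`, `RedirectToStandbyException` and `BalancerBlocksClient` are hypothetical names, not existing Hadoop types, and block IDs are plain strings for brevity.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical narrow view of the NameNode RPC the Balancer uses.
interface BlocksProtocol {
  List<String> getBlocks(String datanode, long size) throws IOException;
}

// Hypothetical signal from the Active NameNode: "ask the Standby instead".
class RedirectToStandbyException extends IOException {
  private static final long serialVersionUID = 1L;

  RedirectToStandbyException(String message) {
    super(message);
  }
}

// Hypothetical Balancer-side helper that honours the redirect.
final class BalancerBlocksClient {
  private final BlocksProtocol active;
  private final BlocksProtocol standby; // null when HA is not configured

  BalancerBlocksClient(BlocksProtocol active, BlocksProtocol standby) {
    this.active = active;
    this.standby = standby;
  }

  List<String> getBlocks(String datanode, long size) throws IOException {
    try {
      return active.getBlocks(datanode, size);
    } catch (RedirectToStandbyException redirect) {
      if (standby == null) {
        throw redirect; // nowhere to redirect; surface the error
      }
      return standby.getBlocks(datanode, size);
    }
  }
}
```

The redirect costs one extra round trip to the ANN per call; having the Balancer call the SNN directly avoids that, but requires the client to know which NameNode is currently standby.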

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, namenode
>Affects Versions: 2.7.5, 3.1.0, 2.9.1, 2.8.4, 3.0.2
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13183-trunk.001.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is currently 
> extremely inefficient. The main reason is that {{NameNodeRpcServer#getBlocks}} 
> holds the read lock for a long time. In the extreme case, all handlers of the 
> Active NameNode RPC server are occupied by one reader 
> ({{NameNodeRpcServer#getBlocks}}) and other write operation calls, so the 
> Active NameNode appears to hang for seconds or even minutes.
> Similar Balancer performance concerns have been reported in HDFS-9412, 
> HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.






[jira] [Commented] (HDFS-13164) File not closed if append fail with DSQuotaExceededException

2018-02-23 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374929#comment-16374929
 ] 

Xiao Chen commented on HDFS-13164:
--

The failed tests do not look related; I ran them all twice locally and they 
passed. Committing.

> File not closed if append fail with DSQuotaExceededException
> 
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch
>
>
>  This was found during YARN log aggregation, but it could theoretically 
> happen to any client.
> If the dir's space quota is exceeded, the following would happen when a file 
> is created:
>  - the client issues a {{startFile}} RPC to the NN and gets a {{DFSOutputStream}}.
>  - writing to the stream triggers the streamer to issue a {{getAdditionalBlock}} 
> RPC to the NN, which fails with the DSQuotaExceededException.
>  - the client closes the stream.
>   
>  The fact that this would leave a 0-sized (or whatever size is left in the 
> quota) file in HDFS is beyond the scope of this jira. However, the file would 
> at least be left in openforwrite status (shown by {{fsck -openforwrite}}), 
> and could potentially leak the leaseRenewer too.
> This is because in the close implementation:
>  # {{isClosed}} is checked first, and the close call is a no-op if 
> {{isClosed == true}}.
>  # {{flushInternal}} checks {{isClosed}} and throws the exception right away 
> if it is true.
> {{isClosed}} does this: {{return closed || getStreamer().streamerClosed;}}
> When the disk quota is reached, {{getAdditionalBlock}} will throw when the 
> streamer calls it. Because the streamer runs in a separate thread, at the time 
> the client calls close on the stream, the streamer may or may not have 
> hit the quota exception yet. If it has, then due to #1, the close call on the 
> stream will be a no-op. If it hasn't, then due to #2 the {{completeFile}} logic 
> will be skipped.
> {code:java}
> protected synchronized void closeImpl() throws IOException {
>   if (isClosed()) {
>     IOException e = lastException.getAndSet(null);
>     if (e == null)
>       return;
>     else
>       throw e;
>   }
>   try {
>     flushBuffer(); // flush from all upper layers
>     ...
>     flushInternal(); // flush all data to Datanodes
>     // get last block before destroying the streamer
>     ExtendedBlock lastBlock = getStreamer().getBlock();
>     try (TraceScope ignored =
>         dfsClient.getTracer().newScope("completeFile")) {
>       completeFile(lastBlock);
>     }
>   } catch (ClosedChannelException ignored) {
>   } finally {
>     closeThreads(true);
>   }
> }
> {code}
> Log snippets:
> {noformat}
> 2018-02-16 15:59:32,916 DEBUG org.apache.hadoop.hdfs.DFSClient: DataStreamer 
> Quota Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /DIR is exceeded: quota = 200 B = 1.91 MB but diskspace consumed = 
> 404139552 B = 385.42 MB
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:149)
> at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:159)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2124)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1991)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1966)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:463)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3896)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3484)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:686)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at 

[jira] [Commented] (HDFS-13055) Aggregate usage statistics from datanodes

2018-02-23 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374927#comment-16374927
 ] 

Ajay Kumar commented on HDFS-13055:
---

v4 patch: rebased on trunk.

> Aggregate usage statistics from datanodes
> -
>
> Key: HDFS-13055
> URL: https://issues.apache.org/jira/browse/HDFS-13055
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13055.001.patch, HDFS-13055.002.patch, 
> HDFS-13055.003.patch, HDFS-13055.004.patch
>
>
> We collect a variety of statistics in DataNodes and expose them via JMX. 
> Aggregating some of the high-level statistics that we already collect in 
> {{DataNodeMetrics}} (like bytesRead, bytesWritten, etc.) over a configurable 
> time window would create a central repository accessible via JMX and the UI.
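One way to aggregate a counter such as `bytesRead` over a configurable time window is to keep per-bucket totals and drop buckets once they fall out of the window. `WindowedSum`, its bucket size, and the explicit timestamps are hypothetical choices for illustration; they are not `DataNodeMetrics` APIs, and the patch may aggregate differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical windowed aggregate for a counter such as bytesRead.
// Timestamps are passed in explicitly and must be non-decreasing.
final class WindowedSum {
  private final long windowMillis;
  private final long bucketMillis;
  private final Deque<long[]> buckets = new ArrayDeque<>(); // {bucketStart, bucketTotal}
  private long total;

  WindowedSum(long windowMillis, long bucketMillis) {
    if (bucketMillis <= 0 || windowMillis < bucketMillis) {
      throw new IllegalArgumentException("need 0 < bucketMillis <= windowMillis");
    }
    this.windowMillis = windowMillis;
    this.bucketMillis = bucketMillis;
  }

  void add(long nowMillis, long value) {
    expire(nowMillis);
    long start = nowMillis - (nowMillis % bucketMillis);
    long[] last = buckets.peekLast();
    if (last != null && last[0] == start) {
      last[1] += value;
    } else {
      buckets.addLast(new long[] {start, value});
    }
    total += value;
  }

  long sum(long nowMillis) {
    expire(nowMillis);
    return total;
  }

  // Drop buckets lying entirely before the window. Granularity is one bucket,
  // so up to one extra bucket of history may still be counted.
  private void expire(long nowMillis) {
    while (!buckets.isEmpty()
        && buckets.peekFirst()[0] + bucketMillis <= nowMillis - windowMillis) {
      total -= buckets.pollFirst()[1];
    }
  }
}
```

Smaller buckets make the window boundary more precise at the cost of more entries to keep per metric.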






[jira] [Updated] (HDFS-13055) Aggregate usage statistics from datanodes

2018-02-23 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13055:
--
Attachment: HDFS-13055.004.patch

> Aggregate usage statistics from datanodes
> -
>
> Key: HDFS-13055
> URL: https://issues.apache.org/jira/browse/HDFS-13055
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13055.001.patch, HDFS-13055.002.patch, 
> HDFS-13055.003.patch, HDFS-13055.004.patch
>
>
> We collect a variety of statistics in DataNodes and expose them via JMX. 
> Aggregating some of the high-level statistics that we already collect in 
> {{DataNodeMetrics}} (like bytesRead, bytesWritten, etc.) over a configurable 
> time window would create a central repository accessible via JMX and the UI.






[jira] [Commented] (HDFS-13055) Aggregate usage statistics from datanodes

2018-02-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374910#comment-16374910
 ] 

genericqa commented on HDFS-13055:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-13055 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13055 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911616/HDFS-13055.003.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23173/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Aggregate usage statistics from datanodes
> -
>
> Key: HDFS-13055
> URL: https://issues.apache.org/jira/browse/HDFS-13055
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13055.001.patch, HDFS-13055.002.patch, 
> HDFS-13055.003.patch
>
>
> We collect a variety of statistics in DataNodes and expose them via JMX. 
> Aggregating some of the high-level statistics that we already collect in 
> {{DataNodeMetrics}} (like bytesRead, bytesWritten, etc.) over a configurable 
> time window would create a central repository accessible via JMX and the UI.






[jira] [Commented] (HDFS-13162) Create Replica Trash directory on DN startup

2018-02-23 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374904#comment-16374904
 ] 

Hanisha Koneru commented on HDFS-13162:
---

Committed to the HDFS-12996 feature branch.
Thanks for the contribution [~bharatviswa] and for the reviews [~nandakumar131].

> Create Replica Trash directory on DN startup
> 
>
> Key: HDFS-13162
> URL: https://issues.apache.org/jira/browse/HDFS-13162
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13162-HDFS-12996.01.patch, 
> HDFS-13162-HDFS-12996.02.patch, HDFS-13162-HDFS-12996.03.patch, 
> HDFS-13162.00-HDFS-12996.00.patch
>
>







[jira] [Commented] (HDFS-13162) Create Replica Trash directory on DN startup

2018-02-23 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374901#comment-16374901
 ] 

Hanisha Koneru commented on HDFS-13162:
---

Thanks Bharat. 
+1 for patch v03. Test failures look unrelated and pass locally.

> Create Replica Trash directory on DN startup
> 
>
> Key: HDFS-13162
> URL: https://issues.apache.org/jira/browse/HDFS-13162
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13162-HDFS-12996.01.patch, 
> HDFS-13162-HDFS-12996.02.patch, HDFS-13162-HDFS-12996.03.patch, 
> HDFS-13162.00-HDFS-12996.00.patch
>
>







[jira] [Commented] (HDFS-13181) DiskBalancer: Add an configuration for valid plan hours

2018-02-23 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374875#comment-16374875
 ] 

Ajay Kumar commented on HDFS-13181:
---

[~bharatviswa], thanks for working on this. Some minor suggestions:
# Instead of restricting the valid interval to hours, it would be better to give 
users the flexibility to specify it in any time unit.
#* Rename dfs.disk.balancer.valid.plan.hours to dfs.disk.balancer.valid.plan.interval.
#* Use {{Configuration#getTimeDuration}} to get the time interval.
# We can simplify the test case a bit using the new LambdaTestUtils. Something like:
{code:java}
LambdaTestUtils.intercept(RemoteException.class, "DiskBalancerException",
    () -> {
      runCommand(cmdLine, hdfsConf, finalMiniCluster);
    });
{code}
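For context on the `getTimeDuration` suggestion: Hadoop's `Configuration#getTimeDuration(String name, long defaultValue, TimeUnit unit)` accepts values with a unit suffix such as `ms`, `s`, `m`, `h` or `d`, and reads a bare number in the given unit. The standalone `DurationParser` below is a hypothetical approximation of that suffix handling so it runs without Hadoop on the classpath; it is not the Hadoop implementation. In the patch the call would look roughly like `conf.getTimeDuration("dfs.disk.balancer.valid.plan.interval", 24, TimeUnit.HOURS)`, with the key taken from the suggestion above and the default mirroring the current 24 hours.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical approximation of suffix-based duration parsing; not Hadoop code.
final class DurationParser {
  private static final Map<String, TimeUnit> SUFFIXES = new LinkedHashMap<>();

  static {
    // Multi-letter suffixes ending in "s" come before "s" so "ms" is not read as seconds.
    SUFFIXES.put("ns", TimeUnit.NANOSECONDS);
    SUFFIXES.put("us", TimeUnit.MICROSECONDS);
    SUFFIXES.put("ms", TimeUnit.MILLISECONDS);
    SUFFIXES.put("s", TimeUnit.SECONDS);
    SUFFIXES.put("m", TimeUnit.MINUTES);
    SUFFIXES.put("h", TimeUnit.HOURS);
    SUFFIXES.put("d", TimeUnit.DAYS);
  }

  // Parse "24h", "90m", or a bare "24" (read in defaultUnit), converted to target.
  static long parse(String raw, TimeUnit defaultUnit, TimeUnit target) {
    String value = raw.trim().toLowerCase(Locale.ROOT);
    for (Map.Entry<String, TimeUnit> entry : SUFFIXES.entrySet()) {
      String suffix = entry.getKey();
      if (value.endsWith(suffix)) {
        String number = value.substring(0, value.length() - suffix.length()).trim();
        return target.convert(Long.parseLong(number), entry.getValue());
      }
    }
    return target.convert(Long.parseLong(value), defaultUnit);
  }
}
```

Note that `TimeUnit.convert` truncates, so "1500ms" read in seconds yields 1.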

> DiskBalancer: Add an configuration for valid plan hours 
> 
>
> Key: HDFS-13181
> URL: https://issues.apache.org/jira/browse/HDFS-13181
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13181.00.patch
>
>
> Add a configuration for valid plan hours instead of the 24 hours currently 
> hard-coded.





