[jira] [Comment Edited] (HDFS-15315) IOException on close() when using Erasure Coding

2020-05-20 Thread Zhao Yi Ming (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112777#comment-17112777
 ] 

Zhao Yi Ming edited comment on HDFS-15315 at 5/21/20, 3:51 AM:
---

Assigning this to me to have a try.


was (Author: zhaoyim):
Assign to me.

> IOException on close() when using Erasure Coding
> 
>
> Key: HDFS-15315
> URL: https://issues.apache.org/jira/browse/HDFS-15315
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1, ec, hdfs
>Affects Versions: 3.1.1
> Environment: XOR-2-1-1024k policy on hadoop 3.1.1 with 3 datanodes
>Reporter: Anshuman Singh
>Assignee: Zhao Yi Ming
>Priority: Major
>
> When using an Erasure Coding policy on a directory, the replication factor is 
> set to 1. Solr fails to index documents with the error _java.io.IOException: 
> Unable to close file because the last block does not have enough number of 
> replicas._ It works fine without EC (with replication factor 3). It seems 
> identical to [ 
> https://issues.apache.org/jira/browse/HDFS-11486|https://issues.apache.org/jira/browse/HDFS-11486]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15315) IOException on close() when using Erasure Coding

2020-05-20 Thread Zhao Yi Ming (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112777#comment-17112777
 ] 

Zhao Yi Ming commented on HDFS-15315:
-

Assign to me.

> IOException on close() when using Erasure Coding
> 
>
> Key: HDFS-15315
> URL: https://issues.apache.org/jira/browse/HDFS-15315
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1, ec, hdfs
>Affects Versions: 3.1.1
> Environment: XOR-2-1-1024k policy on hadoop 3.1.1 with 3 datanodes
>Reporter: Anshuman Singh
>Assignee: Zhao Yi Ming
>Priority: Major
>
> When using an Erasure Coding policy on a directory, the replication factor is 
> set to 1. Solr fails to index documents with the error _java.io.IOException: 
> Unable to close file because the last block does not have enough number of 
> replicas._ It works fine without EC (with replication factor 3). It seems 
> identical to [ 
> https://issues.apache.org/jira/browse/HDFS-11486|https://issues.apache.org/jira/browse/HDFS-11486]






[jira] [Assigned] (HDFS-15315) IOException on close() when using Erasure Coding

2020-05-20 Thread Zhao Yi Ming (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming reassigned HDFS-15315:
---

Assignee: Zhao Yi Ming

> IOException on close() when using Erasure Coding
> 
>
> Key: HDFS-15315
> URL: https://issues.apache.org/jira/browse/HDFS-15315
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1, ec, hdfs
>Affects Versions: 3.1.1
> Environment: XOR-2-1-1024k policy on hadoop 3.1.1 with 3 datanodes
>Reporter: Anshuman Singh
>Assignee: Zhao Yi Ming
>Priority: Major
>
> When using an Erasure Coding policy on a directory, the replication factor is 
> set to 1. Solr fails to index documents with the error _java.io.IOException: 
> Unable to close file because the last block does not have enough number of 
> replicas._ It works fine without EC (with replication factor 3). It seems 
> identical to [ 
> https://issues.apache.org/jira/browse/HDFS-11486|https://issues.apache.org/jira/browse/HDFS-11486]






[jira] [Created] (HDFS-15367) Fail to get file checksum even if there's an available replica.

2020-05-20 Thread YCozy (Jira)
YCozy created HDFS-15367:


 Summary: Fail to get file checksum even if there's an available 
replica.
 Key: HDFS-15367
 URL: https://issues.apache.org/jira/browse/HDFS-15367
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: dfsclient, namenode
Affects Versions: 2.10.0
Reporter: YCozy


DFSClient can fail to get a file checksum even when there's an available 
replica. One possible sequence that triggers the bug is as follows:
 * Start a cluster with three DNs (DN1, DN2, DN3). The default replication 
factor is set to 2.
 * Both DN1 and DN3 register with NN, as can be seen from NN's log (DN1 uses 
port 9866 while DN3 uses port 9666):

{noformat}
2020-05-21 01:24:57,196 INFO org.apache.hadoop.net.NetworkTopology: Adding a 
new node: /default-rack/127.0.0.1:9866
2020-05-21 01:25:06,155 INFO org.apache.hadoop.net.NetworkTopology: Adding a 
new node: /default-rack/127.0.0.1:9666{noformat}
 * DN1 sends block report to NN, as can be seen from NN's log:

{noformat}
2020-05-21 01:24:57,336 INFO BlockStateChange: BLOCK* processReport 
0x3ae7e5805f2e704e: from storage DS-638ee5ae-e435-4d82-ae4f-9066bc7eb850 node 
DatanodeRegistration(127.0.0.1:9866, 
datanodeUuid=b0702574-968f-4817-a660-42ec1c475606, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-75860997-47d0-4957-a4e6-4edbd79d64b8;nsid=49920454;c=1590024277030),
 blocks: 0, hasStaleStorage: false, processing time: 3 msecs, 
invalidatedBlocks: 0{noformat}
 * DN3 fails to send its block report to NN because of a network partition: we 
inject a network partition to fail DN3's blockReport RPC. Accordingly, NN's log 
contains no "processReport" entry for DN3.
 * DFSClient uploads a file. NN chooses DN1 and DN3 to host the replicas. The 
network partition affecting DN3 has ended by then, so the file is uploaded 
successfully. This can be verified in NN's log:

{noformat}
2020-05-21 01:25:13,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
allocate blk_1073741825_1001, replicas=127.0.0.1:9666, 127.0.0.1:9866 for 
/dir1/file1._COPYING_{noformat}
 * Stop DN1, as can be seen from DN1's log:

{noformat}
2020-05-21 01:25:21,114 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
SHUTDOWN_MSG:{noformat}
 * DFSClient tries to get the file checksum. It fails to connect to DN1 and 
gives up. The bug is triggered.

{noformat}
20/05/21 01:25:34 INFO hdfs.DFSClient: Connecting to datanode 127.0.0.1:9866
20/05/21 01:25:34 WARN hdfs.DFSClient: src=/dir1/file1, 
datanodes[0]=DatanodeInfoWithStorage[127.0.0.1:9866,DS-638ee5ae-e435-4d82-ae4f-9066bc7eb850,DISK]
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.hdfs.DFSClient.connectToDN(DFSClient.java:1925)
        at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1798)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1638)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1635)
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1646)
        at 
org.apache.hadoop.fs.shell.Display$Checksum.processPath(Display.java:199)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:327)
        at 
org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:299)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:281)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:265)
        at 
org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:317)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:380)
checksum: Fail to get block MD5 for 
BP-2092781073-172.17.0.4-1590024277030:blk_1073741825_1001{noformat}
Since DN3 also has a replica of the file, DFSClient should try to contact DN3 
to get the checksum.

To verify that DFSClient didn't connect to DN3, we changed the DEBUG log in 
DFSClient.connectToDN() to an INFO log. The error messages above show that 
DFSClient only tries to connect to DN1.
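The fallback the report asks for can be sketched as a minimal Java example. This is an illustration only, not the DFSClient API: `ChecksumFetcher` and the datanode list are hypothetical stand-ins for DFSClient internals.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;

public class ChecksumFallback {
    /** Hypothetical stand-in for a per-datanode checksum RPC; not the real DFSClient API. */
    interface ChecksumFetcher {
        String fetch(String datanode) throws IOException;
    }

    /** Try each replica in turn instead of giving up after the first failed datanode. */
    static String getChecksum(List<String> datanodes, ChecksumFetcher fetcher)
            throws IOException {
        IOException last = null;
        for (String dn : datanodes) {
            try {
                return fetcher.fetch(dn);
            } catch (IOException e) {
                last = e; // e.g. ConnectException from the stopped DN1
            }
        }
        throw last != null ? last : new IOException("no replicas available");
    }

    public static void main(String[] args) throws IOException {
        // DN1 (port 9866) is stopped; DN3 (port 9666) still holds a replica.
        String md5 = getChecksum(List.of("127.0.0.1:9866", "127.0.0.1:9666"), dn -> {
            if (dn.endsWith(":9866")) {
                throw new ConnectException("Connection refused");
            }
            return "md5-from-" + dn;
        });
        System.out.println(md5); // prints md5-from-127.0.0.1:9666
    }
}
```

With this shape, a single dead datanode only costs one failed attempt instead of failing the whole checksum call.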

 





[jira] [Commented] (HDFS-15353) Use sudo instead of su to allow nologin user for secure DataNode

2020-05-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112699#comment-17112699
 ] 

Hudson commented on HDFS-15353:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18283 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18283/])
HDFS-15353. Use sudo instead of su to allow nologin user for secure (github: 
rev 1a3c6bb33b615242506a0313a24527ca51a3d665)
* (edit) hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh


> Use sudo instead of su to allow nologin user for secure DataNode
> 
>
> Key: HDFS-15353
> URL: https://issues.apache.org/jira/browse/HDFS-15353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, security
>Reporter: Akira Ajisaka
>Assignee: Kei Kori
>Priority: Major
> Fix For: 3.4.0
>
>
> When launching a secure DataNode, the su command in hadoop-functions.sh fails 
> if the login shell of the secure user (hdfs) is /sbin/nologin. Can we use 
> sudo instead of su to fix this problem?






[jira] [Updated] (HDFS-15353) Use sudo instead of su to allow nologin user for secure DataNode

2020-05-20 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15353:
-
Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged the PR into trunk. Thank you [~kkori].

> Use sudo instead of su to allow nologin user for secure DataNode
> 
>
> Key: HDFS-15353
> URL: https://issues.apache.org/jira/browse/HDFS-15353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, security
>Reporter: Akira Ajisaka
>Assignee: Kei Kori
>Priority: Major
> Fix For: 3.4.0
>
>
> When launching a secure DataNode, the su command in hadoop-functions.sh fails 
> if the login shell of the secure user (hdfs) is /sbin/nologin. Can we use 
> sudo instead of su to fix this problem?






[jira] [Updated] (HDFS-15353) Use sudo instead of su to allow nologin user for secure DataNode

2020-05-20 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15353:
-
Hadoop Flags: Reviewed
 Summary: Use sudo instead of su to allow nologin user for secure 
DataNode  (was: Allow nologin hdfs user to run secure DataNode)

> Use sudo instead of su to allow nologin user for secure DataNode
> 
>
> Key: HDFS-15353
> URL: https://issues.apache.org/jira/browse/HDFS-15353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, security
>Reporter: Akira Ajisaka
>Assignee: Kei Kori
>Priority: Major
>
> When launching a secure DataNode, the su command in hadoop-functions.sh fails 
> if the login shell of the secure user (hdfs) is /sbin/nologin. Can we use 
> sudo instead of su to fix this problem?






[jira] [Commented] (HDFS-15365) [RBF] findMatching method return wrong result

2020-05-20 Thread liuyanyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112692#comment-17112692
 ] 

liuyanyu commented on HDFS-15365:
-

[~ayushtkn] Sorry, I did not check the open source code, will close this jira.

> [RBF] findMatching method return wrong result
> -
>
> Key: HDFS-15365
> URL: https://issues.apache.org/jira/browse/HDFS-15365
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.1.1
>Reporter: liuyanyu
>Assignee: liuyanyu
>Priority: Major
> Attachments: image-2020-05-20-11-42-12-763.png, 
> image-2020-05-20-11-55-34-115.png
>
>
> A mount table entry /hacluster_root -> hdfs://hacluster/ is set on the 
> cluster, as follows:
> !image-2020-05-20-11-42-12-763.png!
> When I use 
> org.apache.hadoop.hdfs.server.federation.router.FederationUtil#findMatching 
> to resolve the path hdfs://hacluster/yz0516/abc, the result of 
> FederationUtil.findMatching(mountTableEntries.iterator(), /yz0516/abc, 
> hacluster) is /hacluster_root/yz0516/hacluster_root/abc, which is wrong. The 
> correct result should be /hacluster_root/yz0516/abc.
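The expected resolution in the report above is simple prefix mapping. Below is a toy illustration of the intended behavior, not the Router's actual FederationUtil code; `resolve` is a hypothetical helper.

```java
public class FindMatchingDemo {
    // A mount entry /hacluster_root -> hdfs://hacluster/ maps the namespace
    // root, so resolving a namespace path should prepend the mount point once.
    static String resolve(String mountPoint, String nsPath) {
        return mountPoint + nsPath;
    }

    public static void main(String[] args) {
        // Prints /hacluster_root/yz0516/abc; the reported bug instead produced
        // /hacluster_root/yz0516/hacluster_root/abc by re-inserting the prefix.
        System.out.println(resolve("/hacluster_root", "/yz0516/abc"));
    }
}
```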






[jira] [Commented] (HDFS-15250) Setting `dfs.client.use.datanode.hostname` to true can crash the system because of unhandled UnresolvedAddressException

2020-05-20 Thread Andrey Elenskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112594#comment-17112594
 ] 

Andrey Elenskiy commented on HDFS-15250:


We've run into the same issue on 3.1.3 and ended up with 
UnresolvedAddressException propagated all the way to clients (readers and 
writers) even if only one block location could not be resolved. So the entire 
read/write fails if one datanode in the pipeline causes an 
UnresolvedAddressException.

I see the patch doesn't actually handle this exception but just logs it at 
TRACE and rethrows it, so would we expect to see the same problem?

I can also try out the patch on our system, as it's fairly easy to reproduce, 
in case you think this change is enough.
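The type-hierarchy point underlying this bug can be demonstrated in isolation. This is a standalone sketch, not DFSClient code: `dial` is a hypothetical method simulating the two failure modes.

```java
import java.io.IOException;
import java.nio.channels.UnresolvedAddressException;

public class CatchDemo {
    // UnresolvedAddressException extends IllegalArgumentException, not
    // IOException, so a catch (IOException e) clause alone lets it escape.
    static String dial(boolean useHostname) {
        try {
            if (useHostname) {
                throw new UnresolvedAddressException(); // unresolvable hostname
            }
            throw new IOException("Connection refused"); // unreachable IP address
        } catch (IOException e) {
            return "handled IOException";
        } catch (UnresolvedAddressException e) {
            return "handled UnresolvedAddressException";
        }
    }

    public static void main(String[] args) {
        System.out.println(dial(false)); // handled IOException
        System.out.println(dial(true));  // handled UnresolvedAddressException
    }
}
```

Without the second catch clause, `dial(true)` would throw instead of returning, which mirrors the client-side crash described in the issue.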

> Setting `dfs.client.use.datanode.hostname` to true can crash the system 
> because of unhandled UnresolvedAddressException
> ---
>
> Key: HDFS-15250
> URL: https://issues.apache.org/jira/browse/HDFS-15250
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ctest
>Assignee: Ctest
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15250-001.patch, HDFS-15250-002.patch
>
>
> *Problem:*
> `dfs.client.use.datanode.hostname` by default is set to false, which means 
> the client will use the IP address of the datanode to connect to the 
> datanode, rather than the hostname of the datanode.
> In `org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer`:
>  
> {code:java}
>  try {
>    Peer peer = remotePeerFactory.newConnectedPeer(inetSocketAddress, token,
>    datanode);
>    LOG.trace("nextTcpPeer: created newConnectedPeer {}", peer);
>    return new BlockReaderPeer(peer, false);
>  } catch (IOException e) {
>    LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
>    + "{}", datanode);
>    throw e;
>  }
> {code}
>  
> If `dfs.client.use.datanode.hostname` is false, then it will try to connect 
> via IP address. If the IP address is illegal and the connection fails, 
> IOException will be thrown from `newConnectedPeer` and be handled.
> If `dfs.client.use.datanode.hostname` is true, then it will try to connect 
> via hostname. If the hostname cannot be resolved, UnresolvedAddressException 
> will be thrown from `newConnectedPeer`. However, UnresolvedAddressException 
> is not a subclass of IOException so `nextTcpPeer` doesn’t handle this 
> exception at all. This unhandled exception could crash the system.
>  
> *Solution:*
> Since the method already handles an illegal IP address, an illegal hostname 
> should be handled as well. One solution is to add the handling logic in 
> `nextTcpPeer`:
> {code:java}
>  } catch (IOException e) {
>    LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
>    + "{}", datanode);
>    throw e;
>  } catch (UnresolvedAddressException e) {
>    ... // handling logic 
>  }{code}
> I am very happy to provide a patch to do this.






[jira] [Commented] (HDFS-15366) Active NameNode went down with NPE

2020-05-20 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112485#comment-17112485
 ] 

Xiaoqiao He commented on HDFS-15366:


HDFS-12832 could solve this issue. The root cause is accessing inode 
information in the directory tree without holding the global namesystem lock. 
FYI.
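The locking discipline referred to here can be sketched in miniature. This is an analogy, not NameNode code: `LockedPathLookup` and `parentPath` are hypothetical stand-ins for the directory tree and a parent INode reference.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockedPathLookup {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private String parentPath = "/dir1"; // stands in for a parent INode reference

    // Analogue of INode.getFullPathName(): walk parent references only while
    // holding the read lock, so a concurrent delete cannot null them mid-walk
    // and trigger the NPE seen in the stack trace below.
    String fullPathName(String name) {
        fsLock.readLock().lock();
        try {
            return parentPath + "/" + name;
        } finally {
            fsLock.readLock().unlock();
        }
    }
}
```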

> Active NameNode went down with NPE
> --
>
> Key: HDFS-15366
> URL: https://issues.apache.org/jira/browse/HDFS-15366
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Priority: Critical
>
> {code:java}
> 2020-05-12 00:31:54,565 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(3816)) - ReplicationMonitor thread received Runtime 
> exception.
> java.lang.NullPointerException
>  at org.apache.hadoop.hdfs.server.namenode.INode.getParent(INode.java:629)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getRelativePathINodes(FSDirectory.java:1009)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathINodes(FSDirectory.java:1015)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1020)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:591)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:550)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3912)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3875)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1560)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1452)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3847)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3799)
>  at java.lang.Thread.run(Thread.java:748)
> 2020-05-12 00:31:54,567 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2020-05-12 00:31:54,621 INFO namenode.NameNode (LogAdapter.java:info(47)) - 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at xyz.com/xxx
> /{code}






[jira] [Updated] (HDFS-15280) Datanode delay random time to report block if BlockManager is busy

2020-05-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15280:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Datanode delay random time to report block if BlockManager is busy
> --
>
> Key: HDFS-15280
> URL: https://issues.apache.org/jira/browse/HDFS-15280
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15280.001.diff, HDFS-15280.002.patch
>
>
> When many DataNodes are reporting at the same time, the cluster may respond 
> slowly. Limit the number of concurrent reports: if BlockManager is busy, it 
> rejects new requests and the DataNode delays a random amount of time before 
> reporting again.
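The randomized-delay idea proposed above can be sketched as follows. This is an illustration under stated assumptions, not the submitted patch; `maxDelayMs` is a hypothetical knob.

```java
import java.util.concurrent.ThreadLocalRandom;

public class ReportBackoff {
    /**
     * When the NameNode rejects a block report because BlockManager is busy,
     * pick a random delay before retrying so the rejected DataNodes do not
     * all retry at the same instant (avoiding a thundering herd).
     */
    static long nextDelayMs(long maxDelayMs) {
        return ThreadLocalRandom.current().nextLong(maxDelayMs + 1);
    }

    public static void main(String[] args) {
        System.out.println("retry block report in " + nextDelayMs(30_000) + " ms");
    }
}
```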






[jira] [Commented] (HDFS-15280) Datanode delay random time to report block if BlockManager is busy

2020-05-20 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112468#comment-17112468
 ] 

Xiaoqiao He commented on HDFS-15280:


Hi [~hadoop_yangyun], I will close this JIRA since HDFS-7923 could solve your 
case. Please feel free to reopen it, or file a new ticket if you would like to 
improve this feature further.

> Datanode delay random time to report block if BlockManager is busy
> --
>
> Key: HDFS-15280
> URL: https://issues.apache.org/jira/browse/HDFS-15280
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15280.001.diff, HDFS-15280.002.patch
>
>
> When many DataNodes are reporting at the same time, the cluster may respond 
> slowly. Limit the number of concurrent reports: if BlockManager is busy, it 
> rejects new requests and the DataNode delays a random amount of time before 
> reporting again.






[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112463#comment-17112463
 ] 

Xiaoqiao He commented on HDFS-13183:


Thanks [~Jim_Brennan] for the information. I think TestBalancer should not be 
affected even with this feature, because the configuration 
`dfs.ha.allow.stale.reads` is not set to true in TestBalancer, so all the 
logic stays the same. I actually tried replacing `Balancer.run(namenodes, 
BalancerParameters.DEFAULT, conf);` with `Balancer.run(namenodes, nsIds, 
BalancerParameters.DEFAULT, conf);` offline, and all TestBalancer cases pass 
locally.
The new addendum patch just improves the log output.
About the failed unit test 
{{TestBalancerWithHANameNodes.testBalancerWithObserver}}: I ran it many times 
locally without the changes, and it is a low-probability failure there as 
well. I also traced the code execution path and see no difference between the 
patched and unpatched runs. [~xkrogen] would you mind taking another look?

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer  mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch, 
> HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one {{NameNodeRpcServer#getBlocks}} reader and other write operation calls, 
> so the Active NameNode appears dead for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.
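The read-lock contention described in this issue can be reproduced in miniature. This is a self-contained analogy, not NameNode code: a long-held read lock (the getBlocks scan) makes a would-be writer queue up behind it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GetBlocksLockDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        lock.readLock().lock(); // a long-running getBlocks scan holds the read lock
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();   // a write RPC must wait for the scan to end
            lock.writeLock().unlock();
        });
        writer.start();
        Thread.sleep(200); // give the writer thread time to queue up
        System.out.println("writer blocked: " + lock.hasQueuedThreads());
        lock.readLock().unlock(); // scan finishes; the writer proceeds
        writer.join();
    }
}
```

Serving such read-only scans from the Standby, as proposed, removes this queueing from the Active NameNode entirely.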






[jira] [Updated] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-13183:
---
Attachment: (was: HDFS-13183.addendum.patch)

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer  mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch, 
> HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one {{NameNodeRpcServer#getBlocks}} reader and other write operation calls, 
> so the Active NameNode appears dead for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.






[jira] [Updated] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-13183:
---
Attachment: HDFS-13183.addendum.patch

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer  mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch, 
> HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one {{NameNodeRpcServer#getBlocks}} reader and other write operation calls, 
> so the Active NameNode appears dead for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.






[jira] [Updated] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-13183:
---
Attachment: HDFS-13183.addendum.patch

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer  mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch, 
> HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one {{NameNodeRpcServer#getBlocks}} reader and other write operation calls, 
> so the Active NameNode appears dead for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.






[jira] [Updated] (HDFS-15366) Active NameNode went down with NPE

2020-05-20 Thread sarun singla (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sarun singla updated HDFS-15366:

Priority: Critical  (was: Major)

> Active NameNode went down with NPE
> --
>
> Key: HDFS-15366
> URL: https://issues.apache.org/jira/browse/HDFS-15366
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Priority: Critical
>
> {code:java}
> 2020-05-12 00:31:54,565 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(3816)) - ReplicationMonitor thread received Runtime 
> exception.
> java.lang.NullPointerException
>  at org.apache.hadoop.hdfs.server.namenode.INode.getParent(INode.java:629)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getRelativePathINodes(FSDirectory.java:1009)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathINodes(FSDirectory.java:1015)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1020)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:591)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:550)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3912)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3875)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1560)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1452)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3847)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3799)
>  at java.lang.Thread.run(Thread.java:748)
> 2020-05-12 00:31:54,567 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2020-05-12 00:31:54,621 INFO namenode.NameNode (LogAdapter.java:info(47)) - 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at xyz.com/xxx
> /{code}






[jira] [Commented] (HDFS-15366) Active NameNode went down with NPE

2020-05-20 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112422#comment-17112422
 ] 

Wei-Chiu Chuang commented on HDFS-15366:


To add more color:

I dug up this old stack trace from a unit test four years ago. Posting it here
for future reference.
{noformat}
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.INode.getParent(INode.java:660)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.getStoragePolicyID(INodeFile.java:392)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:4011)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$300(BlockManager.java:3976)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1478)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1384)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3947)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3818)
{noformat}

It looks similar but not exactly the same. Notice that computeReplicationWork
holds the namesystem lock but not the fsdirectory lock. My hunch is that a
parallel thread holding the fsdirectory lock, but not the namesystem lock,
deleted the inode at the same time.

It has only occurred once in four years for me, so it looks like a very rare
race condition bug.

> Active NameNode went down with NPE
> --
>
> Key: HDFS-15366
> URL: https://issues.apache.org/jira/browse/HDFS-15366
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Priority: Major
>
> {code:java}
> 2020-05-12 00:31:54,565 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(3816)) - ReplicationMonitor thread received Runtime 
> exception.
> java.lang.NullPointerException
>  at org.apache.hadoop.hdfs.server.namenode.INode.getParent(INode.java:629)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getRelativePathINodes(FSDirectory.java:1009)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathINodes(FSDirectory.java:1015)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1020)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:591)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:550)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3912)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3875)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1560)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1452)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3847)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3799)
>  at java.lang.Thread.run(Thread.java:748)
> 2020-05-12 00:31:54,567 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2020-05-12 00:31:54,621 INFO namenode.NameNode (LogAdapter.java:info(47)) - 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at xyz.com/xxx
> /{code}






[jira] [Commented] (HDFS-15366) Active NameNode went down with NPE

2020-05-20 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112409#comment-17112409
 ] 

Stephen O'Donnell commented on HDFS-15366:
--

Looking at the code where the NPE occurred (in the version this cluster was 
running), it is:

{code}
  /** @return the parent directory */
  public final INodeDirectory getParent() {
    return parent == null ? null
        : parent.isReference() ? getParentReference().getParent()
        : parent.asDirectory(); // NPE on this line
  }

  /**
   * @return the parent as a reference if this is a referred inode;
   * otherwise, return null.
   */
  public INodeReference getParentReference() {
    return parent == null || !parent.isReference() ? null
        : (INodeReference) parent;
  }
{code}

If parent is null, the code handles it, which suggests getParentReference()
must be returning null. Looking at the code, I don't see how it can return
null.

We suspect some sort of race condition here. The problem does not occur
frequently, but there is clearly a problem somewhere.
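One way the race could play out is that the `parent` field is read more than once without a lock. The sketch below is an illustrative stand-alone class, not the real HDFS `INode`; it only demonstrates the double-read hazard and a single-read alternative.

```java
// Hedged sketch of the suspected race: getParent() reads the mutable 'parent'
// field twice -- once in the null check, and again inside
// getParentReference(). If a concurrent delete nulls the field between the
// two reads, getParentReference() returns null and the chained call throws
// an NPE. Reading the field once into a local variable removes the window.
public class ParentRace {
    static class Node {
        volatile Node parent;

        boolean isReference() {
            return false; // simplified; the real INode distinguishes references
        }

        Node getParentReference() {
            // Second read of 'parent': may observe null even though the
            // caller's null check passed a moment earlier.
            return parent == null || !parent.isReference() ? null : parent;
        }

        // Race-prone shape, mirroring the quoted HDFS code.
        Node getParentRacy() {
            return parent == null ? null
                : parent.isReference() ? getParentReference().getParentRacy() // may NPE
                : parent;
        }

        // Safer shape: a single read of the field.
        Node getParentSafe() {
            Node p = parent;
            if (p == null) {
                return null;
            }
            return p.isReference() ? p.getParentSafe() : p;
        }
    }

    public static void main(String[] args) {
        Node dir = new Node();
        Node child = new Node();
        child.parent = dir;
        System.out.println(child.getParentSafe() == dir);  // true
        child.parent = null;
        System.out.println(child.getParentSafe() == null); // true
    }
}
```

Whether the actual fix should be a local-variable snapshot or stricter lock ordering between the namesystem and fsdirectory locks is exactly the open question in this thread.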

> Active NameNode went down with NPE
> --
>
> Key: HDFS-15366
> URL: https://issues.apache.org/jira/browse/HDFS-15366
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Priority: Major
>
> {code:java}
> 2020-05-12 00:31:54,565 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(3816)) - ReplicationMonitor thread received Runtime 
> exception.
> java.lang.NullPointerException
>  at org.apache.hadoop.hdfs.server.namenode.INode.getParent(INode.java:629)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getRelativePathINodes(FSDirectory.java:1009)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathINodes(FSDirectory.java:1015)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1020)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:591)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:550)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3912)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3875)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1560)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1452)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3847)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3799)
>  at java.lang.Thread.run(Thread.java:748)
> 2020-05-12 00:31:54,567 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2020-05-12 00:31:54,621 INFO namenode.NameNode (LogAdapter.java:info(47)) - 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at xyz.com/xxx
> /{code}






[jira] [Created] (HDFS-15366) Active NameNode went down with NPE

2020-05-20 Thread sarun singla (Jira)
sarun singla created HDFS-15366:
---

 Summary: Active NameNode went down with NPE
 Key: HDFS-15366
 URL: https://issues.apache.org/jira/browse/HDFS-15366
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.3
Reporter: sarun singla


{code:java}
2020-05-12 00:31:54,565 ERROR blockmanagement.BlockManager 
(BlockManager.java:run(3816)) - ReplicationMonitor thread received Runtime 
exception.
java.lang.NullPointerException
 at org.apache.hadoop.hdfs.server.namenode.INode.getParent(INode.java:629)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getRelativePathINodes(FSDirectory.java:1009)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathINodes(FSDirectory.java:1015)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1020)
 at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:591)
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:550)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3912)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3875)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1560)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1452)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3847)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3799)
 at java.lang.Thread.run(Thread.java:748)
2020-05-12 00:31:54,567 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - 
Exiting with status 1
2020-05-12 00:31:54,621 INFO namenode.NameNode (LogAdapter.java:info(47)) - 
SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NameNode at xyz.com/xxx
/{code}






[jira] [Commented] (HDFS-15093) RENAME.TO_TRASH is ignored When RENAME.OVERWRITE is specified

2020-05-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112293#comment-17112293
 ] 

Ayush Saxena commented on HDFS-15093:
-

The test failures seem unrelated; I have just tweaked the test in the latest patch.

> RENAME.TO_TRASH is ignored When RENAME.OVERWRITE is specified
> -
>
> Key: HDFS-15093
> URL: https://issues.apache.org/jira/browse/HDFS-15093
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15093-01.patch, HDFS-15093-02.patch, 
> HDFS-15093-03.patch, HDFS-15093-04.patch
>
>
> When Rename Overwrite flag is specified the To_TRASH option gets silently 
> ignored.






[jira] [Commented] (HDFS-15288) Add Available Space Rack Fault Tolerant BPP

2020-05-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112288#comment-17112288
 ] 

Ayush Saxena commented on HDFS-15288:
-

The test failures seem unrelated.
The checkstyle warnings are unavoidable; the config name is a little long in
order to keep the same semantics as {{AvailableSpaceBlockPlacementPolicy}}.

> Add Available Space Rack Fault Tolerant BPP
> ---
>
> Key: HDFS-15288
> URL: https://issues.apache.org/jira/browse/HDFS-15288
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15288-01.patch, HDFS-15288-02.patch, 
> HDFS-15288-03.patch
>
>
> The present {{AvailableSpaceBlockPlacementPolicy}} extends the default block
> placement policy, which makes it apt for replicated files but not very
> efficient for EC files, which by default use
> {{BlockPlacementPolicyRackFaultTolerant}}. So I propose to add a new BPP with
> optimizations similar to ASBPP, while keeping the spread of blocks across the
> maximum number of racks, as in the rack-fault-tolerant BPP.
> This could extend {{BlockPlacementPolicyRackFaultTolerant}} rather than
> {{BlockPlacementPolicyDefault}} as ASBPP does, keeping the other optimization
> logic the same as ASBPP.






[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112277#comment-17112277
 ] 

Jim Brennan commented on HDFS-13183:


Thanks [~hexiaoqiao].  It looks like there are still some failures.
One other note: it's possible TestBalancer did not fail because it uses its own 
copy of doBalance() called runBalancer().  I don't know if it would have failed 
if it was using Balancer.run() instead.  TestBalancerWithNodeGroup uses 
Balancer.run(), which is why it was affected.

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}}
> requests #getBlocks, since querying the blocks of overly full DNs is
> currently extremely inefficient. The main reason is that
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the
> extreme case, all handlers of the Active NameNode RPC server are occupied by
> one {{NameNodeRpcServer#getBlocks}} reader plus other write operation calls,
> so the Active NameNode can appear unresponsive for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could
> speed up balancing and reduce the performance impact on the Active NameNode.






[jira] [Commented] (HDFS-9376) TestSeveralNameNodes fails occasionally

2020-05-20 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112223#comment-17112223
 ] 

Jim Brennan commented on HDFS-9376:
---

Thanks [~iwasakims]!  I figured that was the case.

> TestSeveralNameNodes fails occasionally
> ---
>
> Key: HDFS-9376
> URL: https://issues.apache.org/jira/browse/HDFS-9376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.0.0-alpha1, 2.10.1
>
> Attachments: HDFS-9376.001.patch, HDFS-9376.002.patch
>
>
> TestSeveralNameNodes has been failing in precommit builds.  It usually times 
> out on waiting for the last thread to finish writing.






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-05-20 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Attachment: (was: HDFS-15098.004.patch)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far
> been rejected by ISO. One of the reasons for the rejection has been
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory and
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider"
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. openssl version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-05-20 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
Status: Open  (was: Patch Available)

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but it has so far
> been rejected by ISO. One of the reasons for the rejection has been
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory and
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider"
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. openssl version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112111#comment-17112111
 ] 

Hadoop QA commented on HDFS-13183:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
50s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m  0s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}157m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29341/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13003492/HDFS-13183.addendum.patch
 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 96d76f88501d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / cef07569294 |
| Default 

[jira] [Commented] (HDFS-15364) Support sort the output according to the number of occurrences of the opcode for StatisticsEditsVisitor

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112054#comment-17112054
 ] 

Hadoop QA commented on HDFS-15364:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue}  0m  
0s{color} | {color:blue} markdownlint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
50s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 18 new + 65 unchanged - 4 fixed = 83 total (was 69) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 10s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
|   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29340/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15364 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13003480/HDFS-15364.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle markdownlint |
| uname | Linux fccc63450b7e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-15365) [RBF] findMatching method return wrong result

2020-05-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112036#comment-17112036
 ] 

Ayush Saxena commented on HDFS-15365:
-

[~rain_lyy] This is in our internal code; I didn't contribute it here. :P
So you can close this Jira.

> [RBF] findMatching method return wrong result
> -
>
> Key: HDFS-15365
> URL: https://issues.apache.org/jira/browse/HDFS-15365
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.1.1
>Reporter: liuyanyu
>Assignee: liuyanyu
>Priority: Major
> Attachments: image-2020-05-20-11-42-12-763.png, 
> image-2020-05-20-11-55-34-115.png
>
>
> A mount table /hacluster_root -> hdfs://hacluster/ is set on the cluster, 
> as follows:
> !image-2020-05-20-11-42-12-763.png!
> When I used 
> org.apache.hadoop.hdfs.server.federation.router.FederationUtil#findMatching 
> to look up the path hdfs://hacluster/yz0516/abc, the result of 
> FederationUtil.findMatching(mountTableEntries.iterator(),
> /yz0516/abc, hacluster) was /hacluster_root/yz0516/hacluster_root/abc. This 
> is wrong; the correct result should be /hacluster_root/yz0516/abc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-13183:
---
Attachment: HDFS-13183.addendum.patch
Status: Patch Available  (was: Reopened)

[~weichiu], [~Jim_Brennan] I uploaded an addendum patch to trigger Yetus again. 
PTAL.
The addendum tries to select the standby NameNode when invoking #getBlocks 
rather than creating a NameNodeConnector instance for each iteration. The unit 
test {{TestBalancerWithNodeGroup}} passed locally. Let's see what Yetus says.

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer  mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In 
> extreme cases, all handlers of the Active NameNode RPC server are occupied 
> by one {{NameNodeRpcServer#getBlocks}} reader plus other write operation 
> calls, so the Active NameNode enters a state of false death for seconds or 
> even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.
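The idea can be sketched as a target-selection step: among the NameNodes of a nameservice, prefer a standby for the read-only #getBlocks scan so the active's RPC handlers stay free for write operations. The types and names below are hypothetical simplifications, not the actual NameNodeConnector code:

```java
// Hypothetical sketch of preferring a standby NameNode for the read-only
// getBlocks scan; not the actual HDFS-13183 patch code.
import java.util.List;

public class GetBlocksTargetSketch {
    enum HAState { ACTIVE, STANDBY }

    static class NameNodeInfo {
        final String addr;
        final HAState state;
        NameNodeInfo(String addr, HAState state) {
            this.addr = addr;
            this.state = state;
        }
    }

    /** Pick a standby NameNode if one exists; otherwise fall back. */
    static NameNodeInfo pickGetBlocksTarget(List<NameNodeInfo> nns) {
        return nns.stream()
            .filter(nn -> nn.state == HAState.STANDBY)
            .findFirst()
            .orElse(nns.get(0)); // no standby available: use the first NN
    }

    public static void main(String[] args) {
        List<NameNodeInfo> nns = List.of(
            new NameNodeInfo("nn1:8020", HAState.ACTIVE),
            new NameNodeInfo("nn2:8020", HAState.STANDBY));
        // The standby serves the block scan; the active keeps its handlers.
        System.out.println(pickGetBlocksTarget(nns).addr); // nn2:8020
    }
}
```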



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reopened HDFS-13183:


> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer  mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In 
> extreme cases, all handlers of the Active NameNode RPC server are occupied 
> by one {{NameNodeRpcServer#getBlocks}} reader plus other write operation 
> calls, so the Active NameNode enters a state of false death for seconds or 
> even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111930#comment-17111930
 ] 

Hadoop QA commented on HDFS-15098:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} prototool {color} | {color:blue}  0m  
0s{color} | {color:blue} prototool was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
1s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
30s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
27s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
38s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
49s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 49s{color} | 
{color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} golang {color} | {color:red}  0m 
49s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 49s{color} 
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 19s{color} | {color:orange} root: The patch generated 9 new + 211 unchanged 
- 5 fixed = 220 total (was 216) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
31s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
40s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 19 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 20 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  0m 
40s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
43s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-common in the patch failed. {color} |
| 

[jira] [Comment Edited] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2020-05-20 Thread pengWei Dou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111917#comment-17111917
 ] 

pengWei Dou edited comment on HDFS-12487 at 5/20/20, 8:24 AM:
--

Hi [~liumihust], when I used [^HDFS-12487.003.patch], I found the following 
code in DiskBalancer#getBlockToCopy:
{code:java}
// 
if (block != null) {
...
} else {
}
{code}
 

 So why do a null check in your patch? Can you explain it? Thanks!


was (Author: doudou):
Hi [~liumihust], when I used [^HDFS-12487.003.patch], I found the following 
code in DiskBalancer#getBlockToCopy:
{code:java}
// 
if (block != null) {
...
} else {
}
{code}
 

 So why do a null check in your patch? Can you explain it? Thanks!

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if 
> there are no blocks any more, it returns null up to 
> DiskBalancer.getBlockToCopy(). However, DiskBalancer.getBlockToCopy() 
> only checks whether it is a valid block.
> When I looked into FsDatasetSpi.isValidBlock(), I found that it does not 
> check for a null pointer! In fact, we first need to check whether the block 
> is null, or an exception will occur.
> This bug is hard to find, because the DiskBalancer hardly ever copies all 
> the data of one volume to others. Even when we do copy all the data of one 
> volume to other volumes, by the time the bug occurs the copy process has 
> already finished.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the thread is shut down, which is caused by 
> the bug above.
> The bug can be fixed in two ways:
> 1) Before the call to FsDatasetSpi.isValidBlock(), the caller checks for null
> 2) Check for null inside the implementation of 
> FsDatasetSpi.isValidBlock()
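The two proposed fixes can be sketched together; the interfaces below are hypothetical simplifications, not the real FsDatasetSpi or DiskBalancer code:

```java
// Hypothetical sketch of the two null-check options from the description;
// not the actual FsDatasetSpi/DiskBalancer implementation.
public class NullCheckSketch {
    interface Block {}

    // Option 2: the null check lives inside isValidBlock() itself.
    static boolean isValidBlock(Block b) {
        if (b == null) {   // guard added so callers cannot trigger an NPE
            return false;
        }
        return true;       // placeholder for the real validity checks
    }

    // Option 1: the caller checks for null before invoking isValidBlock().
    static Block getBlockToCopy(Block candidate) {
        if (candidate != null && isValidBlock(candidate)) {
            return candidate;
        }
        return null;       // iterator exhausted: nothing left to copy
    }

    public static void main(String[] args) {
        // Both call sites now tolerate the null returned when the source
        // volume has no blocks left, instead of crashing the copy thread.
        System.out.println(getBlockToCopy(null)); // null
        System.out.println(isValidBlock(null));   // false
    }
}
```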



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2020-05-20 Thread pengWei Dou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111917#comment-17111917
 ] 

pengWei Dou edited comment on HDFS-12487 at 5/20/20, 8:24 AM:
--

Hi [~liumihust], when I used [^HDFS-12487.003.patch], I found the following 
code in DiskBalancer#getBlockToCopy:
{code:java}
// 
if (block != null) {
...
} else {
}
{code}
 

 So why do a null check in your patch? Can you explain it? Thanks!


was (Author: doudou):
Hi [~liumihust], when I used [^HDFS-12487.003.patch], I found the following 
code in DiskBalancer#getBlockToCopy:
{code:java}
// if (block != null) {
...
} else {
}
{code}
 

 So why do a null check in your patch? Can you explain it? Thanks!

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if 
> there are no blocks any more, it returns null up to 
> DiskBalancer.getBlockToCopy(). However, DiskBalancer.getBlockToCopy() 
> only checks whether it is a valid block.
> When I looked into FsDatasetSpi.isValidBlock(), I found that it does not 
> check for a null pointer! In fact, we first need to check whether the block 
> is null, or an exception will occur.
> This bug is hard to find, because the DiskBalancer hardly ever copies all 
> the data of one volume to others. Even when we do copy all the data of one 
> volume to other volumes, by the time the bug occurs the copy process has 
> already finished.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the thread is shut down, which is caused by 
> the bug above.
> The bug can be fixed in two ways:
> 1) Before the call to FsDatasetSpi.isValidBlock(), the caller checks for null
> 2) Check for null inside the implementation of 
> FsDatasetSpi.isValidBlock()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2020-05-20 Thread pengWei Dou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111917#comment-17111917
 ] 

pengWei Dou commented on HDFS-12487:


Hi [~liumihust], when I used [^HDFS-12487.003.patch], I found the following 
code in DiskBalancer#getBlockToCopy:
{code:java}
// if (block != null) {
...
} else {
}
{code}
 

 So why do a null check in your patch? Can you explain it? Thanks!

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if 
> there are no blocks any more, it returns null up to 
> DiskBalancer.getBlockToCopy(). However, DiskBalancer.getBlockToCopy() 
> only checks whether it is a valid block.
> When I looked into FsDatasetSpi.isValidBlock(), I found that it does not 
> check for a null pointer! In fact, we first need to check whether the block 
> is null, or an exception will occur.
> This bug is hard to find, because the DiskBalancer hardly ever copies all 
> the data of one volume to others. Even when we do copy all the data of one 
> volume to other volumes, by the time the bug occurs the copy process has 
> already finished.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the thread is shut down, which is caused by 
> the bug above.
> The bug can be fixed in two ways:
> 1) Before the call to FsDatasetSpi.isValidBlock(), the caller checks for null
> 2) Check for null inside the implementation of 
> FsDatasetSpi.isValidBlock()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2020-05-20 Thread pengWei Dou (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengWei Dou updated HDFS-12487:
---
Comment: was deleted

(was: [#anchor]liumi

 )

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if 
> there are no blocks any more, it returns null up to 
> DiskBalancer.getBlockToCopy(). However, DiskBalancer.getBlockToCopy() 
> only checks whether it is a valid block.
> When I looked into FsDatasetSpi.isValidBlock(), I found that it does not 
> check for a null pointer! In fact, we first need to check whether the block 
> is null, or an exception will occur.
> This bug is hard to find, because the DiskBalancer hardly ever copies all 
> the data of one volume to others. Even when we do copy all the data of one 
> volume to other volumes, by the time the bug occurs the copy process has 
> already finished.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the thread is shut down, which is caused by 
> the bug above.
> The bug can be fixed in two ways:
> 1) Before the call to FsDatasetSpi.isValidBlock(), the caller checks for null
> 2) Check for null inside the implementation of 
> FsDatasetSpi.isValidBlock()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2020-05-20 Thread pengWei Dou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111906#comment-17111906
 ] 

pengWei Dou commented on HDFS-12487:


[#anchor]liumi

 

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if 
> there are no blocks any more, it returns null up to 
> DiskBalancer.getBlockToCopy(). However, DiskBalancer.getBlockToCopy() 
> only checks whether it is a valid block.
> When I looked into FsDatasetSpi.isValidBlock(), I found that it does not 
> check for a null pointer! In fact, we first need to check whether the block 
> is null, or an exception will occur.
> This bug is hard to find, because the DiskBalancer hardly ever copies all 
> the data of one volume to others. Even when we do copy all the data of one 
> volume to other volumes, by the time the bug occurs the copy process has 
> already finished.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the thread is shut down, which is caused by 
> the bug above.
> The bug can be fixed in two ways:
> 1) Before the call to FsDatasetSpi.isValidBlock(), the caller checks for null
> 2) Check for null inside the implementation of 
> FsDatasetSpi.isValidBlock()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15364) Support sort the output according to the number of occurrences of the opcode for StatisticsEditsVisitor

2020-05-20 Thread bianqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bianqi updated HDFS-15364:
--
Status: Patch Available  (was: Open)

> Support sort the output according to the number of occurrences of the opcode 
> for StatisticsEditsVisitor
> ---
>
> Key: HDFS-15364
> URL: https://issues.apache.org/jira/browse/HDFS-15364
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 3.2.1
>Reporter: bianqi
>Assignee: bianqi
>Priority: Minor
> Attachments: HDFS-15364.001.patch, HDFS-15364.002.patch
>
>
>       At present, when we execute `hdfs oev -p stats -i edits -o 
> edits.stats`, the output format is as follows: every opcode is printed 
> once, including those that never occur.
> {quote}VERSION : -65
>  OP_ADD ( 0): 2
>  OP_RENAME_OLD ( 1): 2
>  OP_DELETE ( 2): 0
>  OP_MKDIR ( 3): 5
>  OP_SET_REPLICATION ( 4): 0
>  OP_DATANODE_ADD ( 5): 0
>  OP_DATANODE_REMOVE ( 6): 0
>  OP_SET_PERMISSIONS ( 7): 4
>  OP_SET_OWNER ( 8): 1
>  OP_CLOSE ( 9): 2
>  OP_SET_GENSTAMP_V1 ( 10): 0
>  OP_SET_NS_QUOTA ( 11): 0
>  OP_CLEAR_NS_QUOTA ( 12): 0
>  OP_TIMES ( 13): 0
>  OP_SET_QUOTA ( 14): 0
>  OP_RENAME ( 15): 0
>  OP_CONCAT_DELETE ( 16): 0
>  OP_SYMLINK ( 17): 0
>  OP_GET_DELEGATION_TOKEN ( 18): 0
>  OP_RENEW_DELEGATION_TOKEN ( 19): 0
>  OP_CANCEL_DELEGATION_TOKEN ( 20): 0
>  OP_UPDATE_MASTER_KEY ( 21): 0
>  OP_REASSIGN_LEASE ( 22): 0
>  OP_END_LOG_SEGMENT ( 23): 1
>  OP_START_LOG_SEGMENT ( 24): 1
>  OP_UPDATE_BLOCKS ( 25): 0
>  OP_CREATE_SNAPSHOT ( 26): 0
>  OP_DELETE_SNAPSHOT ( 27): 0
>  OP_RENAME_SNAPSHOT ( 28): 0
>  OP_ALLOW_SNAPSHOT ( 29): 0
>  OP_DISALLOW_SNAPSHOT ( 30): 0
>  OP_SET_GENSTAMP_V2 ( 31): 2
>  OP_ALLOCATE_BLOCK_ID ( 32): 2
>  OP_ADD_BLOCK ( 33): 2
>  OP_ADD_CACHE_DIRECTIVE ( 34): 0
>  OP_REMOVE_CACHE_DIRECTIVE ( 35): 0
>  OP_ADD_CACHE_POOL ( 36): 0
>  OP_MODIFY_CACHE_POOL ( 37): 0
>  OP_REMOVE_CACHE_POOL ( 38): 0
>  OP_MODIFY_CACHE_DIRECTIVE ( 39): 0
>  OP_SET_ACL ( 40): 0
>  OP_ROLLING_UPGRADE_START ( 41): 0
>  OP_ROLLING_UPGRADE_FINALIZE ( 42): 0
>  OP_SET_XATTR ( 43): 0
>  OP_REMOVE_XATTR ( 44): 0
>  OP_SET_STORAGE_POLICY ( 45): 0
>  OP_TRUNCATE ( 46): 0
>  OP_APPEND ( 47): 0
>  OP_SET_QUOTA_BY_STORAGETYPE ( 48): 0
>  OP_ADD_ERASURE_CODING_POLICY ( 49): 0
>  OP_ENABLE_ERASURE_CODING_POLIC ( 50): 0
>  OP_DISABLE_ERASURE_CODING_POLI ( 51): 0
>  OP_REMOVE_ERASURE_CODING_POLIC ( 52): 0
>  OP_INVALID ( -1): 0
> {quote}
>  In general, though, the edits file we parse does not involve all of the 
> operation codes. If every opcode is printed, the output is unfriendly for 
> the cluster administrator to read.
>     We usually only care about which opcodes appear in the edits file, so 
> we can output only the opcodes that appeared and sort them.
> For example, we can execute the following command:
> {quote} hdfs oev -p stats -i edits_0001321-0001344 
> -sort -o edits.stats -v
> {quote}
> The output format is as follows:
> {quote}VERSION : -65
>  OP_MKDIR ( 3): 5
>  OP_SET_PERMISSIONS ( 7): 4
>  OP_ADD ( 0): 2
>  OP_RENAME_OLD ( 1): 2
>  OP_CLOSE ( 9): 2
>  OP_SET_GENSTAMP_V2 ( 31): 2
>  OP_ALLOCATE_BLOCK_ID ( 32): 2
>  OP_ADD_BLOCK ( 33): 2
>  OP_SET_OWNER ( 8): 1
>  OP_END_LOG_SEGMENT ( 23): 1
>  OP_START_LOG_SEGMENT ( 24): 1
> {quote}
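The proposed -sort behavior amounts to dropping zero-count opcodes and ordering the rest by count, descending. A hypothetical standalone sketch (not the actual StatisticsEditsVisitor code):

```java
// Hypothetical sketch of the proposed sorted stats output; not the actual
// StatisticsEditsVisitor implementation.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class OpcodeSortSketch {

    /** Drop unused opcodes and sort the remainder by count, descending. */
    static Map<String, Long> sortByCount(Map<String, Long> stats) {
        return stats.entrySet().stream()
            .filter(e -> e.getValue() > 0)   // hide opcodes that never occur
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue,
                (a, b) -> a,                 // no duplicate keys expected
                LinkedHashMap::new));        // preserve the sorted order
    }

    public static void main(String[] args) {
        Map<String, Long> stats = new LinkedHashMap<>();
        stats.put("OP_ADD", 2L);
        stats.put("OP_DELETE", 0L);
        stats.put("OP_MKDIR", 5L);
        sortByCount(stats).forEach((op, n) -> System.out.println(op + ": " + n));
        // OP_MKDIR: 5
        // OP_ADD: 2
    }
}
```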



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15364) Support sort the output according to the number of occurrences of the opcode for StatisticsEditsVisitor

2020-05-20 Thread bianqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bianqi updated HDFS-15364:
--
Status: Open  (was: Patch Available)

> Support sort the output according to the number of occurrences of the opcode 
> for StatisticsEditsVisitor
> ---
>
> Key: HDFS-15364
> URL: https://issues.apache.org/jira/browse/HDFS-15364
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 3.2.1
>Reporter: bianqi
>Assignee: bianqi
>Priority: Minor
> Attachments: HDFS-15364.001.patch, HDFS-15364.002.patch
>
>
>       At present, when we execute `hdfs oev -p stats -i edits -o 
> edits.stats`, the output format is as follows: every opcode is printed 
> once, including those that never occur.
> {quote}VERSION : -65
>  OP_ADD ( 0): 2
>  OP_RENAME_OLD ( 1): 2
>  OP_DELETE ( 2): 0
>  OP_MKDIR ( 3): 5
>  OP_SET_REPLICATION ( 4): 0
>  OP_DATANODE_ADD ( 5): 0
>  OP_DATANODE_REMOVE ( 6): 0
>  OP_SET_PERMISSIONS ( 7): 4
>  OP_SET_OWNER ( 8): 1
>  OP_CLOSE ( 9): 2
>  OP_SET_GENSTAMP_V1 ( 10): 0
>  OP_SET_NS_QUOTA ( 11): 0
>  OP_CLEAR_NS_QUOTA ( 12): 0
>  OP_TIMES ( 13): 0
>  OP_SET_QUOTA ( 14): 0
>  OP_RENAME ( 15): 0
>  OP_CONCAT_DELETE ( 16): 0
>  OP_SYMLINK ( 17): 0
>  OP_GET_DELEGATION_TOKEN ( 18): 0
>  OP_RENEW_DELEGATION_TOKEN ( 19): 0
>  OP_CANCEL_DELEGATION_TOKEN ( 20): 0
>  OP_UPDATE_MASTER_KEY ( 21): 0
>  OP_REASSIGN_LEASE ( 22): 0
>  OP_END_LOG_SEGMENT ( 23): 1
>  OP_START_LOG_SEGMENT ( 24): 1
>  OP_UPDATE_BLOCKS ( 25): 0
>  OP_CREATE_SNAPSHOT ( 26): 0
>  OP_DELETE_SNAPSHOT ( 27): 0
>  OP_RENAME_SNAPSHOT ( 28): 0
>  OP_ALLOW_SNAPSHOT ( 29): 0
>  OP_DISALLOW_SNAPSHOT ( 30): 0
>  OP_SET_GENSTAMP_V2 ( 31): 2
>  OP_ALLOCATE_BLOCK_ID ( 32): 2
>  OP_ADD_BLOCK ( 33): 2
>  OP_ADD_CACHE_DIRECTIVE ( 34): 0
>  OP_REMOVE_CACHE_DIRECTIVE ( 35): 0
>  OP_ADD_CACHE_POOL ( 36): 0
>  OP_MODIFY_CACHE_POOL ( 37): 0
>  OP_REMOVE_CACHE_POOL ( 38): 0
>  OP_MODIFY_CACHE_DIRECTIVE ( 39): 0
>  OP_SET_ACL ( 40): 0
>  OP_ROLLING_UPGRADE_START ( 41): 0
>  OP_ROLLING_UPGRADE_FINALIZE ( 42): 0
>  OP_SET_XATTR ( 43): 0
>  OP_REMOVE_XATTR ( 44): 0
>  OP_SET_STORAGE_POLICY ( 45): 0
>  OP_TRUNCATE ( 46): 0
>  OP_APPEND ( 47): 0
>  OP_SET_QUOTA_BY_STORAGETYPE ( 48): 0
>  OP_ADD_ERASURE_CODING_POLICY ( 49): 0
>  OP_ENABLE_ERASURE_CODING_POLIC ( 50): 0
>  OP_DISABLE_ERASURE_CODING_POLI ( 51): 0
>  OP_REMOVE_ERASURE_CODING_POLIC ( 52): 0
>  OP_INVALID ( -1): 0
> {quote}
>  In general, though, the edits file we parse does not involve all of the 
> operation codes. If every opcode is printed, the output is unfriendly for 
> the cluster administrator to read.
>     We usually only care about which opcodes appear in the edits file, so 
> we can output only the opcodes that appeared and sort them.
> For example, we can execute the following command:
> {quote} hdfs oev -p stats -i edits_0001321-0001344 
> -sort -o edits.stats -v
> {quote}
> The output format is as follows:
> {quote}VERSION : -65
>  OP_MKDIR ( 3): 5
>  OP_SET_PERMISSIONS ( 7): 4
>  OP_ADD ( 0): 2
>  OP_RENAME_OLD ( 1): 2
>  OP_CLOSE ( 9): 2
>  OP_SET_GENSTAMP_V2 ( 31): 2
>  OP_ALLOCATE_BLOCK_ID ( 32): 2
>  OP_ADD_BLOCK ( 33): 2
>  OP_SET_OWNER ( 8): 1
>  OP_END_LOG_SEGMENT ( 23): 1
>  OP_START_LOG_SEGMENT ( 24): 1
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2020-05-20 Thread pengWei Dou (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengWei Dou updated HDFS-12487:
---
Comment: was deleted

(was: liumi)

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if 
> there are no blocks any more, it returns null up to 
> DiskBalancer.getBlockToCopy(). However, DiskBalancer.getBlockToCopy() 
> only checks whether it is a valid block.
> When I looked into FsDatasetSpi.isValidBlock(), I found that it does not 
> check for a null pointer! In fact, we first need to check whether the block 
> is null, or an exception will occur.
> This bug is hard to find, because the DiskBalancer hardly ever copies all 
> the data of one volume to others. Even when we do copy all the data of one 
> volume to other volumes, by the time the bug occurs the copy process has 
> already finished.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the thread is shut down, which is caused by 
> the bug above.
> The bug can be fixed in two ways:
> 1) Before the call to FsDatasetSpi.isValidBlock(), the caller checks for null
> 2) Check for null inside the implementation of 
> FsDatasetSpi.isValidBlock()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2020-05-20 Thread pengWei Dou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111904#comment-17111904
 ] 

pengWei Dou commented on HDFS-12487:


liumi

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if 
> there are no more blocks, it returns null all the way up to 
> DiskBalancer.getBlockToCopy(), which then checks whether the block is valid.
> When I look into FsDatasetSpi.isValidBlock(), I find that it does not 
> perform a null check. In fact, we first need to check whether the block is 
> null; otherwise an exception occurs.
> This bug is hard to find, because the DiskBalancer rarely copies all the 
> data of one volume to other volumes. Even when it does, the copy process has 
> already finished by the time the bug occurs.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the thread is shut down, which is caused by 
> the bug above.
> The bug can be fixed in two ways:
> 1) Check for null before calling FsDatasetSpi.isValidBlock().
> 2) Check for null inside the implementation of FsDatasetSpi.isValidBlock().
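The two fix options above can be sketched as follows. This is an illustrative Java sketch, not the actual Hadoop patch: the class and method names merely stand in for FsDatasetSpi.isValidBlock() and DiskBalancer.getBlockToCopy(), and blocks are modeled as plain strings.

```java
import java.util.Iterator;

// Illustrative sketch of the two null-check options; not the Hadoop code itself.
class BlockValidator {
    // Option 2: defensive null check inside the validity check,
    // mirroring what FsDatasetSpi.isValidBlock() should do.
    static boolean isValidBlock(String block) {
        if (block == null) {
            return false; // a null block is never valid; avoids the NPE
        }
        return !block.isEmpty();
    }

    // Option 1: the caller (mirroring DiskBalancer.getBlockToCopy())
    // checks for null before delegating to the validity check.
    static String getBlockToCopy(Iterator<String> blocks) {
        while (blocks.hasNext()) {
            String candidate = blocks.next();
            if (candidate != null && isValidBlock(candidate)) {
                return candidate;
            }
        }
        return null; // volume exhausted: no more blocks to copy
    }
}
```

Either check alone prevents the NullPointerException; doing both (as sketched) is harmless and makes the contract explicit at both call sites.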



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15364) Support sort the output according to the number of occurrences of the opcode for StatisticsEditsVisitor

2020-05-20 Thread bianqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bianqi updated HDFS-15364:
--
Attachment: HDFS-15364.002.patch

> Support sort the output according to the number of occurrences of the opcode 
> for StatisticsEditsVisitor
> ---
>
> Key: HDFS-15364
> URL: https://issues.apache.org/jira/browse/HDFS-15364
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 3.2.1
>Reporter: bianqi
>Assignee: bianqi
>Priority: Minor
> Attachments: HDFS-15364.001.patch, HDFS-15364.002.patch
>
>
>       At present, when we execute `hdfs oev -p stats -i edits -o 
> edits.stats`, the output format is as follows: every opcode is listed once, 
> including those that never occur.
> {quote}VERSION : -65
>  OP_ADD ( 0): 2
>  OP_RENAME_OLD ( 1): 2
>  OP_DELETE ( 2): 0
>  OP_MKDIR ( 3): 5
>  OP_SET_REPLICATION ( 4): 0
>  OP_DATANODE_ADD ( 5): 0
>  OP_DATANODE_REMOVE ( 6): 0
>  OP_SET_PERMISSIONS ( 7): 4
>  OP_SET_OWNER ( 8): 1
>  OP_CLOSE ( 9): 2
>  OP_SET_GENSTAMP_V1 ( 10): 0
>  OP_SET_NS_QUOTA ( 11): 0
>  OP_CLEAR_NS_QUOTA ( 12): 0
>  OP_TIMES ( 13): 0
>  OP_SET_QUOTA ( 14): 0
>  OP_RENAME ( 15): 0
>  OP_CONCAT_DELETE ( 16): 0
>  OP_SYMLINK ( 17): 0
>  OP_GET_DELEGATION_TOKEN ( 18): 0
>  OP_RENEW_DELEGATION_TOKEN ( 19): 0
>  OP_CANCEL_DELEGATION_TOKEN ( 20): 0
>  OP_UPDATE_MASTER_KEY ( 21): 0
>  OP_REASSIGN_LEASE ( 22): 0
>  OP_END_LOG_SEGMENT ( 23): 1
>  OP_START_LOG_SEGMENT ( 24): 1
>  OP_UPDATE_BLOCKS ( 25): 0
>  OP_CREATE_SNAPSHOT ( 26): 0
>  OP_DELETE_SNAPSHOT ( 27): 0
>  OP_RENAME_SNAPSHOT ( 28): 0
>  OP_ALLOW_SNAPSHOT ( 29): 0
>  OP_DISALLOW_SNAPSHOT ( 30): 0
>  OP_SET_GENSTAMP_V2 ( 31): 2
>  OP_ALLOCATE_BLOCK_ID ( 32): 2
>  OP_ADD_BLOCK ( 33): 2
>  OP_ADD_CACHE_DIRECTIVE ( 34): 0
>  OP_REMOVE_CACHE_DIRECTIVE ( 35): 0
>  OP_ADD_CACHE_POOL ( 36): 0
>  OP_MODIFY_CACHE_POOL ( 37): 0
>  OP_REMOVE_CACHE_POOL ( 38): 0
>  OP_MODIFY_CACHE_DIRECTIVE ( 39): 0
>  OP_SET_ACL ( 40): 0
>  OP_ROLLING_UPGRADE_START ( 41): 0
>  OP_ROLLING_UPGRADE_FINALIZE ( 42): 0
>  OP_SET_XATTR ( 43): 0
>  OP_REMOVE_XATTR ( 44): 0
>  OP_SET_STORAGE_POLICY ( 45): 0
>  OP_TRUNCATE ( 46): 0
>  OP_APPEND ( 47): 0
>  OP_SET_QUOTA_BY_STORAGETYPE ( 48): 0
>  OP_ADD_ERASURE_CODING_POLICY ( 49): 0
>  OP_ENABLE_ERASURE_CODING_POLIC ( 50): 0
>  OP_DISABLE_ERASURE_CODING_POLI ( 51): 0
>  OP_REMOVE_ERASURE_CODING_POLIC ( 52): 0
>  OP_INVALID ( -1): 0
> {quote}
>  But in general, the edits file we parse does not contain all the operation 
> codes. Listing every opcode makes the output hard for cluster administrators 
> to read.
>     We usually only care about which opcodes actually appear in the edits 
> file, so we can output only those opcodes and sort them by occurrence count.
> For example, we can execute the following command:
> {quote} hdfs oev -p stats -i edits_0001321-0001344 
> -sort -o edits.stats -v
> {quote}
> The output format is as follows:
> {quote}VERSION : -65
>  OP_MKDIR ( 3): 5
>  OP_SET_PERMISSIONS ( 7): 4
>  OP_ADD ( 0): 2
>  OP_RENAME_OLD ( 1): 2
>  OP_CLOSE ( 9): 2
>  OP_SET_GENSTAMP_V2 ( 31): 2
>  OP_ALLOCATE_BLOCK_ID ( 32): 2
>  OP_ADD_BLOCK ( 33): 2
>  OP_SET_OWNER ( 8): 1
>  OP_END_LOG_SEGMENT ( 23): 1
>  OP_START_LOG_SEGMENT ( 24): 1
> {quote}
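The proposed -sort behavior amounts to dropping zero-count opcodes and ordering the rest by descending count, as the sample output above shows. A minimal Java sketch of that ordering (the class name and the plain Map input are illustrative, not the actual StatisticsEditsVisitor code):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of the proposed -sort ordering for opcode statistics.
class OpcodeStatsSorter {
    // Keep only opcodes that actually occurred, sorted by count, descending.
    static List<Map.Entry<String, Long>> sortStats(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 0) // drop opcodes with zero occurrences
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toList());
    }
}
```

With this ordering, the most frequent opcodes (e.g. OP_MKDIR in the example above) come first and never-seen opcodes are omitted entirely.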



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2020-05-20 Thread zZtai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zZtai updated HDFS-15098:
-
  Attachment: HDFS-15098.004.patch
Release Note: 
1. Fixed the cc, checkstyle, and whitespace issues reported on the previous 
patch.
2. Updated the SM4 test case; it now runs successfully.

  was:patch to hadoop trunk branch

  Status: Patch Available  (was: Open)

Based on the previous patch, we fixed the checkstyle issues and updated the test cases.

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. See:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK:
> Place bcprov-ext-jdk15on-165.jar in the $JAVA_HOME/jre/lib/ext directory and 
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requirements:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK
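Step 3 ("Configure Hadoop KMS") is terse; a minimal sketch of the client-side wiring is the standard key-provider property below. The KMS host and port are illustrative placeholders, not values from this patch.

```xml
<!-- core-site.xml (client side): point HDFS clients at the KMS so that
     'hadoop key' and encryption-zone operations can reach it.
     kms.example.com:9600 is a placeholder for your KMS endpoint. -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@kms.example.com:9600/kms</value>
</property>
```

With this in place, the `hadoop key create` and `hdfs crypto -createZone` commands in step 4 resolve keys through the configured KMS.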



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org