[jira] [Updated] (HDFS-16751) WebUI FileSystem explorer could delete wrong file by mistake
[ https://issues.apache.org/jira/browse/HDFS-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-16751: - Summary: WebUI FileSystem explorer could delete wrong file by mistake (was: WebUI FileSystem explorer file Deletion could delete wrong file by mistake) > WebUI FileSystem explorer could delete wrong file by mistake > > > Key: HDFS-16751 > URL: https://issues.apache.org/jira/browse/HDFS-16751 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.1 >Reporter: Walter Su >Priority: Major > Attachments: tmp.png > > > On the FileSystem explorer page, I clicked the 'Delete' icon in order to delete file > A. The result was that file B was deleted. > I found out that the ajax URL string concatenation is wrong, as shown in the image > I attached.
[jira] [Created] (HDFS-16751) WebUI FileSystem explorer file Deletion could delete wrong file by mistake
Walter Su created HDFS-16751: Summary: WebUI FileSystem explorer file Deletion could delete wrong file by mistake Key: HDFS-16751 URL: https://issues.apache.org/jira/browse/HDFS-16751 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.2.1 Reporter: Walter Su Attachments: tmp.png On the FileSystem explorer page, I clicked the 'Delete' icon in order to delete file A. The result was that file B was deleted. I found out that the ajax URL string concatenation is wrong, as shown in the image I attached.
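For illustration of the bug class only: the explorer page builds its delete request URL by concatenating strings, and a mistake in that concatenation can point the request at the wrong path. Below is a minimal, hypothetical Java sketch of safer construction — the helper name, port, and layout are assumptions for illustration, not the actual explorer.js code or the committed fix — joining the directory and file name explicitly and encoding each path segment:

{code}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WebHdfsUrlSketch {
  // Hypothetical helper: join dir + name explicitly and encode each
  // path segment, instead of concatenating raw strings.
  static String buildDeleteUrl(String base, String dir, String name)
      throws Exception {
    String path = (dir.endsWith("/") ? dir : dir + "/") + name;
    StringBuilder encoded = new StringBuilder();
    for (String seg : path.split("/")) {
      if (!seg.isEmpty()) {
        // URLEncoder form-encodes spaces as '+'; convert to %20 for a path.
        encoded.append('/').append(URLEncoder
            .encode(seg, StandardCharsets.UTF_8.name()).replace("+", "%20"));
      }
    }
    return base + "/webhdfs/v1" + encoded + "?op=DELETE";
  }

  public static void main(String[] args) throws Exception {
    // http://nn:9870 and the path below are example values.
    System.out.println(
        buildDeleteUrl("http://nn:9870", "/user/walter", "file A.txt"));
  }
}
{code}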
[jira] [Created] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop
Walter Su created HDFS-16644: Summary: java.io.IOException Invalid token in javax.security.sasl.qop Key: HDFS-16644 URL: https://issues.apache.org/jira/browse/HDFS-16644 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.2.1 Reporter: Walter Su Deployment: server side: a kerberos-enabled cluster with jdk 1.8 and hdfs-server 3.2.1. Client side: I run the command hadoop fs -put on a test file, with a kerberos ticket inited first, using identical core-site.xml & hdfs-site.xml configuration. Using client ver 3.2.1, it succeeds. Using client ver 2.8.5, it succeeds. Using client ver 2.10.1, it fails. The client-side error info is: org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false 2022-06-27 01:06:15,781 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode{data=FSDataset{dirpath='[/mnt/disk1/hdfs, /mnt/***/hdfs, /mnt/***/hdfs, /mnt/***/hdfs]'}, localName='emr-worker-***.***:9866', datanodeUuid='b1c7f64a-6389-4739-bddf-***', xmitsInProgress=0}:Exception transfering block BP-1187699012-10.-***:blk_1119803380_46080919 to mirror 10.*:9866 java.io.IOException: Invalid token in javax.security.sasl.qop: D at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:220) Once any ver 2.10.1 client connects to the hdfs server, the DataNode no longer accepts any client connection; even a ver 3.2.1 client cannot connect. The DataNode rejects every client connection, and within a short time all DataNodes reject client connections. The problem persists even if I replace the DataNode with ver 3.3.0 or replace java with jdk 11. The problem is fixed if I replace the DataNode with ver 3.2.0. I guess the problem is related to HDFS-13541
[jira] [Commented] (HDFS-10383) Safely close resources in DFSTestUtil
[ https://issues.apache.org/jira/browse/HDFS-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284143#comment-15284143 ] Walter Su commented on HDFS-10383: -- bq. IOUtils#cleanup swallows it in the finally block. Great work! And good analysis of {{createStripedFile()}}. We already had {{createStripedFile()}} before {{DFSStripedStream}} was implemented. The test still prints a warning stacktrace because of the secondary {{completeFile()}}. So, although it is not related to this issue, how about changing it together: {code} - out = dfs.create(file, (short) 1); // create an empty file + cluster.getNameNodeRpc() + .create(file.toString(), new FsPermission((short)0755), + dfs.getClient().getClientName(), + new EnumSetWritable<>(EnumSet.of(CreateFlag.CREATE)), + false, (short)1, 128*1024*1024L, null); {code} > Safely close resources in DFSTestUtil > - > > Key: HDFS-10383 > URL: https://issues.apache.org/jira/browse/HDFS-10383 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10383.000.patch, HDFS-10383.001.patch, > HDFS-10383.002.patch > > > There are a few methods in {{DFSTestUtil}} that do not close their resources > safely or elegantly. We can use the try-with-resources statement to address > this problem. > Especially as {{DFSTestUtil}} is widely used in tests, we need to preserve > any exceptions thrown during the processing of the resource while still > guaranteeing it's closed in the end. Take, for example, the current implementation > of {{DFSTestUtil#createFile()}}: it closes the FSDataOutputStream in the > {{finally}} block, and when closing, if the internal > {{DFSOutputStream#close()}} throws any exception, which it often does, the > exception thrown during the processing will be lost. See this [test > failure|https://builds.apache.org/job/PreCommit-HADOOP-Build/9320/testReport/org.apache.hadoop.hdfs/TestAsyncDFSRename/testAggressiveConcurrentAsyncRenameWithOverwrite/], > where we have to guess at the root cause. > Using try-with-resources, we can close the resources safely, and the > exceptions thrown both in processing and closing will be available (the closing > exception will be suppressed).
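To make the suppression behavior concrete, here is a small self-contained Java demo (not DFSTestUtil itself; the class name is made up for illustration). With try-with-resources, the exception from processing stays primary and the exception from close() is attached via getSuppressed(), instead of replacing or swallowing it:

{code}
public class SuppressedCloseDemo {
  static class FlakyStream implements AutoCloseable {
    void write() { throw new RuntimeException("error during processing"); }
    @Override
    public void close() { throw new RuntimeException("error during close"); }
  }

  public static void main(String[] args) {
    try (FlakyStream out = new FlakyStream()) {
      out.write();
    } catch (RuntimeException e) {
      // Primary exception is the processing one; close() failure rides along.
      System.out.println("primary: " + e.getMessage());
      for (Throwable s : e.getSuppressed()) {
        System.out.println("suppressed: " + s.getMessage());
      }
    }
  }
}
{code}

Running it prints the processing error first, with the close error listed as suppressed — exactly the information the linked test failure was missing.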
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275875#comment-15275875 ] Walter Su commented on HDFS-10220: -- The last patch looks pretty good. +1 once the test nits get addressed. Thanks [~ashangit] for the contribution. Thanks [~raviprak] and [~liuml07] for the good advice and review. > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, > HADOOP-10220.006.patch, threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > In the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode takes too much time to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period.
[jira] [Commented] (HDFS-10340) data node suddenly killed
[ https://issues.apache.org/jira/browse/HDFS-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261938#comment-15261938 ] Walter Su commented on HDFS-10340: -- I don't think it's an issue. SIGTERM comes from the outside. The signal was probably emitted by some script, command, or daemon process. > data node suddenly killed > > > Key: HDFS-10340 > URL: https://issues.apache.org/jira/browse/HDFS-10340 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 > Environment: Ubuntu 16.04 LTS, RAM 16g, cpu cores: 8, hdd 100gb, > hadoop 2.6.0 >Reporter: tu nguyen khac >Priority: Critical > > I tried to set up a new data node using Ubuntu 16 > and get it to join an existing Hadoop HDFS cluster (there are 10 nodes in > this cluster and they all run on CentOS 6). > But when I tried to bootstrap this node, after about 10 or 20 minutes I got > these strange errors: > 2016-04-26 20:12:09,394 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: > /10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, > cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: > 225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: > BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: > 15331628 > 2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, > type=LAST_IN_PIPELINE, downstreams=0:[] terminating > 2016-04-26 20:12:25,410 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification > succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829 > 2016-04-26 20:12:25,411 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification > succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832 > 2016-04-26 20:13:18,546 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: > Scheduling blk_1074038502_789829 file > /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502 > for deletion > 2016-04-26 20:13:18,562 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: > Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file > /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502 > 2016-04-26 20:15:46,481 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM > 2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197 > /
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261388#comment-15261388 ] Walter Su commented on HDFS-9958: - bq. I think only DFSClient currently reports storageID. No, it doesn't. {code} //DFSInputStream.java protected void reportCheckSumFailure(CorruptedBlocks corruptedBlocks, int dataNodeCount, boolean isStriped) { ... reportList.add(new LocatedBlock(blk, locs)); } } ... dfsClient.reportChecksumFailure(src, reportList.toArray(new LocatedBlock[reportList.size()])); {code} {{locs}} is actually {{DatanodeInfoWithStorage}}, which has the storageIDs. But the {{LocatedBlock}} constructor is wrong: {code} public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs) { // By default, startOffset is unknown(-1) and corrupt is false. this(b, locs, null, null, -1, false, EMPTY_LOCS); } ... ... public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs, String[] storageIDs, StorageType[] storageTypes, long startOffset, boolean corrupt, DatanodeInfo[] cachedLocs) { ... DatanodeInfoWithStorage storage = new DatanodeInfoWithStorage(di, storageIDs != null ? storageIDs[i] : null, storageTypes != null ? storageTypes[i] : null); this.locs[i] = storage; {code} It loses the storageIDs. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of a > LocatedBlock on a {{machines}} array element that is not populated. > The following is the root cause: > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state NORMAL, which in the > case where the corrupt replica is on a failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch assertion errors; > otherwise it goes ahead and tries to create a LocatedBlock object for an element > that is not put in the {{machines}} array.
> Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at
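A hedged sketch of the kind of fix the comment above points at (a toy model, not the Hadoop source or the committed patch): when the two-argument convenience constructor receives locations that already carry storage information, it could recover the storage IDs instead of passing null:

{code}
// Illustrative model only: recover storage IDs from locations that
// already carry them, instead of dropping them with a null.
public class LocatedBlockSketch {
  static class DatanodeInfo { }
  static class DatanodeInfoWithStorage extends DatanodeInfo {
    final String storageID;
    DatanodeInfoWithStorage(String storageID) { this.storageID = storageID; }
  }

  static String[] storageIDsOf(DatanodeInfo[] locs) {
    String[] ids = new String[locs.length];
    for (int i = 0; i < locs.length; i++) {
      if (locs[i] instanceof DatanodeInfoWithStorage) {
        ids[i] = ((DatanodeInfoWithStorage) locs[i]).storageID;
      }                          // else stays null, as before
    }
    return ids;
  }

  public static void main(String[] args) {
    DatanodeInfo[] locs = { new DatanodeInfoWithStorage("DS-1") };
    System.out.println(storageIDsOf(locs)[0]);  // DS-1 is preserved
  }
}
{code}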
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259918#comment-15259918 ] Walter Su commented on HDFS-9958: - The failed tests are not related. Will commit shortly if there are no further comments. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of a > LocatedBlock on a {{machines}} array element that is not populated. > The following is the root cause: > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state NORMAL, which in the > case where the corrupt replica is on a failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch assertion errors; > otherwise it goes ahead and tries to create a LocatedBlock object for an element > that is not put in the {{machines}} array. > Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code}
[jira] [Commented] (HDFS-5280) Corrupted meta files on data nodes prevent DFSClient from connecting to data nodes and updating corruption status to name node.
[ https://issues.apache.org/jira/browse/HDFS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259490#comment-15259490 ] Walter Su commented on HDFS-5280: - There are other IOExceptions that will cause the readBlock RPC call to fail, and then cause the DN to be marked as dead. We could fix them as well. If I understand correctly, your approach is to use a fake checksum. When the client reads data, the check fails, and the client will mark the block as corrupted instead of marking the DN as dead. I wonder: can we keep the client from reading from this DN in the first place? If the client fails to create the blockreader, it can tell whether the DN is dead or just the block is corrupted. {code} //DFSInputStream.java try { blockReader = getBlockReader(targetBlock, offsetIntoBlock, targetBlock.getBlockSize() - offsetIntoBlock, targetAddr, storageType, chosenNode); if(connectFailedOnce) { DFSClient.LOG.info("Successfully connected to " + targetAddr + " for " + targetBlock.getBlock()); } return chosenNode; } catch (IOException ex) { if (ex instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) { ... } else { ... addToDeadNodes(chosenNode); } } } } {code} Instead of going to the {{else}} clause, can we have another exception like {{InvalidEncryptionKeyException}}? If we catch it, we skip the DN and do not add it to the dead nodes. > Corrupted meta files on data nodes prevent DFSClient from connecting to data > nodes and updating corruption status to name node. > --- > > Key: HDFS-5280 > URL: https://issues.apache.org/jira/browse/HDFS-5280 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 1.1.1, 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.7.2 > Environment: Red hat enterprise 6.4 > Hadoop-2.1.0 >Reporter: Jinghui Wang >Assignee: Andres Perez > Attachments: HDFS-5280.patch > > > Meta files being corrupted makes the DFSClient unable to connect to the > datanodes to access the blocks, so the DFSClient never performs a read on the > block, which is what throws the ChecksumException when file blocks are > corrupted and reports to the namenode to mark the block as corrupt. Since the > client never gets that far, the file status remains healthy, and so do all the > blocks. > To replicate the error, put a file onto HDFS. > Running hadoop fsck /tmp/bogus.csv -files -blocks -location will give the > following output: > FSCK started for path /tmp/bogus.csv at 11:33:29 > /tmp/bogus.csv 109 bytes, 1 block(s): OK > 0. blk_-4255166695856420554_5292 len=109 repl=3 > Find the block/meta files for 4255166695856420554 by running > ssh datanode1.address find /hadoop/ -name "*4255166695856420554*", which will > give the following output: > /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554 > /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta > Now corrupt the meta file by running > ssh datanode1.address "sed -i -e '1i 1234567891' > /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta" > Now running hadoop fs -cat /tmp/bogus.csv > will show the stack trace of the DFSClient failing to connect to the data node > with the corrupted meta file.
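To make the proposal concrete, here is a minimal self-contained sketch of the suggested catch-clause shape. CorruptMetaHeaderException is a hypothetical name, and the sets stand in for the client's real bookkeeping; the only point is that a distinguished exception lets the client skip the bad replica without dead-listing the whole node:

{code}
public class ChooseDataNodeSketch {
  static class CorruptMetaHeaderException extends java.io.IOException {
    CorruptMetaHeaderException(String msg) { super(msg); }
  }

  static void handleReadFailure(java.io.IOException ex, String dn,
      java.util.Set<String> deadNodes, java.util.Set<String> skipped) {
    if (ex instanceof CorruptMetaHeaderException) {
      // The replica (not the node) is bad: skip it for this read only.
      skipped.add(dn);
    } else {
      // Unknown failure: keep the old behavior.
      deadNodes.add(dn);
    }
  }

  public static void main(String[] args) {
    java.util.Set<String> dead = new java.util.HashSet<>();
    java.util.Set<String> skip = new java.util.HashSet<>();
    handleReadFailure(new CorruptMetaHeaderException("bad meta"), "dn1",
        dead, skip);
    System.out.println("dead=" + dead + " skipped=" + skip);
  }
}
{code}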
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259457#comment-15259457 ] Walter Su commented on HDFS-9958: - {code} @@ -1320,11 +1320,22 @@ public void findAndMarkBlockAsCorrupt(final ExtendedBlock blk, +if (storage == null) { + storage = storedBlock.findStorageInfo(node); +} {code} I'm surprised that most of the time {{storageID}} is null. It makes the code above error-prone, because the blk can be added/moved to another healthy storage on the same node. I suppose we should add the storageID to the request message. +1. Re-triggering jenkins. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of a > LocatedBlock on a {{machines}} array element that is not populated. > The following is the root cause: > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state NORMAL, which in the > case where the corrupt replica is on a failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch assertion errors; > otherwise it goes ahead and tries to create a LocatedBlock object for an element > that is not put in the {{machines}} array.
> Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code}
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259384#comment-15259384 ] Walter Su commented on HDFS-10220: -- bq. I think it adds some readability and also because it is used twice. I only took a peek last time. Yeah, I'm ok with that. Another problem came up when I went through the details: {code} while(!sortedLeases.isEmpty() && sortedLeases.peek().expiredHardLimit() && !isMaxLockHoldToReleaseLease(start)) { Lease leaseToCheck = sortedLeases.poll(); ... Collection<Long> files = leaseToCheck.getFiles(); ... for(Long id : leaseINodeIds) { ... } finally { filesLeasesChecked++; if (isMaxLockHoldToReleaseLease(start)) { LOG.debug("Breaking out of checkLeases() after " + filesLeasesChecked + " file leases checked."); break; } } {code} You can't just break out of the inner for-loop; {{leaseToCheck}} has already been polled out of the queue, so this will cause some files to never be closed. > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, > threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > In the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode takes too much time to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period.
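One way to avoid losing the polled lease, sketched as a self-contained toy model (not the actual patch): keep the hold-time check only in the while condition, so a lease is polled only when there is still budget to finish all of its files, and any unpolled lease simply waits for the next monitor cycle:

{code}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class LeaseCheckSketch {
  static final long MAX_LOCK_HOLD_MS = 25;  // assumed budget per run

  static boolean isMaxLockHold(long start) {
    return System.currentTimeMillis() - start > MAX_LOCK_HOLD_MS;
  }

  // Whole leases are processed per iteration; the hold-time check sits
  // only in the while condition, so we never poll a lease we cannot finish.
  static void checkLeases(Queue<List<String>> sortedLeases) {
    long start = System.currentTimeMillis();
    while (!sortedLeases.isEmpty() && !isMaxLockHold(start)) {
      List<String> leaseFiles = sortedLeases.poll();
      for (String file : leaseFiles) {
        System.out.println("closing " + file);  // stand-in for the real work
      }
    }
    // Leases still in the queue are retried on the next monitor cycle.
  }

  public static void main(String[] args) {
    Queue<List<String>> q = new ArrayDeque<>();
    q.add(Arrays.asList("/a", "/b"));
    q.add(Arrays.asList("/c"));
    checkLeases(q);
  }
}
{code}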
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257898#comment-15257898 ] Walter Su commented on HDFS-10220: -- Thanks [~ashangit] for the update. To repeat one of my previous comments: {{isMaxLockHoldToReleaseLease}} doesn't need to be a function. Is it because it's called twice per iteration? I think checking it once per iteration would be enough. > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, > threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > In the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode takes too much time to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255787#comment-15255787 ] Walter Su commented on HDFS-10301: -- bq. BR ids are monotonically increasing. The id values are random initially; if one starts with a large value, could it overflow after a long run? If the DN restarts, the value is randomized again. We should be careful in case the NN rejects all following BRs. If a BR is split into multiple RPCs, there's no interleaving naturally, because the DN gets the ack before it sends the next RPC. Interleaving only exists if the BR is not split. I agree the bug needs to be fixed from the inside; it's just that eliminating interleaving for good is maybe not a bad idea, as it simplifies the problem and is also a simple workaround for this jira. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report, and then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255772#comment-15255772 ] Walter Su commented on HDFS-10301: -- Thank you for your explanation. I learned a lot. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report, and then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253523#comment-15253523 ] Walter Su commented on HDFS-10220: -- I mean, saving administrators the trouble of tuning this. > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > In the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode takes too much time to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period.
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253354#comment-15253354 ] Walter Su commented on HDFS-10220: -- You are right. The only question I have is that I have no idea whether the default value 1000 is the right choice, or whether throttling the rate is the right approach. I kind of hope it works out-of-the-box. Small companies with small clusters have cluster administrators who may not quite understand what the configuration means. bq. Counting the time is better in terms of functionality, but I'm afraid of adding extra computation time to this check compared to a simple count of files. The idea is not to spend more time releasing those leases. What is your feeling about it? I believe the overhead can be ignored. Or we can calculate the elapsed time after processing a small batch. I saw that {{BlockManager.BlockReportProcessingThread}} releases the writeLock if it has held it for more than 4ms. Do you think the same idea works here? > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > In the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode takes too much time to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period.
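A self-contained sketch of the lock-yielding pattern described above, modeled loosely on the BlockReportProcessingThread idea; the 4ms threshold and lock names here are assumptions for illustration. Work proceeds in a loop under the lock, and once the hold time crosses the threshold the lock is dropped and re-acquired so blocked threads can make progress:

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class YieldingLockSketch {
  static final long MAX_HOLD_NANOS = 4_000_000L;  // ~4 ms, assumed threshold
  static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  static void processAll(int items) {
    fsnLock.writeLock().lock();
    long acquired = System.nanoTime();
    try {
      for (int i = 0; i < items; i++) {
        // ... do one unit of work under the lock ...
        if (System.nanoTime() - acquired > MAX_HOLD_NANOS) {
          // Yield: let blocked readers/writers in, then continue.
          fsnLock.writeLock().unlock();
          fsnLock.writeLock().lock();
          acquired = System.nanoTime();
        }
      }
    } finally {
      fsnLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    processAll(1_000_000);
    System.out.println("done");
  }
}
{code}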
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253236#comment-15253236 ] Walter Su commented on HDFS-10301: -- I like your idea of counting storages with the same reportId, and not purging if there's any interleaving. I guess {{rpcsSeen}} can be removed or replaced by {{storagesSeen}}? Processing the retransmitted reports is kind of a waste of resources. I think the best approach is, as Colin said, "to remove existing DataNode storage report RPCs with the old ID from the queue when we receive one with a new block report ID." Let's consider it as an optimization in another jira. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report, and then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253204#comment-15253204 ] Walter Su commented on HDFS-10301: -- The handler threads will wait either way, whether on the queue monitor or on the fsn writeLock. The queue processingThread will contend for the fsn writeLock. In the end, there's no difference. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report, and then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253181#comment-15253181 ] Walter Su commented on HDFS-10301: -- bq. Enabling HDFS-9198 will fifo process BRs. It doesn't solve this implementation bug but virtually eliminates it from occurring. bq. This addresses Daryn's comment rather than solving the reported bug, as BTW Daryn correctly stated. That's incorrect. Please run the test in the 001 patch with and without the fix; you'll see the difference. It does solve the issue, because the bug only exists when the reports are contained in one RPC. If they are split into multiple RPCs, it's not a problem, because the {{rpcsSeen}} guard prevents it from happening. So my approach is to process reports contained in one RPC contiguously, by putting them into the queue atomically. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report, and then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
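A minimal sketch of the enqueue-atomically approach described above, with simplified stand-in types (this is a model of the idea, not the NameNode code): all storage reports from one RPC go into the queue as a single element, so the processing side always handles them contiguously and a retransmitted report cannot interleave:

{code}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AtomicBrQueueSketch {
  static class StorageReport {
    final long brId;
    final String storageId;
    StorageReport(long brId, String storageId) {
      this.brId = brId;
      this.storageId = storageId;
    }
    @Override
    public String toString() { return storageId + "@" + brId; }
  }

  // One queue element per RPC: an RPC's storage reports stay together.
  static final BlockingQueue<List<StorageReport>> QUEUE =
      new LinkedBlockingQueue<>();

  public static void main(String[] args) throws InterruptedException {
    // Two RPCs (e.g. an original BR and a retransmission) enqueue atomically.
    QUEUE.put(Arrays.asList(new StorageReport(1, "s1"),
                            new StorageReport(1, "s2")));
    QUEUE.put(Arrays.asList(new StorageReport(2, "s1"),
                            new StorageReport(2, "s2")));
    List<StorageReport> batch;
    while ((batch = QUEUE.poll()) != null) {
      for (StorageReport r : batch) {      // processed contiguously per RPC
        System.out.println("processing " + r);
      }
    }
  }
}
{code}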
[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10301: - Assignee: (was: Walter Su) > Blocks removed by thousands due to falsely detected zombie storages > --- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report, and then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251150#comment-15251150 ] Walter Su commented on HDFS-10220: -- 1. isMaxFilesCheckedToReleaseLease is not required to be a function. 2. To repeat what [~vinayrpet] said, removeFilesInLease(leaseToCheck, removing); may not be required. 3. The LOG.warn("..") is kind of verbose. 4. I think the config should stay internal; it's an implementation detail. The re-check interval is 2s and is hard-coded too. Besides, it's too complicated for a user to pick the right value. Instead of counting the files, I prefer counting the time: if it holds the lock for too long, log a warning and break out for a while. 5. btw, HDFS-9311 should solve this issue. > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > In the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode takes too much time to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period.
[jira] [Commented] (HDFS-5280) Corrupted meta files on data nodes prevent DFSClient from connecting to data nodes and updating corruption status to name node.
[ https://issues.apache.org/jira/browse/HDFS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249649#comment-15249649 ] Walter Su commented on HDFS-5280: - +1 for catching the exception. The same exception will cause {{BlockScanner}} to shut down. We should be cautious about catching any {{RuntimeException}}. Instead of adding a {{catch}} to the outer try-finally clause, how about just catching the exact exception at the place where it's thrown, like what we did in {{FSNamesystem.java}}: {code} try { checksumType = DataChecksum.Type.valueOf(checksumTypeStr); } catch (IllegalArgumentException iae) { throw new IOException("Invalid checksum type in " + DFS_CHECKSUM_TYPE_KEY + ": " + checksumTypeStr); } {code} > Corrupted meta files on data nodes prevent DFSClient from connecting to data > nodes and updating corruption status to name node. > --- > > Key: HDFS-5280 > URL: https://issues.apache.org/jira/browse/HDFS-5280 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 1.1.1, 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.7.2 > Environment: Red hat enterprise 6.4 > Hadoop-2.1.0 >Reporter: Jinghui Wang >Assignee: Andres Perez > Attachments: HDFS-5280.patch > > > Meta files being corrupted makes the DFSClient unable to connect to the > datanodes to access the blocks, so the DFSClient never performs a read on the > block, which is what throws the ChecksumException when file blocks are > corrupted and reports to the namenode to mark the block as corrupt. Since the > client never gets that far, the file status remains healthy, and so do all the > blocks. > To replicate the error, put a file onto HDFS. > Running hadoop fsck /tmp/bogus.csv -files -blocks -location will give the > following output: > FSCK started for path /tmp/bogus.csv at 11:33:29 > /tmp/bogus.csv 109 bytes, 1 block(s): OK > 0. blk_-4255166695856420554_5292 len=109 repl=3 > Find the block/meta files for 4255166695856420554 by running > ssh datanode1.address find /hadoop/ -name "*4255166695856420554*", which will > give the following output: > /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554 > /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta > Now corrupt the meta file by running > ssh datanode1.address "sed -i -e '1i 1234567891' > /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta" > Now running hadoop fs -cat /tmp/bogus.csv > will show the stack trace of the DFSClient failing to connect to the data node > with the corrupted meta file.
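A self-contained sketch of the suggested shape (names are illustrative, not the DataNode source): catch the narrow RuntimeException exactly where the corrupt header is parsed and rethrow it as a checked IOException, mirroring the FSNamesystem pattern quoted above, so a caller like the block scanner sees an ordinary I/O failure instead of shutting down:

{code}
import java.io.IOException;

public class MetaHeaderParseSketch {
  enum ChecksumType { NULL, CRC32, CRC32C }

  static ChecksumType parseChecksumType(String raw) throws IOException {
    try {
      return ChecksumType.valueOf(raw);
    } catch (IllegalArgumentException iae) {
      // Catch exactly where it is thrown and convert to a checked
      // IOException that callers already know how to handle.
      throw new IOException("Invalid checksum type in meta header: " + raw);
    }
  }

  public static void main(String[] args) {
    try {
      parseChecksumType("1234567891");   // a corrupted header value
    } catch (IOException e) {
      System.out.println("handled: " + e.getMessage());
    }
  }
}
{code}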
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249312#comment-15249312 ] Walter Su commented on HDFS-9958: - bq. we fix countNodes().corruptReplicas() to return the number after going through all storages (irrespective of their state) that have the corruptNodes (in this case), since numNodes() is storage state agnostic. I think having {{countNodes(blk)}} go through all storages is unnecessary. Also, I think {{numMachines}} should only include NORMAL and READ_ONLY, so having {{createLocatedBlock(..)}} go through all storages is unnecessary. {code} if (numMachines > 0) { for(DatanodeStorageInfo storage : blocksMap.getStorages(blk)) { {code} btw, though it is not related to this topic, I think {{findAndMarkBlockAsCorrupt(..)}} shouldn't support adding the blk to the map if the storage is not found. Pinging [~jingzhao] to check if he has any comments. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of a > LocatedBlock on a {{machines}} array element that is not populated. > The following is the root cause: > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state NORMAL, which in the > case where the corrupt replica is on a failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch assertion errors; > otherwise it goes ahead and tries to create a LocatedBlock object for an element > that is not put in the {{machines}} array.
> Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code}
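To illustrate the counting rule being argued for above (a toy model, not BlockManager): derive numMachines from the same storage-state filter everywhere, e.g. counting only NORMAL and READ_ONLY storages, so it can never exceed what countNodes() would report:

{code}
import java.util.Arrays;
import java.util.List;

public class StorageCountSketch {
  enum State { NORMAL, READ_ONLY, FAILED }

  // Count only storages a reader could actually use.
  static long numMachines(List<State> storages) {
    return storages.stream()
        .filter(s -> s == State.NORMAL || s == State.READ_ONLY)
        .count();
  }

  public static void main(String[] args) {
    List<State> storages =
        Arrays.asList(State.NORMAL, State.FAILED, State.READ_ONLY);
    System.out.println(numMachines(storages));  // 2: FAILED is excluded
  }
}
{code}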
[jira] [Created] (HDFS-10316) revisit corrupt replicas count
Walter Su created HDFS-10316: Summary: revisit corrupt replicas count Key: HDFS-10316 URL: https://issues.apache.org/jira/browse/HDFS-10316 Project: Hadoop HDFS Issue Type: Bug Reporter: Walter Su A DN has 4 types of storages: 1. NORMAL, 2. READ_ONLY, 3. FAILED, 4. (missing/pruned). blocksMap.numNodes(blk) counts 1, 2, 3. blocksMap.getStorages(blk) counts 1, 2, 3. countNodes(blk).corruptReplicas() counts 1, 2. corruptReplicas counts 1, 2, 3, 4, because findAndMarkBlockAsCorrupt(..) supports adding the blk to the map even if the storage is not found. The inconsistency causes bugs like HDFS-9958.
[jira] [Updated] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
[ https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9744: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8. Thanks [~templedf] for the review. Thanks [~jojochuang] for the report. And thanks [~linyiqun] for the contribution! > TestDirectoryScanner#testThrottling occasionally time out after 300 seconds > --- > > Key: HDFS-9744 > URL: https://issues.apache.org/jira/browse/HDFS-9744 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Lin Yiqun >Priority: Minor > Labels: test > Fix For: 2.8.0 > > Attachments: HDFS-9744.001.patch > > > I have seen quite a few test failures in TestDirectoryScanner#testThrottling. > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/ > Looking at the log, it does not look like the test got stuck. On my local > machine, this test took 219 seconds. It is likely that this test takes more > than 300 seconds to complete on a busy jenkins slave. I think it is > reasonable to set a longer timeout value, or reduce the number of blocks to > reduce the duration of the test. > Error Message > {noformat} > test timed out after 300000 milliseconds > {noformat} > Stacktrace > {noformat} > java.lang.Exception: test timed out after 300000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at > org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804) > at > org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423) > at > org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432) > at > org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418) > at > org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217) > at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125) > at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376) > at > org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108) > at > org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584) > {noformat}
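For reference, the first remedy mentioned in the description is a one-line change; a hedged JUnit 4 sketch follows, where the 600-second figure is an example value rather than the committed one (the actual fix may instead reduce the number of blocks created):

{code}
import org.junit.Test;

public class TimeoutSketch {
  // Example only: raise the per-test timeout from 300s to, say, 600s.
  @Test(timeout = 600000)
  public void testThrottling() throws Exception {
    // ... original test body; alternatively create fewer/smaller files
    // so the test finishes well inside the timeout ...
  }
}
{code}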
[jira] [Commented] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
[ https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247547#comment-15247547 ] Walter Su commented on HDFS-9744: - +1. will commit shortly. > TestDirectoryScanner#testThrottling occasionally time out after 300 seconds > --- > > Key: HDFS-9744 > URL: https://issues.apache.org/jira/browse/HDFS-9744 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: Jenkins >Reporter: Wei-Chiu Chuang >Assignee: Lin Yiqun >Priority: Minor > Labels: test > Attachments: HDFS-9744.001.patch > > > I have seen quite a few test failures in TestDirectoryScanner#testThrottling. > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/ > Looking at the log, it does not look like the test got stuck. On my local > machine, this test took 219 seconds. It is likely that this test takes more > than 300 seconds to complete on a busy Jenkins slave. I think it is > reasonable to set a longer timeout value, or reduce the number of blocks to > reduce the duration of the test. > Error Message > {noformat} > test timed out after 300000 milliseconds > {noformat} > Stacktrace > {noformat} > java.lang.Exception: test timed out after 300000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at > org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804) > at > org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423) > at > org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432) > at > org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418) > at > org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217) > at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125) > at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376) > at > org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108) > at > org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10284: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2. Thanks [~vinayrpet], [~brahmareddy] for the review, and thanks [~liuml07] for the contribution! > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, > HDFS-10284.002.patch, HDFS-10284.003.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247468#comment-15247468 ] Walter Su commented on HDFS-10284: -- +1. will commit shortly. > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, > HDFS-10284.002.patch, HDFS-10284.003.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing
[ https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247445#comment-15247445 ] Walter Su commented on HDFS-10291: -- cherry-picked to trunk. > TestShortCircuitLocalRead failing > - > > Key: HDFS-10291 > URL: https://issues.apache.org/jira/browse/HDFS-10291 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 2.8.0 > > Attachments: HDFS-10291-001.patch > > > {{TestShortCircuitLocalRead}} failing as length of read is considered off end > of buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247432#comment-15247432 ] Walter Su commented on HDFS-9958: - Thanks [~kshukla] for the update. I've noticed {{testArrayOutOfBoundsException()}} failed. It tries to simulate {{DatanodeProtocol#reportBadBlocks(..)}} from the 3rd DN. But "TEST" is not a real storageID, so the block isn't added to blocksMap. A fix is to get the real storageID from the 3rd DN. Could you re-post a patch to fix this? > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of > LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state NORMAL, which in the > case where the corrupt replica is on a failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch Assertion errors; > otherwise it goes ahead and tries to create a LocatedBlock object for an > element that is never put in the {{machines}} array.
> Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
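A minimal sketch of the race described in the report (illustrative only, not the real BlockManager code): with the corrupt replica on a FAILED storage, the NORMAL-only corrupt count comes out zero, the machines array is sized too large, and one slot is never filled — the null that feeds the NPE above.
{code}
import java.util.Arrays;

// Illustrative sketch, not BlockManager itself: shows how a machines[]
// slot stays null when the corrupt replica lives on a FAILED storage.
public class CreateLocatedBlockSketch {
  static class Storage {
    final String id; final boolean failed; final boolean corrupt;
    Storage(String id, boolean failed, boolean corrupt) {
      this.id = id; this.failed = failed; this.corrupt = corrupt;
    }
    public String toString() { return id; }
  }

  public static void main(String[] args) {
    Storage[] storages = {
        new Storage("s0", false, false),
        new Storage("s1", true, true)  // corrupt replica on a failed storage
    };

    // countNodes(blk).corruptReplicas(): NORMAL storages only -> 0 here.
    int numCorruptNodes = 0;
    for (Storage s : storages) {
      if (!s.failed && s.corrupt) numCorruptNodes++;
    }
    // blocksMap.numNodes(blk): every storage regardless of state -> 2 here.
    int numNodes = storages.length;
    int numMachines = numNodes - numCorruptNodes;  // 2, but only 1 is usable

    Storage[] machines = new Storage[numMachines];
    int j = 0;
    for (Storage s : storages) {
      if (!s.corrupt) {          // corrupt replicas are never added
        machines[j++] = s;
      }
    }
    // machines[1] is never filled; dereferencing it later is the NPE.
    System.out.println(Arrays.toString(machines));  // [s0, null]
  }
}
{code}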
[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10301: - Assignee: Walter Su Status: Patch Available (was: Open) Uploaded a patch. Kindly review. > Blocks removed by thousands due to falsely detected zombie storages > --- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Walter Su >Priority: Critical > Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10301: - Attachment: HDFS-10301.01.patch > Blocks removed by thousands due to falsely detected zombie storages > --- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247129#comment-15247129 ] Walter Su commented on HDFS-10301: - Oh, I see. In this case, the reports are not split. And because the for-loop is outside the lock, the two for-loops interleave. {code} for (int r = 0; r < reports.length; r++) { {code} > Blocks removed by thousands due to falsely detected zombie storages > --- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: zombieStorageLogs.rtf > > > When the NameNode is busy a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
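A rough model of the interleaving mentioned above, assuming each report-processing handler takes the namesystem lock per storage rather than for the whole report (all names illustrative, not the actual BlockManager code):
{code}
import java.util.concurrent.locks.ReentrantLock;

// Rough model: each handler loops over its report's storages and locks
// per storage, so storages from the original report and its retry can
// interleave, confusing per-report blockReportId bookkeeping.
public class InterleavedReportsSketch {
  static final ReentrantLock fsnLock = new ReentrantLock();

  static void processReport(String reportId, int numStorages) {
    for (int r = 0; r < numStorages; r++) {   // loop is OUTSIDE the lock
      fsnLock.lock();
      try {
        System.out.println(Thread.currentThread().getName()
            + " processes storage " + r + " of report " + reportId);
      } finally {
        fsnLock.unlock();
      }
      // Between iterations the other handler can grab the lock.
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread a = new Thread(() -> processReport("BR-1", 3), "handler-1");
    Thread b = new Thread(() -> processReport("BR-1-retry", 3), "handler-2");
    a.start(); b.start();
    a.join(); b.join();
    // The printed order typically interleaves the two reports' storages.
  }
}
{code}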
[jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError from DataTransfer thread.
[ https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247037#comment-15247037 ] Walter Su commented on HDFS-9684: - My previous comment is incorrect. It turns out that the MR tasks swallowed all the virtual memory. > DataNode stopped sending heartbeat after getting OutOfMemoryError from > DataTransfer thread. > --- > > Key: HDFS-9684 > URL: https://issues.apache.org/jira/browse/HDFS-9684 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Blocker > Attachments: HDFS-9684.01.patch > > > {noformat} > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:714) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246996#comment-15246996 ] Walter Su commented on HDFS-10301: -
1. The IPC reader is single-threaded by default. If it's multi-threaded, the order of putting rpc requests into {{callQueue}} is unspecified.
2. The IPC {{callQueue}} is FIFO.
3. IPC Handlers are multi-threaded. If 2 handlers are both waiting on the fsn lock, the entry order depends on the fairness of the lock.
bq. When constructed as fair, threads contend for entry using an *approximately* arrival-order policy. When the currently held lock is released either the longest-waiting single writer thread will be assigned the write lock... (quote from https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html)
I think if the DN can't get an ack from the NN, it shouldn't assume the arrival/processing order (esp. when re-establishing a connection). Well, I'm still curious about how the interleave happened. Any thoughts? > Blocks removed by thousands due to falsely detected zombie storages > --- > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: zombieStorageLogs.rtf > > > When the NameNode is busy a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
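A small demonstration of the fairness point quoted above: a {{ReentrantReadWriteLock}} constructed as fair grants the lock in only *approximate* arrival order, so callers still cannot rely on strict ordering (the handler naming is illustrative):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Fair mode (constructor argument true) gives approximately arrival-order
// entry -- approximately, so two handlers blocked on the write lock may
// still acquire it in an order that differs from RPC arrival order.
public class FairLockSketch {
  public static void main(String[] args) {
    ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock(true); // fair

    Runnable handler = () -> {
      fsnLock.writeLock().lock();
      try {
        System.out.println(Thread.currentThread().getName() + " got write lock");
      } finally {
        fsnLock.writeLock().unlock();
      }
    };

    for (int i = 0; i < 4; i++) {
      new Thread(handler, "handler-" + i).start();
    }
    // Even in fair mode, the printed order only approximates the order in
    // which the handler threads called lock().
  }
}
{code}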
[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10275: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8, branch-2.7. Thanks [~linyiqun] for the contribution! > TestDataNodeMetrics failing intermittently due to TotalWriteTime counted > incorrectly > > > Key: HDFS-10275 > URL: https://issues.apache.org/jira/browse/HDFS-10275 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.7.3 > > Attachments: HDFS-10275.001.patch > > > The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info > shows: > {code} > Results : > Failed tests: > > TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 > expected: but was: > Tests in error: > TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for > Min... > TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting > for ... > TestHFlush.testHFlushInterrupted ? IO The stream is closed > {code} > The timeout takes place at line 279 in {{TestDataNodeMetrics}}. I looked into > the code and found the real reason is that the {{TotalWriteTime}} metric > frequently counts 0 in each iteration of creating a file, and this leads to > retrying the operation until timeout. > I debugged the test locally. The most likely reason the {{TotalWriteTime}} > metric always counts 0 is that we use the {{SimulatedFSDataset}} for the > time-spent test. In {{SimulatedFSDataset}}, the inner class's method > {{SimulatedOutputStream#write}} is used to count the write time, and this > method just updates the {{length}} and throws its data away. > {code} > @Override > public void write(byte[] b, > int off, > int len) throws IOException { > length += len; > } > {code} > So the write operation hardly costs any time. We should use a real way to > create the file instead of the simulated way. I have verified locally that > the test passes in one attempt when I remove the simulated way, while in the > old way the test retries many times to count the write time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245569#comment-15245569 ] Walter Su commented on HDFS-10275: - Sorry, I didn't see that. The patch LGTM. +1. > TestDataNodeMetrics failing intermittently due to TotalWriteTime counted > incorrectly > > > Key: HDFS-10275 > URL: https://issues.apache.org/jira/browse/HDFS-10275 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-10275.001.patch > > > The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info > shows: > {code} > Results : > Failed tests: > > TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 > expected: but was: > Tests in error: > TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for > Min... > TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting > for ... > TestHFlush.testHFlushInterrupted ? IO The stream is closed > {code} > The timeout takes place at line 279 in {{TestDataNodeMetrics}}. I looked into > the code and found the real reason is that the {{TotalWriteTime}} metric > frequently counts 0 in each iteration of creating a file, and this leads to > retrying the operation until timeout. > I debugged the test locally. The most likely reason the {{TotalWriteTime}} > metric always counts 0 is that we use the {{SimulatedFSDataset}} for the > time-spent test. In {{SimulatedFSDataset}}, the inner class's method > {{SimulatedOutputStream#write}} is used to count the write time, and this > method just updates the {{length}} and throws its data away. > {code} > @Override > public void write(byte[] b, > int off, > int len) throws IOException { > length += len; > } > {code} > So the write operation hardly costs any time. We should use a real way to > create the file instead of the simulated way. I have verified locally that > the test passes in one attempt when I remove the simulated way, while in the > old way the test retries many times to count the write time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
[ https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245394#comment-15245394 ] Walter Su commented on HDFS-10275: - Good analysis! I think a better way to do this is to use a real FSDataset: just remove {{SimulatedFSDataset.setFactory(conf);}}. What do you think? > TestDataNodeMetrics failing intermittently due to TotalWriteTime counted > incorrectly > > > Key: HDFS-10275 > URL: https://issues.apache.org/jira/browse/HDFS-10275 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-10275.001.patch > > > The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info > shows: > {code} > Results : > Failed tests: > > TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 > expected: but was: > Tests in error: > TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for > Min... > TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting > for ... > TestHFlush.testHFlushInterrupted ? IO The stream is closed > {code} > The timeout takes place at line 279 in {{TestDataNodeMetrics}}. I looked into > the code and found the real reason is that the {{TotalWriteTime}} metric > frequently counts 0 in each iteration of creating a file, and this leads to > retrying the operation until timeout. > I debugged the test locally. The most likely reason the {{TotalWriteTime}} > metric always counts 0 is that we use the {{SimulatedFSDataset}} for the > time-spent test. In {{SimulatedFSDataset}}, the inner class's method > {{SimulatedOutputStream#write}} is used to count the write time, and this > method just updates the {{length}} and throws its data away. > {code} > @Override > public void write(byte[] b, > int off, > int len) throws IOException { > length += len; > } > {code} > So the write operation hardly costs any time. We should use a real way to > create the file instead of the simulated way. I have verified locally that > the test passes in one attempt when I remove the simulated way, while in the > old way the test retries many times to count the write time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
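A sketch of the setup change being suggested (illustrative, not the committed patch; only {{SimulatedFSDataset.setFactory(conf)}} is taken from the comment above, the surrounding harness is an assumption): drop the simulated dataset so writes hit real storage and take measurable time.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Illustrative harness: without the simulated dataset factory, writes go
// through a real FsDataset, so TotalWriteTime accumulates non-zero values.
public class RealDatasetWriteTimeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Before: SimulatedFSDataset.setFactory(conf);  // writes cost ~0 time
    // After: omit the line above to use the real dataset.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
    try {
      FileSystem fs = cluster.getFileSystem();
      // Real disk writes now contribute to the DataNode's TotalWriteTime.
      DFSTestUtil.createFile(fs, new Path("/t.dat"), 1024 * 1024, (short) 1, 0L);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}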
[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245332#comment-15245332 ] Walter Su commented on HDFS-10284: - bq. I think it's due to mocking fsn while being concurrently accessed by another thread (smmthread). Good point. bq. Stubbing or verification of a shared mock from different threads is NOT the proper way of testing because it will always lead to intermittent behavior. (quote from https://github.com/mockito/mockito/wiki/FAQ) bq. feel free to use mocks concurrently, however prepare (stub) them before the concurrency starts. (quote from https://code.google.com/archive/p/mockito/issues/301) So I think we should move the stubbing {code} doReturn(true).when(fsn).inTransitionToActive(); {code} before the test starts, at least before {{smmthread}} is started. The patch looks really good. bq. I found the BlockManagerSafeMode$SafeModeMonitor#canLeave is not checking the namesystem#inTransitionToActive() It makes sense. Would you create another jira for this? > o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode > fails intermittently > - > > Key: HDFS-10284 > URL: https://issues.apache.org/jira/browse/HDFS-10284 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch > > > *Stacktrace* > {code} > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169) > {code} > Sample failing pre-commit UT: > https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
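A minimal, self-contained illustration of the FAQ guidance quoted above (the {{Namesystem}} interface here is a stand-in, not the real fsn mock): stub the shared mock first, and only then start the thread that reads it.
{code}
import static org.mockito.Mockito.*;

// Stand-in for the mocked namesystem; not the real FSNamesystem API.
interface Namesystem {
  boolean inTransitionToActive();
}

public class StubBeforeConcurrencySketch {
  public static void main(String[] args) throws InterruptedException {
    Namesystem fsn = mock(Namesystem.class);

    // Prepare (stub) the shared mock BEFORE any other thread touches it,
    // as the Mockito FAQ advises -- this is the reordering suggested above.
    doReturn(true).when(fsn).inTransitionToActive();

    // Only now start the monitor thread that reads the mock.
    Thread smmThread = new Thread(() -> {
      if (fsn.inTransitionToActive()) {
        System.out.println("monitor saw transition-to-active");
      }
    }, "smmthread");
    smmThread.start();
    smmThread.join();
  }
}
{code}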
[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing
[ https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245154#comment-15245154 ] Walter Su commented on HDFS-10291: -- +1. > TestShortCircuitLocalRead failing > - > > Key: HDFS-10291 > URL: https://issues.apache.org/jira/browse/HDFS-10291 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HDFS-10291-001.patch > > > {{TestShortCircuitLocalRead}} failing as length of read is considered off end > of buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9412: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8. Thanks [~He Tianyi] for the contribution! > getBlocks occupies FSLock and takes too long to complete > > > Key: HDFS-9412 > URL: https://issues.apache.org/jira/browse/HDFS-9412 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: He Tianyi >Assignee: He Tianyi > Fix For: 2.8.0 > > Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, > HDFS-9412.0002.patch > > > {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a > long time to complete (probably several seconds, if the number of blocks is too > large). > During this period, other threads attempting to acquire the write lock will wait. > In an extreme case, RPC handlers are occupied by one reader thread calling > {{getBlocks}} and all other threads waiting for the write lock; the rpc server acts > as if hung. Unfortunately, this tends to happen in a heavily loaded cluster, since > read operations come and go fast (they do not need to wait), leaving write > operations waiting. > Looks like we can optimize this the way the DN block report did in the past, by > splitting the operation into smaller sub-operations, and letting other threads do > their work between each sub-operation. The whole result is returned at once, > though (one thing different from the DN block report). > I am not sure whether this will work. Any better idea? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240996#comment-15240996 ] Walter Su commented on HDFS-9412: - {{TestBalancer}} passes locally. +1 for the last patch. > getBlocks occupies FSLock and takes too long to complete > > > Key: HDFS-9412 > URL: https://issues.apache.org/jira/browse/HDFS-9412 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: He Tianyi >Assignee: He Tianyi > Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, > HDFS-9412.0002.patch > > > {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a > long time to complete (probably several seconds, if the number of blocks is too > large). > During this period, other threads attempting to acquire the write lock will wait. > In an extreme case, RPC handlers are occupied by one reader thread calling > {{getBlocks}} and all other threads waiting for the write lock; the rpc server acts > as if hung. Unfortunately, this tends to happen in a heavily loaded cluster, since > read operations come and go fast (they do not need to wait), leaving write > operations waiting. > Looks like we can optimize this the way the DN block report did in the past, by > splitting the operation into smaller sub-operations, and letting other threads do > their work between each sub-operation. The whole result is returned at once, > though (one thing different from the DN block report). > I am not sure whether this will work. Any better idea? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240455#comment-15240455 ] Walter Su commented on HDFS-9412: - Thank you for updating. The test {{TestGetBlocks}} failed. Do you mind changing the test accordingly, and fixing the checkstyle issue as well? > getBlocks occupies FSLock and takes too long to complete > > > Key: HDFS-9412 > URL: https://issues.apache.org/jira/browse/HDFS-9412 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: He Tianyi >Assignee: He Tianyi > Attachments: HDFS-9412..patch, HDFS-9412.0001.patch > > > {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a > long time to complete (probably several seconds, if the number of blocks is too > large). > During this period, other threads attempting to acquire the write lock will wait. > In an extreme case, RPC handlers are occupied by one reader thread calling > {{getBlocks}} and all other threads waiting for the write lock; the rpc server acts > as if hung. Unfortunately, this tends to happen in a heavily loaded cluster, since > read operations come and go fast (they do not need to wait), leaving write > operations waiting. > Looks like we can optimize this the way the DN block report did in the past, by > splitting the operation into smaller sub-operations, and letting other threads do > their work between each sub-operation. The whole result is returned at once, > though (one thing different from the DN block report). > I am not sure whether this will work. Any better idea? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239103#comment-15239103 ] Walter Su commented on HDFS-9412: - One thread holding a readLock too long is much like holding a writeLock. We should avoid that. And after HDFS-8824, the small blocks are unused anyway, so there's no point in sending them to the balancer. Hi [~He Tianyi], do you mind rebasing the patch? > getBlocks occupies FSLock and takes too long to complete > > > Key: HDFS-9412 > URL: https://issues.apache.org/jira/browse/HDFS-9412 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: He Tianyi >Assignee: He Tianyi > Attachments: HDFS-9412..patch > > > {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a > long time to complete (probably several seconds, if the number of blocks is too > large). > During this period, other threads attempting to acquire the write lock will wait. > In an extreme case, RPC handlers are occupied by one reader thread calling > {{getBlocks}} and all other threads waiting for the write lock; the rpc server acts > as if hung. Unfortunately, this tends to happen in a heavily loaded cluster, since > read operations come and go fast (they do not need to wait), leaving write > operations waiting. > Looks like we can optimize this the way the DN block report did in the past, by > splitting the operation into smaller sub-operations, and letting other threads do > their work between each sub-operation. The whole result is returned at once, > though (one thing different from the DN block report). > I am not sure whether this will work. Any better idea? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
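A sketch of the splitting idea discussed in this thread, under the assumption that the scan position can survive a lock release (all names illustrative; this is not the HDFS-9412 patch): process in batches and drop the read lock between them so writers can make progress.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative pattern: a long read-only scan done in batches, releasing
// the read lock between batches so waiting writers are not starved.
public class ChunkedGetBlocksSketch {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock(true);
  private static final int BATCH = 1000;

  List<String> getBlocks(List<String> allBlocks) {
    List<String> result = new ArrayList<>();
    for (int from = 0; from < allBlocks.size(); from += BATCH) {
      fsnLock.readLock().lock();
      try {
        int to = Math.min(from + BATCH, allBlocks.size());
        result.addAll(allBlocks.subList(from, to));  // one sub-operation
      } finally {
        fsnLock.readLock().unlock();
      }
      // Writers waiting on the write lock can run here, between batches.
    }
    return result;  // the whole result is still returned at once
  }

  public static void main(String[] args) {
    List<String> blocks = new ArrayList<>();
    for (int i = 0; i < 5000; i++) blocks.add("blk_" + i);
    System.out.println(new ChunkedGetBlocksSketch().getBlocks(blocks).size());
  }
}
{code}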
[jira] [Updated] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9772: Labels: test (was: ) Priority: Minor (was: Major) Issue Type: Test (was: Bug) > TestBlockReplacement#testThrottler doesn't work as expected > --- > > Key: HDFS-9772 > URL: https://issues.apache.org/jira/browse/HDFS-9772 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Labels: test > Fix For: 2.7.3 > > Attachments: HDFS.001.patch > > > In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to > calculate the resulting bandwidth: the variable {{totalBytes}} rather than the > final variable {{TOTAL_BYTES}}, whose value is assigned to {{bytesToSend}}. > {{totalBytes}} has no meaning here; it makes {{totalBytes*1000/(end-start)}} > always 0 and the comparison always true. > The method code is below: > {code} > @Test > public void testThrottler() throws IOException { > Configuration conf = new HdfsConfiguration(); > FileSystem.setDefaultUri(conf, "hdfs://localhost:0"); > long bandwidthPerSec = 1024*1024L; > final long TOTAL_BYTES =6*bandwidthPerSec; > long bytesToSend = TOTAL_BYTES; > long start = Time.monotonicNow(); > DataTransferThrottler throttler = new > DataTransferThrottler(bandwidthPerSec); > long totalBytes = 0L; > long bytesSent = 1024*512L; // 0.5MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > bytesSent = 1024*768L; // 0.75MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > try { > Thread.sleep(1000); > } catch (InterruptedException ignored) {} > throttler.throttle(bytesToSend); > long end = Time.monotonicNow(); > assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
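The fix implied by the description is a one-liner against the test quoted above: assert on the bytes actually pushed through the throttler instead of the never-updated {{totalBytes}} (sketch only; the variable names are those of the quoted test):
{code}
// Corrected tail of testThrottler(): TOTAL_BYTES is what was actually
// sent, so the bandwidth assertion can now meaningfully fail.
long end = Time.monotonicNow();
assertTrue(TOTAL_BYTES * 1000 / (end - start) <= bandwidthPerSec);
{code}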
[jira] [Updated] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9772: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8, branch-2.7. Thanks [~linyiqun] for the contribution. > TestBlockReplacement#testThrottler doesn't work as expected > --- > > Key: HDFS-9772 > URL: https://issues.apache.org/jira/browse/HDFS-9772 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.7.3 > > Attachments: HDFS.001.patch > > > In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to > calculate the resulting bandwidth: the variable {{totalBytes}} rather than the > final variable {{TOTAL_BYTES}}, whose value is assigned to {{bytesToSend}}. > {{totalBytes}} has no meaning here; it makes {{totalBytes*1000/(end-start)}} > always 0 and the comparison always true. > The method code is below: > {code} > @Test > public void testThrottler() throws IOException { > Configuration conf = new HdfsConfiguration(); > FileSystem.setDefaultUri(conf, "hdfs://localhost:0"); > long bandwidthPerSec = 1024*1024L; > final long TOTAL_BYTES =6*bandwidthPerSec; > long bytesToSend = TOTAL_BYTES; > long start = Time.monotonicNow(); > DataTransferThrottler throttler = new > DataTransferThrottler(bandwidthPerSec); > long totalBytes = 0L; > long bytesSent = 1024*512L; // 0.5MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > bytesSent = 1024*768L; // 0.75MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > try { > Thread.sleep(1000); > } catch (InterruptedException ignored) {} > throttler.throttle(bytesToSend); > long end = Time.monotonicNow(); > assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9772: Summary: TestBlockReplacement#testThrottler doesn't work as expected (was: TestBlockReplacement#testThrottler use falut variable to calculate bandwidth) > TestBlockReplacement#testThrottler doesn't work as expected > --- > > Key: HDFS-9772 > URL: https://issues.apache.org/jira/browse/HDFS-9772 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS.001.patch > > > In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to > calculate the resulting bandwidth: the variable {{totalBytes}} rather than the > final variable {{TOTAL_BYTES}}, whose value is assigned to {{bytesToSend}}. > {{totalBytes}} has no meaning here; it makes {{totalBytes*1000/(end-start)}} > always 0 and the comparison always true. > The method code is below: > {code} > @Test > public void testThrottler() throws IOException { > Configuration conf = new HdfsConfiguration(); > FileSystem.setDefaultUri(conf, "hdfs://localhost:0"); > long bandwidthPerSec = 1024*1024L; > final long TOTAL_BYTES =6*bandwidthPerSec; > long bytesToSend = TOTAL_BYTES; > long start = Time.monotonicNow(); > DataTransferThrottler throttler = new > DataTransferThrottler(bandwidthPerSec); > long totalBytes = 0L; > long bytesSent = 1024*512L; // 0.5MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > bytesSent = 1024*768L; // 0.75MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > try { > Thread.sleep(1000); > } catch (InterruptedException ignored) {} > throttler.throttle(bytesToSend); > long end = Time.monotonicNow(); > assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9772) TestBlockReplacement#testThrottler use falut variable to calculate bandwidth
[ https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238984#comment-15238984 ] Walter Su commented on HDFS-9772: - +1. > TestBlockReplacement#testThrottler use falut variable to calculate bandwidth > > > Key: HDFS-9772 > URL: https://issues.apache.org/jira/browse/HDFS-9772 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS.001.patch > > > In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to > calculate the resulting bandwidth: the variable {{totalBytes}} rather than the > final variable {{TOTAL_BYTES}}, whose value is assigned to {{bytesToSend}}. > {{totalBytes}} has no meaning here; it makes {{totalBytes*1000/(end-start)}} > always 0 and the comparison always true. > The method code is below: > {code} > @Test > public void testThrottler() throws IOException { > Configuration conf = new HdfsConfiguration(); > FileSystem.setDefaultUri(conf, "hdfs://localhost:0"); > long bandwidthPerSec = 1024*1024L; > final long TOTAL_BYTES =6*bandwidthPerSec; > long bytesToSend = TOTAL_BYTES; > long start = Time.monotonicNow(); > DataTransferThrottler throttler = new > DataTransferThrottler(bandwidthPerSec); > long totalBytes = 0L; > long bytesSent = 1024*512L; // 0.5MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > bytesSent = 1024*768L; // 0.75MB > throttler.throttle(bytesSent); > bytesToSend -= bytesSent; > try { > Thread.sleep(1000); > } catch (InterruptedException ignored) {} > throttler.throttle(bytesToSend); > long end = Time.monotonicNow(); > assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9825) Balancer should not terminate if only one of the namenodes has error
[ https://issues.apache.org/jira/browse/HDFS-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238891#comment-15238891 ] Walter Su commented on HDFS-9825: - The patch looks pretty good. Could you rebase it? And one question:
{code}
for(int iteration = 0;; iteration++) {
  final Map results = new LinkedHashMap<>();
  for(NameNodeConnector nnc : connectors) {
{code}
Does it need to retry the succeeded/failed namenodes in each iteration? One block pool may take much longer than the others. > Balancer should not terminate if only one of the namenodes has error > > > Key: HDFS-9825 > URL: https://issues.apache.org/jira/browse/HDFS-9825 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h9825_20160217.patch, h9825_20160218.patch, > h9825_20160218b.patch > > > Currently, the Balancer terminates if only one of the namenodes has an error in > a federation setting. Instead, it should continue balancing the cluster with > the remaining namenodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
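One hypothetical shape of the per-iteration bookkeeping being asked about (this is not the h9825 patch; all names are illustrative): track each connector's outcome so namenodes that already succeeded or failed are skipped in later iterations.
{code}
import java.util.*;

// Hypothetical sketch, not the HDFS-9825 patch: remember which namenodes
// finished (or errored) so later iterations only balance the rest.
public class PerNamenodeIterationSketch {
  enum Result { IN_PROGRESS, SUCCESS, ERROR }

  public static void main(String[] args) {
    List<String> connectors = Arrays.asList("nn1", "nn2", "nn3");
    Map<String, Result> results = new LinkedHashMap<>();
    for (String nnc : connectors) results.put(nnc, Result.IN_PROGRESS);

    for (int iteration = 0; results.containsValue(Result.IN_PROGRESS); iteration++) {
      for (String nnc : connectors) {
        if (results.get(nnc) != Result.IN_PROGRESS) {
          continue;  // this block pool already finished or failed
        }
        // A real runOneIteration(nnc) would go here; fake an outcome where
        // nn2's block pool takes one extra iteration.
        results.put(nnc, nnc.equals("nn2") && iteration == 0
            ? Result.IN_PROGRESS : Result.SUCCESS);
      }
      System.out.println("iteration " + iteration + ": " + results);
    }
  }
}
{code}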
[jira] [Commented] (HDFS-9476) TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail
[ https://issues.apache.org/jira/browse/HDFS-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238658#comment-15238658 ] Walter Su commented on HDFS-9476: - +1. > TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail > - > > Key: HDFS-9476 > URL: https://issues.apache.org/jira/browse/HDFS-9476 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Akira AJISAKA > Attachments: HDFS-9476.01.patch > > > This test occasionally fails. For example, the most recent one is: > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2587/ > Error Message > {noformat} > Cannot obtain block length for > LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; > getBlockSize()=1024; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]} > {noformat} > Stacktrace > {noformat} > java.io.IOException: Cannot obtain block length for > LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; > getBlockSize()=1024; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]} > at > org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:399) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:343) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275) > at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:265) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1046) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1011) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:177) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:213) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:228) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:600) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:622) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9826) Erasure Coding: Postpone the recovery work for a configurable time period
[ https://issues.apache.org/jira/browse/HDFS-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238589#comment-15238589 ] Walter Su commented on HDFS-9826: - Good thought. And I think the current implementation, {{LowRedundancyBlocks}}, uses a multi-level priority queue. Blocks at highest risk are always processed first. It achieves the same goal as you proposed. Don't you think? > Erasure Coding: Postpone the recovery work for a configurable time period > -- > > Key: HDFS-9826 > URL: https://issues.apache.org/jira/browse/HDFS-9826 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-9826-001.patch, HDFS-9826-002.patch > > > Currently the NameNode prepares recovery when it finds an under-replicated > block group. This is inefficient and reduces resources for other operations. > It would be better to postpone the recovery work for a period of time if only > one internal block is corrupted, considering points shown by papers such as > \[1\]\[2\]: > 1. Transient errors in which no data are lost account for more than 90% of > data center failures, owing to network partitions, software problems, or > non-disk hardware faults. > 2. Although erasure codes tolerate multiple simultaneous failures, single > failures represent 99.75% of recoveries. > Different clusters may have different statuses, so we should allow the user to > configure the time for postponing recoveries. Proper configuration will > reduce a large proportion of unnecessary recoveries. When finding multiple > internal blocks corrupted in a block group, we prepare the recovery work > immediately because it’s very rare and we don’t want to increase the risk of > losing data. > [1] Availability in globally distributed storage systems > http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf > [2] Rethinking erasure codes for cloud file systems: minimizing I/O for > recovery and degraded reads > http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
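A compact sketch of the multi-level priority-queue idea referenced above (the bucket layout is illustrative, not the exact LowRedundancyBlocks levels): work is always drained from the highest-risk bucket first, which naturally defers single-loss recoveries whenever riskier blocks exist.
{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative multi-level priority queue: recovery work is drained from
// the highest-risk level first.
public class PriorityBucketsSketch {
  // Level 0 = highest risk (many replicas lost) ... last level = single loss.
  private final List<Deque<String>> levels = new ArrayList<>();

  PriorityBucketsSketch(int numLevels) {
    for (int i = 0; i < numLevels; i++) levels.add(new ArrayDeque<>());
  }

  void add(String block, int level) { levels.get(level).add(block); }

  String poll() {
    for (Deque<String> q : levels) {
      if (!q.isEmpty()) return q.poll();   // highest-risk first
    }
    return null;
  }

  public static void main(String[] args) {
    PriorityBucketsSketch q = new PriorityBucketsSketch(3);
    q.add("blkGroup-A (1 internal block lost)", 2);
    q.add("blkGroup-B (3 internal blocks lost)", 0);
    System.out.println(q.poll());  // B is recovered before A
    System.out.println(q.poll());
  }
}
{code}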
[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states
[ https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233561#comment-15233561 ] Walter Su commented on HDFS-9918: - +1. Thanks, [~rakesh_r]. > Erasure Coding: Sort located striped blocks based on decommissioned states > -- > > Key: HDFS-9918 > URL: https://issues.apache.org/jira/browse/HDFS-9918 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, > HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, > HDFS-9918-006.patch, HDFS-9918-007.patch, HDFS-9918-008.patch, > HDFS-9918-009.patch, HDFS-9918-010.patch, HDFS-9918-011.patch, > HDFS-9918-012.patch, HDFS-9918-013.patch > > > This jira is a follow-on work of HDFS-8786, where we do decommissioning of > datanodes having striped blocks. > Now, after decommissioning, the ordering of the storage list needs to change so > that decommissioned datanodes come last in the list. > For example, assume we have a block group with storage list:- > d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 > mapping to indices > 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 > Here the internal block b2 is duplicated, located in d2 and d9. If d2 is a > decommissioning node, then we should switch d2 and d9 in the storage list. > Thanks [~jingzhao] for the > [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232103#comment-15232103 ] Walter Su commented on HDFS-7661: - Great design/discussion. Since we have come back to discussing the use cases and "effort vs. benefit": if the use cases are rare, I'm thinking we can provide a simpler workaround. We provide:
1. a fake "flush", which only flushes the full stripes and doesn't flush the last partial stripe. It won't make sure every byte is safe, but it helps the recovery logic recover more data.
2. a real "flush". The easiest way to do this is to start a new block group. It makes sure the data written before the "flush" is safe and visible. It saves the user the trouble of closing and re-appending the same file. Since we support variable-length blocks, it's totally doable. I should mention that the implementation of appending to a striped file also utilizes variable-length blocks.
The trouble is creating too many block groups. But if there are too many small blocks and they are adjacent in the same file, we can concatenate them into a bigger block, although striped-block concatenation does not seem easy either. > [umbrella] support hflush and hsync for erasure coded files > --- > > Key: HDFS-7661 > URL: https://issues.apache.org/jira/browse/HDFS-7661 > Project: Hadoop HDFS > Issue Type: New Feature > Components: erasure-coding >Reporter: Tsz Wo Nicholas Sze >Assignee: GAO Rui > Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, > HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, > HDFS-EC-file-flush-sync-design-v20160323.pdf, > HDFS-EC-file-flush-sync-design-version1.1.pdf, Undo-Log-Design-20160406.jpg > > > We also need to support hflush/hsync and visible length. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
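A purely hypothetical sketch of the two flush variants proposed above (neither method exists in HDFS; this only restates the comment's proposal as an interface):
{code}
// Hypothetical API sketch only -- these methods do not exist in HDFS.
interface StripedFlushProposal {
  /**
   * "Fake" flush: persist only the full stripes written so far; the last
   * partial stripe is not flushed. No durability guarantee for every byte,
   * but the recovery logic can salvage more data.
   */
  void flushFullStripes();

  /**
   * "Real" flush: end the current (variable-length) block group and start
   * a new one, so all data written before the call is safe and visible,
   * without closing and re-appending the file.
   */
  void flushAndStartNewBlockGroup();
}
{code}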
[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states
[ https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231512#comment-15231512 ] Walter Su commented on HDFS-9918: - The patch looks pretty good. Thanks [~rakesh_r]. Tiny suggestions:
1. Since we don't sort by distance, the logic for resolving the client node can be moved inside {{sortLocatedBlock}}.
2. It would be better if we could test that the locToIndex mapping is correct after sorting. > Erasure Coding: Sort located striped blocks based on decommissioned states > -- > > Key: HDFS-9918 > URL: https://issues.apache.org/jira/browse/HDFS-9918 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, > HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, > HDFS-9918-006.patch, HDFS-9918-007.patch, HDFS-9918-008.patch, > HDFS-9918-009.patch, HDFS-9918-010.patch, HDFS-9918-011.patch > > > This jira is a follow-on work of HDFS-8786, where we do decommissioning of > datanodes having striped blocks. > Now, after decommissioning, the ordering of the storage list needs to change so > that decommissioned datanodes come last in the list. > For example, assume we have a block group with storage list:- > d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 > mapping to indices > 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 > Here the internal block b2 is duplicated, located in d2 and d9. If d2 is a > decommissioning node, then we should switch d2 and d9 in the storage list. > Thanks [~jingzhao] for the > [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states
[ https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219241#comment-15219241 ] Walter Su commented on HDFS-9918: - The optimization works for {{BlockInfoStriped}}. A missing block occupies a slot, like
{noformat}
0, null, 2, 3, 4, 5, 6, 7, 8, 1, 0', 1', 7', 8'
{noformat}
In LocatedStripedBlock, the data is like
{noformat}
0, 2, 3, 4, 5, 6, 7, 8, 1, 0', 1', 7', 8'
{noformat}
That's why I don't see the point of maintaining the order (of the in-service ones), unless we change {{createLocatedBlock(..)}}. But optimizing {{LocatedStripedBlock}} to save some network traffic seems trivial. > Erasure Coding: Sort located striped blocks based on decommissioned states > -- > > Key: HDFS-9918 > URL: https://issues.apache.org/jira/browse/HDFS-9918 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, > HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, > HDFS-9918-006.patch, HDFS-9918-007.patch, HDFS-9918-008.patch > > > This jira is a follow-on work of HDFS-8786, where we do decommissioning of > datanodes having striped blocks. > Now, after decommissioning, the ordering of the storage list needs to change so > that decommissioned datanodes come last in the list. > For example, assume we have a block group with storage list:- > d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 > mapping to indices > 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 > Here the internal block b2 is duplicated, located in d2 and d9. If d2 is a > decommissioning node, then we should switch d2 and d9 in the storage list. > Thanks [~jingzhao] for the > [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states
[ https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217456#comment-15217456 ] Walter Su commented on HDFS-9918: - I see the difference. To achieve your goal, we need a new comparator and to zip 3 arrays into 1 array. But I don't see the point of preserving the order of blkIndices. bq. how about going ahead with the previous approach? It's just that I prefer the comparator paradigm; it's easier to understand and modify. I'm OK with your previous approach. :) > Erasure Coding: Sort located striped blocks based on decommissioned states > -- > > Key: HDFS-9918 > URL: https://issues.apache.org/jira/browse/HDFS-9918 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, > HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, > HDFS-9918-006.patch, HDFS-9918-007.patch > > > This jira is a follow-on work of HDFS-8786, where we do decommissioning of > datanodes having striped blocks. > Now, after decommissioning it requires to change the ordering of the storage > list so that the decommissioned datanodes should only be last node in list. > For example, assume we have a block group with storage list:- > d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 > mapping to indices > 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 > Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a > decommissioning node then should switch d2 and d9 in the storage list. > Thanks [~jingzhao] for the > [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
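For reference, a minimal sketch of the zip-then-sort idea under discussion; the {{Slot}} holder and {{zipAndSort}} are hypothetical names, not from any posted patch:

{code}
// Zip the three parallel arrays into one element per storage, so a single
// Comparator can reorder locations, block indices and tokens together.
class Slot {
  DatanodeInfo loc;
  byte blkIndex;
  Token<BlockTokenIdentifier> token;

  static Slot[] zipAndSort(DatanodeInfo[] locs, byte[] indices,
      Token<BlockTokenIdentifier>[] tokens) {
    Slot[] slots = new Slot[locs.length];
    for (int i = 0; i < locs.length; i++) {
      slots[i] = new Slot();
      slots[i].loc = locs[i];
      slots[i].blkIndex = indices[i];
      slots[i].token = tokens[i];
    }
    // Arrays.sort on objects is stable, so in-service entries keep their
    // original relative order while decommissioned ones sink to the end.
    Arrays.sort(slots, Comparator.comparing(s -> s.loc.isDecommissioned()));
    return slots;  // the caller unzips back into the three arrays
  }
}
{code}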
[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states
[ https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217217#comment-15217217 ] Walter Su commented on HDFS-9918: - bq. 1. Index in the logical block group. 2. Decomm status 3. Distance to the targethost Good summary. And, if the sorting priority is 2, 1, we can reuse {{DecomStaleComparator}}:

{code}
// Move decommissioned/stale datanodes to the bottom
Arrays.sort(di, comparator);
{code}

because, according to the javadoc of {{Arrays.sort(..)}},

{noformat}
This sort is guaranteed to be stable: equal elements will not be reordered as a result of the sort.
{noformat}

Also, we could write a new comparator. But I think the client side can handle the randomized ordering if there's no duplication? > Erasure Coding: Sort located striped blocks based on decommissioned states > -- > > Key: HDFS-9918 > URL: https://issues.apache.org/jira/browse/HDFS-9918 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, > HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, > HDFS-9918-006.patch, HDFS-9918-007.patch > > > This jira is a follow-on work of HDFS-8786, where we do decommissioning of > datanodes having striped blocks. > Now, after decommissioning it requires to change the ordering of the storage > list so that the decommissioned datanodes should only be last node in list. > For example, assume we have a block group with storage list:- > d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 > mapping to indices > 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 > Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a > decommissioning node then should switch d2 and d9 in the storage list. > Thanks [~jingzhao] for the > [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
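To make the stability point concrete, a small self-contained example (plain strings stand in for datanodes; nothing HDFS-specific here):

{code}
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
  public static void main(String[] args) {
    // A leading "D" marks a decommissioned node.
    String[] nodes = {"d0", "D1", "d2", "D3", "d4"};
    // Only decommissioned-ness is compared, so equal elements keep their
    // relative order: d0, d2, d4, D1, D3.
    Arrays.sort(nodes, Comparator.comparing(n -> n.startsWith("D")));
    System.out.println(Arrays.toString(nodes));
  }
}
{code}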
[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states
[ https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215796#comment-15215796 ] Walter Su commented on HDFS-9918: - We also need to sort locations by distance. It's unlikely, but given 2 duplicated in-service blocks we want to choose the nearer one. The logic is like {{sortLocatedBlocks(..)}}; how about reusing it? We move {{BlockIndex}} & {{BlockToken}} according to {{Location}}, like

{code}
//public void sortLocatedBlocks(final String targethost,
    for (LocatedBlock b : locatedblocks) {
      DatanodeInfo[] di = b.getLocations();
+     HashMap<DatanodeInfo, Byte> locToIndex = null;
+     HashMap<DatanodeInfo, Token<BlockTokenIdentifier>> locToToken = null;
+     if (b instanceof LocatedStripedBlock) {
+       locToIndex = new HashMap<>();
+       locToToken = new HashMap<>();
+       LocatedStripedBlock lb = (LocatedStripedBlock) b;
+       for (int i = 0; i < lb.getBlockIndices().length; i++) {
+         locToIndex.put(di[i], lb.getBlockIndices()[i]);
+         locToToken.put(di[i], lb.getBlockTokens()[i]);
+       }
+     }
      // ... after di is sorted, rebuild the striped block's indices and
      // tokens from locToIndex/locToToken so they follow the new order.
{code}

> Erasure Coding: Sort located striped blocks based on decommissioned states > -- > > Key: HDFS-9918 > URL: https://issues.apache.org/jira/browse/HDFS-9918 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, > HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, > HDFS-9918-006.patch > > > This jira is a follow-on work of HDFS-8786, where we do decommissioning of > datanodes having striped blocks. > Now, after decommissioning it requires to change the ordering of the storage > list so that the decommissioned datanodes should only be last node in list. > For example, assume we have a block group with storage list:- > d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 > mapping to indices > 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 > Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a > decommissioning node then should switch d2 and d9 in the storage list. > Thanks [~jingzhao] for the > [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10182: - Fix Version/s: 2.6.5 Committed to branch-2.6. > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Fix For: 2.7.3, 2.6.5 > > Attachments: HDFS-10182-001.patch, HDFS-10182-branch26.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result would be > copied to the buf provided by the caller and then caller would think the > pread is all set. Later the caller might use the buf to do something else > (for e.g. read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-10182: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8, branch-2.7. Hi, [~sinago]. Would you mind uploading a patch against branch-2.6? > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Fix For: 2.7.3 > > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result would be > copied to the buf provided by the caller and then caller would think the > pread is all set. Later the caller might use the buf to do something else > (for e.g. read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213926#comment-15213926 ] Walter Su commented on HDFS-10182: -- +1. I'll commit it shortly. > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result would be > copied to the buf provided by the caller and then caller would think the > pread is all set. Later the caller might use the buf to do something else > (for e.g. read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9952) Expose FSNamesystem lock wait time as metrics
[ https://issues.apache.org/jira/browse/HDFS-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213915#comment-15213915 ] Walter Su commented on HDFS-9952: - Thanks [~vinayrpet] for updating. Just one minor suggestion: we should take care of {{readUnlock()}} being called even when the current thread doesn't actually hold the lock, just like {{writeUnlock()}} does. Better safe than sorry. Otherwise, the patch looks pretty good to me. +1 once addressed. It would be great if [~daryn] could also take a look. > Expose FSNamesystem lock wait time as metrics > - > > Key: HDFS-9952 > URL: https://issues.apache.org/jira/browse/HDFS-9952 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-9952-01.patch, HDFS-9952-02.patch, > HDFS-9952-03.patch > > > Expose FSNameSystem's readlock() and writeLock() wait time as metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
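A minimal sketch of the kind of guard being suggested, assuming the namesystem lock is a {{ReentrantReadWriteLock}} (the field names are illustrative, not from the patch):

{code}
private final ReentrantReadWriteLock coarseLock = new ReentrantReadWriteLock(true);
private final ThreadLocal<Long> readLockStart = new ThreadLocal<>();
private MutableRate readLockHeldTime;  // created via a MetricsRegistry

public void readUnlock() {
  // Record only on the outermost unlock of a thread that actually holds
  // the read lock; getReadHoldCount() counts this thread's reentrant holds.
  // Assumes readLock() stored the acquire time in readLockStart.
  final boolean needRecord = coarseLock.getReadHoldCount() == 1;
  final long start = needRecord ? readLockStart.get() : 0;
  coarseLock.readLock().unlock();
  if (needRecord) {
    readLockHeldTime.add(Time.monotonicNow() - start);
    readLockStart.remove();
  }
}
{code}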
[jira] [Commented] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201350#comment-15201350 ] Walter Su commented on HDFS-10182: -- And this is because {{cancelAll(futures);}} doesn't interrupt the first attempt, and doesn't wait for it to finish either. Thanks [~sinago] for reporting. The patch LGTM. > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result would be > copied to the buf provided by the caller and then caller would think the > pread is all set. Later the caller might use the buf to do something else > (for e.g. read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9952) Expose FSNamesystem lock wait time as metrics
[ https://issues.apache.org/jira/browse/HDFS-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198770#comment-15198770 ] Walter Su commented on HDFS-9952: - bq. MutableRate#add is synchronized in an extremely critical code path which will destroy concurrent read ops. We have many nested locks, for example:

{noformat}
getBlockLocations(..) --> readLock() --> isInSafeMode() --> synchronized isInManualOrResourceLowSafeMode()
listCorruptFileBlocks(..) --> readLock() --> blockManager.getCorruptReplicaBlockIterator() --> synchronized Iterator iterator(int level)
{noformat}

It's pretty difficult not to use any nested locks. I think if the time frame of holding the inside (write) lock is short compared to that of holding the outside lock, it's probable that the N threads pass through the inside lock at different times. If there's little contention for the inside lock, it hardly increases the contention for the outside lock; it's just that every thread holds the outside lock a little longer because of the additional logic. In this case, the time frame of holding the MutableRate lock is short: what it does inside the lock is a simple algebraic calculation. But assume the fsWriteLock has just been released and many threads are waiting at the entrance of the fsReadLock. If the MutableRate lock is the first thing inside the door of the fsReadLock, then there's a lot of contention for the MutableRate lock once those threads get inside the door at the same time. What if we save the value in a ThreadLocal, and after we release the fsReadLock, add it to the metrics? ThreadLocal is lock free. I'm no expert on locks, just what I thought. > Expose FSNamesystem lock wait time as metrics > - > > Key: HDFS-9952 > URL: https://issues.apache.org/jira/browse/HDFS-9952 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-9952-01.patch, HDFS-9952-02.patch > > > Expose FSNameSystem's readlock() and writeLock() wait time as metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
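A rough sketch of the ThreadLocal idea (names are illustrative; this is not the posted patch):

{code}
// Measure the wait in a lock-free per-thread slot, and publish to the
// synchronized MutableRate only after the FS read lock is released, so
// the metric's monitor is never contended while the FS lock is held.
private final ReentrantReadWriteLock coarseLock = new ReentrantReadWriteLock(true);
private final ThreadLocal<Long> waitNanos = new ThreadLocal<>();
private MutableRate lockWaitMetric;  // created via a MetricsRegistry

void readLock() {
  long t0 = System.nanoTime();
  coarseLock.readLock().lock();
  waitNanos.set(System.nanoTime() - t0);
}

void readUnlock() {
  coarseLock.readLock().unlock();
  Long waited = waitNanos.get();
  if (waited != null) {
    waitNanos.remove();
    lockWaitMetric.add(waited);  // the contended add happens off the hot path
  }
}
{code}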
[jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError form DataTransfer thread.
[ https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194914#comment-15194914 ] Walter Su commented on HDFS-9684: - I have seen a case where a DN got a command from the NN to transfer a huge number of blocks. There were 7000+ threads at its peak. I don't advocate recovering from {{OutOfMemoryError}}, but it's our responsibility not to create that many threads in the first place. > DataNode stopped sending heartbeat after getting OutOfMemoryError form > DataTransfer thread. > --- > > Key: HDFS-9684 > URL: https://issues.apache.org/jira/browse/HDFS-9684 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Blocker > Attachments: HDFS-9684.01.patch > > > {noformat} > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:714) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
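One way to bound the thread count, sketched with a plain JDK executor (an illustration of the idea only, not the DataNode's actual fix; the class and field names are made up):

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BoundedTransfer {
  // A fixed-size pool caps concurrent transfers instead of spawning one
  // unbounded native thread per replication command; excess work queues up.
  private final ExecutorService transferPool = Executors.newFixedThreadPool(64);

  void transferBlocks(List<Runnable> pendingTransfers) {  // hypothetical work list
    for (Runnable transfer : pendingTransfers) {
      transferPool.submit(transfer);
    }
  }
}
{code}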
[jira] [Updated] (HDFS-8211) DataNode UUID is always null in the JMX counter
[ https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8211: Priority: Major (was: Minor) Changed Priority to Major. As [~qwertymaniac] pointed out, this patch unintentionally fixes an issue where a DN may regenerate its UUID (see HDFS-9949). I think we should backport this to branch-2.7? > DataNode UUID is always null in the JMX counter > --- > > Key: HDFS-8211 > URL: https://issues.apache.org/jira/browse/HDFS-8211 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 2.8.0 > > Attachments: hdfs-8211.001.patch, hdfs-8211.002.patch > > > The DataNode JMX counters are tagged with DataNode UUID, but it always gets a > null value instead of the UUID. > {code} > Hadoop:service=DataNode,name=FSDatasetState*-null*. > {code} > This null is supposed be the datanode UUID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time
[ https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186974#comment-15186974 ] Walter Su commented on HDFS-9822: - bq. I am still a little confused how this error happens. Me too. I don't think we have the right cause yet. bq. But if there are same block group entry exists in different queue.. No 2 queues can have the same BG. The update(..) logic is correct. No queue can hold 2 identical items; the queue is a HashSet. My pure guess is that it's caused by a race condition. We have a guard at

{code}
// BlockManager#scheduleReconstruction(..)
if (block.isStriped()) {
  if (pendingNum > 0) {
    // Wait the previous reconstruction to finish.
    return null;
  }
{code}

which is inside the namesystem lock. But before the {{ReplicationMonitor}} thread gets to {{validateReconstructionWork(..)}}, it releases the lock, so it's possible the JUnit thread gets the lock. If both pass the guard, eventually one of them will fail the assert. > Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped > block at the same time > > > Key: HDFS-9822 > URL: https://issues.apache.org/jira/browse/HDFS-9822 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Tsz Wo Nicholas Sze >Assignee: Rakesh R > Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch > > > Found the following AssertionError in > https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/ > {code} > AssertionError: Should wait the previous reconstruction to finish > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100) > at java.lang.Thread.run(Thread.java:745) > at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) > at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
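A stripped-down illustration of that check-then-act window, using a plain {{ReentrantLock}} in place of the namesystem lock (a generic example, not BlockManager code):

{code}
import java.util.concurrent.locks.ReentrantLock;

class CheckThenActRace {
  private final ReentrantLock nsLock = new ReentrantLock();
  private int pendingNum = 0;

  void schedule() {
    nsLock.lock();
    try {
      if (pendingNum > 0) {
        return;  // the guard: wait for the previous reconstruction
      }
    } finally {
      nsLock.unlock();  // lock released between the check and the commit
    }
    // Two threads that both saw pendingNum == 0 can both reach this point,
    // so the block gets scheduled twice -- which is exactly what the
    // assert in validateReconstructionWork(..) catches.
    nsLock.lock();
    try {
      pendingNum++;
    } finally {
      nsLock.unlock();
    }
  }
}
{code}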
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184856#comment-15184856 ] Walter Su commented on HDFS-7866: - Sorry for the confusion; the javadoc now looks verbose. But thanks for trying. We can make some improvements in the follow-on JIRA. The last patch LGTM too. Thanks again, [~lirui]. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184267#comment-15184267 ] Walter Su commented on HDFS-7866: - 1. Not only javadoc; what I meant was separating the logic of manipulating the 12 bits: 2 sets of set/get methods for them. Now the cuts are both (1,11); it's just that the meanings of each part are different. But what if in the future the cut is different? You had the same concern about unifying the set method: bq. 3. The biggest concern is INodeFile constructor – related to that, the toLong method. Currently when isStriped, we just interpret replication as the EC policy ID. This looks pretty hacky. But it looks pretty tricky to fix. By the way, if we're planning to use the unified cut (1,11) for both of them, why bother having one enum item BLOCK_LAYOUT_AND_REDUNDANCY (12 bits) and doing the bit masking ourselves, instead of 2 enum items as before, which did the bit masking for us? Some nits: 1. {{LAYOUT_BIT_WIDTH}}, {{MAX_REDUNDANCY}} can be private inside {{HeaderFormat}}. 2.

{code}
/**
 * @return The ID of the erasure coding policy on the file. -1 represents no
 * EC policy.
 */
@VisibleForTesting
@Override
public byte getErasureCodingPolicyID() {
  if (isStriped()) {
    return (byte) HeaderFormat.getReplication(header);
  }
  return -1;
}
{code}

{code}
// check if the file has an EC policy
ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp.
    getErasureCodingPolicy(fsd.getFSNamesystem(), existing);
if (ecPolicy != null) {
  replication = ecPolicy.getId();
}
{code}

Are we sure the policy ID strictly fits in 7 bits? Casting an ID with a value >= 128 to byte becomes negative, and then the logic goes wild. Vice versa. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
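The overflow concern in the last nit can be shown in a few lines of plain Java (nothing HDFS-specific):

{code}
public class ByteCastPitfall {
  public static void main(String[] args) {
    short policyId = 130;           // an ID that needs more than 7 bits
    byte asByte = (byte) policyId;  // narrowing keeps only the low 8 bits
    System.out.println(asByte);     // prints -126, and the logic goes wild
  }
}
{code}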
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183118#comment-15183118 ] Walter Su commented on HDFS-7866: - What do you think about letting it diverge instead of forcing unification? It takes more time to understand the new format from the code; I think adding some javadoc would be nice. How about something like this:

{noformat}
/**
 * Bit format:
 * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY]
 * [48-bit preferredBlockSize]
 *
 * BLOCK_LAYOUT_AND_REDUNDANCY format for replicated block:
 * 0 [11-bit replication]
 *
 * BLOCK_LAYOUT_AND_REDUNDANCY format for striped block:
 * 1 [11-bit ErasureCodingPolicy ID]
 */
{noformat}

I think getErasureCodingPolicyID() doesn't have to re-use getReplication(long header), even though they are both 11 bits now. In the future, we might split the 11-bit EC policy ID further, and the 2 methods would keep diverging. I imagine something like:

{noformat}
/**
 * Bit format:
 * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY]
 * [48-bit preferredBlockSize]
 *
 * BLOCK_LAYOUT_AND_REDUNDANCY format for non-ec block:
 * 0 [11-bit replication]
 *
 * BLOCK_LAYOUT_AND_REDUNDANCY format for ec striped block:
 * 10 [4-bit replication][6-bit ErasureCodingPolicy ID]
 *
 * BLOCK_LAYOUT_AND_REDUNDANCY format for ec contiguous block:
 * 11 [4-bit replication][6-bit ErasureCodingPolicy ID]
 */
{noformat}

And I think we should reserve some high-value IDs for custom policies, and reserve some for unknown policies which might be integrated (hard-coded) in the future. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
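For concreteness, a hedged sketch of how the 12-bit field in the first layout above could be unpacked (the constant and method names are made up for illustration):

{code}
class LayoutBits {
  // 12-bit BLOCK_LAYOUT_AND_REDUNDANCY: [1-bit layout][11-bit redundancy]
  static final int LAYOUT_BIT = 1 << 11;            // 0 = replicated, 1 = striped
  static final int REDUNDANCY_MASK = (1 << 11) - 1;

  static boolean isStriped(int field) {
    return (field & LAYOUT_BIT) != 0;
  }

  static int getRedundancy(int field) {
    // replication factor for replicated blocks, EC policy ID for striped ones
    return field & REDUNDANCY_MASK;
  }
}
{code}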
[jira] [Commented] (HDFS-9803) Proactively refresh ShortCircuitCache entries to avoid latency spikes
[ https://issues.apache.org/jira/browse/HDFS-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155386#comment-15155386 ] Walter Su commented on HDFS-9803: - Is it related to HDFS-5637? Which version of the hdfs-client module do you use? > Proactively refresh ShortCircuitCache entries to avoid latency spikes > - > > Key: HDFS-9803 > URL: https://issues.apache.org/jira/browse/HDFS-9803 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Nick Dimiduk > > My region server logs are flooding with messages like > "SecretManager$InvalidToken: access control error while attempting to set up > short-circuit access to ... is expired". These logs > correspond with responseTooSlow WARNings from the region server. > {noformat} > 2016-01-19 22:10:14,432 INFO > [B.defaultRpcServer.handler=4,queue=1,port=16020] > shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not load > 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken exception. > org.apache.hadoop.security.token.SecretManager$InvalidToken: access control > error while attempting to set up short-circuit access to token > with block_token_identifier (expiryDate=1453194430724, keyId=1508822027, > userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416, > blockId=1074037633, access modes=[READ]) is expired. > at > org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591) > at > org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490) > at > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782) > at > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716) > at > org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422) > at > org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678) > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372) > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591) > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437) > ... > {noformat} > A potential solution could be to have a background thread that makes a best > effort to proactively refreshes tokens in the cache before they expire, so as > to minimize latency impact on the critical path. > Thanks to [~cnauroth] for providing an explaination and suggesting a solution > over on the [user > list|http://mail-archives.apache.org/mod_mbox/hadoop-user/201601.mbox/%3CCANZa%3DGt%3Dhvuf3fyOJqf-jdpBPL_xDknKBcp7LmaC-YUm0jDUVg%40mail.gmail.com%3E]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9716: Resolution: Cannot Reproduce Status: Resolved (was: Patch Available) HDFS-9755 covers the same fix. Closed this as 'Cannot Reproduce'. Thanks [~liuml07] for reporting the issue. > o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Mingliang Liu >Assignee: Walter Su > Labels: test > Attachments: HDFS-9716.01.patch, HDFS-9716.02.patch > > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9816) Erasure Coding: allow to use multiple EC policies in striping related tests [Part 3]
[ https://issues.apache.org/jira/browse/HDFS-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151796#comment-15151796 ] Walter Su commented on HDFS-9816: - bq. Then we can move the current hard coded suite to TestBlockRecovery. It works for me. Thanks [~zhz], [~lirui]. btw, if the safeLength calculation is already tested in {{TestBlockRecovery#testSafeLength}}, I think {{TestLeaseRecoveryStriped}} should use the production code to get safeLength instead of repeating it? > Erasure Coding: allow to use multiple EC policies in striping related tests > [Part 3] > > > Key: HDFS-9816 > URL: https://issues.apache.org/jira/browse/HDFS-9816 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Reporter: Rui Li >Assignee: Rui Li > Attachments: HDFS-9816.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9816) Erasure Coding: allow to use multiple EC policies in striping related tests [Part 3]
[ https://issues.apache.org/jira/browse/HDFS-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150432#comment-15150432 ] Walter Su commented on HDFS-9816: - The safe length may change if steps 3 and 4 are included (see HDFS-9173). For now I prefer a hard-coded test suite because it's clearer what the safe length is, and in which step. The safe length calculation also needs to be tested, so it's better to use hard-coded values and not repeat the calculation logic from the production code. > Erasure Coding: allow to use multiple EC policies in striping related tests > [Part 3] > > > Key: HDFS-9816 > URL: https://issues.apache.org/jira/browse/HDFS-9816 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Reporter: Rui Li >Assignee: Rui Li > Attachments: HDFS-9816.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9347) Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
[ https://issues.apache.org/jira/browse/HDFS-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9347: Fix Version/s: 2.6.5 2.7.3 > Invariant assumption in TestQuorumJournalManager.shutdown() is wrong > > > Key: HDFS-9347 > URL: https://issues.apache.org/jira/browse/HDFS-9347 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Fix For: 2.8.0, 2.7.3, 2.6.5 > > Attachments: HDFS-9347.001.patch, HDFS-9347.002.patch, > HDFS-9347.003.patch, HDFS-9347.004.patch, HDFS-9347.005.patch, > HDFS-9347.006.patch > > > The code > {code:title=TestTestQuorumJournalManager.java|borderStyle=solid} > @After > public void shutdown() throws IOException { > IOUtils.cleanup(LOG, toClose.toArray(new Closeable[0])); > > // Should not leak clients between tests -- this can cause flaky tests. > // (See HDFS-4643) > GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*"); > > if (cluster != null) { > cluster.shutdown(); > } > } > {code} > implicitly assumes when the call returns from IOUtils.cleanup() (which calls > close() on QuorumJournalManager object), all IPC client connection threads > are terminated. However, there is no internal implementation that enforces > this assumption. Even if the bug reported in HADOOP-12532 is fixed, the > internal code still only ensures IPC connections are terminated, but not the > thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9752: Attachment: HDFS-9752-branch-2.7.03.patch HDFS-9752-branch-2.6.03.patch > Permanent write failures may happen to slow writers during datanode rolling > upgrades > > > Key: HDFS-9752 > URL: https://issues.apache.org/jira/browse/HDFS-9752 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Walter Su >Priority: Critical > Attachments: HDFS-9752-branch-2.6.03.patch, > HDFS-9752-branch-2.7.03.patch, HDFS-9752.01.patch, HDFS-9752.02.patch, > HDFS-9752.03.patch, HdfsWriter.java > > > When datanodes are being upgraded, an out-of-band ack is sent upstream and > the client does a pipeline recovery. The client may hit this multiple times > as more nodes get upgraded. This normally does not cause any issue, but if > the client is holding the stream open without writing any data during this > time, a permanent write failure can occur. > This is because there is a limit of 5 recovery trials for the same packet, > which is tracked by "last acked sequence number". Since the empty heartbeat > packets for an idle output stream does not increment the sequence number, the > write will fail after it seeing 5 pipeline breakages by datanode upgrades. > This check/limit was added to avoid spinning until running out of nodes in > the cluster due to a corruption or any other irrecoverable conditions. The > datanode upgrade-restart should be excluded from the count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9347) Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
[ https://issues.apache.org/jira/browse/HDFS-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138264#comment-15138264 ] Walter Su commented on HDFS-9347: - Thanks [~jojochuang] for the work. I just cherry-picked it to branch-2.7 and branch-2.6. > Invariant assumption in TestQuorumJournalManager.shutdown() is wrong > > > Key: HDFS-9347 > URL: https://issues.apache.org/jira/browse/HDFS-9347 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Fix For: 2.8.0, 2.7.3, 2.6.5 > > Attachments: HDFS-9347.001.patch, HDFS-9347.002.patch, > HDFS-9347.003.patch, HDFS-9347.004.patch, HDFS-9347.005.patch, > HDFS-9347.006.patch > > > The code > {code:title=TestTestQuorumJournalManager.java|borderStyle=solid} > @After > public void shutdown() throws IOException { > IOUtils.cleanup(LOG, toClose.toArray(new Closeable[0])); > > // Should not leak clients between tests -- this can cause flaky tests. > // (See HDFS-4643) > GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*"); > > if (cluster != null) { > cluster.shutdown(); > } > } > {code} > implicitly assumes when the call returns from IOUtils.cleanup() (which calls > close() on QuorumJournalManager object), all IPC client connection threads > are terminated. However, there is no internal implementation that enforces > this assumption. Even if the bug reported in HADOOP-12532 is fixed, the > internal code still only ensures IPC connections are terminated, but not the > thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138306#comment-15138306 ] Walter Su commented on HDFS-9752: - Thanks all for reviewing the patch. The patch depends on HDFS-9347; I just cherry-picked that to 2.6.5. Now I've uploaded separate patches for 2.7/2.6. > Permanent write failures may happen to slow writers during datanode rolling > upgrades > > > Key: HDFS-9752 > URL: https://issues.apache.org/jira/browse/HDFS-9752 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Walter Su >Priority: Critical > Attachments: HDFS-9752-branch-2.6.03.patch, > HDFS-9752-branch-2.7.03.patch, HDFS-9752.01.patch, HDFS-9752.02.patch, > HDFS-9752.03.patch, HdfsWriter.java > > > When datanodes are being upgraded, an out-of-band ack is sent upstream and > the client does a pipeline recovery. The client may hit this multiple times > as more nodes get upgraded. This normally does not cause any issue, but if > the client is holding the stream open without writing any data during this > time, a permanent write failure can occur. > This is because there is a limit of 5 recovery trials for the same packet, > which is tracked by "last acked sequence number". Since the empty heartbeat > packets for an idle output stream does not increment the sequence number, the > write will fail after it seeing 5 pipeline breakages by datanode upgrades. > This check/limit was added to avoid spinning until running out of nodes in > the cluster due to a corruption or any other irrecoverable conditions. The > datanode upgrade-restart should be excluded from the count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9752: Attachment: HDFS-9752.03.patch bq. The test can just verify that pipelineRecoveryCount is not incremented after DN restart and pipeline recovery... Good idea. Thanks. Uploaded the 03 patch. It's still difficult to remove the sleep used to wait for DN shutdown. bq. To avoid sleeping for arbitrary amount of time to wait for a datanode to shutdown, we can have DataNode#shutdown() to set a variable at the end to indicate shutdown is complete. I use the thread name and GenericTestUtils.waitForThreadTermination(..) to do that. Hope that's OK with you. > Permanent write failures may happen to slow writers during datanode rolling > upgrades > > > Key: HDFS-9752 > URL: https://issues.apache.org/jira/browse/HDFS-9752 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Walter Su >Priority: Critical > Attachments: HDFS-9752.01.patch, HDFS-9752.02.patch, > HDFS-9752.03.patch, HdfsWriter.java > > > When datanodes are being upgraded, an out-of-band ack is sent upstream and > the client does a pipeline recovery. The client may hit this multiple times > as more nodes get upgraded. This normally does not cause any issue, but if > the client is holding the stream open without writing any data during this > time, a permanent write failure can occur. > This is because there is a limit of 5 recovery trials for the same packet, > which is tracked by "last acked sequence number". Since the empty heartbeat > packets for an idle output stream does not increment the sequence number, the > write will fail after it seeing 5 pipeline breakages by datanode upgrades. > This check/limit was added to avoid spinning until running out of nodes in > the cluster due to a corruption or any other irrecoverable conditions. The > datanode upgrade-restart should be excluded from the count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
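Roughly, the waiting pattern looks like this (a sketch; the thread-name regex is hypothetical and depends on how the DataNode names its shutdown thread):

{code}
// Inside a test method that declares throws Exception: poll the JVM's
// live threads until none match the regex, failing after the timeout.
GenericTestUtils.waitForThreadTermination(
    "Async datanode shutdown thread.*",  // hypothetical thread-name regex
    100 /* check every 100 ms */, 10000 /* give up after 10 s */);
{code}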
[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9752: Attachment: HDFS-9752.02.patch Thanks for the advice. Uploaded the 02 patch. The test now takes ~30s, but it's still difficult to remove the _sleep_ used to wait for DN shutdown. I could use org.apache.mina.util.AvailablePortFinder.available(int port) to wait for the port to be free, but I'm afraid of the extra dependency. > Permanent write failures may happen to slow writers during datanode rolling > upgrades > > > Key: HDFS-9752 > URL: https://issues.apache.org/jira/browse/HDFS-9752 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Walter Su >Priority: Critical > Attachments: HDFS-9752.01.patch, HDFS-9752.02.patch, HdfsWriter.java > > > When datanodes are being upgraded, an out-of-band ack is sent upstream and > the client does a pipeline recovery. The client may hit this multiple times > as more nodes get upgraded. This normally does not cause any issue, but if > the client is holding the stream open without writing any data during this > time, a permanent write failure can occur. > This is because there is a limit of 5 recovery trials for the same packet, > which is tracked by "last acked sequence number". Since the empty heartbeat > packets for an idle output stream does not increment the sequence number, the > write will fail after it seeing 5 pipeline breakages by datanode upgrades. > This check/limit was added to avoid spinning until running out of nodes in > the cluster due to a corruption or any other irrecoverable conditions. The > datanode upgrade-restart should be excluded from the count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134131#comment-15134131 ] Walter Su commented on HDFS-9752: - Hi, [~xiaobingo]. I think the 'write failure' means the output stream throws an IOException and can't be closed normally. In your test, if you just kill the DN without the upgrade command, the client will consider it an error node and exclude it; it won't be in the new pipeline. > Permanent write failures may happen to slow writers during datanode rolling > upgrades > > > Key: HDFS-9752 > URL: https://issues.apache.org/jira/browse/HDFS-9752 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Walter Su >Priority: Critical > Attachments: HDFS-9752.01.patch, HDFS-9752.02.patch, HdfsWriter.java > > > When datanodes are being upgraded, an out-of-band ack is sent upstream and > the client does a pipeline recovery. The client may hit this multiple times > as more nodes get upgraded. This normally does not cause any issue, but if > the client is holding the stream open without writing any data during this > time, a permanent write failure can occur. > This is because there is a limit of 5 recovery trials for the same packet, > which is tracked by "last acked sequence number". Since the empty heartbeat > packets for an idle output stream does not increment the sequence number, the > write will fail after it seeing 5 pipeline breakages by datanode upgrades. > This check/limit was added to avoid spinning until running out of nodes in > the cluster due to a corruption or any other irrecoverable conditions. The > datanode upgrade-restart should be excluded from the count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication
Walter Su created HDFS-9748: --- Summary: When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication Key: HDFS-9748 URL: https://issues.apache.org/jira/browse/HDFS-9748 Project: Hadoop HDFS Issue Type: Bug Reporter: Walter Su Assignee: Walter Su Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication
[ https://issues.apache.org/jira/browse/HDFS-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9748: Affects Version/s: 2.8.0 Status: Patch Available (was: Open) > When addExpectedReplicasToPending is called twice, pendingReplications should > avoid duplication > --- > > Key: HDFS-9748 > URL: https://issues.apache.org/jira/browse/HDFS-9748 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-9748.01.patch > > > 1. When completeFile() is called, addExpectedReplicasToPending() will be > called (HDFS-8999). > 2. When first replica is reported, addExpectedReplicasToPending() will be > called the second time. > {code} > //BlockManager.addStoredBlock(..) > if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED && > hasMinStorage(storedBlock, numLiveReplicas)) { > addExpectedReplicasToPending(storedBlock, bc); > completeBlock(storedBlock, false); > } else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) { > {code} > But, > {code} > //PendingReplicationBlocks.java > void incrementReplicas(DatanodeDescriptor... newTargets) { > if (newTargets != null) { > Collections.addAll(targets, newTargets); > } > } > {code} > targets is ArrayList, the above code simply add all {{newTargets}} to > {{targets}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication
[ https://issues.apache.org/jira/browse/HDFS-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9748: Attachment: HDFS-9748.01.patch > When addExpectedReplicasToPending is called twice, pendingReplications should > avoid duplication > --- > > Key: HDFS-9748 > URL: https://issues.apache.org/jira/browse/HDFS-9748 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-9748.01.patch > > > 1. When completeFile() is called, addExpectedReplicasToPending() will be > called (HDFS-8999). > 2. When first replica is reported, addExpectedReplicasToPending() will be > called the second time. > {code} > //BlockManager.addStoredBlock(..) > if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED && > hasMinStorage(storedBlock, numLiveReplicas)) { > addExpectedReplicasToPending(storedBlock, bc); > completeBlock(storedBlock, false); > } else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) { > {code} > But, > {code} > //PendingReplicationBlocks.java > void incrementReplicas(DatanodeDescriptor... newTargets) { > if (newTargets != null) { > Collections.addAll(targets, newTargets); > } > } > {code} > targets is ArrayList, the above code simply add all {{newTargets}} to > {{targets}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication
[ https://issues.apache.org/jira/browse/HDFS-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9748: Description: 1. When completeFile() is called, addExpectedReplicasToPending() will be called (HDFS-8999). 2. When the first replica is reported, addExpectedReplicasToPending() will be called a second time.

{code}
//BlockManager.addStoredBlock(..)
if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
    hasMinStorage(storedBlock, numLiveReplicas)) {
  addExpectedReplicasToPending(storedBlock, bc);
  completeBlock(storedBlock, false);
} else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) {
{code}

But,

{code}
//PendingReplicationBlocks.java
void incrementReplicas(DatanodeDescriptor... newTargets) {
  if (newTargets != null) {
    Collections.addAll(targets, newTargets);
  }
}
{code}

targets is an ArrayList; the above code simply adds all {{newTargets}} to {{targets}}. > When addExpectedReplicasToPending is called twice, pendingReplications should > avoid duplication > --- > > Key: HDFS-9748 > URL: https://issues.apache.org/jira/browse/HDFS-9748 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > > 1. When completeFile() is called, addExpectedReplicasToPending() will be > called (HDFS-8999). > 2. When first replica is reported, addExpectedReplicasToPending() will be > called the second time. > {code} > //BlockManager.addStoredBlock(..) > if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED && > hasMinStorage(storedBlock, numLiveReplicas)) { > addExpectedReplicasToPending(storedBlock, bc); > completeBlock(storedBlock, false); > } else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) { > {code} > But, > {code} > //PendingReplicationBlocks.java > void incrementReplicas(DatanodeDescriptor... newTargets) { > if (newTargets != null) { > Collections.addAll(targets, newTargets); > } > } > {code} > targets is ArrayList, the above code simply add all {{newTargets}} to > {{targets}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
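A minimal sketch of the deduplication the summary asks for, assuming {{targets}} stays a {{List}} (illustrative only; the actual patch may differ):

{code}
// Skip targets we are already waiting on, so a second call to
// addExpectedReplicasToPending() cannot inflate the pending count.
void incrementReplicas(DatanodeDescriptor... newTargets) {
  if (newTargets != null) {
    for (DatanodeDescriptor newTarget : newTargets) {
      if (!targets.contains(newTarget)) {
        targets.add(newTarget);
      }
    }
  }
}
{code}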
[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9752: Attachment: HDFS-9752.01.patch Thanks [~kihwal] for reporting this. Uploaded 01 patch, kindly review. The patch resets {{pipelineRecoveryCount}} every time a packet is successfully sent. > Permanent write failures may happen to slow writers during datanode rolling > upgrades > > > Key: HDFS-9752 > URL: https://issues.apache.org/jira/browse/HDFS-9752 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Priority: Critical > Attachments: HDFS-9752.01.patch > > > When datanodes are being upgraded, an out-of-band ack is sent upstream and > the client does a pipeline recovery. The client may hit this multiple times > as more nodes get upgraded. This normally does not cause any issue, but if > the client is holding the stream open without writing any data during this > time, a permanent write failure can occur. > This is because there is a limit of 5 recovery trials for the same packet, > which is tracked by "last acked sequence number". Since the empty heartbeat > packets for an idle output stream does not increment the sequence number, the > write will fail after it seeing 5 pipeline breakages by datanode upgrades. > This check/limit was added to avoid spinning until running out of nodes in > the cluster due to a corruption or any other irrecoverable conditions. The > datanode upgrade-restart should be excluded from the count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
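The reset idea, sketched (the field and method names here are illustrative, not necessarily those in the patch):

{code}
// A successfully acked data packet proves the new pipeline works, so the
// retry budget starts over instead of accumulating across unrelated
// restarts (e.g. a rolling upgrade that bounces one datanode at a time).
private int pipelineRecoveryCount = 0;

void onPacketAcked() {
  pipelineRecoveryCount = 0;  // reset on any successful send
}

void onPipelineBreakage() throws IOException {
  if (++pipelineRecoveryCount > 5) {
    throw new IOException(
        "Failed to recover the pipeline 5 times without sending a packet");
  }
}
{code}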
[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9752: Assignee: Walter Su Status: Patch Available (was: Open) > Permanent write failures may happen to slow writers during datanode rolling > upgrades > > > Key: HDFS-9752 > URL: https://issues.apache.org/jira/browse/HDFS-9752 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Walter Su >Priority: Critical > Attachments: HDFS-9752.01.patch > > > When datanodes are being upgraded, an out-of-band ack is sent upstream and > the client does a pipeline recovery. The client may hit this multiple times > as more nodes get upgraded. This normally does not cause any issue, but if > the client is holding the stream open without writing any data during this > time, a permanent write failure can occur. > This is because there is a limit of 5 recovery trials for the same packet, > which is tracked by "last acked sequence number". Since the empty heartbeat > packets for an idle output stream does not increment the sequence number, the > write will fail after it seeing 5 pipeline breakages by datanode upgrades. > This check/limit was added to avoid spinning until running out of nodes in > the cluster due to a corruption or any other irrecoverable conditions. The > datanode upgrade-restart should be excluded from the count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127932#comment-15127932 ] Walter Su commented on HDFS-9716: - There are 2 ways to make a BlockGroup under-replicated: 1. shut down a DN, or 2. corrupt a replica file. If a replica is corrupted and is reported to the NN, the NN asks the DN to delete the replica. java.io.File.length() returns 0 if the file is not found. > o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Walter Su > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
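That last behavior is easy to confirm in isolation:

{code}
import java.io.File;

public class MissingFileLength {
  public static void main(String[] args) {
    // length() does not throw for a missing file; it just returns 0L,
    // which is why a deleted replica shows up as a zero-length mismatch.
    File gone = new File("/no/such/replica/blk_123");
    System.out.println(gone.length());  // prints 0
  }
}
{code}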
[jira] [Assigned] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su reassigned HDFS-9716: --- Assignee: Walter Su > o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Walter Su > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9716: Labels: test (was: ) Affects Version/s: (was: 2.8.0) Status: Patch Available (was: Open) > o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Mingliang Liu >Assignee: Walter Su > Labels: test > Attachments: HDFS-9716.01.patch > > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128008#comment-15128008 ] Walter Su commented on HDFS-9716: - You can easily reproduce the failure by delaying line 345 with a breakpoint or a sleep, so the NN has time to invalidate the replica.
{code}
339   // Check the replica on the new target node.
340   for (int i = 0; i < toRecoverBlockNum; i++) {
341     File replicaAfterRecovery = cluster.getBlockFile(targetDNs[i], blocks[i]);
342     LOG.info("replica after recovery " + replicaAfterRecovery);
343     File metadataAfterRecovery =
344         cluster.getBlockMetadataFile(targetDNs[i], blocks[i]);
345     assertEquals(replicaAfterRecovery.length(), replicas[i].length());
346     LOG.info("replica before " + replicas[i]);
347     assertTrue(metadataAfterRecovery.getName().
348         endsWith(blocks[i].getGenerationStamp() + ".meta"));
349     byte[] replicaContentAfterRecovery =
350         DFSTestUtil.readFileAsBytes(replicaAfterRecovery);
351
352     Assert.assertArrayEquals(replicaContents[i], replicaContentAfterRecovery);
353   }
354 }
{code}
> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Mingliang Liu >Assignee: Walter Su > Labels: test > Attachments: HDFS-9716.01.patch > > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
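One way to make the race explicit (a sketch only, not the attached patch; it reuses the test's own variables from the snippet above) is to assert that the replica file still exists before comparing lengths, so an invalidated replica fails with a clear message instead of a confusing 0-vs-length mismatch:
{code}
// Illustrative sketch only (not the attached patch): assert the replica file
// still exists before comparing lengths, since java.io.File.length() silently
// returns 0 for a file the NN has already invalidated.
File replicaAfterRecovery = cluster.getBlockFile(targetDNs[i], blocks[i]);
assertTrue("replica " + replicaAfterRecovery + " was already invalidated",
    replicaAfterRecovery.exists());
assertEquals(replicas[i].length(), replicaAfterRecovery.length());
{code}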
[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9716: Attachment: HDFS-9716.01.patch > o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Walter Su > Attachments: HDFS-9716.01.patch > > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128003#comment-15128003 ] Walter Su commented on HDFS-9716: - {{MiniDFSCluster.getBlockFile}} is called on the _dead DN_ before the replica is corrupted and deleted, so it won't return null. > o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Mingliang Liu >Assignee: Walter Su > Labels: test > Attachments: HDFS-9716.01.patch > > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
[ https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9716: Attachment: HDFS-9716.02.patch Thanks [~drankye]. Uploaded 02 patch to address that. > o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk > --- > > Key: HDFS-9716 > URL: https://issues.apache.org/jira/browse/HDFS-9716 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Mingliang Liu >Assignee: Walter Su > Labels: test > Attachments: HDFS-9716.01.patch, HDFS-9716.02.patch > > > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/ > * > https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9646) ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block
[ https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098152#comment-15098152 ] Walter Su commented on HDFS-9646: -
{code}
+ bufferSize, (int)(maxTargetLength - positionInBlock));
{code}
Since {{positionInBlock}} is initially 0, this just casts {{maxTargetLength}} to {{int}}; its maximal value is the block size. bq. 3. Question: do we need new test codes to expose the issue and ensure the issue is fixed? I think the randomized {{generateDeadDnIndices()}} can test that. Patch looks good to me, too. Thanks [~jingzhao]. > ErasureCodingWorker may fail when recovering data blocks with length less > than the first internal block > --- > > Key: HDFS-9646 > URL: https://issues.apache.org/jira/browse/HDFS-9646 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Takuya Fukudome >Assignee: Jing Zhao >Priority: Critical > Attachments: HDFS-9646.000.patch, test-reconstruct-stripe-file.patch > > > This is reported by [~tfukudom]: ErasureCodingWorker may fail with the > following exception when recovering a non-full internal block. > {code} > 2016-01-06 11:14:44,740 WARN datanode.DataNode > (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: > BP-987302662-172.29.4.13-1450757377698:blk_-92233720368 > 54322288_29751 > java.io.IOException: Transfer failed for all targets. > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
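To spell out the arithmetic (illustrative values, not code from the patch): the cast is bounded because an internal block never exceeds the block size, which is far below Integer.MAX_VALUE; a defensive variant can make that bound explicit instead of implicit.
{code}
// Illustrative numbers showing why the (int) cast above is bounded.
long blockSize = 128L * 1024 * 1024;   // a typical dfs.blocksize: 128 MB
long maxTargetLength = blockSize;      // never exceeds the block size
long positionInBlock = 0;              // initially 0, as noted above

// The cast is safe only because blockSize < Integer.MAX_VALUE (~2 GB).
// A defensive variant makes that bound explicit rather than implicit:
int toRead = Math.toIntExact(
    Math.min(8192 /* bufferSize */, maxTargetLength - positionInBlock));
{code}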
[jira] [Commented] (HDFS-9534) Add CLI command to clear storage policy from a path.
[ https://issues.apache.org/jira/browse/HDFS-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101365#comment-15101365 ] Walter Su commented on HDFS-9534: - Thanks [~xiaobingo]. 1. I think the original design doesn't intend for UNSPECIFIED_STORAGE_POLICY_ID to be a policy, so it's not in the policy suite. 2. In FSDirAttrOp.java, you can pass the _policyId_ instead of the _policyName_ to _setStoragePolicy(..)_. > Add CLI command to clear storage policy from a path. > > > Key: HDFS-9534 > URL: https://issues.apache.org/jira/browse/HDFS-9534 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Reporter: Chris Nauroth >Assignee: Xiaobing Zhou > Attachments: HDFS-9534.001.patch > > > The {{hdfs storagepolicies}} command has sub-commands for > {{-setStoragePolicy}} and {{-getStoragePolicy}} on a path. However, there is > no {{-removeStoragePolicy}} to remove a previously set storage policy on a > path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
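A minimal sketch of suggestion 2 follows. It is hypothetical: the helper names, the constant, and the FSDirAttrOp signatures are illustrative and may not match the real code; the point is only that resolving the name to an id at the entry boundary keeps the unspecified id out of the policy suite.
{code}
// Hypothetical sketch of suggestion 2; the real FSDirAttrOp signatures and
// constant names may differ. Resolve the policy name to an id once, at the
// entry point...
byte policyId = (policyName == null)
    ? UNSPECIFIED_STORAGE_POLICY_ID        // the "clear policy" case
    : fsd.getBlockManager().getStoragePolicy(policyName).getId();
// ...then pass the id, not the name, down to the internal setter, so the
// unspecified id never has to exist as an entry in the policy suite.
setStoragePolicy(fsd, iip, policyId);
{code}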
[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085202#comment-15085202 ] Walter Su commented on HDFS-7661: - You totally miss my point. A successful flush is a guarantee that the data is safe. If the 1st flush succeeds, data written before the 1st flush is safe. If the 2nd flush fails, data written between the 1st and 2nd flushes is lost. The user can restart writing at the 1st flush point (with a lease recovery). According to the description, if the data before the 1st flush is damaged, how can we restart at the 1st flush point? The client has to restart at the beginning of the current block. Then what's the meaning of "flush"? > Erasure coding: support hflush and hsync > > > Key: HDFS-7661 > URL: https://issues.apache.org/jira/browse/HDFS-7661 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo Nicholas Sze >Assignee: GAO Rui > Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, > HDFS-7661-unitTest-wip-trunk.patch, > HDFS-EC-file-flush-sync-design-version1.1.pdf > > > We also need to support hflush/hsync and visible length. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
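The contract argued for above mirrors how a client already uses the replicated-file API; here is a small self-contained sketch (the path and byte arrays are illustrative) of the "restart at the last successful flush point" expectation:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch of the flush contract discussed above.
public class HflushContractDemo {
  public static void main(String[] args) throws Exception {
    byte[] firstBatch = {1, 2, 3};
    byte[] secondBatch = {4, 5, 6};
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/hflush-demo"));
    out.write(firstBatch);
    out.hflush();                    // 1st flush: firstBatch is now durable,
    long safePoint = out.getPos();   // so this offset is a valid restart point.
    out.write(secondBatch);
    try {
      out.hflush();                  // if the 2nd flush fails, only secondBatch
    } catch (IOException e) {        // may be lost; after a lease recovery the
      // writer resumes at safePoint -- the data written before the 1st
      // flush must NOT be damaged by the failed 2nd flush.
      System.err.println("resume at offset " + safePoint + ": " + e);
    }
  }
}
{code}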
[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084527#comment-15084527 ] Walter Su commented on HDFS-7661: - According to the description: 1. The 3 parity blocks should be updated sequentially. 2. The 2nd flush decreases the safety of data written before the 1st flush. If there are already numParityBlks failures, the 2nd flush must succeed and cannot even be aborted by the user; otherwise it'll damage the data written before the 1st flush. > Erasure coding: support hflush and hsync > > > Key: HDFS-7661 > URL: https://issues.apache.org/jira/browse/HDFS-7661 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo Nicholas Sze >Assignee: GAO Rui > Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, > HDFS-7661-unitTest-wip-trunk.patch, > HDFS-EC-file-flush-sync-design-version1.1.pdf > > > We also need to support hflush/hsync and visible length. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082272#comment-15082272 ] Walter Su commented on HDFS-7661: - bq. But, the older parity internal block comes back later, then we have different version parity blocks. How? Does the older parity internal block become dirty? > Erasure coding: support hflush and hsync > > > Key: HDFS-7661 > URL: https://issues.apache.org/jira/browse/HDFS-7661 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo Nicholas Sze >Assignee: GAO Rui > Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, > HDFS-7661-unitTest-wip-trunk.patch, > HDFS-EC-file-flush-sync-design-version1.1.pdf > > > We also need to support hflush/hsync and visible length. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8430) Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
[ https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082228#comment-15082228 ] Walter Su commented on HDFS-8430: - Thanks [~szetszwo] for clarifying. {{New Algorithm 2}} looks good. And we need a new DataTransferProtocol op, instead of {{blockChecksum(..)}}, to get the cell checksum array. > Erasure coding: update DFSClient.getFileChecksum() logic for stripe files > - > > Key: HDFS-8430 > URL: https://issues.apache.org/jira/browse/HDFS-8430 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: Walter Su >Assignee: Kai Zheng > Attachments: HDFS-8430-poc1.patch > > > HADOOP-3981 introduces a distributed file checksum algorithm. It's designed > for replicated blocks. > {{DFSClient.getFileChecksum()}} needs some updates, so it can work for striped > block groups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
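A purely hypothetical sketch of what such an op might look like (the interface and method are invented for illustration; the real protocol change was still under discussion in this issue):
{code}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

// Purely hypothetical sketch; the actual DataTransferProtocol change
// was still under discussion in this issue.
interface StripedBlockChecksumOps {
  /**
   * Returns the CRC of each cell of one internal block, in cell order, so
   * the client can interleave the cells of all internal blocks of a block
   * group and combine them in logical (striped) file order.
   */
  int[] cellChecksums(ExtendedBlock internalBlock, int cellSize)
      throws IOException;
}
{code}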