[jira] [Updated] (HDFS-16751) WebUI FileSystem explorer could delete wrong file by mistake

2022-08-29 Thread Walter Su (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-16751:
-
Summary: WebUI FileSystem explorer could delete wrong file by mistake  
(was: WebUI FileSystem explorer file Deletion could delete wrong file by 
mistake)

> WebUI FileSystem explorer could delete wrong file by mistake
> 
>
> Key: HDFS-16751
> URL: https://issues.apache.org/jira/browse/HDFS-16751
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Walter Su
>Priority: Major
> Attachments: tmp.png
>
>
> On the FileSystem explorer page, I clicked the 'Delete' icon in order to delete 
> file A, but file B was deleted instead.
> I found that the ajax URL string concatenation is wrong, as shown in the image 
> I attached.






[jira] [Created] (HDFS-16751) WebUI FileSystem explorer file Deletion could delete wrong file by mistake

2022-08-29 Thread Walter Su (Jira)
Walter Su created HDFS-16751:


 Summary: WebUI FileSystem explorer file Deletion could delete 
wrong file by mistake
 Key: HDFS-16751
 URL: https://issues.apache.org/jira/browse/HDFS-16751
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.2.1
Reporter: Walter Su
 Attachments: tmp.png

On the FileSystem explorer page, I clicked the 'Delete' icon in order to delete 
file A, but file B was deleted instead.
I found that the ajax URL string concatenation is wrong, as shown in the image I 
attached.






[jira] [Created] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2022-06-29 Thread Walter Su (Jira)
Walter Su created HDFS-16644:


 Summary: java.io.IOException Invalid token in 
javax.security.sasl.qop
 Key: HDFS-16644
 URL: https://issues.apache.org/jira/browse/HDFS-16644
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.2.1
Reporter: Walter Su


Deployment:

Server side: a Kerberos-enabled cluster with JDK 1.8 and hdfs-server 3.2.1.

Client side:
I run hadoop fs -put on a test file, with a Kerberos ticket initialized first, 
and use identical core-site.xml & hdfs-site.xml configurations.

Using client ver 3.2.1, it succeeds.

Using client ver 2.8.5, it succeeds.

Using client ver 2.10.1, it fails. The client-side error is:

org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: SASL 
encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-06-27 01:06:15,781 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataNode{data=FSDataset{dirpath='[/mnt/disk1/hdfs, /mnt/***/hdfs, 
/mnt/***/hdfs, /mnt/***/hdfs]'}, localName='emr-worker-***.***:9866', 
datanodeUuid='b1c7f64a-6389-4739-bddf-***', xmitsInProgress=0}:Exception 
transfering block BP-1187699012-10.-***:blk_1119803380_46080919 to mirror 
10.*:9866
java.io.IOException: Invalid token in javax.security.sasl.qop: D
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:220)

Once any ver 2.10.1 client connects to the HDFS server, the DataNode no longer 
accepts any client connections; even a ver 3.2.1 client can no longer connect. 
Within a short time, all DataNodes reject client connections.

The problem persists even if I replace the DataNode with ver 3.3.0 or replace 
Java with JDK 11.
The problem goes away if I replace the DataNode with ver 3.2.0. I guess the 
problem is related to HDFS-13541.







[jira] [Commented] (HDFS-10383) Safely close resources in DFSTestUtil

2016-05-15 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284143#comment-15284143
 ] 

Walter Su commented on HDFS-10383:
--

bq. IOUtils#cleanup swallows it in the finally block.
Great work! And good analysis of {{createStripedFile()}}. We already had 
{{createStripedFile()}} before {{DFSStripedSteam}} was implemented. The test 
still prints a warning stacktrace because of the second {{completeFile()}}. So, 
although it is not related to this issue, how about changing it together:
{code}
-  out = dfs.create(file, (short) 1); // create an empty file
+  cluster.getNameNodeRpc()
+  .create(file.toString(), new FsPermission((short)0755),
+  dfs.getClient().getClientName(),
+  new EnumSetWritable<>(EnumSet.of(CreateFlag.CREATE)),
+  false, (short)1, 128*1024*1024L, null);
{code}
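
For reference, the safe-close pattern the issue is after can be sketched like this 
(a hypothetical helper, not the actual {{DFSTestUtil}} code): with 
try-with-resources, an exception thrown while writing is preserved, and an 
exception from {{close()}} is attached to it as a suppressed exception.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper illustrating the try-with-resources pattern proposed
// for DFSTestUtil: close() runs automatically, and if both the write and
// close() fail, the close() failure is kept as a suppressed exception.
class SafeCloseSketch {
  static void writeFile(FileSystem fs, Path file, byte[] data) throws IOException {
    try (FSDataOutputStream out = fs.create(file, (short) 1)) {
      out.write(data);
    }
  }
}
{code}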

> Safely close resources in DFSTestUtil
> -
>
> Key: HDFS-10383
> URL: https://issues.apache.org/jira/browse/HDFS-10383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10383.000.patch, HDFS-10383.001.patch, 
> HDFS-10383.002.patch
>
>
> There are a few of methods in {{DFSTestUtil}} that do not close the resource 
> safely, or elegantly. We can use the try-with-resource statement to address 
> this problem.
> Specially, as {{DFSTestUtil}} is popularly used in test, we need to preserve 
> any exceptions thrown during the processing of the resource while still 
> guaranteeing it's closed finally. Take for example,the current implementation 
> of {{DFSTestUtil#createFile()}} closes the FSDataOutputStream in the 
> {{finally}} block, and when closing if the internal 
> {{DFSOutputStream#close()}} throws any exception, which it often does, the 
> exception thrown during the processing will be lost. See this [test 
> failure|https://builds.apache.org/job/PreCommit-HADOOP-Build/9320/testReport/org.apache.hadoop.hdfs/TestAsyncDFSRename/testAggressiveConcurrentAsyncRenameWithOverwrite/],
>  and we have to guess what was the root cause.
> Using try-with-resource, we can close the resources safely, and the 
> exceptions thrown both in processing and closing will be available (closing 
> exception will be suppressed).






[jira] [Commented] (HDFS-10220) Namenode failover due to too long loking in LeaseManager.Monitor

2016-05-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275875#comment-15275875
 ] 

Walter Su commented on HDFS-10220:
--

The last patch looks pretty good. +1 once the test nits get addressed. Thanks 
[~ashangit] for the contribution. Thanks [~raviprak] and [~liuml07] for the 
good advice and review.

> Namenode failover due to too long loking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> HADOOP-10220.006.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to unresponsive namenode detected by the 
> zkfc with lot's of WARN messages (5 millions) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> On the threaddump taken by the zkfc there are lots of thread blocked due to a 
> lock.
> Looking at the code, there are a lock taken by the LeaseManager.Monitor when 
> some lease must be released. Due to the really big number of lease to be 
> released the namenode has taken too many times to release them blocking all 
> other tasks and making the zkfc thinking that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leased released each time we 
> check for lease so the lock won't be taken for a too long time period.






[jira] [Commented] (HDFS-10340) data node sudden killed

2016-04-28 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261938#comment-15261938
 ] 

Walter Su commented on HDFS-10340:
--

I don't think it's an issue. SIGTERM comes from the outside. The signal is 
probably emitted by some script, command, or daemon process.

> data node sudden killed 
> 
>
> Key: HDFS-10340
> URL: https://issues.apache.org/jira/browse/HDFS-10340
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: Ubuntu 16.04 LTS , RAM 16g , cpu core : 8 , hdd 100gb, 
> hadoop 2.6.0
>Reporter: tu nguyen khac
>Priority: Critical
>
> I tried to setup a new data node using ubuntu 16 
> and get it join to an existed Hadoop Hdfs cluster ( there are 10 nodes in 
> this cluster and they all run on centos Os 6 ) 
> But when i try to boostrap this node , after about 10 or 20 minutes i get 
> this strange errors : 
> 2016-04-26 20:12:09,394 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.3.24.65:55323, dest: /10.3.24.197:50010, bytes: 79902, op: HDFS_WRITE, 
> cliID: DFSClient_NONMAPREDUCE_1379996362_1, offset: 0, srvID: 
> 225f5b43-1dd3-4ac6-88d2-1e8d27dba55b, blockid: 
> BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, duration: 
> 15331628
> 2016-04-26 20:12:09,394 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2016-04-26 20:12:25,410 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038502_789829
> 2016-04-26 20:12:25,411 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-352432948-10.3.24.65-1433821675295:blk_1074038505_789832
> 2016-04-26 20:13:18,546 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Scheduling blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
>  for deletion
> 2016-04-26 20:13:18,562 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Deleted BP-352432948-10.3.24.65-1433821675295 blk_1074038502_789829 file 
> /data/hadoop_data/backup/data/current/BP-352432948-10.3.24.65-1433821675295/current/finalized/subdir4/subdir134/blk_1074038502
> 2016-04-26 20:15:46,481 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
> 2016-04-26 20:15:46,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down DataNode at bigdata-dw-24-197/10.3.24.197
> /





[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-27 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261388#comment-15261388
 ] 

Walter Su commented on HDFS-9958:
-

bq. I think only DFSClient currently reports storageID.
No, it doesn't.
{code}
//DFSInputStream.java
  protected void reportCheckSumFailure(CorruptedBlocks corruptedBlocks,
  int dataNodeCount, boolean isStriped) {
...
reportList.add(new LocatedBlock(blk, locs));
  }
}
...
 dfsClient.reportChecksumFailure(src,
  reportList.toArray(new LocatedBlock[reportList.size()]));
{code}

{{locs}} actually holds {{DatanodeInfoWithStorage}} objects, which carry the 
storageIDs. But the {{LocatedBlock}} constructor used here drops them.
{code}
  public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs) {
// By default, startOffset is unknown(-1) and corrupt is false.
this(b, locs, null, null, -1, false, EMPTY_LOCS);
  }
...
...
  public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs, String[] storageIDs,
  StorageType[] storageTypes, long startOffset,
  boolean corrupt, DatanodeInfo[] cachedLocs) {
...
DatanodeInfoWithStorage storage = new DatanodeInfoWithStorage(di,
storageIDs != null ? storageIDs[i] : null,
storageTypes != null ? storageTypes[i] : null);
this.locs[i] = storage;
{code}
It loses the storageIDs.
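
A minimal sketch of how the two-argument constructor could keep that information 
(an illustration only, not necessarily what the patch does), assuming the 
passed-in {{DatanodeInfo}} elements are {{DatanodeInfoWithStorage}}:
{code}
// Sketch only: recover storage IDs/types from DatanodeInfoWithStorage
// elements before delegating, instead of passing null for both.
public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs) {
  this(b, locs, extractStorageIDs(locs), extractStorageTypes(locs),
      -1, false, EMPTY_LOCS);
}

private static String[] extractStorageIDs(DatanodeInfo[] locs) {
  String[] ids = new String[locs.length];
  for (int i = 0; i < locs.length; i++) {
    ids[i] = locs[i] instanceof DatanodeInfoWithStorage
        ? ((DatanodeInfoWithStorage) locs[i]).getStorageID() : null;
  }
  return ids;
}

private static StorageType[] extractStorageTypes(DatanodeInfo[] locs) {
  StorageType[] types = new StorageType[locs.length];
  for (int i = 0; i < locs.length; i++) {
    types[i] = locs[i] instanceof DatanodeInfoWithStorage
        ? ((DatanodeInfoWithStorage) locs[i]).getStorageType() : null;
  }
  return types;
}
{code}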

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 

[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-27 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259918#comment-15259918
 ] 

Walter Su commented on HDFS-9958:
-

The failed tests are not related. I will commit shortly if there are no further comments.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}





[jira] [Commented] (HDFS-5280) Corrupted meta files on data nodes prevents DFClient from connecting to data nodes and updating corruption status to name node.

2016-04-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259490#comment-15259490
 ] 

Walter Su commented on HDFS-5280:
-

There are other IOExceptions that make the readBlock RPC call fail and cause the 
DN to be marked as dead. We could fix those as well.
If I understand correctly, your approach is to use a fake checksum: when the 
client reads the data, the check fails, and the client marks the block as 
corrupted instead of marking the DN as dead. Could we instead keep the client 
from reading from this DN in the first place? If the client fails to create the 
blockreader, it can tell whether the DN is dead or just the block is corrupted.

{code}
//DFSInputStream.java
try {
  blockReader = getBlockReader(targetBlock, offsetIntoBlock,
      targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
      storageType, chosenNode);
  if (connectFailedOnce) {
    DFSClient.LOG.info("Successfully connected to " + targetAddr +
        " for " + targetBlock.getBlock());
  }
  return chosenNode;
} catch (IOException ex) {
  if (ex instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) {
    ...
  } else {
    ...
    addToDeadNodes(chosenNode);
  }
}
{code}
Instead of falling into the {{else}} clause, could we have another exception, 
like {{InvalidEncryptionKeyException}}, so that when we catch it we skip the DN 
and do not add it to the dead nodes?
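
A rough sketch of that shape ({{CorruptMetaHeaderException}} is a made-up name 
here, and the surrounding code is elided):
{code}
// Sketch only: a dedicated exception type lets the client tell "this replica
// is corrupt" apart from "this datanode is unreachable" when creating the
// block reader fails.
try {
  blockReader = getBlockReader(targetBlock, offsetIntoBlock,
      targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
      storageType, chosenNode);
  return chosenNode;
} catch (CorruptMetaHeaderException cme) {   // hypothetical exception type
  // Only this replica is bad: log it, report the block as corrupt, and try
  // the next location. Do NOT call addToDeadNodes(chosenNode).
  DFSClient.LOG.warn("Corrupt meta file for " + targetBlock.getBlock()
      + " on " + chosenNode + ", trying another datanode", cme);
} catch (IOException ex) {
  // Genuine connection failures still mark the node as dead.
  addToDeadNodes(chosenNode);
}
{code}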

> Corrupted meta files on data nodes prevents DFClient from connecting to data 
> nodes and updating corruption status to name node.
> ---
>
> Key: HDFS-5280
> URL: https://issues.apache.org/jira/browse/HDFS-5280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 1.1.1, 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.7.2
> Environment: Red hat enterprise 6.4
> Hadoop-2.1.0
>Reporter: Jinghui Wang
>Assignee: Andres Perez
> Attachments: HDFS-5280.patch
>
>
> Meta files being corrupted causes the DFSClient not able to connect to the 
> datanodes to access the blocks, so DFSClient never perform a read on the 
> block, which is what throws the ChecksumException when file blocks are 
> corrupted and report to the namenode to mark the block as corrupt.  Since the 
> client never got to that far, thus the file status remain as healthy and so 
> are all the blocks.
> To replicate the error, put a file onto HDFS.
> run hadoop fsck /tmp/bogus.csv -files -blocks -location will get that 
> following output.
> FSCK started for path /tmp/bogus.csv at 11:33:29
> /tmp/bogus.csv 109 bytes, 1 block(s):  OK
> 0. blk_-4255166695856420554_5292 len=109 repl=3
> find the block/meta files for 4255166695856420554 by running 
> ssh datanode1.address find /hadoop/ -name "*4255166695856420554*" and it will 
> get the following output:
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta
> now corrupt the meta file by running 
> ssh datanode1.address "sed -i -e '1i 1234567891' 
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta" 
> now run hadoop fs -cat /tmp/bogus.csv
> will show the stack trace of DFSClient failing to connect to the data node 
> with the corrupted meta file.





[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259457#comment-15259457
 ] 

Walter Su commented on HDFS-9958:
-

{code}
@@ -1320,11 +1320,22 @@ public void findAndMarkBlockAsCorrupt(final 
ExtendedBlock blk,
  
+if (storage == null) {
+  storage = storedBlock.findStorageInfo(node);
+}
{code}
I'm surprised that {{storageID}} is null most of the time. It makes the code 
above error prone, because the block can be added or moved to another healthy 
storage on the same node. I suppose we should add the storageID field to the 
request.

+1. re-trigger the jenkins.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}





[jira] [Commented] (HDFS-10220) Namenode failover due to too long loking in LeaseManager.Monitor

2016-04-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259384#comment-15259384
 ] 

Walter Su commented on HDFS-10220:
--

bq. I think it add some readability and also because it is used twice.
I only took a quick look last time. Yeah, I'm OK with that.
Another problem I noticed when going through the details:
{code}
while(!sortedLeases.isEmpty() && sortedLeases.peek().expiredHardLimit()
  && !isMaxLockHoldToReleaseLease(start)) {
  Lease leaseToCheck = sortedLeases.poll();
  ...
  Collection files = leaseToCheck.getFiles();
 ...
  for(Long id : leaseINodeIds) {
...
} finally {
  filesLeasesChecked++;
  if (isMaxLockHoldToReleaseLease(start)) {
LOG.debug("Breaking out of checkLeases() after " +
filesLeasesChecked + " file leases checked.");
break;
  }
  }
{code}
You can't just break out of the inner for-loop: {{leaseToCheck}} has already been 
polled from the queue, so some of its files will never be closed.
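
One possible shape (a sketch based on the snippet above, not the actual patch): 
put the partially processed lease back before breaking, so its remaining files 
are picked up on the next run.
{code}
// Sketch only, following the structure of the loop quoted above.
while (!sortedLeases.isEmpty() && sortedLeases.peek().expiredHardLimit()
    && !isMaxLockHoldToReleaseLease(start)) {
  Lease leaseToCheck = sortedLeases.poll();
  boolean outOfTime = false;
  // copy, since releasing a file's lease mutates the underlying collection
  Collection<Long> leaseINodeIds = new ArrayList<>(leaseToCheck.getFiles());
  for (Long id : leaseINodeIds) {
    // ... release the lease held on this file ...
    if (isMaxLockHoldToReleaseLease(start)) {
      sortedLeases.offer(leaseToCheck);  // re-queue the partially handled lease
      outOfTime = true;
      break;
    }
  }
  if (outOfTime) {
    break;
  }
}
{code}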

> Namenode failover due to too long loking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to unresponsive namenode detected by the 
> zkfc with lot's of WARN messages (5 millions) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> On the threaddump taken by the zkfc there are lots of thread blocked due to a 
> lock.
> Looking at the code, there are a lock taken by the LeaseManager.Monitor when 
> some lease must be released. Due to the really big number of lease to be 
> released the namenode has taken too many times to release them blocking all 
> other tasks and making the zkfc thinking that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leased released each time we 
> check for lease so the lock won't be taken for a too long time period.





[jira] [Commented] (HDFS-10220) Namenode failover due to too long loking in LeaseManager.Monitor

2016-04-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257898#comment-15257898
 ] 

Walter Su commented on HDFS-10220:
--

Thanks [~ashangit] for the update.
To repeat one of my previous comments: {{isMaxLockHoldToReleaseLease}} doesn't 
need to be a function. Is it because it's called twice per iteration? I think 
checking it once per iteration would be enough.

> Namenode failover due to too long loking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to unresponsive namenode detected by the 
> zkfc with lot's of WARN messages (5 millions) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> On the threaddump taken by the zkfc there are lots of thread blocked due to a 
> lock.
> Looking at the code, there are a lock taken by the LeaseManager.Monitor when 
> some lease must be released. Due to the really big number of lease to be 
> released the namenode has taken too many times to release them blocking all 
> other tasks and making the zkfc thinking that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leased released each time we 
> check for lease so the lock won't be taken for a too long time period.





[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-24 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255787#comment-15255787
 ] 

Walter Su commented on HDFS-10301:
--

bq. BR ids are monotonically increasing.
The id values are random initially; if one starts with a large value, could it 
overflow after a long run? If the DN restarts, the value is randomized again. We 
should be careful that the NN doesn't end up rejecting all following BRs.
If a BR is split into multiple RPCs, there's no interleaving naturally, because 
the DN gets the ack before it sends the next RPC. Interleaving only exists when 
the BR is not split. I agree the bug needs to be fixed from the inside; it's just 
that eliminating interleaving for good may not be a bad idea, as it simplifies 
the problem and is also a simple workaround for this jira.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while process these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.





[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-24 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255772#comment-15255772
 ] 

Walter Su commented on HDFS-10301:
--

Thank you for your explanation. I learned a lot.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while process these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.





[jira] [Commented] (HDFS-10220) Namenode failover due to too long loking in LeaseManager.Monitor

2016-04-22 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253523#comment-15253523
 ] 

Walter Su commented on HDFS-10220:
--

I mean, saving administrators the trouble of tuning this.

> Namenode failover due to too long loking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to unresponsive namenode detected by the 
> zkfc with lot's of WARN messages (5 millions) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> On the threaddump taken by the zkfc there are lots of thread blocked due to a 
> lock.
> Looking at the code, there are a lock taken by the LeaseManager.Monitor when 
> some lease must be released. Due to the really big number of lease to be 
> released the namenode has taken too many times to release them blocking all 
> other tasks and making the zkfc thinking that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leased released each time we 
> check for lease so the lock won't be taken for a too long time period.





[jira] [Commented] (HDFS-10220) Namenode failover due to too long loking in LeaseManager.Monitor

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253354#comment-15253354
 ] 

Walter Su commented on HDFS-10220:
--

You are right. My only question is that I have no idea whether the default value 
of 1000 is the right choice, or whether throttling the rate is the right approach 
at all. I'd prefer something that works out of the box: small companies with 
small clusters have administrators who may not quite understand what the 
configuration means.

bq. Counting the time since better in term of funcionnality but I'm afraid 
about adding extra computation time on this check compare to a simple count of 
files. The idea is not to spend more times to release those lease. What is your 
feeling about it?
I believe the overhead can be ignored. Or we can calculate the elapsed time only 
after processing a small batch.

I saw that {{BlockManager.BlockReportProcessingThread}} releases the writeLock if 
it holds it for more than 4ms. Do you think the same idea works here?
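
A minimal sketch of that time-budgeted pattern (names here are assumed; this is 
not the {{BlockReportProcessingThread}} code):
{code}
// Sketch only: work in batches and never hold the namesystem write lock
// longer than a small budget in one stretch.
private static final long MAX_LOCK_HOLD_MS = 4;

void releaseExpiredLeases(Queue<Lease> expired) {
  while (!expired.isEmpty()) {
    namesystem.writeLock();
    long start = Time.monotonicNow();
    try {
      while (!expired.isEmpty()
          && Time.monotonicNow() - start < MAX_LOCK_HOLD_MS) {
        releaseLease(expired.poll());   // hypothetical per-lease helper
      }
    } finally {
      // Let other waiters (RPC handlers, the ZKFC health check) grab the
      // lock before the next batch.
      namesystem.writeUnlock();
    }
  }
}
{code}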

> Namenode failover due to too long loking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to unresponsive namenode detected by the 
> zkfc with lot's of WARN messages (5 millions) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> On the threaddump taken by the zkfc there are lots of thread blocked due to a 
> lock.
> Looking at the code, there are a lock taken by the LeaseManager.Monitor when 
> some lease must be released. Due to the really big number of lease to be 
> released the namenode has taken too many times to release them blocking all 
> other tasks and making the zkfc thinking that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leased released each time we 
> check for lease so the lock won't be taken for a too long time period.





[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253236#comment-15253236
 ] 

Walter Su commented on HDFS-10301:
--

I like your idea of counting storages with the same reportId, and doing no purge 
if there's any interleaving. I guess {{rpcsSeen}} can be removed or replaced by 
{{storagesSeen}}?

Processing the retransmitted reports is kind of a waste of resources. I think the 
best approach is, as Colin said, "to remove existing DataNode storage report RPCs 
with the old ID from the queue when we receive one with a new block report ID." 
Let's treat that as an optimization in another jira.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while process these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.





[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253204#comment-15253204
 ] 

Walter Su commented on HDFS-10301:
--

The handler threads will wait anyway, either on the queue monitor or on the fsn 
writeLock. The queue processing thread will still contend for the fsn writeLock. 
In the end, there's no difference.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while process these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.





[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253181#comment-15253181
 ] 

Walter Su commented on HDFS-10301:
--

bq. Enabling HDFS-9198 will fifo process BRs. It doesn't solve this 
implementation bug but virtually eliminates it from occurring.
bq. This addresses Daryn's comment rather than solving the reported bug, as BTW 
Daryn correctly stated.
That's incorrect. Please run the test in the 001 patch with and without the fix 
and you'll see the difference. It does solve the issue, because:

The bug only exists when the reports are contained in one RPC. If they are split 
into multiple RPCs, it's not a problem, because the {{rpcsSeen}} guard prevents it 
from happening. So my approach is to process the reports contained in one RPC 
contiguously, by putting them into the queue atomically.
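
Conceptually (queue and helper names are assumed; this is not the actual NameNode 
code), that means enqueuing all storage reports from one RPC as a single unit of 
work:
{code}
// Sketch only: the processing thread handles the whole RPC's storage reports
// back-to-back, so a retransmitted report cannot be interleaved between them.
queue.put(new Runnable() {
  @Override
  public void run() {
    for (StorageBlockReport r : reports) {               // all reports from this RPC
      processReport(nodeDescriptor, r.getStorage(), r);  // hypothetical helper
    }
  }
});
{code}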


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while process these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.





[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10301:
-
Assignee: (was: Walter Su)

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while process these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.





[jira] [Commented] (HDFS-10220) Namenode failover due to too long loking in LeaseManager.Monitor

2016-04-20 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251150#comment-15251150
 ] 

Walter Su commented on HDFS-10220:
--

1. {{isMaxFilesCheckedToReleaseLease}} is not required to be a function.
2. To repeat what [~vinayrpet] said, {{removeFilesInLease(leaseToCheck, removing);}} 
may not be required.
3. The LOG.warn("..") is kind of verbose.
4. I think the config should stay internal; it's an implementation detail. The 
re-check interval is 2s and is hard-coded too. Besides, it's too complicated for 
users to pick the right value. Instead of counting the files, I prefer counting 
the time: if the monitor holds the lock for too long, log a warning and break out 
for a while.
5. BTW, HDFS-9311 should solve this issue.


> Namenode failover due to too long loking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to unresponsive namenode detected by the 
> zkfc with lot's of WARN messages (5 millions) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> On the threaddump taken by the zkfc there are lots of thread blocked due to a 
> lock.
> Looking at the code, there are a lock taken by the LeaseManager.Monitor when 
> some lease must be released. Due to the really big number of lease to be 
> released the namenode has taken too many times to release them blocking all 
> other tasks and making the zkfc thinking that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leased released each time we 
> check for lease so the lock won't be taken for a too long time period.





[jira] [Commented] (HDFS-5280) Corrupted meta files on data nodes prevents DFClient from connecting to data nodes and updating corruption status to name node.

2016-04-20 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249649#comment-15249649
 ] 

Walter Su commented on HDFS-5280:
-

+1 for catching the exception. The same exception will cause {{BlockScanner}} to 
shut down.
We should be cautious about catching any {{RuntimeException}}. Instead of adding 
a {{catch}} to the outer try-finally clause, how about catching the exact 
exception at the place where it is thrown, like what we did in 
{{FSNamesystem.java}}:
{code}
//FSNamesystem.java
try {
  checksumType = DataChecksum.Type.valueOf(checksumTypeStr);
} catch (IllegalArgumentException iae) {
  throw new IOException("Invalid checksum type in "
      + DFS_CHECKSUM_TYPE_KEY + ": " + checksumTypeStr);
}
{code}

> Corrupted meta files on data nodes prevents DFClient from connecting to data 
> nodes and updating corruption status to name node.
> ---
>
> Key: HDFS-5280
> URL: https://issues.apache.org/jira/browse/HDFS-5280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 1.1.1, 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.7.2
> Environment: Red hat enterprise 6.4
> Hadoop-2.1.0
>Reporter: Jinghui Wang
>Assignee: Andres Perez
> Attachments: HDFS-5280.patch
>
>
> Meta files being corrupted causes the DFSClient not able to connect to the 
> datanodes to access the blocks, so DFSClient never perform a read on the 
> block, which is what throws the ChecksumException when file blocks are 
> corrupted and report to the namenode to mark the block as corrupt.  Since the 
> client never got to that far, thus the file status remain as healthy and so 
> are all the blocks.
> To replicate the error, put a file onto HDFS.
> run hadoop fsck /tmp/bogus.csv -files -blocks -location will get that 
> following output.
> FSCK started for path /tmp/bogus.csv at 11:33:29
> /tmp/bogus.csv 109 bytes, 1 block(s):  OK
> 0. blk_-4255166695856420554_5292 len=109 repl=3
> find the block/meta files for 4255166695856420554 by running 
> ssh datanode1.address find /hadoop/ -name "*4255166695856420554*" and it will 
> get the following output:
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta
> now corrupt the meta file by running 
> ssh datanode1.address "sed -i -e '1i 1234567891' 
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta" 
> now run hadoop fs -cat /tmp/bogus.csv
> will show the stack trace of DFSClient failing to connect to the data node 
> with the corrupted meta file.





[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-19 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249312#comment-15249312
 ] 

Walter Su commented on HDFS-9958:
-

bq. we fix countNodes().corruptReplicas() to return the number after going thru 
all storages( irrespective of their state) that have the corruptNodes (in this 
case), since numNodes() is storage state agnostic.
I think having {{countNodes(blk)}} go through all storages is unnecessary. Also, 
I think {{numMachines}} should only include NORMAL and READ_ONLY storages, so 
having {{createLocatedBlock(..)}} go through all storages is unnecessary as well.
{code}
if (numMachines > 0) {
  for(DatanodeStorageInfo storage : blocksMap.getStorages(blk)) {
{code}

BTW, although it's not related to this topic, I think 
{{findAndMarkBlockAsCorrupt(..)}} shouldn't support adding the block to the map 
when the storage is not found.

ping [~jingzhao] to check if he has any comment.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch Assertion errors, 
> otherwise it goes ahead and tries to create LocatedBlock object for that is 
> not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}





[jira] [Created] (HDFS-10316) revisit corrupt replicas count

2016-04-19 Thread Walter Su (JIRA)
Walter Su created HDFS-10316:


 Summary: revisit corrupt replicas count
 Key: HDFS-10316
 URL: https://issues.apache.org/jira/browse/HDFS-10316
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Walter Su


A DN has 4 types of storages:
1. NORMAL
2. READ_ONLY
3. FAILED
4. (missing/pruned)

blocksMap.numNodes(blk) counts 1,2,3
blocksMap.getStorages(blk) counts 1,2,3

countNodes(blk).corruptReplicas() counts 1,2
corruptReplicas counts 1,2,3,4. Because findAndMarkBlockAsCorrupt(..) supports 
adding blk to the map even if the storage is not found.

The inconsistency causes bugs like HDFS-9958.
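For illustration, a minimal self-contained sketch of the inconsistency (the enum and helpers below are hypothetical stand-ins, not the actual {{BlockManager}} code): a corrupt replica sitting on a FAILED storage is invisible to the countNodes-style "live" view but still counted by the blocksMap-style view.
{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the DN storage states listed above.
enum StorageState { NORMAL, READ_ONLY, FAILED }   // type 4 (missing/pruned) simply isn't present

public class CorruptCountSketch {

  // countNodes(blk)-style view: only "live" storages (types 1 and 2) are considered.
  static int liveCorruptCount(List<StorageState> corruptReplicaStorages) {
    int n = 0;
    for (StorageState s : corruptReplicaStorages) {
      if (s == StorageState.NORMAL || s == StorageState.READ_ONLY) {
        n++;
      }
    }
    return n;
  }

  // blocksMap.numNodes(blk)-style view: every storage still attached (types 1, 2, 3) is counted.
  static int blocksMapCount(List<StorageState> allStorages) {
    return allStorages.size();
  }

  public static void main(String[] args) {
    // A single corrupt replica that happens to sit on a FAILED storage.
    List<StorageState> storages = Arrays.asList(StorageState.FAILED);
    System.out.println(liveCorruptCount(storages));  // 0 -> numCorruptNodes == 0
    System.out.println(blocksMapCount(storages));    // 1 -> numNodes/numMachines still includes it
  }
}
{code}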




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds

2016-04-19 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9744:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8. Thanks [~templedf] for the review.  
Thanks [~jojochuang] for the report. And thanks [~linyiqun] for the 
contribution!

> TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
> ---
>
> Key: HDFS-9744
> URL: https://issues.apache.org/jira/browse/HDFS-9744
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: HDFS-9744.001.patch
>
>
> I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/
> Looking at the log, it does not look like the test got stuck. On my local 
> machine, this test took 219 seconds. It is likely that this test takes more 
> than 300 seconds to complete on a busy jenkins slave. I think it is 
> reasonable to set a longer time out value, or reduce the number of blocks to 
> reduce the duration of the test.
> Error Message
> {noformat}
> test timed out after 300000 milliseconds
> {noformat}
> Stacktrace
> {noformat}
> java.lang.Exception: test timed out after 300000 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds

2016-04-19 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247547#comment-15247547
 ] 

Walter Su commented on HDFS-9744:
-

+1. will commit shortly.

> TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
> ---
>
> Key: HDFS-9744
> URL: https://issues.apache.org/jira/browse/HDFS-9744
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: test
> Attachments: HDFS-9744.001.patch
>
>
> I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/
> Looking at the log, it does not look like the test got stuck. On my local 
> machine, this test took 219 seconds. It is likely that this test takes more 
> than 300 seconds to complete on a busy jenkins slave. I think it is 
> reasonable to set a longer time out value, or reduce the number of blocks to 
> reduce the duration of the test.
> Error Message
> {noformat}
> test timed out after 300000 milliseconds
> {noformat}
> Stacktrace
> {noformat}
> java.lang.Exception: test timed out after 300000 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently

2016-04-19 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10284:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2. Thanks [~vinayrpet], [~brahmareddy]  for the 
review, and thanks [~liuml07] for the contribution!

> o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode 
> fails intermittently
> -
>
> Key: HDFS-10284
> URL: https://issues.apache.org/jira/browse/HDFS-10284
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, 
> HDFS-10284.002.patch, HDFS-10284.003.patch
>
>
> *Stacktrace*
> {code}
> org.mockito.exceptions.misusing.UnfinishedStubbingException: 
> Unfinished stubbing detected here:
> -> at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
>  1. missing thenReturn()
>  2. although stubbed methods may return mocks, you cannot inline mock 
> creation (mock()) call inside a thenReturn method (see issue 53)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> {code}
> Sample failing pre-commit UT: 
> https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently

2016-04-19 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247468#comment-15247468
 ] 

Walter Su commented on HDFS-10284:
--

+1. will commit shortly.

> o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode 
> fails intermittently
> -
>
> Key: HDFS-10284
> URL: https://issues.apache.org/jira/browse/HDFS-10284
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch, 
> HDFS-10284.002.patch, HDFS-10284.003.patch
>
>
> *Stacktrace*
> {code}
> org.mockito.exceptions.misusing.UnfinishedStubbingException: 
> Unfinished stubbing detected here:
> -> at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
>  1. missing thenReturn()
>  2. although stubbed methods may return mocks, you cannot inline mock 
> creation (mock()) call inside a thenReturn method (see issue 53)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> {code}
> Sample failing pre-commit UT: 
> https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing

2016-04-19 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247445#comment-15247445
 ] 

Walter Su commented on HDFS-10291:
--

cherry-picked to trunk.

> TestShortCircuitLocalRead failing
> -
>
> Key: HDFS-10291
> URL: https://issues.apache.org/jira/browse/HDFS-10291
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 2.8.0
>
> Attachments: HDFS-10291-001.patch
>
>
> {{TestShortCircuitLocalRead}} failing as length of read is considered off end 
> of buffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-19 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247432#comment-15247432
 ] 

Walter Su commented on HDFS-9958:
-

Thanks [~kshukla] for the update. I've noticed that 
{{testArrayOutOfBoundsException()}} failed. It tries to simulate 
{{DatanodeProtocol#reportBadBlocks(..)}} from the 3rd DN, but "TEST" is not a 
real storageID, so the block isn't added to the blocksMap. A fix is to get the 
real storageID from the 3rd DN (a rough sketch follows). Could you re-post a 
patch to fix this?
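A rough sketch of the suggested fix, under stated assumptions: the harness interface below is a hypothetical stand-in for whatever the test fixtures already provide, not a real HDFS API.
{code}
// Sketch only: hypothetical test-harness interface, not an actual HDFS class.
public class ReportFromRealStorageSketch {

  interface TestHarness {
    String lookupRealStorageId(int datanodeIndex, long blockId);             // hypothetical
    void reportBadBlock(int datanodeIndex, long blockId, String storageId);  // hypothetical
  }

  static void reportFromThirdDatanode(TestHarness harness, long blockId) {
    // Ask the 3rd DN (index 2) which storage really holds the replica, instead
    // of passing the fabricated "TEST" storage ID. With a real storage ID,
    // findAndMarkBlockAsCorrupt(..) can find the block in the blocksMap.
    String realStorageId = harness.lookupRealStorageId(2, blockId);
    harness.reportBadBlock(2, blockId, realStorageId);
  }
}
{code}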

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes whose storage state is NORMAL, so in the case 
> where the corrupt replica is on a failed storage, numCorruptNodes comes out 
> as zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert fires only if assertions are enabled; otherwise the code goes ahead 
> and tries to create a LocatedBlock from a {{machines}} array element that 
> was never populated.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-19 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10301:
-
Assignee: Walter Su
  Status: Patch Available  (was: Open)

Uploaded a patch. Kindly review.

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from the 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombie. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-19 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10301:
-
Attachment: HDFS-10301.01.patch

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from the 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombie. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247129#comment-15247129
 ] 

Walter Su commented on HDFS-10301:
--

Oh, I see. In this case the reports are not split, and because the for-loop is 
outside the lock, the two for-loops interleaved.
{code}
for (int r = 0; r < reports.length; r++) {
{code}
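For illustration only, a self-contained sketch of the interleaving (hypothetical code, not the actual {{BlockManager}}): when the lock is taken per storage, storages from two in-flight reports of the same DataNode can be processed alternately; the second method serializes a whole report and is shown only to make the race concrete, not as a proposed fix.
{code}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class BlockReportInterleaveSketch {
  private final ReentrantLock fsnLock = new ReentrantLock();

  // Interleaving-prone shape: the loop is outside the lock, so between two
  // storages another thread may grab the lock and process a storage from a
  // second, retransmitted report of the same DataNode.
  void processPerStorageLocking(List<String> storageReports) {
    for (String storage : storageReports) {
      fsnLock.lock();
      try {
        processStorage(storage);
      } finally {
        fsnLock.unlock();
      }
    }
  }

  // Contrast: one report processed atomically with respect to another.
  void processWholeReportUnderLock(List<String> storageReports) {
    fsnLock.lock();
    try {
      for (String storage : storageReports) {
        processStorage(storage);
      }
    } finally {
      fsnLock.unlock();
    }
  }

  private void processStorage(String storage) {
    // hypothetical per-storage work
  }
}
{code}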

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from the 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombie. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError form DataTransfer thread.

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247037#comment-15247037
 ] 

Walter Su commented on HDFS-9684:
-

My previous comment is incorrect. It turns out that the MR tasks had consumed 
all of the virtual memory.

> DataNode stopped sending heartbeat after getting OutOfMemoryError form 
> DataTransfer thread.
> ---
>
> Key: HDFS-9684
> URL: https://issues.apache.org/jira/browse/HDFS-9684
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Attachments: HDFS-9684.01.patch
>
>
> {noformat}
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246996#comment-15246996
 ] 

Walter Su commented on HDFS-10301:
--

1. The IPC reader is single-threaded by default. If it's multi-threaded, the 
order in which RPC requests are put into the {{callQueue}} is unspecified.
2. The IPC {{callQueue}} is FIFO.
3. IPC handlers are multi-threaded. If two handlers are both waiting on the fsn 
lock, the entry order depends on the fairness of the lock.
bq. When constructed as fair, threads contend for entry using an 
*approximately* arrival-order policy. When the currently held lock is released 
either the longest-waiting single writer thread will be assigned the write 
lock... (quoted from 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html)

I think that if the DN doesn't get an ack from the NN, it shouldn't assume any 
arrival/processing order (especially when re-establishing a connection). Still, 
I'm curious about how the interleaving happened. Any thoughts?
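For reference, a minimal sketch of the fairness point (hypothetical handler code, not the actual FSNamesystem locking): even a fair {{ReentrantReadWriteLock}} only promises approximately-arrival-order entry, which is weaker than the strict FIFO order of the {{callQueue}}.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockSketch {
  // true = fair mode: waiting threads contend in *approximately* arrival order.
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock(true);

  void handleBlockReport(Runnable processReport) {
    fsnLock.writeLock().lock();
    try {
      processReport.run();   // hypothetical report-processing work
    } finally {
      fsnLock.writeLock().unlock();
    }
  }
}
{code}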

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from the 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombie. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10275:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8, branch-2.7. Thanks [~linyiqun] for 
the contribution!

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.7.3
>
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failed info 
> show these:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout occurs at line 279 in {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently counts 
> 0 in each iteration of creating a file, and this leads to retries until the 
> timeout.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that we use {{SimulatedFSDataset}} for the 
> time-spent test. In {{SimulatedFSDataset}}, the inner class method 
> {{SimulatedOutputStream#write}} is used to account for the write time, and 
> that method just updates the {{length}} and throws its data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation hardly costs any time. We should create the file in a 
> real way instead of the simulated way. I have verified locally that the test 
> passes on the first attempt once the simulated way is removed, while with the 
> old way the test retries many times to accumulate write time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245569#comment-15245569
 ] 

Walter Su commented on HDFS-10275:
--

Sorry, I didn't see that. The patch LGTM. +1.

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failed info 
> show these:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout occurs at line 279 in {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently counts 
> 0 in each iteration of creating a file, and this leads to retries until the 
> timeout.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that we use {{SimulatedFSDataset}} for the 
> time-spent test. In {{SimulatedFSDataset}}, the inner class method 
> {{SimulatedOutputStream#write}} is used to account for the write time, and 
> that method just updates the {{length}} and throws its data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation hardly costs any time. We should create the file in a 
> real way instead of the simulated way. I have verified locally that the test 
> passes on the first attempt once the simulated way is removed, while with the 
> old way the test retries many times to accumulate write time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245394#comment-15245394
 ] 

Walter Su commented on HDFS-10275:
--

Good analysis! I think a better way to do this is to use a real FSDataset: just 
remove {{SimulatedFSDataset.setFactory(conf);}}. What do you think?
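A rough sketch of that suggestion, assuming a MiniDFSCluster-based setup like the existing test (the class and method names here are illustrative, not the actual TestDataNodeMetrics code): with no simulated dataset factory installed, writes go through a real FsDataset and {{TotalWriteTime}} accumulates non-zero values.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class RealDatasetSetupSketch {
  static MiniDFSCluster startClusterWithRealDataset() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Note: no SimulatedFSDataset.setFactory(conf) call here, so the cluster
    // falls back to the default (real) FsDataset implementation and the
    // DataNode write-time metrics are actually measured.
    return new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
  }
}
{code}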

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failed info 
> show these:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The timeout occurs at line 279 in {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently counts 
> 0 in each iteration of creating a file, and this leads to retries until the 
> timeout.
> I debugged the test locally. The most likely reason the {{TotalWriteTime}} 
> metric always counts 0 is that we use {{SimulatedFSDataset}} for the 
> time-spent test. In {{SimulatedFSDataset}}, the inner class method 
> {{SimulatedOutputStream#write}} is used to account for the write time, and 
> that method just updates the {{length}} and throws its data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation hardly costs any time. We should create the file in a 
> real way instead of the simulated way. I have verified locally that the test 
> passes on the first attempt once the simulated way is removed, while with the 
> old way the test retries many times to accumulate write time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently

2016-04-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245332#comment-15245332
 ] 

Walter Su commented on HDFS-10284:
--

bq. I think it's due to mocking fsn while being concurrently accessed by 
another thread (smmthread).
Good point.
bq. Stubbing or verification of a shared mock from different threads is NOT the 
proper way of testing because it will always lead to intermittent behavior. 
(quote from https://github.com/mockito/mockito/wiki/FAQ)
bq. feel free to use mocks concurrently, however prepare (stub) them before the 
concurrency starts. (quote from 
https://code.google.com/archive/p/mockito/issues/301)

So I think we should move the stubbing 
{code}
doReturn(true).when(fsn).inTransitionToActive();
{code}
before test starts, at least before {{smmthread}} is started. The patch looks 
really good.
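A minimal self-contained sketch of that ordering (the {{Namesystem}} interface is a hypothetical stand-in, not the real {{FSNamesystem}}): all stubbing of the shared mock finishes before the other thread starts, so no thread can observe an unfinished stub.
{code}
import static org.mockito.Mockito.doReturn;
import static org.mockito.Mockito.mock;

public class StubBeforeConcurrencySketch {
  interface Namesystem { boolean inTransitionToActive(); }   // hypothetical stand-in

  static void runTest() throws InterruptedException {
    final Namesystem fsn = mock(Namesystem.class);

    // Stub the shared mock *before* any other thread can touch it.
    doReturn(true).when(fsn).inTransitionToActive();

    Thread smmthread = new Thread(new Runnable() {
      @Override
      public void run() {
        // Concurrent reader of the mock: it only ever sees finished stubbing.
        fsn.inTransitionToActive();
      }
    });
    smmthread.start();
    smmthread.join();
  }
}
{code}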

bq. I found the BlockManagerSafeMode$SafeModeMonitor#canLeave is not checking 
the namesystem#inTransitionToActive()
It makes sense. Would you create another jira for this?

> o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode 
> fails intermittently
> -
>
> Key: HDFS-10284
> URL: https://issues.apache.org/jira/browse/HDFS-10284
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch
>
>
> *Stacktrace*
> {code}
> org.mockito.exceptions.misusing.UnfinishedStubbingException: 
> Unfinished stubbing detected here:
> -> at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
>  1. missing thenReturn()
>  2. although stubbed methods may return mocks, you cannot inline mock 
> creation (mock()) call inside a thenReturn method (see issue 53)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> {code}
> Sample failing pre-commit UT: 
> https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing

2016-04-17 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245154#comment-15245154
 ] 

Walter Su commented on HDFS-10291:
--

+1.

> TestShortCircuitLocalRead failing
> -
>
> Key: HDFS-10291
> URL: https://issues.apache.org/jira/browse/HDFS-10291
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HDFS-10291-001.patch
>
>
> {{TestShortCircuitLocalRead}} failing as length of read is considered off end 
> of buffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2016-04-17 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9412:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8. Thanks [~He Tianyi] for contribution!

> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Fix For: 2.8.0
>
> Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, 
> HDFS-9412.0002.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take 
> a long time to complete (probably several seconds if the number of blocks is 
> large).
> During this period, other threads attempting to acquire the write lock will 
> wait. In an extreme case, the RPC handlers are all occupied by one reader 
> thread calling {{getBlocks}} plus other threads waiting for the write lock, 
> and the RPC server acts as if it were hung. Unfortunately, this tends to 
> happen in heavily loaded clusters, since read operations come and go fast 
> (they do not need to wait), leaving write operations waiting.
> It looks like we can optimize this the way the DN block report was optimized 
> in the past: by splitting the operation into smaller sub-operations and 
> letting other threads do their work between each sub-operation. The whole 
> result is still returned at once, though (one thing different from the DN 
> block report).
> I am not sure whether this will work. Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2016-04-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240996#comment-15240996
 ] 

Walter Su commented on HDFS-9412:
-

{{TestBalancer}} passes locally. +1 for the last patch.

> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, 
> HDFS-9412.0002.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take 
> a long time to complete (probably several seconds if the number of blocks is 
> large).
> During this period, other threads attempting to acquire the write lock will 
> wait. In an extreme case, the RPC handlers are all occupied by one reader 
> thread calling {{getBlocks}} plus other threads waiting for the write lock, 
> and the RPC server acts as if it were hung. Unfortunately, this tends to 
> happen in heavily loaded clusters, since read operations come and go fast 
> (they do not need to wait), leaving write operations waiting.
> It looks like we can optimize this the way the DN block report was optimized 
> in the past: by splitting the operation into smaller sub-operations and 
> letting other threads do their work between each sub-operation. The whole 
> result is still returned at once, though (one thing different from the DN 
> block report).
> I am not sure whether this will work. Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2016-04-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240455#comment-15240455
 ] 

Walter Su commented on HDFS-9412:
-

Thank you for updating. The test {{TestGetBlocks}} failed. Do you mind updating 
the test accordingly and fixing the checkstyle issue as well?

> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Attachments: HDFS-9412..patch, HDFS-9412.0001.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take 
> a long time to complete (probably several seconds if the number of blocks is 
> large).
> During this period, other threads attempting to acquire the write lock will 
> wait. In an extreme case, the RPC handlers are all occupied by one reader 
> thread calling {{getBlocks}} plus other threads waiting for the write lock, 
> and the RPC server acts as if it were hung. Unfortunately, this tends to 
> happen in heavily loaded clusters, since read operations come and go fast 
> (they do not need to wait), leaving write operations waiting.
> It looks like we can optimize this the way the DN block report was optimized 
> in the past: by splitting the operation into smaller sub-operations and 
> letting other threads do their work between each sub-operation. The whole 
> result is still returned at once, though (one thing different from the DN 
> block report).
> I am not sure whether this will work. Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2016-04-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239103#comment-15239103
 ] 

Walter Su commented on HDFS-9412:
-

One thread holding a readLock for too long is much like holding a writeLock; we 
should avoid that. And after HDFS-8824 the small blocks are unused anyway, so 
there's no point in sending them to the balancer.
Hi [~He Tianyi], do you mind rebasing the patch?

> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Attachments: HDFS-9412..patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take 
> a long time to complete (probably several seconds if the number of blocks is 
> large).
> During this period, other threads attempting to acquire the write lock will 
> wait. In an extreme case, the RPC handlers are all occupied by one reader 
> thread calling {{getBlocks}} plus other threads waiting for the write lock, 
> and the RPC server acts as if it were hung. Unfortunately, this tends to 
> happen in heavily loaded clusters, since read operations come and go fast 
> (they do not need to wait), leaving write operations waiting.
> It looks like we can optimize this the way the DN block report was optimized 
> in the past: by splitting the operation into smaller sub-operations and 
> letting other threads do their work between each sub-operation. The whole 
> result is still returned at once, though (one thing different from the DN 
> block report).
> I am not sure whether this will work. Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected

2016-04-13 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9772:

Labels: test  (was: )
  Priority: Minor  (was: Major)
Issue Type: Test  (was: Bug)

> TestBlockReplacement#testThrottler doesn't work as expected
> ---
>
> Key: HDFS-9772
> URL: https://issues.apache.org/jira/browse/HDFS-9772
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: test
> Fix For: 2.7.3
>
> Attachments: HDFS.001.patch
>
>
> In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to 
> calculate the resulting bandwidth. It uses the variable {{totalBytes}} rather 
> than the final variable {{TOTAL_BYTES}} (whose value is assigned to 
> {{bytesToSend}}). {{totalBytes}} is never updated, so 
> {{totalBytes*1000/(end-start)}} is always 0 and the comparison is always true. 
> The method code is below:
> {code}
> @Test
>   public void testThrottler() throws IOException {
> Configuration conf = new HdfsConfiguration();
> FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
> long bandwidthPerSec = 1024*1024L;
> final long TOTAL_BYTES =6*bandwidthPerSec; 
> long bytesToSend = TOTAL_BYTES; 
> long start = Time.monotonicNow();
> DataTransferThrottler throttler = new 
> DataTransferThrottler(bandwidthPerSec);
> long totalBytes = 0L;
> long bytesSent = 1024*512L; // 0.5MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> bytesSent = 1024*768L; // 0.75MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException ignored) {}
> throttler.throttle(bytesToSend);
> long end = Time.monotonicNow();
> assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected

2016-04-13 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9772:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8, branch-2.7. Thanks [~linyiqun] for 
the contribution. 

> TestBlockReplacement#testThrottler doesn't work as expected
> ---
>
> Key: HDFS-9772
> URL: https://issues.apache.org/jira/browse/HDFS-9772
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.7.3
>
> Attachments: HDFS.001.patch
>
>
> In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to 
> calculate the resulting bandwidth. It uses the variable {{totalBytes}} rather 
> than the final variable {{TOTAL_BYTES}} (whose value is assigned to 
> {{bytesToSend}}). {{totalBytes}} is never updated, so 
> {{totalBytes*1000/(end-start)}} is always 0 and the comparison is always true. 
> The method code is below:
> {code}
> @Test
>   public void testThrottler() throws IOException {
> Configuration conf = new HdfsConfiguration();
> FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
> long bandwidthPerSec = 1024*1024L;
> final long TOTAL_BYTES =6*bandwidthPerSec; 
> long bytesToSend = TOTAL_BYTES; 
> long start = Time.monotonicNow();
> DataTransferThrottler throttler = new 
> DataTransferThrottler(bandwidthPerSec);
> long totalBytes = 0L;
> long bytesSent = 1024*512L; // 0.5MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> bytesSent = 1024*768L; // 0.75MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException ignored) {}
> throttler.throttle(bytesToSend);
> long end = Time.monotonicNow();
> assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected

2016-04-13 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9772:

Summary: TestBlockReplacement#testThrottler doesn't work as expected  (was: 
TestBlockReplacement#testThrottler use falut variable to calculate bandwidth)

> TestBlockReplacement#testThrottler doesn't work as expected
> ---
>
> Key: HDFS-9772
> URL: https://issues.apache.org/jira/browse/HDFS-9772
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS.001.patch
>
>
> In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to 
> calculate the resulting bandwidth. It uses the variable {{totalBytes}} rather 
> than the final variable {{TOTAL_BYTES}} (whose value is assigned to 
> {{bytesToSend}}). {{totalBytes}} is never updated, so 
> {{totalBytes*1000/(end-start)}} is always 0 and the comparison is always true. 
> The method code is below:
> {code}
> @Test
>   public void testThrottler() throws IOException {
> Configuration conf = new HdfsConfiguration();
> FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
> long bandwidthPerSec = 1024*1024L;
> final long TOTAL_BYTES =6*bandwidthPerSec; 
> long bytesToSend = TOTAL_BYTES; 
> long start = Time.monotonicNow();
> DataTransferThrottler throttler = new 
> DataTransferThrottler(bandwidthPerSec);
> long totalBytes = 0L;
> long bytesSent = 1024*512L; // 0.5MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> bytesSent = 1024*768L; // 0.75MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException ignored) {}
> throttler.throttle(bytesToSend);
> long end = Time.monotonicNow();
> assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9772) TestBlockReplacement#testThrottler use falut variable to calculate bandwidth

2016-04-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238984#comment-15238984
 ] 

Walter Su commented on HDFS-9772:
-

+1.

> TestBlockReplacement#testThrottler use falut variable to calculate bandwidth
> 
>
> Key: HDFS-9772
> URL: https://issues.apache.org/jira/browse/HDFS-9772
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS.001.patch
>
>
> In {{TestBlockReplacement#testThrottler}}, the wrong variable is used to 
> calculate the resulting bandwidth. It uses the variable {{totalBytes}} rather 
> than the final variable {{TOTAL_BYTES}} (whose value is assigned to 
> {{bytesToSend}}). {{totalBytes}} is never updated, so 
> {{totalBytes*1000/(end-start)}} is always 0 and the comparison is always true. 
> The method code is below:
> {code}
> @Test
>   public void testThrottler() throws IOException {
> Configuration conf = new HdfsConfiguration();
> FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
> long bandwidthPerSec = 1024*1024L;
> final long TOTAL_BYTES =6*bandwidthPerSec; 
> long bytesToSend = TOTAL_BYTES; 
> long start = Time.monotonicNow();
> DataTransferThrottler throttler = new 
> DataTransferThrottler(bandwidthPerSec);
> long totalBytes = 0L;
> long bytesSent = 1024*512L; // 0.5MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> bytesSent = 1024*768L; // 0.75MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException ignored) {}
> throttler.throttle(bytesToSend);
> long end = Time.monotonicNow();
> assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9825) Balancer should not terminate if only one of the namenodes has error

2016-04-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238891#comment-15238891
 ] 

Walter Su commented on HDFS-9825:
-

The patch looks pretty good. Could you rebase it? And one question:
{code}
  for(int iteration = 0;; iteration++) {
final Map results = new LinkedHashMap<>();
for(NameNodeConnector nnc : connectors) {
{code}
Does it need to retry the succeeded/failed namenodes in each iteration? One 
block pool may take much longer than the others (a rough sketch of what I mean 
follows).
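To make the question concrete, a rough sketch under my own assumptions (the types below are hypothetical stand-ins, not the actual Balancer or {{NameNodeConnector}} code) of skipping namenodes that already succeeded or permanently failed in earlier iterations:
{code}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiNnBalanceSketch {
  enum Result { IN_PROGRESS, SUCCESS, FAILED }                // hypothetical per-NN outcome

  interface NnConnector {                                     // hypothetical stand-in
    String id();
    Result runOneIteration();
  }

  static Map<String, Result> balance(List<NnConnector> connectors) {
    Map<String, Result> results = new LinkedHashMap<String, Result>();
    boolean anyInProgress = true;
    for (int iteration = 0; anyInProgress; iteration++) {
      anyInProgress = false;
      for (NnConnector nnc : connectors) {
        Result prev = results.get(nnc.id());
        if (prev == Result.SUCCESS || prev == Result.FAILED) {
          continue;  // don't re-run namenodes that already finished or failed
        }
        Result r = nnc.runOneIteration();
        results.put(nnc.id(), r);
        if (r == Result.IN_PROGRESS) {
          anyInProgress = true;
        }
      }
    }
    return results;
  }
}
{code}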

> Balancer should not terminate if only one of the namenodes has error
> 
>
> Key: HDFS-9825
> URL: https://issues.apache.org/jira/browse/HDFS-9825
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h9825_20160217.patch, h9825_20160218.patch, 
> h9825_20160218b.patch
>
>
> Currently, the Balancer terminates if only one of the namenodes has error in 
> federation setting.  Instead, it should continue balancing the cluster with 
> the remaining namenodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9476) TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail

2016-04-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238658#comment-15238658
 ] 

Walter Su commented on HDFS-9476:
-

+1.

> TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail
> -
>
> Key: HDFS-9476
> URL: https://issues.apache.org/jira/browse/HDFS-9476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Akira AJISAKA
> Attachments: HDFS-9476.01.patch
>
>
> This test occasionally fail. For example, the most recent one is:
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2587/
> Error Message
> {noformat}
> Cannot obtain block length for 
> LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020;
>  getBlockSize()=1024; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
> {noformat}
> Stacktrace
> {noformat}
> java.io.IOException: Cannot obtain block length for 
> LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020;
>  getBlockSize()=1024; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:399)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:343)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275)
>   at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:265)
>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1046)
>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1011)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:177)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:213)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:228)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:600)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:622)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9826) Erasure Coding: Postpone the recovery work for a configurable time period

2016-04-12 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238589#comment-15238589
 ] 

Walter Su commented on HDFS-9826:
-

Good thought. The current implementation, {{LowRedundancyBlocks}}, uses a 
multi-level priority queue, so the blocks at highest risk are always processed 
first. Doesn't that achieve the same goal as you proposed?
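For illustration, a tiny self-contained sketch of the multi-level idea (hypothetical and much simpler than the real {{LowRedundancyBlocks}}): blocks are bucketed by risk level and reconstruction always drains the most urgent non-empty bucket first, so a block with a single missing internal block naturally waits behind higher-risk blocks.
{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PriorityQueuesSketch {
  // Level 0 = highest risk (e.g. several internal blocks missing); the last level = lowest risk.
  private final List<Deque<String>> levels = new ArrayList<Deque<String>>();

  public PriorityQueuesSketch(int numLevels) {
    for (int i = 0; i < numLevels; i++) {
      levels.add(new ArrayDeque<String>());
    }
  }

  public void add(String blockId, int priorityLevel) {
    levels.get(priorityLevel).addLast(blockId);
  }

  // Always hand out work from the most urgent non-empty level first.
  public String pollHighestRisk() {
    for (Deque<String> level : levels) {
      if (!level.isEmpty()) {
        return level.pollFirst();
      }
    }
    return null;
  }
}
{code}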

>  Erasure Coding: Postpone the recovery work for a configurable time period
> --
>
> Key: HDFS-9826
> URL: https://issues.apache.org/jira/browse/HDFS-9826
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-9826-001.patch, HDFS-9826-002.patch
>
>
> Currently the NameNode prepares recovery as soon as it finds an 
> under-replicated block group. This is inefficient and takes resources away 
> from other operations. It would be better to postpone the recovery work for a 
> period of time if only one internal block is corrupted, considering the 
> points shown by papers such as \[1\]\[2\]:
> 1.Transient errors in which no data are lost account for more than 90% of 
> data center failures, owing to network partitions, software problems, or 
> non-disk hardware faults.
> 2.Although erasure codes tolerate multiple simultaneous failures, single 
> failures represent 99.75% of recoveries.
> Different clusters may have different status, so we should allow user to 
> configure the time for postponing the recoveries. Proper configuration will 
> reduce a large proportion of unnecessary recoveries. When finding multiple 
> internal blocks corrupted in a block group, we prepare the recovery work 
> immediately because it’s very rare and we don’t want to increase the risk of 
> losing data.
> [1] Availability in globally distributed storage systems
> http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf
> [2] Rethinking erasure codes for cloud file systems: minimizing I/O for 
> recovery and degraded reads
> http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states

2016-04-09 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233561#comment-15233561
 ] 

Walter Su commented on HDFS-9918:
-

+1. Thanks, [~rakesh_r].

> Erasure Coding: Sort located striped blocks based on decommissioned states
> --
>
> Key: HDFS-9918
> URL: https://issues.apache.org/jira/browse/HDFS-9918
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, 
> HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, 
> HDFS-9918-006.patch, HDFS-9918-007.patch, HDFS-9918-008.patch, 
> HDFS-9918-009.patch, HDFS-9918-010.patch, HDFS-9918-011.patch, 
> HDFS-9918-012.patch, HDFS-9918-013.patch
>
>
> This jira is a follow-on work of HDFS-8786, where we do decommissioning of 
> datanodes having striped blocks.
> Now, after decommissioning, the ordering of the storage list needs to change 
> so that the decommissioned datanodes are the last nodes in the list.
> For example, assume we have a block group with storage list:-
> d0, d1, d2, d3, d4, d5, d6, d7, d8, d9
> mapping to indices
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 2
> Here the internal block b2 is duplicated, located in d2 and d9. If d2 is a 
> decommissioning node then we should switch d2 and d9 in the storage list.
> Thanks [~jingzhao] for the 
> [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232103#comment-15232103
 ] 

Walter Su commented on HDFS-7661:
-

Great design/discussion. Since we have come back to discussing the use cases 
and the "effort vs. benefit" trade-off: if the use cases are rare, we could 
provide a simpler workaround. We provide:
1. a fake "flush", which only flushes the full stripes and doesn't flush the 
last partial stripe. It won't make every byte safe, but it helps the recovery 
logic recover more data.
2. a real "flush". The easiest way to do this is to start a new block group. It 
makes sure the data written before the "flush" is safe and visible, and it 
saves the user the trouble of closing and re-appending the same file.

Since we support variable-length blocks, this is doable. I should mention that 
the implementation of appending to a striped file also relies on 
variable-length blocks. The trouble is creating too many block groups. But if 
there are too many small blocks and they are adjacent in the same file, we can 
concatenate them into a bigger block, although concatenating striped blocks 
doesn't seem easy either.

> [umbrella] support hflush and hsync for erasure coded files
> ---
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, 
> HDFS-EC-file-flush-sync-design-v20160323.pdf, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf, Undo-Log-Design-20160406.jpg
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states

2016-04-07 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231512#comment-15231512
 ] 

Walter Su commented on HDFS-9918:
-

The patch looks pretty good. Thanks [~rakesh_r]. Tiny suggestions:
1. Since we don't sort by distance, the logic for resolving the client node can 
be moved inside {{sortLocatedBlock}}.
2. It would be better if we could test that the locToIndex mapping is correct 
after sorting.

> Erasure Coding: Sort located striped blocks based on decommissioned states
> --
>
> Key: HDFS-9918
> URL: https://issues.apache.org/jira/browse/HDFS-9918
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, 
> HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, 
> HDFS-9918-006.patch, HDFS-9918-007.patch, HDFS-9918-008.patch, 
> HDFS-9918-009.patch, HDFS-9918-010.patch, HDFS-9918-011.patch
>
>
> This jira is a follow-on work of HDFS-8786, where we do decommissioning of 
> datanodes having striped blocks.
> Now, after decommissioning it requires to change the ordering of the storage 
> list so that the decommissioned datanodes should only be last node in list.
> For example, assume we have a block group with storage list:-
> d0, d1, d2, d3, d4, d5, d6, d7, d8, d9
> mapping to indices
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 2
> Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a 
> decommissioning node then should switch d2 and d9 in the storage list.
> Thanks [~jingzhao] for the 
> [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states

2016-03-30 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219241#comment-15219241
 ] 

Walter Su commented on HDFS-9918:
-

The optimization works for {{BlockInfoStriped}}. A missing block occupies a 
slot. Like
{noformat}
0, null, 2, 3, 4, 5, 6, 7, 8, 1, 0', 1', 7', 8'
{noformat}
In LocatedStripedBlock, the data is like
{noformat}
0, 2, 3, 4, 5, 6, 7, 8, 1, 0', 1', 7', 8'
{noformat}
That's why I don't see the point of maintaining the order (of the in-service 
ones), unless we change {{createLocatedBlock(..)}}. But optimizing 
{{LocatedStripedBlock}} just to save some network traffic seems like a trivial 
gain.

> Erasure Coding: Sort located striped blocks based on decommissioned states
> --
>
> Key: HDFS-9918
> URL: https://issues.apache.org/jira/browse/HDFS-9918
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, 
> HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, 
> HDFS-9918-006.patch, HDFS-9918-007.patch, HDFS-9918-008.patch
>
>
> This jira is a follow-on work of HDFS-8786, where we do decommissioning of 
> datanodes having striped blocks.
> Now, after decommissioning it requires to change the ordering of the storage 
> list so that the decommissioned datanodes should only be last node in list.
> For example, assume we have a block group with storage list:-
> d0, d1, d2, d3, d4, d5, d6, d7, d8, d9
> mapping to indices
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 2
> Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a 
> decommissioning node then should switch d2 and d9 in the storage list.
> Thanks [~jingzhao] for the 
> [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states

2016-03-29 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217456#comment-15217456
 ] 

Walter Su commented on HDFS-9918:
-

I see the difference. To achieve your goal, we need a new comparator and have to 
zip 3 arrays into 1 array. But I don't see the point of preserving the order of 
blkIndices.
bq. how about going ahead with the previous approach?
It's just that I prefer the comparator paradigm; it's easier to understand and 
modify. I'm ok with your previous approach. :)

> Erasure Coding: Sort located striped blocks based on decommissioned states
> --
>
> Key: HDFS-9918
> URL: https://issues.apache.org/jira/browse/HDFS-9918
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, 
> HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, 
> HDFS-9918-006.patch, HDFS-9918-007.patch
>
>
> This jira is a follow-on work of HDFS-8786, where we do decommissioning of 
> datanodes having striped blocks.
> Now, after decommissioning it requires to change the ordering of the storage 
> list so that the decommissioned datanodes should only be last node in list.
> For example, assume we have a block group with storage list:-
> d0, d1, d2, d3, d4, d5, d6, d7, d8, d9
> mapping to indices
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 2
> Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a 
> decommissioning node then should switch d2 and d9 in the storage list.
> Thanks [~jingzhao] for the 
> [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states

2016-03-29 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217217#comment-15217217
 ] 

Walter Su commented on HDFS-9918:
-

bq. 1. Index in the logical block group. 2.Decomm status 3.Distance to the 
targethost
Good summary. And if the sorting priority is 2, 1, we can reuse 
{{DecomStaleComparator}}:
{code}
  // Move decommissioned/stale datanodes to the bottom
  Arrays.sort(di, comparator);
{code}
because, according to the javadoc of {{Arrays.sort(..)}},
{noformat}
This sort is guaranteed to be stable: equal elements will not be reordered as a 
result of the sort.
{noformat}
We could also write a new comparator, but I think the client side can handle the 
randomized ordering if there's no duplication?
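
As a tiny standalone illustration of that stability property (plain Java, not 
HDFS code; {{Entry}} is a made-up stand-in for a storage location), sorting only 
by decommission state keeps the relative order of the in-service entries:
{code}
import java.util.ArrayList;
import java.util.List;

public class StableSortDemo {
  // Hypothetical stand-in for a located-block storage entry.
  static class Entry {
    final int blockIndex;
    final boolean decommissioned;
    Entry(int blockIndex, boolean decommissioned) {
      this.blockIndex = blockIndex;
      this.decommissioned = decommissioned;
    }
    @Override
    public String toString() {
      return blockIndex + (decommissioned ? "(decomm)" : "");
    }
  }

  public static void main(String[] args) {
    // Indices 0..8 plus a duplicated internal block 2; the first "2" is decommissioning.
    int[] indices = {0, 1, 2, 3, 4, 5, 6, 7, 8, 2};
    List<Entry> storages = new ArrayList<>();
    for (int i = 0; i < indices.length; i++) {
      storages.add(new Entry(indices[i], i == 2));
    }
    // The comparator only looks at decommission state; equal elements keep
    // their original order because List.sort (like Arrays.sort for objects)
    // is a stable merge sort.
    storages.sort((a, b) -> Boolean.compare(a.decommissioned, b.decommissioned));
    System.out.println(storages);
    // Prints: [0, 1, 3, 4, 5, 6, 7, 8, 2, 2(decomm)]
  }
}
{code}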

> Erasure Coding: Sort located striped blocks based on decommissioned states
> --
>
> Key: HDFS-9918
> URL: https://issues.apache.org/jira/browse/HDFS-9918
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, 
> HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, 
> HDFS-9918-006.patch, HDFS-9918-007.patch
>
>
> This jira is a follow-on work of HDFS-8786, where we do decommissioning of 
> datanodes having striped blocks.
> Now, after decommissioning it requires to change the ordering of the storage 
> list so that the decommissioned datanodes should only be last node in list.
> For example, assume we have a block group with storage list:-
> d0, d1, d2, d3, d4, d5, d6, d7, d8, d9
> mapping to indices
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 2
> Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a 
> decommissioning node then should switch d2 and d9 in the storage list.
> Thanks [~jingzhao] for the 
> [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9918) Erasure Coding: Sort located striped blocks based on decommissioned states

2016-03-29 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215796#comment-15215796
 ] 

Walter Su commented on HDFS-9918:
-

We also need to sort locations by distance. It's unlikely, but given 2 
duplicated in-service blocks we want to choose the nearer one. The logic is like 
{{sortLocatedBlocks(..)}}, so how about reusing it?
We move {{BlockIndex}} & {{BlockToken}} along with the {{Location}}, like:
{code}
//public void sortLocatedBlocks(final String targethost,
 for (LocatedBlock b : locatedblocks) {
   DatanodeInfo[] di = b.getLocations();
+  HashMap<DatanodeInfo, Byte> locToIndex = null;
+  HashMap<DatanodeInfo, Token<BlockTokenIdentifier>> locToToken = null;
+  if (b instanceof LocatedStripedBlock) {
+    locToIndex = new HashMap<>();
+    locToToken = new HashMap<>();
+    LocatedStripedBlock lb = (LocatedStripedBlock) b;
+    for (int i = 0; i < di.length; i++) {
+      // remember each location's block index and token before sorting
+      locToIndex.put(di[i], lb.getBlockIndices()[i]);
+      locToToken.put(di[i], lb.getBlockTokens()[i]);
+    }
+    // ... (the rest of this snippet was truncated in the mail archive)
+  }
{code}

> Erasure Coding: Sort located striped blocks based on decommissioned states
> --
>
> Key: HDFS-9918
> URL: https://issues.apache.org/jira/browse/HDFS-9918
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-9918-001.patch, HDFS-9918-002.patch, 
> HDFS-9918-003.patch, HDFS-9918-004.patch, HDFS-9918-005.patch, 
> HDFS-9918-006.patch
>
>
> This jira is a follow-on work of HDFS-8786, where we do decommissioning of 
> datanodes having striped blocks.
> Now, after decommissioning it requires to change the ordering of the storage 
> list so that the decommissioned datanodes should only be last node in list.
> For example, assume we have a block group with storage list:-
> d0, d1, d2, d3, d4, d5, d6, d7, d8, d9
> mapping to indices
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 2
> Here the internal block b2 is duplicated, locating in d2 and d9. If d2 is a 
> decommissioning node then should switch d2 and d9 in the storage list.
> Thanks [~jingzhao] for the 
> [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf

2016-03-29 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10182:
-
Fix Version/s: 2.6.5

Committed to branch-2.6.

> Hedged read might overwrite user's buf
> --
>
> Key: HDFS-10182
> URL: https://issues.apache.org/jira/browse/HDFS-10182
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: zhouyingchao
> Fix For: 2.7.3, 2.6.5
>
> Attachments: HDFS-10182-001.patch, HDFS-10182-branch26.patch
>
>
> In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the 
> passed-in buf from the caller is passed to another thread to fill.  If the 
> first attempt is timed out, the second attempt would be issued with another 
> temp ByteBuffer. Now  suppose the second attempt wins and the first attempt 
> is blocked somewhere in the IO path. The second attempt's result would be 
> copied to the buf provided by the caller and then caller would think the 
> pread is all set. Later the caller might use the buf to do something else 
> (for e.g. read another chunk of data), however, the first attempt in earlier 
> hedgedFetchBlockByteRange might get some data and fill into the buf ... 
> If this happens, the caller's buf would then be corrupted.
> To fix the issue, we should allocate a temp buf for the first attempt too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf

2016-03-28 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-10182:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8, branch-2.7.
Hi, [~sinago]. Would you mind uploading a patch against branch-2.6?

> Hedged read might overwrite user's buf
> --
>
> Key: HDFS-10182
> URL: https://issues.apache.org/jira/browse/HDFS-10182
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: zhouyingchao
> Fix For: 2.7.3
>
> Attachments: HDFS-10182-001.patch
>
>
> In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the 
> passed-in buf from the caller is passed to another thread to fill.  If the 
> first attempt is timed out, the second attempt would be issued with another 
> temp ByteBuffer. Now  suppose the second attempt wins and the first attempt 
> is blocked somewhere in the IO path. The second attempt's result would be 
> copied to the buf provided by the caller and then caller would think the 
> pread is all set. Later the caller might use the buf to do something else 
> (for e.g. read another chunk of data), however, the first attempt in earlier 
> hedgedFetchBlockByteRange might get some data and fill into the buf ... 
> If this happens, the caller's buf would then be corrupted.
> To fix the issue, we should allocate a temp buf for the first attempt too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10182) Hedged read might overwrite user's buf

2016-03-28 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213926#comment-15213926
 ] 

Walter Su commented on HDFS-10182:
--

+1. I'll commit it shortly.

> Hedged read might overwrite user's buf
> --
>
> Key: HDFS-10182
> URL: https://issues.apache.org/jira/browse/HDFS-10182
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: zhouyingchao
> Attachments: HDFS-10182-001.patch
>
>
> In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the 
> passed-in buf from the caller is passed to another thread to fill.  If the 
> first attempt is timed out, the second attempt would be issued with another 
> temp ByteBuffer. Now  suppose the second attempt wins and the first attempt 
> is blocked somewhere in the IO path. The second attempt's result would be 
> copied to the buf provided by the caller and then caller would think the 
> pread is all set. Later the caller might use the buf to do something else 
> (for e.g. read another chunk of data), however, the first attempt in earlier 
> hedgedFetchBlockByteRange might get some data and fill into the buf ... 
> If this happens, the caller's buf would then be corrupted.
> To fix the issue, we should allocate a temp buf for the first attempt too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9952) Expose FSNamesystem lock wait time as metrics

2016-03-28 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213915#comment-15213915
 ] 

Walter Su commented on HDFS-9952:
-

Thanks [~vinayrpet] for updating. Just one minor suggestion:
We should handle {{readUnlock()}} being called even when the current thread 
doesn't actually hold the lock, just like {{writeUnlock()}} does. Better safe 
than sorry.
Otherwise, the patch looks pretty good to me. +1 once addressed. It would be 
great if [~daryn] could also take a look.
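
To show what I mean, here is a minimal sketch assuming a plain 
{{ReentrantReadWriteLock}} wrapper (the names are made up; this is not the 
actual FSNamesystem lock code): guard the unlock with {{getReadHoldCount()}} so 
a stray {{readUnlock()}} neither unlocks nor records a bogus held time.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative wrapper only; does not handle reentrant read locking.
class GuardedReadLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ThreadLocal<Long> acquireTime = new ThreadLocal<>();

  void readLock() {
    lock.readLock().lock();
    acquireTime.set(System.nanoTime());
  }

  void readUnlock() {
    // getReadHoldCount() is 0 if the current thread never acquired the lock.
    if (lock.getReadHoldCount() == 0) {
      throw new IllegalStateException("readUnlock() called without readLock()");
    }
    long heldNanos = System.nanoTime() - acquireTime.get();
    lock.readLock().unlock();
    recordReadLockHeldTime(heldNanos);  // e.g. feed a metrics sink
  }

  private void recordReadLockHeldTime(long nanos) {
    System.out.println("read lock held for " + nanos + " ns");
  }
}
{code}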

> Expose FSNamesystem lock wait time as metrics
> -
>
> Key: HDFS-9952
> URL: https://issues.apache.org/jira/browse/HDFS-9952
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-9952-01.patch, HDFS-9952-02.patch, 
> HDFS-9952-03.patch
>
>
> Expose FSNameSystem's readlock() and writeLock() wait time as metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10182) Hedged read might overwrite user's buf

2016-03-19 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201350#comment-15201350
 ] 

Walter Su commented on HDFS-10182:
--

And that's because {{cancelAll(futures);}} neither interrupts the first attempt 
nor waits for it to finish. Thanks [~sinago] for reporting. The patch LGTM.

> Hedged read might overwrite user's buf
> --
>
> Key: HDFS-10182
> URL: https://issues.apache.org/jira/browse/HDFS-10182
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: zhouyingchao
> Attachments: HDFS-10182-001.patch
>
>
> In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the 
> passed-in buf from the caller is passed to another thread to fill.  If the 
> first attempt is timed out, the second attempt would be issued with another 
> temp ByteBuffer. Now  suppose the second attempt wins and the first attempt 
> is blocked somewhere in the IO path. The second attempt's result would be 
> copied to the buf provided by the caller and then caller would think the 
> pread is all set. Later the caller might use the buf to do something else 
> (for e.g. read another chunk of data), however, the first attempt in earlier 
> hedgedFetchBlockByteRange might get some data and fill into the buf ... 
> If this happens, the caller's buf would then be corrupted.
> To fix the issue, we should allocate a temp buf for the first attempt too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9952) Expose FSNamesystem lock wait time as metrics

2016-03-18 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198770#comment-15198770
 ] 

Walter Su commented on HDFS-9952:
-

bq. MutableRate#add is synchronized in an extremely critical code path which 
will destroy concurrent read ops.
We have many nested locks, for example:
{noformat}
getBlockLocations(..)
--> readLock()
--> isInSafeMode()
--> synchronized isInManualOrResourceLowSafeMode()

listCorruptFileBlocks(..)
--> readLock()
--> blockManager.getCorruptReplicaBlockIterator()
--> synchronized Iterator iterator(int level)
{noformat}
It's pretty difficult not to use any nested locks. I think that if the time 
spent holding the inside (write) lock is short compared to the time spent 
holding the outside lock, then N threads will probably pass through the inside 
lock at different times. If there's little contention for the inside lock, it 
hardly increases the contention for the outside lock; every thread just holds 
the outside lock a little longer because of the additional logic.

In this case, the time spent holding the MutableRate lock is short: what it does 
inside the lock is a simple algebraic calculation. But assume fsWriteLock has 
just been released and many threads are waiting at the entrance of fsReadLock. 
If the MutableRate lock is the first thing inside the door of fsReadLock, then 
there's a lot of contention for the MutableRate lock once those threads get 
inside the door at the same time.

What if we save the value in a ThreadLocal, and after we release the fsReadLock 
we add it to the metrics? ThreadLocal is lock-free.

I'm not an expert on locks; this is just what I thought.
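
A rough sketch of the ThreadLocal idea (illustrative only; the names and 
structure here are made up, not FSNamesystem code): measure the wait on the lock 
path, stash it in a ThreadLocal, and publish to the shared metric only after the 
read lock has been released.
{code}
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockWaitMetricsSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final ThreadLocal<Long> waitNanos = new ThreadLocal<>();
  // LongAdder keeps contention on the metric itself low as well.
  private final LongAdder totalReadLockWaitNanos = new LongAdder();

  void readLock() {
    long start = System.nanoTime();
    fsLock.readLock().lock();
    // Lock-free per-thread storage: no shared monitor is touched while other
    // readers are piling in right behind the fsReadLock.
    waitNanos.set(System.nanoTime() - start);
  }

  void readUnlock() {
    fsLock.readLock().unlock();
    // Only now touch the shared metric, outside the FS lock.
    Long waited = waitNanos.get();
    if (waited != null) {
      totalReadLockWaitNanos.add(waited);
      waitNanos.remove();
    }
  }
}
{code}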

> Expose FSNamesystem lock wait time as metrics
> -
>
> Key: HDFS-9952
> URL: https://issues.apache.org/jira/browse/HDFS-9952
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-9952-01.patch, HDFS-9952-02.patch
>
>
> Expose FSNameSystem's readlock() and writeLock() wait time as metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError form DataTransfer thread.

2016-03-15 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194914#comment-15194914
 ] 

Walter Su commented on HDFS-9684:
-

I have seen a case where a DN got a command from the NN to transfer a huge 
number of blocks; there were 7000+ threads at the peak. I don't advocate 
recovering from {{OutOfMemoryError}}, but it's our responsibility not to create 
too many threads in the first place.
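
Not what the DataNode does today, but just to illustrate one generic way to 
bound this (the class name and limits below are made up): hand transfers to a 
fixed-size pool with a bounded queue instead of spawning one thread per block.
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: caps concurrent transfers instead of one thread per block.
class BoundedTransferPool {
  private static final int MAX_TRANSFER_THREADS = 64;      // made-up limit
  private static final int MAX_QUEUED_TRANSFERS = 10000;   // made-up limit

  private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
      MAX_TRANSFER_THREADS, MAX_TRANSFER_THREADS,
      60L, TimeUnit.SECONDS,
      new ArrayBlockingQueue<Runnable>(MAX_QUEUED_TRANSFERS),
      // If both the pool and the queue are full, run in the caller's thread
      // (or drop and log) rather than creating an unbounded number of threads.
      new ThreadPoolExecutor.CallerRunsPolicy());

  void transferBlock(Runnable transfer) {
    pool.execute(transfer);
  }
}
{code}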

> DataNode stopped sending heartbeat after getting OutOfMemoryError form 
> DataTransfer thread.
> ---
>
> Key: HDFS-9684
> URL: https://issues.apache.org/jira/browse/HDFS-9684
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Attachments: HDFS-9684.01.patch
>
>
> {noformat}
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8211) DataNode UUID is always null in the JMX counter

2016-03-14 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8211:

Priority: Major  (was: Minor)

Changed Priority to Major.
As [~qwertymaniac] pointed out, this patch unintentionally fixes an issue where 
a DN may regenerate its UUID (see HDFS-9949).
I think we should backport this to branch-2.7?

> DataNode UUID is always null in the JMX counter
> ---
>
> Key: HDFS-8211
> URL: https://issues.apache.org/jira/browse/HDFS-8211
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: hdfs-8211.001.patch, hdfs-8211.002.patch
>
>
> The DataNode JMX counters are tagged with DataNode UUID, but it always gets a 
> null value instead of the UUID.
> {code}
> Hadoop:service=DataNode,name=FSDatasetState*-null*.
> {code}
> This null is supposed be the datanode UUID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-09 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186974#comment-15186974
 ] 

Walter Su commented on HDFS-9822:
-

bq. I am still a little confused how this error happens.
Me too. I don't think we've got the right cause yet.
bq. But if there are same block group entry exists in different queue..
No 2 queues can have the same BG; the update(..) logic is correct.
No queue can have 2 identical items; the queue is a HashSet.

My guess is that it's caused by a race condition. We have a guard at
{code}
//  BlockManager#scheduleReconstruction(..)
if (block.isStriped()) {
  if (pendingNum > 0) {
// Wait the previous reconstruction to finish.
return null;
  }
{code}
which is inside the namesystem lock. But before the {{ReplicationMonitor}} 
thread gets to {{validateReconstructionWork(..)}}, it releases the lock, so it's 
possible that the junit thread takes the lock in between. If they both pass the 
guard, eventually one of them will fail the assert.
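
To make the suspected interleaving concrete, here is a minimal standalone sketch 
of the same check-then-act pattern (plain Java, not HDFS code): the guard is 
checked under one lock acquisition, but the work happens under a later one, so 
two threads can both pass the guard.
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckThenActRace {
  private final Object lock = new Object();
  private int pendingNum = 0;
  private final AtomicInteger scheduled = new AtomicInteger();

  void scheduleIfIdle() throws InterruptedException {
    boolean passedGuard;
    synchronized (lock) {          // acquisition #1: the guard
      passedGuard = (pendingNum == 0);
    }
    // Lock released here -- another thread may now pass the same guard.
    Thread.sleep(10);              // widen the window for the demo
    if (passedGuard) {
      synchronized (lock) {        // acquisition #2: the act
        pendingNum++;
        scheduled.incrementAndGet();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    CheckThenActRace r = new CheckThenActRace();
    CountDownLatch start = new CountDownLatch(1);
    Runnable task = () -> {
      try {
        start.await();
        r.scheduleIfIdle();
      } catch (InterruptedException ignored) {
      }
    };
    Thread t1 = new Thread(task);
    Thread t2 = new Thread(task);
    t1.start();
    t2.start();
    start.countDown();
    t1.join();
    t2.join();
    // Frequently prints 2: both threads scheduled work for the "same block".
    System.out.println("scheduled = " + r.scheduled.get());
  }
}
{code}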

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184856#comment-15184856
 ] 

Walter Su commented on HDFS-7866:
-

Sorry for the confusion; the javadoc now looks verbose. But thanks for trying. 
We can make some improvements in the follow-on JIRA.
The last patch LGTM too. Thanks again, [~lirui].

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, 
> HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend NameNode to load, list and sync predefine EC schemas in 
> authorized and controlled approach. The provided facilities will be used to 
> implement DFSAdmin commands so admin can list available EC schemas, then 
> could choose some of them for target EC zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-07 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184267#comment-15184267
 ] 

Walter Su commented on HDFS-7866:
-

1. Not only the javadoc: what I meant was separating the logic that manipulates 
the 12 bits, i.e. 2 sets of set/get methods for them. Right now the cuts are 
both (1,11); it's just that the meanings of each part are different. But what if 
the cut is different in the future? You have the same concern about unifying the 
set method:
bq. 3. The biggest concern is INodeFile constructor – related to that, the 
toLong method. Currently when isStriped, we just interpret replication as the 
EC policy ID. This looks pretty hacky. But it looks pretty tricky to fix. 

By the way, if we're planning to use a unified (1,11) cut for both of them, why 
bother having one enum item BLOCK_LAYOUT_AND_REDUNDANCY (12 bits) and doing the 
bit masking ourselves, instead of 2 enum items as before, which do the bit 
masking for us?

Some nits:
1. {{LAYOUT_BIT_WIDTH}}, {{MAX_REDUNDANCY}} can be private inside 
{{HeaderFormat}}.

2. 
{code}
  /**
   * @return The ID of the erasure coding policy on the file. -1 represents no
   *  EC policy.
   */
  @VisibleForTesting
  @Override
  public byte getErasureCodingPolicyID() {
if (isStriped()) {
  return (byte) HeaderFormat.getReplication(header);
}
return -1;
  }
{code}
{code}
 // check if the file has an EC policy
  ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp.
  getErasureCodingPolicy(fsd.getFSNamesystem(), existing);
  if (ecPolicy != null) {
replication = ecPolicy.getId();
  }
{code}
Are we sure the policyID strictly fits in 7 bits? Casting an ID with a value 
>= 128 to byte makes it negative, and then the logic goes wild. And vice versa.
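
Just to illustrate the wrap-around (plain Java, unrelated to the patch itself):
{code}
public class ByteCastDemo {
  public static void main(String[] args) {
    // A policy ID that doesn't fit in 7 bits flips sign when cast to byte:
    System.out.println((byte) 127);          // 127  -- still fine
    System.out.println((byte) 128);          // -128 -- "the logic goes wild"
    System.out.println((byte) 200);          // -56
    // Recovering the unsigned value needs an explicit mask:
    System.out.println(((byte) 200) & 0xFF); // 200
  }
}
{code}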

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, 
> HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend NameNode to load, list and sync predefine EC schemas in 
> authorized and controlled approach. The provided facilities will be used to 
> implement DFSAdmin commands so admin can list available EC schemas, then 
> could choose some of them for target EC zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-07 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183118#comment-15183118
 ] 

Walter Su commented on HDFS-7866:
-

What do you think about letting them diverge instead of forcing unification? It 
takes more time to understand the new format from the code, so I think adding 
some javadoc would be nice.
How about something like this:
{noformat}
 /** 
   * Bit format:
   * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY]
   * [48-bit preferredBlockSize]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for replicated block:
   * 0 [11-bit replication]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for striped block:
   * 1 [11-bit ErasureCodingPolicy ID]
   *
   */
{noformat}
I think getErasureCodingPolicyID() doesn't have to reuse getReplication(long 
header), even though both are 11 bits now. In the future we might split the 
11-bit EC policy ID further, and the 2 methods would keep diverging. I guess 
something like:
{noformat}
 /** 
   * Bit format:
   * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY]
   * [48-bit preferredBlockSize]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for non-ec block:
   * 0 [11-bit replication]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for ec striped  block:
   * 10 [4-bit replication][6-bit ErasureCodingPolicy ID]
   *
   * BLOCK_LAYOUT_AND_REDUNDANCY format for ec contiguous block:
   * 11 [4-bit replication][6-bit ErasureCodingPolicy ID]
   *
   */
{noformat}
And I think we should reserve some high-value IDs for custom policies, and 
reserve some for unknown policies which might be integrated (hard-coded) in the 
future.
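
To make the layout above concrete, here is a small self-contained sketch of 
packing and unpacking a 64-bit header in that [4][12][48] shape. The field 
widths come from the format above, but the class, method and constant names are 
made up, and this is not the actual {{INodeFile.HeaderFormat}} code.
{code}
public class HeaderBitsSketch {
  // [4-bit storagePolicyID][12-bit blockLayoutAndRedundancy][48-bit preferredBlockSize]
  private static final int  SIZE_BITS   = 48;
  private static final int  LAYOUT_BITS = 12;
  private static final long SIZE_MASK   = (1L << SIZE_BITS) - 1;
  private static final long ID_MASK     = (1L << (LAYOUT_BITS - 1)) - 1; // low 11 bits

  static long pack(int storagePolicyId, boolean striped, int redundancyOrPolicyId,
                   long preferredBlockSize) {
    long layout = ((striped ? 1L : 0L) << (LAYOUT_BITS - 1))  // high bit: layout flag
        | (redundancyOrPolicyId & ID_MASK);                   // low 11 bits
    return ((long) storagePolicyId << (LAYOUT_BITS + SIZE_BITS))
        | (layout << SIZE_BITS)
        | (preferredBlockSize & SIZE_MASK);
  }

  static boolean isStriped(long header) {
    return ((header >>> (SIZE_BITS + LAYOUT_BITS - 1)) & 1) == 1;
  }

  static int redundancyOrPolicyId(long header) {
    return (int) ((header >>> SIZE_BITS) & ID_MASK);
  }

  static long preferredBlockSize(long header) {
    return header & SIZE_MASK;
  }

  public static void main(String[] args) {
    long h = pack(7, true, 3, 128L * 1024 * 1024);
    System.out.println(isStriped(h));             // true
    System.out.println(redundancyOrPolicyId(h));  // 3
    System.out.println(preferredBlockSize(h));    // 134217728
  }
}
{code}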

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, 
> HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend NameNode to load, list and sync predefine EC schemas in 
> authorized and controlled approach. The provided facilities will be used to 
> implement DFSAdmin commands so admin can list available EC schemas, then 
> could choose some of them for target EC zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9803) Proactively refresh ShortCircuitCache entries to avoid latency spikes

2016-02-19 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155386#comment-15155386
 ] 

Walter Su commented on HDFS-9803:
-

Is it related to HDFS-5637? Which version of the hdfs-client module are you using?

> Proactively refresh ShortCircuitCache entries to avoid latency spikes
> -
>
> Key: HDFS-9803
> URL: https://issues.apache.org/jira/browse/HDFS-9803
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Nick Dimiduk
>
> My region server logs are flooding with messages like 
> "SecretManager$InvalidToken: access control error while attempting to set up 
> short-circuit access to  ... is expired". These logs 
> correspond with responseTooSlow WARNings from the region server.
> {noformat}
> 2016-01-19 22:10:14,432 INFO  
> [B.defaultRpcServer.handler=4,queue=1,port=16020] 
> shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not load 
> 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken exception.
> org.apache.hadoop.security.token.SecretManager$InvalidToken: access control 
> error while attempting to set up short-circuit access to  token 
> with block_token_identifier (expiryDate=1453194430724, keyId=1508822027, 
> userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416, 
> blockId=1074037633, access modes=[READ]) is expired.
>   at 
> org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
>   at 
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
>   at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
>   at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
> ...
> {noformat}
> A potential solution could be to have a background thread that makes a best 
> effort to proactively refreshes tokens in the cache before they expire, so as 
> to minimize latency impact on the critical path.
> Thanks to [~cnauroth] for providing an explaination and suggesting a solution 
> over on the [user 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-user/201601.mbox/%3CCANZa%3DGt%3Dhvuf3fyOJqf-jdpBPL_xDknKBcp7LmaC-YUm0jDUVg%40mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-19 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9716:

Resolution: Cannot Reproduce
Status: Resolved  (was: Patch Available)

HDFS-9755 covers the same fix.
Closed this as 'Cannot Reproduce'. Thanks [~liuml07] for reporting the issue.

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Mingliang Liu
>Assignee: Walter Su
>  Labels: test
> Attachments: HDFS-9716.01.patch, HDFS-9716.02.patch
>
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9816) Erasure Coding: allow to use multiple EC policies in striping related tests [Part 3]

2016-02-17 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151796#comment-15151796
 ] 

Walter Su commented on HDFS-9816:
-

bq. Then we can move the current hard coded suite to TestBlockRecovery.
It works for me. Thanks [~zhz], [~lirui].
btw, if the safeLength calculation is already tested in 
{{TestBlockRecovery#testSafeLength}}, I think {{TestLeaseRecoveryStriped}} 
should use the production code to get safeLength instead of repeating it?

> Erasure Coding: allow to use multiple EC policies in striping related tests 
> [Part 3]
> 
>
> Key: HDFS-9816
> URL: https://issues.apache.org/jira/browse/HDFS-9816
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HDFS-9816.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9816) Erasure Coding: allow to use multiple EC policies in striping related tests [Part 3]

2016-02-17 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150432#comment-15150432
 ] 

Walter Su commented on HDFS-9816:
-

The safe length may change if steps 3 and 4 are included (see HDFS-9173). For 
now I prefer a hard-coded test suite because it's clearer what the safe length 
is, and in which step. The safe length calculation also needs to be tested, so 
it's better to use hard-coded values and not repeat the calculation logic from 
the production code.

> Erasure Coding: allow to use multiple EC policies in striping related tests 
> [Part 3]
> 
>
> Key: HDFS-9816
> URL: https://issues.apache.org/jira/browse/HDFS-9816
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HDFS-9816.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9347) Invariant assumption in TestQuorumJournalManager.shutdown() is wrong

2016-02-08 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9347:

Fix Version/s: 2.6.5
   2.7.3

> Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
> 
>
> Key: HDFS-9347
> URL: https://issues.apache.org/jira/browse/HDFS-9347
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.8.0, 2.7.3, 2.6.5
>
> Attachments: HDFS-9347.001.patch, HDFS-9347.002.patch, 
> HDFS-9347.003.patch, HDFS-9347.004.patch, HDFS-9347.005.patch, 
> HDFS-9347.006.patch
>
>
> The code
> {code:title=TestTestQuorumJournalManager.java|borderStyle=solid}
> @After
>   public void shutdown() throws IOException {
> IOUtils.cleanup(LOG, toClose.toArray(new Closeable[0]));
> 
> // Should not leak clients between tests -- this can cause flaky tests.
> // (See HDFS-4643)
> GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*");
> 
> if (cluster != null) {
>   cluster.shutdown();
> }
>   }
> {code}
> implicitly assumes when the call returns from IOUtils.cleanup() (which calls 
> close() on QuorumJournalManager object), all IPC client connection threads 
> are terminated. However, there is no internal implementation that enforces 
> this assumption. Even if the bug reported in HADOOP-12532 is fixed, the 
> internal code still only ensures IPC connections are terminated, but not the 
> thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades

2016-02-08 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9752:

Attachment: HDFS-9752-branch-2.7.03.patch
HDFS-9752-branch-2.6.03.patch

> Permanent write failures may happen to slow writers during datanode rolling 
> upgrades
> 
>
> Key: HDFS-9752
> URL: https://issues.apache.org/jira/browse/HDFS-9752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-9752-branch-2.6.03.patch, 
> HDFS-9752-branch-2.7.03.patch, HDFS-9752.01.patch, HDFS-9752.02.patch, 
> HDFS-9752.03.patch, HdfsWriter.java
>
>
> When datanodes are being upgraded, an out-of-band ack is sent upstream and 
> the client does a pipeline recovery. The client may hit this multiple times 
> as more nodes get upgraded.  This normally does not cause any issue, but if 
> the client is holding the stream open without writing any data during this 
> time, a permanent write failure can occur.
> This is because there is a limit of 5 recovery trials for the same packet, 
> which is tracked by "last acked sequence number". Since the empty heartbeat 
> packets for an idle output stream does not increment the sequence number, the 
> write will fail after it seeing 5 pipeline breakages by datanode upgrades.
> This check/limit was added to avoid spinning until running out of nodes in 
> the cluster due to a corruption or any other irrecoverable conditions.  The 
> datanode upgrade-restart  should be excluded from the count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9347) Invariant assumption in TestQuorumJournalManager.shutdown() is wrong

2016-02-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138264#comment-15138264
 ] 

Walter Su commented on HDFS-9347:
-

Thanks [~jojochuang] for the work. I just cherry-picked it to branch-2.7 and 
branch-2.6.

> Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
> 
>
> Key: HDFS-9347
> URL: https://issues.apache.org/jira/browse/HDFS-9347
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.8.0, 2.7.3, 2.6.5
>
> Attachments: HDFS-9347.001.patch, HDFS-9347.002.patch, 
> HDFS-9347.003.patch, HDFS-9347.004.patch, HDFS-9347.005.patch, 
> HDFS-9347.006.patch
>
>
> The code
> {code:title=TestTestQuorumJournalManager.java|borderStyle=solid}
> @After
>   public void shutdown() throws IOException {
> IOUtils.cleanup(LOG, toClose.toArray(new Closeable[0]));
> 
> // Should not leak clients between tests -- this can cause flaky tests.
> // (See HDFS-4643)
> GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*");
> 
> if (cluster != null) {
>   cluster.shutdown();
> }
>   }
> {code}
> implicitly assumes when the call returns from IOUtils.cleanup() (which calls 
> close() on QuorumJournalManager object), all IPC client connection threads 
> are terminated. However, there is no internal implementation that enforces 
> this assumption. Even if the bug reported in HADOOP-12532 is fixed, the 
> internal code still only ensures IPC connections are terminated, but not the 
> thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades

2016-02-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138306#comment-15138306
 ] 

Walter Su commented on HDFS-9752:
-

Thanks all for reviewing the patch.
The patch depends on HDFS-9347, which I just cherry-picked to 2.6.5. Now I've 
uploaded separate patches for 2.7/2.6.

> Permanent write failures may happen to slow writers during datanode rolling 
> upgrades
> 
>
> Key: HDFS-9752
> URL: https://issues.apache.org/jira/browse/HDFS-9752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-9752-branch-2.6.03.patch, 
> HDFS-9752-branch-2.7.03.patch, HDFS-9752.01.patch, HDFS-9752.02.patch, 
> HDFS-9752.03.patch, HdfsWriter.java
>
>
> When datanodes are being upgraded, an out-of-band ack is sent upstream and 
> the client does a pipeline recovery. The client may hit this multiple times 
> as more nodes get upgraded.  This normally does not cause any issue, but if 
> the client is holding the stream open without writing any data during this 
> time, a permanent write failure can occur.
> This is because there is a limit of 5 recovery trials for the same packet, 
> which is tracked by "last acked sequence number". Since the empty heartbeat 
> packets for an idle output stream does not increment the sequence number, the 
> write will fail after it seeing 5 pipeline breakages by datanode upgrades.
> This check/limit was added to avoid spinning until running out of nodes in 
> the cluster due to a corruption or any other irrecoverable conditions.  The 
> datanode upgrade-restart  should be excluded from the count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades

2016-02-06 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9752:

Attachment: HDFS-9752.03.patch

bq. The test can just verify that pipelineRecoveryCount is not incremented 
after DN restart and pipeline recovery...
Good idea. Thanks. Uploaded the 03 patch.

It's still difficult to remove the sleep used to wait for DN shutdown.
bq. To avoid sleeping for arbitrary amount of time to wait for a datanode to 
shutdown, we can have DataNode#shutdown() to set a variable at the end to 
indicate shutdown is complete.
I use the thread name and GenericTestUtils.waitForThreadTermination(..) to do 
that. Hope that's ok with you.
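
Roughly what that wait boils down to (a generic approximation using only plain 
JDK calls, not the GenericTestUtils implementation): poll the live threads for a 
name matching the regex and give up after a timeout.
{code}
import java.util.regex.Pattern;

public final class WaitForThread {
  static void waitForThreadTermination(String nameRegex, long timeoutMillis)
      throws InterruptedException {
    Pattern p = Pattern.compile(nameRegex);
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      boolean alive = Thread.getAllStackTraces().keySet().stream()
          .anyMatch(t -> p.matcher(t.getName()).matches());
      if (!alive) {
        return;  // no matching thread left, e.g. the DN finished shutting down
      }
      Thread.sleep(100);
    }
    throw new AssertionError("Thread matching " + nameRegex + " did not terminate");
  }
}
{code}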

> Permanent write failures may happen to slow writers during datanode rolling 
> upgrades
> 
>
> Key: HDFS-9752
> URL: https://issues.apache.org/jira/browse/HDFS-9752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-9752.01.patch, HDFS-9752.02.patch, 
> HDFS-9752.03.patch, HdfsWriter.java
>
>
> When datanodes are being upgraded, an out-of-band ack is sent upstream and 
> the client does a pipeline recovery. The client may hit this multiple times 
> as more nodes get upgraded.  This normally does not cause any issue, but if 
> the client is holding the stream open without writing any data during this 
> time, a permanent write failure can occur.
> This is because there is a limit of 5 recovery trials for the same packet, 
> which is tracked by "last acked sequence number". Since the empty heartbeat 
> packets for an idle output stream does not increment the sequence number, the 
> write will fail after it seeing 5 pipeline breakages by datanode upgrades.
> This check/limit was added to avoid spinning until running out of nodes in 
> the cluster due to a corruption or any other irrecoverable conditions.  The 
> datanode upgrade-restart  should be excluded from the count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades

2016-02-05 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9752:

Attachment: HDFS-9752.02.patch

Thanks for the advice. Uploaded the 02 patch. The test now takes ~30s, but it's 
still difficult to remove the _sleep_ used to wait for DN shutdown. I could use 
org.apache.mina.util.AvailablePortFinder.available(int port) to wait for the 
port to be free, but I'm afraid of the extra dependency.

> Permanent write failures may happen to slow writers during datanode rolling 
> upgrades
> 
>
> Key: HDFS-9752
> URL: https://issues.apache.org/jira/browse/HDFS-9752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-9752.01.patch, HDFS-9752.02.patch, HdfsWriter.java
>
>
> When datanodes are being upgraded, an out-of-band ack is sent upstream and 
> the client does a pipeline recovery. The client may hit this multiple times 
> as more nodes get upgraded.  This normally does not cause any issue, but if 
> the client is holding the stream open without writing any data during this 
> time, a permanent write failure can occur.
> This is because there is a limit of 5 recovery trials for the same packet, 
> which is tracked by "last acked sequence number". Since the empty heartbeat 
> packets for an idle output stream does not increment the sequence number, the 
> write will fail after it seeing 5 pipeline breakages by datanode upgrades.
> This check/limit was added to avoid spinning until running out of nodes in 
> the cluster due to a corruption or any other irrecoverable conditions.  The 
> datanode upgrade-restart  should be excluded from the count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades

2016-02-05 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134131#comment-15134131
 ] 

Walter Su commented on HDFS-9752:
-

Hi, [~xiaobingo]. I think the 'write failure' means the output stream throws an 
IOException and can't be closed normally. In your test, if you just kill the DN 
without the upgrade command, the client will consider it an error node and 
exclude it; it won't be in the new pipeline.


> Permanent write failures may happen to slow writers during datanode rolling 
> upgrades
> 
>
> Key: HDFS-9752
> URL: https://issues.apache.org/jira/browse/HDFS-9752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-9752.01.patch, HDFS-9752.02.patch, HdfsWriter.java
>
>
> When datanodes are being upgraded, an out-of-band ack is sent upstream and 
> the client does a pipeline recovery. The client may hit this multiple times 
> as more nodes get upgraded.  This normally does not cause any issue, but if 
> the client is holding the stream open without writing any data during this 
> time, a permanent write failure can occur.
> This is because there is a limit of 5 recovery trials for the same packet, 
> which is tracked by "last acked sequence number". Since the empty heartbeat 
> packets for an idle output stream does not increment the sequence number, the 
> write will fail after it seeing 5 pipeline breakages by datanode upgrades.
> This check/limit was added to avoid spinning until running out of nodes in 
> the cluster due to a corruption or any other irrecoverable conditions.  The 
> datanode upgrade-restart  should be excluded from the count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication

2016-02-03 Thread Walter Su (JIRA)
Walter Su created HDFS-9748:
---

 Summary: When addExpectedReplicasToPending is called twice, 
pendingReplications should avoid duplication
 Key: HDFS-9748
 URL: https://issues.apache.org/jira/browse/HDFS-9748
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication

2016-02-03 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9748:

Affects Version/s: 2.8.0
   Status: Patch Available  (was: Open)

> When addExpectedReplicasToPending is called twice, pendingReplications should 
> avoid duplication
> ---
>
> Key: HDFS-9748
> URL: https://issues.apache.org/jira/browse/HDFS-9748
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-9748.01.patch
>
>
> 1. When completeFile() is called, addExpectedReplicasToPending() will be 
> called (HDFS-8999).
> 2. When first replica is reported, addExpectedReplicasToPending() will be 
> called the second time.
> {code}
> //BlockManager.addStoredBlock(..)
> if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
> hasMinStorage(storedBlock, numLiveReplicas)) {
>   addExpectedReplicasToPending(storedBlock, bc);
>   completeBlock(storedBlock, false);
> } else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) {
> {code}
> But,
> {code}
> //PendingReplicationBlocks.java
> void incrementReplicas(DatanodeDescriptor... newTargets) {
>   if (newTargets != null) {
> Collections.addAll(targets, newTargets);
>   }
> }
> {code}
> targets is ArrayList, the above code simply add all {{newTargets}} to 
> {{targets}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication

2016-02-03 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9748:

Attachment: HDFS-9748.01.patch

> When addExpectedReplicasToPending is called twice, pendingReplications should 
> avoid duplication
> ---
>
> Key: HDFS-9748
> URL: https://issues.apache.org/jira/browse/HDFS-9748
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-9748.01.patch
>
>
> 1. When completeFile() is called, addExpectedReplicasToPending() will be 
> called (HDFS-8999).
> 2. When the first replica is reported, addExpectedReplicasToPending() will be 
> called the second time.
> {code}
> //BlockManager.addStoredBlock(..)
> if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
> hasMinStorage(storedBlock, numLiveReplicas)) {
>   addExpectedReplicasToPending(storedBlock, bc);
>   completeBlock(storedBlock, false);
> } else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) {
> {code}
> But,
> {code}
> //PendingReplicationBlocks.java
> void incrementReplicas(DatanodeDescriptor... newTargets) {
>   if (newTargets != null) {
> Collections.addAll(targets, newTargets);
>   }
> }
> {code}
> targets is an ArrayList; the above code simply adds all {{newTargets}} to 
> {{targets}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9748) When addExpectedReplicasToPending is called twice, pendingReplications should avoid duplication

2016-02-03 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9748:

Description: 
1. When completeFile() is called, addExpectedReplicasToPending() will be called 
(HDFS-8999).

2. When the first replica is reported, addExpectedReplicasToPending() will be 
called the second time.
{code}
//BlockManager.addStoredBlock(..)
if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
hasMinStorage(storedBlock, numLiveReplicas)) {
  addExpectedReplicasToPending(storedBlock, bc);
  completeBlock(storedBlock, false);
} else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) {
{code}

But,
{code}
//PendingReplicationBlocks.java
void incrementReplicas(DatanodeDescriptor... newTargets) {
  if (newTargets != null) {
Collections.addAll(targets, newTargets);
  }
}
{code}
targets is an ArrayList; the above code simply adds all {{newTargets}} to 
{{targets}}.
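
One way to avoid the duplication (a minimal sketch of the idea only, not the actual patch; the real PendingReplicationBlocks keeps more state and locking) would be to skip targets that are already tracked:
{code}
// Sketch only: deduplicate instead of blindly calling Collections.addAll(..),
// so a second addExpectedReplicasToPending() does not inflate the count.
void incrementReplicas(DatanodeDescriptor... newTargets) {
  if (newTargets == null) {
    return;
  }
  for (DatanodeDescriptor dn : newTargets) {
    if (!targets.contains(dn)) {  // targets is the existing List from the snippet above
      targets.add(dn);
    }
  }
}
{code}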


> When addExpectedReplicasToPending is called twice, pendingReplications should 
> avoid duplication
> ---
>
> Key: HDFS-9748
> URL: https://issues.apache.org/jira/browse/HDFS-9748
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
>
> 1. When completeFile() is called, addExpectedReplicasToPending() will be 
> called (HDFS-8999).
> 2. When first replica is reported, addExpectedReplicasToPending() will be 
> called the second time.
> {code}
> //BlockManager.addStoredBlock(..)
> if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
> hasMinStorage(storedBlock, numLiveReplicas)) {
>   addExpectedReplicasToPending(storedBlock, bc);
>   completeBlock(storedBlock, false);
> } else if (storedBlock.isComplete() && result == AddBlockResult.ADDED) {
> {code}
> But,
> {code}
> //PendingReplicationBlocks.java
> void incrementReplicas(DatanodeDescriptor... newTargets) {
>   if (newTargets != null) {
> Collections.addAll(targets, newTargets);
>   }
> }
> {code}
> targets is ArrayList, the above code simply add all {{newTargets}} to 
> {{targets}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades

2016-02-03 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9752:

Attachment: HDFS-9752.01.patch

Thanks [~kihwal] for reporting this.
Uploaded the 01 patch; kindly review. The patch resets {{pipelineRecoveryCount}} 
every time a packet is successfully sent.
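
Roughly, the idea looks like the sketch below. This is an illustration only, not the patch itself; the field and method names are simplified stand-ins for the client's streamer logic.
{code}
// Illustration only (simplified names): the recovery counter is cleared
// whenever the stream makes real progress, so pipeline restarts caused by
// rolling upgrades do not accumulate against the retry limit forever.
private int pipelineRecoveryCount = 0;
private static final int MAX_PIPELINE_RECOVERIES = 5;

void onPacketAcked() {
  pipelineRecoveryCount = 0;          // successful packet => reset the count
}

void onPipelineFailure() throws IOException {
  if (++pipelineRecoveryCount > MAX_PIPELINE_RECOVERIES) {
    throw new IOException("Too many pipeline failures for the same packet.");
  }
  // otherwise set up pipeline recovery as usual
}
{code}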

> Permanent write failures may happen to slow writers during datanode rolling 
> upgrades
> 
>
> Key: HDFS-9752
> URL: https://issues.apache.org/jira/browse/HDFS-9752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-9752.01.patch
>
>
> When datanodes are being upgraded, an out-of-band ack is sent upstream and 
> the client does a pipeline recovery. The client may hit this multiple times 
> as more nodes get upgraded.  This normally does not cause any issue, but if 
> the client is holding the stream open without writing any data during this 
> time, a permanent write failure can occur.
> This is because there is a limit of 5 recovery trials for the same packet, 
> which is tracked by the "last acked sequence number". Since the empty heartbeat 
> packets for an idle output stream do not increment the sequence number, the 
> write will fail after it sees 5 pipeline breakages caused by datanode upgrades.
> This check/limit was added to avoid spinning until running out of nodes in 
> the cluster due to a corruption or any other irrecoverable condition. The 
> datanode upgrade-restart should be excluded from the count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9752) Permanent write failures may happen to slow writers during datanode rolling upgrades

2016-02-03 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9752:

Assignee: Walter Su
  Status: Patch Available  (was: Open)

> Permanent write failures may happen to slow writers during datanode rolling 
> upgrades
> 
>
> Key: HDFS-9752
> URL: https://issues.apache.org/jira/browse/HDFS-9752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-9752.01.patch
>
>
> When datanodes are being upgraded, an out-of-band ack is sent upstream and 
> the client does a pipeline recovery. The client may hit this multiple times 
> as more nodes get upgraded.  This normally does not cause any issue, but if 
> the client is holding the stream open without writing any data during this 
> time, a permanent write failure can occur.
> This is because there is a limit of 5 recovery trials for the same packet, 
> which is tracked by the "last acked sequence number". Since the empty heartbeat 
> packets for an idle output stream do not increment the sequence number, the 
> write will fail after it sees 5 pipeline breakages caused by datanode upgrades.
> This check/limit was added to avoid spinning until running out of nodes in 
> the cluster due to a corruption or any other irrecoverable condition. The 
> datanode upgrade-restart should be excluded from the count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-02 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127932#comment-15127932
 ] 

Walter Su commented on HDFS-9716:
-

There are 2 ways to make a BlockGroup under-replicated: 1. shut down a DN, or 2. 
corrupt a replica file.
If a replica is corrupted and is reported to the NN, the NN asks the DN to delete the replica.
java.io.File.length() returns 0 if the file is not found.
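
For reference, that File behavior is easy to see in isolation (the path below is just an illustration):
{code}
import java.io.File;

public class LengthOfMissingFile {
  public static void main(String[] args) {
    File missing = new File("/tmp/does-not-exist-" + System.nanoTime());
    // java.io.File.length() is specified to return 0L when the file does not
    // exist, so a length comparison can "pass" even after the replica file
    // has been deleted.
    System.out.println(missing.length());   // prints 0
  }
}
{code}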

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Walter Su
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-02 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su reassigned HDFS-9716:
---

Assignee: Walter Su

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Walter Su
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-02 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9716:

   Labels: test  (was: )
Affects Version/s: (was: 2.8.0)
   Status: Patch Available  (was: Open)

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mingliang Liu
>Assignee: Walter Su
>  Labels: test
> Attachments: HDFS-9716.01.patch
>
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-02 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128008#comment-15128008
 ] 

Walter Su commented on HDFS-9716:
-

You can easily reproduce the failure by delaying line 345 with a breakpoint or a 
sleep, so the NN has time to invalidate the replica (see the sketch after the snippet below).
{code}
339 // Check the replica on the new target node.
340 for (int i = 0; i < toRecoverBlockNum; i++) {
341   File replicaAfterRecovery = cluster.getBlockFile(targetDNs[i], 
blocks[i]);
342   LOG.info("replica after recovery " + replicaAfterRecovery);
343   File metadataAfterRecovery =
344   cluster.getBlockMetadataFile(targetDNs[i], blocks[i]);
345   assertEquals(replicaAfterRecovery.length(), replicas[i].length());
346   LOG.info("replica before " + replicas[i]);
347   assertTrue(metadataAfterRecovery.getName().
348   endsWith(blocks[i].getGenerationStamp() + ".meta"));
349   byte[] replicaContentAfterRecovery =
350   DFSTestUtil.readFileAsBytes(replicaAfterRecovery);
351 
352   Assert.assertArrayEquals(replicaContents[i], 
replicaContentAfterRecovery);
353 }
354   }
{code}
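
For example, a test-only delay like the following, inserted just before the length check at line 345, gives the NN enough time to trigger the deletion (the exact sleep value is arbitrary):
{code}
// Test-only change to reproduce the race: let the NameNode schedule the
// invalidation of the corrupted replica before its length is compared.
Thread.sleep(10000);
assertEquals(replicaAfterRecovery.length(), replicas[i].length());
{code}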

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mingliang Liu
>Assignee: Walter Su
>  Labels: test
> Attachments: HDFS-9716.01.patch
>
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-02 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9716:

Attachment: HDFS-9716.01.patch

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Walter Su
> Attachments: HDFS-9716.01.patch
>
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-02 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128003#comment-15128003
 ] 

Walter Su commented on HDFS-9716:
-

{{MiniDFSCluster.getBlockFile}} on the _dead DN_ is called before the replica 
corruption and deletion, so it won't return null.

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mingliang Liu
>Assignee: Walter Su
>  Labels: test
> Attachments: HDFS-9716.01.patch
>
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9716) o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk

2016-02-02 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9716:

Attachment: HDFS-9716.02.patch

Thanks [~drankye]. Uploaded the 02 patch to address that.

> o.a.h.hdfs.TestRecoverStripedFile fails intermittently in trunk
> ---
>
> Key: HDFS-9716
> URL: https://issues.apache.org/jira/browse/HDFS-9716
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mingliang Liu
>Assignee: Walter Su
>  Labels: test
> Attachments: HDFS-9716.01.patch, HDFS-9716.02.patch
>
>
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14269/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks1/
> * 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8477/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/testRecoverThreeDataBlocks/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9646) ErasureCodingWorker may fail when recovering data blocks with length less than the first internal block

2016-01-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098152#comment-15098152
 ] 

Walter Su commented on HDFS-9646:
-

{code}
+  bufferSize, (int)(maxTargetLength - positionInBlock));
{code}
{{positionInBlock}} is initially 0, so you are effectively casting {{maxTargetLength}} to 
{{int}}, whose maximum value is the block size.

bq. 3. Question: do we need new test codes to expose the issue and ensure the 
issue is fixed? 
I think the randomized {{generateDeadDnIndices()}} can test that.

Patch looks good to me, too. Thanks [~jingzhao].


> ErasureCodingWorker may fail when recovering data blocks with length less 
> than the first internal block
> ---
>
> Key: HDFS-9646
> URL: https://issues.apache.org/jira/browse/HDFS-9646
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Takuya Fukudome
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-9646.000.patch, test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the 
> following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN  datanode.DataNode 
> (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: 
> BP-987302662-172.29.4.13-1450757377698:blk_-92233720368
> 54322288_29751
> java.io.IOException: Transfer failed for all targets.
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9534) Add CLI command to clear storage policy from a path.

2016-01-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101365#comment-15101365
 ] 

Walter Su commented on HDFS-9534:
-

Thanks [~xiaobingo].
1. I think the original design doesn't intend to make 
UNSPECIFIED_STORAGE_POLICY_ID a policy, so it's not in the policy suite.
2. In FSDirAttrOp.java, you can pass _policyId_ instead of _policyName_ to 
_setStoragePolicy(..)_.

> Add CLI command to clear storage policy from a path.
> 
>
> Key: HDFS-9534
> URL: https://issues.apache.org/jira/browse/HDFS-9534
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Chris Nauroth
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9534.001.patch
>
>
> The {{hdfs storagepolicies}} command has sub-commands for 
> {{-setStoragePolicy}} and {{-getStoragePolicy}} on a path.  However, there is 
> no {{-removeStoragePolicy}} to remove a previously set storage policy on a 
> path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync

2016-01-06 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085202#comment-15085202
 ] 

Walter Su commented on HDFS-7661:
-

You totally miss my point.
A successful flush is a guarantee that the data is safe.
If the 1st flush succeeds, the data written before the 1st flush is safe.
If the 2nd flush fails, the data written between the 1st and 2nd flush is lost. The user can 
restart writing at the 1st flush point (with a lease recovery).

According to the description, if the data before the 1st flush is damaged, how can 
we restart at the 1st flush point? The client has to restart at the beginning of 
the current block. Then what's the meaning of "flush"?
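
To restate that contract from the client side (hflush() is the real FSDataOutputStream API; fs, path, and the two byte[] batches below are just placeholders):
{code}
FSDataOutputStream out = fs.create(path);
out.write(batch1);
out.hflush();   // success => batch1 must survive any later failure
out.write(batch2);
out.hflush();   // if this fails, only batch2 may be lost; after a lease
                // recovery the writer should be able to resume from the
                // first flush point, not from the beginning of the block
{code}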

> Erasure coding: support hflush and hsync
> 
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync

2016-01-05 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084527#comment-15084527
 ] 

Walter Su commented on HDFS-7661:
-

According to the description, 
1. The 3 parity blocks should be updated sequentially.
2. The 2nd flush decreases the safety of the data written before the 1st flush. If there are already 
numParityBlks failures, the 2nd flush must succeed and cannot even be aborted by the 
user; otherwise it will damage the data written before the 1st flush.

> Erasure coding: support hflush and hsync
> 
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync

2016-01-04 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082272#comment-15082272
 ] 

Walter Su commented on HDFS-7661:
-

bq. But, the older parity internal block comes back later, then we have 
different version parity blocks.
How? Does the older parity internal block become dirty?

> Erasure coding: support hflush and hsync
> 
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8430) Erasure coding: update DFSClient.getFileChecksum() logic for stripe files

2016-01-04 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082228#comment-15082228
 ] 

Walter Su commented on HDFS-8430:
-

Thanks [~szetszwo] for clarifying. {{New Algorithm 2}} looks good. And we need 
a new DataTransferProtocol operation, instead of {{blockChecksum(..)}}, to get the cell 
checksum array.

> Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
> -
>
> Key: HDFS-8430
> URL: https://issues.apache.org/jira/browse/HDFS-8430
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Walter Su
>Assignee: Kai Zheng
> Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduces a  distributed file checksum algorithm. It's designed 
> for replicated block.
> {{DFSClient.getFileChecksum()}} need some updates, so it can work for striped 
> block group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

