[jira] [Updated] (HDFS-10471) DFSAdmin#SetQuotaCommand's help msg is not correct
[ https://issues.apache.org/jira/browse/HDFS-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10471: - Status: Patch Available (was: Open) Attached a simple patch for this. > DFSAdmin#SetQuotaCommand's help msg is not correct > -- > > Key: HDFS-10471 > URL: https://issues.apache.org/jira/browse/HDFS-10471 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10471.001.patch > > > The help message of the SetQuota-related command is not shown > correctly. In the message, the name {{quota}} is shown as {{N}}, but {{N}} is > never defined beforehand. > {noformat} > -setQuota ...: Set the quota for each > directory . > The directory quota is a long integer that puts a hard limit > on the number of names in the directory tree > For each directory, attempt to set the quota. An error will be > reported if > 1. N is not a positive integer, or > 2. User is not an administrator, or > 3. The directory does not exist or is a file. > Note: A quota of 1 would force the directory to remain empty. > {noformat} > The command {{-setSpaceQuota}} has a similar problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10471) DFSAdmin#SetQuotaCommand's help msg is not correct
[ https://issues.apache.org/jira/browse/HDFS-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10471: - Attachment: HDFS-10471.001.patch > DFSAdmin#SetQuotaCommand's help msg is not correct > -- > > Key: HDFS-10471 > URL: https://issues.apache.org/jira/browse/HDFS-10471 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10471.001.patch > > > The help message of the SetQuota-related command is not shown > correctly. In the message, the name {{quota}} is shown as {{N}}, but {{N}} is > never defined beforehand. > {noformat} > -setQuota ...: Set the quota for each > directory . > The directory quota is a long integer that puts a hard limit > on the number of names in the directory tree > For each directory, attempt to set the quota. An error will be > reported if > 1. N is not a positive integer, or > 2. User is not an administrator, or > 3. The directory does not exist or is a file. > Note: A quota of 1 would force the directory to remain empty. > {noformat} > The command {{-setSpaceQuota}} has a similar problem.
[jira] [Created] (HDFS-10471) DFSAdmin#SetQuotaCommand's help msg is not correct
Yiqun Lin created HDFS-10471: Summary: DFSAdmin#SetQuotaCommand's help msg is not correct Key: HDFS-10471 URL: https://issues.apache.org/jira/browse/HDFS-10471 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.1 Reporter: Yiqun Lin Assignee: Yiqun Lin Priority: Minor The help message of the SetQuota-related command is not shown correctly. In the message, the name {{quota}} is shown as {{N}}, but {{N}} is never defined beforehand. {noformat} -setQuota ...: Set the quota for each directory . The directory quota is a long integer that puts a hard limit on the number of names in the directory tree For each directory, attempt to set the quota. An error will be reported if 1. N is not a positive integer, or 2. User is not an administrator, or 3. The directory does not exist or is a file. Note: A quota of 1 would force the directory to remain empty. {noformat} The command {{-setSpaceQuota}} has a similar problem.
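Going by the description, the fix is to name the parameter consistently in the help text. Below is a minimal sketch, with hypothetical wording and class name, of what the corrected message could look like; it does not reproduce the actual string in DFSAdmin.SetQuotaCommand.

```java
// Hypothetical sketch of the corrected help text: the undefined "N" is
// replaced by the <quota> name the message actually introduces.
public class SetQuotaHelp {
  public static final String DESCRIPTION =
      "-setQuota <quota> <dirname>...<dirname>: Set the quota <quota> for each directory <dirname>.\n"
      + "\t\tThe directory quota is a long integer that puts a hard limit\n"
      + "\t\ton the number of names in the directory tree.\n"
      + "\t\tFor each directory, attempt to set the quota. An error will be reported if\n"
      + "\t\t1. <quota> is not a positive integer, or\n"
      + "\t\t2. User is not an administrator, or\n"
      + "\t\t3. The directory does not exist or is a file.\n"
      + "\t\tNote: A quota of 1 would force the directory to remain empty.\n";

  public static void main(String[] args) {
    System.out.print(DESCRIPTION);
  }
}
```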
[jira] [Commented] (HDFS-9833) Erasure coding: recomputing block checksum on the fly by reconstructing the missed/corrupt block data
[ https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306265#comment-15306265 ] Rakesh R commented on HDFS-9833: The test case failures are unrelated to my patch; please ignore them. > Erasure coding: recomputing block checksum on the fly by reconstructing the > missed/corrupt block data > - > > Key: HDFS-9833 > URL: https://issues.apache.org/jira/browse/HDFS-9833 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rakesh R > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-9833-00-draft.patch, HDFS-9833-01.patch, > HDFS-9833-02.patch, HDFS-9833-03.patch, HDFS-9833-04.patch, > HDFS-9833-05.patch, HDFS-9833-06.patch > > > As discussed in HDFS-8430 and HDFS-9694, to compute a striped file checksum > even when some striped blocks are missing, we need to consider recomputing the block > checksum on the fly for the missed/corrupt blocks. To recompute the block > checksum, the block data needs to be reconstructed by erasure decoding, and > the main code needed for the block reconstruction can be borrowed from > HDFS-9719, the refactoring of the existing {{ErasureCodingWorker}}. In the EC > worker, reconstructed blocks need to be written out to target datanodes, but > in this case the remote writing isn't necessary, as the reconstructed > block data is only used to recompute the checksum.
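The idea in the description, reconstruct a lost block by erasure decoding and checksum it locally without the remote write an EC worker would do, can be shown with a toy example. This is an illustration only: single XOR parity stands in for the Reed-Solomon decoding HDFS actually uses, and none of the ErasureCodingWorker machinery appears here.

```java
import java.util.zip.CRC32;

// Illustrative sketch: recover a lost block from its surviving peers and
// recompute its checksum locally, never writing the block anywhere.
// XOR parity is a stand-in for real Reed-Solomon decoding.
public class RecomputeChecksum {
  // XOR of all inputs: with parity p = d0 ^ d1, a lost d1 is d0 ^ p.
  static byte[] xorDecode(byte[][] survivors) {
    byte[] out = new byte[survivors[0].length];
    for (byte[] b : survivors)
      for (int i = 0; i < out.length; i++) out[i] ^= b[i];
    return out;
  }

  static long checksum(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data);
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] d0 = {1, 2, 3, 4}, d1 = {5, 6, 7, 8};
    byte[] parity = xorDecode(new byte[][] {d0, d1});
    // Suppose d1 is lost: rebuild it from d0 and parity, then checksum it.
    byte[] recovered = xorDecode(new byte[][] {d0, parity});
    System.out.println(checksum(recovered) == checksum(d1)); // true
  }
}
```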
[jira] [Updated] (HDFS-10426) TestPendingInvalidateBlock failed in trunk
[ https://issues.apache.org/jira/browse/HDFS-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10426: - Attachment: HDFS-10426.003.patch Thanks [~iwasakims] for the comments. {quote} In order to fix #2, we should wait for replication count of the block reaches to 2 before delete. {quote} I think the method {{DataNodeTestUtils.setHeartbeatsDisabledForTests}} will work for this; we can disable heartbeats for the test here. {quote} ReplicationMonitor kicks in between delete and checking pending deletion blocks count. {quote} I'd like to add a test flag in {{BlockManager#ReplicationMonitor}}; this needs only a small change. But I haven't found a way to control this without adding code, so I'd be glad to hear any suggestions. The other intermittent failures seem to occur because pendingDeleteBlocks was not completely deleted when we did the check. We can add some retries for that. Attached a new patch. > TestPendingInvalidateBlock failed in trunk > -- > > Key: HDFS-10426 > URL: https://issues.apache.org/jira/browse/HDFS-10426 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10426.001.patch, HDFS-10426.002.patch, > HDFS-10426.003.patch > > > The test {{TestPendingInvalidateBlock}} fails sometimes. The stack info: > {code} > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock > testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock) > Time elapsed: 7.703 sec <<< FAILURE! 
> java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:92) > {code} > It looks like the {{invalidateBlock}} has been removed before we do the check > {code} > // restart NN > cluster.restartNameNode(true); > dfs.delete(foo, true); > Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal()); > Assert.assertEquals(REPLICATION, cluster.getNamesystem() > .getPendingDeletionBlocks()); > Assert.assertEquals(REPLICATION, > dfs.getPendingDeletionBlocksCount()); > {code} > I looked into the related configurations and found the property > {{dfs.namenode.replication.interval}} was set to just 1 second in this test. > If the delete operation is slow after the delay of {{dfs.namenode.startup.delay.block.deletion.sec}}, > this case can occur. As the stack info above shows, the failed test took 7.7s, more than the 5+1 seconds. > One way to improve this: > * Increase the time of {{dfs.namenode.replication.interval}}
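The retry idea discussed above, polling until pending deletions drain instead of asserting immediately, can be sketched as a standalone helper. This is modeled loosely on Hadoop's GenericTestUtils.waitFor but written here with no Hadoop dependency; names and timings are illustrative.

```java
import java.util.function.BooleanSupplier;

// Sketch of the retry idea: instead of asserting a count once, poll the
// condition until it holds or a deadline passes.
public class WaitFor {
  public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("interrupted while waiting", e);
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Condition becomes true after ~50 ms, mimicking pending deletions draining.
    waitFor(() -> System.currentTimeMillis() - start > 50, 10, 2000);
    System.out.println("condition met");
  }
}
```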
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could get stuck for a long time due to the race between replication and delete of the same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306248#comment-15306248 ] Vinayakumar B commented on HDFS-10453: -- In the branch-2 code, {{BlockManager#computeReconstructionWorkForBlocks()}}, the flow goes as below before reaching the part of the code where you provided the patch: 1. With the global lock held, the {{work}} list is created with the blocks that are present at that time in {{scheduleReplication()}}; the lock is released after creating the list. {code}if (block.isDeleted() || !block.isCompleteOrCommitted()) { // remove from neededReplications neededReplications.remove(block, priority); return null; }{code} 2. For blocks in the {{work}} list, targets are chosen. 3. The global lock is acquired again, and a recheck happens for all {{ReplicationWork}} blocks that have got targets, to see whether the block was deleted during step #2. Only blocks that were not deleted during this time proceed to replication. {code}if (block.isDeleted() || !block.isCompleteOrCommitted()) { neededReplications.remove(block, priority); rw.resetTargets(); return false; }{code} Therefore the problem you mentioned in the description does not occur, IMO. Correct me if I am wrong. > ReplicationMonitor thread could get stuck for a long time due to the race between > replication and delete of the same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.7.1 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could get stuck for a long time and lose data with low > probability. Consider the typical scenario: > (1) create and close a file with the default replicas (3); > (2) increase the replication (to 10) of the file. 
> (3) delete the file while ReplicationMonitor is scheduling blocks belonging to > that file for replication. > If the ReplicationMonitor stuck state reappears, the NameNode will print logs like: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > 
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because two threads (#NameNodeRpcServer and #ReplicationMonitor) > process the same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to > replicate and leaves the global lock. > (2) FSNamesystem#delete is invoked to delete the blocks, then clears the references in > blocksmap, needReplications, etc. The block's NumBytes is set to > NO_ACK (Long.MAX_VALUE), which is used to indicate that the block deletion does > not need an explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to > chooseTargets for the same blocks, and no node will be selected after traversing the > whole cluster, because no node satisfies the goodness criteria (remaining space must > reach the required size Long.MAX_VALUE).
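The guard proposed in the description, skip target selection for blocks whose size was stamped NO_ACK on deletion, can be sketched as follows. The class and method names here are simplified stand-ins for the BlockManager internals, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed guard: a block whose NumBytes was stamped NO_ACK
// (Long.MAX_VALUE) on deletion is skipped before target selection, so the
// placement policy never scans the whole cluster for space it can never find.
public class SkipNoAckBlocks {
  static final long NO_ACK = Long.MAX_VALUE; // as in BlockCommand.NO_ACK

  static class Block {
    final long numBytes;
    Block(long numBytes) { this.numBytes = numBytes; }
  }

  static List<Block> filterReplicationWork(List<Block> candidates) {
    List<Block> work = new ArrayList<>();
    for (Block b : candidates) {
      if (b.numBytes == NO_ACK) {
        continue; // deleted concurrently: would be removed from neededReplications
      }
      work.add(b); // would proceed to chooseTargets(...)
    }
    return work;
  }

  public static void main(String[] args) {
    List<Block> candidates = new ArrayList<>();
    candidates.add(new Block(128L * 1024 * 1024));
    candidates.add(new Block(NO_ACK)); // block of a file deleted mid-scan
    System.out.println(filterReplicationWork(candidates).size()); // 1
  }
}
```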
[jira] [Created] (HDFS-10470) HDFS HA with Kerberos: Specified version of key is not available
deng created HDFS-10470: --- Summary: HDFS HA with Kerberos: Specified version of key is not available Key: HDFS-10470 URL: https://issues.apache.org/jira/browse/HDFS-10470 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 2.6.0 Environment: java version "1.7.0_79" Java(TM) SE Runtime Environment (build 1.7.0_79-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode) Reporter: deng When I enable Kerberos with HDFS HA, the JournalNode always throws the exception below, but HDFS works well. 2016-05-30 10:54:37,877 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: Failure unspecified at GSS-A PI level (Mechanism level: Specified version of key is not available (44)) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44)) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:517) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1279) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44)) at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) at sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875) at sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:366) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:348) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:348) ... 23 more Caused by: KrbException: Specified version of key is not available (44) at sun.security.krb5.EncryptionKey.findKey(EncryptionKey.java:588) at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:270) at sun.security.krb5.KrbApReq.(KrbApReq.java:144) at sun.security.jgss.krb5.InitSecContextToken.(InitSecContextToken.java:108) at sun.securi
[jira] [Commented] (HDFS-10433) Make retry also work well for Async DFS
[ https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306085#comment-15306085 ] Hadoop QA commented on HDFS-10433: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall 
{color} | {color:green} 1m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 31s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 22s {color} | {color:red} root: The patch generated 44 new + 224 unchanged - 6 fixed = 268 total (was 230) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 46s {color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s {color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 58s {color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 133m 58s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks | | | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:2c91fd8 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12806846/h10433_20160528c.patch | | JIRA Issue | HDFS-10433 | | Optional Tests | asflicense xml compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux dd63698998c9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4e1f56e | | Default Java | 1
[jira] [Commented] (HDFS-10433) Make retry also work well for Async DFS
[ https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306045#comment-15306045 ] Tsz Wo Nicholas Sze commented on HDFS-10433: Jenkins somehow tested h10433_20160528b.patch again but not h10433_20160528c.patch. > Make retry also work well for Async DFS > > > Key: HDFS-10433 > URL: https://issues.apache.org/jira/browse/HDFS-10433 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Tsz Wo Nicholas Sze > Attachments: h10433_20160524.patch, h10433_20160525.patch, > h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, > h10433_20160528c.patch > > > In the current Async DFS implementation, file system calls are invoked and > return a Future immediately to clients. Clients call Future#get to retrieve the > final results. Future#get internally invokes a chain of callbacks residing in > ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and ipc.Client. This > callback path bypasses the original retry layer/logic designed for > synchronous DFS. This proposes refactoring so that retry also works for Async > DFS.
[jira] [Updated] (HDFS-10433) Make retry also work well for Async DFS
[ https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-10433: --- Attachment: (was: h10433_20160528b.patch) > Make retry also work well for Async DFS > > > Key: HDFS-10433 > URL: https://issues.apache.org/jira/browse/HDFS-10433 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Tsz Wo Nicholas Sze > Attachments: h10433_20160524.patch, h10433_20160525.patch, > h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, > h10433_20160528c.patch > > > In the current Async DFS implementation, file system calls are invoked and > return a Future immediately to clients. Clients call Future#get to retrieve the > final results. Future#get internally invokes a chain of callbacks residing in > ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and ipc.Client. This > callback path bypasses the original retry layer/logic designed for > synchronous DFS. This proposes refactoring so that retry also works for Async > DFS.
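The retry-for-async idea in the description, re-issue the asynchronous call on failure rather than letting Future#get surface the first error, can be sketched generically with CompletableFuture (Java 9+ for failedFuture). This is an illustration only, not the HDFS retry-layer refactoring itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch: wrap an async call so that transient failures trigger a re-issue
// of the whole call, and the caller's future only fails after retries run out.
public class AsyncRetry {
  static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> call, int retries) {
    CompletableFuture<T> result = new CompletableFuture<>();
    attempt(call, retries, result);
    return result;
  }

  private static <T> void attempt(Supplier<CompletableFuture<T>> call, int retriesLeft,
                                  CompletableFuture<T> result) {
    call.get().whenComplete((value, err) -> {
      if (err == null) {
        result.complete(value);
      } else if (retriesLeft > 0) {
        attempt(call, retriesLeft - 1, result); // retry the whole call
      } else {
        result.completeExceptionally(err);
      }
    });
  }

  public static void main(String[] args) {
    AtomicInteger attempts = new AtomicInteger();
    // Fails twice, then succeeds: with 3 retries the caller sees "ok".
    CompletableFuture<String> f = withRetry(() ->
        attempts.incrementAndGet() < 3
            ? CompletableFuture.<String>failedFuture(new RuntimeException("transient"))
            : CompletableFuture.completedFuture("ok"), 3);
    System.out.println(f.join()); // ok
  }
}
```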
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could get stuck for a long time due to the race between replication and delete of the same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305917#comment-15305917 ] Hadoop QA commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 53s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 29s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} branch-2 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} branch-2 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2 has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} branch-2 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 31s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 17s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 157m 14s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed junit tests | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | | | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs | | JDK v1.7.0_101 Failed junit tests | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:babe025 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12806883/HDFS-10453-branch-2.003.patch | | JIRA Issue | HDFS-10453 | |
[jira] [Updated] (HDFS-10453) ReplicationMonitor thread could get stuck for a long time due to the race between replication and delete of the same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-10453: --- Affects Version/s: 2.4.1 Status: Patch Available (was: Open) Kicking Jenkins for branch-2. > ReplicationMonitor thread could get stuck for a long time due to the race between > replication and delete of the same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1, 2.4.1 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could get stuck for a long time and lose data with low > probability. Consider the typical scenario: > (1) create and close a file with the default replicas (3); > (2) increase the replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belonging to > that file for replication. > If the ReplicationMonitor stuck state reappears, the NameNode will print logs like: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. 
> 2016-04-19 10:21:17,184 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 7 to reach 10 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 7 but only 0 storage types can be selected (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 7 to reach 10 (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets the blocks to replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and clears the references in the blocksMap, neededReplications, etc. The block's numBytes is set to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does not need an explicit ACK from the node. 
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to chooseTargets for those same blocks, and no node is selected after traversing the whole cluster, because no candidate satisfies the goodness criteria (remaining space must reach the required size, Long.MAX_VALUE).
> During stage (3) the ReplicationMonitor is stuck for a long time, especially in a large cluster. invalidateBlocks & neededReplications keep growing with nothing consuming them; in the worst case this loses data.
> This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK blocks and removing them from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
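The proposed guard can be sketched as a simplified, self-contained model. This is not Hadoop's actual BlockManager code: the Block class, filterNeededReplications method, and NoAckSkipSketch class are hypothetical names for illustration; only the NO_ACK sentinel value (Long.MAX_VALUE, mirroring BlockCommand.NO_ACK) comes from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

public class NoAckSkipSketch {
    // Sentinel mirroring BlockCommand.NO_ACK: a concurrent delete sets the
    // block's numBytes to this value, so no node can ever satisfy the
    // "remaining space >= required size" goodness criterion.
    static final long NO_ACK = Long.MAX_VALUE;

    static class Block {
        final long id;
        long numBytes;
        Block(long id, long numBytes) { this.id = id; this.numBytes = numBytes; }
    }

    // Hypothetical stand-in for the fix inside computeReplicationWorkForBlocks:
    // drop deleted (NO_ACK) blocks instead of running a futile cluster-wide
    // chooseTarget for each of them.
    static List<Block> filterNeededReplications(List<Block> needed) {
        List<Block> work = new ArrayList<>();
        for (Block b : needed) {
            if (b.numBytes == NO_ACK) {
                continue; // deleted concurrently; remove from neededReplications
            }
            work.add(b);
        }
        return work;
    }

    public static void main(String[] args) {
        List<Block> needed = new ArrayList<>();
        needed.add(new Block(1, 128L << 20)); // live 128 MB block
        Block deleted = new Block(2, 64L << 20);
        deleted.numBytes = NO_ACK;            // concurrent delete marks it
        needed.add(deleted);

        // Only the live block should remain schedulable.
        System.out.println(filterNeededReplications(needed).size());
    }
}
```

The key point is that the check costs O(1) per block, whereas each futile chooseTarget pass is a traversal of the whole cluster under the placement policy.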
[jira] [Updated] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-10453: --- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-10453: --- Attachment: HDFS-10453-branch-2.003.patch