[jira] [Work logged] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?focusedWorklogId=572450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-572450 ]

ASF GitHub Bot logged work on HDFS-15923:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Mar/21 04:34
Start Date: 26/Mar/21 04:34
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #2819:
URL: https://github.com/apache/hadoop/pull/2819#issuecomment-807928592

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 29m 13s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 35m 53s | | trunk passed |
| +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 25s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 37s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 36s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 0m 52s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 15s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 50s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 35s | | the patch passed |
| +1 :green_heart: | compile | 0m 34s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javac | 0m 34s | | the patch passed |
| +1 :green_heart: | compile | 0m 26s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | javac | 0m 26s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 17s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2819/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) |
| +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 17s | | the patch passed |
| +1 :green_heart: | shadedclient | 17m 1s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 23m 44s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2819/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. |
| +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. |
| | | 134m 52s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterFederationRename |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2819/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2819 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 83666180296e 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 22f65bf61522127f940c3cc7a6aed538d61c07d1 |
| Default Java | Private
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-15715:
------------------------------
Attachment: (was: image-2021-02-25-14-41-49-394.png)

> ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
> --------------------------------------------------------------------------
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.7.3, 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, HDFS-15715.002.patch.addendum, image-2021-03-26-12-17-45-500.png
>
>
> One of our NameNodes holds 300M files and blocks. Normally this NameNode should not be under heavy load, but we found that RPC processing time stayed high and decommissioning was very slow.
>
> I checked the metrics and found that the number of under-replicated blocks stayed high. I then ran jstack on the NameNode and found that 'InnerNode.getLoc' was the hot spot. I suspected that chooseTarget could not find a target for the block, which degraded performance. Considering HDFS-10453, I guessed that some logic was triggering a case in which chooseTarget cannot find a proper target.
> Then I enabled some debug logging. (I revised the code so that only isGoodTarget was logged, because enabling BlockPlacementPolicy's debug log is dangerous.) I found that "the rack has too many chosen nodes" was being logged. Then I found log entries like this:
> {code}
> 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code}
> Through debugging and simulation I found the cause and reproduced the problem.
> The cause is that some developers use the COLD storage policy together with the mover, but setting the storage policy and running the mover are asynchronous operations. As a result, the actual DataNode storages of some files do not match their storage policy.
> Let me simulate the process. Suppose /tmp/a is created with 2 replicas on DISK, and its storage policy is then set to COLD. Later some logic (for example decommission) triggers a copy of this block. chooseTarget uses chooseStorageTypes to work out which storage types are still needed. Here the list requiredStorageTypes returned by chooseStorageTypes has size 3, while the result list has size 2: 3 means that 3 ARCHIVE storages are needed, and 2 means that the block already has 2 DISK storages. chooseTarget then tries to choose 3 targets. The first target is chosen correctly, but when the second target is chosen, the variable 'counter' in isGoodTarget is 4, which is larger than maxTargetPerRack (3). So every DataNode storage is skipped, which results in bad performance.
> I think chooseStorageTypes needs to take the result into account: when an existing replica does not meet the storage policy's requirement, it should be removed from the result.
> I made this change and tested it in my unit test, and it solved the problem.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
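As a rough illustration of the mismatch described in the HDFS-15715 report above, the following plain-Java sketch models only the counting problem. It is not Hadoop code: the class `StorageTypeMismatchSketch`, the methods `stillRequired` and `matchingExisting`, and the two-value `StorageType` enum are hypothetical names, and the real chooseStorageTypes and isGoodTarget logic is more involved. The sketch assumes a replication factor of 3, a COLD policy that wants ARCHIVE for every replica, and two existing replicas on DISK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of the replica counting described in the report; not Hadoop code. */
final class StorageTypeMismatchSketch {
    enum StorageType { DISK, ARCHIVE }

    /** Storage types still needed: the policy's types minus existing replicas that match. */
    static List<StorageType> stillRequired(List<StorageType> policyTypes,
                                           List<StorageType> existing) {
        List<StorageType> required = new ArrayList<>(policyTypes);
        for (StorageType t : existing) {
            required.remove(t); // removes one matching entry, no-op if none matches
        }
        return required;
    }

    /** Existing replicas whose storage type actually satisfies the policy. */
    static List<StorageType> matchingExisting(List<StorageType> policyTypes,
                                              List<StorageType> existing) {
        List<StorageType> remaining = new ArrayList<>(policyTypes);
        List<StorageType> kept = new ArrayList<>();
        for (StorageType t : existing) {
            if (remaining.remove(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<StorageType> cold = Collections.nCopies(3, StorageType.ARCHIVE);
        List<StorageType> existing = Arrays.asList(StorageType.DISK, StorageType.DISK);

        List<StorageType> required = stillRequired(cold, existing);
        // Counting every existing replica: 2 DISK + 3 ARCHIVE still required.
        int naive = existing.size() + required.size();
        // Counting only replicas that satisfy the policy, as the report proposes.
        int filtered = matchingExisting(cold, existing).size() + required.size();

        System.out.println("required=" + required.size()
            + " naive=" + naive + " filtered=" + filtered);
    }
}
```

With these assumed inputs the naive count exceeds the replication factor, which is the kind of inflated count that the report says pushes 'counter' past the per-rack limit; the filtered count stays at the replication factor.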
[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290727#comment-17290727 ]

zhengchenyu edited comment on HDFS-15715 at 3/26/21, 4:17 AM:
--------------------------------------------------------------

[~hexiaoqiao] Yeah, no problem.

Note: I found this problem in a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How was it found?

Due to limited space and my poor English, I will describe the analysis only briefly.

(a) Decommission is very slow

!image-2021-03-26-12-17-45-500.png!

While a DataNode is being decommissioned, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is heavily loaded.

(b) Strange log entries from the NameNode

We can guess that some code in chooseTarget is not behaving rationally.

{code:java}
2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code}

(c) Statistics over many stack dumps

By aggregating many stack dumps, I found the hot code shown in the jstack output below.

{code:java}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844" #34 daemon prio=5 os_prio=0 tid=0x7f772e03a800 nid=0x6288f runnable [0x7f4507c0f000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
        at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
        at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
        at org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
        at org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
        at java.lang.Thread.run(Thread.java:748)
{code}

(d) Enabling more debug logging

After some debug logging was enabled, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total number of these log entries was close to the number of DataNodeStorages in the cluster. We can guess that the hit rate of chooseTarget is very low.

Then I used a unit test to reproduce the problem.

2. How to fix it?

I have reproduced this case on the trunk branch. I submitted HDFS-15715.002.patch. This patch does not fix the bug, but its TestReplicationPolicyWithMultiStorage reproduces it. When the bug is triggered, many log entries like "is not chosen since the rack has too many chosen nodes." are printed. After HDFS-15715.002.patch.addendum is applied, the bug is fixed and UnderReplicatedBlocks declines normally.
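Step (c) of the comment above, aggregating many stack dumps to find the hot code, can be sketched roughly as follows. This is only an assumed way of doing the aggregation, not the tooling the commenter used: the class `HotFrameCounter` and its methods are hypothetical, and the sketch assumes each dump is the text of one jstack run in which a thread block starts with a quoted thread name and ends at a blank line.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Counts the top stack frame of one thread across several jstack dumps (hypothetical helper). */
final class HotFrameCounter {

    /** Returns the first "at ..." frame of the first thread whose header contains the fragment, or null. */
    static String topFrame(String dump, String threadNameFragment) {
        String[] lines = dump.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String header = lines[i];
            if (!header.startsWith("\"") || !header.contains(threadNameFragment)) {
                continue;
            }
            for (int j = i + 1; j < lines.length; j++) {
                String s = lines[j].trim();
                if (s.isEmpty() || s.startsWith("\"")) {
                    break; // end of this thread's block
                }
                if (s.startsWith("at ")) {
                    return s.substring(3);
                }
            }
            return null;
        }
        return null;
    }

    /** Tallies how often each frame is on top of the named thread's stack. */
    static Map<String, Integer> countTopFrames(List<String> dumps, String threadNameFragment) {
        Map<String, Integer> counts = new HashMap<>();
        for (String dump : dumps) {
            String frame = topFrame(dump, threadNameFragment);
            if (frame != null) {
                counts.merge(frame, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Called with a list of dumps and the fragment `ReplicationMonitor`, a frame that dominates the resulting counts (in the comment above, `InnerNode.getLoc`) would be a candidate hot spot.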
[jira] [Commented] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309090#comment-17309090 ]

zhuobin zheng commented on HDFS-15923:
--------------------------------------

Hi [~LiJinglun], can you help review this patch? Thanks ~~

> RBF: Authentication failed when rename accross sub clusters
> ------------------------------------------------------------
>
> Key: HDFS-15923
> URL: https://issues.apache.org/jira/browse/HDFS-15923
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Reporter: zhuobin zheng
> Priority: Major
> Labels: RBF, pull-request-available, rename
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Renaming across subclusters with RBF in a Kerberos environment encounters the following two errors:
> # Save Object to journal.
> # Precheck try to get src file status
> So we need to use a proxy UGI doAs to create the DistcpProcedure and TrashProcedure and to submit the job.
> In the patch I use a proxy UGI doAs for the steps above, and it worked.
> But there is another strange thing that this patch does not solve:
> The Router uses its own UGI to submit the DistCp job, not the user UGI or the proxy UGI. This may give the DistCp job excessive permissions.
> First: Save Object to journal.
> {code:java}
> // code placeholder
> 2021-03-23 14:01:16,233 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>         at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>         at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:408)
>         at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622)
>         at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413)
>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822)
>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818)
>         at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1452)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>         at com.sun.proxy.$Proxy11.create(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:376)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>         at com.sun.proxy.$Proxy12.create(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:277)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1240)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1219)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1201)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1139)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:533)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:530)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:544)
>         at
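To make the "proxy UGI doAs" idea in the HDFS-15923 description above more concrete, here is a deliberately simplified plain-Java model. It does not use Hadoop at all: `ProxyIdentitySketch`, `Identity`, `doAs`, and `connect` are hypothetical names, and the `hasTgt` flag is only a stand-in for holding a Kerberos TGT. In Hadoop itself the corresponding calls would be `UserGroupInformation.createProxyUser(user, realUser)` and `UserGroupInformation.doAs(...)`; the model only illustrates, under those assumptions, why an action run as a remote user without credentials fails while the same action run as a proxy of the Router's login user can authenticate.

```java
import java.util.function.Supplier;

/** Toy model of "run as a proxy identity"; it does not call any Hadoop API. */
final class ProxyIdentitySketch {

    static final class Identity {
        final String name;
        final boolean hasTgt;    // stand-in for holding a Kerberos TGT
        final Identity realUser; // non-null only for a proxy identity

        Identity(String name, boolean hasTgt, Identity realUser) {
            this.name = name;
            this.hasTgt = hasTgt;
            this.realUser = realUser;
        }

        /** A proxy can authenticate with the credentials of its real user. */
        boolean canAuthenticate() {
            return hasTgt || (realUser != null && realUser.hasTgt);
        }

        /** Runs the action with this identity as the current one, then restores the previous one. */
        <T> T doAs(Supplier<T> action) {
            Identity previous = CURRENT.get();
            CURRENT.set(this);
            try {
                return action.get();
            } finally {
                CURRENT.set(previous);
            }
        }
    }

    private static final ThreadLocal<Identity> CURRENT = new ThreadLocal<>();

    /** Stand-in for an RPC that needs credentials; returns the effective user name. */
    static String connect() {
        Identity id = CURRENT.get();
        if (id == null || !id.canAuthenticate()) {
            throw new IllegalStateException("No valid credentials provided");
        }
        return id.name;
    }

    public static void main(String[] args) {
        Identity router = new Identity("router", true, null);
        Identity proxy = new Identity("alice", false, router);
        // The action runs as "alice" but authenticates through the router's credentials.
        System.out.println(proxy.doAs(ProxyIdentitySketch::connect));
    }
}
```

In this model, running `connect` as a plain remote identity without a TGT throws, which loosely mirrors the "Failed to find any Kerberos tgt" error in the trace above.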
[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuobin zheng updated HDFS-15923:
--------------------------------
External issue URL: https://github.com/apache/hadoop/pull/2819
[jira] [Work logged] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?focusedWorklogId=572421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-572421 ]

ASF GitHub Bot logged work on HDFS-15923:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Mar/21 02:15
Start Date: 26/Mar/21 02:15
Worklog Time Spent: 10m
Work Description: zhengzhuobinzzb opened a new pull request #2819:
URL: https://github.com/apache/hadoop/pull/2819

Renaming across subclusters with RBF in a Kerberos environment encounters the following two errors:

Save Object to journal.
Precheck try to get src file status

So we need to use a proxy UGI doAs to create the DistcpProcedure and TrashProcedure and to submit the job.
In the patch I use a proxy UGI doAs for the steps above, and it worked.

But there is another strange thing that this patch does not solve:
The Router uses its own UGI to submit the DistCp job, not the user UGI or the proxy UGI. This may give the DistCp job excessive permissions.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 572421)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-15923:
---------------------------------
Labels: RBF pull-request-available rename (was: RBF rename)
[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuobin zheng updated HDFS-15923:
---------------------------------
    External issue ID:   (was: HDFS-15747)

> RBF: Authentication failed when rename accross sub clusters
> -----------------------------------------------------------
>
>                 Key: HDFS-15923
>                 URL: https://issues.apache.org/jira/browse/HDFS-15923
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rbf
>            Reporter: zhuobin zheng
>            Priority: Major
>              Labels: RBF, rename
>
> Renaming across subclusters with RBF in a Kerberos environment encounters
> the following two errors:
> # Saving the object to the journal.
> # The precheck trying to get the src file status.
> So we need to use a proxy UGI doAs to create the DistcpProcedure and
> TrashProcedure and to submit the job.
> In the patch I use a proxy UGI doAs for the above methods. It worked.
> But there is another strange thing that this patch does not solve:
> the Router uses its own UGI to submit the Distcp job, not the user UGI or the
> proxy UGI. This may give the Distcp job excessive permissions.
> First: Save Object to journal.
> {code:java}
> // code placeholder
> 2021-03-23 14:01:16,233 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>   at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:408)
>   at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622)
>   at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>   at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818)
>   at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1452)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy11.create(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:376)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.create(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:277)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1240)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1219)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1201)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1139)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:533)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:530)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:544)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:471)
>   at
[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuobin zheng updated HDFS-15923:
---------------------------------
    External issue ID: HDFS-15747
[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuobin zheng updated HDFS-15923:
---------------------------------
    Labels: RBF rename  (was: RBF)
[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuobin zheng updated HDFS-15923:
---------------------------------
    Labels: RBF  (was: )
[jira] [Created] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
zhuobin zheng created HDFS-15923:
------------------------------------

             Summary: RBF: Authentication failed when rename accross sub clusters
                 Key: HDFS-15923
                 URL: https://issues.apache.org/jira/browse/HDFS-15923
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: rbf
            Reporter: zhuobin zheng


Renaming across subclusters with RBF in a Kerberos environment encounters the following two errors:
# Saving the object to the journal.
# The precheck trying to get the src file status.

So we need to use a proxy UGI doAs to create the DistcpProcedure and TrashProcedure and to submit the job.

In the patch I use a proxy UGI doAs for the above methods. It worked.

But there is another strange thing that this patch does not solve:
the Router uses its own UGI to submit the Distcp job, not the user UGI or the proxy UGI. This may give the Distcp job excessive permissions.

First: Save Object to journal.
{code:java}
// code placeholder
2021-03-23 14:01:16,233 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
  at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
  at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:408)
  at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622)
  at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413)
  at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822)
  at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818)
  at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636)
  at org.apache.hadoop.ipc.Client.call(Client.java:1452)
  at org.apache.hadoop.ipc.Client.call(Client.java:1405)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
  at com.sun.proxy.$Proxy11.create(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:376)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
  at com.sun.proxy.$Proxy12.create(Unknown Source)
  at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:277)
  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1240)
  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1219)
  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1201)
  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1139)
  at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:533)
  at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:530)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:544)
  at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:471)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:994)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:982)
  at org.apache.hadoop.tools.fedbalance.procedure.BalanceJournalInfoHDFS.saveJob(BalanceJournalInfoHDFS.java:89)
  at
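The proxy-UGI idea in the HDFS-15923 description can be illustrated with a plain-JDK model. This is only a sketch of the pattern, not the patch and not Hadoop code: `Identity`, `remoteUser`, `proxyUser`, `doAs` and `saveJob` are hypothetical stand-ins, and the "credentials" are just a string. In Hadoop itself the analogous calls would presumably be `UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser())` followed by `doAs(...)`. The model shows why a step run as the bare remote user fails for lack of credentials, while the same step run as a proxy user backed by the Router's login succeeds.

```java
import java.util.function.Supplier;

/** Plain-JDK model of the proxy-user doAs pattern; every name here is hypothetical. */
public class ProxyDoAsSketch {

    /** Who a call is made for, and whose credentials back it (null means none). */
    public static final class Identity {
        final String user;
        final String credentialsOf;

        Identity(String user, String credentialsOf) {
            this.user = user;
            this.credentialsOf = credentialsOf;
        }
    }

    /** Stand-in for the service's own login identity, which holds real credentials. */
    public static final Identity LOGIN = new Identity("router", "router");

    private static final ThreadLocal<Identity> CURRENT =
        ThreadLocal.withInitial(() -> LOGIN);

    /** A caller as seen by the server: a name, but no credentials of its own. */
    public static Identity remoteUser(String user) {
        return new Identity(user, null);
    }

    /** A proxy identity: the caller's name backed by the real user's credentials. */
    public static Identity proxyUser(String user, Identity real) {
        return new Identity(user, real.credentialsOf);
    }

    /** Runs the action under the given identity, then restores the previous one. */
    public static <T> T doAs(Identity id, Supplier<T> action) {
        Identity previous = CURRENT.get();
        CURRENT.set(id);
        try {
            return action.get();
        } finally {
            CURRENT.set(previous);
        }
    }

    /** Stand-in for a step that needs authentication, such as saving a job to the journal. */
    public static String saveJob(String jobId) {
        Identity id = CURRENT.get();
        if (id.credentialsOf == null) {
            throw new IllegalStateException("no credentials for " + id.user);
        }
        return jobId + " saved as " + id.user + " via " + id.credentialsOf;
    }

    public static void main(String[] args) {
        try {
            doAs(remoteUser("alice"), () -> saveJob("job-1"));
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
        System.out.println(doAs(proxyUser("alice", LOGIN), () -> saveJob("job-1")));
    }
}
```

The model also hints at the open concern raised in the description: any step that runs outside `doAs` falls back to the service's own identity, which may carry more permissions than the caller should have.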
[jira] [Work logged] (HDFS-15850) Superuser actions should be reported to external enforcers
[ https://issues.apache.org/jira/browse/HDFS-15850?focusedWorklogId=572417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-572417 ]

ASF GitHub Bot logged work on HDFS-15850:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Mar/21 02:06
            Start Date: 26/Mar/21 02:06
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #2784:
URL: https://github.com/apache/hadoop/pull/2784#issuecomment-807877116

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 2s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 19s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 24m 48s | | trunk passed |
| +1 :green_heart: | compile | 5m 53s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 5m 26s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 19s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 15s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 2m 24s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | spotbugs | 5m 8s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 58s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 58s | | the patch passed |
| +1 :green_heart: | compile | 6m 12s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javac | 6m 12s | | the patch passed |
| +1 :green_heart: | compile | 5m 6s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | javac | 5m 6s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 19s | | hadoop-hdfs-project: The patch generated 0 new + 498 unchanged - 6 fixed = 498 total (was 504) |
| +1 :green_heart: | mvnsite | 2m 1s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 25s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 2m 20s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| -1 :x: | spotbugs | 3m 47s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2784/8/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html) | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | shadedclient | 17m 6s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 233m 58s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2784/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| -1 :x: | unit | 0m 32s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2784/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch failed. |
| +0 :ok: | asflicense | 0m 32s | | ASF License check generated no output? |
| | | 363m 23s | | |

| Reason | Tests |
|-------:|:------|
| SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Possible null pointer dereference of r in org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(String, long, String, String, long) Dereferenced at FSNamesystem.java:r in org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(String, long, String, String, long) Dereferenced at
[jira] [Work logged] (HDFS-15879) Exclude slow nodes when choose targets for blocks
[ https://issues.apache.org/jira/browse/HDFS-15879?focusedWorklogId=572415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-572415 ]

ASF GitHub Bot logged work on HDFS-15879:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 26/Mar/21 01:39
            Start Date: 26/Mar/21 01:39
    Worklog Time Spent: 10m
      Work Description: tomscut commented on a change in pull request #2748:
URL: https://github.com/apache/hadoop/pull/2748#discussion_r601954791

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java ##
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.blockmanagement;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.TestBlockStoragePolicy;
+import org.apache.hadoop.hdfs.server.namenode.NameNode;
+import org.apache.hadoop.net.Node;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Set;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+@RunWith(Parameterized.class)
+public class TestReplicationPolicyExcludeSlowNodes
+    extends BaseReplicationPolicyTest {
+
+  public TestReplicationPolicyExcludeSlowNodes(String blockPlacementPolicy) {
+    this.blockPlacementPolicy = blockPlacementPolicy;
+  }
+
+  @Parameterized.Parameters
+  public static Iterable data() {
+    return Arrays.asList(new Object[][] {
+        { BlockPlacementPolicyDefault.class.getName() },
+        { BlockPlacementPolicyWithUpgradeDomain.class.getName() } });
+  }
+
+  @Override
+  DatanodeDescriptor[] getDatanodeDescriptors(Configuration conf) {
+    conf.setBoolean(DFSConfigKeys
+        .DFS_DATANODE_PEER_STATS_ENABLED_KEY,
+        true);
+    conf.setStrings(DFSConfigKeys
+        .DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY,
+        "1s");
+    conf.setBoolean(DFSConfigKeys
+        .DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY,
+        true);
+    final String[] racks = {
+        "/rack1",
+        "/rack1",
+        "/rack2",
+        "/rack2",
+        "/rack3",
+        "/rack3"};

Review comment:
Hi @tasanuma, this PR is mainly for filtering slow nodes. The maximum number of slow nodes to be filtered can be set, so we can choose reasonable parameters for the cluster size. But if we want to add the RackFaultTolerant policy and pass the unit test, we can also put the 6 DNs on 6 different racks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 572415)
    Time Spent: 2h 50m  (was: 2h 40m)

> Exclude slow nodes when choose targets for blocks
> -------------------------------------------------
>
>                 Key: HDFS-15879
>                 URL: https://issues.apache.org/jira/browse/HDFS-15879
>             Project: Hadoop HDFS
>          Issue Type: Wish
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Previously, we monitored the slow nodes; see
> [HDFS-11194|https://issues.apache.org/jira/browse/HDFS-11194].
> We can use a thread to periodically collect these slow nodes into a set, then
> use the set to filter out slow nodes when choosing targets for blocks.
> This feature can be configured to be turned on when needed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For
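The approach outlined in the HDFS-15879 description (a background thread collects slow nodes into a set, and target selection skips members of that set) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from pull request #2748: the class `SlowNodeFilterSketch`, its methods, and the use of node names as strings are all hypothetical, and the slow-node report is injected as a `Supplier` rather than read from DataNode peer statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: periodically collect slow nodes, then skip them when choosing targets. */
public class SlowNodeFilterSketch {
    // Thread-safe set shared between the collector thread and target selection.
    private final Set<String> slowNodes = ConcurrentHashMap.newKeySet();
    // Assumed source of the latest slow-node report (for example, derived from peer stats).
    private final Supplier<Set<String>> slowNodeSource;

    public SlowNodeFilterSketch(Supplier<Set<String>> slowNodeSource) {
        this.slowNodeSource = slowNodeSource;
    }

    /** One collection pass: drop nodes that recovered, add nodes newly reported as slow. */
    public void refresh() {
        Set<String> latest = slowNodeSource.get();
        slowNodes.retainAll(latest);
        slowNodes.addAll(latest);
    }

    /** Starts a daemon thread that calls refresh() at a fixed delay. */
    public ScheduledExecutorService start(long interval, TimeUnit unit) {
        ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "slow-node-collector");
                t.setDaemon(true);
                return t;
            });
        executor.scheduleWithFixedDelay(this::refresh, 0, interval, unit);
        return executor;
    }

    /** Picks up to `wanted` candidates in order, skipping any node currently marked slow. */
    public List<String> chooseTargets(List<String> candidates, int wanted) {
        List<String> chosen = new ArrayList<>();
        for (String node : candidates) {
            if (chosen.size() == wanted) {
                break;
            }
            if (!slowNodes.contains(node)) {
                chosen.add(node);
            }
        }
        return chosen;
    }
}
```

Calling `refresh()` directly makes the behavior easy to check in a unit test without waiting on the scheduler; `start(1, TimeUnit.SECONDS)` would roughly mirror the "1s" collect interval that the test under review sets.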
[jira] [Work logged] (HDFS-15922) Use memcpy for copying non-null terminated string in jni_helper.c
[ https://issues.apache.org/jira/browse/HDFS-15922?focusedWorklogId=572262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-572262 ]

ASF GitHub Bot logged work on HDFS-15922:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Mar/21 19:37
            Start Date: 25/Mar/21 19:37
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #2818:
URL: https://github.com/apache/hadoop/pull/2818#issuecomment-807347586

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 54s | | Docker mode activated. |
|||| _ Prechecks _ ||
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ ||
| +1 :green_heart: | mvninstall | 33m 8s | | trunk passed |
| +1 :green_heart: | compile | 2m 42s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 2m 44s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | mvnsite | 0m 30s | | trunk passed |
| +1 :green_heart: | shadedclient | 52m 35s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ ||
| +1 :green_heart: | mvninstall | 0m 16s | | the patch passed |
| +1 :green_heart: | compile | 2m 29s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | cc | 2m 28s | | hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 60 unchanged - 27 fixed = 60 total (was 87) |
| +1 :green_heart: | golang | 2m 28s | | the patch passed |
| +1 :green_heart: | javac | 2m 28s | | the patch passed |
| +1 :green_heart: | compile | 2m 33s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| -1 :x: | cc | 2m 33s | [/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2818/2/artifact/out/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt) | hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 9 new + 51 unchanged - 36 fixed = 60 total (was 87) |
| +1 :green_heart: | golang | 2m 33s | | the patch passed |
| +1 :green_heart: | javac | 2m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | mvnsite | 0m 19s | | the patch passed |
| +1 :green_heart: | shadedclient | 13m 14s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ ||
| +1 :green_heart: | unit | 108m 25s | | hadoop-hdfs-native-client in the patch passed. |
| +1 :green_heart: | asflicense | 0m 33s | | The patch does not generate ASF License warnings. |
| | | 183m 45s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2818/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2818 |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit codespell golang |
| uname | Linux db21b0266588 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3012b5101299fb3c175d5cc76130dc1e16014964 |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2818/2/testReport/ |
| Max. process+thread count | 541 (vs. ulimit of 5500) |
| modules | C:
[jira] [Work logged] (HDFS-15922) Use memcpy for copying non-null terminated string in jni_helper.c
[ https://issues.apache.org/jira/browse/HDFS-15922?focusedWorklogId=572246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-572246 ]

ASF GitHub Bot logged work on HDFS-15922:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Mar/21 19:23
            Start Date: 25/Mar/21 19:23
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #2818:
URL: https://github.com/apache/hadoop/pull/2818#issuecomment-807326538

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
|||| _ Prechecks _ ||
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ ||
| +1 :green_heart: | mvninstall | 35m 18s | | trunk passed |
| +1 :green_heart: | compile | 2m 54s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 2m 50s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | mvnsite | 0m 24s | | trunk passed |
| +1 :green_heart: | shadedclient | 57m 13s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ ||
| +1 :green_heart: | mvninstall | 0m 14s | | the patch passed |
| +1 :green_heart: | compile | 2m 39s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| -1 :x: | cc | 2m 39s | [/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2818/1/artifact/out/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt) | hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 3 new + 69 unchanged - 18 fixed = 72 total (was 87) |
| +1 :green_heart: | golang | 2m 39s | | the patch passed |
| +1 :green_heart: | javac | 2m 39s | | the patch passed |
| +1 :green_heart: | compile | 2m 40s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| -1 :x: | cc | 2m 40s | [/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2818/1/artifact/out/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt) | hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 5 new + 67 unchanged - 20 fixed = 72 total (was 87) |
| +1 :green_heart: | golang | 2m 40s | | the patch passed |
| +1 :green_heart: | javac | 2m 40s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | mvnsite | 0m 16s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 44s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ ||
| +1 :green_heart: | unit | 108m 32s | | hadoop-hdfs-native-client in the patch passed. |
| +1 :green_heart: | asflicense | 0m 33s | | The patch does not generate ASF License warnings. |
| | | 191m 5s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2818/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2818 |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit codespell golang |
| uname | Linux b5a23a90c1f2 4.15.0-126-generic #129-Ubuntu SMP Mon Nov 23 18:53:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 9858cfcb5bc22ab53f92ebd8f5e3235a4d0583d8 |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions |
[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang
[ https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=572186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-572186 ]

ASF GitHub Bot logged work on HDFS-15869:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Mar/21 18:22
            Start Date: 25/Mar/21 18:22
    Worklog Time Spent: 10m
      Work Description: functioner commented on a change in pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#discussion_r601739047

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java ##
@@ -63,6 +68,9 @@
           DFS_NAMENODE_EDITS_ASYNC_LOGGING_PENDING_QUEUE_SIZE_DEFAULT);

     editPendingQ = new ArrayBlockingQueue<>(editPendingQSize);
+
+    // the thread pool size should be configurable later, and justified with a rationale
+    logSyncNotifyExecutor = Executors.newFixedThreadPool(10);

Review comment:
Sure. What should be the default value? Many users rely on the default value, so we probably shouldn't set it to 0 by default.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 572186)
    Time Spent: 1h 20m (was: 1h 10m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can
> cause the namenode to hang
> -------------------------------------------------------------------------
>
>                 Key: HDFS-15869
>                 URL: https://issues.apache.org/jira/browse/HDFS-15869
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fs async, namenode
>    Affects Versions: 3.2.2
>            Reporter: Haoze Wu
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We were testing the latest stable Hadoop release, 3.2.2, and found that
> a network issue can cause the namenode to hang even with async edit
> logging (FSEditLogAsync).
> The workflow of the FSEditLogAsync thread is basically:
> # get EditLog from a queue (line 229)
> # do the transaction (line 232)
> # sync the log if doSync (line 243)
> # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                     // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                     // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());             // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);                // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
> In step 4, FSEditLogAsync$RpcEdit.logSyncNotify is essentially doing a
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                         // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
> If the sendResponse operation in line 365 gets stuck, then the whole
> FSEditLogAsync thread is not able to proceed. In this case, the critical
> logSync (line 243) can't be executed for the incoming transactions. Then the
> namenode hangs. This is undesirable because FSEditLogAsync’s key feature
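A minimal sketch of the idea discussed in pull request #2737: hand the per-edit notification to a thread pool so that a stuck `sendResponse` does not block the thread that performs `logSync`. The class `SyncNotifierSketch` and its method names are hypothetical and are not Hadoop code; treating a pool size of 0 as "notify inline, feature disabled" is an assumption taken from the review discussion, not a confirmed design.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper; not part of FSEditLogAsync. */
final class SyncNotifierSketch {
  private final ExecutorService pool; // null means notifications run inline

  SyncNotifierSketch(int poolSize) {
    if (poolSize < 0) {
      throw new IllegalArgumentException("poolSize must be >= 0");
    }
    // Assumption: size 0 disables the offloading and keeps the old behavior.
    this.pool = poolSize == 0 ? null : Executors.newFixedThreadPool(poolSize);
  }

  /** Runs the notification inline, or hands it to the pool when enabled. */
  void notifyCaller(Runnable notification) {
    if (pool == null) {
      notification.run();
    } else {
      pool.execute(notification);
    }
  }

  /** Stops accepting work and waits for queued notifications to finish. */
  boolean shutdown(long timeoutMillis) {
    if (pool == null) {
      return true;
    }
    pool.shutdown();
    try {
      return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Sends five notifications and returns how many were delivered. */
  static int deliveredCount(int poolSize) {
    SyncNotifierSketch notifier = new SyncNotifierSketch(poolSize);
    AtomicInteger sent = new AtomicInteger();
    for (int i = 0; i < 5; i++) {
      notifier.notifyCaller(sent::incrementAndGet);
    }
    notifier.shutdown(5000);
    return sent.get();
  }

  /** Shows that the calling thread continues while one response is stuck. */
  static boolean callerSurvivesStuckResponse() {
    SyncNotifierSketch notifier = new SyncNotifierSketch(1);
    CountDownLatch release = new CountDownLatch(1);
    notifier.notifyCaller(() -> {
      try {
        release.await(); // stands in for a blocked network write
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    release.countDown(); // reached although the notification is still blocked
    return notifier.shutdown(5000);
  }
}
```

Two caveats apply to this sketch. With more than one worker thread, notifications may complete in a different order than they were submitted, and a fixed pool with an unbounded queue can accumulate work if every worker is stuck. Whether either matters for the namenode is for the patch discussion to settle.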
[jira] [Commented] (HDFS-15921) Improve the log for the Storage Policy Operations
[ https://issues.apache.org/jira/browse/HDFS-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308830#comment-17308830 ]

Bhavik Patel commented on HDFS-15921:
-------------------------------------

Thanks for the review, [~hemanthboyina].

Since we are using SLF4J, which provides a placeholder-based logging format, the placeholders are replaced with the supplied parameters at runtime, so the IF condition is not required.

> Improve the log for the Storage Policy Operations
> -------------------------------------------------
>
>                 Key: HDFS-15921
>                 URL: https://issues.apache.org/jira/browse/HDFS-15921
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Bhavik Patel
>            Assignee: Bhavik Patel
>            Priority: Minor
>         Attachments: HDFS-15921.001.patch
>
>
> Improve the log for the Storage Policy Operations
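The point about placeholder-based logging can be illustrated with a small sketch. `MiniLogger` below is a hypothetical stand-in that only mimics the `{}` substitution style; it is not SLF4J and makes no claim about SLF4J internals. It shows that the message is not built, and `toString()` is not called on the arguments, when debug logging is off. The arguments themselves are still evaluated at the call site, so an explicit `isDebugEnabled()` guard can still be useful when computing an argument is expensive.

```java
/** Hypothetical sketch; MiniLogger is not SLF4J. */
final class PlaceholderLoggingSketch {

  /** Minimal placeholder-style logger that records output in a buffer. */
  static final class MiniLogger {
    private final boolean debugEnabled;
    final StringBuilder out = new StringBuilder();

    MiniLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    boolean isDebugEnabled() {
      return debugEnabled;
    }

    void debug(String format, Object... args) {
      if (!debugEnabled) {
        return; // the message is never formatted when the level is off
      }
      StringBuilder sb = new StringBuilder();
      int from = 0;
      for (Object arg : args) {
        int at = format.indexOf("{}", from);
        if (at < 0) {
          break;
        }
        sb.append(format, from, at).append(arg); // toString() happens here
        from = at + 2;
      }
      out.append(sb).append(format.substring(from)).append('\n');
    }
  }

  /** Argument whose toString() calls are counted, to make the cost visible. */
  static final class CountedPath {
    final String value;
    int rendered;

    CountedPath(String value) {
      this.value = value;
    }

    @Override
    public String toString() {
      rendered++;
      return value;
    }
  }

  /** Returns "<toString call count>:<logged text>" for one debug call. */
  static String demo(boolean debugEnabled) {
    MiniLogger log = new MiniLogger(debugEnabled);
    CountedPath src = new CountedPath("/a");
    log.debug("setStoragePolicy src={} policy={}", src, "HOT");
    return src.rendered + ":" + log.out;
  }
}
```

With debug disabled, `demo(false)` yields `0:`, meaning nothing was formatted. With debug enabled, the argument is rendered once and the full line is recorded.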
[jira] [Commented] (HDFS-15921) Improve the log for the Storage Policy Operations
[ https://issues.apache.org/jira/browse/HDFS-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308816#comment-17308816 ]

Hemanth Boyina commented on HDFS-15921:
---------------------------------------

Thanks for the report and the patch, [~bpatel].

IMO, in NameNodeRpcServer.java it would be better to have the IF condition check, like
{code:java}
if(stateChangeLog.isDebugEnabled()) {
{code}
The same check was missed in HDFS-15890 for the concat operation.

> Improve the log for the Storage Policy Operations
> -------------------------------------------------
>
>                 Key: HDFS-15921
>                 URL: https://issues.apache.org/jira/browse/HDFS-15921
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Bhavik Patel
>            Assignee: Bhavik Patel
>            Priority: Minor
>         Attachments: HDFS-15921.001.patch
>
>
> Improve the log for the Storage Policy Operations
[jira] [Commented] (HDFS-15921) Improve the log for the Storage Policy Operations
[ https://issues.apache.org/jira/browse/HDFS-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308795#comment-17308795 ] Hadoop QA commented on HDFS-15921: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 14s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 16s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 35s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 22m 59s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 3m 10s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 8s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | |
[jira] [Work logged] (HDFS-15922) Use memcpy for copying non-null terminated string in jni_helper.c
[ https://issues.apache.org/jira/browse/HDFS-15922?focusedWorklogId=571977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571977 ]

ASF GitHub Bot logged work on HDFS-15922:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Mar/21 16:11
            Start Date: 25/Mar/21 16:11
    Worklog Time Spent: 10m
      Work Description: GauthamBanasandra opened a new pull request #2818:
URL: https://github.com/apache/hadoop/pull/2818

* strncpy reports a warning if the destination string isn't null terminated.
* The scenario here is that the string is deliberately not null terminated since we want to imperatively suffix a PATH_SEPARATOR at the end.
* Thus, the warning reported by strncpy even though valid, isn't applicable.
* Hence we replace strncpy with memcpy which doesn't worry if the string is null terminated or not.

Issue Time Tracking
-------------------

            Worklog Id: (was: 571977)
    Remaining Estimate: 0h
            Time Spent: 10m

> Use memcpy for copying non-null terminated string in jni_helper.c
> -----------------------------------------------------------------
>
>                 Key: HDFS-15922
>                 URL: https://issues.apache.org/jira/browse/HDFS-15922
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs++
>    Affects Versions: 3.4.0
>            Reporter: Gautham Banasandra
>            Assignee: Gautham Banasandra
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently get a warning while compiling HDFS native client -
> {code}
> [WARNING] inlined from 'wildcard_expandPath' at /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:427:21,
> [WARNING] /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: warning: '__builtin_strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation]
> [WARNING] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:402:43: note: length computed here
> {code}
> The scenario here is such that the copied string is deliberately not null
> terminated, since we want to insert a PATH_SEPARATOR ourselves. The warning
> reported by strncpy is valid, but not applicable in this scenario. Thus, we
> need to use memcpy which doesn't mind if the string is null terminated or not.
[jira] [Updated] (HDFS-15922) Use memcpy for copying non-null terminated string in jni_helper.c
[ https://issues.apache.org/jira/browse/HDFS-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-15922: -- Labels: pull-request-available (was: ) > Use memcpy for copying non-null terminated string in jni_helper.c > - > > Key: HDFS-15922 > URL: https://issues.apache.org/jira/browse/HDFS-15922 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs++ >Affects Versions: 3.4.0 >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We currently get a warning while compiling HDFS native client - > {code} > [WARNING] inlined from 'wildcard_expandPath' at > /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:427:21, > [WARNING] /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: > warning: '__builtin_strncpy' output truncated before terminating nul copying > as many bytes from a string as its length [-Wstringop-truncation] > [WARNING] > /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:402:43: > note: length computed here > {code} > The scenario here is such that the copied string is deliberately not null > terminated, since we want to insert a PATH_SEPARATOR ourselves. The warning > reported by strncpy is valid, but not applicable in this scenario. Thus, we > need to use memcpy which doesn't mind if the string is null terminated or not. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15922) Use memcpy for copying non-null terminated string in jni_helper.c
[ https://issues.apache.org/jira/browse/HDFS-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautham Banasandra updated HDFS-15922: -- Description: We currently get a warning while compiling HDFS native client - {code} [WARNING] inlined from 'wildcard_expandPath' at /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:427:21, [WARNING] /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: warning: '__builtin_strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation] [WARNING] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:402:43: note: length computed here {code} The scenario here is such that the copied string is deliberately not null terminated, since we want to insert a PATH_SEPARATOR ourselves. The warning reported by strncpy is valid, but not applicable in this scenario. Thus, we need to use memcpy which doesn't mind if the string is null terminated or not. 
was: We currently get a warning while compiling HDFS native client - {code} [WARNING] inlined from 'wildcard_expandPath' at /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:427:21, [WARNING] /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: warning: '__builtin_strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation] [WARNING] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:402:43: note: length computed here {code} The scenario here is such that the copied string is deliberately not null terminated, since we want to insert a PATH_SEPARATOR ourselves. The warning reported by strncpy is valid, but not applicable in this scenario. > Use memcpy for copying non-null terminated string in jni_helper.c > - > > Key: HDFS-15922 > URL: https://issues.apache.org/jira/browse/HDFS-15922 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs++ >Affects Versions: 3.4.0 >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Major > > We currently get a warning while compiling HDFS native client - > {code} > [WARNING] inlined from 'wildcard_expandPath' at > /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:427:21, > [WARNING] /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: > warning: '__builtin_strncpy' output truncated before terminating nul copying > as many bytes from a string as its length [-Wstringop-truncation] > [WARNING] > /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:402:43: > note: length computed here > {code} > The scenario here is such that the copied 
string is deliberately not null > terminated, since we want to insert a PATH_SEPARATOR ourselves. The warning > reported by strncpy is valid, but not applicable in this scenario. Thus, we > need to use memcpy which doesn't mind if the string is null terminated or not. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15922) Use memcpy for copying non-null terminated string in jni_helper.c
Gautham Banasandra created HDFS-15922: - Summary: Use memcpy for copying non-null terminated string in jni_helper.c Key: HDFS-15922 URL: https://issues.apache.org/jira/browse/HDFS-15922 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs++ Affects Versions: 3.4.0 Reporter: Gautham Banasandra Assignee: Gautham Banasandra We currently get a warning while compiling HDFS native client - {code} [WARNING] inlined from 'wildcard_expandPath' at /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:427:21, [WARNING] /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: warning: '__builtin_strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation] [WARNING] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:402:43: note: length computed here {code} The scenario here is such that the copied string is deliberately not null terminated, since we want to insert a PATH_SEPARATOR ourselves. The warning reported by strncpy is valid, but not applicable in this scenario. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15894) Trace Time-consuming RPC response of certain threshold.
[ https://issues.apache.org/jira/browse/HDFS-15894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308778#comment-17308778 ]

Hemanth Boyina commented on HDFS-15894:
---------------------------------------

Thanks, [~prasad-acit], for the report and for submitting the patch.

We already log slow RPC requests in Server#logSlowRpcCalls, which solves your problem. IMHO we don't require these new changes.

> Trace Time-consuming RPC response of certain threshold.
> -------------------------------------------------------
>
>                 Key: HDFS-15894
>                 URL: https://issues.apache.org/jira/browse/HDFS-15894
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Renukaprasad C
>            Assignee: Renukaprasad C
>            Priority: Major
>         Attachments: HDFS-15894.001.patch, HDFS-15894.002.patch,
> HDFS-15894.003.patch
>
>
> Monitor & Trace Time-consuming RPC requests.
> Sometimes RPC requests get delayed, which impacts the system performance.
> Currently, delayed RPC requests are not tracked.
> We can log such delayed RPC calls when they exceed a certain threshold.
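The proposal in the issue description, logging calls that exceed a threshold, can be sketched generically as follows. `SlowCallTracerSketch` is a hypothetical class for illustration only; it is neither the attached patch nor the existing `Server#logSlowRpcCalls` logic, and the injected clock exists only to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical threshold-based tracer; not Hadoop code. */
final class SlowCallTracerSketch {
  private final long thresholdNanos;
  private final LongSupplier nanoClock;
  final List<String> slowCalls = new ArrayList<>();

  SlowCallTracerSketch(long thresholdMillis, LongSupplier nanoClock) {
    this.thresholdNanos = thresholdMillis * 1_000_000L;
    this.nanoClock = nanoClock;
  }

  /** Times one call and records it if it ran longer than the threshold. */
  void trace(String method, Runnable call) {
    long start = nanoClock.getAsLong();
    try {
      call.run();
    } finally {
      long elapsed = nanoClock.getAsLong() - start;
      if (elapsed > thresholdNanos) {
        slowCalls.add(method + " took " + (elapsed / 1_000_000L) + " ms");
      }
    }
  }

  /** Uses a fake clock: one 5 ms call and one 250 ms call, threshold 100 ms. */
  static List<String> demo() {
    long[] now = {0L};
    SlowCallTracerSketch tracer = new SlowCallTracerSketch(100, () -> now[0]);
    tracer.trace("getFileInfo", () -> now[0] += 5_000_000L);
    tracer.trace("rename", () -> now[0] += 250_000_000L);
    return tracer.slowCalls;
  }
}
```

In real use the clock would be `System::nanoTime` and the record would go to a logger rather than a list. A fixed threshold is the simplest policy and is used here only because the issue description mentions one.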
[jira] [Commented] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff
[ https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308776#comment-17308776 ]

Jason Wen commented on HDFS-15916:
----------------------------------

Yes, it certainly makes sense if we can make it work with the mentioned fix. In general I would like to see compatibility between a 3.x client and a 2.x server, but I am not sure whether there are more roadblocks to achieving that. If we can make this particular case work with a few changes, it is definitely worth fixing.

> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for
> snapshotdiff
> -------------------------------------------------------------------
>
>                 Key: HDFS-15916
>                 URL: https://issues.apache.org/jira/browse/HDFS-15916
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 3.2.2
>            Reporter: Srinivasu Majeti
>            Priority: Major
>
> When using the distcp diff options between two snapshots from a Hadoop 3
> cluster to a Hadoop 2 cluster, we get the exception below. This seems to
> break backward compatibility because of the newly introduced API
> getSnapshotDiffReportListing.
>
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
> Unknown method getSnapshotDiffReportListing called on
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
> {code}
>
[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang
[ https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=571930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571930 ] ASF GitHub Bot logged work on HDFS-15869: - Author: ASF GitHub Bot Created on: 25/Mar/21 15:00 Start Date: 25/Mar/21 15:00 Worklog Time Spent: 10m Work Description: linyiqun commented on a change in pull request #2737: URL: https://github.com/apache/hadoop/pull/2737#discussion_r601568107 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java ## @@ -63,6 +68,9 @@ DFS_NAMENODE_EDITS_ASYNC_LOGGING_PENDING_QUEUE_SIZE_DEFAULT); editPendingQ = new ArrayBlockingQueue<>(editPendingQSize); + +// the thread pool size should be configurable later, and justified with a rationale +logSyncNotifyExecutor = Executors.newFixedThreadPool(10); Review comment: I prefer to make this improvement be more configurable to use. Can we make thread pool size be configurable in this PR? if the pool size is configured as 0, that means this improvements is disabled. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 571930) Time Spent: 1h 10m (was: 1h) > Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can > cause the namenode to hang > > > Key: HDFS-15869 > URL: https://issues.apache.org/jira/browse/HDFS-15869 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs async, namenode >Affects Versions: 3.2.2 >Reporter: Haoze Wu >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > We were doing some testing of the latest Hadoop stable release 3.2.2 and > found some network issue can cause the namenode to hang even with the async > edit logging (FSEditLogAsync). 
> The workflow of the FSEditLogAsync thread is basically:
> # get EditLog from a queue (line 229)
> # do the transaction (line 232)
> # sync the log if doSync (line 243)
> # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                          // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                          // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());                  // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);                     // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
> In terms of the step 4, FSEditLogAsync$RpcEdit.logSyncNotify is
> essentially doing some network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                              // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }
> {code}
> If the sendResponse operation in line 365 gets stuck, then the whole
> FSEditLogAsync thread is not able to proceed.
In this case, the critical > logSync (line 243) can’t be executed, for the incoming transactions. Then the
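The review request above (a configurable pool size, with 0 meaning the improvement is disabled) could look roughly like the following. This is only a sketch under stated assumptions: `NotifyDispatcher` is a hypothetical helper, not a class from the Hadoop code base or from PR #2737, and the way the size would be read from the NameNode configuration is left out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Hadoop): runs sync notifications on an
 * optional thread pool. A pool size of 0 disables the pool, so notifications
 * run inline on the calling thread, as they do without the patch.
 */
final class NotifyDispatcher {
  // Null when the feature is disabled (pool size 0).
  private final ExecutorService pool;

  NotifyDispatcher(int poolSize) {
    if (poolSize < 0) {
      throw new IllegalArgumentException("pool size must be >= 0: " + poolSize);
    }
    this.pool = poolSize == 0 ? null : Executors.newFixedThreadPool(poolSize);
  }

  boolean isAsync() {
    return pool != null;
  }

  /** Runs the notification inline when disabled, otherwise hands it to the pool. */
  void dispatch(Runnable notify) {
    if (pool == null) {
      notify.run();
    } else {
      pool.execute(notify);
    }
  }

  /** Stops accepting work and waits briefly; returns true if all tasks finished. */
  boolean shutdown() {
    if (pool == null) {
      return true;
    }
    pool.shutdown();
    try {
      return pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With a non-zero size, a stuck network write only blocks one pool thread instead of the edit-log thread; with size 0 the old inline behaviour is kept, which gives operators a way to switch the change off.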
[jira] [Commented] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff
[ https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308711#comment-17308711 ]

Ayush Saxena commented on HDFS-15916:
-------------------------------------

Well, in any case, if an older client isn't able to connect to a newer server, that would be a big problem. But in this case too, I don't think there is any harm in supporting a newer client connecting to an older server by means of a fallback.
DistCp is a tool commonly used for migration, and usually the data copy is done at the load (newer cluster), since the source cluster is serving client requests until the migration is complete. DistCp is also a widely used utility, so supporting a new client against an old server should be good; most of the commonly used APIs work that way, barring some.
IMO adding a fallback won't have any performance implications in the present code, so no resistance that way. Just something like below in {{getSnapshotDiffReportInternal}} (I suppose):
{code:java}
try {
  report = dfs.getSnapshotDiffReportListing(snapshotDir, fromSnapshot,
      toSnapshot, startPath, index);
} catch (RpcNoSuchMethodException e) {
  return dfs.getSnapshotDiffReport(snapshotDir, fromSnapshot, toSnapshot);
}
{code}
There is already a fallback for one case in the API today:
{code:java}
// In case the diff needs to be computed between a snapshot and the current
// tree, we should not do iterative diffReport computation as the iterative
// approach might fail if in between the rpc calls the current tree
// changes in absence of the global fsn lock.
if (!isValidSnapshotName(fromSnapshot) || !isValidSnapshotName(
    toSnapshot)) {
  return dfs.getSnapshotDiffReport(snapshotDir, fromSnapshot, toSnapshot);
}
{code}
[~zhenshan.wen] does that make sense now?
> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for > snapshotdiff > > > Key: HDFS-15916 > URL: https://issues.apache.org/jira/browse/HDFS-15916 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 3.2.2 >Reporter: Srinivasu Majeti >Priority: Major > > Looks like when using distcp diff options between two snapshots from a hadoop > 3 cluster to hadoop 2 cluster , we get below exception and seems to be break > backward compatibility due to new API introduction > getSnapshotDiffReportListing. > > {code:java} > hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException): > Unknown method getSnapshotDiffReportListing called on > org.apache.hadoop.hdfs.protocol.ClientProtocol protocol > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
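The fallback idea discussed in the comment above can be illustrated in isolation. The sketch below is self-contained and deliberately does not use Hadoop classes: `DiffClient` and `NoSuchMethodOnServerException` are hypothetical stand-ins for the client API and for the "Unknown method" error an older server reports, so the real exception type and unwrapping would differ.

```java
import java.util.List;

/** Sketch of "try the newer RPC, fall back to the older one"; all names are hypothetical. */
final class SnapshotDiffFallback {

  /** Stand-in for the error raised when the server does not know the method. */
  static final class NoSuchMethodOnServerException extends RuntimeException {
    NoSuchMethodOnServerException(String message) {
      super(message);
    }
  }

  /** Hypothetical minimal client view; not the real DistributedFileSystem API. */
  interface DiffClient {
    /** Newer, iterative call that an old server may not implement. */
    List<String> diffListing(String dir, String from, String to);

    /** Older, single-shot call available on both old and new servers. */
    List<String> diffReport(String dir, String from, String to);
  }

  /** Prefers the newer call and falls back only when the server lacks it. */
  static List<String> snapshotDiff(DiffClient client, String dir, String from, String to) {
    try {
      return client.diffListing(dir, from, to);
    } catch (NoSuchMethodOnServerException e) {
      return client.diffReport(dir, from, to);
    }
  }
}
```

Catching only the "no such method" error keeps other failures (permissions, missing snapshot) visible instead of silently retrying them through the older call.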
[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough
[ https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308697#comment-17308697 ]

Stephen O'Donnell commented on HDFS-13639:
------------------------------------------

Thanks [~leosun08]. This was a clean cherry-pick to branch-3.3, so I backported it there.

> SlotReleaser is not fast enough
> -------------------------------
>
> Key: HDFS-13639
> URL: https://issues.apache.org/jira/browse/HDFS-13639
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.4.0, 2.6.0, 3.0.2
> Environment: 1. YCSB:
> {color:#00} recordcount=20
> fieldcount=1
> fieldlength=1000
> operationcount=1000
>
> workload=com.yahoo.ycsb.workloads.CoreWorkload
>
> table=ycsb-test
> columnfamily=C
> readproportion=1
> updateproportion=0
> insertproportion=0
> scanproportion=0
>
> maxscanlength=0
> requestdistribution=zipfian
>
> # default
> readallfields=true
> writeallfields=true
> scanlengthdistribution=constan{color}
> {color:#00}2. datanode:{color}
> -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m
> -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled
> -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1
> -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure
> -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
> {color:#00}3.
regionserver:{color} > {color:#00}-Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g > -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 > -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 > -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k > -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc > -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime > -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy > -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics > -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m > -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking > -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 > -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 > -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 > -XX:G1OldCSetRegionThresholdPercent=5{color} > {color:#00}block cache is disabled:{color}{color:#00} > hbase.bucketcache.size > 0.9 > {color} > >Reporter: Gang Xie >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, > HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, > perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png > > > When test the performance of the ShortCircuit Read of the HDFS with YCSB, we > find that SlotReleaser of the ShortCircuitCache has some performance issue. > The problem is that, the qps of the slot releasing could only reach to 1000+ > while the qps of the slot allocating is ~3000. This means that the replica > info on datanode could not be released in time, which causes a lot of GCs and > finally full GCs. > > The fireflame graph shows that SlotReleaser spends a lot of time to do domain > socket connecting and throw/catching the exception when close the domain > socket and its streams. 
It doesn't make any sense to do the connecting and > closing each time. Each time when we connect to the domain socket, Datanode > allocates a new thread to free the slot. There are a lot of initializing > work, and it's costly. We need reuse the domain socket. > > After switch to reuse the domain socket(see diff attached), we get great > improvement(see the perf): > # without reusing the domain socket, the get qps of the YCSB getting worse > and worse, and after about 45 mins, full GC starts. When we reuse the domain > socket, no full GC found, and the stress test could be finished smoothly, the > qps of allocating and releasing match. > # Due to the datanode young GC, without the improvement, the YCSB get qps is > even smaller than the one with the improvement, ~3700 VS ~4200. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
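The core idea in the description above, reusing one connection across slot releases instead of connecting and closing each time, can be sketched generically. This is an illustration only, not the attached patch: `ReusableChannel`, `Connector` and `Action` are hypothetical names, and a real implementation would work with Hadoop's domain socket classes rather than a plain `Closeable`.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Hypothetical sketch: keeps one connection open and reuses it for every
 * request, reconnecting only after a failure.
 */
final class ReusableChannel<C extends Closeable> {

  /** Opens a new connection; assumed to be expensive. */
  interface Connector<C> {
    C connect() throws IOException;
  }

  /** One request sent over an open connection. */
  interface Action<C> {
    void run(C connection) throws IOException;
  }

  private final Connector<C> connector;
  private C connection;      // null until first use or after a failure
  private int connectCount;  // how many times we had to (re)connect

  ReusableChannel(Connector<C> connector) {
    this.connector = connector;
  }

  synchronized int connectCount() {
    return connectCount;
  }

  /** Sends one request, connecting lazily and dropping the connection on error. */
  synchronized void send(Action<C> action) throws IOException {
    if (connection == null) {
      connection = connector.connect();
      connectCount++;
    }
    try {
      action.run(connection);
    } catch (IOException e) {
      // The connection may be broken; close it so the next call reconnects.
      try {
        connection.close();
      } catch (IOException ignored) {
        // Nothing useful to do if closing a broken connection fails.
      }
      connection = null;
      throw e;
    }
  }
}
```

The design choice is that the cost of connecting is paid once per failure rather than once per release, which is the effect the description attributes to the improvement.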
[jira] [Updated] (HDFS-13639) SlotReleaser is not fast enough
[ https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-13639: - Fix Version/s: 3.3.1 > SlotReleaser is not fast enough > --- > > Key: HDFS-13639 > URL: https://issues.apache.org/jira/browse/HDFS-13639 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.4.0, 2.6.0, 3.0.2 > Environment: 1. YCSB: > {color:#00} recordcount=20 > fieldcount=1 > fieldlength=1000 > operationcount=1000 > > workload=com.yahoo.ycsb.workloads.CoreWorkload > > table=ycsb-test > columnfamily=C > readproportion=1 > updateproportion=0 > insertproportion=0 > scanproportion=0 > > maxscanlength=0 > requestdistribution=zipfian > > # default > readallfields=true > writeallfields=true > scanlengthdistribution=constan{color} > {color:#00}2. datanode:{color} > -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m > -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log > -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 > -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled > -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 > -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure > -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent > -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking > -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps > {color:#00}3. 
regionserver:{color} > {color:#00}-Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g > -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 > -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 > -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k > -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc > -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime > -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy > -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics > -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m > -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking > -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 > -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 > -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 > -XX:G1OldCSetRegionThresholdPercent=5{color} > {color:#00}block cache is disabled:{color}{color:#00} > hbase.bucketcache.size > 0.9 > {color} > >Reporter: Gang Xie >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, > HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, > perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png > > > When test the performance of the ShortCircuit Read of the HDFS with YCSB, we > find that SlotReleaser of the ShortCircuitCache has some performance issue. > The problem is that, the qps of the slot releasing could only reach to 1000+ > while the qps of the slot allocating is ~3000. This means that the replica > info on datanode could not be released in time, which causes a lot of GCs and > finally full GCs. > > The fireflame graph shows that SlotReleaser spends a lot of time to do domain > socket connecting and throw/catching the exception when close the domain > socket and its streams. 
It doesn't make any sense to do the connecting and > closing each time. Each time when we connect to the domain socket, Datanode > allocates a new thread to free the slot. There are a lot of initializing > work, and it's costly. We need reuse the domain socket. > > After switch to reuse the domain socket(see diff attached), we get great > improvement(see the perf): > # without reusing the domain socket, the get qps of the YCSB getting worse > and worse, and after about 45 mins, full GC starts. When we reuse the domain > socket, no full GC found, and the stress test could be finished smoothly, the > qps of allocating and releasing match. > # Due to the datanode young GC, without the improvement, the YCSB get qps is > even smaller than the one with the improvement, ~3700 VS ~4200. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses
[ https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308675#comment-17308675 ]

Stephen O'Donnell commented on HDFS-15919:
------------------------------------------

Thanks all for the reviews. I have committed this down the branches.

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---------------------------------------------------------------------------
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.4.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15919.001.patch
>
>
> If the hdfs config is badly configured, the datanode can fail to start with
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for nameservices: null
> 2021-03-24 05:58:27,033 WARN datanode.DataNode (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in
> DFSUtil.getNNServiceRpcAddressesForCluster(...), but there are a couple of
> scenarios within it which can cause an exception, so it's difficult to figure
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an
> error occurs, so it is clear what caused it.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
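The change requested in HDFS-15919, attaching the caught exception to the existing warning so the cause shows up in the log, can be illustrated with a small sketch. It uses `java.util.logging` only to stay self-contained (the real class is assumed to use a different logging facade), and `RefreshLogging` and the exception message are hypothetical.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: pass the exception to the logger instead of dropping it. */
final class RefreshLogging {

  /** Logs the warning together with the throwable, so the stack trace and cause are kept. */
  static void warnUnableToGetAddresses(Logger log, Exception cause) {
    log.log(Level.WARNING, "Unable to get NameNode addresses.", cause);
  }

  public static void main(String[] args) {
    Logger log = Logger.getLogger(RefreshLogging.class.getName());
    try {
      // Stand-in for whatever the address lookup might throw.
      throw new IOException("hypothetical: malformed address in config");
    } catch (IOException e) {
      warnUnableToGetAddresses(log, e);
    }
  }
}
```

Passing the throwable as a separate argument, rather than concatenating its message into the string, lets the logging framework print the full stack trace.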
[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang
[ https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=571825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571825 ]

ASF GitHub Bot logged work on HDFS-15869:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/Mar/21 11:43
Start Date: 25/Mar/21 11:43
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-806583634

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 55s | | Docker mode activated. |
| | | | _ Prechecks _ | |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ | |
| +1 :green_heart: | mvninstall | 32m 33s | | trunk passed |
| +1 :green_heart: | compile | 1m 20s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 1m 13s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 1s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 22s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 55s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 8s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 14s | | branch has no errors when building and testing our client artifacts. |
| | | | _ Patch Compile Tests _ | |
| +1 :green_heart: | mvninstall | 1m 10s | | the patch passed |
| +1 :green_heart: | compile | 1m 9s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javac | 1m 9s | | the patch passed |
| +1 :green_heart: | compile | 1m 6s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | javac | 1m 6s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 52s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) |
| +1 :green_heart: | mvnsite | 1m 11s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 45s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 1m 19s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 7s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 3s | | patch has no errors when building and testing our client artifacts. |
| | | | _ Other Tests _ | |
| -1 :x: | unit | 730m 2s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. |
| | | 815m 32s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
| | hadoop.hdfs.TestDecommission |
| | hadoop.hdfs.TestDistributedFileSystemWithECFile |
| | hadoop.hdfs.TestRollingUpgradeDowngrade |
| | hadoop.hdfs.TestPersistBlocks |
| | hadoop.hdfs.TestDFSShell |
| | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
| | hadoop.hdfs.server.mover.TestMover |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
| | hadoop.hdfs.server.namenode.TestCacheDirectivesWithViewDFS |
| | hadoop.hdfs.TestFetchImage |
| | hadoop.hdfs.TestDistributedFileSystemWithECFileWithRandomECPolicy |
| |
[jira] [Work started] (HDFS-15921) Improve the log for the Storage Policy Operations
[ https://issues.apache.org/jira/browse/HDFS-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-15921 started by Bhavik Patel. --- > Improve the log for the Storage Policy Operations > - > > Key: HDFS-15921 > URL: https://issues.apache.org/jira/browse/HDFS-15921 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > > Improve the log for the Storage Policy Operations -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15921) Improve the log for the Storage Policy Operations
[ https://issues.apache.org/jira/browse/HDFS-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15921: Status: Patch Available (was: In Progress) > Improve the log for the Storage Policy Operations > - > > Key: HDFS-15921 > URL: https://issues.apache.org/jira/browse/HDFS-15921 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15921.001.patch > > > Improve the log for the Storage Policy Operations -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15921) Improve the log for the Storage Policy Operations
[ https://issues.apache.org/jira/browse/HDFS-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15921: Attachment: HDFS-15921.001.patch > Improve the log for the Storage Policy Operations > - > > Key: HDFS-15921 > URL: https://issues.apache.org/jira/browse/HDFS-15921 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15921.001.patch > > > Improve the log for the Storage Policy Operations -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15921) Improve the log for the Storage Policy Operations
Bhavik Patel created HDFS-15921: --- Summary: Improve the log for the Storage Policy Operations Key: HDFS-15921 URL: https://issues.apache.org/jira/browse/HDFS-15921 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Bhavik Patel Assignee: Bhavik Patel Improve the log for the Storage Policy Operations -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15920) Solve the problem that the value of SafeModeMonitor#RECHECK_INTERVAL can be configured
[ https://issues.apache.org/jira/browse/HDFS-15920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JiangHua Zhu reassigned HDFS-15920:
-----------------------------------

Assignee: JiangHua Zhu

> Solve the problem that the value of SafeModeMonitor#RECHECK_INTERVAL can be
> configured
> --------------------------------------------------------------------------
>
> Key: HDFS-15920
> URL: https://issues.apache.org/jira/browse/HDFS-15920
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Major
>
> The current SafeModeMonitor#RECHECK_INTERVAL is fixed at 1000 and should be
> made configurable. Because the lock is held internally, the monitor competes
> with other users of that lock.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
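Replacing a fixed interval with a configurable one might look like the sketch below. Everything here is an assumption made for illustration: the property key is invented, the unit is taken to be milliseconds (the issue only says "=1000"), and `java.util.Properties` stands in for the Hadoop configuration object.

```java
import java.util.Properties;

/** Hypothetical sketch of reading a recheck interval from configuration with a default. */
final class RecheckIntervalConfig {
  // Invented key for illustration; the real key name is not given in the issue.
  static final String KEY = "example.safemode.recheck.interval.ms";
  // Matches the fixed value mentioned in the issue; milliseconds is an assumption.
  static final long DEFAULT_MS = 1000L;

  /** Returns the configured interval, or the default when the key is absent. */
  static long recheckIntervalMs(Properties conf) {
    String raw = conf.getProperty(KEY);
    if (raw == null) {
      return DEFAULT_MS;
    }
    long value = Long.parseLong(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(KEY + " must be positive, got " + value);
    }
    return value;
  }
}
```

Keeping the old value as the default means existing deployments behave the same unless an operator sets the key.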
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308455#comment-17308455 ] Hadoop QA commented on HDFS-15160: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 8s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.3 Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 27s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 29s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 22m 3s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 3m 5s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green}{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 36s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 3m 6s{color} | {color:green}{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || || | {color:red}-1{color} | {color:red} unit {color} | {color:red}184m 14s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/557/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt{color} | {color:red} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}275m 59s{color} | {color:black}{color} | {color:black}{color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | |
[jira] [Created] (HDFS-15920) Solve the problem that the value of SafeModeMonitor#RECHECK_INTERVAL can be configured
JiangHua Zhu created HDFS-15920:
-----------------------------------

Summary: Solve the problem that the value of SafeModeMonitor#RECHECK_INTERVAL can be configured
Key: HDFS-15920
URL: https://issues.apache.org/jira/browse/HDFS-15920
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: JiangHua Zhu

The current SafeModeMonitor#RECHECK_INTERVAL is fixed at 1000 and should be made configurable. Because the lock is held internally, the monitor competes with other users of that lock.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15879) Exclude slow nodes when choose targets for blocks
[ https://issues.apache.org/jira/browse/HDFS-15879?focusedWorklogId=571677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571677 ]

ASF GitHub Bot logged work on HDFS-15879:
-

Author: ASF GitHub Bot
Created on: 25/Mar/21 06:47
Start Date: 25/Mar/21 06:47
Worklog Time Spent: 10m
Work Description: tomscut commented on a change in pull request #2748:
URL: https://github.com/apache/hadoop/pull/2748#discussion_r601108849

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java ##
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.blockmanagement;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.TestBlockStoragePolicy;
+import org.apache.hadoop.hdfs.server.namenode.NameNode;
+import org.apache.hadoop.net.Node;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Set;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+@RunWith(Parameterized.class)
+public class TestReplicationPolicyExcludeSlowNodes
+    extends BaseReplicationPolicyTest {
+
+  public TestReplicationPolicyExcludeSlowNodes(String blockPlacementPolicy) {
+    this.blockPlacementPolicy = blockPlacementPolicy;
+  }
+
+  @Parameterized.Parameters
+  public static Iterable data() {
+    return Arrays.asList(new Object[][] {

Review comment:
Thanks @tasanuma for your advice. I will update it as soon as possible.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 571677)
Time Spent: 2h 40m (was: 2.5h)

> Exclude slow nodes when choose targets for blocks
> -
>
> Key: HDFS-15879
> URL: https://issues.apache.org/jira/browse/HDFS-15879
> Project: Hadoop HDFS
> Issue Type: Wish
> Reporter: tomscut
> Assignee: tomscut
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Previously, we have monitored the slow nodes, related to
> [HDFS-11194|https://issues.apache.org/jira/browse/HDFS-11194].
> We can use a thread to periodically collect these slow nodes into a set.
Then > use the set to filter out slow nodes when choosing targets for blocks. > This feature can be turned on through configuration when needed.
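The approach described in HDFS-15879 (a thread periodically collects slow nodes into a set, and the set is used to filter candidates when choosing block targets) can be sketched roughly as below. This is a simplified stand-in, not the code from pull request #2748: the class name, the use of plain `String` node names, the `Supplier` that reports slow nodes, and the `enabled` flag are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative sketch only: exclude slow nodes when choosing targets. */
class SlowNodeFilterSketch {
  // Hypothetical source of slow-node reports (e.g. a monitoring component).
  private final Supplier<Set<String>> slowNodeSource;
  // Mirrors "this feature can be turned on when needed".
  private final boolean enabled;
  // Latest snapshot; replaced wholesale so readers never see a partial set.
  private volatile Set<String> slowNodes = Collections.emptySet();

  SlowNodeFilterSketch(Supplier<Set<String>> slowNodeSource, boolean enabled) {
    this.slowNodeSource = slowNodeSource;
    this.enabled = enabled;
  }

  /** Collects the current slow nodes into an immutable snapshot. */
  void refresh() {
    slowNodes = Collections.unmodifiableSet(new HashSet<>(slowNodeSource.get()));
  }

  /** Starts a daemon thread that calls refresh() at a fixed delay. */
  ScheduledExecutorService startCollector(long intervalMs) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "slow-node-collector");
          t.setDaemon(true);
          return t;
        });
    scheduler.scheduleWithFixedDelay(
        this::refresh, 0, intervalMs, TimeUnit.MILLISECONDS);
    return scheduler;
  }

  /** Returns the candidates, minus known slow nodes if the feature is on. */
  List<String> filterTargets(List<String> candidates) {
    if (!enabled) {
      return new ArrayList<>(candidates);
    }
    Set<String> snapshot = slowNodes;
    List<String> result = new ArrayList<>();
    for (String node : candidates) {
      if (!snapshot.contains(node)) {
        result.add(node);
      }
    }
    return result;
  }
}
```

Swapping in a fresh immutable set on each refresh keeps the target-selection path free of locks, at the cost of the filter lagging behind by up to one collection interval.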