[jira] [Updated] (HDFS-16830) HDFS-16830. [SBN READ] dfsrouter transmit state id according to client's demand
[ https://issues.apache.org/jira/browse/HDFS-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-16830:
-------------------------------
    Summary: HDFS-16830. [SBN READ] dfsrouter transmit state id according to client's demand  (was: Improve router msync operation)

> HDFS-16830. [SBN READ] dfsrouter transmit state id according to client's demand
> ---
>
>                 Key: HDFS-16830
>                 URL: https://issues.apache.org/jira/browse/HDFS-16830
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode, rbf
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory
[ https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu reassigned HDFS-16832:
----------------------------------
    Assignee: zhengchenyu

> [SBN READ] Fix NPE when check the block location of empty directory
> ---
>
>                 Key: HDFS-16832
>                 URL: https://issues.apache.org/jira/browse/HDFS-16832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>
> HDFS-16732 introduced a block location check for getListing and getFileInfo. But checking the block location of an empty directory throws an NPE.
> The exception stack on the tez client is below:
> {code:java}
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1492)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1389)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> 	at com.sun.proxy.$Proxy12.getListing(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:678)
> 	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> 	at com.sun.proxy.$Proxy13.getListing(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1671)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1212)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1195)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1140)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1136)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1154)
> 	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
> 	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:278)
> 	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
> 	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
> 	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>
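[Editor's note: a hedged sketch of the likely shape of the fix, not the actual HDFS-16832 patch. The location check added by HDFS-16732 has to treat a null or empty block list — which is exactly what an empty directory yields — as "nothing to verify" instead of dereferencing it. All names here are hypothetical; locations are modeled as plain strings rather than HDFS's LocatedBlock types.]

```java
import java.util.Collections;
import java.util.List;

public class LocationCheck {
    /**
     * Returns true only when every block has at least one reported location.
     * A null or empty block list (an empty directory has no blocks at all)
     * is trivially complete; dereferencing it without this guard is the NPE
     * reported in HDFS-16832.
     */
    public static boolean hasCompleteLocations(List<List<String>> blocks) {
        if (blocks == null || blocks.isEmpty()) {
            return true; // empty directory: nothing to verify
        }
        for (List<String> locs : blocks) {
            if (locs == null || locs.isEmpty()) {
                return false; // a block with no reported location
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Empty directory: previously an NPE, now trivially complete.
        System.out.println(hasCompleteLocations(null));                    // true
        System.out.println(hasCompleteLocations(Collections.emptyList())); // true
        // A block whose locations have not been reported yet.
        System.out.println(hasCompleteLocations(
            Collections.singletonList(Collections.<String>emptyList())));  // false
    }
}
```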
[jira] [Updated] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory
[ https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-16832:
-------------------------------
    Description: 
HDFS-16732 introduced a block location check for getListing and getFileInfo. But checking the block location of an empty directory throws an NPE.
The exception stack on the tez client is below:
{code:java}
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
	at org.apache.hadoop.ipc.Client.call(Client.java:1492)
	at org.apache.hadoop.ipc.Client.call(Client.java:1389)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at com.sun.proxy.$Proxy12.getListing(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:678)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy13.getListing(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1671)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1212)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1195)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1140)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1136)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1154)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:278)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

  was:
HDFS-16732 introduced a block location check for getListing and getFileInfo. But checking the block location of an empty directory throws an NPE.
The exception stack on the tez client is below:


> [SBN READ] Fix NPE when check the block location of empty directory
>
[jira] [Updated] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory
[ https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-16832:
-------------------------------
    Description: 
HDFS-16732 introduced a block location check for getListing and getFileInfo. But checking the block location of an empty directory throws an NPE.
The exception stack on the tez client is below:

> [SBN READ] Fix NPE when check the block location of empty directory
> ---
>
>                 Key: HDFS-16832
>                 URL: https://issues.apache.org/jira/browse/HDFS-16832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhengchenyu
>            Priority: Major
>
> HDFS-16732 introduced a block location check for getListing and getFileInfo. But checking the block location of an empty directory throws an NPE.
> The exception stack on the tez client is below:
>
[jira] [Updated] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory
[ https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-16832:
-------------------------------
    Summary: [SBN READ] Fix NPE when check the block location of empty directory  (was: [SBN READ] Fix NPE when check block location)

> [SBN READ] Fix NPE when check the block location of empty directory
> ---
>
>                 Key: HDFS-16832
>                 URL: https://issues.apache.org/jira/browse/HDFS-16832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhengchenyu
>            Priority: Major
>
[jira] [Created] (HDFS-16832) [SBN READ] Fix NPE when check block location
zhengchenyu created HDFS-16832:
----------------------------------

             Summary: [SBN READ] Fix NPE when check block location
                 Key: HDFS-16832
                 URL: https://issues.apache.org/jira/browse/HDFS-16832
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: zhengchenyu
[jira] [Commented] (HDFS-16830) Improve router msync operation
[ https://issues.apache.org/jira/browse/HDFS-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627073#comment-17627073 ]

zhengchenyu commented on HDFS-16830:
------------------------------------

Hi, in our production cluster a huge number of msync calls is sent to the active NameNode. I think we should still pursue two pieces of work:
(1) Propagate the state id according to the client's demand, avoiding msync to NameNodes the client does not use.
(2) Share msync results, to reduce the number of msync operations.
[~simbadzina] What do you think of my proposal?

> Improve router msync operation
> ---
>
>                 Key: HDFS-16830
>                 URL: https://issues.apache.org/jira/browse/HDFS-16830
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode, rbf
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>
[jira] [Created] (HDFS-16830) Improve router msync operation
zhengchenyu created HDFS-16830:
----------------------------------

             Summary: Improve router msync operation
                 Key: HDFS-16830
                 URL: https://issues.apache.org/jira/browse/HDFS-16830
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode, rbf
            Reporter: zhengchenyu
            Assignee: zhengchenyu
[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627065#comment-17627065 ]

zhengchenyu commented on HDFS-13522:
------------------------------------

[~simbadzina] I agree with you! Indeed, in our production environment I did not dare to disable msync, so many msync calls are sent to the active NameNode. Though msync is a low-cost operation, I think it is still necessary for us to reduce the number of msync calls.

> HDFS-13522: Add federated nameservices states to client protocol and
> propagate it between routers and clients.
> ---
>
>                 Key: HDFS-13522
>                 URL: https://issues.apache.org/jira/browse/HDFS-13522
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: federation, namenode
>            Reporter: Erik Krogen
>            Assignee: Simbarashe Dzinamarira
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.5
>
>         Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch,
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC
> clogging.png, ShortTerm-Routers+Observer.png,
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf,
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>          Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state,
> e.g. {{FederationNamenodeServiceState}}.
> This patch captures the state of all namespaces in the routers and propagates
> it to clients. A follow up patch will change router behavior to direct
> requests to the observer.
>
[jira] [Resolved] (HDFS-16708) RBF: Support transmit state id from client in router.
[ https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu resolved HDFS-16708.
--------------------------------
    Resolution: Duplicate

> RBF: Support transmit state id from client in router.
> ---
>
>                 Key: HDFS-16708
>                 URL: https://issues.apache.org/jira/browse/HDFS-16708
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-13522_proposal_zhengchenyu.pdf
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Implement the Design A described in HDFS-13522.
>
[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626553#comment-17626553 ]

zhengchenyu commented on HDFS-13522:
------------------------------------

[~simbadzina] Thanks for your great patch; this design is very clever! Can you share some experience from your production environment? In this design many clients share the pool's state id, so can we set the client-side auto-msync period to -1 or a very large value?

> HDFS-13522: Add federated nameservices states to client protocol and
> propagate it between routers and clients.
> ---
>
>                 Key: HDFS-13522
>                 URL: https://issues.apache.org/jira/browse/HDFS-13522
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: federation, namenode
>            Reporter: Erik Krogen
>            Assignee: Simbarashe Dzinamarira
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.5
>
>         Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch,
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC
> clogging.png, ShortTerm-Routers+Observer.png,
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf,
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>          Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state,
> e.g. {{FederationNamenodeServiceState}}.
> This patch captures the state of all namespaces in the routers and propagates
> it to clients. A follow up patch will change router behavior to direct
> requests to the observer.
>
[jira] [Commented] (HDFS-16682) [SBN Read] make estimated transactions configurable
[ https://issues.apache.org/jira/browse/HDFS-16682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606888#comment-17606888 ]

zhengchenyu commented on HDFS-16682:
------------------------------------

[~xkrogen] Can you please review this? These parameters should depend on the cluster's load, so I think they should be configurable.

> [SBN Read] make estimated transactions configurable
> ---
>
>                 Key: HDFS-16682
>                 URL: https://issues.apache.org/jira/browse/HDFS-16682
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In GlobalStateIdContext, ESTIMATED_TRANSACTIONS_PER_SECOND and
> ESTIMATED_SERVER_TIME_MULTIPLIER should be configurable.
> These parameters depend on each cluster's load. In addition, these configs
> will help us simulate an observer namenode that has fallen far behind.
>
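[Editor's note: for context, a hedged sketch of how these two constants are used, paraphrased from memory of GlobalStateIdContext#receiveRequestState; the exact formula and constant values should be checked against the Hadoop source. The idea: a client state id that runs ahead of the server's by more than the cluster could plausibly have processed during the RPC wait time is rejected, so the estimate directly depends on the cluster's load.]

```java
public class StateIdEstimate {
    // Hard-coded in GlobalStateIdContext today (values recalled from memory);
    // HDFS-16682 proposes making both configurable per cluster.
    static final long ESTIMATED_TRANSACTIONS_PER_SECOND = 10_000;
    static final long ESTIMATED_SERVER_TIME_MULTIPLIER = 10;

    /**
     * True when the client's last-seen txid is implausibly far ahead of the
     * server's: farther than the server could have advanced during
     * clientWaitMillis under the estimated load.
     */
    static boolean isClientStateIdInvalid(long clientStateId, long serverStateId,
                                          long clientWaitMillis) {
        long maxPlausibleLag = ESTIMATED_TRANSACTIONS_PER_SECOND
            * (clientWaitMillis / 1000)
            * ESTIMATED_SERVER_TIME_MULTIPLIER;
        return clientStateId > serverStateId
            && clientStateId - serverStateId > maxPlausibleLag;
    }

    public static void main(String[] args) {
        // 30s wait * 10k txn/s * x10 multiplier => 3,000,000 txids of headroom.
        System.out.println(isClientStateIdInvalid(3_000_001, 0, 30_000)); // true
        System.out.println(isClientStateIdInvalid(2_999_999, 0, 30_000)); // false
    }
}
```

A heavily loaded cluster would want a larger transactions-per-second estimate, and a test simulating a far-behind observer would want a smaller one, which is why a single hard-coded value fits neither.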
[jira] (HDFS-14117) RBF: We can only delete the files or dirs of one subcluster in a cluster with multiple subclusters when trash is enabled
[ https://issues.apache.org/jira/browse/HDFS-14117 ]

zhengchenyu deleted comment on HDFS-14117:
-------------------------------------------

was (Author: zhengchenyu):
In our cluster, I must mount all nameservices for /user/${user}/.Trash, which means the router will rename across all nameservices when moving files to trash. Though this has worked for a long time, it results in bad performance when one namenode degrades. I want to connect to only one nameservice. So I have a new proposal:
Condition:
(1) /test is mounted on ns0
(2) /user/hdfs is mounted on ns1
Suppose we move /test/hello to /user/hdfs/.Trash/Current/test/hello. When we process a location with the trash prefix, we use the location with the prefix removed to find the mounted nameservice. For /user/hdfs/.Trash/Current/test/hello, we remove the prefix '/user/hdfs/.Trash/Current', get '/test/hello', and use '/test/hello' to find the mounted nameservice. Then we get the location ns0->/user/hdfs/.Trash/Current/test/hello, and the rename to trash will work.
The problem is that we must check the pattern of the location in every call, but I think it is low cost.
[~elgoiri] [~ayushtkn] [~hexiaoqiao] [~ramkumar] [~xuzq_zander] What do you think of my proposal?

> RBF: We can only delete the files or dirs of one subcluster in a cluster with
> multiple subclusters when trash is enabled
> ---
>
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ramkumar Ramalingam
>            Assignee: Ramkumar Ramalingam
>            Priority: Major
>              Labels: RBF
>         Attachments: HDFS-14117-HDFS-13891.001.patch,
> HDFS-14117-HDFS-13891.002.patch, HDFS-14117-HDFS-13891.003.patch,
> HDFS-14117-HDFS-13891.004.patch, HDFS-14117-HDFS-13891.005.patch,
> HDFS-14117-HDFS-13891.006.patch, HDFS-14117-HDFS-13891.007.patch,
> HDFS-14117-HDFS-13891.008.patch, HDFS-14117-HDFS-13891.009.patch,
> HDFS-14117-HDFS-13891.010.patch, HDFS-14117-HDFS-13891.011.patch,
> HDFS-14117-HDFS-13891.012.patch, HDFS-14117-HDFS-13891.013.patch,
> HDFS-14117-HDFS-13891.014.patch, HDFS-14117-HDFS-13891.015.patch,
> HDFS-14117-HDFS-13891.016.patch, HDFS-14117-HDFS-13891.017.patch,
> HDFS-14117-HDFS-13891.018.patch, HDFS-14117-HDFS-13891.019.patch,
> HDFS-14117-HDFS-13891.020.patch, HDFS-14117.001.patch, HDFS-14117.002.patch,
> HDFS-14117.003.patch, HDFS-14117.004.patch, HDFS-14117.005.patch
>
> When we delete files or dirs in hdfs, it will move the deleted files or dirs
> to trash by default.
> But in the global path we can only mount one trash dir /user. So we mount
> trash dir /user of the subcluster ns1 to the global path /user. Then we can
> delete files or dirs of ns1, but when we delete the files or dirs of another
> subcluster, such as hacluster, it will fail.
> h1. Mount Table
> ||Global path||Target nameservice||Target path||Order||Read only||Owner||Group||Permission||Quota/Usage||Date Modified||Date Created||
> |/test|hacluster2|/test| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: -/-]|2018/11/29 14:37:42|2018/11/29 14:37:42|
> |/tmp|hacluster1|/tmp| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: -/-]|2018/11/29 14:37:05|2018/11/29 14:37:05|
> |/user|hacluster2,hacluster1|/user|HASH| |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: -/-]|2018/11/29 14:42:37|2018/11/29 14:38:20|
> commands:
> {noformat}
> 1./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /test/.
> 18/11/30 11:00:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 8081 2018-11-30 10:56 /test/hdfs.cmd
> 2./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /tmp/.
> 18/11/30 11:00:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 6311 2018-11-30 10:57 /tmp/mapred.cmd
> 3../opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /tmp/mapred.cmd
> 18/11/30 11:01:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> rm: Failed to move to trash: hdfs://router/tmp/mapred.cmd: rename destination parent /user/securedn/.Trash/Current/tmp/mapred.cmd not found.
> 4./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /test/hdfs.cmd
> 18/11/30 11:01:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/11/30 11:01:22 INFO fs.TrashPolicyDefault: Moved: 'hdfs://router/test/hdfs.cmd' to trash at: hdfs://router/user/securedn/.Trash/Current/test/hdfs.cmd
>
[jira] [Commented] (HDFS-14117) RBF: We can only delete the files or dirs of one subcluster in a cluster with multiple subclusters when trash is enabled
[ https://issues.apache.org/jira/browse/HDFS-14117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600659#comment-17600659 ]

zhengchenyu commented on HDFS-14117:
------------------------------------

In our cluster, I must mount all nameservices for /user/${user}/.Trash, which means the router will rename across all nameservices when moving files to trash. Though this has worked for a long time, it results in bad performance when one namenode degrades. I want to connect to only one nameservice. So I have a new proposal:
Condition:
(1) /test is mounted on ns0
(2) /user/hdfs is mounted on ns1
Suppose we move /test/hello to /user/hdfs/.Trash/Current/test/hello. When we process a location with the trash prefix, we use the location with the prefix removed to find the mounted nameservice. For /user/hdfs/.Trash/Current/test/hello, we remove the prefix '/user/hdfs/.Trash/Current', get '/test/hello', and use '/test/hello' to find the mounted nameservice. Then we get the location ns0->/user/hdfs/.Trash/Current/test/hello, and the rename to trash will work.
The problem is that we must check the pattern of the location in every call, but I think it is low cost.
[~elgoiri] [~ayushtkn] [~hexiaoqiao] [~ramkumar] [~xuzq_zander] What do you think of my proposal?

> RBF: We can only delete the files or dirs of one subcluster in a cluster with
> multiple subclusters when trash is enabled
> ---
>
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ramkumar Ramalingam
>            Assignee: Ramkumar Ramalingam
>            Priority: Major
>              Labels: RBF
>         Attachments: HDFS-14117-HDFS-13891.001.patch,
> HDFS-14117-HDFS-13891.002.patch, HDFS-14117-HDFS-13891.003.patch,
> HDFS-14117-HDFS-13891.004.patch, HDFS-14117-HDFS-13891.005.patch,
> HDFS-14117-HDFS-13891.006.patch, HDFS-14117-HDFS-13891.007.patch,
> HDFS-14117-HDFS-13891.008.patch, HDFS-14117-HDFS-13891.009.patch,
> HDFS-14117-HDFS-13891.010.patch, HDFS-14117-HDFS-13891.011.patch,
> HDFS-14117-HDFS-13891.012.patch, HDFS-14117-HDFS-13891.013.patch,
> HDFS-14117-HDFS-13891.014.patch, HDFS-14117-HDFS-13891.015.patch,
> HDFS-14117-HDFS-13891.016.patch, HDFS-14117-HDFS-13891.017.patch,
> HDFS-14117-HDFS-13891.018.patch, HDFS-14117-HDFS-13891.019.patch,
> HDFS-14117-HDFS-13891.020.patch, HDFS-14117.001.patch, HDFS-14117.002.patch,
> HDFS-14117.003.patch, HDFS-14117.004.patch, HDFS-14117.005.patch
>
> When we delete files or dirs in hdfs, it will move the deleted files or dirs
> to trash by default.
> But in the global path we can only mount one trash dir /user. So we mount
> trash dir /user of the subcluster ns1 to the global path /user. Then we can
> delete files or dirs of ns1, but when we delete the files or dirs of another
> subcluster, such as hacluster, it will fail.
> h1. Mount Table
> ||Global path||Target nameservice||Target path||Order||Read only||Owner||Group||Permission||Quota/Usage||Date Modified||Date Created||
> |/test|hacluster2|/test| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: -/-]|2018/11/29 14:37:42|2018/11/29 14:37:42|
> |/tmp|hacluster1|/tmp| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: -/-]|2018/11/29 14:37:05|2018/11/29 14:37:05|
> |/user|hacluster2,hacluster1|/user|HASH| |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: -/-]|2018/11/29 14:42:37|2018/11/29 14:38:20|
> commands:
> {noformat}
> 1./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /test/.
> 18/11/30 11:00:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 8081 2018-11-30 10:56 /test/hdfs.cmd
> 2./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /tmp/.
> 18/11/30 11:00:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 6311 2018-11-30 10:57 /tmp/mapred.cmd
> 3../opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /tmp/mapred.cmd
> 18/11/30 11:01:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> rm: Failed to move to trash: hdfs://router/tmp/mapred.cmd: rename destination parent /user/securedn/.Trash/Current/tmp/mapred.cmd not found.
> 4./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /test/hdfs.cmd
> 18/11/30 11:01:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/11/30 11:01:22 INFO fs.TrashPolicyDefault: Moved: 'hdfs://router/test/hdfs.cmd' to trash at:
>
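[Editor's note: the prefix-stripping proposal in the comment above can be sketched as a path rewrite. This is a hedged toy illustration with hypothetical names; the real router would resolve against its mount table and trash-root configuration, not a hard-coded map.]

```java
import java.util.HashMap;
import java.util.Map;

public class TrashResolver {
    /** Toy mount table: mount point -> nameservice (from the comment's example). */
    static final Map<String, String> MOUNTS = new HashMap<>();
    static {
        MOUNTS.put("/test", "ns0");
        MOUNTS.put("/user/hdfs", "ns1");
    }

    /**
     * For a path under a per-user trash root, strip the
     * /user/<user>/.Trash/<checkpoint> prefix and resolve the remainder
     * against the mount table, so the trash entry lands on the same
     * nameservice as the original file and the rename stays local.
     */
    static String resolveNameservice(String path) {
        // "/user/hdfs/.Trash/Current/test/hello" -> "/test/hello"
        String effective = path.replaceFirst("^/user/[^/]+/\\.Trash/[^/]+", "");
        if (effective.isEmpty()) {
            effective = path; // not a trash path (or trash root itself): resolve as-is
        }
        // Longest-prefix match over the mount table (toy version).
        String best = null;
        String bestMount = "";
        for (Map.Entry<String, String> e : MOUNTS.entrySet()) {
            String mount = e.getKey();
            if (effective.startsWith(mount) && mount.length() > bestMount.length()) {
                best = e.getValue();
                bestMount = mount;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The trash copy of /test/hello resolves to ns0, same as the source file.
        System.out.println(resolveNameservice("/user/hdfs/.Trash/Current/test/hello")); // ns0
        System.out.println(resolveNameservice("/test/hello"));                          // ns0
        System.out.println(resolveNameservice("/user/hdfs/data"));                      // ns1
    }
}
```

The pattern check runs on every call, as the comment notes, but it is a single regex match plus a prefix lookup, which supports the "low cost" claim.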
[jira] [Updated] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16732: --- Issue Type: Bug (was: Improvement) > [SBN READ] Avoid get location from observer when the block report is delayed. > - > > Key: HDFS-16732 > URL: https://issues.apache.org/jira/browse/HDFS-16732 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Critical > Labels: pull-request-available > > Hive on tez application fail occasionally after observer is enable, log show > below. > {code:java} > 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] > |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, > vertex=vertex_1660618571916_4839_1_00 [Map 1] > org.apache.tez.dag.app.dag.impl.AMUserCodeException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748) > at > org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) > ... 4 more {code} > As describe in MAPREDUCE-7082, when the block is missing, then will throw > this exception, but my cluster had no missing block. > In this example, I found getListing return location information. 
When the block > report on the observer is delayed, it returns blocks without locations. > HDFS-13924 was introduced to solve this problem, but it only covers > getBlockLocations. > On the observer node, every method that may return locations should check > whether the locations are empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
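The failure mode described above (FileInputFormat.identifyHosts indexing into an empty host array) can be sketched with a minimal, hypothetical guard; the class name `LocationCheck`, the method `hostsOrFallback`, and the `localhost` fallback are illustrative only, not the actual HDFS-16732 patch:

```java
import java.util.Arrays;

// Hypothetical sketch (not the actual HDFS-16732 fix): guard against the
// empty host arrays that a delayed observer block report can produce,
// instead of indexing hosts[0] blindly as in the
// ArrayIndexOutOfBoundsException shown in the stack trace above.
public class LocationCheck {
    static String[] hostsOrFallback(String[] hosts) {
        // Empty or null locations fall back to a placeholder host so split
        // construction can proceed, merely losing data locality.
        if (hosts == null || hosts.length == 0) {
            return new String[] { "localhost" };
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Delayed block report: no locations reported yet.
        System.out.println(Arrays.toString(hostsOrFallback(new String[0])));
        // Normal case: locations pass through unchanged.
        System.out.println(Arrays.toString(hostsOrFallback(new String[] { "dn1", "dn2" })));
    }
}
```

The same "check before indexing" idea is what the description asks every location-returning observer method to apply.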
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581645#comment-17581645 ] zhengchenyu commented on HDFS-16732: [~sunchao] [~xkrogen] [~zero45] Can you please review this? > [SBN READ] Avoid get location from observer when the block report is delayed. > - > > Key: HDFS-16732 > URL: https://issues.apache.org/jira/browse/HDFS-16732 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Critical > Labels: pull-request-available > > Hive on tez application fail occasionally after observer is enable, log show > below. > {code:java} > 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] > |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, > vertex=vertex_1660618571916_4839_1_00 [Map 1] > org.apache.tez.dag.app.dag.impl.AMUserCodeException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748) > at > org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) > ... 4 more {code} > As describe in MAPREDUCE-7082, when the block is missing, then will throw > this exception, but my cluster had no missing block. 
> In this example, I found that getListing returned location information. When the block > report on the observer is delayed, it returns blocks without locations. > HDFS-13924 was introduced to solve this problem, but it only covers > getBlockLocations. > On the observer node, every method that may return locations should check > whether the locations are empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16732: --- Description: Hive on tez application fail occasionally after observer is enable, log show below. {code:java} 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, vertex=vertex_1660618571916_4839_1_00 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329) at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748) at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378) at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408) at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) ... 4 more {code} As describe in MAPREDUCE-7082, when the block is missing, then will throw this exception, but my cluster had no missing block. In this example, I found getListing return location information. When block report of observer is delayed, will return the block without location. HDFS-13924 is introduce to solve this problem, but only consider getBlockLocations. In observer node, all method which may return location should check whether locations is empty or not. was: Hive on tez application fail occasionally after observer is enable, log show below. 
{code:java} 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, vertex=vertex_1660618571916_4839_1_00 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329) at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
[jira] [Updated] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16732: --- Description: Hive on tez application fail occasionally after observer is enable, log show below. {code:java} 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, vertex=vertex_1660618571916_4839_1_00 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329) at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748) at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378) at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408) at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) ... 4 more {code} As show in MAPREDUCE-7082, when the block is missing, then will throw this exception, but my cluster had no missing block. In this example, I found getListing return location information. When block report of observer is delayed, will return the block without location. HDFS-13924 is introduce to solve this problem, but only consider getBlockLocations. In observer node, all method which may return location should check whether locations is empty or not. > [SBN READ] Avoid get location from observer when the block report is delayed. 
> - > > Key: HDFS-16732 > URL: https://issues.apache.org/jira/browse/HDFS-16732 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Critical > > Hive on tez application fail occasionally after observer is enable, log show > below. > {code:java} > 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] > |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, > vertex=vertex_1660618571916_4839_1_00 [Map 1] > org.apache.tez.dag.app.dag.impl.AMUserCodeException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at >
[jira] [Created] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
zhengchenyu created HDFS-16732: -- Summary: [SBN READ] Avoid get location from observer when the block report is delayed. Key: HDFS-16732 URL: https://issues.apache.org/jira/browse/HDFS-16732 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.2.1 Reporter: zhengchenyu Assignee: zhengchenyu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-16708) RBF: Support transmit state id from client in router.
[ https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573722#comment-17573722 ] zhengchenyu edited comment on HDFS-16708 at 8/1/22 11:58 AM: - [~xkrogen] [~xuzq_zander] [~simbadzina] Let's continue Design A here. I think Design A is not implemented in all HDFS-13522's PR. there is no need to propagate all namespace's state ids in Design A. We can propagate by client's demand. I think we need a whole implement and document, then continue to discuss. I will submit new draft PR about the whole implement, document is here([^HDFS-13522_proposal_zhengchenyu.pdf]). Can you give me some suggestion? _Note: It is only a draft, the setting is a little complex, maybe I need to make it simple._ was (Author: zhengchenyu): [~xkrogen] [~xuzq_zander] [~simbadzina] Let's continue Design A here. I think Design A is not implemented in all HDFS-13522's PR. there is no need to propagate all namespace's state ids in Design A. We can propagate by client's demand. I think we need a whole implement and document, then continue to discuss. I will submit new draft PR about the whole implement, document is here([^HDFS-13522_proposal_zhengchenyu.pdf]). Can you give me some suggestion? _Note: It is a draft, the setting is a little complex, maybe I need to make it simple._ > RBF: Support transmit state id from client in router. > - > > Key: HDFS-16708 > URL: https://issues.apache.org/jira/browse/HDFS-16708 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522_proposal_zhengchenyu.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > Implement the Design A described in HDFS-13522. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-16708) RBF: Support transmit state id from client in router.
[ https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573722#comment-17573722 ] zhengchenyu edited comment on HDFS-16708 at 8/1/22 11:58 AM: - [~xkrogen] [~xuzq_zander] [~simbadzina] Let's continue Design A here. I think Design A is not implemented in all HDFS-13522's PR. there is no need to propagate all namespace's state ids in Design A. We can propagate by client's demand. I think we need a whole implement and document, then continue to discuss. I will submit new draft PR about the whole implement, document is here([^HDFS-13522_proposal_zhengchenyu.pdf]). Can you give me some suggestion? _Note: It is a draft, the setting is a little complex, maybe I need to make it simple._ was (Author: zhengchenyu): [~xkrogen] [~xuzq_zander] [~simbadzina] Let's continue Design A here. I think Design A is not implemented in all HDFS-13522's PR. there is no need to propagate all namespace's state ids in Design A. We can propagate by client's demand. I think we need a whole implement and document, then continue to discuss. I will submit new draft PR about the whole implement, Can you give me some suggestion? _Note: It is a draft, the setting is a little complex, maybe I need to make it simple._ > RBF: Support transmit state id from client in router. > - > > Key: HDFS-16708 > URL: https://issues.apache.org/jira/browse/HDFS-16708 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522_proposal_zhengchenyu.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > Implement the Design A described in HDFS-13522. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16708) RBF: Support transmit state id from client in router.
[ https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573722#comment-17573722 ] zhengchenyu commented on HDFS-16708: [~xkrogen] [~xuzq_zander] [~simbadzina] Let's continue Design A here. I think Design A is not implemented in any of HDFS-13522's PRs. There is no need to propagate every namespace's state id in Design A; we can propagate on the client's demand. I think we need a complete implementation and a document, and then we can continue the discussion. I will submit a new draft PR with the complete implementation. Can you give me some suggestions? _Note: It is a draft; the settings are a little complex, and I may need to simplify them._ > RBF: Support transmit state id from client in router. > - > > Key: HDFS-16708 > URL: https://issues.apache.org/jira/browse/HDFS-16708 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522_proposal_zhengchenyu.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > Implement the Design A described in HDFS-13522. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-13522: --- Attachment: (was: HDFS-13522_proposal_zhengchenyu.pdf) > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16708) RBF: Support transmit state id from client in router.
[ https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16708: --- Description: Implement the Design A described in HDFS-13522. > RBF: Support transmit state id from client in router. > - > > Key: HDFS-16708 > URL: https://issues.apache.org/jira/browse/HDFS-16708 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Attachments: HDFS-13522_proposal_zhengchenyu.pdf > > > Implement the Design A described in HDFS-13522. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16708) RBF: Support transmit state id from client in router.
zhengchenyu created HDFS-16708: -- Summary: RBF: Support transmit state id from client in router. Key: HDFS-16708 URL: https://issues.apache.org/jira/browse/HDFS-16708 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhengchenyu Assignee: zhengchenyu Attachments: HDFS-13522_proposal_zhengchenyu.pdf -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16708) RBF: Support transmit state id from client in router.
[ https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16708: --- Attachment: HDFS-13522_proposal_zhengchenyu.pdf > RBF: Support transmit state id from client in router. > - > > Key: HDFS-16708 > URL: https://issues.apache.org/jira/browse/HDFS-16708 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Attachments: HDFS-13522_proposal_zhengchenyu.pdf > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522 ] zhengchenyu deleted comment on HDFS-13522: was (Author: zhengchenyu): [~xkrogen] [~xuzq_zander] [~simbadzina] For I know, Design A is not implemented in all PR. For Design A, there is no need to propagate all namespace's state ids. We can propagate by client's demand. I think we need a whole implement and document, then continue to discuss. I have a draft which is combination of Design A and B. If someone are interested in Design A, can you help review this draft [https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813.] > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu.pdf, RBF_ Observer > support.pdf, Router+Observer RPC clogging.png, > ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-13522: --- Attachment: HDFS-13522_proposal_zhengchenyu.pdf > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu.pdf, RBF_ Observer > support.pdf, Router+Observer RPC clogging.png, > ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704 ] zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:34 AM: - [~xuzq_zander] Hi, the use case for Design A is admittedly rare, but Design A also has advantages. (1) More flexible: clients can set their msync period themselves. Example: in our cluster, on one name service, some special users periodically check whether an HDFS file has been created; they may need high time precision, which means more frequent msync (though I am opposed to this approach). (2) Saves msync calls: I think there is no need to call msync periodically for most Hive and MR applications, so Design A will save more msync calls than Design B. I agree with [~xuzq_zander]'s suggestion to focus on Design B first and add Design A as a bonus item. It is not easy to review both Design A and Design B. Could we complete only Design B in this issue? [~omalley] [~elgoiri] [~simbadzina] was (Author: zhengchenyu): [~xuzq_zander] Hi, the use case for Design A is admittedly rare, but Design A also has advantages. (1) More flexible: clients can set their msync period themselves. Example: in our cluster, on one name service, some special users periodically check whether an HDFS file has been created; they may need high time precision, which means more frequent msync (though I am opposed to this approach). (2) Saves msync calls: I think there is no need to call msync periodically for most Hive and MR applications, so Design A will save more msync calls than Design B. I agree with [~xuzq_zander]'s suggestion to focus on Design B first and add Design A as a bonus item. It is not easy to review both Design A and Design B. Could we complete only Design B in this issue?
[~omalley] [~elgoiri] > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
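The per-client msync period discussed in the comment above corresponds to a client-side setting in the observer-reads feature. As a hedged illustration only (the key name follows the upstream Observer NameNode documentation; the nameservice name `ns1` and the values are hypothetical), a client wanting fresher reads on one nameservice might shorten its auto-msync period in hdfs-site.xml:

```xml
<!-- Illustrative client-side hdfs-site.xml fragment; "ns1" is a hypothetical nameservice. -->
<property>
  <!-- How often the client transparently calls msync before reading from an
       observer; 0 means msync before every read, -1 disables auto-msync. -->
  <name>dfs.client.failover.observer.auto-msync-period.ns1</name>
  <value>500ms</value>
</property>
```

This is the kind of per-client knob Design A would let each application tune for itself instead of relying on a router-wide policy.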
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573707#comment-17573707 ] zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:32 AM: - [~xkrogen] [~xuzq_zander] [~simbadzina] For I know, Design A is not implemented in all PR. For Design A, there is no need to propagate all namespace's state ids. We can propagate by client's demand. I think we need a whole implement and document, then continue to discuss. I have a draft which is combination of Design A and B. If someone are interested in Design A, can you help review this draft [https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813.] was (Author: zhengchenyu): [~xkrogen] [~xuzq_zander] [~simbadzina] For I know, Design A is not implemented in all PR. For Design A, there is no need to propagate all namespace's state ids. We can propagate by client's demand. I think we need a whole implement and document, then continue to discuss. I have a draft which is combination of Design A and B. If someone are interested in Design A, can you help review this draft [https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813.] > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. 
> One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-13522: --- Attachment: (was: HDFS-13522_proposal_zhengchenyu_v1.pdf)
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704 ] zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:26 AM: - [~xuzq_zander] Hi, the use case for Design A is indeed very rare, but Design A also has advantages. (1) More flexible: clients can set their msync period themselves. Example: in our cluster, on one name service, some special users periodically detect whether an HDFS file has been created and may need high time precision, which means more frequent msync (though I am opposed to this approach). (2) Fewer msyncs: I think there is no need to call msync periodically for most Hive and MR applications, so Design A will save more msyncs than Design B. I agree with [~xuzq_zander]'s suggestion to focus on Design B first and add Design A as a bonus item; it is not easy to review both Design A and Design B. Could we complete only Design B in this issue? [~omalley] [~elgoiri]
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573707#comment-17573707 ] zhengchenyu commented on HDFS-13522: [~xkrogen] [~xuzq_zander] [~simbadzina] As far as I know, Design A is not implemented in any PR. For Design A, there is no need to propagate every namespace's state id; we can propagate according to the client's demand. I think we need a complete implementation and design document, and then we can continue the discussion. I have a draft that is a combination of Design A and B. If anyone is interested in Design A, could you help review this draft: [https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813]
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704 ] zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:24 AM: - [~xuzq_zander] Hi, the use case for Design A is indeed very rare, but Design A also has advantages. (1) More flexible: clients can set their msync period themselves. Example: in our cluster, on one name service, some special users periodically detect whether an HDFS file has been created and may need high time precision, which means more frequent msync (though I am opposed to this approach). (2) Fewer msyncs: I think there is no need to call msync periodically for most Hive and MR applications, so Design A will save more msyncs than Design B. I agree with [~xuzq_zander]'s suggestion to focus on Design B first and add Design A as a bonus item; it is not easy to review both Design A and Design B. Could we complete only Design B in this issue?
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704 ] zhengchenyu commented on HDFS-13522: [~xuzq_zander] Hi, the use case for Design A is indeed very rare, but Design A also has advantages. (1) More flexible: clients can set their msync period themselves. Example: in our cluster, on one name service, some special users periodically detect whether an HDFS file has been created and may need high time precision, which means more frequent msync (though I am opposed to this approach). (2) Fewer msyncs: I think there is no need to call msync periodically for most Hive and MR applications, so Design A will save more msyncs than Design B. I agree with your suggestion to focus on Design B first and add Design A as a bonus item; it is not easy to review both Design A and Design B.
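The "clients set their msync period themselves" point in the comment above can be illustrated with a small sketch. The class name `AutoMsyncPolicy` and the period parameter are assumptions made for illustration; this is not the actual HDFS client code, though Hadoop's observer-read client has a similar auto-msync-period mechanism.

```java
// Hypothetical sketch of per-client msync throttling: the client decides
// how much staleness it can tolerate, and only issues an msync to the
// active namenode when that period has elapsed since the last one.
public class AutoMsyncPolicy {
    private final long periodMs;                      // client-chosen staleness bound
    private long lastMsyncMs = Long.MIN_VALUE;        // sentinel: never synced yet

    public AutoMsyncPolicy(long periodMs) {
        this.periodMs = periodMs;
    }

    // Returns true if an msync should be issued before the next read.
    public boolean shouldMsync(long nowMs) {
        if (lastMsyncMs == Long.MIN_VALUE || nowMs - lastMsyncMs >= periodMs) {
            lastMsyncMs = nowMs;
            return true;
        }
        return false;
    }
}
```

A user needing high time precision would configure a small period (more frequent msync), while a batch Hive/MR job could use a very large one and skip periodic msync almost entirely.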
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571703#comment-17571703 ] zhengchenyu commented on HDFS-13522: [~simbadzina] I also agree with the *combination of both Design A and B.* Most users would use plan B; some special users could choose plan A on demand. [~simbadzina] [~xuzq_zander] what are your Slack e-mails? Or we can gather in the hdfs channel first.
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571703#comment-17571703 ] zhengchenyu edited comment on HDFS-13522 at 7/27/22 2:45 AM: - [~simbadzina] I also agree with the *combination of both Design A and B.* Most users would use plan B; some special users could choose plan A on demand. [~simbadzina] [~xuzq_zander] what are your Slack e-mails? Or we can gather in the hdfs channel first.
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186 ] zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:16 AM: - [~simbadzina] Your v2 document perhaps lacks a detailed implementation and structure; I still don't know how your design would be implemented. What is your choice between Design A and Design B? (Note: the last picture is not clear in the Google document.) In my design, the key is the nameserviceId in router mode and the clientId+callId in client mode. My proposal mainly describes client mode. Regarding switches between routers, there is no problem with my proposal in client mode, because the client carries the real state id. I think we need more discussion about this. I have created a channel named "hdfs-13522" on Slack. Let us discuss on this channel for more efficient communication, and then continue with a meeting. [~simbadzina] [~xuzq_zander]
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186 ] zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:15 AM: - [~simbadzina] Your v2 document perhaps lacks a detailed implementation and structure; I still don't know how your design would be implemented. (Note: the last picture is not clear in the Google document.) In my design, the key is the nameserviceId in router mode and the clientId+callId in client mode. My proposal mainly describes client mode. Regarding switches between routers, there is no problem with my proposal in client mode, because the client carries the real state id. I think we need more discussion about this. I have created a channel named "hdfs-13522" on Slack. Let us discuss on this channel for more efficient communication, and then continue with a meeting. [~simbadzina] [~xuzq_zander]
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186 ] zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:11 AM: - [~simbadzina] Your v2 document perhaps lacks an implementation and structure. In my design, the key is the nameserviceId in router mode and the clientId+callId in client mode. My proposal mainly describes client mode. Regarding switches between routers, there is no problem with my proposal in client mode, because the client carries the real state id. I think we need more discussion about this. I have created a channel named "hdfs-13522" in Slack. Let us discuss on this channel for more efficient communication, and then continue with a meeting. [~simbadzina] [~xuzq_zander]
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186 ] zhengchenyu commented on HDFS-13522: [~simbadzina] Your v2 document perhaps lacks an implementation and structure. In my design, the key is the nameserviceId in router mode and the clientId+callId in client mode. My proposal mainly describes client mode. Regarding switches between routers, there is no problem with my proposal in client mode, because the client carries the real state id. I think we need more discussion about this. I have created a channel named "hdfs-13522". Let us discuss on this channel for more efficient communication, and then continue with a meeting. [~simbadzina] [~xuzq_zander]
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186 ] zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:10 AM: - [~simbadzina] Your v2 document perhaps lacks an implementation and structure. In my design, the key is the nameserviceId in router mode and the clientId+callId in client mode. My proposal mainly describes client mode. Regarding switches between routers, there is no problem with my proposal in client mode, because the client carries the real state id. I think we need more discussion about this. I have created a channel named "hdfs-13522" in Slack. Let us discuss on this channel for more efficient communication, and then continue with a meeting. [~simbadzina] [~xuzq_zander]
[jira] [Created] (HDFS-16682) [SBN Read] make estimated transactions configurable
zhengchenyu created HDFS-16682: -- Summary: [SBN Read] make estimated transactions configurable Key: HDFS-16682 URL: https://issues.apache.org/jira/browse/HDFS-16682 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: zhengchenyu Assignee: zhengchenyu In GlobalStateIdContext, ESTIMATED_TRANSACTIONS_PER_SECOND and ESTIMATED_SERVER_TIME_MULTIPLIER should be configurable. These parameters depend on each cluster's load. Additionally, making them configurable will help us simulate an observer namenode that has fallen far behind.
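The role of the two estimates named above can be sketched as follows. This is a simplified assumption for illustration, not the exact GlobalStateIdContext logic: the class name `StalenessEstimator` and the threshold formula (transactions-per-second times a multiplier) are made up for the sketch.

```java
// Simplified sketch: decide whether an observer is estimated to be too far
// behind a client's state id to serve a coordinated read, based on a
// configurable estimate of how many transactions the namenode applies per
// second (the values the issue proposes to make configurable).
public class StalenessEstimator {
    private final long estimatedTransactionsPerSecond;  // proposed config
    private final long estimatedServerTimeMultiplier;   // proposed config

    public StalenessEstimator(long tps, long multiplier) {
        this.estimatedTransactionsPerSecond = tps;
        this.estimatedServerTimeMultiplier = multiplier;
    }

    // True if catching up to the client's state id would exceed the
    // tolerated lag estimated from the configured rate.
    public boolean isTooFarBehind(long clientStateId, long serverStateId) {
        long threshold = estimatedTransactionsPerSecond * estimatedServerTimeMultiplier;
        return clientStateId - serverStateId > threshold;
    }
}
```

With hard-coded constants, the same threshold applies to lightly and heavily loaded clusters alike; making both values configurable also lets a test shrink the threshold to simulate a far-behind observer, as the issue suggests.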
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569802#comment-17569802 ] zhengchenyu edited comment on HDFS-13522 at 7/22/22 5:11 AM: - [~simbadzina] Thanks for involving me. HDFS-13522_proposal_zhengchenyu_v1.pdf is my proposal document. Chapter 2.1 is exactly solution C and describes how the state id is carried in my demo implementation. Could you please review my proposal?
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569802#comment-17569802 ] zhengchenyu commented on HDFS-13522: [~simbadzina] Thanks for involving me. HDFS-13522_proposal_zhengchenyu_v1.pdf is my proposal document. Chapter 2.1 describes how the state id is carried in my demo implementation. Could you please review my proposal?
[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-13522: --- Attachment: HDFS-13522_proposal_zhengchenyu_v1.pdf
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569363#comment-17569363 ] zhengchenyu commented on HDFS-13522: This is a critical issue, which is why there are so many comments. Let me summarize them. There seem to be two solutions in the history: (A) Solution A was proposed by CR Hota and is described in RBF_ Observer support.pdf. This proposal was never implemented. It seems to require both a read-only router and a write router, so I feel the deployment is complex. Since no one has pursued it, let us set it aside for now. (B) Solution B was proposed by Surendra Singh Lilhore and Hemanth Boyina, and then zhoubing zheng and Simbarashe Dzinamarira wanted to rebase it onto trunk. This solution is implemented, in the HDFS-13522***.patch files. I think the second solution meets most users' demands, but the router hides the state id from the client. I think the router should connect to the namenode using the client's state id, and I believe that is also what [~omalley] means. I will call the proposal that carries the client's state id solution C. Note: I had planned to implement solution C in another issue after [~simbadzina]'s work, but as [~omalley] said, let us do all of it in this issue. [~simbadzina], I proposed both solution B and solution C, and I have implemented a demo version. Can I work on it together with you? > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf > > Time Spent: 20h > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. 
> One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562354#comment-17562354 ] zhengchenyu commented on HDFS-14703: Is this still ongoing? It is great work indeed. But I wonder why a read lock is not held on the ancestors in the design document and the fgl branch. When writing /a/b/c, I think we need to hold the read lock on /a and /a/b, and then hold the write lock on /a/b/c. If some write operation on /a/b happens at the same time, it may lead to inconsistency. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
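The hierarchical locking order described in the comment above can be sketched as a toy example. This is purely illustrative, not code from the fgl branch; the class and method names are invented: one ReentrantReadWriteLock per path, read-locking each ancestor and write-locking the final component, so a concurrent write to /a/b must wait for the write on /a/b/c to release its ancestor read locks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only (not the actual fgl branch code): one lock per
// path, read-locking ancestors and write-locking the leaf component.
public class PathLocks {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
    }

    // To write /a/b/c: read-lock /a and /a/b, then write-lock /a/b/c.
    // A concurrent writer of /a/b needs the /a/b write lock, so it blocks
    // until this operation's read lock on /a/b is released.
    public void lockForWrite(String[] components) {
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < components.length; i++) {
            path.append('/').append(components[i]);
            if (i == components.length - 1) {
                lockFor(path.toString()).writeLock().lock();
            } else {
                lockFor(path.toString()).readLock().lock();
            }
        }
    }

    public boolean isWriteLocked(String path) {
        return lockFor(path).isWriteLocked();
    }

    public int readLockCount(String path) {
        return lockFor(path).getReadLockCount();
    }

    public static void main(String[] args) {
        PathLocks pl = new PathLocks();
        pl.lockForWrite(new String[] {"a", "b", "c"});
        System.out.println(pl.isWriteLocked("/a/b/c")); // leaf is write-locked
        System.out.println(pl.readLockCount("/a/b"));   // ancestor is read-locked
    }
}
```

Without the ancestor read locks, a rename or delete of /a/b could interleave with the write of /a/b/c, which is the inconsistency the comment worries about.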
[jira] [Assigned] (HDFS-4645) Move from randomly generated block ID to sequentially generated block ID
[ https://issues.apache.org/jira/browse/HDFS-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu reassigned HDFS-4645: - Assignee: zhengchenyu (was: Arpit Agarwal) > Move from randomly generated block ID to sequentially generated block ID > > > Key: HDFS-4645 > URL: https://issues.apache.org/jira/browse/HDFS-4645 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Suresh Srinivas >Assignee: zhengchenyu >Priority: Major > Fix For: 2.1.0-beta > > Attachments: HDFS-4645.001.patch, HDFS-4645.002.patch, > HDFS-4645.003.patch, HDFS-4645.004.patch, HDFS-4645.005.patch, > HDFS-4645.006.patch, HDFS-4645.branch-2.patch, > SequentialblockIDallocation.pdf, editsStored > > > Currently block IDs are randomly generated. This means there is no pattern to > block ID generation and no guarantees such as uniqueness of block ID for the > life time of the system can be made. I propose using SequentialNumber for > block ID generation. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551940#comment-17551940 ] zhengchenyu edited comment on HDFS-13522 at 6/9/22 3:52 AM: [~simbadzina] The router does not use the client's AlignmentContext, so I think dfs.federation.router.observer.auto-msync-period must be set to 0; then there will be no problem. If this value is too big, the router may not catch the latest modifications. Because the router does not use the client's AlignmentContext, there is no way to guarantee that the router's current state id is newer than the client's state id. Maybe we need to fix dfs.federation.router.observer.auto-msync-period at '0', or note this in the documentation. was (Author: zhengchenyu): [~simbadzina] Router does not use client's AlignmentContext, So I think dfs.federation.router.observer.auto-msync-period must set to 0, then there will be no problem. If this value is too big, router may not catch the latest modification. Because Router does not use client's AlignmentContext, there is no way to make sure the current router stat id is newer than client state id. Maybe we need to set dfs.federation.router.observer.auto-msync-period to fixed value '0', or note in document. > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 10h 20m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551940#comment-17551940 ] zhengchenyu commented on HDFS-13522: [~simbadzina] The router does not use the client's AlignmentContext, so I think dfs.federation.router.observer.auto-msync-period must be set to 0; then there will be no problem. If this value is too big, the router may not catch the latest modifications. Because the router does not use the client's AlignmentContext, there is no way to guarantee that the router's current state id is newer than the client's state id. Maybe we need to fix dfs.federation.router.observer.auto-msync-period at '0', or note this in the documentation. > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 10h 20m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542746#comment-17542746 ] zhengchenyu commented on HDFS-13522: Hi [~simbadzina], I saw what you said, but maybe you did not catch my idea. I mean that this cannot be configured at the RPC-call level. The class RouterStateIdContext only implements receiveRequestState. When the router calls the namenode, it uses its own AlignmentContext rather than the client's, so it cannot meet different users' demands. > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 8h 10m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541887#comment-17541887 ] zhengchenyu commented on HDFS-13522: Thanks for this good patch! But I have a question. In this patch, the router hides the client's state id. I think that means we cannot enable, disable, or configure this feature. As far as I know, the Observer NameNode cannot guarantee absolute consistency, so users may need to configure auto-msync-period and other parameters to meet their different demands. I think it would be better to use the client's state id at the client RPC-call level. (Note: of course, this needs more changes, especially for mounting multiple nameservices.) > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 8h 10m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
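The distinction argued in the comments above, that the router forwards the client's own state id per call instead of substituting a router-wide one, can be sketched as a toy model. This is a hypothetical illustration, not actual RBF or RouterStateIdContext code; the class and method names are invented:

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical sketch of the "use the client's state id per RPC call" idea.
// With the client's id forwarded, the observer waits until it has caught up
// to what *that* client has seen; with only a router-wide id, a client that
// is ahead of the router may read stale data.
public class RouterStateForwarder {
    // Router-wide last-seen transaction id (monotonically increasing max).
    private final LongAccumulator routerSeen = new LongAccumulator(Math::max, 0L);

    // State id to attach to the outgoing NameNode call for one client RPC.
    public long stateIdToSend(long clientStateId, boolean forwardClientId) {
        return forwardClientId ? clientStateId : routerSeen.get();
    }

    // Record the state id carried back in a NameNode response.
    public void recordResponse(long namenodeStateId) {
        routerSeen.accumulate(namenodeStateId);
    }

    public static void main(String[] args) {
        RouterStateForwarder r = new RouterStateForwarder();
        r.recordResponse(100);
        // A client that has already seen txid 120 must not be served by an
        // observer that has only replayed up to txid 100.
        System.out.println(r.stateIdToSend(120, true));  // forwards 120
        System.out.println(r.stateIdToSend(120, false)); // falls back to 100
    }
}
```

The second println shows the failure mode under discussion: with only the router-wide id, the wait condition on the observer is weaker than what the client actually requires.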
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Description: One of our Namenodes has 300M files and blocks. This namenode should not generally be under heavy load, but we found the RPC processing time stayed high and decommission was very slow. Searching the metrics, I found that under-replicated blocks stayed high. Then I jstacked the namenode and found that 'InnerNode.getLoc' was a hotspot in the code. I thought chooseTarget might be failing to find targets, leading to the performance degradation. Considering HDFS-10453, I guessed that some logic triggers the scenario where chooseTarget cannot find a proper target. Then I enabled some debug logging. (Of course I revised the code so that only isGoodTarget logs at debug level, because enabling all of BlockPlacementPolicy's debug logging is dangerous.) I found that "the rack has too many chosen nodes" was being hit. Then I found logs like this {code:java} 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy {code} Through some debugging and simulation, I found the reason and reproduced the exception. 
The reason is that some developers use the COLD storage policy together with the mover, but setting the storage policy and running the mover are asynchronous operations, so some files' actual datanode storages do not match the storage policy. Let me simulate the process. Suppose /tmp/a is created with 2 replicas on DISK, and then the storage policy is set to COLD. When some logic (for example decommission) triggers copying this block, chooseTarget uses chooseStorageTypes to work out what is really needed. Here the size of the requiredStorageTypes list returned by chooseStorageTypes is 3, but the size of result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block has 2 DISK storages. So 3 targets will be requested. Choosing the first target succeeds, but when choosing the second target the variable 'counter' is 4, which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode storage is skipped, resulting in bad performance. I think chooseStorageTypes needs to take result into account: when an existing replica does not meet the storage policy's demand, we need to remove it from result. I changed it this way, tested it in my unit test, and that solved it. was: One of our Namenode which has 300M files and blocks. This namenode shoud not be in heavy load generally. But we found rpc process time keep high, and decommission is very slow. I search the metrics, I found uderreplicated blocks keep high. Then I jstack namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe chooseTarget can't find block, so result to performance degradation. Consider with HDFS-10453, I guess maybe some logical trigger to the scene where chooseTarget can't find proper block. Then I enable some debug. (Of course I revise some code so that only debug isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). I found "the rack has too many chosen nodes" is called. 
Then I found some log like this {code:java} 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy {code} Then through some debug and simulation, I found the reason, and reproduction this exception. The reason is that some developer use COLD storage policy and mover, but the operatiosn of
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Description: One of our Namenodes has 300M files and blocks. This namenode should not generally be under heavy load, but we found the RPC processing time stayed high and decommission was very slow. Searching the metrics, I found that under-replicated blocks stayed high. Then I jstacked the namenode and found that 'InnerNode.getLoc' was a hotspot in the code. I thought chooseTarget might be failing to find targets, leading to the performance degradation. Considering HDFS-10453, I guessed that some logic triggers the scenario where chooseTarget cannot find a proper target. Then I enabled some debug logging. (Of course I revised the code so that only isGoodTarget logs at debug level, because enabling all of BlockPlacementPolicy's debug logging is dangerous.) I found that "the rack has too many chosen nodes" was being hit. Then I found logs like this {code:java} 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy {code} Through some debugging and simulation, I found the reason and reproduced the exception. 
The reason is that some developers use the COLD storage policy together with the mover, but setting the storage policy and running the mover are asynchronous operations, so some files' actual datanode storages do not match the storage policy. Let me simulate the process. Suppose /tmp/a is created with 2 replicas on DISK, and then the storage policy is set to COLD. When some logic (for example decommission) triggers copying this block, chooseTarget uses chooseStorageTypes to work out what is really needed. Here the size of the requiredStorageTypes list returned by chooseStorageTypes is 3, but the size of result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block has 2 DISK storages. So 3 targets will be requested. Choosing the first target succeeds, but when choosing the second target the variable 'counter' is 4, which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode storage is skipped, resulting in bad performance. I think chooseStorageTypes needs to take result into account: when an existing replica does not meet the storage policy's demand, we need to remove it from result. I changed it this way, tested it in my unit test, and that solved it. was: One of our Namenode which has 300M files and blocks. In common way, this namode shoud not be in heavy load. But we found rpc process time keep high, and decommission is very slow. I search the metrics, I found uderreplicated blocks keep high. Then I jstack namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe chooseTarget can't find block, so result to performance degradation. Consider with HDFS-10453, I guess maybe some logical trigger to the scene where chooseTarget can't find proper block. Then I enable some debug. (Of course I revise some code so that only debug isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). I found "the rack has too many chosen nodes" is called. 
Then I found some log like this {code} 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy {code} Then through some debug and simulation, I found the reason, and reproduction this exception. The reason is that some developer use COLD storage policy and mover, but the operatiosn of
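The accounting fix proposed in the HDFS-15715 description above can be sketched in miniature. This is an illustrative toy, not the actual BlockPlacementPolicy or chooseStorageTypes code; the class, enum, and method names are invented. The point is that existing replicas whose storage type the policy cannot use must not be counted as already-chosen replicas, otherwise the chosen-node counter overflows maxTargetPerRack and every candidate is skipped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the described fix: count an existing replica toward
// the policy's requirements only if its storage type satisfies a required slot.
public class StoragePolicyCheck {
    enum StorageType { DISK, ARCHIVE }

    // required: storage types the policy demands (e.g. 3 x ARCHIVE for COLD).
    // existing: storage types of the replicas the block currently has.
    // Returns how many new targets must actually be chosen.
    static int targetsNeeded(List<StorageType> required, List<StorageType> existing) {
        List<StorageType> remaining = new ArrayList<>(required);
        for (StorageType t : existing) {
            remaining.remove(t); // removes one matching required slot, if any
        }
        return remaining.size();
    }

    public static void main(String[] args) {
        // COLD wants 3 ARCHIVE replicas; the block still has 2 DISK replicas
        // left over from before the policy change.
        int n = targetsNeeded(
            Arrays.asList(StorageType.ARCHIVE, StorageType.ARCHIVE, StorageType.ARCHIVE),
            Arrays.asList(StorageType.DISK, StorageType.DISK));
        System.out.println(n); // 3: the DISK replicas do not satisfy COLD
    }
}
```

In the bug scenario from the description, the two DISK replicas were still counted as chosen nodes while three ARCHIVE targets were requested, which is how 'counter' reached 4 against a maxTargetPerRack of 3.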
[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16070: --- Description: When I sped up the decommission, I found that some datanodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones below. {code} # DataTransfer thread start log 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866 # DataTransfer done log 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866 2021-06-08 
14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866 {code} You can see that the last DataTransfer thread finished at 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), another DataTransfer for the same block is started, and the disk and network become heavy. Note: decommissioning EC blocks triggers this problem easily, because every EC internal block is unique. was: When I speed up the decommission, I found that some datanode's io is busy, then I found host's load is very high, and ten thousands data transfer thread are running. Then I find log like below. {code} # log when the transfer thread starts 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866 # log when the transfer completes 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866 {code} You will see last datatranfser thread was done on 13:54:08, but next datatranfser was start at 13:52:36. If datatranfser was not done in 10min(pending timeout + check interval), then next datatranfser
[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16070: --- Description: When I sped up the decommission, I found that some datanodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones below. {code} # log when the transfer thread starts 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866 # log when the transfer completes 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866 2021-06-08 14:10:47,170 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866 {code} You can see that the last DataTransfer thread finished at 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), another DataTransfer for the same block is started, and the disk and network become heavy. Note: decommissioning EC blocks triggers this problem easily, because every EC internal block is unique. was: When I speed up the decommission, I found that some datanode's io is busy, then I found host's load is very high, and ten thousands data transfer thread are running. Then I find log like below. {code} # log when the transfer thread starts 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866 # log when the transfer completes 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866 {code} You will see last datatranfser thread was done on 13:54:08, but next datatranfser was start at 13:52:36. If datatranfser was not done in 10min(pending timeout + check interval), then next datatranfser for same block will be
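The re-dispatch behavior described in the HDFS-16070 report above suggests a simple dedup guard: do not start a second DataTransfer thread for a block whose first transfer is still in flight. This is a hypothetical mitigation sketch, not the actual DataNode code; the class and method names are invented:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remember which blocks already have a DataTransfer
// thread running, so a slow transfer is not re-dispatched every
// (pending timeout + check interval) while the first copy is still going.
public class TransferDedup {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    // Returns true if the caller should start a new transfer thread for
    // this block; false if one is already in flight.
    public boolean tryBegin(String blockId) {
        return inFlight.add(blockId);
    }

    // Called when the transfer thread finishes, whether it succeeded or failed.
    public void finish(String blockId) {
        inFlight.remove(blockId);
    }

    public static void main(String[] args) {
        TransferDedup d = new TransferDedup();
        System.out.println(d.tryBegin("blk_-9223372036449848858")); // true: start it
        System.out.println(d.tryBegin("blk_-9223372036449848858")); // false: still running
        d.finish("blk_-9223372036449848858");
        System.out.println(d.tryBegin("blk_-9223372036449848858")); // true again
    }
}
```

In the logged timeline, the 13:52:36 and 14:02:37 "Starting thread to transfer" entries for the same block would have returned false from such a guard, since the 13:42:37 transfer had not yet logged "Transmitted".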
[jira] [Commented] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363358#comment-17363358 ] zhengchenyu commented on HDFS-16070:
[~ayushsaxena] [~inigoiri] I have submitted a pull request; can you help me review this patch?

> DataTransfer block storm when datanode's io is busy.
>
> Key: HDFS-16070
> URL: https://issues.apache.org/jira/browse/HDFS-16070
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.3.0, 3.2.1
> Reporter: zhengchenyu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When I sped up decommissioning, I found that some datanodes' io was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones below.
> {code}
> # Logs of transfer threads starting
> 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866
> 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866
> 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866
> # Markers of completed transfers
> 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866
> 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866
> {code}
> You will see the previous DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36.
> If a DataTransfer is not done within 10 minutes (pending timeout + check interval), another DataTransfer for the same block will be running. Then disk and network are heavy.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
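The re-scheduling pattern in the logs above can be sketched with a toy timeline. Everything here (the function name, the two 5-minute values, the simple loop) is an illustrative assumption, not the actual NameNode/DataNode code: the pending-reconstruction entry expires after a timeout, the timeout monitor notices it on its next pass, and the same block is handed to another DataTransfer thread even though the first one is still running.

```python
# Toy model of the block storm: illustrative values, not the real implementation.
PENDING_TIMEOUT = 5 * 60   # assumed pending-reconstruction timeout (seconds)
CHECK_INTERVAL = 5 * 60    # assumed timeout-monitor check period (seconds)

def reschedule_times(transfer_duration, window):
    """Return the times (seconds) at which a DataTransfer for the SAME block
    is issued within `window`: while a slow transfer is still running, a
    duplicate is started roughly every PENDING_TIMEOUT + CHECK_INTERVAL."""
    times, t = [0], 0
    while t + PENDING_TIMEOUT + CHECK_INTERVAL <= window:
        t += PENDING_TIMEOUT + CHECK_INTERVAL
        if transfer_duration > t:  # the earlier transfer has not finished yet
            times.append(t)
    return times

# A transfer that needs ~25 minutes on a busy disk is issued 3 times in a
# 25-minute window, matching the 13:42 / 13:52 / 14:02 starts in the log.
print(reschedule_times(25 * 60, 25 * 60))  # [0, 600, 1200]
```

In this toy model the duplicates stop only when one transfer finishes within the timeout window, which is exactly what busy io prevents.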
[jira] [Created] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
zhengchenyu created HDFS-16070:
--
Summary: DataTransfer block storm when datanode's io is busy.
Key: HDFS-16070
URL: https://issues.apache.org/jira/browse/HDFS-16070
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 3.2.1, 3.3.0
Reporter: zhengchenyu

When I sped up decommissioning, I found that some datanodes' io was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones below.
{code}
# Logs of transfer threads starting
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866
# Markers of completed transfers
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866
{code}
You will see the previous DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer is not done within 10 minutes (pending timeout + check interval), another DataTransfer for the same block will be running. Then disk and network are heavy.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-14849:
---
Description:
When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal blocks on that datanode will be replicated many times.
// added 2019/09/19
I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes simultaneously.
!scheduleReconstruction.png! !fsck-file.png!

was:
When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal blocks on that datanode will be replicated many times.
// added 2019/09/19
I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes simultaneously.
!scheduleReconstruction.png! !fsck-file.png!

> Erasure Coding: the internal block is replicated many times when datanode is decommissioning
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, erasure-coding
> Affects Versions: 3.3.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Labels: EC, HDFS, NameNode
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, HDFS-14849.branch-3.1.patch, fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal blocks on that datanode will be replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes simultaneously.
> !scheduleReconstruction.png! !fsck-file.png!

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15294) Federation balance tool
[ https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347493#comment-17347493 ] zhengchenyu commented on HDFS-15294:
Thanks for this great work! But I have a question: if the source directory is being written to all the time, does that mean Federation balance will never exit? In our cluster, we have a tool like this. We used "distcp with snapshot diff" first, but gave it up. Then I used a mount table with multiple destination nameservices: write to the dst nameservice, then copy the source data to the dst. Then only one issue remained, keeping data consistent, so I submitted HDFS-15750.

> Federation balance tool
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch, HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch, HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch, HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf
>
> This jira introduces a new HDFS federation balance tool to balance data across different federation namespaces. It uses Distcp to copy data from the source path to the target path.
> The process is:
> 1. Use distcp and snapshot diff to sync data between src and dst until they are the same.
> 2. Update the mount table in Router if we specified RBF mode.
> 3. Deal with the src data: move to trash, delete, or skip it.
> The design of the fedbalance tool comes from the discussion in HDFS-15087.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
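The convergence question raised in the comment above (does step 1 terminate if the source keeps changing?) can be explored with a toy loop. The function name and the diff counts are hypothetical stand-ins, not the actual FedBalance API; each iteration plays the role of one "distcp + snapshot diff" round.

```python
# Toy model of step 1: run sync rounds until a round finds no changes.
# Hypothetical names and numbers, not the real FedBalance implementation.
def sync_until_same(diff_count_per_round, max_rounds=10):
    """diff_count_per_round yields the number of paths changed since the
    previous round. Returns the number of rounds needed to converge, or
    None if the source never quiesces -- i.e. a constantly-written source
    can keep the tool from ever reaching step 2 (the mount-table switch)."""
    rounds = 0
    for diff in diff_count_per_round:
        rounds += 1
        if diff == 0:          # src and dst are now the same
            return rounds
        if rounds >= max_rounds:
            break
    return None                # writes never stopped

# Shrinking diffs converge quickly; a steady write load never converges.
print(sync_until_same(iter([120, 7, 0])))   # 3
print(sync_until_same(iter([50] * 20)))     # None
```

This is why tools of this kind typically need a final step that blocks writes (or switches the mount point) before the last diff round.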
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715:
---
Attachment: (was: image-2021-02-25-14-41-49-394.png)

> ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.7.3, 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, HDFS-15715.002.patch.addendum, image-2021-03-26-12-17-45-500.png
>
> One of our NameNodes has 300M files and blocks. Normally this NameNode should not be under heavy load, but we found the rpc processing time stayed high and decommissioning was very slow.
> Searching the metrics, I found under-replicated blocks stayed high. Then I jstacked the namenode and found 'InnerNode.getLoc' was a hot spot. I thought chooseTarget could not find a proper node, which resulted in the performance degradation. Considering HDFS-10453, I guessed some logic triggers a scene where chooseTarget can't find a proper node.
> Then I enabled some debug logging. (Of course, I revised some code so that only isGoodTarget is debugged, because enabling all of BlockPlacementPolicy's debug logging is dangerous.) I found "the rack has too many chosen nodes" was logged. Then I found some logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code}
> Then through some debugging and simulation, I found the reason and reproduced this exception.
> The reason is that some developers use the COLD storage policy and mover, but the operations of setting the storage policy and running the mover are asynchronous, so some files' real datanode storages do not match the storage policy.
> Let me simulate this process. /tmp/a is created with 2 replicas on DISK. Then its storage policy is set to COLD. When some logic (for example, decommissioning) triggers copying this block, chooseTarget uses chooseStorageTypes to filter the storage types that are really needed. Here the size of the variable requiredStorageTypes returned by chooseStorageTypes is 3, but the size of result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block has 2 DISK storages. So 3 targets will be requested. Choosing the first target works, but when choosing the second target the variable 'counter' is 4, which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode storage is skipped. This results in bad performance.
> I think chooseStorageTypes needs to consider result: when an existing replica doesn't meet the storage policy's demand, we need to remove it from result.
> I changed it this way and tested it in my unit test, which solved the problem.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
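The size-3 vs size-2 mismatch described above can be sketched with a simplified stand-in (this is not the real chooseStorageTypes signature; names and types are illustrative): a COLD file wants 3 ARCHIVE replicas, but its 2 existing replicas are still on DISK, so nothing in the policy is satisfied yet.

```python
# Simplified stand-in for the chooseStorageTypes mismatch described above
# (hypothetical names, not the real HDFS method).
def choose_storage_types(policy_types, existing_types):
    """Return (required, unsatisfying): storage types still to be allocated,
    and existing replicas that do not match the policy."""
    required = list(policy_types)
    unsatisfying = []
    for t in existing_types:
        if t in required:
            required.remove(t)      # this replica already satisfies the policy
        else:
            unsatisfying.append(t)  # e.g. a DISK replica of a COLD file
    return required, unsatisfying

required, unsatisfying = choose_storage_types(
    ["ARCHIVE", "ARCHIVE", "ARCHIVE"], ["DISK", "DISK"])
# requiredStorageTypes has 3 entries while the result set still holds the
# 2 stale DISK replicas -- the size-3 vs size-2 mismatch in the report.
print(required, unsatisfying)
```

The fix proposed in the report corresponds to removing the `unsatisfying` replicas from the result set before counting targets, so the per-rack limit is checked against replicas that actually count toward the policy.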
[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290727#comment-17290727 ] zhengchenyu edited comment on HDFS-15715 at 3/26/21, 4:17 AM:
--
[~hexiaoqiao] Yeah, no problem. Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How I found it
Due to limited length and my poor English, I will describe the analysis procedure briefly.
(a) Decommissioning is very slow
!image-2021-03-26-12-17-45-500.png!
During datanode decommissioning, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is overloaded.
(b) Strange logs from the NameNode
We can guess that some code in chooseTarget may not be rational.
{code:java}
2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code}
(c) Statistics over many stack dumps
By collecting statistics over many stack dumps, I found the hot code in the jstack below:
{code:java}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844" #34 daemon prio=5 os_prio=0 tid=0x7f772e03a800 nid=0x6288f runnable [0x7f4507c0f000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
    at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
    at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
    at org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
    at org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
    at java.lang.Thread.run(Thread.java:748)
{code}
(d) Enabling more debug logging
After enabling some debug logging, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total count of this log was close to the cluster's DataNodeStorage count. We can guess the hit rate of chooseTarget is very low.
Then I used a unit test to reproduce this problem.

2. How to fix it
I have reproduced this case on the trunk branch. I submitted HDFS-15715.002.patch; this patch does not fix the bug, but its TestReplicationPolicyWithMultiStorage reproduces it. When the bug is triggered, many logs like "is not chosen since the rack has too many chosen nodes." are printed. After applying HDFS-15715.002.patch.addendum, the bug is fixed and UnderReplicatedBlocks declines normally.

was (Author: zhengchenyu):
[~hexiaoqiao] Yeah, no problem. Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How I found it
Due to limited length and my poor English, I will describe the analysis procedure briefly.
(a) Decommissioning is very slow
[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290727#comment-17290727 ] zhengchenyu edited comment on HDFS-15715 at 2/25/21, 6:55 AM:
--
[~hexiaoqiao] Yeah, no problem. Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How I found it
Due to limited length and my poor English, I will describe the analysis procedure briefly.
(a) Decommissioning is very slow
!image-2021-02-25-14-41-49-394.png|width=378,height=155!
During datanode decommissioning, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is overloaded.
(b) Strange logs from the NameNode
We can guess that some code in chooseTarget may not be rational.
{code:java}
2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code}
(c) Statistics over many stack dumps
By collecting statistics over many stack dumps, I found the hot code in the jstack below:
{code:java}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844" #34 daemon prio=5 os_prio=0 tid=0x7f772e03a800 nid=0x6288f runnable [0x7f4507c0f000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
    at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
    at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
    at org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
    at org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
    at java.lang.Thread.run(Thread.java:748)
{code}
(d) Enabling more debug logging
After enabling some debug logging, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total count of this log was close to the cluster's DataNodeStorage count. We can guess the hit rate of chooseTarget is very low.
Then I used a unit test to reproduce this problem.

2. How to fix it
I have reproduced this case on the trunk branch. I submitted HDFS-15715.002.patch; this patch does not fix the bug, but its TestReplicationPolicyWithMultiStorage reproduces it. When the bug is triggered, many logs like "is not chosen since the rack has too many chosen nodes." are printed. After applying HDFS-15715.002.patch.addendum, the bug is fixed and UnderReplicatedBlocks declines normally.

was (Author: zhengchenyu):
[~hexiaoqiao] Yeah, no problem. Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How I found it
Due to limited length and my poor English, I will describe the analysis procedure briefly.
(a) Decommissioning is very slow
[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290727#comment-17290727 ] zhengchenyu commented on HDFS-15715:
[~hexiaoqiao] Yeah, no problem. Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How I found it
Due to limited length and my poor English, I will describe the analysis procedure briefly.
(a) Decommissioning is very slow
!image-2021-02-25-14-41-49-394.png|width=378,height=155!
During datanode decommissioning, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is overloaded.
(b) Strange logs from the NameNode
We can guess that some code in chooseTarget may not be rational.
{code}
2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code}
(c) Statistics over many stack dumps
Since this is a performance problem, we collected as much jstack information as possible. By analyzing many sets of thread stack dumps, we found the following call stack appearing abnormally often:
{code}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844" #34 daemon prio=5 os_prio=0 tid=0x7f772e03a800 nid=0x6288f runnable [0x7f4507c0f000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
    at org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
    at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
    at org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
    at org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
    at java.lang.Thread.run(Thread.java:748)
{code}
(d) Enabling more debug logging
After enabling some debug logging, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total count of this log was close to the cluster's DataNodeStorage count. We can guess the hit rate of chooseTarget is very low.
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Attachment: image-2021-02-25-14-41-49-394.png
> ReplicatorMonitor performance degrades, when the storagePolicy of many file
> are not match with their real datanodestorage
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch,
> HDFS-15715.002.patch.addendum, image-2021-02-25-14-41-49-394.png
>
>
> One of our Namenodes has 300M files and blocks. Normally this namenode
> should not be under heavy load, but we found that RPC processing time
> stayed high and decommissioning was very slow.
> Searching the metrics, I found that under-replicated blocks stayed high.
> Then I jstacked the namenode and found 'InnerNode.getLoc' was a hot spot.
> I suspected chooseTarget could not find a suitable target, resulting in
> the performance degradation. Considering HDFS-10453, I guessed some logic
> triggers a scenario where chooseTarget cannot find a proper target.
> Then I enabled some debug logging. (Of course I revised the code so that
> only isGoodTarget is debugged, because enabling BlockPlacementPolicy's
> debug log is dangerous.) I found "the rack has too many chosen nodes" was
> hit. Then I found logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code}
> Through debugging and simulation, I found the reason and reproduced this
> exception.
> The reason is that some users set the COLD storage policy and run the
> mover, but setting the storage policy and running the mover are
> asynchronous operations. So some files' real datanode storages do not
> match their storage policy.
> Let me simulate this process. Suppose /tmp/a is created with 2 replicas on
> DISK, and then the storage policy is set to COLD. When some logic (for
> example, decommission) triggers a copy of this block, chooseTarget uses
> chooseStorageTypes to filter the storages actually needed. Here the size
> of requiredStorageTypes returned by chooseStorageTypes is 3, but the size
> of result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the
> block has 2 DISK storages. So 3 targets are requested. Choosing the first
> target works, but when choosing the second target, the variable 'counter'
> is 4, which is larger than maxTargetPerRack (3) in isGoodTarget, so every
> datanode storage is skipped, resulting in bad performance.
> I think chooseStorageTypes needs to consider the result: when an existing
> replica does not meet the storage policy's demand, we need to remove it
> from result.
> I changed the code this way and verified it with a unit test.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
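The counting mismatch described in the report can be sketched as a toy simulation. This is NOT Hadoop's actual BlockPlacementPolicy code; the function and variable names below are illustrative, and the model only captures the accounting that makes 'counter' exceed maxTargetPerRack:

```python
# Toy model of the mismatch: under a COLD policy the block needs 3 ARCHIVE
# replicas, but its 2 existing DISK replicas don't satisfy the policy, so 3
# new targets are requested while the old replicas still count as "chosen".

def choose_storage_types(existing, policy_types):
    """Return (still-required types, mismatched existing replicas).
    Existing replicas whose type matches the policy reduce the demand;
    mismatched replicas (DISK under a COLD policy) do not."""
    required = list(policy_types)
    unsatisfied = []
    for t in existing:
        if t in required:
            required.remove(t)      # matching replica reduces demand
        else:
            unsatisfied.append(t)   # mismatched replica, demand not reduced
    return required, unsatisfied

# COLD policy: replication 3, all replicas must be ARCHIVE.
policy = ["ARCHIVE", "ARCHIVE", "ARCHIVE"]
existing = ["DISK", "DISK"]   # replicas written before the policy change
required, unsatisfied = choose_storage_types(existing, policy)

# 3 new targets are requested even though 2 replicas already exist...
assert len(required) == 3
# ...so while choosing the second new target, the per-rack counter seen by
# a check like isGoodTarget is: 2 old replicas + 1st pick + the candidate.
max_target_per_rack = 3
counter_when_choosing_second = len(existing) + 2
assert counter_when_choosing_second > max_target_per_rack  # candidates rejected
```

The proposed fix corresponds to dropping the mismatched entries (the `unsatisfied` list above) from the chosen set, so the counter no longer double-counts replicas that must be replaced anyway.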
[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289885#comment-17289885 ] zhengchenyu commented on HDFS-15715: I think it's a critical bug when triggered; we have encountered it several times. [~hexiaoqiao] I think this issue is similar to HDFS-10453 which you submitted before. [~ayushsaxena] [~goirix] [~hexiaoqiao] Can you help me review this patch, or give me some suggestions?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289881#comment-17289881 ] zhengchenyu commented on HDFS-15715: I restructured the code; the patch has two parts: (1) Refactor BlockStoragePolicy: some methods of BlockStoragePolicy are only used in the hadoop-hdfs module, but BlockStoragePolicy is in the hadoop-hdfs-client module, so I moved those methods to BlockStoragePolicyUtils, which is created in hadoop-hdfs. This code is in HDFS-15715.002.patch. (2) Fix the code so that the variable 'chosen' in chooseTarget is removed when the block's real storage does not match the expected StorageTypes. This code is in HDFS-15715.002.patch.addendum. I also fixed the unit test: without HDFS-15715.002.patch.addendum, the expected log "is not chosen since the rack has too many chosen nodes." is printed and all datanodes are traversed; after applying HDFS-15715.002.patch.addendum, the unit test runs chooseTarget normally, with no excess traversal.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Attachment: HDFS-15715.002.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Attachment: HDFS-15715.002.patch.addendum
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15750: --- Attachment: HDFS-15750.001.patch
> RBF: Make sure the multi destination are consistent after write operation
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15750.001.patch
>
>
> Nowadays, RBF can't make sure multiple destinations are consistent.
> Case 1: RBF can't remove a file from all destinations.
> Suppose /user/userA is a mount point that mounts two nameservices, ns1 and
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log
> exist. If I want to remove hdfs://ns-fed/user/userA/a.log (note: a.log is
> a file) to trash, only one nameservice takes effect. I think this is an
> inconsistency. Though HDFS-14343 already solves the problem to some level,
> it is not complete.
> Case 2: RBF regards the operation as successful when only one of the
> multi-destination operations succeeds.
> In another case, suppose we want to delete hdfs://ns-fed/user/userA/dirA
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the
> namenodes is down, and one destination's result is success, RBF regards
> the operation as successful (invokeConcurrent and invokeAll's logic). So
> we may not rename all locations. I think this is also an inconsistency.
> I think we need a stricter check: if one operation (which should succeed)
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, there is no problem. But when
> migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and
> rewriting some Hive table's old partitions (which mount multiple
> destinations), this problem occurs!
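The stricter all-or-nothing check proposed in the description can be sketched as a toy model. This is NOT RBF's actual invokeConcurrent/invokeAll implementation; the names below are illustrative:

```python
# Toy model of the proposed strict check: a write fanned out to several
# destinations only counts as successful if every destination succeeds.

def invoke_all_strict(destinations, op):
    """Apply op to every destination; raise if any destination fails."""
    results = {}
    errors = {}
    for ns in destinations:
        try:
            results[ns] = op(ns)
        except Exception as e:
            errors[ns] = e
    if errors:
        # A lenient router would report success if any destination
        # succeeded; the strict check surfaces partial failures instead.
        raise RuntimeError(f"operation failed on {sorted(errors)}")
    return results

def delete(ns):
    """Stand-in for a delete RPC; ns2 fails due to a permission mismatch."""
    if ns == "ns2":
        raise PermissionError("permission mismatch on ns2")
    return True

try:
    invoke_all_strict(["ns1", "ns2"], delete)
except RuntimeError as e:
    print("partial failure detected:", e)
```

With the lenient behavior, the ns1 delete would succeed, the router would report success, and ns2 would silently keep the path, which is exactly the inconsistency described above.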
[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15750: --- Attachment: (was: HDFS-15750.001.patch)
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15750: --- Attachment: HDFS-15750.001.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288350#comment-17288350 ] zhengchenyu commented on HDFS-15750: In our test cluster I modified the code to make RequireResponse work at the location level; I set the location-level RequireResponse by DestinationOrder. I think for some strict DestinationOrders, requireResponse is set to false. If only one of the nameservice operations fails, the operation will throw an exception; then we can make sure the multiple clusters stay consistent. I submitted the first version of the patch; please give me some suggestions.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288313#comment-17288313 ] zhengchenyu commented on HDFS-15750: [~ayushsaxena] I think we need a stricter check! Consider this situation: we have an HDFS cluster where the router proxies ns1 and ns2. If ns1 is heavily loaded, we want to copy the data "hdfs://ns1/user/userA" from ns1 to "hdfs://ns2/user/userA" on ns2 to lower the pressure, and we want users to be unaware of the data migration. Here is my solution: a mount point hdfs://ns-fed/user/userA mounts hdfs://ns1/user/userA and hdfs://ns2/user/userA, and I set a prior nameservice, which is used first. At first the prior ns is ns1, so through the router the hdfs client only uses hdfs://ns1/user/userA. Then we copy all data to hdfs://ns2/user/userA. When all data is copied, we switch the proxy so the prior ns is ns2, and the hdfs client uses hdfs://ns2/user/userA through the router. But in our cluster, many Hive table partitions are rerun, which requires deleting hdfs://ns-fed/user/userA/tableA/pt=XXX/XXX. Because the copy operation may happen before the Hive table is rewritten, some hdfs files are not deleted in the router view. The real reason is that we regard the operation as successful if only one rename operation succeeds.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu reassigned HDFS-15750: -- Assignee: zhengchenyu
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters
[ https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288235#comment-17288235 ] zhengchenyu commented on HDFS-14343:

[~ayushtkn] I think we should discuss this in another issue, HDFS-15750. I know there is no problem if all HDFS clients visit the namenode through the router. But there are situations where clients visit the namenode without the router. For example, we migrate data from one nameservice to another in the background, with both nameservices managed by the same router; we migrate the data to lower the pressure on the source namenode.

> RBF: Fix renaming folders spread across multiple subclusters
> --
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming files and only renames one of them (i.e., {{invokeSequential()}}). In the case of folders which are in all subclusters (e.g., HASH_ALL) we should rename all locations (i.e., {{invokeAll()}}).
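The fix in HDFS-14343 hinges on one decision: a file resolves to a single subcluster, so only the first matching location is renamed (invokeSequential), while a directory present in all subclusters (e.g. HASH_ALL) must be renamed in every location (invokeAll). A minimal sketch of that dispatch, with hypothetical names rather than the real RouterClientProtocol code:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the rename dispatch described above (hypothetical names). */
public class RenameDispatch {

  /**
   * Returns the locations a rename should act on: all of them for a
   * multi-destination directory (invokeAll), only the first matching
   * location for a file (invokeSequential).
   */
  public static List<String> locationsToRename(List<String> destinations,
                                               boolean isMultiDestDirectory) {
    if (isMultiDestDirectory) {
      return new ArrayList<>(destinations); // invokeAll: rename everywhere
    }
    return destinations.subList(0, 1);      // invokeSequential: first match only
  }

  public static void main(String[] args) {
    List<String> dests =
        List.of("hdfs://ns1/user/userA/dirA", "hdfs://ns2/user/userA/dirA");
    System.out.println(locationsToRename(dests, true));  // both locations
    System.out.println(locationsToRename(dests, false)); // only ns1's location
  }
}
```

The comments in this thread argue that even with this dispatch, a partial failure on one location (permission mismatch, namenode down) still leaves the destinations inconsistent, which is what HDFS-15750 tracks.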
[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15750:
---
Description:
Nowadays, RBF can't guarantee that the multiple destinations are consistent.
Case 1: RBF can't remove a file from all of its destinations.
If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
Case 2: RBF regards an operation as successful even though only one of the multi-destination operations succeeded.
In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the namenodes is down, then as long as one destination's result is success, RBF regards the whole operation as successful (the logic of invokeConcurrent and invokeAll). We may therefore fail to rename all locations. I think that is also an inconsistency.
I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.
Note: In fact, if we only ever use hdfs://ns-fed, I think there is no problem. But when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions (which are mounted to multiple destinations), this problem occurs!

was:
Nowadays, RBF can't guarantee that the multiple destinations are consistent.
Case 1: RBF can't remove a file from all of its destinations.
If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
(Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
Case 2: RBF regards an operation as successful even though only one of the multi-destination operations succeeded.
In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the namenodes is down, then as long as one destination's result is success, RBF regards the whole operation as successful (the logic of invokeConcurrent and invokeAll). We may therefore fail to rename all locations. I think that is also an inconsistency.
I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.

> RBF: Make sure the multi destination are consistent after write operation
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: zhengchenyu
> Priority: Major
>
> Nowadays, RBF can't guarantee that the multiple destinations are consistent.
> Case 1: RBF can't remove a file from all of its destinations.
> If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
> Case 2: RBF regards an operation as successful even though only one of the multi-destination operations succeeded.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the namenodes is down, then as long as one destination's result is success, RBF regards the whole operation as successful (the logic of invokeConcurrent and invokeAll). We may therefore fail to rename all locations. I think that is also an inconsistency.
> I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.
> Note: In fact, if we only ever use hdfs://ns-fed, I think there is no problem. But when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions (which are mounted to multiple destinations), this problem occurs!
[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15750:
---
Description:
Nowadays, RBF can't guarantee that the multiple destinations are consistent.
Case 1: RBF can't remove a file from all of its destinations.
If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
(Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
Case 2: RBF regards an operation as successful even though only one of the multi-destination operations succeeded.
In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the namenodes is down, then as long as one destination's result is success, RBF regards the whole operation as successful (the logic of invokeConcurrent and invokeAll). We may therefore fail to rename all locations. I think that is also an inconsistency.
I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.

was:
Nowadays, RBF can't guarantee that the multiple destinations are consistent.
Case 1: RBF can't remove from all destinations.
If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
(Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
Case 2: In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, and only one result is success, we may fail to rename all locations. I think that is also an inconsistency.
I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.

> RBF: Make sure the multi destination are consistent after write operation
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: zhengchenyu
> Priority: Major
>
> Nowadays, RBF can't guarantee that the multiple destinations are consistent.
> Case 1: RBF can't remove a file from all of its destinations.
> If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
> (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
> Case 2: RBF regards an operation as successful even though only one of the multi-destination operations succeeded.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the namenodes is down, then as long as one destination's result is success, RBF regards the whole operation as successful (the logic of invokeConcurrent and invokeAll). We may therefore fail to rename all locations. I think that is also an inconsistency.
> I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.
[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15750:
---
Description:
Nowadays, RBF can't guarantee that the multiple destinations are consistent.
Case 1: RBF can't remove from all destinations.
If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
(Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
Case 2: In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, and only one result is success, we may fail to rename all locations. I think that is also an inconsistency.
I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.

was:
Nowadays, RBF can't guarantee consistency.
If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency.
(Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency.
I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.

> RBF: Make sure the multi destination are consistent after write operation
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: zhengchenyu
> Priority: Major
>
> Nowadays, RBF can't guarantee that the multiple destinations are consistent.
> Case 1: RBF can't remove from all destinations.
> If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. Although HDFS-14343 already solves the problem to some extent, the fix is not complete.
> (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
> Case 2: In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, and only one result is success, we may fail to rename all locations. I think that is also an inconsistency.
> I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.
[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
[ https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15750:
---
Description:
Nowadays, RBF can't guarantee consistency.
If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency.
(Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency.
I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.

> RBF: Make sure the multi destination are consistent after write operation
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: zhengchenyu
> Priority: Major
>
> Nowadays, RBF can't guarantee consistency.
> If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency.
> (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!)
> In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency.
> I think we need a stricter check: if an operation which should succeed fails, we should throw an exception.
[jira] [Commented] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters
[ https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254391#comment-17254391 ] zhengchenyu commented on HDFS-14343:

[~elgoiri] OK, let us discuss this issue in HDFS-15750. I will describe the details later.

> RBF: Fix renaming folders spread across multiple subclusters
> --
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming files and only renames one of them (i.e., {{invokeSequential()}}). In the case of folders which are in all subclusters (e.g., HASH_ALL) we should rename all locations (i.e., {{invokeAll()}}).
[jira] [Created] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation
zhengchenyu created HDFS-15750:
--
Summary: RBF: Make sure the multi destination are consistent after write operation
Key: HDFS-15750
URL: https://issues.apache.org/jira/browse/HDFS-15750
Project: Hadoop HDFS
Issue Type: Bug
Reporter: zhengchenyu
[jira] [Comment Edited] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters
[ https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343 ] zhengchenyu edited comment on HDFS-14343 at 12/22/20, 8:56 AM:
---
[~inigoiri] [~ayushtkn] Hi, I have some questions; can you give me some suggestions? This patch uses isMultiDestDirectory to check whether to rename all locations or not. I think the design of MultipleDestinationMountTableResolver may assume there are no duplicated files among the nameservices. If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!) In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency. I think we need a stricter check: if one operation (which should succeed) fails, we should throw an exception.

was (Author: zhengchenyu):
[~inigoiri] [~ayushtkn] Hi, I have some questions; can you give me some suggestions? This patch uses isMultiDestDirectory to check whether to rename all locations or not. I think the design of MultipleDestinationMountTableResolver may assume there are no duplicated files among the nameservices. If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!) In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency. I think we need a stricter check: if one operation errors, we should throw an exception.

> RBF: Fix renaming folders spread across multiple subclusters
> --
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming files and only renames one of them (i.e., {{invokeSequential()}}). In the case of folders which are in all subclusters (e.g., HASH_ALL) we should rename all locations (i.e., {{invokeAll()}}).
[jira] [Comment Edited] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters
[ https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343 ] zhengchenyu edited comment on HDFS-14343 at 12/22/20, 8:53 AM:
---
[~inigoiri] [~ayushtkn] Hi, I have some questions; can you give me some suggestions? This patch uses isMultiDestDirectory to check whether to rename all locations or not. I think the design of MultipleDestinationMountTableResolver may assume there are no duplicated files among the nameservices. If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!) In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency. I think we need a stricter check: if one operation errors, we should throw an exception.

was (Author: zhengchenyu):
[~inigoiri][~ayushtkn] Hi, I have some questions; can you give me some suggestions? This patch uses isMultiDestDirectory to check whether to rename all locations or not. I think the design of MultipleDestinationMountTableResolver may assume there are no duplicated files among the nameservices. If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!) In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency. I think we need a stricter check: if one operation errors, we should throw an exception.

> RBF: Fix renaming folders spread across multiple subclusters
> --
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming files and only renames one of them (i.e., {{invokeSequential()}}). In the case of folders which are in all subclusters (e.g., HASH_ALL) we should rename all locations (i.e., {{invokeAll()}}).
[jira] [Comment Edited] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters
[ https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343 ] zhengchenyu edited comment on HDFS-14343 at 12/22/20, 8:52 AM:
---
[~inigoiri][~ayushtkn] Hi, I have some questions; can you give me some suggestions? This patch uses isMultiDestDirectory to check whether to rename all locations or not. I think the design of MultipleDestinationMountTableResolver may assume there are no duplicated files among the nameservices. If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. (Note: In fact, when migrating data from ns1 to ns2 (under a mount point that mounts both ns1 and ns2) and rewriting some Hive table's old partitions, this problem occurs!) In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency. I think we need a stricter check: if one operation errors, we should throw an exception.

was (Author: zhengchenyu):
[~inigoiri][~ayushtkn] Hi, I have some questions; can you give me some suggestions? This patch uses isMultiDestDirectory to check whether to rename all locations or not. I think the design of MultipleDestinationMountTableResolver may assume there are no duplicated files among the nameservices. If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. (Note: In fact, when migrating data from one nameservice to another and rewriting some Hive table's old partitions, this problem occurs!) In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency. I think we need a stricter check: if one operation errors, we should throw an exception.

> RBF: Fix renaming folders spread across multiple subclusters
> --
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming files and only renames one of them (i.e., {{invokeSequential()}}). In the case of folders which are in all subclusters (e.g., HASH_ALL) we should rename all locations (i.e., {{invokeAll()}}).
[jira] [Commented] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters
[ https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343 ] zhengchenyu commented on HDFS-14343:

[~inigoiri][~ayushtkn] Hi, I have some questions; can you give me some suggestions? This patch uses isMultiDestDirectory to check whether to rename all locations or not. I think the design of MultipleDestinationMountTableResolver may assume there are no duplicated files among the nameservices. If /user/userA is a mount point that mounts two nameservices, ns1 and ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist, then moving hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash only takes effect on one nameservice. I think that is an inconsistency. (Note: In fact, when migrating data from one nameservice to another and rewriting some Hive table's old partitions, this problem occurs!) In other words, if we delete hdfs://ns-fed/user/userA/dirA (note: dirA is a directory) and hdfs://ns1/user/userA/dirA's permission differs from hdfs://ns2/user/userA/dirA's, or one of the nameservices is down, we may fail to rename all locations. I think that is also an inconsistency. I think we need a stricter check: if one operation errors, we should throw an exception.

> RBF: Fix renaming folders spread across multiple subclusters
> --
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming files and only renames one of them (i.e., {{invokeSequential()}}). In the case of folders which are in all subclusters (e.g., HASH_ALL) we should rename all locations (i.e., {{invokeAll()}}).
[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248729#comment-17248729 ] zhengchenyu edited comment on HDFS-15715 at 12/14/20, 3:52 AM: --- I solve this problem, and run on one cluster in near one week. But our version is hadoop-2.7.3. Because after hadoop-2.8, moudle hadoop-hdfs was split with multi maven moudle. So I submited a ugly patch, this is not final version. I only wanna show how to slove this problem. I submited HDFS-15715.001.patch. I think there will be two way to solve this problem: (1) recode chooseStorageTypes, and remove the result which is not meet storage policy demand from results. (2) remove the result which is not meet storage policy demand from results, after chooseStorageTypes. I choose first way, becuase I thinks it save calculation. To label it, i use a new method 'chooseStorageTypesWIthNode'. But a little ugly, maybe we need to reorganize the code. was (Author: zhengchenyu): I solve this problem, and run on one cluster in near one week. But our version is hadoop-2.7.3. Because after hadoop-2.8, moudle hadoop-hdfs was split with multi maven moudle. So I submited a ugly patch, this is not final version. I only wanna show how to slove this problem. I submited HDFS-15715.001.patch. I think there will be two way to solve this problem: (1) recode chooseStorageTypes, and remove the result which is not meet storage policy demand from results. (2) remove the result which is not meet storage policy demand from results, after chooseStorageTypes. I choose first way, becuase I thinks it save calculation. To label it, i use a new method 'chooseStorageTypesWIth Node'. But a little ugly, maybe we need to reorganize the code. 
> ReplicatorMonitor performance degrades, when the storagePolicy of many file
> are not match with their real datanodestorage
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.7.3, 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch
>
> One of our Namenodes has 300M files and blocks. Normally this namenode should not be under heavy load, but we found that RPC processing time stayed high and decommissioning was very slow.
> Searching the metrics, I found that under-replicated blocks stayed high. I then took a jstack of the namenode and found that 'InnerNode.getLoc' was a hot spot. I suspected that chooseTarget could not find a target, leading to the performance degradation. Considering HDFS-10453, I guessed that some logic was triggering a scenario where chooseTarget cannot find a proper target.
> I then enabled some debug logging. (Of course, I modified the code to debug only isGoodTarget, because enabling BlockPlacementPolicy's debug log wholesale is dangerous.) I found that "the rack has too many chosen nodes" was being hit, and then found logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code}
> Through debugging and simulation, I found the cause and reproduced the exception.
> The cause is that some users set the COLD storage policy and run the mover, but setting the storage policy and running the mover are asynchronous, so some files' real datanode storages do not match the storage policy.
> Let me simulate the process. Suppose /tmp/a is created with 2 replicas on DISK, and the storage policy is then set to COLD. When something (for example, decommissioning) triggers a copy of this block, chooseTarget calls chooseStorageTypes to compute the storages actually needed. Here chooseStorageTypes returns a requiredStorageTypes of size 3, while the size of result is 2: 3 means 3 ARCHIVE storages are needed, and 2 means the block already has 2 DISK storages. The namenode then tries to choose 3 targets. The first target is chosen correctly, but when choosing the second target the variable 'counter' is 4, which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode storage is skipped. This results in bad performance.
> I think chooseStorageTypes needs to take the existing result into account: when an existing replica does not meet the storage policy's demand, it should be removed from the result.
> I changed the code this way and verified it with a unit test.
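The fix described in the comment above, in its post-filtering form (option 2), can be sketched as follows. This is a hypothetical, simplified model with invented names (StoragePolicyFilter, filterChosen, additionalNeeded); it is not the real BlockPlacementPolicy or chooseStorageTypes code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Simplified model of the mismatch described above: existing replicas whose
// storage type does not satisfy the policy must be dropped before computing
// how many new targets are needed, otherwise they are counted as if they
// already contributed to the policy's demand.
public class StoragePolicyFilter {
    public enum StorageType { DISK, ARCHIVE }

    // Keep only the chosen replicas whose storage type matches the policy.
    public static List<StorageType> filterChosen(List<StorageType> chosen,
                                                 Set<StorageType> policyTypes) {
        List<StorageType> kept = new ArrayList<>();
        for (StorageType t : chosen) {
            if (policyTypes.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Additional targets needed = replication factor minus the replicas that
    // already satisfy the policy (not minus all existing replicas).
    public static int additionalNeeded(int replication, List<StorageType> chosen,
                                       Set<StorageType> policyTypes) {
        return replication - filterChosen(chosen, policyTypes).size();
    }

    public static void main(String[] args) {
        // /tmp/a has 2 DISK replicas, but the COLD policy wants only ARCHIVE.
        List<StorageType> chosen = Arrays.asList(StorageType.DISK, StorageType.DISK);
        Set<StorageType> cold = EnumSet.of(StorageType.ARCHIVE);
        // All 3 ARCHIVE targets are still needed; the DISK replicas don't count.
        System.out.println(additionalNeeded(3, chosen, cold)); // prints 3
    }
}
```

The point of the sketch is the scenario from the report: the 2 DISK replicas of /tmp/a must not be counted against the 3 required ARCHIVE targets once the policy is COLD.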
[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248729#comment-17248729 ] zhengchenyu commented on HDFS-15715: I have solved this problem and have been running the fix on one cluster for nearly a week. Our version is hadoop-2.7.3, and since hadoop-2.8 the hadoop-hdfs module has been split into multiple Maven modules, so I submitted a rough patch; it is not the final version and is only meant to show how to solve the problem. I submitted HDFS-15715.001.patch. I see two ways to solve this problem: (1) rework chooseStorageTypes so that entries which do not meet the storage policy's demand are removed from the results, or (2) remove those entries from the results after chooseStorageTypes returns. I chose the first way because I think it saves computation. It is a little ugly; maybe we need to reorganize the code.
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Attachment: HDFS-15715.001.patch
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Summary: ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage (was: ReplicatorMonitor performance degradation, when the storagePolicy of many file are not match with their real datanodestorage)
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degradation, when the storagePolicy of many file are not match with their real datanodestorage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15715: --- Description: (revised; the full text is the issue description quoted earlier in this thread)
[jira] [Created] (HDFS-15715) ReplicatorMonitor performance degradation, when the storagePolicy of many file are not match with their real datanodestorage
zhengchenyu created HDFS-15715: -- Summary: ReplicatorMonitor performance degradation, when the storagePolicy of many file are not match with their real datanodestorage Key: HDFS-15715 URL: https://issues.apache.org/jira/browse/HDFS-15715 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.2.1, 2.7.3 Reporter: zhengchenyu Assignee: zhengchenyu Fix For: 3.3.1
[jira] [Created] (HDFS-15649) the standby namenode's ReplQueues need to keep pace with active namenode.
zhengchenyu created HDFS-15649: -- Summary: the standby namenode's ReplQueues need to keep pace with active namenode. Key: HDFS-15649 URL: https://issues.apache.org/jira/browse/HDFS-15649 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.2.1, 2.7.3 Reporter: zhengchenyu Fix For: 3.3.1 I think the standby namenode's ReplQueues need to keep pace with the active namenode. You can see code like the following in the function addStoredBlock: {code} // do not try to handle extra/low redundancy blocks during first safe mode if (!isPopulatingReplQueues()) { return storedBlock; } {code} For the standby namenode, although there is no need to tell the standby to replicate blocks, we still need to update neededReconstruction, because some metrics depend on it, for example missing blocks. Why do I suggest this? In our internal version, a bug triggered a huge missing-block count. In fact these blocks were not missing, but addStoredBlock did not update them, so the huge missing-block number persisted.
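The proposal above can be sketched as follows: update the needed-reconstruction bookkeeping on every addStoredBlock call, and gate only the actual replication scheduling on whether the node is populating replication queues. This is a hedged simplification with invented names (StandbyReplQueues and its parameters); it is not the real BlockManager API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the HDFS-15649 idea (not the actual BlockManager):
// metric state is updated unconditionally, while replication work is still
// skipped when repl queues are not being populated (e.g. on a standby).
public class StandbyReplQueues {
    private final Set<Long> neededReconstruction = new HashSet<>(); // drives missing/under-replicated metrics
    private final boolean populatingReplQueues; // false on a standby namenode
    private int scheduled = 0;                  // replication work actually scheduled

    public StandbyReplQueues(boolean populatingReplQueues) {
        this.populatingReplQueues = populatingReplQueues;
    }

    public void addStoredBlock(long blockId, int liveReplicas, int expectedReplicas) {
        // Update bookkeeping first: a block that regained enough replicas must
        // leave neededReconstruction even on the standby, otherwise the
        // missing-block count stays stale (the bug described above).
        if (liveReplicas >= expectedReplicas) {
            neededReconstruction.remove(blockId);
        } else {
            neededReconstruction.add(blockId);
        }
        if (!populatingReplQueues) {
            return; // standby: do not schedule replication work
        }
        if (liveReplicas < expectedReplicas) {
            scheduled++; // active: schedule reconstruction for this block
        }
    }

    public int underReplicatedCount() { return neededReconstruction.size(); }
    public int scheduledCount() { return scheduled; }
}
```

With this ordering, a standby that later sees a full block report clears the stale entries even though it never schedules any work.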
[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode
[ https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199910#comment-17199910 ] zhengchenyu commented on HDFS-15589: [~hexiaoqiao] Yes, in theroy, postponedMisreplicatedBlocks only compat fuction 'rescanPostponedMisreplicatedBlocks', and it use namesystem's writeLock, then may decrease namnode rpc performance. But dfs.namenode.blocks.per.postponedblocks.rescan’s default value is 1, so I think it may result to little performance. But let us see some log, some called wast long time. {code} hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:20:15,429 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 65 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:20:18,496 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 64 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:20:23,958 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 2459 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:20:27,023 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 60 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:20:30,088 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 61 msecs. 19916 blocks are left. 0 blocks were removed. 
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:20:33,149 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 58 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:20:47,890 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 5140 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:32:36,458 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 110 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:32:39,529 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 70 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:32:42,596 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 66 msecs. 19916 blocks are left. 0 blocks were removed. hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 15:32:45,665 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan of postponedMisreplicatedBlocks completed in 65 msecs. 19916 blocks are left. 0 blocks were removed. {code} In fact, this was found in our test cluster, which is very small, so we couldn't measure the performance impact there. But why do I pay attention to this problem? At my last company, one day postponedMisreplicatedBlocks grew huge and namenode RPC performance decreased. Some hours later, postponedMisreplicatedBlocks decreased and the namenode was fine again. At that moment I was focused on YARN, so I didn't dig into the namenode log, and we never found the real cause. 
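The bounded-rescan shape that dfs.namenode.blocks.per.postponedblocks.rescan controls can be sketched as follows. This is a hypothetical standalone model (`BoundedRescan`, `rescanOnePass` are invented names, and the real rescanPostponedMisreplicatedBlocks does far more than drop entries); it only shows why each pass holds the write lock for a capped number of blocks while leftovers carry over to later passes, consistent with the repeated "19916 blocks are left" lines above:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a bounded rescan pass: hold the (namesystem-style)
// write lock only while examining a capped number of postponed blocks.
public class BoundedRescan {
    private final Deque<Long> postponed = new ArrayDeque<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final int blocksPerRescan;

    public BoundedRescan(int blocksPerRescan) {
        this.blocksPerRescan = blocksPerRescan;
    }

    public void postpone(long blockId) {
        postponed.add(blockId);
    }

    /** Returns how many blocks were examined in this pass. */
    public int rescanOnePass() {
        lock.writeLock().lock();
        try {
            int scanned = 0;
            while (!postponed.isEmpty() && scanned < blocksPerRescan) {
                postponed.poll(); // examine (and here, simply drop) one block
                scanned++;
            }
            return scanned; // leftovers wait for the next pass
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int remaining() {
        return postponed.size();
    }
}
```

With a small per-pass cap, a huge postponed queue drains slowly across many passes, but no single pass monopolizes the write lock; that is the trade-off the comment is weighing.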
> Huge PostponedMisreplicatedBlocks can't decrease immediately when start > namenode after datanode > --- > > Key: HDFS-15589 > URL: https://issues.apache.org/jira/browse/HDFS-15589 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Environment: CentOS 7 >Reporter: zhengchenyu >Priority: Major > > In our test cluster, I restart my namenode. Then I found many > PostponedMisreplicatedBlocks which doesn't decrease immediately. > I search the log below like this. > {code:java} > 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: > from DatanodeRegistration(xx.xx.xx.xx:9866, > datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, > infoSecurePort=0, ipcPort=9867, > storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), > reports.length=12 > 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK*
[jira] [Comment Edited] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode
[ https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199847#comment-17199847 ] zhengchenyu edited comment on HDFS-15589 at 9/22/20, 6:24 AM: -- Yes, I can work around this problem by triggering a block report manually. What I mean is: is there any need to solve this by optimizing the logic? For example, make sure the block report triggered by the namenode's heartbeat happens after the namenode enters the active state. When I trigger a datanode's block report manually, the block report happens twice, and I think there is no need to increase the load on the namenode. In addition, as far as I know, triggering a block report manually sends the report to all namenodes, which increases the load on all of them. was (Author: zhengchenyu): Yes, I can work around this problem by triggering a block report manually. What I mean is: is there any need to solve this by optimizing the logic? For example, make sure the block report triggered by the namenode's heartbeat happens after the namenode enters the active state. When I trigger a datanode's block report manually, the block report happens twice, and I think there is no need to increase the load on the namenode. > Huge PostponedMisreplicatedBlocks can't decrease immediately when start > namenode after datanode > --- > > Key: HDFS-15589 > URL: https://issues.apache.org/jira/browse/HDFS-15589 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Environment: CentOS 7 >Reporter: zhengchenyu >Priority: Major > > In our test cluster, I restarted my namenode. Then I found many > PostponedMisreplicatedBlocks which didn't decrease immediately. > I searched the log and found entries like this. 
> {code:java} > 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: > from DatanodeRegistration(xx.xx.xx.xx:9866, > datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, > infoSecurePort=0, ipcPort=9867, > storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), > reports.length=12 > 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: > from DatanodeRegistration(xx.xx.xx.xx:9866, > datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, > infoSecurePort=0, ipcPort=9867, > storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), > reports.length=12 > 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: > from DatanodeRegistration(xx.xx.xx.xx:9866, > datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, > infoSecurePort=0, ipcPort=9867, > storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), > reports.length=12 > 2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: > from DatanodeRegistration(xx.xx.xx.xx:9866, > datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, > infoSecurePort=0, ipcPort=9867, > storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), > reports.length=12 > 2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: > from DatanodeRegistration(xx.xx.xx.xx:9866, > datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, > infoSecurePort=0, ipcPort=9867, > storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), > reports.length=12 > 2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: > from DatanodeRegistration(xx.xx.xx.xx:9866, > datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, > infoSecurePort=0, ipcPort=9867, > 
storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834), > reports.length=12 > {code} > Note: the test cluster only has 6 datanodes. > You will see the block report called before "Marking all datanodes as stale", > which is logged by startActiveServices. But > DatanodeStorageInfo.blockContentsStale is only set to false in a block report, and > startActiveServices then sets all datanodes to stale. So the datanodes will > stay stale until the next block report, and PostponedMisreplicatedBlocks keeps a > huge number.
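The ordering problem in the quoted description reduces to a toy model. The classes below (`StaleOrderModel`, `Storage`) are hypothetical, not Hadoop's DatanodeStorageInfo or startActiveServices; they only capture that the stale flag is cleared solely by a block report, so a report that lands before the failover is effectively lost:

```java
// Hypothetical model of the ordering described above: blockContentsStale
// is cleared only by a block report, while the standby-to-active transition
// marks every storage stale. If the report lands before the transition, the
// storage stays stale until the next (infrequent) report, and postponed
// misreplicated blocks cannot be resolved in the meantime.
public class StaleOrderModel {
    static class Storage {
        boolean blockContentsStale = true;
        void blockReport()         { blockContentsStale = false; }
        void markStaleOnFailover() { blockContentsStale = true; }
    }

    /** Returns whether the storage is still stale after both events run. */
    public static boolean staleAfter(boolean reportBeforeFailover) {
        Storage s = new Storage();
        if (reportBeforeFailover) {
            s.blockReport();
            s.markStaleOnFailover(); // the report is effectively lost
        } else {
            s.markStaleOnFailover();
            s.blockReport();         // report after failover clears stale
        }
        return s.blockContentsStale;
    }

    public static void main(String[] args) {
        System.out.println("report before failover -> stale=" + staleAfter(true));
        System.out.println("report after failover  -> stale=" + staleAfter(false));
    }
}
```

Only the report-after-failover ordering ends in a non-stale storage, which is why the comments above ask whether the heartbeat-triggered report can be guaranteed to happen after the namenode enters the active state.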
[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode
[ https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199750#comment-17199750 ] zhengchenyu commented on HDFS-15589: [~ayushtkn] I understand the postponed-block logic. I encountered a case, maybe a low-probability one. Let me describe the logic simply: (1) When the namenode transitions from standby to active, it marks all DatanodeDescriptors stale to avoid deleting blocks that may already have been deleted. (2) Then a datanode block-reports to the namenode, which sets its DatanodeDescriptor to not stale; after that, some over-replicated blocks can be deleted. But if (2) happens before (1), the DatanodeDescriptor stays stale until the next block report, and block reports are a low-frequency RPC operation. So PostponedMisreplicatedBlocks will stay huge for a long time. > Huge PostponedMisreplicatedBlocks can't decrease immediately when start > namenode after datanode > --- > > Key: HDFS-15589 > URL: https://issues.apache.org/jira/browse/HDFS-15589 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Environment: CentOS 7 >Reporter: zhengchenyu >Priority: Major