[jira] [Updated] (HDFS-16830) HDFS-16830. [SBN READ] dfsrouter transmit state id according to client's demand

2022-11-18 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16830:
---
Summary: HDFS-16830. [SBN READ] dfsrouter transmit state id according to 
client's demand  (was: Improve router msync operation)

> HDFS-16830. [SBN READ] dfsrouter transmit state id according to client's 
> demand
> ---
>
> Key: HDFS-16830
> URL: https://issues.apache.org/jira/browse/HDFS-16830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory

2022-11-02 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HDFS-16832:
--

Assignee: zhengchenyu

> [SBN READ] Fix NPE when check the block location of empty directory
> ---
>
> Key: HDFS-16832
> URL: https://issues.apache.org/jira/browse/HDFS-16832
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>
> HDFS-16732 introduced a block-location check during getListing and 
> getFileInfo. But checking the block locations of an empty directory throws 
> an NPE.
> The exception stack on the Tez client is below:
> {code:java}
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1492)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1389)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy12.getListing(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:678)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy13.getListing(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1671)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1212)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1195)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1140)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1136)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1154)
>   at 
> org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:278)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   

[jira] [Updated] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory

2022-11-02 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16832:
---
Description: 
HDFS-16732 introduced a block-location check during getListing and getFileInfo. 
But checking the block locations of an empty directory throws an NPE (a sketch 
of the kind of guard that is missing follows the stack trace below).

The exception stack on the Tez client is below:
{code:java}
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1492)
at org.apache.hadoop.ipc.Client.call(Client.java:1389)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy12.getListing(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:678)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy13.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1671)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1212)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1195)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1140)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1136)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1154)
at 
org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:278)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}
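For illustration, the kind of guard that avoids this NPE could look like the sketch 
below. It is not the actual HDFS-16832 patch; it only assumes the standard 
LocatedBlocks/LocatedBlock client-side types, and it treats an empty directory 
(which has no located blocks) as having nothing to validate.
{code:java}
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

// Sketch of a defensive check before validating block locations.
// An empty directory (or a zero-block file) yields null or an empty block
// list, so there is nothing to validate and nothing to dereference.
public final class LocationCheck {
  private LocationCheck() {}

  public static boolean hasAllLocations(LocatedBlocks blocks) {
    if (blocks == null || blocks.getLocatedBlocks() == null
        || blocks.getLocatedBlocks().isEmpty()) {
      return true; // nothing to check: empty directory or empty file
    }
    for (LocatedBlock block : blocks.getLocatedBlocks()) {
      if (block.getLocations() == null || block.getLocations().length == 0) {
        return false; // block report not yet received for this block
      }
    }
    return true;
  }
}
{code}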

  was:
HDFS-16732 introduced a block-location check during getListing and getFileInfo. 
But checking the block locations of an empty directory throws an NPE.

The exception stack on the Tez client is below:

 


> [SBN READ] Fix NPE when check the block location of empty directory
> 

[jira] [Updated] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory

2022-11-02 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16832:
---
Description: 
HDFS-16732 introduced a block-location check during getListing and getFileInfo. 
But checking the block locations of an empty directory throws an NPE.

The exception stack on the Tez client is below:

 

> [SBN READ] Fix NPE when check the block location of empty directory
> ---
>
> Key: HDFS-16832
> URL: https://issues.apache.org/jira/browse/HDFS-16832
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Priority: Major
>
> HDFS-16732 introduced a block-location check during getListing and 
> getFileInfo. But checking the block locations of an empty directory throws 
> an NPE.
> The exception stack on the Tez client is below:
>  






[jira] [Updated] (HDFS-16832) [SBN READ] Fix NPE when check the block location of empty directory

2022-11-02 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16832:
---
Summary: [SBN READ] Fix NPE when check the block location of empty 
directory  (was: [SBN READ] Fix NPE when check block location)

> [SBN READ] Fix NPE when check the block location of empty directory
> ---
>
> Key: HDFS-16832
> URL: https://issues.apache.org/jira/browse/HDFS-16832
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Priority: Major
>







[jira] [Created] (HDFS-16832) [SBN READ] Fix NPE when check block location

2022-11-02 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-16832:
--

 Summary: [SBN READ] Fix NPE when check block location
 Key: HDFS-16832
 URL: https://issues.apache.org/jira/browse/HDFS-16832
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: zhengchenyu









[jira] [Commented] (HDFS-16830) Improve router msync operation

2022-11-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627073#comment-17627073
 ] 

zhengchenyu commented on HDFS-16830:


Hi, in our production cluster, a huge number of msync calls are sent to the 
active namenode. 

I think we should still continue with two pieces of work:

(1) propagate state ids according to the client's demand, so that we avoid 
msync calls to namenodes the client does not use (a rough sketch of this idea 
follows below).

(2) share msync results, to reduce the number of msync operations.

[~simbadzina] What do you think of my proposal? 
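As a rough illustration of item (1): the router could remember which nameservices a 
client has actually touched and attach only those state ids to the response header. 
The sketch below uses plain Java collections and hypothetical names; it is not the 
actual router code, just the shape of the idea.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "transmit state id according to client's demand":
// only the nameservices a client has actually used are echoed back to it,
// so the client never has to msync nameservices it does not touch.
public class DemandDrivenStateIds {
  // Latest known state id per nameservice, maintained by the router.
  private final Map<String, Long> latestStateIds = new HashMap<>();

  public void update(String nameservice, long stateId) {
    latestStateIds.merge(nameservice, stateId, Math::max);
  }

  // Return only the entries for the nameservices this client has used.
  public Map<String, Long> stateIdsFor(Set<String> nameservicesUsedByClient) {
    Map<String, Long> result = new HashMap<>();
    for (String ns : nameservicesUsedByClient) {
      Long stateId = latestStateIds.get(ns);
      if (stateId != null) {
        result.put(ns, stateId);
      }
    }
    return result;
  }
}
{code}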

 

> Improve router msync operation
> --
>
> Key: HDFS-16830
> URL: https://issues.apache.org/jira/browse/HDFS-16830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>







[jira] [Created] (HDFS-16830) Improve router msync operation

2022-11-01 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-16830:
--

 Summary: Improve router msync operation
 Key: HDFS-16830
 URL: https://issues.apache.org/jira/browse/HDFS-16830
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode, rbf
Reporter: zhengchenyu
Assignee: zhengchenyu









[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2022-11-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627065#comment-17627065
 ] 

zhengchenyu commented on HDFS-13522:


[~simbadzina] I agree with you! Indeed, in our production cluster I did not dare 
to disable msync, so many msync calls still go to the active namenode. Though 
msync is a low-cost operation, I think it is still necessary for us to reduce 
the number of msync calls.

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Resolved] (HDFS-16708) RBF: Support transmit state id from client in router.

2022-11-01 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu resolved HDFS-16708.

Resolution: Duplicate

> RBF: Support transmit state id from client in router.
> -
>
> Key: HDFS-16708
> URL: https://issues.apache.org/jira/browse/HDFS-16708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522_proposal_zhengchenyu.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Implement the Design A described in HDFS-13522.






[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2022-10-31 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626553#comment-17626553
 ] 

zhengchenyu commented on HDFS-13522:


[~simbadzina] Thanks for your great patch; this design is very clever! Can you 
share some experience from your production environment? In this design, many 
clients share the pooled state id, so can we set the auto-msync period to -1 or 
a very large value on the client side?
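For context, the client-side knob I mean is the auto-msync period of 
ObserverReadProxyProvider. A minimal sketch of turning it off for one nameservice is 
below; the key name ("dfs.client.failover.observer.auto-msync-period.<nameservice>") 
is my recollection, so please verify it against the Hadoop version you run.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: disable the client's automatic msync for one nameservice and
// rely on explicit msync / state-id propagation instead.
public class AutoMsyncSetting {
  public static Configuration withoutAutoMsync(String nameservice) {
    Configuration conf = new Configuration();
    // Key name assumed from ObserverReadProxyProvider; verify before use.
    conf.set("dfs.client.failover.observer.auto-msync-period." + nameservice,
        "-1");
    return conf;
  }
}
{code}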

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Commented] (HDFS-16682) [SBN Read] make estimated transactions configurable

2022-09-19 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606888#comment-17606888
 ] 

zhengchenyu commented on HDFS-16682:


[~xkrogen] Can you please review this? These parameters should depend on the 
cluster's load, so I think they should be configurable.
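To make the suggestion concrete, a sketch of "making them configurable" is below: 
read both values from the Configuration, falling back to the existing constants. 
The key names and default values shown are placeholders of my own, not the ones in 
any actual patch.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: replace the hard-coded constants in GlobalStateIdContext
// with configurable values. Key names and defaults are placeholders; the real
// defaults should simply be the existing constants.
public class EstimatedTxnSettings {
  static final String TXNS_PER_SECOND_KEY =
      "dfs.namenode.state.context.estimated-transactions-per-second";
  static final String SERVER_TIME_MULTIPLIER_KEY =
      "dfs.namenode.state.context.estimated-server-time-multiplier";

  final long estimatedTransactionsPerSecond;
  final long estimatedServerTimeMultiplier;

  EstimatedTxnSettings(Configuration conf) {
    estimatedTransactionsPerSecond =
        conf.getLong(TXNS_PER_SECOND_KEY, 10_000L);    // placeholder default
    estimatedServerTimeMultiplier =
        conf.getLong(SERVER_TIME_MULTIPLIER_KEY, 10L); // placeholder default
  }
}
{code}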

> [SBN Read] make estimated transactions configurable
> ---
>
> Key: HDFS-16682
> URL: https://issues.apache.org/jira/browse/HDFS-16682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In GlobalStateIdContext, ESTIMATED_TRANSACTIONS_PER_SECOND and 
> ESTIMATED_SERVER_TIME_MULTIPLIER should be configurable.
> These parameters depend on each cluster's load. In addition, making them 
> configurable will help us simulate an observer namenode that is far behind.






[jira] (HDFS-14117) RBF: We can only delete the files or dirs of one subcluster in a cluster with multiple subclusters when trash is enabled

2022-09-06 Thread zhengchenyu (Jira)


[ https://issues.apache.org/jira/browse/HDFS-14117 ]


zhengchenyu deleted comment on HDFS-14117:


was (Author: zhengchenyu):
    In our cluster, I must mount all nameservices for /user/${user}/.Trash, which 
means the router renames across all nameservices when moving files to trash. Though 
this has worked for a long time, it causes bad performance when one namenode degrades.

I want to connect to only one nameservice, so I have a new proposal: 
    Condition: 
    (1) /test is mounted on ns0
    (2) /user/hdfs is mounted on ns1
    Suppose we move /test/hello to /user/hdfs/.Trash/Current/test/hello.
    When we process a location with the trash prefix, we simply strip the prefix and 
use the remaining path to find the mounted nameservice. For 
/user/hdfs/.Trash/Current/test/hello, we remove the prefix 
'/user/hdfs/.Trash/Current', get '/test/hello', and use '/test/hello' to find the 
mounted nameservice. Then we get the location 
ns0->/user/hdfs/.Trash/Current/test/hello, and the rename to trash will work.
    The drawback is that we must check the location pattern on every call, 
but I think that is low cost.

[~elgoiri] [~ayushtkn] [~hexiaoqiao] [~ramkumar] [~xuzq_zander] What do you think of 
my proposal?

> RBF: We can only delete the files or dirs of one subcluster in a cluster with 
> multiple subclusters when trash is enabled
> 
>
> Key: HDFS-14117
> URL: https://issues.apache.org/jira/browse/HDFS-14117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ramkumar Ramalingam
>Assignee: Ramkumar Ramalingam
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14117-HDFS-13891.001.patch, 
> HDFS-14117-HDFS-13891.002.patch, HDFS-14117-HDFS-13891.003.patch, 
> HDFS-14117-HDFS-13891.004.patch, HDFS-14117-HDFS-13891.005.patch, 
> HDFS-14117-HDFS-13891.006.patch, HDFS-14117-HDFS-13891.007.patch, 
> HDFS-14117-HDFS-13891.008.patch, HDFS-14117-HDFS-13891.009.patch, 
> HDFS-14117-HDFS-13891.010.patch, HDFS-14117-HDFS-13891.011.patch, 
> HDFS-14117-HDFS-13891.012.patch, HDFS-14117-HDFS-13891.013.patch, 
> HDFS-14117-HDFS-13891.014.patch, HDFS-14117-HDFS-13891.015.patch, 
> HDFS-14117-HDFS-13891.016.patch, HDFS-14117-HDFS-13891.017.patch, 
> HDFS-14117-HDFS-13891.018.patch, HDFS-14117-HDFS-13891.019.patch, 
> HDFS-14117-HDFS-13891.020.patch, HDFS-14117.001.patch, HDFS-14117.002.patch, 
> HDFS-14117.003.patch, HDFS-14117.004.patch, HDFS-14117.005.patch
>
>
> When we delete files or dirs in HDFS, the deleted files or dirs are moved to 
> trash by default.
> But in the global path we can only mount one trash dir, /user. So we mount the 
> trash dir /user of the subcluster ns1 to the global path /user. Then we can 
> delete files or dirs of ns1, but when we delete the files or dirs of another 
> subcluster, such as hacluster, it fails.
> h1. Mount Table
> ||Global path||Target nameservice||Target path||Order||Read 
> only||Owner||Group||Permission||Quota/Usage||Date Modified||Date Created||
> |/test|hacluster2|/test| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:42|2018/11/29 14:37:42|
> |/tmp|hacluster1|/tmp| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:05|2018/11/29 14:37:05|
> |/user|hacluster2,hacluster1|/user|HASH| |securedn|users|rwxr-xr-x|[NsQuota: 
> -/-, SsQuota: -/-]|2018/11/29 14:42:37|2018/11/29 14:38:20|
> commands: 
> {noformat}
> 1./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /test/.
> 18/11/30 11:00:47 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 8081 2018-11-30 10:56 /test/hdfs.cmd
> 2./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /tmp/.
> 18/11/30 11:00:40 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r--   3 securedn supergroup   6311 2018-11-30 10:57 /tmp/mapred.cmd
> 3../opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm 
> /tmp/mapred.cmd
> 18/11/30 11:01:02 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rm: Failed to move to trash: hdfs://router/tmp/mapred.cmd: rename destination 
> parent /user/securedn/.Trash/Current/tmp/mapred.cmd not found.
> 4./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /test/hdfs.cmd
> 18/11/30 11:01:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/11/30 11:01:22 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://router/test/hdfs.cmd' to trash at: 
> hdfs://router/user/securedn/.Trash/Current/test/hdfs.cmd
> 

[jira] [Commented] (HDFS-14117) RBF: We can only delete the files or dirs of one subcluster in a cluster with multiple subclusters when trash is enabled

2022-09-06 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600659#comment-17600659
 ] 

zhengchenyu commented on HDFS-14117:


    In our cluster, I must mount all nameservices for /user/${user}/.Trash, which 
means the router renames across all nameservices when moving files to trash. Though 
this has worked for a long time, it causes bad performance when one namenode degrades.

I want to connect to only one nameservice, so I have a new proposal: 
    Condition: 
    (1) /test is mounted on ns0
    (2) /user/hdfs is mounted on ns1
    Suppose we move /test/hello to /user/hdfs/.Trash/Current/test/hello.
    When we process a location with the trash prefix, we simply strip the prefix and 
use the remaining path to find the mounted nameservice. For 
/user/hdfs/.Trash/Current/test/hello, we remove the prefix 
'/user/hdfs/.Trash/Current', get '/test/hello', and use '/test/hello' to find the 
mounted nameservice. Then we get the location 
ns0->/user/hdfs/.Trash/Current/test/hello, and the rename to trash will work (a 
small sketch of this resolution follows below).
    The drawback is that we must check the location pattern on every call, 
but I think that is low cost.

[~elgoiri] [~ayushtkn] [~hexiaoqiao] [~ramkumar] [~xuzq_zander] What do you think of 
my proposal?
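A tiny sketch of the prefix-stripping resolution described above (illustrative only; 
a real implementation would resolve against the router's mount table and derive the 
trash prefix from the calling user):
{code:java}
import java.util.Map;

// Illustrative sketch of the proposal: strip the per-user trash prefix and
// resolve the remaining path against the mount table, so the rename stays
// within the nameservice that owns the original path.
public class TrashMountResolver {
  // Toy mount table, e.g. "/test" -> "ns0", "/user/hdfs" -> "ns1".
  private final Map<String, String> mountTable;

  public TrashMountResolver(Map<String, String> mountTable) {
    this.mountTable = mountTable;
  }

  /** Resolve a path such as /user/hdfs/.Trash/Current/test/hello. */
  public String resolveNameservice(String path, String trashPrefix) {
    String lookupPath = path.startsWith(trashPrefix)
        ? path.substring(trashPrefix.length())   // "/test/hello"
        : path;
    // Longest-prefix match against the mount table.
    String bestMount = null;
    String bestNs = null;
    for (Map.Entry<String, String> e : mountTable.entrySet()) {
      if (lookupPath.startsWith(e.getKey())
          && (bestMount == null || e.getKey().length() > bestMount.length())) {
        bestMount = e.getKey();
        bestNs = e.getValue();
      }
    }
    return bestNs; // "ns0" for the example above
  }
}
{code}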

> RBF: We can only delete the files or dirs of one subcluster in a cluster with 
> multiple subclusters when trash is enabled
> 
>
> Key: HDFS-14117
> URL: https://issues.apache.org/jira/browse/HDFS-14117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ramkumar Ramalingam
>Assignee: Ramkumar Ramalingam
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14117-HDFS-13891.001.patch, 
> HDFS-14117-HDFS-13891.002.patch, HDFS-14117-HDFS-13891.003.patch, 
> HDFS-14117-HDFS-13891.004.patch, HDFS-14117-HDFS-13891.005.patch, 
> HDFS-14117-HDFS-13891.006.patch, HDFS-14117-HDFS-13891.007.patch, 
> HDFS-14117-HDFS-13891.008.patch, HDFS-14117-HDFS-13891.009.patch, 
> HDFS-14117-HDFS-13891.010.patch, HDFS-14117-HDFS-13891.011.patch, 
> HDFS-14117-HDFS-13891.012.patch, HDFS-14117-HDFS-13891.013.patch, 
> HDFS-14117-HDFS-13891.014.patch, HDFS-14117-HDFS-13891.015.patch, 
> HDFS-14117-HDFS-13891.016.patch, HDFS-14117-HDFS-13891.017.patch, 
> HDFS-14117-HDFS-13891.018.patch, HDFS-14117-HDFS-13891.019.patch, 
> HDFS-14117-HDFS-13891.020.patch, HDFS-14117.001.patch, HDFS-14117.002.patch, 
> HDFS-14117.003.patch, HDFS-14117.004.patch, HDFS-14117.005.patch
>
>
> When we delete files or dirs in HDFS, the deleted files or dirs are moved to 
> trash by default.
> But in the global path we can only mount one trash dir, /user. So we mount the 
> trash dir /user of the subcluster ns1 to the global path /user. Then we can 
> delete files or dirs of ns1, but when we delete the files or dirs of another 
> subcluster, such as hacluster, it fails.
> h1. Mount Table
> ||Global path||Target nameservice||Target path||Order||Read 
> only||Owner||Group||Permission||Quota/Usage||Date Modified||Date Created||
> |/test|hacluster2|/test| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:42|2018/11/29 14:37:42|
> |/tmp|hacluster1|/tmp| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:05|2018/11/29 14:37:05|
> |/user|hacluster2,hacluster1|/user|HASH| |securedn|users|rwxr-xr-x|[NsQuota: 
> -/-, SsQuota: -/-]|2018/11/29 14:42:37|2018/11/29 14:38:20|
> commands: 
> {noformat}
> 1./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /test/.
> 18/11/30 11:00:47 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 8081 2018-11-30 10:56 /test/hdfs.cmd
> 2./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /tmp/.
> 18/11/30 11:00:40 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r--   3 securedn supergroup   6311 2018-11-30 10:57 /tmp/mapred.cmd
> 3../opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm 
> /tmp/mapred.cmd
> 18/11/30 11:01:02 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rm: Failed to move to trash: hdfs://router/tmp/mapred.cmd: rename destination 
> parent /user/securedn/.Trash/Current/tmp/mapred.cmd not found.
> 4./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /test/hdfs.cmd
> 18/11/30 11:01:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/11/30 11:01:22 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://router/test/hdfs.cmd' to trash at: 
> 

[jira] [Updated] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.

2022-08-19 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16732:
---
Issue Type: Bug  (was: Improvement)

> [SBN READ] Avoid get location from observer when the block report is delayed.
> -
>
> Key: HDFS-16732
> URL: https://issues.apache.org/jira/browse/HDFS-16732
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Critical
>  Labels: pull-request-available
>
> Hive on Tez applications fail occasionally after the observer is enabled; 
> the log is shown below.
> {code:java}
> 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, 
> vertex=vertex_1660618571916_4839_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
>   at 
> com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
>   at 
> com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
>   at 
> com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
>   at 
> com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
>   at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   ... 4 more {code}
> As described in MAPREDUCE-7082, this exception is thrown when a block is 
> missing, but my cluster had no missing blocks.
> In this example, I found that getListing returns location information. When 
> the observer's block report is delayed, it returns blocks without locations.
> HDFS-13924 was introduced to solve this problem, but it only covers 
> getBlockLocations.
> On the observer node, every method that may return locations should check 
> whether the locations are empty.






[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.

2022-08-18 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581645#comment-17581645
 ] 

zhengchenyu commented on HDFS-16732:


[~sunchao] [~xkrogen] [~zero45] Can you please review this?

> [SBN READ] Avoid get location from observer when the block report is delayed.
> -
>
> Key: HDFS-16732
> URL: https://issues.apache.org/jira/browse/HDFS-16732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Critical
>  Labels: pull-request-available
>
> Hive on Tez applications fail occasionally after the observer is enabled; 
> the log is shown below.
> {code:java}
> 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, 
> vertex=vertex_1660618571916_4839_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
>   at 
> com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
>   at 
> com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
>   at 
> com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
>   at 
> com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
>   at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   ... 4 more {code}
> As described in MAPREDUCE-7082, this exception is thrown when a block is 
> missing, but my cluster had no missing blocks.
> In this example, I found that getListing returns location information. When 
> the observer's block report is delayed, it returns blocks without locations.
> HDFS-13924 was introduced to solve this problem, but it only covers 
> getBlockLocations.
> On the observer node, every method that may return locations should check 
> whether the locations are empty.






[jira] [Updated] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.

2022-08-18 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16732:
---
Description: 
Hive on Tez applications fail occasionally after the observer is enabled; the 
log is shown below.
{code:java}
2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] 
|impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, 
vertex=vertex_1660618571916_4839_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
at 
com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
at 
com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at 
com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
at 
com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
at 
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
... 4 more {code}
As described in MAPREDUCE-7082, this exception is thrown when a block is 
missing, but my cluster had no missing blocks.

In this example, I found that getListing returns location information. When the 
observer's block report is delayed, it returns blocks without locations.

HDFS-13924 was introduced to solve this problem, but it only covers 
getBlockLocations.

On the observer node, every method that may return locations should check 
whether the locations are empty.
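For illustration, the check could follow the same pattern HDFS-13924 uses for 
getBlockLocations: when a block served from the observer has no locations yet, make 
the client retry on the active. The sketch below assumes 
ObserverRetryOnActiveException is that mechanism; it is only a sketch, not the 
actual patch.
{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
import org.apache.hadoop.ipc.ObserverRetryOnActiveException;

// Sketch only: a guard that any observer-served method returning locations
// could apply before handing results back to the client.
public final class ObserverLocationGuard {
  private ObserverLocationGuard() {}

  static void checkLocations(LocatedBlocks blocks) throws IOException {
    if (blocks == null || blocks.getLocatedBlocks() == null) {
      return; // no blocks at all (e.g. a directory): nothing to check
    }
    for (LocatedBlock block : blocks.getLocatedBlocks()) {
      if (block.getLocations() == null || block.getLocations().length == 0) {
        // The observer's block report is lagging; ask the client to retry
        // on the active instead of returning location-less blocks.
        throw new ObserverRetryOnActiveException(
            "Block " + block.getBlock() + " has no locations yet on observer");
      }
    }
  }
}
{code}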

  was:
Hive on Tez applications fail occasionally after the observer is enabled; the 
log is shown below.
{code:java}
2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] 
|impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, 
vertex=vertex_1660618571916_4839_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
at 
com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
at 
com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at 
com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
at 
com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
at 
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)

[jira] [Updated] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.

2022-08-18 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16732:
---
Description: 
Hive on Tez applications fail occasionally after the observer is enabled; the 
log is shown below.
{code:java}
2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] 
|impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, 
vertex=vertex_1660618571916_4839_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
at 
com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
at 
com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at 
com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
at 
com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
at 
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
... 4 more {code}
As shown in MAPREDUCE-7082, this exception is thrown when a block is missing, 
but my cluster had no missing blocks.

In this example, I found that getListing returns location information. When the 
observer's block report is delayed, it returns blocks without locations.

HDFS-13924 was introduced to solve this problem, but it only covers 
getBlockLocations.

On the observer node, every method that may return locations should check 
whether the locations are empty.

> [SBN READ] Avoid get location from observer when the block report is delayed.
> -
>
> Key: HDFS-16732
> URL: https://issues.apache.org/jira/browse/HDFS-16732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Critical
>
> Hive on Tez applications fail occasionally after the observer is enabled; 
> the log is shown below.
> {code:java}
> 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, 
> vertex=vertex_1660618571916_4839_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> 

[jira] [Created] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.

2022-08-18 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-16732:
--

 Summary: [SBN READ] Avoid get location from observer when the 
block report is delayed.
 Key: HDFS-16732
 URL: https://issues.apache.org/jira/browse/HDFS-16732
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.2.1
Reporter: zhengchenyu
Assignee: zhengchenyu









[jira] [Comment Edited] (HDFS-16708) RBF: Support transmit state id from client in router.

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573722#comment-17573722
 ] 

zhengchenyu edited comment on HDFS-16708 at 8/1/22 11:58 AM:
-

[~xkrogen] [~xuzq_zander] [~simbadzina] 

Let's continue Design A here. I think Design A is not implemented in any of 
HDFS-13522's PRs.

There is no need to propagate all namespaces' state ids in Design A; we can 
propagate them according to the client's demand. I think we need a complete 
implementation and a document before continuing the discussion. I will submit a 
new draft PR with the complete implementation; the document is 
here([^HDFS-13522_proposal_zhengchenyu.pdf]). Can you give me some suggestions? 

_Note: It is only a draft; the settings are a little complex, and maybe I need 
to simplify them._ 


was (Author: zhengchenyu):
[~xkrogen] [~xuzq_zander] [~simbadzina] 

Let's continue Design A here. I think Design A is not implemented in any of 
HDFS-13522's PRs.

There is no need to propagate all namespaces' state ids in Design A; we can 
propagate them according to the client's demand. I think we need a complete 
implementation and a document before continuing the discussion. I will submit a 
new draft PR with the complete implementation; the document is 
here([^HDFS-13522_proposal_zhengchenyu.pdf]). Can you give me some suggestions? 

_Note: It is a draft; the settings are a little complex, and maybe I need to 
simplify them._ 

> RBF: Support transmit state id from client in router.
> -
>
> Key: HDFS-16708
> URL: https://issues.apache.org/jira/browse/HDFS-16708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522_proposal_zhengchenyu.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement the Design A described in HDFS-13522.






[jira] [Comment Edited] (HDFS-16708) RBF: Support transmit state id from client in router.

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573722#comment-17573722
 ] 

zhengchenyu edited comment on HDFS-16708 at 8/1/22 11:58 AM:
-

[~xkrogen] [~xuzq_zander] [~simbadzina] 

Let's continue Design A here. I think Design A is not implemented in any of 
HDFS-13522's PRs.

There is no need to propagate all namespaces' state ids in Design A; we can 
propagate them according to the client's demand. I think we need a complete 
implementation and a document before continuing the discussion. I will submit a 
new draft PR with the complete implementation; the document is 
here([^HDFS-13522_proposal_zhengchenyu.pdf]). Can you give me some suggestions? 

_Note: It is a draft; the settings are a little complex, and maybe I need to 
simplify them._ 


was (Author: zhengchenyu):
[~xkrogen] [~xuzq_zander] [~simbadzina] 

Let's continue Design A here. I think Design A is not implemented in any of 
HDFS-13522's PRs.

There is no need to propagate all namespaces' state ids in Design A; we can 
propagate them according to the client's demand. I think we need a complete 
implementation and a document before continuing the discussion. I will submit a 
new draft PR with the complete implementation. Can you give me some suggestions? 

_Note: It is a draft; the settings are a little complex, and maybe I need to 
simplify them._ 

> RBF: Support transmit state id from client in router.
> -
>
> Key: HDFS-16708
> URL: https://issues.apache.org/jira/browse/HDFS-16708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522_proposal_zhengchenyu.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement the Design A described in HDFS-13522.






[jira] [Commented] (HDFS-16708) RBF: Support transmit state id from client in router.

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573722#comment-17573722
 ] 

zhengchenyu commented on HDFS-16708:


[~xkrogen] [~xuzq_zander] [~simbadzina] 

Let's continue Design A here. I think Design A is not implemented in any of 
HDFS-13522's PRs.

There is no need to propagate all namespaces' state ids in Design A; we can 
propagate them according to the client's demand. I think we need a complete 
implementation and a document before continuing the discussion. I will submit a 
new draft PR with the complete implementation. Can you give me some suggestions? 

_Note: It is a draft; the settings are a little complex, and maybe I need to 
simplify them._ 

> RBF: Support transmit state id from client in router.
> -
>
> Key: HDFS-16708
> URL: https://issues.apache.org/jira/browse/HDFS-16708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522_proposal_zhengchenyu.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement the Design A described in HDFS-13522.






[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-13522:
---
Attachment: (was: HDFS-13522_proposal_zhengchenyu.pdf)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16708) RBF: Support transmit state id from client in router.

2022-08-01 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16708:
---
Description: Implement the Design A described in HDFS-13522.

> RBF: Support transmit state id from client in router.
> -
>
> Key: HDFS-16708
> URL: https://issues.apache.org/jira/browse/HDFS-16708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-13522_proposal_zhengchenyu.pdf
>
>
> Implement the Design A described in HDFS-13522.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16708) RBF: Support transmit state id from client in router.

2022-08-01 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-16708:
--

 Summary: RBF: Support transmit state id from client in router.
 Key: HDFS-16708
 URL: https://issues.apache.org/jira/browse/HDFS-16708
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: zhengchenyu
Assignee: zhengchenyu
 Attachments: HDFS-13522_proposal_zhengchenyu.pdf





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16708) RBF: Support transmit state id from client in router.

2022-08-01 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16708:
---
Attachment: HDFS-13522_proposal_zhengchenyu.pdf

> RBF: Support transmit state id from client in router.
> -
>
> Key: HDFS-16708
> URL: https://issues.apache.org/jira/browse/HDFS-16708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-13522_proposal_zhengchenyu.pdf
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


[ https://issues.apache.org/jira/browse/HDFS-13522 ]


zhengchenyu deleted comment on HDFS-13522:


was (Author: zhengchenyu):
[~xkrogen] [~xuzq_zander] [~simbadzina] For I know, Design A is not implemented 
in all PR.

For Design A, there is no need to propagate all namespace's state ids. We can 
propagate by client's demand. I think we need a whole implement and document, 
then continue to discuss. I have a draft which is combination of Design A and 
B. If someone are interested in Design A, can you help review this draft 
[https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813.]

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-13522:
---
Attachment: HDFS-13522_proposal_zhengchenyu.pdf

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704
 ] 

zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:34 AM:
-

[~xuzq_zander] 

Hi, the use case for Design A is indeed very rare, but Design A also has 
advantages.
(1) More flexible
Each client can set its own msync period.
Example: in our cluster, on one nameservice, some special users poll periodically 
to detect whether an HDFS file has been created; they may need higher time 
precision, which means more frequent msync (though I am opposed to this usage).

(2) Fewer msync calls
I think most Hive and MapReduce applications have no need to call msync 
periodically, so Design A will save more msync calls than Design B.

I agree with [~xuzq_zander]'s suggestion to focus on Design B first and add 
Design A as a bonus item. It is not easy to review both Design A and Design B at 
once.

Could we complete only Design B in this issue? [~omalley] [~elgoiri] 
[~simbadzina] 


was (Author: zhengchenyu):
[~xuzq_zander] 

Hi, the use case about design A is very rare indeed. But Design A also have 
advantage.
(1) More flexible
Client could set their msync period time by itself.
Example: In our cluster, one name service, some special user detect hdfs file 
is created periodically, may need high time precision, means more frequent 
msync.(Though I am oppose to this way).

(2) Save msync
I think there is no need to call msync periodically for most HIVE, MR 
application. Design A will save more msync than Design B。

I agree [~xuzq_zander] 's suggestion that focus on Design B first, add Design A 
as a bonus item. It is no easy to review both Design A and Design B.

Could we only complete Design B in this issue? [~omalley] [~elgoiri] 

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573707#comment-17573707
 ] 

zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:32 AM:
-

[~xkrogen] [~xuzq_zander] [~simbadzina] As far as I know, Design A is not 
implemented in any PR.

For Design A, there is no need to propagate every namespace's state id; we can 
propagate them on the client's demand. I think we need a complete implementation 
and a design document before continuing the discussion. I have a draft that is a 
combination of Design A and Design B. If anyone is interested in Design A, could 
you help review this draft: 
[https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813.]


was (Author: zhengchenyu):
[~xkrogen] [~xuzq_zander] [~simbadzina] For I know, Design A is not implemented 
in all PR.

For Design A, there is no need to propagate all namespace's state ids. We can 
propagate by client's demand. I think we need a whole implement and document, 
then continue to discuss. I have a draft which is combination of Design A and 
B. If someone are interested in Design A, can you help review this draft 
[https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813.]

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-13522:
---
Attachment: (was: HDFS-13522_proposal_zhengchenyu_v1.pdf)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704
 ] 

zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:26 AM:
-

[~xuzq_zander] 

Hi, the use case for Design A is indeed very rare, but Design A also has 
advantages.
(1) More flexible
Each client can set its own msync period.
Example: in our cluster, on one nameservice, some special users poll periodically 
to detect whether an HDFS file has been created; they may need higher time 
precision, which means more frequent msync (though I am opposed to this usage).

(2) Fewer msync calls
I think most Hive and MapReduce applications have no need to call msync 
periodically, so Design A will save more msync calls than Design B.

I agree with [~xuzq_zander]'s suggestion to focus on Design B first and add 
Design A as a bonus item. It is not easy to review both Design A and Design B at 
once.

Could we complete only Design B in this issue? [~omalley] [~elgoiri] 


was (Author: zhengchenyu):
[~xuzq_zander] 

Hi, the use case about design A is very rare indeed. But Design A also have 
advantage.
(1) More flexible
Client could set their msync period time by itself.
Example: In our cluster, one name service, some special user detect hdfs file 
is created periodically, may need high time precision, means more frequent 
msync.(Though I am oppose to this way).

(2) Save msync
I think there is no need to call msync periodically for most HIVE, MR 
application. Design A will save more msync than Design B。

I agree [~xuzq_zander] 's suggestion that focus on Design B first, add Design A 
as a bonus item. It is no easy to review both Design A and Design B.

Could we only complete Design B in this issue?

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573707#comment-17573707
 ] 

zhengchenyu commented on HDFS-13522:


[~xkrogen] [~xuzq_zander] [~simbadzina] As far as I know, Design A is not 
implemented in any PR.

For Design A, there is no need to propagate every namespace's state id; we can 
propagate them on the client's demand. I think we need a complete implementation 
and a design document before continuing the discussion. I have a draft that is a 
combination of Design A and Design B. If anyone is interested in Design A, could 
you help review this draft: 
[https://github.com/zhengchenyu/hadoop/commit/a47ae882943f090836a801cf758761c5b970d813.]

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704
 ] 

zhengchenyu edited comment on HDFS-13522 at 8/1/22 11:24 AM:
-

[~xuzq_zander] 

Hi, the use case for Design A is indeed very rare, but Design A also has 
advantages.
(1) More flexible
Each client can set its own msync period.
Example: in our cluster, on one nameservice, some special users poll periodically 
to detect whether an HDFS file has been created; they may need higher time 
precision, which means more frequent msync (though I am opposed to this usage).

(2) Fewer msync calls
I think most Hive and MapReduce applications have no need to call msync 
periodically, so Design A will save more msync calls than Design B.

I agree with [~xuzq_zander]'s suggestion to focus on Design B first and add 
Design A as a bonus item. It is not easy to review both Design A and Design B at 
once.

Could we complete only Design B in this issue?


was (Author: zhengchenyu):
[~xuzq_zander] 

Hi, the use case about design A is very rare indeed. But Design A also have 
advantage.
(1) More flexible
Client could set their msync period time by itself.
Example: In our cluster, one name service, some special user detect hdfs file 
is created periodically, may need high time precision, means more frequent 
msync.(Though I am oppose to this way).

(2) Save msync
I think there is no need to call msync periodically for most HIVE, MR 
application. Design A will save more msync than Design B。

I agree your suggestion that focus on Design B first, add Design A as a bonus 
item. It is no easy to review both Design A and Design B.

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-01 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573704#comment-17573704
 ] 

zhengchenyu commented on HDFS-13522:


[~xuzq_zander] 

Hi, the use case for Design A is indeed very rare, but Design A also has 
advantages.
(1) More flexible
Each client can set its own msync period.
Example: in our cluster, on one nameservice, some special users poll periodically 
to detect whether an HDFS file has been created; they may need higher time 
precision, which means more frequent msync (though I am opposed to this usage).

(2) Fewer msync calls
I think most Hive and MapReduce applications have no need to call msync 
periodically, so Design A will save more msync calls than Design B.

I agree with your suggestion to focus on Design B first and add Design A as a 
bonus item. It is not easy to review both Design A and Design B at once.

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-26 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571703#comment-17571703
 ] 

zhengchenyu commented on HDFS-13522:


[~simbadzina] I also agree with the *combination of both Design A and B*: most 
users would use plan B, and some special users could choose plan A on demand.

[~simbadzina] [~xuzq_zander] what are your Slack emails? Or we can gather in the 
hdfs channel first.

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-26 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571703#comment-17571703
 ] 

zhengchenyu edited comment on HDFS-13522 at 7/27/22 2:45 AM:
-

[~simbadzina] I also agree with the *combination of both Design A and B*: most 
users would use plan B, and some special users could choose plan A on demand. 

[~simbadzina] [~xuzq_zander] what are your Slack emails? Or we can gather in the 
hdfs channel first.


was (Author: zhengchenyu):
[~simbadzina] I also agree the *combination of both Design A and B.* Most user 
use plan B, some special user could choose plan A by their demand. **

[~simbadzina] [~xuzq_zander] what's your email of slack? Or we can gather in 
hdfs channel fisrtly.

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-25 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186
 ] 

zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:16 AM:
-

[~simbadzina] 

In your v2 document there is perhaps no detailed implementation or structure, so 
I still don't know how to implement your design. What is your choice between 
Design A and Design B? (Note: the last picture is not clear in the Google 
document.)

In my design, the key is the nameserviceId in router mode and clientId+callId in 
client mode; my proposal mainly describes client mode. Regarding switches between 
routers, there is no problem in client mode in my proposal, because the client 
carries the real state id. 

I think we need more discussion about this. I have created a channel named 
"hdfs-13522" on Slack. Let us discuss there for more efficient communication, and 
then continue to a meeting. [~simbadzina] [~xuzq_zander] 


was (Author: zhengchenyu):
[~simbadzina] 

In your v2 document, maybe no detailed implement and structure. I still don't 
know how to implement in your design. (Note: The last picture is not clear in 
google document.)

In my design, key is nameserviceId in router mode, key is clientid+callid in 
cilent mode. In my proposal, mainly describe client mode. About switches 
between routers, there is no problem in my proposal in client mode. because 
client carry real state id. 

I think we need more discuss about this. I create a channel named ''hdfs-13522" 
on slack. Let us discuss on this channel for more efficient communication, and 
continue to meeting. [~simbadzina] [~xuzq_zander] 

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-25 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186
 ] 

zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:15 AM:
-

[~simbadzina] 

In your v2 document there is perhaps no detailed implementation or structure, so 
I still don't know how to implement your design. (Note: the last picture is not 
clear in the Google document.)

In my design, the key is the nameserviceId in router mode and clientId+callId in 
client mode; my proposal mainly describes client mode. Regarding switches between 
routers, there is no problem in client mode in my proposal, because the client 
carries the real state id. 

I think we need more discussion about this. I have created a channel named 
"hdfs-13522" on Slack. Let us discuss there for more efficient communication, and 
then continue to a meeting. [~simbadzina] [~xuzq_zander] 


was (Author: zhengchenyu):
[~simbadzina] 

In your v2 document, maybe no implement and structure.

In my design, key is nameserviceId in router mode, key is clientid+callid in 
cilent mode. In my proposal, mainly describe client mode. About switches 
between routers, there is no problem in my proposal in client mode. because 
client carry real state id. 

I think we need more discuss about this. I create a channel named ''hdfs-13522" 
on slack. Let us discuss on this channel for more efficient communication, and 
continue to meeting. [~simbadzina] [~xuzq_zander] 

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-25 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186
 ] 

zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:11 AM:
-

[~simbadzina] 

In your v2 document there is perhaps no implementation or structure described.

In my design, the key is the nameserviceId in router mode and clientId+callId in 
client mode; my proposal mainly describes client mode. Regarding switches between 
routers, there is no problem in client mode in my proposal, because the client 
carries the real state id. 

I think we need more discussion about this. I have created a channel named 
"hdfs-13522" on Slack. Let us discuss there for more efficient communication, and 
then continue to a meeting. [~simbadzina] [~xuzq_zander] 


was (Author: zhengchenyu):
[~simbadzina] 

In your v2 document, maybe no implement and structure.

In my design, key is nameserviceId in router mode, key is clientid+callid in 
cilent mode. In my proposal, mainly describe client mode. About switches 
between routers, there is no problem in my proposal in client mode. because 
client carry real state id. 

I think we need more discuss about this. I create a channel named ''hdfs-13522" 
in slack. Let us discuss on this channel for more efficient communication, and 
continue to meeting. [~simbadzina] [~xuzq_zander] 

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-25 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186
 ] 

zhengchenyu commented on HDFS-13522:


[~simbadzina] 

In your v2 document there is perhaps no implementation or structure described.

In my design, the key is the nameserviceId in router mode and clientId+callId in 
client mode; my proposal mainly describes client mode. Regarding switches between 
routers, there is no problem in client mode in my proposal, because the client 
carries the real state id. 

I think we need more discussion about this. I have created a channel named 
"hdfs-13522". Let us discuss there for more efficient communication, and then 
continue to a meeting. [~simbadzina] [~xuzq_zander] 
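
To illustrate the two keying schemes mentioned above, here is a simplified 
sketch: router mode keeps one shared state id per nameserviceId, while client 
mode keys the state id by (clientId, callId) so it travels with each call. The 
classes below are stand-ins for illustration only, not the proposal's actual 
types.

{code:java}
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class StateIdKeySketch {

  /** Router mode: one shared state id per nameservice, advanced by all clients. */
  private final Map<String, Long> routerModeIds = new ConcurrentHashMap<>();

  /** Client mode: the state id is keyed by (clientId, callId), i.e. per call. */
  static final class CallKey {
    final String clientId;
    final int callId;
    CallKey(String clientId, int callId) {
      this.clientId = clientId;
      this.callId = callId;
    }
    @Override public boolean equals(Object o) {
      return o instanceof CallKey
          && ((CallKey) o).callId == callId
          && ((CallKey) o).clientId.equals(clientId);
    }
    @Override public int hashCode() {
      return Objects.hash(clientId, callId);
    }
  }
  private final Map<CallKey, Long> clientModeIds = new ConcurrentHashMap<>();

  void recordRouterMode(String nameserviceId, long stateId) {
    routerModeIds.merge(nameserviceId, stateId, Math::max);
  }

  void recordClientMode(String clientId, int callId, long stateId) {
    // Each call carries the exact state id the client sent, so switching routers
    // is safe: the id does not depend on any single router's local view.
    clientModeIds.put(new CallKey(clientId, callId), stateId);
  }
}
{code}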

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-25 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571186#comment-17571186
 ] 

zhengchenyu edited comment on HDFS-13522 at 7/26/22 4:10 AM:
-

[~simbadzina] 

In your v2 document there is perhaps no implementation or structure described.

In my design, the key is the nameserviceId in router mode and clientId+callId in 
client mode; my proposal mainly describes client mode. Regarding switches between 
routers, there is no problem in client mode in my proposal, because the client 
carries the real state id. 

I think we need more discussion about this. I have created a channel named 
"hdfs-13522" on Slack. Let us discuss there for more efficient communication, and 
then continue to a meeting. [~simbadzina] [~xuzq_zander] 


was (Author: zhengchenyu):
[~simbadzina] 

In your v2 document, maybe no implement and structure.

In my design, key is nameserviceId in router mode, key is clientid+callid in 
cilent mode. In my proposal, mainly describe client mode. About switches 
between routers, there is no problem in my proposal in client mode. because 
client carry real state id. 

I think we need more discuss about this. I create a channel named 
''hdfs-13522". Let us discuss on this channel for more efficient communication, 
and continue to meeting. [~simbadzina] [~xuzq_zander] 

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16682) [SBN Read] make estimated transactions configurable

2022-07-25 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-16682:
--

 Summary: [SBN Read] make estimated transactions configurable
 Key: HDFS-16682
 URL: https://issues.apache.org/jira/browse/HDFS-16682
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: zhengchenyu
Assignee: zhengchenyu


In GlobalStateIdContext, ESTIMATED_TRANSACTIONS_PER_SECOND and 
ESTIMATED_SERVER_TIME_MULTIPLIER should be configurable.

These parameters depend on each cluster's load. In addition, making them 
configurable would help us simulate an observer namenode that is far behind.
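
A minimal sketch of what the change could look like, reading the two estimates 
from configuration instead of hard-coding them. The property names and defaults 
below are placeholders made up for illustration, not the keys of an actual patch, 
and the threshold calculation is only indicative.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class EstimatedTxnConfigSketch {
  // Hypothetical configuration keys, for illustration only.
  static final String TXNS_PER_SECOND_KEY =
      "dfs.namenode.state.context.estimated-transactions-per-second";
  static final String SERVER_TIME_MULTIPLIER_KEY =
      "dfs.namenode.state.context.estimated-server-time-multiplier";

  private final long txnsPerSecond;
  private final long serverTimeMultiplier;

  EstimatedTxnConfigSketch(Configuration conf) {
    // Placeholder defaults; a real patch would keep the existing constants as defaults.
    txnsPerSecond = conf.getLong(TXNS_PER_SECOND_KEY, 10000L);
    serverTimeMultiplier = conf.getLong(SERVER_TIME_MULTIPLIER_KEY, 10L);
  }

  // Indicative only, not the exact formula in GlobalStateIdContext: a smaller
  // estimate lowers the tolerated lag, which makes an observer look "far behind"
  // sooner, useful for the simulation use case described above.
  long estimatedPendingTransactions(long clientWaitMillis) {
    return txnsPerSecond * serverTimeMultiplier * clientWaitMillis / 1000;
  }
}
{code}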



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-21 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569802#comment-17569802
 ] 

zhengchenyu edited comment on HDFS-13522 at 7/22/22 5:11 AM:
-

[~simbadzina]

Thanks for involving me. HDFS-13522_proposal_zhengchenyu_v1.pdf is my proposal 
document. Chapter 2.1 is exactly solution C and describes how the state id is 
carried in my demo implementation. Could you please review my proposal?


was (Author: zhengchenyu):
[~simbadzina]

Thanks for involving me. HDFS-13522_proposal_zhengchenyu_v1.pdf is my proposal 
document. Chapter 2.1 describe how to carry state id in my demo implement. Can 
you please review my proposal.

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-21 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569802#comment-17569802
 ] 

zhengchenyu commented on HDFS-13522:


[~simbadzina]

Thanks for involving me. HDFS-13522_proposal_zhengchenyu_v1.pdf is my proposal 
document. Chapter 2.1 describes how the state id is carried in my demo 
implementation. Could you please review my proposal?

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-21 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-13522:
---
Attachment: HDFS-13522_proposal_zhengchenyu_v1.pdf

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-07-21 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569363#comment-17569363
 ] 

zhengchenyu commented on HDFS-13522:


This is a critical issue, which is why there are so many comments. Let me 
summarize them. It seems there have been two solutions historically:

(A) Solution A was proposed by CR Hota and is described in RBF_ Observer support.pdf.
  This proposal was not implemented. It seems to require both a read-only router 
and a write router, so I feel the deployment is complex.
  Since no one has pursued it, let us set it aside for now.
(B) Solution B was proposed by Surendra Singh Lilhore and Hemanth Boyina.
  Later zhoubing zheng and Simbarashe Dzinamarira wanted to rebase it onto trunk.
  This solution is implemented, namely the HDFS-13522***.patch files.

I think the second solution meets most users' demands, but the router hides the 
state id from the client. I think the router should connect to the namenode with 
the client's state id, and I believe that is also what [~omalley] means. Here I 
name the proposal that carries the client's state id solution C.
Note: I had planned to implement solution C in another issue after 
[~simbadzina]'s work, but as [~omalley] has said, let us do all of it in this 
issue.

[~simbadzina], I proposed both solution B and solution C, and I have implemented 
a demo version. Can I work on it together with you?

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2022-07-04 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562354#comment-17562354
 ] 

zhengchenyu commented on HDFS-14703:


Is this ongoing? It is great work indeed. But I wonder why the design document 
and the fgl branch do not hold read locks on the ancestors. When writing /a/b/c, 
I think we need to hold read locks on /a and /a/b and then the write lock on 
/a/b/c. If some write operation on /a/b happens at the same time, the two may end 
up inconsistent.
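
A tiny sketch of the lock ordering suggested above, using one 
ReentrantReadWriteLock per path component: ancestors are read-locked and only the 
final component is write-locked. This is just to illustrate the concern, not the 
fgl branch's actual implementation, and all names are made up.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PathLockSketch {
  private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String path) {
    return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
  }

  /** Writing /a/b/c: read-lock /a and /a/b, then write-lock /a/b/c. */
  public void withWriteLock(String[] components, Runnable op) {
    Deque<Runnable> unlocks = new ArrayDeque<>();
    try {
      StringBuilder path = new StringBuilder();
      for (int i = 0; i < components.length; i++) {
        path.append('/').append(components[i]);
        ReentrantReadWriteLock lock = lockFor(path.toString());
        if (i < components.length - 1) {
          lock.readLock().lock();                 // ancestors: shared lock
          unlocks.push(lock.readLock()::unlock);
        } else {
          lock.writeLock().lock();                // target inode: exclusive lock
          unlocks.push(lock.writeLock()::unlock);
        }
      }
      op.run();
    } finally {
      while (!unlocks.isEmpty()) {
        unlocks.pop().run();                      // release in reverse order
      }
    }
  }
}
{code}

With this ordering, a concurrent writer on /a/b has to wait for its exclusive 
lock while /a/b/c is being modified, which is exactly the interleaving the 
comment above is worried about.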

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-4645) Move from randomly generated block ID to sequentially generated block ID

2022-06-27 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HDFS-4645:
-

Assignee: zhengchenyu  (was: Arpit Agarwal)

> Move from randomly generated block ID to sequentially generated block ID
> 
>
> Key: HDFS-4645
> URL: https://issues.apache.org/jira/browse/HDFS-4645
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Suresh Srinivas
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 2.1.0-beta
>
> Attachments: HDFS-4645.001.patch, HDFS-4645.002.patch, 
> HDFS-4645.003.patch, HDFS-4645.004.patch, HDFS-4645.005.patch, 
> HDFS-4645.006.patch, HDFS-4645.branch-2.patch, 
> SequentialblockIDallocation.pdf, editsStored
>
>
> Currently block IDs are randomly generated. This means there is no pattern to 
> block ID generation and no guarantees such as uniqueness of block ID for the 
> life time of the system can be made. I propose using SequentialNumber for 
> block ID generation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-08 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551940#comment-17551940
 ] 

zhengchenyu edited comment on HDFS-13522 at 6/9/22 3:52 AM:


[~simbadzina] The router does not use the client's AlignmentContext, so I think 
dfs.federation.router.observer.auto-msync-period must be set to 0; then there 
will be no problem.

If this value is too large, the router may not see the latest modifications. 
Because the router does not use the client's AlignmentContext, there is no way to 
make sure the router's current state id is newer than the client's state id.

Maybe we need to fix dfs.federation.router.observer.auto-msync-period to the 
value '0', or note this in the documentation.


was (Author: zhengchenyu):
[~simbadzina] Router does not use client's AlignmentContext, So I think  
dfs.federation.router.observer.auto-msync-period must set to 0, then there will 
be no problem.

If this value is too big, router may not catch the latest modification. Because 
Router does not use client's AlignmentContext, there is no way to make sure the 
current router stat id is newer than client state id.

Maybe we need to set dfs.federation.router.observer.auto-msync-period to fixed 
value '0', or note in document.

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-08 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551940#comment-17551940
 ] 

zhengchenyu commented on HDFS-13522:


[~simbadzina] The router does not use the client's AlignmentContext, so I think 
dfs.federation.router.observer.auto-msync-period must be set to 0; then there 
will be no problem.

If this value is too large, the router may not see the latest modifications. 
Because the router does not use the client's AlignmentContext, there is no way to 
make sure the router's current state id is newer than the client's state id.

Maybe we need to fix dfs.federation.router.observer.auto-msync-period to the 
value '0', or note this in the documentation.
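
For reference, a minimal sketch of pinning that property on the router side. It 
assumes, as the comment above does, that a value of 0 makes the router refresh 
its state id on every request instead of on a timer; the helper class itself is 
hypothetical.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RouterMsyncConfigSketch {
  public static Configuration routerConf() {
    Configuration conf = new Configuration();
    // Since the router does not reuse the client's AlignmentContext, keep the
    // auto-msync period at 0 so the router's view is refreshed per request
    // rather than on a periodic schedule.
    conf.set("dfs.federation.router.observer.auto-msync-period", "0");
    return conf;
  }
}
{code}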

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-05-26 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542746#comment-17542746
 ] 

zhengchenyu commented on HDFS-13522:


Hi [~simbadzina], I saw what you said, but maybe you did not catch my idea. I 
mean that I cannot configure this at the RPC-call level: class 
RouterStateIdContext only implements receiveRequestState.

When the router calls the namenode, it uses its own AlignmentContext rather than 
the client's, so it cannot meet different users' demands.

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-05-25 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541887#comment-17541887
 ] 

zhengchenyu commented on HDFS-13522:


Thanks for this good patch!

But I have a question. In this patch the router hides the client's state id, 
which I think means we cannot enable or disable this feature per client, and 
cannot configure it.

As far as I know, the Observer NameNode cannot guarantee absolute consistency, so 
users may need to configure auto-msync-period and some other parameters to meet 
their different demands.

I think it would be better to use the client's state id at the client RPC-call 
level. (Note: of course, this needs more changes, especially for mount points 
spanning multiple nameservices.)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-12-07 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Description: 
One of our Namenodes has 300M files and blocks. This namenode should not be under heavy load in general, but we found the RPC processing time stayed high and decommission was very slow.

Searching the metrics, I found under-replicated blocks stayed high. Then I jstacked the namenode and found 'InnerNode.getLoc' was the hot spot. I think chooseTarget cannot find a proper target, which results in the performance degradation. Considering HDFS-10453, I guess some logic triggers the scene where chooseTarget cannot find a proper target.

Then I enabled some debug logging. (Of course I revised some code so that only isGoodTarget logs at debug, because enabling BlockPlacementPolicy's debug log globally is dangerous.) I found "the rack has too many chosen nodes" was hit, and then I found logs like this
{code:java}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code}
Then through some debugging and simulation, I found the reason and reproduced this exception.

The reason is that some developers use the COLD storage policy together with the mover, but setting the storage policy and running the mover are asynchronous operations. So some files' real datanode storages do not match their storage policy.

Let me simulate this process. Suppose /tmp/a is created with 2 replicas on DISK, and then its storage policy is set to COLD. When some logic (for example decommission) triggers copying this block, chooseTarget uses chooseStorageTypes to filter the storages that are really needed. Here the size of the variable requiredStorageTypes returned by chooseStorageTypes is 3, but the size of result is 2: 3 means 3 ARCHIVE storages are needed, and 2 means the block already has 2 DISK storages. chooseTarget then tries to choose 3 targets. The first target is chosen correctly, but while choosing the second target the variable 'counter' is 4, which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode storage is skipped. This results in bad performance.

I think chooseStorageTypes needs to take result into account: when an existing replica does not meet the storage policy's demand, we need to remove it from result.

I changed the code this way and verified it in my unit test; that solves it.
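
To make the counting argument above concrete, here is a small self-contained toy model (not the NameNode code and not the attached patch): replicas whose storage type cannot satisfy the policy are dropped from result, so they no longer count as already-chosen targets.
{code:java}
// Toy model only: StorageType, result and required mirror the names used in the
// description above; the real logic lives in the block placement policy code.
import java.util.ArrayList;
import java.util.List;

public class ColdPolicyMismatchModel {
  enum StorageType { DISK, ARCHIVE }

  public static void main(String[] args) {
    // /tmp/a already has 2 DISK replicas, but the COLD policy wants 3 ARCHIVE replicas.
    List<StorageType> result = new ArrayList<>(List.of(StorageType.DISK, StorageType.DISK));
    List<StorageType> required =
        List.of(StorageType.ARCHIVE, StorageType.ARCHIVE, StorageType.ARCHIVE);

    // Before the fix: the 2 DISK replicas still count as chosen, so the placement policy
    // behaves as if extra replicas were in flight and isGoodTarget rejects candidates.
    System.out.println("counted as chosen before fix: " + result.size());

    // Proposed fix: replicas that cannot satisfy any required storage type are removed
    // from result, so only genuinely useful replicas are counted as chosen.
    result.removeIf(t -> !required.contains(t));
    System.out.println("counted as chosen after fix:  " + result.size());
  }
}
{code}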

  was:
One of our Namenode which has 300M files and blocks. This namenode shoud not be 
in heavy load generally. But we found rpc process time keep high, and 
decommission is very slow.

I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe chooseTarget 
can't find block, so result to performance degradation. Consider with 
HDFS-10453, I guess maybe some logical trigger to the scene where chooseTarget 
can't find proper block.

Then I enable some debug. (Of course I revise some code so that only debug 
isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). I 
found "the rack has too many chosen nodes" is called. Then I found some log 
like this
{code:java}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code}
Then through some debug and simulation, I found the reason, and reproduction 
this exception.

The reason is that some developer use COLD storage policy and mover, but the 
operatiosn of 

[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-12-07 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Description: 
One of our Namenodes has 300M files and blocks. This namenode should not be under heavy load in general, but we found the RPC processing time stayed high and decommission was very slow.

Searching the metrics, I found under-replicated blocks stayed high. Then I jstacked the namenode and found 'InnerNode.getLoc' was the hot spot. I think chooseTarget cannot find a proper target, which results in the performance degradation. Considering HDFS-10453, I guess some logic triggers the scene where chooseTarget cannot find a proper target.

Then I enabled some debug logging. (Of course I revised some code so that only isGoodTarget logs at debug, because enabling BlockPlacementPolicy's debug log globally is dangerous.) I found "the rack has too many chosen nodes" was hit, and then I found logs like this
{code:java}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code}
Then through some debugging and simulation, I found the reason and reproduced this exception.

The reason is that some developers use the COLD storage policy together with the mover, but setting the storage policy and running the mover are asynchronous operations. So some files' real datanode storages do not match their storage policy.

Let me simulate this process. Suppose /tmp/a is created with 2 replicas on DISK, and then its storage policy is set to COLD. When some logic (for example decommission) triggers copying this block, chooseTarget uses chooseStorageTypes to filter the storages that are really needed. Here the size of the variable requiredStorageTypes returned by chooseStorageTypes is 3, but the size of result is 2: 3 means 3 ARCHIVE storages are needed, and 2 means the block already has 2 DISK storages. chooseTarget then tries to choose 3 targets. The first target is chosen correctly, but while choosing the second target the variable 'counter' is 4, which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode storage is skipped. This results in bad performance.

I think chooseStorageTypes needs to take result into account: when an existing replica does not meet the storage policy's demand, we need to remove it from result.

I changed the code this way and verified it in my unit test; that solves it.

  was:
One of our Namenode which has 300M files and blocks. In common way, this namode 
shoud not be in heavy load. But we found rpc process time keep high, and 
decommission is very slow.
 
I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe chooseTarget 
can't find block, so result to performance degradation. Consider with 
HDFS-10453, I guess maybe some logical trigger to the scene where chooseTarget 
can't find proper block.

Then I enable some debug. (Of course I revise some code so that only debug 
isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). I 
found "the rack has too many chosen nodes" is called. Then I found some log 
like this 

{code}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code} 

Then through some debug and simulation, I found the reason, and reproduction 
this exception.

The reason is that some developer use COLD storage policy and mover, but the 
operatiosn of 

[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-15 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16070:
---
Description: 
When I sped up decommission, I found that some datanodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running.
Then I found logs like the ones below.
{code}
# datatransfer start log
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.16.50:9866
# datatransfer done log
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.16.50:9866
{code}
You can see that the previous DataTransfer finished at 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36.
If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block will be scheduled, so the disk and network become heavily loaded.

Note: decommissioning EC blocks triggers this problem easily, because every EC internal block is unique.
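
For illustration only, a tiny self-contained sketch (not the DataNode code and not the attached pull request) of one way to suppress the storm described above: remember which blocks already have an in-flight DataTransfer and skip re-issuing them until the previous transfer finishes.
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TransferDeduper {
  // Blocks that currently have a running DataTransfer thread on this datanode.
  private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

  // Returns true only if no transfer of this block is already running, i.e. the
  // caller should really start a new transfer thread.
  public boolean tryStart(String blockId) {
    return inFlight.add(blockId);
  }

  // Call when the transfer completes or fails so the block can be scheduled again.
  public void finished(String blockId) {
    inFlight.remove(blockId);
  }
}
{code}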


  was:
When I speed up the decommission, I found that some datanode's io is busy, then 
I found host's load is very high, and ten thousands data transfer thread are 
running. 
Then I find log like below.
{code}
# 启动线程的日志
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.16.50:9866
# 发送完成的标记
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.16.50:9866
{code}
You will see last datatranfser thread was done on 13:54:08, but next 
datatranfser was start at 13:52:36. 
If datatranfser was not done in 10min(pending timeout + check interval), then 
next datatranfser 

[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-14 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-16070:
---
Description: 
When I sped up decommission, I found that some datanodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running.
Then I found logs like the ones below.
{code}
# datatransfer start log
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.16.50:9866
# datatransfer done log
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.16.50:9866
{code}
You can see that the previous DataTransfer finished at 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36.
If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block will be scheduled, so the disk and network become heavily loaded.

Note: decommissioning EC blocks triggers this problem easily, because every EC internal block is unique.


  was:
When I speed up the decommission, I found that some datanode's io is busy, then 
I found host's load is very high, and ten thousands data transfer thread are 
running. 
Then I find log like below.
{code}
# 启动线程的日志
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.16.50:9866
# 发送完成的标记
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.16.50:9866
{code}
You will see last datatranfser thread was done on 13:54:08, but next 
datatranfser was start at 13:52:36. 
If datatranfser was not done in 10min(pending timeout + check interval), then 
next datatranfser for same block will be 

[jira] [Commented] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-14 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363358#comment-17363358
 ] 

zhengchenyu commented on HDFS-16070:


[~ayushsaxena][~inigoiri] I have submitted a pull request; can you help me review this patch?

> DataTransfer block storm when datanode's io is busy.
> 
>
> Key: HDFS-16070
> URL: https://issues.apache.org/jira/browse/HDFS-16070
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.2.1
>Reporter: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I speed up the decommission, I found that some datanode's io is busy, 
> then I found host's load is very high, and ten thousands data transfer thread 
> are running. 
> Then I find log like below.
> {code}
> # datatransfer start log
> 2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.52:9866
> 2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.7.31:9866
> 2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.201.4.49:9866, 
> datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
>  Starting thread to transfer 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
> 10.201.16.50:9866
> # datatransfer done log
> 2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.7.52:9866
> 2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
> BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
> (numBytes=7457424) to /10.201.16.50:9866
> {code}
> You will see last datatranfser thread was done on 13:54:08, but next 
> datatranfser was start at 13:52:36. 
> If datatranfser was not done in 10min(pending timeout + check interval), then 
> next datatranfser for same block will be running. Then disk and network are 
> heavy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

2021-06-14 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-16070:
--

 Summary: DataTransfer block storm when datanode's io is busy.
 Key: HDFS-16070
 URL: https://issues.apache.org/jira/browse/HDFS-16070
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.2.1, 3.3.0
Reporter: zhengchenyu


When I sped up decommission, I found that some datanodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running.
Then I found logs like the ones below.
{code}
# datatransfer start log
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.16.50:9866
# datatransfer done log
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.16.50:9866
{code}
You can see that the previous DataTransfer finished at 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36.
If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block will be scheduled, so the disk and network become heavily loaded.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning

2021-05-26 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-14849:
---
Description: 
When the datanode stays in DECOMMISSION_INPROGRESS status, the EC internal blocks on that datanode will be replicated many times.

// added 2019/09/19
I reproduced this scenario on a 163-node cluster while decommissioning 100 nodes simultaneously.
 !scheduleReconstruction.png! 

 !fsck-file.png! 

  was:
When the datanode keeping in DECOMMISSION_INPROGRESS status, the EC internal 
block in that datanode will be replicated many times.

// added 2019/09/19
I reproduced this scenario in a 163 nodes cluster with decommission 100 nodes 
simultaneously. 
 !scheduleReconstruction.png! 

 !fsck-file.png! 


> Erasure Coding: the internal block is replicated many times when datanode is 
> decommissioning
> 
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
>  Labels: EC, HDFS, NameNode
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, 
> HDFS-14849.branch-3.1.patch, fsck-file.png, liveBlockIndices.png, 
> scheduleReconstruction.png
>
>
> colored textWhen the datanode keeping in DECOMMISSION_INPROGRESS status, the 
> EC internal block in that datanode will be replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163 nodes cluster with decommission 100 nodes 
> simultaneously. 
>  !scheduleReconstruction.png! 
>  !fsck-file.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15294) Federation balance tool

2021-05-19 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347493#comment-17347493
 ] 

zhengchenyu commented on HDFS-15294:


Thanks for this great work! But I have a question: if the source directory is being written to all the time, does that mean the federation balance job will never exit?

In our cluster we have a similar tool. We used "distcp with snapshot diff" at first, but gave it up. Then I used a mount point with multiple destination nameservices, wrote new data to the dst nameservice, and copied the existing source data to dst. That left only one issue, keeping the data consistent, so I submitted HDFS-15750.


> Federation balance tool
> ---
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch, 
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch, 
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch, 
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf
>
>
> This jira introduces a new HDFS federation balance tool to balance data 
> across different federation namespaces. It uses Distcp to copy data from the 
> source path to the target path.
> The process is:
>  1. Use distcp and snapshot diff to sync data between src and dst until they 
> are the same.
>  2. Update mount table in Router if we specified RBF mode.
>  3. Deal with src data, move to trash, delete or skip them.
> The design of fedbalance tool comes from the discussion in HDFS-15087.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-03-25 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Attachment: (was: image-2021-02-25-14-41-49-394.png)

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum, image-2021-03-26-12-17-45-500.png
>
>
> One of our Namenode which has 300M files and blocks. In common way, this 
> namode shoud not be in heavy load. But we found rpc process time keep high, 
> and decommission is very slow.
>  
> I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
> namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe 
> chooseTarget can't find block, so result to performance degradation. Consider 
> with HDFS-10453, I guess maybe some logical trigger to the scene where 
> chooseTarget can't find proper block.
> Then I enable some debug. (Of course I revise some code so that only debug 
> isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). 
> I found "the rack has too many chosen nodes" is called. Then I found some log 
> like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debug and simulation, I found the reason, and reproduction 
> this exception.
> The reason is that some developer use COLD storage policy and mover, but the 
> operatiosn of setting storage policy and mover are asynchronous. So some 
> file's real  datanodestorages are not match with this storagePolicy.
> Let me simualte this proccess. If /tmp/a is create, then have 2 replications 
> are DISK. Then set storage policy to COLD. When some logical trigger(For 
> example decommission) to copy this block. chooseTarget then use 
> chooseStorageTypes to filter real needed block. Here the size of variable 
> requiredStorageTypes which chooseStorageTypes returned is 3. But the size of  
> result is 2. But 3 means need 3 ARCHIVE storage. 2 means bocks has 2 DISK 
> storage. Then will request to choose 3 target. choose first target is right, 
> but when choose seconde target, the variable 'counter' is 4 which is larger 
> than maxTargetPerRack which is 3 in function isGoodTarget. So skip all 
> datanodestorage. Then result to bad performance.
> I think chooseStorageTypes need to consider the result, when the exist 
> replication doesn't meet storage policy's demand, we need to remove this from 
> result. 
> I changed by this way, and test in my unit-test. Then solve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-03-25 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290727#comment-17290727
 ] 

zhengchenyu edited comment on HDFS-15715 at 3/26/21, 4:17 AM:
--

[~hexiaoqiao]

Yeah, no problem. 

Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How did I find it?

Due to limited space and my poor English, I will describe the analysis procedure briefly.

(a) decommission is very slow

 !image-2021-03-26-12-17-45-500.png! 
When datanodes are being decommissioned, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is overloaded.

 

(b) strange logs from the NameNode

We can guess that some code path in chooseTarget is not behaving rationally.
{code:java}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy\{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy

{code}
 

(c) statistics over many stack traces

Collecting and comparing many stack dumps, I found the hot code in the jstack below.
{code:java}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844"
 #34 daemon prio=5 os_prio=0 tid=0x7f772e03a800 nid=0x6288f runnable 
[0x7f4507c0f000]
 java.lang.Thread.State: RUNNABLE
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
 at 
org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
 at 
org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
 at java.lang.Thread.run(Thread.java:748)

{code}
(d) continue with debug logs enabled

After enabling some debug logs, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total count of this log was close to the cluster's DataNodeStorage count. We can guess the hit rate of chooseTarget is very low.

Then I used a unit test to reproduce this problem.

 

2. How to fix this problem?

I have reproduced this case on the trunk branch.

I submitted HDFS-15715.002.patch. This patch does not fix the bug itself, but its TestReplicationPolicyWithMultiStorage test can reproduce it: when the bug is triggered, many logs like "is not chosen since the rack has too many chosen nodes." are printed.

Then, after applying HDFS-15715.002.patch.addendum, the bug is fixed and UnderReplicatedBlocks declines normally.

 

 


was (Author: zhengchenyu):
[~hexiaoqiao]

Yeah, no problem. 

Note: I found this problem in cluster which version is hadoop-2.7.3, but all 
version may trigger this bug. So I submit a patch base on trunk.

1. How to found it ?

Due to limited length and my poor English, I will describe the analysis 
procedure simply.

(a) demmission is very slow

 


[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-02-24 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290727#comment-17290727
 ] 

zhengchenyu edited comment on HDFS-15715 at 2/25/21, 6:55 AM:
--

[~hexiaoqiao]

Yeah, no problem. 

Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How did I find it?

Due to limited space and my poor English, I will describe the analysis procedure briefly.

(a) decommission is very slow

!image-2021-02-25-14-41-49-394.png|width=378,height=155!

When datanodes are being decommissioned, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is overloaded.

 

(b) strange logs from the NameNode

We can guess that some code path in chooseTarget is not behaving rationally.
{code:java}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy\{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy

{code}
 

(c) statistics over many stack traces

Collecting and comparing many stack dumps, I found the hot code in the jstack below.
{code:java}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844"
 #34 daemon prio=5 os_prio=0 tid=0x7f772e03a800 nid=0x6288f runnable 
[0x7f4507c0f000]
 java.lang.Thread.State: RUNNABLE
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
 at 
org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
 at 
org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
 at java.lang.Thread.run(Thread.java:748)

{code}
(d) continue with debug logs enabled

After enabling some debug logs, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total count of this log was close to the cluster's DataNodeStorage count. We can guess the hit rate of chooseTarget is very low.

Then I used a unit test to reproduce this problem.

 

2. How to fix this problem?

I have reproduced this case on the trunk branch.

I submitted HDFS-15715.002.patch. This patch does not fix the bug itself, but its TestReplicationPolicyWithMultiStorage test can reproduce it: when the bug is triggered, many logs like "is not chosen since the rack has too many chosen nodes." are printed.

Then, after applying HDFS-15715.002.patch.addendum, the bug is fixed and UnderReplicatedBlocks declines normally.

 

 


was (Author: zhengchenyu):
[~hexiaoqiao]

Yeah, no problem. 

Note: I found this problem in cluster which version is hadoop-2.7.3, but all 
version may trigger this bug. So I submit a patch base on trunk.

1. How to found it ?

Due to limited length and my poor English, I will describe the analysis 
procedure simply.

(a) demmission is very slow

 


[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-02-24 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290727#comment-17290727
 ] 

zhengchenyu commented on HDFS-15715:


[~hexiaoqiao]

Yeah, no problem. 

Note: I found this problem on a cluster running hadoop-2.7.3, but any version may trigger this bug, so I submitted a patch based on trunk.

1. How did I find it?

Due to limited space and my poor English, I will describe the analysis procedure briefly.

(a) decommission is very slow

!image-2021-02-25-14-41-49-394.png|width=378,height=155!

When datanodes are being decommissioned, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is overloaded.

 

(b) strange logs from the NameNode

We can guess that some code path in chooseTarget is not behaving rationally.

{code}

2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy\{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy

{code}

 

(c) statistics over many stack traces

Collecting and comparing many stack dumps, I found the hot code in the jstack below.

Since this is a performance problem, we collected as much jstack information as possible. By analyzing several sets of thread stack dumps, we found the following call stack appears abnormally often:

{code}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844"
 #34 daemon prio=5 os_prio=0 tid=0x7f772e03a800 nid=0x6288f runnable 
[0x7f4507c0f000]
 java.lang.Thread.State: RUNNABLE
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
 at 
org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
 at 
org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
 at java.lang.Thread.run(Thread.java:748)

{code}

(d) continue with debug logs enabled

After enabling some debug logs, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total count of this log was close to the cluster's DataNodeStorage count. We could

[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-02-24 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Attachment: image-2021-02-25-14-41-49-394.png

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum, image-2021-02-25-14-41-49-394.png
>
>
> One of our Namenode which has 300M files and blocks. In common way, this 
> namode shoud not be in heavy load. But we found rpc process time keep high, 
> and decommission is very slow.
>  
> I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
> namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe 
> chooseTarget can't find block, so result to performance degradation. Consider 
> with HDFS-10453, I guess maybe some logical trigger to the scene where 
> chooseTarget can't find proper block.
> Then I enable some debug. (Of course I revise some code so that only debug 
> isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). 
> I found "the rack has too many chosen nodes" is called. Then I found some log 
> like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debug and simulation, I found the reason, and reproduction 
> this exception.
> The reason is that some developer use COLD storage policy and mover, but the 
> operatiosn of setting storage policy and mover are asynchronous. So some 
> file's real  datanodestorages are not match with this storagePolicy.
> Let me simualte this proccess. If /tmp/a is create, then have 2 replications 
> are DISK. Then set storage policy to COLD. When some logical trigger(For 
> example decommission) to copy this block. chooseTarget then use 
> chooseStorageTypes to filter real needed block. Here the size of variable 
> requiredStorageTypes which chooseStorageTypes returned is 3. But the size of  
> result is 2. But 3 means need 3 ARCHIVE storage. 2 means bocks has 2 DISK 
> storage. Then will request to choose 3 target. choose first target is right, 
> but when choose seconde target, the variable 'counter' is 4 which is larger 
> than maxTargetPerRack which is 3 in function isGoodTarget. So skip all 
> datanodestorage. Then result to bad performance.
> I think chooseStorageTypes need to consider the result, when the exist 
> replication doesn't meet storage policy's demand, we need to remove this from 
> result. 
> I changed by this way, and test in my unit-test. Then solve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-02-24 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289885#comment-17289885
 ] 

zhengchenyu commented on HDFS-15715:


I think it's a critical bug when it is triggered. We have encountered it several times.

[~hexiaoqiao] I think this issue is similar to HDFS-1045 which you submitted before.

[~ayushsaxena] [~goirix] [~hexiaoqiao] Can you help me review this patch, or give me some suggestions?

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum
>
>
> One of our Namenodes has 300M files and blocks. Normally this namenode should 
> not be under heavy load, but we found that RPC processing time stayed high and 
> decommissioning was very slow.
>  
> Searching the metrics, I found that the number of under-replicated blocks 
> stayed high. Then I jstacked the namenode and found that 'InnerNode.getLoc' 
> was a hot spot. I think chooseTarget may fail to find a proper target, which 
> leads to the performance degradation. Considering HDFS-10453, I guess some 
> logic triggers the scenario where chooseTarget can't find a proper target.
> Then I enabled some debug logging. (Of course I revised the code so that only 
> isGoodTarget is debugged, because enabling all of BlockPlacementPolicy's debug 
> logging is dangerous.) I found that "the rack has too many chosen nodes" was 
> hit. Then I found logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then, through some debugging and simulation, I found the reason and reproduced 
> this exception.
> The reason is that someone uses the COLD storage policy and the mover, but 
> setting the storage policy and running the mover are asynchronous operations, 
> so some files' real datanode storages do not match the storage policy.
> Let me simulate this process. Suppose /tmp/a is created and has 2 replicas on 
> DISK. Then the storage policy is set to COLD. When some logic (for example, 
> decommission) triggers copying this block, chooseTarget uses chooseStorageTypes 
> to work out which storages are really needed. Here the size of the variable 
> requiredStorageTypes returned by chooseStorageTypes is 3, but the size of 
> result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block 
> already has 2 DISK storages. So 3 targets are requested. Choosing the first 
> target works, but when choosing the second target the variable 'counter' is 4, 
> which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode 
> storage is skipped. The result is bad performance.
> I think chooseStorageTypes needs to take result into account: when an existing 
> replica doesn't meet the storage policy's demand, we need to remove it from 
> result.
> I changed it this way and tested it in my unit test, which solves the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-02-24 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289881#comment-17289881
 ] 

zhengchenyu commented on HDFS-15715:


I reconstructed the code; the patch has two parts:

(1) Reconstruct BlockStoragePolicy

   Some methods of BlockStoragePolicy are only used in the hadoop-hdfs module, 
but BlockStoragePolicy lives in the hadoop-hdfs-client module. So I moved those 
methods to BlockStoragePolicyUtils, which is created in hadoop-hdfs.

   This code is in HDFS-15715.002.patch.

(2) Fix the code so that entries in the variable 'chosen' in chooseTarget are 
removed when the block's real storage types do not match the expected 
StorageTypes.

  This code is in HDFS-15715.002.patch.addendum.

 

I also fixed the unit test. Without HDFS-15715.002.patch.addendum, the expected 
log "is not chosen since the rack has too many chosen nodes." is printed and 
all datanodes are traversed. 

After applying HDFS-15715.002.patch.addendum, the unit test chooses targets 
normally, with no excess traversal.
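To make the intent of part (2) concrete, here is a minimal, simplified sketch of the pruning idea, written against toy types rather than the real BlockPlacementPolicyDefault/DatanodeStorageInfo classes; the class, method, and list names are hypothetical and only illustrate the counting problem and the fix described above, not the actual patch.

{code}
// Hypothetical, simplified model of the idea in part (2): before choosing new
// targets, drop already-stored replicas whose storage type does not satisfy the
// policy, so they neither reduce the number of required new storages nor count
// toward per-rack limits. Names here are illustrative, not the real HDFS classes.
import java.util.ArrayList;
import java.util.List;

public class PruneMismatchedChosenSketch {

  enum StorageType { DISK, ARCHIVE }

  /**
   * Removes from 'chosen' every storage whose type is not wanted by the policy,
   * consuming one entry of the required list for each storage that is kept.
   * Returns the storage types that still have to be allocated.
   */
  static List<StorageType> pruneChosen(List<StorageType> required,
                                       List<StorageType> chosen) {
    List<StorageType> stillNeeded = new ArrayList<>(required);
    chosen.removeIf(existing -> {
      if (stillNeeded.remove(existing)) {
        return false;          // matches the policy, keep it as a chosen node
      }
      return true;             // wrong type (e.g. DISK under COLD), drop it
    });
    return stillNeeded;
  }

  public static void main(String[] args) {
    // COLD policy with replication 3 wants 3 ARCHIVE storages.
    List<StorageType> required = List.of(
        StorageType.ARCHIVE, StorageType.ARCHIVE, StorageType.ARCHIVE);
    // The block currently has 2 DISK replicas that do not satisfy COLD.
    List<StorageType> chosen = new ArrayList<>(
        List.of(StorageType.DISK, StorageType.DISK));

    List<StorageType> toAllocate = pruneChosen(required, chosen);

    // 3 ARCHIVE targets are still needed, and the stale DISK replicas no longer
    // inflate the per-rack counter used by isGoodTarget.
    System.out.println("still needed = " + toAllocate);   // [ARCHIVE, ARCHIVE, ARCHIVE]
    System.out.println("chosen after prune = " + chosen); // []
  }
}
{code}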

 

 

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum
>
>
> One of our Namenodes has 300M files and blocks. Normally this namenode should 
> not be under heavy load, but we found that RPC processing time stayed high and 
> decommissioning was very slow.
>  
> Searching the metrics, I found that the number of under-replicated blocks 
> stayed high. Then I jstacked the namenode and found that 'InnerNode.getLoc' 
> was a hot spot. I think chooseTarget may fail to find a proper target, which 
> leads to the performance degradation. Considering HDFS-10453, I guess some 
> logic triggers the scenario where chooseTarget can't find a proper target.
> Then I enabled some debug logging. (Of course I revised the code so that only 
> isGoodTarget is debugged, because enabling all of BlockPlacementPolicy's debug 
> logging is dangerous.) I found that "the rack has too many chosen nodes" was 
> hit. Then I found logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then, through some debugging and simulation, I found the reason and reproduced 
> this exception.
> The reason is that someone uses the COLD storage policy and the mover, but 
> setting the storage policy and running the mover are asynchronous operations, 
> so some files' real datanode storages do not match the storage policy.
> Let me simulate this process. Suppose /tmp/a is created and has 2 replicas on 
> DISK. Then the storage policy is set to COLD. When some logic (for example, 
> decommission) triggers copying this block, chooseTarget uses chooseStorageTypes 
> to work out which storages are really needed. Here the size of the variable 
> requiredStorageTypes returned by chooseStorageTypes is 3, but the size of 
> result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block 
> already has 2 DISK storages. So 3 targets are requested. Choosing the first 
> target works, but when choosing the second target the variable 'counter' is 4, 
> which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode 
> storage is skipped. The result is bad performance.
> I think chooseStorageTypes needs to take result into account: when an existing 
> replica doesn't meet the storage policy's demand, we need to remove it from 
> result.
> I changed it this way and tested it in my unit test, which solves the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, 

[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-02-24 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Attachment: HDFS-15715.002.patch

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum
>
>
> One of our Namenodes has 300M files and blocks. Normally this namenode should 
> not be under heavy load, but we found that RPC processing time stayed high and 
> decommissioning was very slow.
>  
> Searching the metrics, I found that the number of under-replicated blocks 
> stayed high. Then I jstacked the namenode and found that 'InnerNode.getLoc' 
> was a hot spot. I think chooseTarget may fail to find a proper target, which 
> leads to the performance degradation. Considering HDFS-10453, I guess some 
> logic triggers the scenario where chooseTarget can't find a proper target.
> Then I enabled some debug logging. (Of course I revised the code so that only 
> isGoodTarget is debugged, because enabling all of BlockPlacementPolicy's debug 
> logging is dangerous.) I found that "the rack has too many chosen nodes" was 
> hit. Then I found logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then, through some debugging and simulation, I found the reason and reproduced 
> this exception.
> The reason is that someone uses the COLD storage policy and the mover, but 
> setting the storage policy and running the mover are asynchronous operations, 
> so some files' real datanode storages do not match the storage policy.
> Let me simulate this process. Suppose /tmp/a is created and has 2 replicas on 
> DISK. Then the storage policy is set to COLD. When some logic (for example, 
> decommission) triggers copying this block, chooseTarget uses chooseStorageTypes 
> to work out which storages are really needed. Here the size of the variable 
> requiredStorageTypes returned by chooseStorageTypes is 3, but the size of 
> result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block 
> already has 2 DISK storages. So 3 targets are requested. Choosing the first 
> target works, but when choosing the second target the variable 'counter' is 4, 
> which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode 
> storage is skipped. The result is bad performance.
> I think chooseStorageTypes needs to take result into account: when an existing 
> replica doesn't meet the storage policy's demand, we need to remove it from 
> result.
> I changed it this way and tested it in my unit test, which solves the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-02-24 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Attachment: HDFS-15715.002.patch.addendum

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum
>
>
> One of our Namenodes has 300M files and blocks. Normally this namenode should 
> not be under heavy load, but we found that RPC processing time stayed high and 
> decommissioning was very slow.
>  
> Searching the metrics, I found that the number of under-replicated blocks 
> stayed high. Then I jstacked the namenode and found that 'InnerNode.getLoc' 
> was a hot spot. I think chooseTarget may fail to find a proper target, which 
> leads to the performance degradation. Considering HDFS-10453, I guess some 
> logic triggers the scenario where chooseTarget can't find a proper target.
> Then I enabled some debug logging. (Of course I revised the code so that only 
> isGoodTarget is debugged, because enabling all of BlockPlacementPolicy's debug 
> logging is dangerous.) I found that "the rack has too many chosen nodes" was 
> hit. Then I found logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then, through some debugging and simulation, I found the reason and reproduced 
> this exception.
> The reason is that someone uses the COLD storage policy and the mover, but 
> setting the storage policy and running the mover are asynchronous operations, 
> so some files' real datanode storages do not match the storage policy.
> Let me simulate this process. Suppose /tmp/a is created and has 2 replicas on 
> DISK. Then the storage policy is set to COLD. When some logic (for example, 
> decommission) triggers copying this block, chooseTarget uses chooseStorageTypes 
> to work out which storages are really needed. Here the size of the variable 
> requiredStorageTypes returned by chooseStorageTypes is 3, but the size of 
> result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block 
> already has 2 DISK storages. So 3 targets are requested. Choosing the first 
> target works, but when choosing the second target the variable 'counter' is 4, 
> which is larger than maxTargetPerRack (3) in isGoodTarget, so every datanode 
> storage is skipped. The result is bad performance.
> I think chooseStorageTypes needs to take result into account: when an existing 
> replica doesn't meet the storage policy's demand, we need to remove it from 
> result.
> I changed it this way and tested it in my unit test, which solves the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2021-02-22 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15750:
---
Attachment: HDFS-15750.001.patch

> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15750.001.patch
>
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
> when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
> rewriting some Hive table's old partitions (which mount multiple 
> destinations), this problem occurs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2021-02-22 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15750:
---
Attachment: (was: HDFS-15750.001.patch)

> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15750.001.patch
>
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
> when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
> rewriting some Hive table's old partitions (which mount multiple 
> destinations), this problem occurs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2021-02-22 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15750:
---
Attachment: HDFS-15750.001.patch

> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15750.001.patch
>
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
> when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
> rewriting some Hive table's old partitions (which mount multiple 
> destinations), this problem occurs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2021-02-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288350#comment-17288350
 ] 

zhengchenyu commented on HDFS-15750:


In our test cluster I modified our code to make requireResponse work at the 
location level, and I set each location's requireResponse according to the 
DestinationOrder. 

I think that for strict DestinationOrders requireResponse should be set so 
that, if even one nameservice operation fails, the whole operation throws an 
exception; without that we can't make sure the multiple clusters stay 
consistent.

I have submitted the first version of the patch. Please give me some 
suggestions.
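As a rough illustration only (not the actual patch), the sketch below shows the kind of stricter per-destination check described above; RemoteOp, the nameservice names, and the strictness flag are hypothetical stand-ins for the router's invokeConcurrent/invokeAll logic.

{code}
// Hypothetical sketch of a stricter multi-destination check: the operation only
// counts as successful when every destination that requires a response succeeded.
// RemoteOp, Destination handling, and the strictness rule are illustrative
// assumptions, not the real RBF classes.
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StrictMultiDestinationSketch {

  interface RemoteOp {            // stands in for the real per-namenode RPC
    boolean invoke(String nameservice) throws IOException;
  }

  /**
   * Invokes 'op' on every nameservice. If 'requireAllResponses' is set (e.g.
   * for strict DestinationOrders such as HASH_ALL), any failure aborts the
   * whole operation with an exception instead of being silently ignored.
   */
  static boolean invokeOnAll(List<String> nameservices, RemoteOp op,
                             boolean requireAllResponses) throws IOException {
    Map<String, Exception> failures = new LinkedHashMap<>();
    boolean anySuccess = false;
    for (String ns : nameservices) {
      try {
        anySuccess |= op.invoke(ns);
      } catch (IOException e) {
        failures.put(ns, e);
      }
    }
    if (requireAllResponses && !failures.isEmpty()) {
      throw new IOException("Operation failed on " + failures.keySet()
          + "; destinations may now be inconsistent");
    }
    return anySuccess;            // legacy behaviour: one success is enough
  }

  public static void main(String[] args) {
    // ns2 fails (e.g. different permissions or namenode down): with the strict
    // check the caller sees an exception instead of a misleading "success".
    RemoteOp delete = ns -> {
      if (ns.equals("ns2")) {
        throw new IOException("Permission denied on " + ns);
      }
      return true;
    };
    try {
      invokeOnAll(List.of("ns1", "ns2"), delete, true);
    } catch (IOException e) {
      System.out.println("strict check surfaced the failure: " + e.getMessage());
    }
  }
}
{code}

The point of this design is that a partial failure surfaces as an exception instead of being folded into a misleading overall success.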

 

> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15750.001.patch
>
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
> when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
> rewriting some Hive table's old partitions (which mount multiple 
> destinations), this problem occurs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2021-02-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288313#comment-17288313
 ] 

zhengchenyu commented on HDFS-15750:


[~ayushsaxena]

I think we need a stricter check. Let's consider this situation.

Suppose we have an HDFS cluster in which the router proxies ns1 and ns2. 

If ns1 is under heavy load, we want to copy the data hdfs://ns1/user/userA from 
ns1 to hdfs://ns2/user/userA to lower the pressure, and we want users to be 
unaware of the data migration.

Here is my solution: 

A mount point hdfs://ns-fed/user/userA mounts hdfs://ns1/user/userA and 
hdfs://ns2/user/userA. I set a preferred nameservice, which is used first.

At first the preferred ns is ns1, so HDFS clients only use 
hdfs://ns1/user/userA through the router. Then we copy all the data to 
hdfs://ns2/user/userA.

When all the data has been copied, we switch the preference to ns2, and HDFS 
clients use hdfs://ns2/user/userA through the router.

But in our cluster many Hive table partitions are rerun, so 
hdfs://ns-fed/user/userA/tableA/pt=XXX/XXX needs to be deleted. Because the 
copy may happen before the Hive table is rewritten, some HDFS files are not 
deleted from the router's view. The real reason is that we regard the operation 
as successful even if only one rename operation succeeds. 

> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
> when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
> rewriting some Hive table's old partitions (which mount multiple 
> destinations), this problem occurs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2021-02-22 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HDFS-15750:
--

Assignee: zhengchenyu

> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
> when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
> rewriting some Hive table's old partitions (which mount multiple 
> destinations), this problem occurs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters

2021-02-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288235#comment-17288235
 ] 

zhengchenyu commented on HDFS-14343:


[~ayushtkn] I think we should discuss this in another issue, HDFS-15750.

I know there is no problem if all HDFS clients access the namenode through the 
router. But there are situations where the namenode is accessed without the 
router. 

For example, we migrate data from one nameservice to another in the background, 
with both nameservices managed by the same router, in order to lower the 
pressure on the source namenode.


> RBF: Fix renaming folders spread across multiple subclusters
> 
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, 
> HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, 
> HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming 
> files and only renames one of them (i.e., {{invokeSequential()}}). In the 
> case of folders which are in all subclusters (e.g., HASH_ALL) we should 
> rename all locations (i.e., {{invokeAll()}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2020-12-24 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15750:
---
Description: 
Currently, RBF can't make sure the multiple destinations are consistent. 

Case 1: RBF can't remove a file that exists in multiple destinations. 
Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency. 
HDFS-14343 already solves the problem to some level, but not completely. 


Case 2: RBF regards the operation as successful even though only one of the 
multiple destination operations succeeds.
In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: 
dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is not the 
same as hdfs://ns2/user/userA/dirA's permission, or one of the namenodes is 
down, then as long as one destination's result is success, RBF regards the 
whole operation as successful (invokeConcurrent and invokeAll's logic). We may 
not be able to rename all locations. I think that also means inconsistency.

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception.

Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
rewriting some Hive table's old partitions (which mount multiple destinations), 
this problem occurs!


  was:
Currently, RBF can't make sure the multiple destinations are consistent. 

Case 1: RBF can't remove a file that exists in multiple destinations. 
Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency. 
HDFS-14343 already solves the problem to some level, but not completely. 
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

Case 2: RBF regards the operation as successful even though only one of the 
multiple destination operations succeeds.
In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: 
dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is not the 
same as hdfs://ns2/user/userA/dirA's permission, or one of the namenodes is 
down, then as long as one destination's result is success, RBF regards the 
whole operation as successful (invokeConcurrent and invokeAll's logic). We may 
not be able to rename all locations. I think that also means inconsistency.

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception.


> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Priority: Major
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.
> Note: in fact, if we only use hdfs://ns-fed, I think there is no problem. But 
> when migrating data from ns1 to ns2 (a mount point mounts ns1 and ns2) and 
> rewriting some Hive table's old partitions (which mount multiple 
> destinations), this problem occurs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2020-12-24 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15750:
---
Description: 
Currently, RBF can't make sure the multiple destinations are consistent. 

Case 1: RBF can't remove a file that exists in multiple destinations. 
Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency. 
HDFS-14343 already solves the problem to some level, but not completely. 
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

Case 2: RBF regards the operation as successful even though only one of the 
multiple destination operations succeeds.
In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA (note: 
dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is not the 
same as hdfs://ns2/user/userA/dirA's permission, or one of the namenodes is 
down, then as long as one destination's result is success, RBF regards the 
whole operation as successful (invokeConcurrent and invokeAll's logic). We may 
not be able to rename all locations. I think that also means inconsistency.

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception.

  was:
Currently, RBF can't make sure the multiple destinations are consistent. 

Case 1: RBF can't remove a file that exists in multiple destinations. 
Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency. 
HDFS-14343 already solves the problem to some level, but not completely. 
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

Case 2: 
In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, 
and only one result is success, we may not be able to rename all locations. I 
think that also means inconsistency.

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception.






> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Priority: Major
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> (Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts 
> ns1 and ns2) and rewriting some Hive table's old partitions, this problem 
> occurs!)
> Case 2: RBF regards the operation as successful even though only one of the 
> multiple destination operations succeeds.
> In other words, suppose we want to delete hdfs://ns-fed/user/userA/dirA 
> (note: dirA is a directory). If hdfs://ns1/user/userA/dirA's permission is 
> not the same as hdfs://ns2/user/userA/dirA's permission, or one of the 
> namenodes is down, then as long as one destination's result is success, RBF 
> regards the whole operation as successful (invokeConcurrent and invokeAll's 
> logic). We may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2020-12-24 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15750:
---
Description: 
Currently, RBF can't make sure the multiple destinations are consistent. 

Case 1: RBF can't remove a file that exists in multiple destinations. 
Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency. 
HDFS-14343 already solves the problem to some level, but not completely. 
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

Case 2: 
In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, 
and only one result is success, we may not be able to rename all locations. I 
think that also means inconsistency.

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception.





  was:
Currently, RBF can't make sure the destinations are consistent. 

Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency.
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, we 
may not be able to rename all locations. I think that also means inconsistency.

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception.


> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Priority: Major
>
> Currently, RBF can't make sure the multiple destinations are consistent. 
> Case 1: RBF can't remove a file that exists in multiple destinations. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency. HDFS-14343 already solves the problem to some level, but not 
> completely. 
> (Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts 
> ns1 and ns2) and rewriting some Hive table's old partitions, this problem 
> occurs!)
> Case 2: 
> In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
> directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
> hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, 
> and only one result is success, we may not be able to rename all locations. 
> I think that also means inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2020-12-24 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15750:
---
Description: 
Currently, RBF can't make sure the destinations are consistent. 

Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency.
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, we 
may not be able to rename all locations. I think that also means inconsistency.

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception.

> RBF: Make sure the multi destination are consistent after write operation 
> --
>
> Key: HDFS-15750
> URL: https://issues.apache.org/jira/browse/HDFS-15750
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhengchenyu
>Priority: Major
>
> Currently, RBF can't make sure the destinations are consistent. 
> Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
> ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
> exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a 
> file) to trash, only one nameservice takes effect. I think that means 
> inconsistency.
> (Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts 
> ns1 and ns2) and rewriting some Hive table's old partitions, this problem 
> occurs!)
> In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
> directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
> hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, 
> we may not be able to rename all locations. I think that also means 
> inconsistency.
> I think we need a stricter check. If one operation (which should succeed) 
> fails, we should throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters

2020-12-23 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254391#comment-17254391
 ] 

zhengchenyu commented on HDFS-14343:


[~elgoiri] OK, Let us discuss this issue in HDFS-15750. I will describe the 
detailed later.

> RBF: Fix renaming folders spread across multiple subclusters
> 
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, 
> HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, 
> HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming 
> files and only renames one of them (i.e., {{invokeSequential()}}). In the 
> case of folders which are in all subclusters (e.g., HASH_ALL) we should 
> rename all locations (i.e., {{invokeAll()}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15750) RBF: Make sure the multi destination are consistent after write operation

2020-12-23 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-15750:
--

 Summary: RBF: Make sure the multi destination are consistent after 
write operation 
 Key: HDFS-15750
 URL: https://issues.apache.org/jira/browse/HDFS-15750
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: zhengchenyu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters

2020-12-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343
 ] 

zhengchenyu edited comment on HDFS-14343 at 12/22/20, 8:56 AM:
---

[~inigoiri]  [~ayushtkn]. 

Hi, I have a question; can you give me some suggestions? In this patch, 
isMultiDestDirectory is used to check whether to rename all locations or not. I 
think the design of MultipleDestinationMountTableResolver may assume that there 
are no repeated files among the multiple nameservices.

Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency.
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, we 
may not be able to rename all locations. I think that also means inconsistency. 

I think we need a stricter check. If one operation (which should succeed) 
fails, we should throw an exception. 



was (Author: zhengchenyu):
[~inigoiri]  [~ayushtkn]. 

Hi, I have a question; can you give me some suggestions? In this patch, 
isMultiDestDirectory is used to check whether to rename all locations or not. I 
think the design of MultipleDestinationMountTableResolver may assume that there 
are no repeated files among the multiple nameservices.

Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency.
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, we 
may not be able to rename all locations. I think that also means inconsistency. 
I think we need a stricter check. If one operation errors, we should throw an 
exception. 

 


> RBF: Fix renaming folders spread across multiple subclusters
> 
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, 
> HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, 
> HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming 
> files and only renames one of them (i.e., {{invokeSequential()}}). In the 
> case of folders which are in all subclusters (e.g., HASH_ALL) we should 
> rename all locations (i.e., {{invokeAll()}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters

2020-12-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343
 ] 

zhengchenyu edited comment on HDFS-14343 at 12/22/20, 8:53 AM:
---

[~inigoiri]  [~ayushtkn]. 

Hi, I have a question; can you give me some suggestions? In this patch, 
isMultiDestDirectory is used to check whether to rename all locations or not. I 
think the design of MultipleDestinationMountTableResolver may assume that there 
are no repeated files among the multiple nameservices.

Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency.
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, we 
may not be able to rename all locations. I think that also means inconsistency. 
I think we need a stricter check. If one operation errors, we should throw an 
exception. 

 



was (Author: zhengchenyu):
[~inigoiri][~ayushtkn]. 

Hi, I have a question; can you give me some suggestions? In this patch, 
isMultiDestDirectory is used to check whether to rename all locations or not. I 
think the design of MultipleDestinationMountTableResolver may assume that there 
are no repeated files among the multiple nameservices.

Suppose /user/userA is a mount point which mounts two nameservices, ns1 and 
ns2, and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log 
exist. If I want to move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) 
to trash, only one nameservice takes effect. I think that means inconsistency.
(Note: in fact, when migrating data from ns1 to ns2 (a mount point mounts ns1 
and ns2) and rewriting some Hive table's old partitions, this problem occurs!)

In other words, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a 
directory). If hdfs://ns1/user/userA/dirA's permission is not the same as 
hdfs://ns2/user/userA/dirA's permission, or one of the nameservices is down, we 
may not be able to rename all locations. I think that also means inconsistency. 
I think we need a stricter check. If one operation errors, we should throw an 
exception. 

 


> RBF: Fix renaming folders spread across multiple subclusters
> 
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, 
> HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, 
> HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming 
> files and only renames one of them (i.e., {{invokeSequential()}}). In the 
> case of folders which are in all subclusters (e.g., HASH_ALL) we should 
> rename all locations (i.e., {{invokeAll()}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters

2020-12-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343
 ] 

zhengchenyu edited comment on HDFS-14343 at 12/22/20, 8:52 AM:
---

[~inigoiri] [~ayushtkn]

Hi, I have a question and would appreciate your suggestions. In this patch,
isMultiDestDirectory is used to decide whether to rename all locations or not.
I think the design of MultipleDestinationMountTableResolver may assume that
there are no duplicated files across the mounted nameservices.

Suppose /user/userA is a mount point that mounts two nameservices, ns1 and ns2,
and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist. If
I move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash, only
one nameservice takes effect, which I think leaves the subclusters inconsistent.
(Note: in practice, when migrating data from ns1 to ns2 (a mount point that
mounts both ns1 and ns2) and rewriting an old partition of a Hive table, this
problem can occur!)

On the other hand, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a
directory). If the permission of hdfs://ns1/user/userA/dirA differs from that
of hdfs://ns2/user/userA/dirA, or one of the nameservices is down, we may not
be able to rename all locations, which I think is also an inconsistency. We
need a stricter check: if any single operation fails, we should throw an
exception.

 



was (Author: zhengchenyu):
[~inigoiri] [~ayushtkn]

Hi, I have a question and would appreciate your suggestions. In this patch,
isMultiDestDirectory is used to decide whether to rename all locations or not.
I think the design of MultipleDestinationMountTableResolver may assume that
there are no duplicated files across the mounted nameservices.

Suppose /user/userA is a mount point that mounts two nameservices, ns1 and ns2,
and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist. If
I move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash, only
one nameservice takes effect, which I think leaves the subclusters inconsistent.
(Note: in practice, when migrating data from one nameservice to another and
rewriting an old partition of a Hive table, this problem can occur!)

On the other hand, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a
directory). If the permission of hdfs://ns1/user/userA/dirA differs from that
of hdfs://ns2/user/userA/dirA, or one of the nameservices is down, we may not
be able to rename all locations, which I think is also an inconsistency. We
need a stricter check: if any single operation fails, we should throw an
exception.

 


> RBF: Fix renaming folders spread across multiple subclusters
> 
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, 
> HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, 
> HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming 
> files and only renames one of them (i.e., {{invokeSequential()}}). In the 
> case of folders which are in all subclusters (e.g., HASH_ALL) we should 
> rename all locations (i.e., {{invokeAll()}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14343) RBF: Fix renaming folders spread across multiple subclusters

2020-12-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253343#comment-17253343
 ] 

zhengchenyu commented on HDFS-14343:


[~inigoiri] [~ayushtkn]

Hi, I have a question and would appreciate your suggestions. In this patch,
isMultiDestDirectory is used to decide whether to rename all locations or not.
I think the design of MultipleDestinationMountTableResolver may assume that
there are no duplicated files across the mounted nameservices.

Suppose /user/userA is a mount point that mounts two nameservices, ns1 and ns2,
and both hdfs://ns1/user/userA/a.log and hdfs://ns2/user/userA/a.log exist. If
I move hdfs://ns-fed/user/userA/a.log (note: a.log is a file) to trash, only
one nameservice takes effect, which I think leaves the subclusters inconsistent.
(Note: in practice, when migrating data from one nameservice to another and
rewriting an old partition of a Hive table, this problem can occur!)

On the other hand, consider hdfs://ns-fed/user/userA/dirA (note: dirA is a
directory). If the permission of hdfs://ns1/user/userA/dirA differs from that
of hdfs://ns2/user/userA/dirA, or one of the nameservices is down, we may not
be able to rename all locations, which I think is also an inconsistency. We
need a stricter check: if any single operation fails, we should throw an
exception.

 


> RBF: Fix renaming folders spread across multiple subclusters
> 
>
> Key: HDFS-14343
> URL: https://issues.apache.org/jira/browse/HDFS-14343
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, HDFS-13891
>
> Attachments: HDFS-14343-HDFS-13891-01.patch, 
> HDFS-14343-HDFS-13891-02.patch, HDFS-14343-HDFS-13891-03.patch, 
> HDFS-14343-HDFS-13891-04.patch, HDFS-14343-HDFS-13891-05.patch
>
>
> The {{RouterClientProtocol#rename()}} function assumes that we are renaming 
> files and only renames one of them (i.e., {{invokeSequential()}}). In the 
> case of folders which are in all subclusters (e.g., HASH_ALL) we should 
> rename all locations (i.e., {{invokeAll()}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2020-12-13 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248729#comment-17248729
 ] 

zhengchenyu edited comment on HDFS-15715 at 12/14/20, 3:52 AM:
---

I have solved this problem and have been running the fix on one cluster for
nearly a week. However, our version is hadoop-2.7.3, and after hadoop-2.8 the
hadoop-hdfs module was split into multiple Maven modules, so I submitted a
rough patch; it is not the final version and only shows how to solve the
problem. I submitted HDFS-15715.001.patch.

I think there are two ways to solve this problem:

(1) Rework chooseStorageTypes and remove the entries that do not meet the
storage policy's demand from the results.
(2) Remove the entries that do not meet the storage policy's demand from the
results after chooseStorageTypes returns.

I chose the first way because I think it saves computation. To mark it, I use a
new method 'chooseStorageTypesWIthNode'. It is a little ugly; maybe we need to
reorganize the code. A minimal sketch of the filtering step both options rely
on is shown below.
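
To make the filtering step concrete (shown here in the style of option (2),
i.e. filtering after chooseStorageTypes), below is a minimal, self-contained
sketch; the enum and record are simplified stand-ins for StorageType and
DatanodeStorageInfo, not the real BlockManager types:

{code:java}
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Simplified model of the filtering idea: after chooseStorageTypes, drop chosen
// storages whose type is not required by the storage policy, so they do not
// count against the number of replicas that still need to be created.
public final class PolicyFilterSketch {

  enum StorageType { DISK, ARCHIVE }                           // stand-in for o.a.h.fs.StorageType
  record ChosenStorage(String datanode, StorageType type) { }  // stand-in for DatanodeStorageInfo

  static List<ChosenStorage> keepOnlyPolicyCompliant(
      List<ChosenStorage> chosen, List<StorageType> required) {
    // Count how many storages of each type the policy still allows.
    Map<StorageType, Integer> quota = new EnumMap<>(StorageType.class);
    for (StorageType t : required) {
      quota.merge(t, 1, Integer::sum);
    }
    List<ChosenStorage> result = new ArrayList<>();
    for (ChosenStorage s : chosen) {
      Integer left = quota.get(s.type());
      if (left != null && left > 0) {
        quota.put(s.type(), left - 1);
        result.add(s);   // storage satisfies the policy, keep it
      }
      // otherwise drop it: e.g. a DISK replica under a COLD (ARCHIVE-only) policy
    }
    return result;
  }

  public static void main(String[] args) {
    // COLD policy: 3 ARCHIVE replicas required, but the block currently has 2 DISK replicas.
    List<StorageType> required =
        List.of(StorageType.ARCHIVE, StorageType.ARCHIVE, StorageType.ARCHIVE);
    List<ChosenStorage> chosen = List.of(
        new ChosenStorage("dn1", StorageType.DISK),
        new ChosenStorage("dn2", StorageType.DISK));
    List<ChosenStorage> kept = keepOnlyPolicyCompliant(chosen, required);
    System.out.println("kept=" + kept.size()
        + ", still to place=" + (required.size() - kept.size()));
    // prints kept=0, still to place=3 -> the placement loop asks for 3 ARCHIVE targets cleanly
  }
}
{code}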





was (Author: zhengchenyu):
I have solved this problem and have been running the fix on one cluster for
nearly a week. However, our version is hadoop-2.7.3, and after hadoop-2.8 the
hadoop-hdfs module was split into multiple Maven modules, so I submitted a
rough patch; it is not the final version and only shows how to solve the
problem. I submitted HDFS-15715.001.patch.

I think there are two ways to solve this problem:

(1) Rework chooseStorageTypes and remove the entries that do not meet the
storage policy's demand from the results.
(2) Remove the entries that do not meet the storage policy's demand from the
results after chooseStorageTypes returns.

I chose the first way because I think it saves computation. To mark it, I use a
new method 'chooseStorageTypesWIth Node'. It is a little ugly; maybe we need to
reorganize the code.




> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch
>
>
> One of our Namenode which has 300M files and blocks. In common way, this 
> namode shoud not be in heavy load. But we found rpc process time keep high, 
> and decommission is very slow.
>  
> I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
> namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe 
> chooseTarget can't find block, so result to performance degradation. Consider 
> with HDFS-10453, I guess maybe some logical trigger to the scene where 
> chooseTarget can't find proper block.
> Then I enable some debug. (Of course I revise some code so that only debug 
> isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). 
> I found "the rack has too many chosen nodes" is called. Then I found some log 
> like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debug and simulation, I found the reason, and reproduction 
> this exception.
> The reason is that some developer use COLD storage policy and mover, but the 
> operatiosn of setting storage policy and mover are asynchronous. So some 
> file's real  datanodestorages are not match with this storagePolicy.
> Let me simualte this proccess. If /tmp/a is create, then have 2 replications 
> are DISK. Then set storage policy to COLD. When some logical trigger(For 
> example decommission) to copy this block. chooseTarget then use 
> chooseStorageTypes to filter real needed block. Here the size of variable 
> requiredStorageTypes which chooseStorageTypes returned is 3. But the size of  
> result is 2. But 3 means need 3 ARCHIVE storage. 2 means 

[jira] [Comment Edited] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2020-12-13 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248729#comment-17248729
 ] 

zhengchenyu edited comment on HDFS-15715 at 12/14/20, 3:52 AM:
---

I have solved this problem and have been running the fix on one cluster for
nearly a week. However, our version is hadoop-2.7.3, and after hadoop-2.8 the
hadoop-hdfs module was split into multiple Maven modules, so I submitted a
rough patch; it is not the final version and only shows how to solve the
problem. I submitted HDFS-15715.001.patch.

I think there are two ways to solve this problem:

(1) Rework chooseStorageTypes and remove the entries that do not meet the
storage policy's demand from the results.
(2) Remove the entries that do not meet the storage policy's demand from the
results after chooseStorageTypes returns.

I chose the first way because I think it saves computation. To mark it, I use a
new method 'chooseStorageTypesWIth Node'. It is a little ugly; maybe we need to
reorganize the code.





was (Author: zhengchenyu):
I have solved this problem and have been running the fix on one cluster for
nearly a week. However, our version is hadoop-2.7.3, and after hadoop-2.8 the
hadoop-hdfs module was split into multiple Maven modules, so I submitted a
rough patch; it is not the final version and only shows how to solve the
problem. I submitted HDFS-15715.001.patch.

I think there are two ways to solve this problem:

(1) Rework chooseStorageTypes and remove the entries that do not meet the
storage policy's demand from the results.
(2) Remove the entries that do not meet the storage policy's demand from the
results after chooseStorageTypes returns.

I chose the first way because I think it saves computation. It is a little
ugly; maybe we need to reorganize the code.




> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch
>
>
> One of our Namenode which has 300M files and blocks. In common way, this 
> namode shoud not be in heavy load. But we found rpc process time keep high, 
> and decommission is very slow.
>  
> I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
> namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe 
> chooseTarget can't find block, so result to performance degradation. Consider 
> with HDFS-10453, I guess maybe some logical trigger to the scene where 
> chooseTarget can't find proper block.
> Then I enable some debug. (Of course I revise some code so that only debug 
> isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). 
> I found "the rack has too many chosen nodes" is called. Then I found some log 
> like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debug and simulation, I found the reason, and reproduction 
> this exception.
> The reason is that some developer use COLD storage policy and mover, but the 
> operatiosn of setting storage policy and mover are asynchronous. So some 
> file's real  datanodestorages are not match with this storagePolicy.
> Let me simualte this proccess. If /tmp/a is create, then have 2 replications 
> are DISK. Then set storage policy to COLD. When some logical trigger(For 
> example decommission) to copy this block. chooseTarget then use 
> chooseStorageTypes to filter real needed block. Here the size of variable 
> requiredStorageTypes which chooseStorageTypes returned is 3. But the size of  
> result is 2. But 3 means need 3 ARCHIVE storage. 2 means bocks has 2 DISK 
> storage. Then will request to choose 3 target. 

[jira] [Commented] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2020-12-13 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248729#comment-17248729
 ] 

zhengchenyu commented on HDFS-15715:


I have solved this problem and have been running the fix on one cluster for
nearly a week. However, our version is hadoop-2.7.3, and after hadoop-2.8 the
hadoop-hdfs module was split into multiple Maven modules, so I submitted a
rough patch; it is not the final version and only shows how to solve the
problem. I submitted HDFS-15715.001.patch.

I think there are two ways to solve this problem:

(1) Rework chooseStorageTypes and remove the entries that do not meet the
storage policy's demand from the results.
(2) Remove the entries that do not meet the storage policy's demand from the
results after chooseStorageTypes returns.

I chose the first way because I think it saves computation. It is a little
ugly; maybe we need to reorganize the code.




> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch
>
>
> One of our Namenode which has 300M files and blocks. In common way, this 
> namode shoud not be in heavy load. But we found rpc process time keep high, 
> and decommission is very slow.
>  
> I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
> namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe 
> chooseTarget can't find block, so result to performance degradation. Consider 
> with HDFS-10453, I guess maybe some logical trigger to the scene where 
> chooseTarget can't find proper block.
> Then I enable some debug. (Of course I revise some code so that only debug 
> isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). 
> I found "the rack has too many chosen nodes" is called. Then I found some log 
> like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debug and simulation, I found the reason, and reproduction 
> this exception.
> The reason is that some developer use COLD storage policy and mover, but the 
> operatiosn of setting storage policy and mover are asynchronous. So some 
> file's real  datanodestorages are not match with this storagePolicy.
> Let me simualte this proccess. If /tmp/a is create, then have 2 replications 
> are DISK. Then set storage policy to COLD. When some logical trigger(For 
> example decommission) to copy this block. chooseTarget then use 
> chooseStorageTypes to filter real needed block. Here the size of variable 
> requiredStorageTypes which chooseStorageTypes returned is 3. But the size of  
> result is 2. But 3 means need 3 ARCHIVE storage. 2 means bocks has 2 DISK 
> storage. Then will request to choose 3 target. choose first target is right, 
> but when choose seconde target, the variable 'counter' is 4 which is larger 
> than maxTargetPerRack which is 3 in function isGoodTarget. So skip all 
> datanodestorage. Then result to bad performance.
> I think chooseStorageTypes need to consider the result, when the exist 
> replication doesn't meet storage policy's demand, we need to remove this from 
> result. 
> I changed by this way, and test in my unit-test. Then solve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2020-12-13 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Attachment: HDFS-15715.001.patch

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: HDFS-15715.001.patch
>
>
> One of our Namenode which has 300M files and blocks. In common way, this 
> namode shoud not be in heavy load. But we found rpc process time keep high, 
> and decommission is very slow.
>  
> I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
> namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe 
> chooseTarget can't find block, so result to performance degradation. Consider 
> with HDFS-10453, I guess maybe some logical trigger to the scene where 
> chooseTarget can't find proper block.
> Then I enable some debug. (Of course I revise some code so that only debug 
> isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). 
> I found "the rack has too many chosen nodes" is called. Then I found some log 
> like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debug and simulation, I found the reason, and reproduction 
> this exception.
> The reason is that some developer use COLD storage policy and mover, but the 
> operatiosn of setting storage policy and mover are asynchronous. So some 
> file's real  datanodestorages are not match with this storagePolicy.
> Let me simualte this proccess. If /tmp/a is create, then have 2 replications 
> are DISK. Then set storage policy to COLD. When some logical trigger(For 
> example decommission) to copy this block. chooseTarget then use 
> chooseStorageTypes to filter real needed block. Here the size of variable 
> requiredStorageTypes which chooseStorageTypes returned is 3. But the size of  
> result is 2. But 3 means need 3 ARCHIVE storage. 2 means bocks has 2 DISK 
> storage. Then will request to choose 3 target. choose first target is right, 
> but when choose seconde target, the variable 'counter' is 4 which is larger 
> than maxTargetPerRack which is 3 in function isGoodTarget. So skip all 
> datanodestorage. Then result to bad performance.
> I think chooseStorageTypes need to consider the result, when the exist 
> replication doesn't meet storage policy's demand, we need to remove this from 
> result. 
> I changed by this way, and test in my unit-test. Then solve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2020-12-07 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Summary: ReplicatorMonitor performance degrades, when the storagePolicy of 
many file are not match with their real datanodestorage   (was: 
ReplicatorMonitor performance degradation, when the storagePolicy of many file 
are not match with their real datanodestorage )

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.1
>
>
> One of our Namenode which has 300M files and blocks. In common way, this 
> namode shoud not be in heavy load. But we found rpc process time keep high, 
> and decommission is very slow.
>  
> I search the metrics, I found uderreplicated blocks keep high. Then I jstack 
> namenode, found 'InnerNode.getLoc' is hot spot cod. I think maybe 
> chooseTarget can't find block, so result to performance degradation. Consider 
> with HDFS-10453, I guess maybe some logical trigger to the scene where 
> chooseTarget can't find proper block.
> Then I enable some debug. (Of course I revise some code so that only debug 
> isGoodTarget, because enable BlockPlacementPolicy's debug log is dangrouse). 
> I found "the rack has too many chosen nodes" is called. Then I found some log 
> like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debug and simulation, I found the reason, and reproduction 
> this exception.
> The reason is that some developer use COLD storage policy and mover, but the 
> operatiosn of setting storage policy and mover are asynchronous. So some 
> file's real  datanodestorages are not match with this storagePolicy.
> Let me simualte this proccess. If /tmp/a is create, then have 2 replications 
> are DISK. Then set storage policy to COLD. When some logical trigger(For 
> example decommission) to copy this block. chooseTarget then use 
> chooseStorageTypes to filter real needed block. Here the size of variable 
> requiredStorageTypes which chooseStorageTypes returned is 3. But the size of  
> result is 2. But 3 means need 3 ARCHIVE storage. 2 means bocks has 2 DISK 
> storage. Then will request to choose 3 target. choose first target is right, 
> but when choose seconde target, the variable 'counter' is 4 which is larger 
> than maxTargetPerRack which is 3 in function isGoodTarget. So skip all 
> datanodestorage. Then result to bad performance.
> I think chooseStorageTypes need to consider the result, when the exist 
> replication doesn't meet storage policy's demand, we need to remove this from 
> result. 
> I changed by this way, and test in my unit-test. Then solve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degradation, when the storagePolicy of many file are not match with their real datanodestorage

2020-12-07 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15715:
---
Description: 
One of our NameNodes has 300M files and blocks. Under normal circumstances this
NameNode should not be under heavy load, but we found that the RPC processing
time stayed high and decommissioning was very slow.

Searching the metrics, I found that the number of under-replicated blocks
stayed high. Then I ran jstack on the NameNode and found that
'InnerNode.getLoc' is a hot spot. I think chooseTarget may be unable to find a
suitable target, which leads to the performance degradation. Considering
HDFS-10453, I guessed that some logic triggers a scenario where chooseTarget
cannot find a proper target.

Then I enabled some debug logging. (Of course I revised the code so that only
isGoodTarget is debugged, because enabling BlockPlacementPolicy's debug log
globally is dangerous.) I found that "the rack has too many chosen nodes" is
hit, and then I found logs like this:

{code}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code} 

Through some debugging and simulation, I found the reason and reproduced this
exception.

The reason is that some users apply the COLD storage policy together with the
mover, but setting the storage policy and running the mover are asynchronous
operations, so the real datanode storages of some files do not match their
storage policy.

Let me simulate the process. Suppose /tmp/a is created with 2 replicas on DISK,
and the storage policy is then set to COLD. When some logic (for example
decommissioning) triggers a copy of this block, chooseTarget uses
chooseStorageTypes to work out what is really needed. Here the size of the
requiredStorageTypes list returned by chooseStorageTypes is 3, but the size of
result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block
currently has 2 DISK storages. The placement code then asks for 3 targets.
Choosing the first target succeeds, but when choosing the second target the
variable 'counter' is 4, which is larger than maxTargetPerRack (3) in
isGoodTarget, so every datanode storage is skipped. This results in bad
performance.

I think chooseStorageTypes needs to take result into account: when an existing
replica does not meet the storage policy's demand, we need to remove it from
result.

I changed the code this way and verified it with my unit test, which solved the
problem.



  was:
One of our NameNodes has 300M files and blocks. Under normal circumstances this
NameNode should not be under heavy load, but we found that the RPC processing
time stayed high and decommissioning was very slow.

Searching the metrics, I found that the number of under-replicated blocks
stayed high. Then I ran jstack on the NameNode and found that
'InnerNode.getLoc' is a hot spot. I think chooseTarget may be unable to find a
suitable target, which leads to the performance degradation. Considering
HDFS-10453, I guessed that some logic triggers a scenario where chooseTarget
cannot find a proper target.

Then I enabled some debug logging. (Of course I revised the code so that only
isGoodTarget is debugged, because enabling BlockPlacementPolicy's debug log
globally is dangerous.) I found that "the rack has too many chosen nodes" is
hit, and then I found logs like this:

{code}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code} 

Through some debugging and simulation, I found the reason and reproduced this
exception.

The reason is that some users apply the COLD storage policy together with the
mover, but the

[jira] [Created] (HDFS-15715) ReplicatorMonitor performance degradation, when the storagePolicy of many file are not match with their real datanodestorage

2020-12-07 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-15715:
--

 Summary: ReplicatorMonitor performance degradation, when the 
storagePolicy of many file are not match with their real datanodestorage 
 Key: HDFS-15715
 URL: https://issues.apache.org/jira/browse/HDFS-15715
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.2.1, 2.7.3
Reporter: zhengchenyu
Assignee: zhengchenyu
 Fix For: 3.3.1


One of our NameNodes has 300M files and blocks. Under normal circumstances this
NameNode should not be under heavy load, but we found that the RPC processing
time stayed high and decommissioning was very slow.

Searching the metrics, I found that the number of under-replicated blocks
stayed high. Then I ran jstack on the NameNode and found that
'InnerNode.getLoc' is a hot spot. I think chooseTarget may be unable to find a
suitable target, which leads to the performance degradation. Considering
HDFS-10453, I guessed that some logic triggers a scenario where chooseTarget
cannot find a proper target.

Then I enabled some debug logging. (Of course I revised the code so that only
isGoodTarget is debugged, because enabling BlockPlacementPolicy's debug log
globally is dangerous.) I found that "the rack has too many chosen nodes" is
hit, and then I found logs like this:

{code}
2020-12-04 12:13:56,345 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
{code} 

Through some debugging and simulation, I found the reason and reproduced this
exception.

The reason is that some users apply the COLD storage policy together with the
mover, but setting the storage policy and running the mover are asynchronous
operations, so the real datanode storages of some files do not match their
storage policy.

Let me simulate the process. Suppose /tmp/a is created with 2 replicas on DISK,
and the storage policy is then set to COLD. When some logic (for example
decommissioning) triggers a copy of this block, chooseTarget uses
chooseStorageTypes to work out what is really needed. Here the size of the
requiredStorageTypes list returned by chooseStorageTypes is 3, but the size of
result is 2: 3 means 3 ARCHIVE storages are needed, while 2 means the block
currently has 2 DISK storages. The placement code then asks for 3 targets.
Choosing the first target succeeds, but when choosing the second target the
variable 'counter' is 4, which is larger than maxTargetPerRack (3) in
isGoodTarget, so every datanode storage is skipped. This results in bad
performance (a toy model of this counting is sketched below).

I think chooseStorageTypes needs to take result into account: when an existing
replica does not meet the storage policy's demand, we need to remove it from
result.

I changed the code this way and verified it with my unit test, which solved the
problem.
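
As a toy illustration of the arithmetic above (this is not the real
isGoodTarget code; all nodes are assumed to sit on one rack and the cap of 3 is
simply taken from the WARN log), keeping the two stale DISK replicas in the
result set is what pushes the per-rack counter past the limit:

{code:java}
// Toy model of the per-rack check described above. Single rack assumed,
// maxTargetPerRack taken as 3 as in the log output.
public final class RackCounterSketch {
  public static void main(String[] args) {
    int maxTargetPerRack = 3;

    // Case 1: the two stale DISK replicas stay in `results`.
    int alreadyInResults = 2;   // DISK replicas that do not satisfy COLD
    int chosenSoFar = 1;        // first ARCHIVE target was already accepted
    int counter = alreadyInResults + chosenSoFar + 1;  // +1 for the candidate being checked
    System.out.println("with stale replicas: counter=" + counter
        + " > maxTargetPerRack=" + maxTargetPerRack + " -> candidate rejected");

    // Case 2: stale replicas are filtered out of `results` first.
    counter = 0 + chosenSoFar + 1;
    System.out.println("after filtering:     counter=" + counter
        + " <= maxTargetPerRack=" + maxTargetPerRack + " -> candidate accepted");
  }
}
{code}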





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15649) the standby namenode's ReplQueues need to keep pace with active namenode.

2020-10-23 Thread zhengchenyu (Jira)
zhengchenyu created HDFS-15649:
--

 Summary: the standby namenode's ReplQueues need to keep pace with 
active namenode.
 Key: HDFS-15649
 URL: https://issues.apache.org/jira/browse/HDFS-15649
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.2.1, 2.7.3
Reporter: zhengchenyu
 Fix For: 3.3.1


I think the standby NameNode's replication queues need to keep pace with the
active NameNode. You will see code like the following in the function
addStoredBlock:
{code}
// do not try to handle extra/low redundancy blocks during first safe mode
if (!isPopulatingReplQueues()) {
  return storedBlock;
}
{code}
Here, for the standby NameNode, I think there is no need to tell the standby to
replicate blocks, but it does need to update neededReconstruction, because some
metrics depend on it, for example missing blocks (a sketch of what I mean
follows below).

Why do I suggest this? In our internal version, a bug triggered a huge
missing-block count. In fact those blocks were not missing, but addStoredBlock
did not update them on the standby, so the huge missing-block number persisted.
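
A hedged, self-contained model of the proposed behaviour (this is deliberately
not NameNode code; the fields and methods are simplified stand-ins):

{code:java}
// Toy model of the proposal, not NameNode code: even when the replication
// queues are not being populated (standby), the "needed reconstruction"
// bookkeeping that backs the missing-blocks metric is still refreshed.
public final class StandbyReplQueueSketch {

  private final boolean populatingReplQueues;  // true only on the active NameNode
  private int neededReconstruction;            // stand-in for the metric source

  StandbyReplQueueSketch(boolean populatingReplQueues) {
    this.populatingReplQueues = populatingReplQueues;
  }

  /** Called when a stored replica is reported; live vs expected replicas drive the metric. */
  void addStoredBlock(int liveReplicas, int expectedReplicas) {
    if (!populatingReplQueues) {
      // Proposed change: keep the metric in sync on the standby,
      // just skip scheduling any replication work.
      updateNeededReconstruction(liveReplicas, expectedReplicas);
      return;
    }
    updateNeededReconstruction(liveReplicas, expectedReplicas);
    // ... on the active NameNode, replication work would be scheduled here ...
  }

  private void updateNeededReconstruction(int liveReplicas, int expectedReplicas) {
    neededReconstruction = Math.max(0, expectedReplicas - liveReplicas);
  }

  public static void main(String[] args) {
    StandbyReplQueueSketch standby = new StandbyReplQueueSketch(false);
    standby.addStoredBlock(3, 3);  // replica reported: block is no longer under-replicated
    System.out.println("standby neededReconstruction=" + standby.neededReconstruction);
  }
}
{code}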




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199910#comment-17199910
 ] 

zhengchenyu commented on HDFS-15589:


[~hexiaoqiao]
Yes, in theory postponedMisreplicatedBlocks is only touched by the function
'rescanPostponedMisreplicatedBlocks', and since that runs under the
namesystem's write lock it may reduce NameNode RPC performance. But
dfs.namenode.blocks.per.postponedblocks.rescan's default value is 1, so I think
the performance impact should be small.
But let us look at some logs; some calls took a long time.
{code}
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:15,429 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 65 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:18,496 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 64 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:23,958 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 2459 msecs. 19916 blocks 
are left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:27,023 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 60 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:30,088 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 61 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:33,149 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 58 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:20:47,890 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 5140 msecs. 19916 blocks 
are left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:36,458 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 110 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:39,529 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 70 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:42,596 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 66 msecs. 19916 blocks are 
left. 0 blocks were removed.
hadoop-hdfs-namenode-bd-tz-hadoop-001012.ke.com.log.info.9:2020-09-21 
15:32:45,665 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
Rescan of postponedMisreplicatedBlocks completed in 65 msecs. 19916 blocks are 
left. 0 blocks were removed.
{code}
In fact this was found in our test cluster, which is very small, so we cannot
observe a performance impact there. But why do I pay attention to this problem?
At my last company, postponedMisreplicatedBlocks grew huge one day and NameNode
RPC performance dropped; some hours later postponedMisreplicatedBlocks
decreased and the NameNode was fine again. At that moment I was focused on
YARN, so I did not dig into the NameNode logs, and the real cause was never
found.
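
For context on the timings above, here is a schematic of the rescan pattern
being discussed (an assumption-laden model, not the actual BlockManager
implementation): the postponed set is processed in batches under the namesystem
write lock, so the batch size bounds how long each pass can hold the lock.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Schematic only: process at most `blocksPerRescan` postponed blocks per pass
// while holding the write lock, mirroring the behaviour the timings above hint at.
public final class PostponedRescanSketch {
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
  private final Deque<Long> postponedMisreplicatedBlocks = new ArrayDeque<>();
  private final int blocksPerRescan;

  PostponedRescanSketch(int blocksPerRescan) {
    this.blocksPerRescan = blocksPerRescan;
  }

  void rescanPostponedMisreplicatedBlocks() {
    nsLock.writeLock().lock();  // other operations needing the lock wait here
    try {
      int processed = 0;
      while (processed < blocksPerRescan && !postponedMisreplicatedBlocks.isEmpty()) {
        Long blockId = postponedMisreplicatedBlocks.poll();
        if (!tryProcess(blockId)) {
          postponedMisreplicatedBlocks.addLast(blockId);  // still postponed, keep it
        }
        processed++;
      }
    } finally {
      nsLock.writeLock().unlock();
    }
  }

  private boolean tryProcess(Long blockId) {
    return false;  // placeholder: in this model nothing can be resolved yet
  }

  public static void main(String[] args) {
    PostponedRescanSketch sketch = new PostponedRescanSketch(10000);  // batch size for the example
    for (long id = 0; id < 19916; id++) {
      sketch.postponedMisreplicatedBlocks.addLast(id);
    }
    sketch.rescanPostponedMisreplicatedBlocks();
    // mirrors "19916 blocks are left. 0 blocks were removed." from the log above
    System.out.println("blocks left: " + sketch.postponedMisreplicatedBlocks.size());
  }
}
{code}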

> Huge PostponedMisreplicatedBlocks can't decrease immediately when start 
> namenode after datanode
> ---
>
> Key: HDFS-15589
> URL: https://issues.apache.org/jira/browse/HDFS-15589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
> Environment: CentOS 7
>Reporter: zhengchenyu
>Priority: Major
>
> In our test cluster, I restart my namenode. Then I found many 
> PostponedMisreplicatedBlocks which doesn't decrease immediately. 
> I search the log below like this. 
> {code:java}
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* 

[jira] [Comment Edited] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199847#comment-17199847
 ] 

zhengchenyu edited comment on HDFS-15589 at 9/22/20, 6:24 AM:
--

Yes, I can work around this problem by triggering a block report manually. My
question is whether it is worth solving this problem by optimizing some of the
logic.

For example, make sure the new block report triggered through the NameNode's
heartbeat response happens after the NameNode enters the active state.

Because, as you know, when I trigger a datanode's block report manually, the
block report happens twice, and I think there is no need to add that load to
the NameNode. In addition, as far as I know, manually triggering a block report
sends the report to all NameNodes, which increases the load on all of them.


was (Author: zhengchenyu):
Yes, I can work around this problem by triggering a block report manually. My
question is whether it is worth solving this problem by optimizing some of the
logic.

For example, make sure the new block report triggered through the NameNode's
heartbeat response happens after the NameNode enters the active state.

Because, as you know, when I trigger a datanode's block report manually, the
block report happens twice, and I think there is no need to add that load to
the NameNode.

> Huge PostponedMisreplicatedBlocks can't decrease immediately when start 
> namenode after datanode
> ---
>
> Key: HDFS-15589
> URL: https://issues.apache.org/jira/browse/HDFS-15589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
> Environment: CentOS 7
>Reporter: zhengchenyu
>Priority: Major
>
> In our test cluster, I restart my namenode. Then I found many 
> PostponedMisreplicatedBlocks which doesn't decrease immediately. 
> I search the log below like this. 
> {code:java}
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> {code}
> Node: test cluster only have 6 datanode.
> You will see the blockreport called before "Marking all datanodes as stale" 
> which is logged by startActiveServices. But 
> DatanodeStorageInfo.blockContentsStale only set to false in blockreport, then 
> startActiveServices set all datnaode to stale node. So the datanodes will 
> keep stale util next blockreport, then PostponedMisreplicatedBlocks keep a 
> huge number.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199847#comment-17199847
 ] 

zhengchenyu edited comment on HDFS-15589 at 9/22/20, 6:20 AM:
--

Yes, I can work around this problem by triggering a block report manually. My
question is whether it is worth solving this problem by optimizing some of the
logic.

For example, make sure the new block report triggered through the NameNode's
heartbeat response happens after the NameNode enters the active state.

Because, as you know, when I trigger a datanode's block report manually, the
block report happens twice, and I think there is no need to add that load to
the NameNode.


was (Author: zhengchenyu):
Yes, I can work around this problem by triggering a block report manually. My
question is whether it is worth solving this problem by optimizing some of the
logic, for example by making sure the new block report triggered through the
NameNode's heartbeat response happens after the NameNode enters the active
state.

> Huge PostponedMisreplicatedBlocks can't decrease immediately when start 
> namenode after datanode
> ---
>
> Key: HDFS-15589
> URL: https://issues.apache.org/jira/browse/HDFS-15589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
> Environment: CentOS 7
>Reporter: zhengchenyu
>Priority: Major
>
> In our test cluster, I restart my namenode. Then I found many 
> PostponedMisreplicatedBlocks which doesn't decrease immediately. 
> I search the log below like this. 
> {code:java}
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> {code}
> Node: test cluster only have 6 datanode.
> You will see the blockreport called before "Marking all datanodes as stale" 
> which is logged by startActiveServices. But 
> DatanodeStorageInfo.blockContentsStale only set to false in blockreport, then 
> startActiveServices set all datnaode to stale node. So the datanodes will 
> keep stale util next blockreport, then PostponedMisreplicatedBlocks keep a 
> huge number.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-22 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199847#comment-17199847
 ] 

zhengchenyu commented on HDFS-15589:


Yes, I can work around this problem by triggering a block report manually. My
question is whether it is worth solving this problem by optimizing some of the
logic, for example by making sure the new block report triggered through the
NameNode's heartbeat response happens after the NameNode enters the active
state.

> Huge PostponedMisreplicatedBlocks can't decrease immediately when start 
> namenode after datanode
> ---
>
> Key: HDFS-15589
> URL: https://issues.apache.org/jira/browse/HDFS-15589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
> Environment: CentOS 7
>Reporter: zhengchenyu
>Priority: Major
>
> In our test cluster, I restart my namenode. Then I found many 
> PostponedMisreplicatedBlocks which doesn't decrease immediately. 
> I search the log below like this. 
> {code:java}
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> {code}
> Note: the test cluster only has 6 datanodes.
> You can see that these block reports were processed before the "Marking all 
> datanodes as stale" message that startActiveServices logs. But 
> DatanodeStorageInfo.blockContentsStale is only set to false while a block 
> report is processed, and startActiveServices then marks every datanode as 
> stale again. So the datanodes stay stale until the next block report, and 
> PostponedMisreplicatedBlocks stays at a huge number in the meantime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

2020-09-21 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199750#comment-17199750
 ] 

zhengchenyu commented on HDFS-15589:


[~ayushtkn] I understand the logic for postponing blocks. The case I ran into 
is probably a low-probability one. Describing that logic simply:

(1) When the namenode transitions from standby to active, it marks every 
DatanodeDescriptor as stale, to avoid deleting blocks that may already have 
been deleted.

(2) When a datanode later sends a block report, the namenode clears the stale 
flag, and over-replicated blocks on that datanode can then be deleted.

But if (2) happens before (1), the DatanodeDescriptor stays stale until the 
next block report, and block reports are a low-frequency RPC operation. So 
PostponedMisreplicatedBlocks stays at a huge number for a long time. A 
simplified sketch of the two orderings follows this comment.
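
To make the ordering concrete, here is a minimal toy model of the two 
orderings. The names only mirror the roles of 
DatanodeStorageInfo.blockContentsStale and startActiveServices(); this is an 
illustrative sketch, not the real NameNode code:

{code:java}
/** Toy model of one datanode storage's stale flag as the namenode sees it. */
public class StaleOrderingSketch {

  static class StorageState {
    boolean blockContentsStale = true;

    /** Step (2): a full block report clears the stale flag. */
    void processFullBlockReport() {
      blockContentsStale = false;
    }

    /** Step (1): failover to active marks the storage stale again, so that
     *  over-replicated blocks are postponed instead of deleted too early. */
    void markStaleAfterFailover() {
      blockContentsStale = true;
    }
  }

  public static void main(String[] args) {
    // Expected order: failover first, then the block report -> not stale.
    StorageState expected = new StorageState();
    expected.markStaleAfterFailover();   // (1)
    expected.processFullBlockReport();   // (2)
    System.out.println("report after failover  -> stale = " + expected.blockContentsStale);

    // Observed order: the block report arrives before startActiveServices,
    // so the storage stays stale until the next (infrequent) full report and
    // PostponedMisreplicatedBlocks stays high in the meantime.
    StorageState observed = new StorageState();
    observed.processFullBlockReport();   // (2) arrives first
    observed.markStaleAfterFailover();   // (1)
    System.out.println("report before failover -> stale = " + observed.blockContentsStale);
  }
}
{code}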

> Huge PostponedMisreplicatedBlocks can't decrease immediately when start 
> namenode after datanode
> ---
>
> Key: HDFS-15589
> URL: https://issues.apache.org/jira/browse/HDFS-15589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
> Environment: CentOS 7
>Reporter: zhengchenyu
>Priority: Major
>
> In our test cluster, I restarted the namenode and then found a huge number of 
> PostponedMisreplicatedBlocks that did not decrease immediately. 
> Searching the logs shows entries like the following: 
> {code:java}
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> {code}
> Note: the test cluster only has 6 datanodes.
> You can see that these block reports were processed before the "Marking all 
> datanodes as stale" message that startActiveServices logs. But 
> DatanodeStorageInfo.blockContentsStale is only set to false while a block 
> report is processed, and startActiveServices then marks every datanode as 
> stale again. So the datanodes stay stale until the next block report, and 
> PostponedMisreplicatedBlocks stays at a huge number in the meantime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


