[jira] [Commented] (HDFS-16487) RBF: getListing uses raw mount table points

2022-02-26 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498531#comment-17498531
 ] 

Fengnan Li commented on HDFS-16487:
---

Thanks for the discussion, [~ayushtkn] [~elgoiri]!

A large part of HDFS-13506 is setting the right owner/group permissions on the
physical HDFS dir/file. These values need to be passed to the Router, which
then has to doAs the user to create the path with the right permissions.

Internally, we haven't turned on HDFS-15554 because some services create
mounts before creating the dirs.

However, even with both patches there is a contradiction: HDFS paths/files are
generally created by clients, while Router mounts are created by RouterAdmin.
If we bundle the two together, we either make the Router know the clients'
behavior precisely at each dir level, with all the information needed to
create the path (permissions and even ACLs), or we grant clients RouterAdmin
access (which is what one of our internal services does). Neither option is
ideal.

The context for this rethinking is:

We are backing up all data sets from the primary datacenter to a secondary
datacenter. About 10k Hive tables make up a big part of this. These tables
come with various owners/groups. One service constantly runs jobs to copy the
data. Within a table, partitions live on different HDFS clusters, and Router
mounts specify the locations. Initially we created only the partition mounts,
like:

table/2018 -> HDFS A

table/2022 -> HDFS B

When the copy service starts, it lists the dirs for one table and the Router
returns all of these mounts. The client thinks the 2018 partition already
exists and starts to create 2018/01, which then fails with
NoSuchFileException. From the client's perspective, the listing returns wrong
results.

> RBF: getListing uses raw mount table points
> ---
>
> Key: HDFS-16487
> URL: https://issues.apache.org/jira/browse/HDFS-16487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Aihua Xu
>Priority: Major
>
> In getListing, the result is a union of the subcluster results and the mount
> points. However, these two are different concepts, and the latter is
> internal to the Router. It is quite possible that the actual path doesn't
> exist in the dest HDFS yet.
> Can we choose a different strategy that checks each child mount point and
> confirms that the HDFS path exists in the dest cluster? If so, we can add it;
> otherwise we should skip this mount because it confuses clients. (Clients
> could directly create a subdir under a dangling mount point.)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16487) RBF: getListing uses raw mount table points

2022-02-25 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-16487:
-

Assignee: Aihua Xu  (was: Fengnan Li)

> RBF: getListing uses raw mount table points
> ---
>
> Key: HDFS-16487
> URL: https://issues.apache.org/jira/browse/HDFS-16487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Aihua Xu
>Priority: Major
>
> In getListing, the result is a union of the subcluster results and the mount
> points. However, these two are different concepts, and the latter is
> internal to the Router. It is quite possible that the actual path doesn't
> exist in the dest HDFS yet.
> Can we choose a different strategy that checks each child mount point and
> confirms that the HDFS path exists in the dest cluster? If so, we can add it;
> otherwise we should skip this mount because it confuses clients. (Clients
> could directly create a subdir under a dangling mount point.)






[jira] [Commented] (HDFS-16487) RBF: getListing uses raw mount table points

2022-02-25 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498302#comment-17498302
 ] 

Fengnan Li commented on HDFS-16487:
---

[~elgoiri] What was the intention behind including raw mounts in the original
design? Are there use cases that depend on this behavior?

> RBF: getListing uses raw mount table points
> ---
>
> Key: HDFS-16487
> URL: https://issues.apache.org/jira/browse/HDFS-16487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>
> In getListing, the result is a union of the subcluster results and the mount
> points. However, these two are different concepts, and the latter is
> internal to the Router. It is quite possible that the actual path doesn't
> exist in the dest HDFS yet.
> Can we choose a different strategy that checks each child mount point and
> confirms that the HDFS path exists in the dest cluster? If so, we can add it;
> otherwise we should skip this mount because it confuses clients. (Clients
> could directly create a subdir under a dangling mount point.)






[jira] [Created] (HDFS-16487) RBF: getListing uses raw mount table points

2022-02-25 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16487:
-

 Summary: RBF: getListing uses raw mount table points
 Key: HDFS-16487
 URL: https://issues.apache.org/jira/browse/HDFS-16487
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li


In getListing, the result is a union of the subcluster results and the mount
points. However, these two are different concepts, and the latter is internal
to the Router. It is quite possible that the actual path doesn't exist in the
dest HDFS yet.

Can we choose a different strategy that checks each child mount point and
confirms that the HDFS path exists in the dest cluster? If so, we can add it;
otherwise we should skip this mount because it confuses clients. (Clients
could directly create a subdir under a dangling mount point.)
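
The proposed strategy can be sketched in plain Java. This is only an
illustration of the idea, not the actual RouterClientProtocol code: the class
name MountAwareListing, the list method, and the existsInDest predicate are
all hypothetical, and a real implementation would have to ask the destination
cluster whether the path exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical sketch; names do not come from the Hadoop code base. */
final class MountAwareListing {

  /**
   * Merges subcluster entries with child mount points, keeping a mount point
   * only if existsInDest reports that its destination path exists.
   */
  static List<String> list(List<String> subclusterEntries,
                           List<String> childMountPoints,
                           Predicate<String> existsInDest) {
    Set<String> merged = new TreeSet<>(subclusterEntries);
    for (String mount : childMountPoints) {
      if (existsInDest.test(mount)) {
        merged.add(mount);
      }
      // Otherwise the mount is dangling and is left out of the listing.
    }
    return new ArrayList<>(merged);
  }

  public static void main(String[] args) {
    // Assumption for the example: 2018 is mounted but not yet created in the
    // destination cluster, while 2022 exists there.
    Set<String> existingInDest = Set.of("2022");
    List<String> result =
        list(List.of("2021"), List.of("2018", "2022"), existingInDest::contains);
    System.out.println(result); // prints [2021, 2022]
  }
}
```

The existence check is passed in as a predicate so that the cost of the extra
lookup per mount point stays visible to the caller.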






[jira] [Created] (HDFS-16486) RBF: Don't override listing if there is a physical path from subcluster

2022-02-25 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16486:
-

 Summary: RBF: Don't override listing if there is a physical path 
from subcluster
 Key: HDFS-16486
 URL: https://issues.apache.org/jira/browse/HDFS-16486
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li


In getListing in RouterClientProtocol, a Router mount point currently
overrides the listing from the subclusters. This results in a different
HdfsFileStatus, especially for owner/group permissions, since Router mounts
and the actual HDFS paths are created by different users.

 

[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java#L857]

 

To mitigate this discrepancy, we can skip the mount point if the subcluster
listing already contains an entry for it.
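
A minimal sketch of that merge rule, assuming each listing is modeled as a map
from entry name to owner (a stand-in for the owner field of HdfsFileStatus).
The class ListingMerge and its merge method are hypothetical names used only
for this illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; maps entry name to owner instead of HdfsFileStatus. */
final class ListingMerge {

  /**
   * Starts from the subcluster listing and adds a mount point only when no
   * subcluster entry with the same name exists, so the physical status wins.
   */
  static Map<String, String> merge(Map<String, String> fromSubclusters,
                                   Map<String, String> fromMounts) {
    Map<String, String> merged = new TreeMap<>(fromSubclusters);
    for (Map.Entry<String, String> mount : fromMounts.entrySet()) {
      merged.putIfAbsent(mount.getKey(), mount.getValue());
    }
    return merged;
  }

  public static void main(String[] args) {
    // Owner names are made up for the example.
    Map<String, String> subclusters = Map.of("2018", "hive");
    Map<String, String> mounts = Map.of("2018", "router", "2022", "router");
    System.out.println(merge(subclusters, mounts)); // prints {2018=hive, 2022=router}
  }
}
```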






[jira] [Created] (HDFS-16483) RBF: DataNode talk to Router requesting block info in WebHDFS

2022-02-24 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16483:
-

 Summary: RBF: DataNode talk to Router requesting block info in 
WebHDFS
 Key: HDFS-16483
 URL: https://issues.apache.org/jira/browse/HDFS-16483
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Reporter: Fengnan Li
Assignee: Fengnan Li


In WebHDFS, before the Router redirects the OPEN call to a Datanode, it
attaches the namenoderpcaddress param. When the Datanode's WebHdfsHandler
takes the call, it constructs a DFSClient based on that IP address, which
points to the Router.

This is OK when the Router and the Datanode are both secure or both
nonsecure. However, when the DN is not secure but the Router is, the call
fails with
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not 
enabled.  Available:[TOKEN, KERBEROS]]
Comments are welcome on how to fix this.

One way is to always make the Datanode construct the DFSClient based on the
default FS, since the default FS is always the Namenode in the same cluster,
which should have the same security setting as the Datanode.

 






[jira] [Updated] (HDFS-16436) RBF: CheckSafeMode before Read Operation

2022-01-24 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-16436:
--
Description: 
In the Router's 
[checkOperation|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java#L630]
 call, the READ operation check comes before the safemode check. The issue is
that when the mount table is unavailable, a READ can still pass the check even
though the Router cannot resolve the correct path location. Further down the
call path this surfaces as a FileNotFoundException, which clients cannot
retry. It is better for clients to receive a StandbyException and fail over
early.

This addresses the case where a single Router host is having issues with its
mount table.

  was:In the Router's 
[checkOperation|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java#L630]
 call, the READ operation check comes before the safemode check. The issue is
that when the mount table is unavailable, a READ can still pass the check even
though the Router cannot resolve the correct path location.


> RBF: CheckSafeMode before Read Operation
> 
>
> Key: HDFS-16436
> URL: https://issues.apache.org/jira/browse/HDFS-16436
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>
> In the Router's 
> [checkOperation|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java#L630]
>  call, the READ operation check comes before the safemode check. The issue is
> that when the mount table is unavailable, a READ can still pass the check even
> though the Router cannot resolve the correct path location. Further down the
> call path this surfaces as a FileNotFoundException, which clients cannot
> retry. It is better for clients to receive a StandbyException and fail over
> early.
> This addresses the case where a single Router host is having issues with its
> mount table.






[jira] [Created] (HDFS-16436) RBF: CheckSafeMode before Read Operation

2022-01-24 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16436:
-

 Summary: RBF: CheckSafeMode before Read Operation
 Key: HDFS-16436
 URL: https://issues.apache.org/jira/browse/HDFS-16436
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Reporter: Fengnan Li
Assignee: Fengnan Li


In the Router's 
[checkOperation|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java#L630]
 call, the READ operation check comes before the safemode check. The issue is
that when the mount table is unavailable, a READ can still pass the check even
though the Router cannot resolve the correct path location.
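
The ordering problem can be illustrated with a small stand-alone model. This
is a sketch under assumptions, not the RouterRpcServer code: the class
OperationCheckOrder, the Category enum, the RouterInSafeModeException type,
and the special handling of an UNCHECKED category are all hypothetical
stand-ins.

```java
/** Hypothetical model of the check ordering; not the RouterRpcServer code. */
final class OperationCheckOrder {

  enum Category { UNCHECKED, READ, WRITE }

  /** Stand-in for a retriable "try another Router" signal. */
  static final class RouterInSafeModeException extends RuntimeException {
    RouterInSafeModeException() {
      super("Router is in safe mode");
    }
  }

  /** Order described in the issue: READ returns before safemode is consulted. */
  static void checkReadFirst(Category op, boolean inSafeMode) {
    if (op == Category.UNCHECKED || op == Category.READ) {
      return;
    }
    if (inSafeMode) {
      throw new RouterInSafeModeException();
    }
  }

  /** Proposed order: safemode is consulted before READ is let through. */
  static void checkSafeModeFirst(Category op, boolean inSafeMode) {
    if (op == Category.UNCHECKED) {
      return;
    }
    if (inSafeMode) {
      throw new RouterInSafeModeException();
    }
  }

  public static void main(String[] args) {
    // With the first ordering a READ slips through while in safemode.
    checkReadFirst(Category.READ, true);
    System.out.println("read-first: READ passed while in safemode");
    try {
      checkSafeModeFirst(Category.READ, true);
    } catch (RouterInSafeModeException e) {
      System.out.println("safemode-first: READ rejected, client can fail over");
    }
  }
}
```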






[jira] [Commented] (HDFS-16218) RBF: RouterFedbalance should load HDFS config

2021-09-15 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415800#comment-17415800
 ] 

Fengnan Li commented on HDFS-16218:
---

[~aajisaka] [~elgoiri] I updated the code to use HdfsConfiguration instead. I
could write a test that checks instanceof, but that seems too trivial.

The problem is that a plain Configuration does not load hdfs-rbf-site.xml,
while HdfsConfiguration does. Ideally we could distinguish between a property
that is included but has no value and a property that is not included at all.
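
The distinction mentioned above can be sketched with java.util.Properties as a
stand-in for the Hadoop configuration classes; whether Configuration exposes
the same distinction is not claimed here. The class PropertyPresence, the
State enum, and the example principal value are hypothetical.

```java
import java.util.Properties;

/** Hypothetical sketch using java.util.Properties, not Hadoop's Configuration. */
final class PropertyPresence {

  enum State { ABSENT, EMPTY, SET }

  /** Classifies a key as not present, present with an empty value, or set. */
  static State stateOf(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null) {
      return State.ABSENT;
    }
    return value.isEmpty() ? State.EMPTY : State.SET;
  }

  public static void main(String[] args) {
    String key = "dfs.federation.router.kerberos.principal";
    Properties props = new Properties();
    System.out.println(stateOf(props, key)); // prints ABSENT
    props.setProperty(key, "");
    System.out.println(stateOf(props, key)); // prints EMPTY
    // Made-up principal value, for illustration only.
    props.setProperty(key, "router/_HOST@EXAMPLE.COM");
    System.out.println(stateOf(props, key)); // prints SET
  }
}
```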

> RBF: RouterFedbalance should load HDFS config
> -
>
> Key: HDFS-16218
> URL: https://issues.apache.org/jira/browse/HDFS-16218
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
> Environment: Hadoop 3.3.0 + patches, Kerberos authentication is 
> enabled
>Reporter: Akira Ajisaka
>Assignee: Fengnan Li
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> RouterFedBalance fails to connect to DFSRouter when Kerberos is enabled 
> because "dfs.federation.router.kerberos.principal" in hdfs-site.xml is not 
> loaded.
> {quote}
> 21/09/08 17:21:38 ERROR rbfbalance.RouterFedBalance: Submit balance job 
> failed.
> java.io.IOException: DestHost:destPort 0.0.0.0:8111 , LocalHost:localPort 
> /:0. Failed on local exception: java.io.IOException: Couldn't set 
> up IO streams: java.lang.IllegalArgumentException: Failed to specify server's 
> Kerberos principal name
>   at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getMountTableEntries(RouterAdminProtocolTranslatorPB.java:198)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.MountTableProcedure.getMountEntry(MountTableProcedure.java:140)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.getSrcPath(RouterFedBalance.java:326)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.access$000(RouterFedBalance.java:68)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance$Builder.build(RouterFedBalance.java:168)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.submit(RouterFedBalance.java:302)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.run(RouterFedBalance.java:216)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.main(RouterFedBalance.java:376)
> {quote}
> When adding the property specifically by "-D" option, the command worked.






[jira] [Resolved] (HDFS-16188) RBF: Router to support resolving monitored namenodes with DNS

2021-09-10 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-16188.
---
Resolution: Fixed

> RBF: Router to support resolving monitored namenodes with DNS
> -
>
> Key: HDFS-16188
> URL: https://issues.apache.org/jira/browse/HDFS-16188
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of monitored
> namenodes, so we don't have to reconfigure every time a namenode hostname
> changes. For example, in a containerized environment the hostnames of
> namenodes/observers can change quite often.
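
A rough sketch of the idea in plain Java: one DNS name is expanded into the
list of addresses it currently resolves to. This is not the code merged for
this issue; the class DnsNamenodeList, the Resolver hook, and the expand
method are hypothetical, and the port is an arbitrary example value. Only
InetAddress.getAllByName is a real JDK call.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of expanding one DNS name into host:port entries. */
final class DnsNamenodeList {

  /** Lookup hook so the DNS query can be replaced, for example in tests. */
  interface Resolver {
    InetAddress[] resolve(String host) throws UnknownHostException;
  }

  /** Returns one "address:port" entry per address behind the DNS name. */
  static List<String> expand(String dnsName, int port, Resolver resolver)
      throws UnknownHostException {
    List<String> entries = new ArrayList<>();
    for (InetAddress address : resolver.resolve(dnsName)) {
      // IPv6 literals would need brackets; this sketch ignores that case.
      entries.add(address.getHostAddress() + ":" + port);
    }
    return entries;
  }

  public static void main(String[] args) throws UnknownHostException {
    // InetAddress.getAllByName returns every address registered for the name.
    System.out.println(expand("localhost", 8020, InetAddress::getAllByName));
  }
}
```

Because the lookup happens on each call, re-running expand picks up hosts that
were added to or removed from the DNS record without a configuration change.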






[jira] [Assigned] (HDFS-16218) RBF: RouterFedbalance should load HDFS config

2021-09-10 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-16218:
-

Assignee: Fengnan Li

> RBF: RouterFedbalance should load HDFS config
> -
>
> Key: HDFS-16218
> URL: https://issues.apache.org/jira/browse/HDFS-16218
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
> Environment: Hadoop 3.3.0 + patches, Kerberos authentication is 
> enabled
>Reporter: Akira Ajisaka
>Assignee: Fengnan Li
>Priority: Major
>  Labels: newbie
>
> RouterFedBalance fails to connect to DFSRouter when Kerberos is enabled 
> because "dfs.federation.router.kerberos.principal" in hdfs-site.xml is not 
> loaded.
> {quote}
> 21/09/08 17:21:38 ERROR rbfbalance.RouterFedBalance: Submit balance job 
> failed.
> java.io.IOException: DestHost:destPort 0.0.0.0:8111 , LocalHost:localPort 
> /:0. Failed on local exception: java.io.IOException: Couldn't set 
> up IO streams: java.lang.IllegalArgumentException: Failed to specify server's 
> Kerberos principal name
>   at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getMountTableEntries(RouterAdminProtocolTranslatorPB.java:198)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.MountTableProcedure.getMountEntry(MountTableProcedure.java:140)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.getSrcPath(RouterFedBalance.java:326)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.access$000(RouterFedBalance.java:68)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance$Builder.build(RouterFedBalance.java:168)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.submit(RouterFedBalance.java:302)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.run(RouterFedBalance.java:216)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.main(RouterFedBalance.java:376)
> {quote}
> When adding the property specifically by "-D" option, the command worked.






[jira] [Commented] (HDFS-16218) RBF: RouterFedbalance should load HDFS config

2021-09-10 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413437#comment-17413437
 ] 

Fengnan Li commented on HDFS-16218:
---

[~elgoiri] Sure.

> RBF: RouterFedbalance should load HDFS config
> -
>
> Key: HDFS-16218
> URL: https://issues.apache.org/jira/browse/HDFS-16218
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
> Environment: Hadoop 3.3.0 + patches, Kerberos authentication is 
> enabled
>Reporter: Akira Ajisaka
>Assignee: Fengnan Li
>Priority: Major
>  Labels: newbie
>
> RouterFedBalance fails to connect to DFSRouter when Kerberos is enabled 
> because "dfs.federation.router.kerberos.principal" in hdfs-site.xml is not 
> loaded.
> {quote}
> 21/09/08 17:21:38 ERROR rbfbalance.RouterFedBalance: Submit balance job 
> failed.
> java.io.IOException: DestHost:destPort 0.0.0.0:8111 , LocalHost:localPort 
> /:0. Failed on local exception: java.io.IOException: Couldn't set 
> up IO streams: java.lang.IllegalArgumentException: Failed to specify server's 
> Kerberos principal name
>   at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getMountTableEntries(RouterAdminProtocolTranslatorPB.java:198)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.MountTableProcedure.getMountEntry(MountTableProcedure.java:140)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.getSrcPath(RouterFedBalance.java:326)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.access$000(RouterFedBalance.java:68)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance$Builder.build(RouterFedBalance.java:168)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.submit(RouterFedBalance.java:302)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.run(RouterFedBalance.java:216)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.main(RouterFedBalance.java:376)
> {quote}
> When adding the property specifically by "-D" option, the command worked.






[jira] [Resolved] (HDFS-16157) Support configuring DNS record to get list of journal nodes.

2021-08-25 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-16157.
---
Resolution: Resolved

> Support configuring DNS record to get list of journal nodes.
> 
>
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes, so
> we don't have to reconfigure every time a journal node hostname changes. For
> example, in some containerized environments the hostnames of journal nodes can
> change quite often.






[jira] [Assigned] (HDFS-15599) RBF: Add API to expose resolved destinations (namespace) in Router

2021-05-19 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15599:
-

Assignee: Qifan Shi

> RBF: Add API to expose resolved destinations (namespace) in Router
> --
>
> Key: HDFS-15599
> URL: https://issues.apache.org/jira/browse/HDFS-15599
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Qifan Shi
>Priority: Major
>
> We quite often see requests asking where a path in the Router actually
> points. The two main use cases are:
> 1) Calculating the HDFS capacity usage allocation of all Hive tables that
> have onboarded to Router.
> 2) A failure prevention method for cross-cluster rename: first check the
> source HDFS location and the dest HDFS location, and then issue a distcp cmd
> if possible to avoid the Exception.
> Inside the Router, the function getLocationsForPath does the work, but it is
> internal only and not visible to clients.
> RouterAdmin has getMountTableEntries, but this only reflects the mount table
> as-is, without any resolving.
>  
> We are proposing adding such an API, and there are two ways:
> 1) Adding this API in RouterRpcServer, which requires a change in
> ClientNameNodeProtocol to include this new API.
> 2) Adding this API in RouterAdminServer, which requires a protocol
> between the client and the admin server.
>  
> There is one existing resolvePath in FileSystem which can be used to
> implement this call from the client side.
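
To make "where a path actually points" concrete, here is a minimal sketch of
resolving a path against a mount table by longest matching mount point. It is
not the Router's getLocationsForPath implementation; the class MountResolver,
its methods, and the "nameservice:/path" destination format are assumptions
made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of longest-prefix mount resolution. */
final class MountResolver {

  // Mount point -> destination, written here as "nameservice:/path".
  private final Map<String, String> mounts = new HashMap<>();

  void add(String mountPoint, String destination) {
    mounts.put(mountPoint, destination);
  }

  /** Resolves a path through the longest mount point that contains it. */
  Optional<String> resolve(String path) {
    String best = null;
    for (String mount : mounts.keySet()) {
      String prefix = mount.endsWith("/") ? mount : mount + "/";
      boolean matches = path.equals(mount) || path.startsWith(prefix);
      if (matches && (best == null || mount.length() > best.length())) {
        best = mount;
      }
    }
    if (best == null) {
      return Optional.empty();
    }
    // Append the part of the path below the mount point to the destination.
    return Optional.of(mounts.get(best) + path.substring(best.length()));
  }

  public static void main(String[] args) {
    MountResolver resolver = new MountResolver();
    resolver.add("/table/2018", "nsA:/table/2018");
    resolver.add("/table/2022", "nsB:/table/2022");
    System.out.println(resolver.resolve("/table/2018/01")); // prints Optional[nsA:/table/2018/01]
    System.out.println(resolver.resolve("/other"));         // prints Optional.empty
  }
}
```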






[jira] [Assigned] (HDFS-15599) RBF: Add API to expose resolved destinations (namespace) in Router

2021-05-19 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15599:
-

Assignee: (was: Fengnan Li)

> RBF: Add API to expose resolved destinations (namespace) in Router
> --
>
> Key: HDFS-15599
> URL: https://issues.apache.org/jira/browse/HDFS-15599
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Priority: Major
>
> We quite often see requests asking where a path in the Router actually
> points. The two main use cases are:
> 1) Calculating the HDFS capacity usage allocation of all Hive tables that
> have onboarded to Router.
> 2) A failure prevention method for cross-cluster rename: first check the
> source HDFS location and the dest HDFS location, and then issue a distcp cmd
> if possible to avoid the Exception.
> Inside the Router, the function getLocationsForPath does the work, but it is
> internal only and not visible to clients.
> RouterAdmin has getMountTableEntries, but this only reflects the mount table
> as-is, without any resolving.
>  
> We are proposing adding such an API, and there are two ways:
> 1) Adding this API in RouterRpcServer, which requires a change in
> ClientNameNodeProtocol to include this new API.
> 2) Adding this API in RouterAdminServer, which requires a protocol
> between the client and the admin server.
>  
> There is one existing resolvePath in FileSystem which can be used to
> implement this call from the client side.






[jira] [Assigned] (HDFS-15675) TestRouterRpcMultiDestination#testErasureCoding fails on trunk

2021-05-19 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15675:
-

Assignee: Fengnan Li

> TestRouterRpcMultiDestination#testErasureCoding fails on trunk
> --
>
> Key: HDFS-15675
> URL: https://issues.apache.org/jira/browse/HDFS-15675
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Fengnan Li
>Priority: Major
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in testErasureCoding






[jira] [Assigned] (HDFS-15857) Space is missed in the print result of ECAdmin.RemoveECPolicyCommand

2021-05-19 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15857:
-

Assignee: Fengnan Li

> Space is missed in the print result of ECAdmin.RemoveECPolicyCommand
> 
>
> Key: HDFS-15857
> URL: https://issues.apache.org/jira/browse/HDFS-15857
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec
>Affects Versions: 3.4.0
>Reporter: Shiyou xin
>Assignee: Fengnan Li
>Priority: Minor
>
> System.out.println("Erasure coding policy " + ecPolicyName +
>    "is removed");
>  
> It would be better to insert a space between ecPolicyName and "is removed".
>  
>  






[jira] [Resolved] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-07 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-15878.
---
Resolution: Not A Problem

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
>   at 
> 

[jira] [Reopened] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-07 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reopened HDFS-15878:
---

Reopen to change the closing status.

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
>   at 
> 

[jira] [Commented] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-07 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340967#comment-17340967
 ] 

Fengnan Li commented on HDFS-15878:
---

[~hexiaoqiao] [~ayushtkn]  Updated the status as suggested. Thanks.

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>

[jira] [Resolved] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-04 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-15878.
---
Resolution: Resolved

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>

[jira] [Commented] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-04 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339192#comment-17339192
 ] 

Fengnan Li commented on HDFS-15878:
---

Thanks [~ayushtkn] for the answer. I had assumed you have automated tools 
that monitor the failure rate of certain tests.

The goal was to confirm that HDFS-15423 fixed this. That patch changed the 
RouterWebHDFSContract fixture so that the downstream HDFS clusters have 
separate Datanodes, which should avoid the flakiness of this test.

I have run this test locally many times and couldn't reproduce the failure, 
so I will resolve this issue.

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>

[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-05-03 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338633#comment-17338633
 ] 

Fengnan Li commented on HDFS-15757:
---

[~hexiaoqiao] Have you had time to test this patch? We found that the min 
active ratio is another tuning parameter for this: the higher the value, the 
shorter the proxy time (less cleanup). Thanks!

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Improving Router Connection Management_v3.pdf, RBF_ Router Connection 
> Management.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We have seen a high number of connections from the Router to the Namenodes, 
> which leaves the Namenodes unstable.
> This ticket aims to reduce the number of connections through some changes. 
> Please take a look at the design and leave comments. 
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-03 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338632#comment-17338632
 ] 

Fengnan Li commented on HDFS-15878:
---

The failure didn't show up in this page: 
[https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/490/testReport/]

[~weichiu] [~aajisaka] [~ayushtkn] [~hexiaoqiao]  Since there is an ongoing 
effort to clean up the tests, can you help confirm?

Thanks a lot!

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>

[jira] [Created] (HDFS-16005) RBF: AccessControlException is counted as proxy failure

2021-05-02 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16005:
-

 Summary: RBF: AccessControlException is counted as proxy failure
 Key: HDFS-16005
 URL: https://issues.apache.org/jira/browse/HDFS-16005
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Fengnan Li
Assignee: Fengnan Li


We use ProxyOpCommunicateFailure as a metric for monitoring the Router's 
performance. However, we recently noticed that when clients try to access 
files they have no permission for in the Namenode, the AccessControlException 
thrown by the Namenode is counted in this metric.

In our understanding, ProxyOpCommunicateFailure should reflect network/hardware 
failures between the Router and the Namenode, not failures caused by the 
client side.
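
The distinction described above could be sketched as follows. This is only an 
illustration under stated assumptions, not the Router's actual code: 
ProxyFailureCounter, StandInAccessControlException and the method names are 
hypothetical, and the stand-in exception replaces Hadoop's 
AccessControlException so that the example compiles on its own.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter; illustrates the idea only, not the Router metrics code. */
final class ProxyFailureCounter {

  /** Stand-in for Hadoop's AccessControlException (an assumption for this sketch). */
  static class StandInAccessControlException extends IOException {
    StandInAccessControlException(String msg) {
      super(msg);
    }
  }

  private final AtomicLong communicateFailures = new AtomicLong();

  /** A permission error raised by the Namenode is a client-side problem, not a proxy failure. */
  static boolean isCommunicateFailure(IOException e) {
    return !(e instanceof StandInAccessControlException);
  }

  /** Increments the counter only for failures that are not access-control errors. */
  void record(IOException e) {
    if (isCommunicateFailure(e)) {
      communicateFailures.incrementAndGet();
    }
  }

  long getCommunicateFailures() {
    return communicateFailures.get();
  }

  public static void main(String[] args) {
    ProxyFailureCounter counter = new ProxyFailureCounter();
    counter.record(new ConnectException("connection refused"));
    counter.record(new StandInAccessControlException("Permission denied"));
    // Only the ConnectException is counted.
    System.out.println(counter.getCommunicateFailures());
  }
}
```

Whether other Namenode-side exceptions should also be excluded from the metric 
is a separate question that this sketch does not try to answer.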






[jira] [Updated] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-26 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15810:
--
Attachment: Screen Shot 2021-04-26 at 5.25.12 PM.png

> RBF: RBFMetrics's TotalCapacity out of bounds
> -
>
> Key: HDFS-15810
> URL: https://issues.apache.org/jira/browse/HDFS-15810
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoxing Wei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-04-26 at 5.25.12 PM.png, 
> image-2021-02-02-10-59-17-113.png
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The Long-typed fields TotalCapacity, UsedCapacity and RemainingCapacity in 
> RBFMetrics may be out of bounds.
> !image-2021-02-02-10-59-17-113.png!
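
Assuming the out-of-bounds values come from summing per-cluster capacities in a 
long, the effect and one possible remedy can be sketched as below. CapacitySum 
and its methods are hypothetical names used for illustration; the fix that was 
actually committed for this issue may look different.

```java
import java.math.BigInteger;

/** Hypothetical helper contrasting a long sum with a BigInteger sum. */
final class CapacitySum {

  private CapacitySum() {
  }

  /** Plain long addition: silently wraps around once the total exceeds Long.MAX_VALUE. */
  static long naiveTotal(long... capacities) {
    long sum = 0L;
    for (long c : capacities) {
      sum += c;
    }
    return sum;
  }

  /** BigInteger addition: the total keeps growing past the long range. */
  static BigInteger total(long... capacities) {
    BigInteger sum = BigInteger.ZERO;
    for (long c : capacities) {
      sum = sum.add(BigInteger.valueOf(c));
    }
    return sum;
  }

  public static void main(String[] args) {
    // Two inputs whose true sum is one more than Long.MAX_VALUE.
    System.out.println(naiveTotal(Long.MAX_VALUE, 1L)); // wraps to Long.MIN_VALUE
    System.out.println(total(Long.MAX_VALUE, 1L));      // 9223372036854775808
  }
}
```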






[jira] [Commented] (HDFS-15561) RBF: Fix NullPointException when start dfsrouter

2021-04-25 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331422#comment-17331422
 ] 

Fengnan Li commented on HDFS-15561:
---

[~weichiu] [~hexiaoqiao] Attached a new patch, please review. Thanks!

> RBF: Fix NullPointException when start dfsrouter
> 
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When dfsrouter starts, it throws an NPE:
> {code:java}
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> java.lang.IllegalArgumentException: java.net.UnknownHostException: null
>   at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>   at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>   at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>   at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.net.UnknownHostException: null
>   ... 14 more
> {code}
>  
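
The log above shows the heartbeat service trying to reach the address 
null:null. A defensive check of roughly the following shape could reject such 
an address before a proxy is created. This is a minimal sketch under that 
assumption; NamenodeAddressCheck and isUsable are hypothetical names, and the 
patch attached to this issue may take a different approach.

```java
/** Hypothetical validation helper; not part of NamenodeHeartbeatService. */
final class NamenodeAddressCheck {

  private NamenodeAddressCheck() {
  }

  /**
   * Returns true only for a host:port string with a non-empty host that is
   * not the literal text "null" and a port in the range 1..65535.
   */
  static boolean isUsable(String address) {
    if (address == null) {
      return false;
    }
    int idx = address.lastIndexOf(':');
    // No separator, an empty host, or an empty port.
    if (idx <= 0 || idx == address.length() - 1) {
      return false;
    }
    String host = address.substring(0, idx);
    String port = address.substring(idx + 1);
    // A missing configuration value concatenated into a string ends up as "null".
    if ("null".equals(host)) {
      return false;
    }
    try {
      int p = Integer.parseInt(port);
      return p > 0 && p <= 65535;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isUsable("null:null"));            // false
    System.out.println(isUsable("nn1.example.com:8020")); // true
  }
}
```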






[jira] [Commented] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-23 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330958#comment-17330958
 ] 

Fengnan Li commented on HDFS-15878:
---

[~weichiu] Can you help verify that the tests have been passing? Thanks!

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 24.627 s <<< FAILURE! - in org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)  Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File /test/testSyncable not found.
>   at 
> 

[jira] [Updated] (HDFS-15561) RBF: Fix NullPointException when start dfsrouter

2021-04-19 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15561:
--
Summary: RBF: Fix NullPointException when start dfsrouter  (was: Fix 
NullPointException when start dfsrouter)

> RBF: Fix NullPointException when start dfsrouter
> 
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE.
> {code:java}
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> java.lang.IllegalArgumentException: java.net.UnknownHostException: null
>  at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.net.UnknownHostException: null
>  ... 14 more
> {code}
>  






[jira] [Assigned] (HDFS-15561) Fix NullPointException when start dfsrouter

2021-04-18 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15561:
-

Assignee: Fengnan Li

> Fix NullPointException when start dfsrouter
> ---
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE.
> {code:java}
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> java.lang.IllegalArgumentException: java.net.UnknownHostException: null
>  at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.net.UnknownHostException: null
>  ... 14 more
> {code}
>  






[jira] [Commented] (HDFS-15561) Fix NullPointException when start dfsrouter

2021-04-18 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324706#comment-17324706
 ] 

Fengnan Li commented on HDFS-15561:
---

[~weichiu] Sure, I will take it. Thanks.

> Fix NullPointException when start dfsrouter
> ---
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE.
> {code:java}
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> java.lang.IllegalArgumentException: java.net.UnknownHostException: null
>  at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.net.UnknownHostException: null
>  ... 14 more
> {code}
>  






[jira] [Issue Comment Deleted] (HDFS-15561) Fix NullPointException when start dfsrouter

2021-04-18 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15561:
--
Comment: was deleted

(was: [~lamberken] Are you still working on this one? 

[~weichiu] I can take this one if Xie is not working on it.)

> Fix NullPointException when start dfsrouter
> ---
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE.
> {code:java}
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> java.lang.IllegalArgumentException: java.net.UnknownHostException: null
>  at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.net.UnknownHostException: null
>  ... 14 more
> {code}
>  






[jira] [Commented] (HDFS-15561) Fix NullPointException when start dfsrouter

2021-04-18 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324705#comment-17324705
 ] 

Fengnan Li commented on HDFS-15561:
---

[~lamberken] Are you still working on this one? 

[~weichiu] I can take this one if Xie is not working on it.

> Fix NullPointException when start dfsrouter
> ---
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE.
> {code:java}
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> java.lang.IllegalArgumentException: java.net.UnknownHostException: null
>  at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123)
>  at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>  at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>  at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>  at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.net.UnknownHostException: null
>  ... 14 more
> {code}
>  






[jira] [Commented] (HDFS-15878) Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-16 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324127#comment-17324127
 ] 

Fengnan Li commented on HDFS-15878:
---

[~inigoiri] How can we verify that the tests are fixed? Is there a Jenkins job I 
can monitor?

> Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> ---
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 24.627 s <<< FAILURE! - in org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)  Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File /test/testSyncable not found.
>   at 
> 

[jira] [Updated] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-14 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15810:
--
Status: Patch Available  (was: Open)

> RBF: RBFMetrics's TotalCapacity out of bounds
> -
>
> Key: HDFS-15810
> URL: https://issues.apache.org/jira/browse/HDFS-15810
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoxing Wei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-02-02-10-59-17-113.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The long-typed fields TotalCapacity, UsedCapacity, and RemainingCapacity in 
> RBFMetrics may go out of bounds.
> !image-2021-02-02-10-59-17-113.png!






[jira] [Assigned] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-14 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15810:
-

Assignee: Fengnan Li

> RBF: RBFMetrics's TotalCapacity out of bounds
> -
>
> Key: HDFS-15810
> URL: https://issues.apache.org/jira/browse/HDFS-15810
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoxing Wei
>Assignee: Fengnan Li
>Priority: Major
> Attachments: image-2021-02-02-10-59-17-113.png
>
>
> The long-typed fields TotalCapacity, UsedCapacity, and RemainingCapacity in 
> RBFMetrics may go out of bounds.
> !image-2021-02-02-10-59-17-113.png!






[jira] [Commented] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-14 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321352#comment-17321352
 ] 

Fengnan Li commented on HDFS-15810:
---

Sure. I will give it a try with BigInteger.
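
A minimal sketch of the BigInteger idea, not the actual patch: adding up capacities as long wraps around once the total passes Long.MAX_VALUE, while BigInteger keeps the exact value. The class and method names below (CapacitySum, sumAsLong, sumAsBigInteger) are hypothetical and are not part of RBFMetrics.

```java
import java.math.BigInteger;

public class CapacitySum {
    // Plain long addition: silently wraps around on overflow.
    static long sumAsLong(long[] capacities) {
        long total = 0L;
        for (long c : capacities) {
            total += c;
        }
        return total;
    }

    // BigInteger addition: arbitrary precision, so the sum cannot overflow.
    static BigInteger sumAsBigInteger(long[] capacities) {
        BigInteger total = BigInteger.ZERO;
        for (long c : capacities) {
            total = total.add(BigInteger.valueOf(c));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical sub-clusters, each reporting the largest long value.
        long[] capacities = {Long.MAX_VALUE, Long.MAX_VALUE};
        System.out.println(sumAsLong(capacities));        // wrapped, negative
        System.out.println(sumAsBigInteger(capacities));  // exact sum
    }
}
```

The trade-off is that a BigInteger value has to be rendered as a string or converted before it can be exposed through an interface that expects a primitive numeric type.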

> RBF: RBFMetrics's TotalCapacity out of bounds
> -
>
> Key: HDFS-15810
> URL: https://issues.apache.org/jira/browse/HDFS-15810
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoxing Wei
>Priority: Major
> Attachments: image-2021-02-02-10-59-17-113.png
>
>
> The long-typed fields TotalCapacity, UsedCapacity, and RemainingCapacity in 
> RBFMetrics may go out of bounds.
> !image-2021-02-02-10-59-17-113.png!






[jira] [Commented] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-14 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321315#comment-17321315
 ] 

Fengnan Li commented on HDFS-15810:
---

Can we use double, which has a much larger maximum value than long?
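
As a rough illustration of the trade-off with double (a sketch, not something taken from the patch): its range is far larger than long's, but it carries only 53 bits of significand, so byte counts above 2^53 (roughly 9 petabytes) are no longer represented exactly. The class name DoublePrecision and the helper roundTrip are hypothetical.

```java
public class DoublePrecision {
    // Converts a long to double and back; the result equals the input
    // only when the value is exactly representable as a double.
    static long roundTrip(long value) {
        return (long) (double) value;
    }

    public static void main(String[] args) {
        long exact = 1L << 53;          // 2^53 is exactly representable
        long inexact = (1L << 53) + 1;  // 2^53 + 1 is not
        System.out.println(roundTrip(exact) == exact);
        System.out.println(roundTrip(inexact) == inexact);
    }
}
```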

> RBF: RBFMetrics's TotalCapacity out of bounds
> -
>
> Key: HDFS-15810
> URL: https://issues.apache.org/jira/browse/HDFS-15810
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoxing Wei
>Priority: Major
> Attachments: image-2021-02-02-10-59-17-113.png
>
>
> The long-typed fields TotalCapacity, UsedCapacity, and RemainingCapacity in 
> RBFMetrics may go out of bounds.
> !image-2021-02-02-10-59-17-113.png!






[jira] [Work started] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-13 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-15423 started by Fengnan Li.
-
> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific cluster(s) of the input {{path}}.
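
The behavior the description asks for can be sketched as follows, under stated assumptions: this is a self-contained model, not the Router code. The Datanode class, the nameservice field, and the methods filterBySubclusters and choose are all hypothetical stand-ins; the real {{chooseDatanode}} and {{getRandomDatanode}} are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class DatanodeChooser {
    // Hypothetical stand-in for a datanode entry: host plus owning sub-cluster.
    static final class Datanode {
        final String host;
        final String nameservice;

        Datanode(String host, String nameservice) {
            this.host = host;
            this.nameservice = nameservice;
        }
    }

    // Keep only the datanodes that belong to one of the target sub-clusters.
    static List<Datanode> filterBySubclusters(List<Datanode> all, Set<String> targets) {
        List<Datanode> candidates = new ArrayList<>();
        for (Datanode dn : all) {
            if (targets.contains(dn.nameservice)) {
                candidates.add(dn);
            }
        }
        return candidates;
    }

    // Pick randomly among the filtered candidates; returns null if none qualify.
    static Datanode choose(List<Datanode> all, Set<String> targets, Random rnd) {
        List<Datanode> candidates = filterBySubclusters(all, targets);
        if (candidates.isEmpty()) {
            return null;
        }
        return candidates.get(rnd.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<Datanode> all = Arrays.asList(
            new Datanode("dn1", "ns0"),
            new Datanode("dn2", "ns1"),
            new Datanode("dn3", "ns1"));
        // Assume the input path resolves to sub-cluster "ns1" only.
        Datanode picked = choose(all, Collections.singleton("ns1"), new Random());
        System.out.println(picked.host + " in " + picked.nameservice);
    }
}
```

Filtering before the random pick keeps the selection uniform within the destination sub-cluster(s) instead of across every sub-cluster.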






[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-13 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320443#comment-17320443
 ] 

Fengnan Li commented on HDFS-15423:
---

Thanks very much, [~ayushtkn]. I have opened 
[https://github.com/apache/hadoop/pull/2903] for it.

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific cluster(s) of the input {{path}}.






[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-13 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320410#comment-17320410
 ] 

Fengnan Li commented on HDFS-15423:
---

Indeed. Let's wait for one day, since the history would be cleaner with a 
revert...

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific cluster(s) of the input {{path}}.






[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-13 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320354#comment-17320354
 ] 

Fengnan Li commented on HDFS-15423:
---

[~elgoiri] [~ferhui] The dependent call was removed in this task: 
https://issues.apache.org/jira/browse/HDFS-15884. Can we revert that?







[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319891#comment-17319891
 ] 

Fengnan Li commented on HDFS-15423:
---

[~elgoiri] Sure, I will create a new one.







[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319724#comment-17319724
 ] 

Fengnan Li commented on HDFS-15423:
---

Thanks [~elgoiri] [~ayushtkn] for the review! Let's see whether it can fix 
[HDFS-15878|https://issues.apache.org/jira/browse/HDFS-15878]

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific cluster(s) of the input {{path}}.






[jira] [Commented] (HDFS-15878) Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-12 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319573#comment-17319573
 ] 

Fengnan Li commented on HDFS-15878:
---

Let's wait until HDFS-15423 is committed. Thanks.

> Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> ---
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> 

[jira] [Commented] (HDFS-15878) Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-09 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318339#comment-17318339
 ] 

Fengnan Li commented on HDFS-15878:
---

I think this will be fixed by 
[HDFS-15423|https://issues.apache.org/jira/browse/HDFS-15423]


[jira] [Updated] (HDFS-15878) Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-09 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15878:
--
Component/s: rbf, hdfs

> Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> ---
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
>   at 
> 

[jira] [Assigned] (HDFS-15878) Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-09 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15878:
-

Assignee: Fengnan Li


[jira] [Commented] (HDFS-15675) TestRouterRpcMultiDestination#testErasureCoding fails on trunk

2021-04-09 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318315#comment-17318315
 ] 

Fengnan Li commented on HDFS-15675:
---

Is this still happening? If so, I would like to take it.

> TestRouterRpcMultiDestination#testErasureCoding fails on trunk
> --
>
> Key: HDFS-15675
> URL: https://issues.apache.org/jira/browse/HDFS-15675
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Priority: Major
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in testErasureCoding






[jira] [Commented] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-08 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317683#comment-17317683
 ] 

Fengnan Li commented on HDFS-15756:
---

This was discussed in 
[HDFS-14405|https://issues.apache.org/jira/browse/HDFS-14405]. And yes, a 
different store that gives clients a strongly consistent view can solve the 
issue.

> RBF: Cannot get updated delegation token from zookeeper
> ---
>
> Key: HDFS-15756
> URL: https://issues.apache.org/jira/browse/HDFS-15756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.0.0
>Reporter: hbprotoss
>Priority: Major
>
> Affected versions: all versions with RBF.
> When RBF works with Spark 2.4 in client mode, there is a chance that a token 
> is missing on some nodes in the RBF cluster. The root cause is that Spark 
> renews the token (via the ResourceManager) immediately after getting one. 
> Because ZooKeeper does not guarantee strong consistency right after an update 
> in the cluster, a ZooKeeper client may read a stale value from a follower that 
> is not yet synced with the other nodes.
>  
> We have applied a patch in Spark, but this is still a problem in RBF. Is it 
> possible for RBF to replace the delegation token store with some other 
> data source (Redis, for example)?
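
The stale read described above can be illustrated with a small toy model. This is a sketch under assumptions, not ZooKeeper or RBF code: {{ToyStore}}, its two maps and the token key are hypothetical, and the follower here only catches up when {{sync()}} is called explicitly.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of a stale read from a lagging replica; not ZooKeeper code. */
final class StaleReadDemo {

  /** Hypothetical store: writes go to the leader, reads are served by a follower. */
  static final class ToyStore {
    private final Map<String, Long> leader = new HashMap<>();
    private final Map<String, Long> follower = new HashMap<>();

    /** Applies the write on the leader only; the follower is not updated yet. */
    void write(String key, long value) {
      leader.put(key, value);
    }

    /** Reads whatever the follower currently has, which may be out of date. */
    Long readFromFollower(String key) {
      return follower.get(key);
    }

    /** Brings the follower up to date with the leader. */
    void sync() {
      follower.putAll(leader);
    }
  }

  public static void main(String[] args) {
    ToyStore store = new ToyStore();
    store.write("token-1", 1L);   // token created with renew date 1
    store.sync();
    store.write("token-1", 2L);   // token renewed right away
    System.out.println("before sync: " + store.readFromFollower("token-1"));
    store.sync();
    System.out.println("after sync: " + store.readFromFollower("token-1"));
  }
}
```

In this model a reader that hits the follower between the renewal and the next sync sees the old renew date, which corresponds to the symptom reported here. Either catching the follower up before the read or using a store with a strongly consistent view removes that window.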






[jira] [Assigned] (HDFS-15836) RBF: Fix TestRouterHDFSContractCreate and TestRouterHDFSContractCreateSecure

2021-02-15 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15836:
-

Assignee: Akira Ajisaka

> RBF: Fix TestRouterHDFSContractCreate and TestRouterHDFSContractCreateSecure
> 
>
> Key: HDFS-15836
> URL: https://issues.apache.org/jira/browse/HDFS-15836
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 19.094 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate)
>   Time elapsed: 0.102 s  <<< FAILURE!
> java.lang.AssertionError: Should not have capability: hflush in 
> FSDataOutputStream{wrappedStream=DFSOutputStream:block==null}
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at 
> org.apache.hadoop.fs.contract.ContractTestUtils.assertCapabilities(ContractTestUtils.java:1553)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:497)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2696/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt






[jira] [Commented] (HDFS-15836) RBF: Fix TestRouterHDFSContractCreate and TestRouterHDFSContractCreateSecure

2021-02-15 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284992#comment-17284992
 ] 

Fengnan Li commented on HDFS-15836:
---

+1 for the fix. Thanks [~aajisaka]







[jira] [Created] (HDFS-15833) Make ObserverReadProxyProvider able to talk to DNS of Observers

2021-02-10 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15833:
-

 Summary: Make ObserverReadProxyProvider able to talk to DNS of 
Observers
 Key: HDFS-15833
 URL: https://issues.apache.org/jira/browse/HDFS-15833
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Aihua Xu









[jira] [Created] (HDFS-15832) Using DNS to access Zookeeper cluster

2021-02-10 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15832:
-

 Summary: Using DNS to access Zookeeper cluster
 Key: HDFS-15832
 URL: https://issues.apache.org/jira/browse/HDFS-15832
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Aihua Xu









[jira] [Created] (HDFS-15831) Adopt more DNS resolving for HDFS

2021-02-10 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15831:
-

 Summary: Adopt more DNS resolving for HDFS
 Key: HDFS-15831
 URL: https://issues.apache.org/jira/browse/HDFS-15831
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


There are several places inside HDFS where we can use a DNS name for a group 
of hosts instead of individual host names. This helps in two ways:
1. Server management, e.g. host replacement.
2. Client transparency, e.g. clients are configured with a DNS name and do not 
need to know the specific hosts.

Secure environments should also be supported; for those we recommend turning 
on principal wildcard matching.
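
As a rough illustration of the client-transparency point, the sketch below 
expands one DNS name into the addresses behind it using the JDK's 
InetAddress.getAllByName. The class and method names (DnsHostExpander, expand) 
are hypothetical and are not taken from any HDFS patch; how HDFS would wire 
this in is not described in this ticket.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
class DnsHostExpander {
    // Returns every address the name resolves to, or an empty list if
    // the name cannot be resolved.
    static List<String> expand(String dnsName) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(dnsName)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable name: leave the list empty and let the caller decide.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // An IP literal is accepted as-is, so this runs without a DNS server.
        System.out.println(expand("127.0.0.1"));
    }
}
```

With something like this, replacing a host only requires updating the DNS 
record; the client configuration keeps pointing at the same name.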






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-02-04 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279190#comment-17279190
 ] 

Fengnan Li commented on HDFS-15757:
---

[~hexiaoqiao] Thanks for the question. There are three latencies:
1. RPC queue time: the time a call spends in the RPC queue, which is not 
affected by the change.
2. RPC processing time: measured before the actual proxy op (getting a TCP 
connection and talking to the NN), which is also not affected.
3. Proxy time: directly impacted, since the change improves getConnection() a 
lot. I have made some flame graphs for the Router to understand the 
performance bottleneck, and I often see getConnection() in the stack taking a 
lot of time. With this change, connections are kept active as much as 
possible, whereas previously connections were left not quite closed and hit 
the connection cap for the pool, so no more active connections could be 
created.
From the last graph I included in the doc we can see that ProxyTime is improved.
Feel free to give it a try in your setup. It's always good to have a second 
pair of eyes on it.

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Improving Router Connection Management_v3.pdf, RBF_ Router Connection 
> Management.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-02-02 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277412#comment-17277412
 ] 

Fengnan Li commented on HDFS-15757:
---

[~elgoiri] [~hexiaoqiao] 
Addressed comments in the PR. More importantly, please try this in your 
setup, since this is essentially an optimization that only metrics 
improvements can justify.

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Improving Router Connection Management_v3.pdf, RBF_ Router Connection 
> Management.pdf
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-02-01 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276730#comment-17276730
 ] 

Fengnan Li commented on HDFS-15757:
---

[~elgoiri] [~hexiaoqiao] Thanks for the review. Please take a look at the 
updated design doc, where I added our evaluation metrics.

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Improving Router Connection Management_v3.pdf, RBF_ Router Connection 
> Management.pdf
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Updated] (HDFS-15757) RBF: Improving Router Connection Management

2021-02-01 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15757:
--
Attachment: RBF_ Improving Router Connection Management_v3.pdf

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Improving Router Connection Management_v3.pdf, RBF_ Router Connection 
> Management.pdf
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-29 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275455#comment-17275455
 ] 

Fengnan Li commented on HDFS-15757:
---

Pushed the latest patch to GitHub.
We saw ~50% fewer connections with the min ratio set to 50%, and some 
improvement in ProxyTime since it includes getConnection(). I will update 
with more data.
Please try it in your setup.

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Router Connection Management.pdf
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation

2021-01-11 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14558:
--
Attachment: HDFS-14558.003.patch

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch, 
> HDFS-14558.003.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Commented] (HDFS-14558) RBF: Isolation/Fairness documentation

2021-01-11 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263129#comment-17263129
 ] 

Fengnan Li commented on HDFS-14558:
---

Thanks for the pointer [~linyiqun]. Edited accordingly.

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch, 
> HDFS-14558.003.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation

2021-01-11 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14558:
--
Attachment: HDFS-14558.003.patch

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation

2021-01-11 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14558:
--
Attachment: (was: HDFS-14558.003.patch)

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation

2021-01-10 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14558:
--
Attachment: HDFS-14558.002.patch

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Commented] (HDFS-14558) RBF: Isolation/Fairness documentation

2021-01-10 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262262#comment-17262262
 ] 

Fengnan Li commented on HDFS-14558:
---

Thanks a lot for the review [~linyiqun]. I uploaded an updated patch.

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-01-07 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260857#comment-17260857
 ] 

Fengnan Li commented on HDFS-15423:
---

[~csun] [~elgoiri] [~hexiaoqiao] Please review the PR. Thanks!

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one 
> from the list via {{getRandomDatanode}}. This logic doesn't seem correct, as 
> it should pick a DN from the specific cluster(s) of the input {{path}}.
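
The behavior asked for in the description can be sketched as "filter by 
sub-cluster first, then pick at random". The code below is only an 
illustration of that idea: DatanodeChooser, Dn and the subcluster field are 
hypothetical names and do not reflect the actual RouterWebHdfsMethods code or 
the fix in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch; these types are not Hadoop classes.
class DatanodeChooser {
    static final class Dn {
        final String host;
        final String subcluster;

        Dn(String host, String subcluster) {
            this.host = host;
            this.subcluster = subcluster;
        }
    }

    // Picks a random datanode, but only among those that belong to one of
    // the sub-clusters the path resolves to. Returns null if none match.
    static Dn choose(List<Dn> allDns, Set<String> targetSubclusters, Random rnd) {
        List<Dn> candidates = new ArrayList<>();
        for (Dn dn : allDns) {
            if (targetSubclusters.contains(dn.subcluster)) {
                candidates.add(dn);
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        return candidates.get(rnd.nextInt(candidates.size()));
    }
}
```

Picking from the unfiltered list instead could return a DN in a sub-cluster 
that does not host the path, which is the problem this issue reports.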






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-05 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259086#comment-17259086
 ] 

Fengnan Li commented on HDFS-15757:
---

[~elgoiri] I have actually done a very simple POC for it and did see an 
improvement: the total number of connections decreased. I will share more 
once I have more code and data.

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Updated] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-04 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15757:
--
Attachment: RBF_ Improving Router Connection Management_v2.pdf

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Updated] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-04 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15757:
--
Attachment: (was: RBF_ Improving Router Connection Management_v2.pdf)

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Updated] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-04 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15757:
--
Attachment: RBF_ Improving Router Connection Management_v2.pdf

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-04 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258611#comment-17258611
 ] 

Fengnan Li commented on HDFS-15757:
---

Uploaded v2 with more metrics and some changes. I will start a POC in this 
direction.

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Updated] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-04 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15757:
--
Attachment: (was: RBF_ Improving Router Connection Management_v2.pdf)

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Updated] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-04 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15757:
--
Attachment: RBF_ Improving Router Connection Management_v2.pdf

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ 
> Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-04 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258376#comment-17258376
 ] 

Fengnan Li commented on HDFS-15757:
---

Thanks for the review [~elgoiri]. There are two metrics we will try to improve:
1. RpcClientNumConnections should go down in each router.
2. RpcClientNumActiveConnections / RpcClientNumConnections should go up in each 
router.

I will add more graphs for this in an updated doc. The first version was meant 
to gather some initial feedback.
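
To make the second metric concrete, here is a minimal sketch of the ratio 
computation. Only the two metric names come from the comment above; the class 
name and the sample numbers are made up for illustration and are not 
measurements from any cluster.

```java
// Hypothetical helper; not part of the Router metrics code.
class ConnectionRatio {
    // Share of pooled connections that are currently active, i.e.
    // RpcClientNumActiveConnections / RpcClientNumConnections.
    static double activeRatio(long numActiveConnections, long numConnections) {
        if (numConnections <= 0) {
            return 0.0; // no connections in the pool: avoid division by zero
        }
        return (double) numActiveConnections / numConnections;
    }

    public static void main(String[] args) {
        // Made-up numbers: same active count, half as many total connections.
        System.out.println(activeRatio(30, 120));
        System.out.println(activeRatio(30, 60));
    }
}
```

The goal stated above corresponds to the first argument staying roughly the 
same while the second one shrinks, so the ratio rises.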

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-01 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257230#comment-17257230
 ] 

Fengnan Li commented on HDFS-15757:
---

[~inigoiri] [~hexiaoqiao] [~ayushtkn] Please take a look. Thanks!

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Created] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-01 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15757:
-

 Summary: RBF: Improving Router Connection Management
 Key: HDFS-15757
 URL: https://issues.apache.org/jira/browse/HDFS-15757
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Reporter: Fengnan Li
Assignee: Fengnan Li
 Attachments: RBF_ Router Connection Management.pdf

We have seen a high number of connections from Routers to namenodes, which 
leaves the namenodes unstable.
This ticket tries to reduce the number of connections through some changes. 
Please take a look at the design and leave comments. 
Thanks!






[jira] [Updated] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-01 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15757:
--
Attachment: RBF_ Router Connection Management.pdf

> RBF: Improving Router Connection Management
> ---
>
> Key: HDFS-15757
> URL: https://issues.apache.org/jira/browse/HDFS-15757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: RBF_ Router Connection Management.pdf
>
>
> We have seen high number of connections from Router to namenodes, leaving 
> namenodes unstable.
> This ticket is trying to reduce connections through some changes. Please take 
> a look at the design and leave comments. 
> Thanks!






[jira] [Created] (HDFS-15754) Create packet metrics for DataNode

2020-12-29 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15754:
-

 Summary: Create packet metrics for DataNode
 Key: HDFS-15754
 URL: https://issues.apache.org/jira/browse/HDFS-15754
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Fengnan Li
Assignee: Fengnan Li


In BlockReceiver, slowness in writeToMirror, writeToDisk and writeToOsCache is 
currently only dumped to the debug log. In practice we have found these to be 
quite useful signals for detecting issues in a DataNode, so it would be great 
if these metrics were exposed via JMX.
We also introduce a total-packets-received metric so that a percentage can be 
used as the signal for detecting a potentially underperforming datanode, since 
datanodes across one HDFS cluster may receive different total numbers of 
packets.
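
A small sketch of why the percentage matters more than the raw count. The 
class name and the sample numbers are hypothetical; the actual metric names 
exposed by the patch are not given in this description.

```java
// Hypothetical helper; not the DataNode metrics implementation.
class SlowPacketSignal {
    // Percentage of received packets that hit a slow write.
    static double slowPercent(long slowPackets, long totalPackets) {
        if (totalPackets <= 0) {
            return 0.0; // nothing received yet
        }
        return 100.0 * slowPackets / totalPackets;
    }

    public static void main(String[] args) {
        // Made-up numbers: both datanodes report 50 slow packets, but the
        // first one received far fewer packets overall.
        System.out.println(slowPercent(50, 1_000));
        System.out.println(slowPercent(50, 100_000));
    }
}
```

With equal raw counts the two nodes would look the same, while the 
percentage separates the lightly loaded node that is struggling from the 
busy node that is healthy.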






[jira] [Assigned] (HDFS-14558) RBF: Isolation/Fairness documentation

2020-12-09 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-14558:
-

Assignee: Fengnan Li  (was: CR Hota)

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14558.001.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Commented] (HDFS-14558) RBF: Isolation/Fairness documentation

2020-12-09 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247059#comment-17247059
 ] 

Fengnan Li commented on HDFS-14558:
---

[~ferhui] Thanks for the ping. I will provide an updated patch soon.

> RBF: Isolation/Fairness documentation
> -
>
> Key: HDFS-14558
> URL: https://issues.apache.org/jira/browse/HDFS-14558
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Attachments: HDFS-14558.001.patch
>
>
> Documentation is needed to make users aware of this feature HDFS-14090.






[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-09 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247035#comment-17247035
 ] 

Fengnan Li commented on HDFS-15383:
---

[~John Smith] It is a good question.
First of all, when a token is stale it is deleted by the cleanup thread, so 
when a client accesses this Router with a renewed token, the Router does not 
recognize it and loads it from ZK. The default scan interval is 1h, which is 
long.
On the other hand, clients normally renew a token before it expires. For 
example, Yarn renews a token when it reaches 92% (configurable, I forgot the 
exact value) of the renew date, meaning that when the client renews the token, 
there is still over 1 hour left for the token to be effective. Internally we 
set our sync interval to 10min, so all Routers are able to get the new renew 
date in around 10min. In the meantime this is still a valid token, though there 
may be different renew dates on different Routers. 
10 minutes is the time it takes to load 1M tokens from ZK into Router memory in 
our environment.
So theoretically your client will fail if you set the sync interval to a 
very large value like 2 hours, but we don't use such a big value in this poll 
model. We can also make the deletion period shorter, like every 15 mins, to 
further prevent auth failures.
Hope it makes sense.
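
To make the timing argument above concrete, here is a minimal sketch in Java. 
It is only an illustration: the 24h renew interval is an assumption (the 
comment does not state it), 0.92 is the approximate fraction quoted above, and 
the class and method names are hypothetical rather than part of Hadoop.

```java
import java.time.Duration;

// Hypothetical helper, not Hadoop code: checks whether a Router sync interval
// is shorter than the validity window left when a client renews its token.
final class TokenSyncWindow {

    // Time left before the old renew date once the client renews at
    // renewFraction of the renew interval.
    static Duration remainingAfterRenew(Duration renewInterval, double renewFraction) {
        long elapsedMs = Math.round(renewInterval.toMillis() * renewFraction);
        return Duration.ofMillis(renewInterval.toMillis() - elapsedMs);
    }

    // A Router that polls every syncInterval sees the new renew date in time
    // only if syncInterval is shorter than the remaining window.
    static boolean syncIsSafe(Duration syncInterval, Duration renewInterval,
                              double renewFraction) {
        return syncInterval.compareTo(remainingAfterRenew(renewInterval, renewFraction)) < 0;
    }
}
```

Under these assumptions the remaining window is a bit under two hours, so a 
10min sync interval fits comfortably while a 2 hour interval does not.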

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in the secure Router, the 
> total number of watches for tokens is the product of the number of routers and 
> the number of tokens. This is because ZKDelegationTokenManager uses 
> PathChildrenCache from curator, which automatically sets the watch, and ZK 
> pushes the sync information to each router. Some evaluations show that a large 
> number of watches in Zookeeper has a negative performance impact on the 
> Zookeeper server.
> In our practice, when the number of watches exceeds 1.2 million in a single ZK 
> server there is significant ZK performance degradation. Thus this ticket 
> is to rewrite ZKDelegationTokenManagerImpl.java to explicitly disable the 
> PathChildrenCache and have Routers sync periodically from Zookeeper. This has 
> been working fine at the scale of 10 Routers with 2 million tokens. 
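
The poll model described above can be sketched roughly as follows. This is a 
hedged illustration only, not the code from the patch: the Supplier stands in 
for a read of the token store in Zookeeper, and PollingTokenCache and its 
methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: a Router-side cache that pulls token renew dates on a
// fixed schedule instead of registering one watch per token.
final class PollingTokenCache {
    // Stand-in for "read all tokens from Zookeeper" (assumption).
    private final Supplier<Map<String, Long>> store;
    private volatile Map<String, Long> renewDates = Map.of();

    PollingTokenCache(Supplier<Map<String, Long>> store) {
        this.store = store;
    }

    // One sync pass: replace the local snapshot with the store's contents.
    void refresh() {
        renewDates = Map.copyOf(store.get());
    }

    // Returns the renew date seen at the last sync, or null if unknown.
    Long renewDate(String tokenId) {
        return renewDates.get(tokenId);
    }

    // Runs refresh() periodically; the caller shuts the executor down.
    ScheduledExecutorService start(long syncIntervalMinutes) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::refresh, 0, syncIntervalMinutes, TimeUnit.MINUTES);
        return ses;
    }
}
```

The trade-off is that a Router may serve a snapshot that is up to one sync 
interval old, in exchange for holding no watches on the ZK server.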






[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-12 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231064#comment-17231064
 ] 

Fengnan Li commented on HDFS-14090:
---

Uploaded [^HDFS-14090.025.patch] to fix tests and address comments.
[~elgoiri] I will start working on HDFS-14558 soon.

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, 
> HDFS-14090.024.patch, HDFS-14090.025.patch, RBF_ Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact on clients connecting to healthy clusters vs 
> unhealthy clusters.
> For example, if there are 2 name nodes downstream and one of them is 
> heavily loaded with calls spiking rpc queue times, then due to back pressure 
> the same will start reflecting on the router. As a result, clients 
> connecting to healthy/faster name nodes will also slow down, as the same rpc 
> queue is maintained for all calls at the router layer. Essentially the same 
> IPC thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single rpc queue for all calls. Let's discuss 
> how we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify 
> the downstream name node and maintain a separate queue for each underlying 
> name node. Another, simpler way is to maintain some sort of rate limiter 
> configured for each name node and let routers drop/reject/send error requests 
> after a certain threshold. 
> This won’t be a simple change, as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name 
> node's.
> Opening this ticket to discuss, design and implement this feature.
>  
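
As a rough illustration of the "rate limiter configured for each name node" 
option in the description above, here is a minimal sketch in Java using one 
java.util.concurrent.Semaphore per nameservice. The class name, method names 
and the idea of passing permit counts as a map are assumptions made for the 
example; this is not the implementation in the attached patches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: each nameservice gets its own pool of permits, so a
// slow nameservice can exhaust only its own share of router capacity.
final class PerNameservicePermits {
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    // permitsPerNameservice maps a nameservice id to its permit count.
    PerNameservicePermits(Map<String, Integer> permitsPerNameservice) {
        permitsPerNameservice.forEach((ns, n) -> permits.put(ns, new Semaphore(n)));
    }

    // Non-blocking: returns false when the nameservice is unknown or has no
    // permit left, in which case the caller should reject the request.
    boolean tryAcquire(String nameservice) {
        Semaphore s = permits.get(nameservice);
        return s != null && s.tryAcquire();
    }

    // Must be called once the downstream call completes, success or failure.
    void release(String nameservice) {
        Semaphore s = permits.get(nameservice);
        if (s != null) {
            s.release();
        }
    }
}
```

With one permit each for two nameservices, a second concurrent call to the 
first nameservice is rejected while a call to the second one still proceeds.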






[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-12 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.025.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, 
> HDFS-14090.024.patch, HDFS-14090.025.patch, RBF_ Isolation design.pdf



[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-11 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230394#comment-17230394
 ] 

Fengnan Li edited comment on HDFS-14090 at 11/12/20, 7:07 AM:
--

Uploaded [^HDFS-14090.024.patch] to add configs for it.
I feel there should be a better way to specify this config that is less 
verbose (for example, certain default values so we don't need to list all 
nameservices), but I cannot come up with a clean way of doing this now. I will 
revisit this when I start tackling the dynamic allocations.
Thanks!


was (Author: fengnanli):
Uploaded  [^HDFS-14090.024.patch] to add configs for it.
I feel like there should be more optimization about how this config can be 
specified but make it less verbose (like specify certain default values so we 
don't need to specify all nameservices), but I cannot come up with a clean way 
of doing this now. Will revisit when I start tackle the dynamic allocations.
Thanks!

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, 
> HDFS-14090.024.patch, RBF_ Isolation design.pdf



[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-11 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230394#comment-17230394
 ] 

Fengnan Li commented on HDFS-14090:
---

Uploaded  [^HDFS-14090.024.patch] to add configs for it.
I feel like there should be more optimization about how this config can be 
specified but make it less verbose (like specify certain default values so we 
don't need to specify all nameservices), but I cannot come up with a clean way 
of doing this now. Will revisit when I start tackle the dynamic allocations.
Thanks!

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, 
> HDFS-14090.024.patch, RBF_ Isolation design.pdf



[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-11 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.024.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, 
> HDFS-14090.024.patch, RBF_ Isolation design.pdf



[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-11 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230344#comment-17230344
 ] 

Fengnan Li commented on HDFS-14090:
---

Thanks for the review [~linyiqun], and here is my late response.

1. The isolated concurrent permits are for the case where renew lease calls 
don't stop due to a connection leak. We have seen this in production, and YARN 
added a fix for it. Interestingly, the bug was fixed by you in 
[HDFS-10549|https://issues.apache.org/jira/browse/HDFS-10549] . If we remove 
concurrent, these calls would quickly consume the permits for other calls and 
soon the whole cluster would be swamped with renew lease calls.

I will address 2 in a following patch and leave 3 as a follow-up JIRA once more 
production data is collected.

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, RBF_ 
> Isolation design.pdf



[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-02 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.023.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, RBF_ 
> Isolation design.pdf



[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-02 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.022.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, RBF_ Isolation design.pdf



[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-01 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.021.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, RBF_ Isolation design.pdf



[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-01 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224360#comment-17224360
 ] 

Fengnan Li commented on HDFS-14090:
---

Thanks [~elgoiri], let's see whether [^HDFS-14090.020.patch] can fix the javadoc.

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, RBF_ 
> Isolation design.pdf



[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-01 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.020.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, RBF_ 
> Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact that unhealthy clusters have on clients connecting 
> to healthy clusters.
> For example, if there are 2 name nodes downstream and one of them is 
> heavily loaded with calls spiking RPC queue times, due to back pressure the 
> same will start reflecting on the router. As a result, clients 
> connecting to healthy/faster name nodes will also slow down, as the same RPC 
> queue is maintained for all calls at the router layer. Essentially the same 
> IPC thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single RPC queue for all calls. Let's discuss 
> how we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify 
> the downstream name node and maintain a separate queue for each underlying 
> name node. Another, simpler way is to maintain some sort of rate limiter 
> configured for each name node and let routers drop/reject/send error 
> requests after a certain threshold. 
> This won’t be a simple change, as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name 
> node's.
> Opening this ticket to discuss, design and implement this feature.
>  
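
The idea of giving each downstream name node its own share of router capacity can be sketched with a fixed pool of permits per nameservice. This is a minimal illustration under stated assumptions, not the attached patch: the class name StaticPermitAllocator, its methods, and the nameservice ids used with it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical static permit allocation per nameservice; not the patch itself. */
final class StaticPermitAllocator {
  private final Map<String, Semaphore> permits = new HashMap<>();

  /** Each nameservice gets a fixed number of concurrent call slots. */
  StaticPermitAllocator(Map<String, Integer> handlersPerNameservice) {
    for (Map.Entry<String, Integer> e : handlersPerNameservice.entrySet()) {
      permits.put(e.getKey(), new Semaphore(e.getValue()));
    }
  }

  /** Non-blocking: returns false when the nameservice is unknown or saturated. */
  boolean acquire(String nsId) {
    Semaphore s = permits.get(nsId);
    return s != null && s.tryAcquire();
  }

  /** Must be called once for every successful acquire, after the call completes. */
  void release(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }
}
```

With this shape, a slow nameservice can hold at most its own permits, so handlers remain available for calls to the healthy ones. A caller would wrap each forwarded call in acquire/release and return an error to the client when acquire fails.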






[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-01 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224314#comment-17224314
 ] 

Fengnan Li commented on HDFS-14090:
---

Uploaded [^HDFS-14090.019.patch] to fix styling issues.

[~xkrogen] [~hexiaoqiao] [~elgoiri] [~ferhui] Mind giving it another look? It 
would be great if we could push it out and unblock the later dynamic 
allocation work. Thanks!

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, RBF_ Isolation design.pdf
>
>






[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-11-01 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.019.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, RBF_ Isolation design.pdf
>
>






[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-10-30 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.018.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, RBF_ Isolation design.pdf
>
>






[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-10-30 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.017.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, RBF_ 
> Isolation design.pdf
>
>






[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-10-27 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-14090:
--
Attachment: HDFS-14090.016.patch

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, RBF_ Isolation design.pdf
>
>






[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-10-27 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221751#comment-17221751
 ] 

Fengnan Li commented on HDFS-14090:
---

Uploaded [^HDFS-14090.016.patch] to fix styling issues and unit tests.

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, RBF_ Isolation design.pdf
>
>






[jira] [Assigned] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2020-10-20 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-14090:
-

Assignee: Fengnan Li  (was: CR Hota)

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: CR Hota
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, RBF_ Isolation design.pdf
>
>





