[jira] [Commented] (HDFS-13522) Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190767#comment-17190767 ] CR Hota commented on HDFS-13522: [~elgoiri] Thanks for following up. [~hemanthboyina] Thanks for uploading the patch; feel free to take this jira. I can also help with the code review. Meanwhile, can you help me understand whether the consistency guarantees are the same with and without the router, or whether the router relaxes them? This was a discussion point when we last worked on this; please refer to the notes in the thread. The last design doc uploaded was intended to let routers in the middle still honor the same consistency guarantees that clients get from the Namenode/ObserverNamenode without routers. > Support observer node from Router-Based Federation > -- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode > Reporter: Erik Krogen > Assignee: Chao Sun > Priority: Major > Attachments: HDFS-13522.001.patch, HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC clogging.png, ShortTerm-Routers+Observer.png > > > Changes will need to occur to the router to support the new observer node. One such change will be to make the router understand the observer state, e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
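The quoted description asks the router to understand the observer state ({{FederationNamenodeServiceState}}). A minimal sketch of what state-aware ordering of downstream namenodes could look like; all type names here (NamenodeInfo, ServiceState) are simplified stand-ins, not Hadoop's actual classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: order downstream namenodes so read calls try OBSERVER
// nodes first while writes go to the ACTIVE node. Not Hadoop's actual API.
public class ObserverOrdering {
    enum ServiceState { ACTIVE, STANDBY, OBSERVER }

    record NamenodeInfo(String id, ServiceState state) {}

    // Observers first for reads; active first for writes; standbys last either way.
    static List<NamenodeInfo> order(List<NamenodeInfo> nns, boolean isRead) {
        Comparator<NamenodeInfo> cmp = Comparator.comparingInt(n -> switch (n.state()) {
            case OBSERVER -> isRead ? 0 : 1;
            case ACTIVE   -> isRead ? 1 : 0;
            case STANDBY  -> 2;
        });
        List<NamenodeInfo> sorted = new ArrayList<>(nns);
        sorted.sort(cmp);
        return sorted;
    }

    public static void main(String[] args) {
        List<NamenodeInfo> nns = List.of(
            new NamenodeInfo("nn1", ServiceState.STANDBY),
            new NamenodeInfo("nn2", ServiceState.ACTIVE),
            new NamenodeInfo("nn3", ServiceState.OBSERVER));
        System.out.println(order(nns, true));   // observers come first for reads
        System.out.println(order(nns, false));  // active comes first for writes
    }
}
```

Note this sketch only covers proxy ordering; the consistency question raised in the comment (propagating the client's last-seen state id through the router) is a separate concern the design doc addresses.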
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940620#comment-16940620 ] CR Hota commented on HDFS-14090: [~elgoiri] [~xkrogen] Thanks for your patience. Let's try to close this in the coming week. > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: CR Hota > Assignee: CR Hota > Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, RBF_ Isolation design.pdf > > > The router is a gateway to underlying name nodes. Gateway architectures should help minimize the impact on clients of connecting to healthy vs. unhealthy clusters. > For example, if there are 2 name nodes downstream and one of them is heavily loaded with calls spiking RPC queue times, back pressure will cause the same load to start reflecting on the router. As a result, clients connecting to healthy/faster name nodes will also slow down, since the same RPC queue is maintained for all calls at the router layer. Essentially, the same IPC thread pool is used by the router to connect to all name nodes. > Currently the router uses one single RPC queue for all calls. Let's discuss how we can change the architecture and add some throttling logic for unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the downstream name node, and maintain a separate queue for each underlying name node. Another, simpler way is to maintain a rate limiter configured for each name node and let routers drop/reject/return errors for requests beyond a certain threshold. > This won't be a simple change, as the router's 'Server' layer would need redesign and implementation. Currently this layer is the same as the name node's. > Opening this ticket to discuss, design and implement this feature.
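The per-name-node rate-limiter idea in the description can be sketched with per-nameservice permits. This is an illustrative sketch under assumed names, not the actual HDFS-14090 patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Illustrative sketch: cap how many router RPC handler threads may be busy
// against any one nameservice, so one overloaded namenode cannot exhaust the
// shared handler pool. Class and method names here are assumptions.
public class PerNameservicePermits {
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
    private final int permitsPerNameservice;

    public PerNameservicePermits(int permitsPerNameservice) {
        this.permitsPerNameservice = permitsPerNameservice;
    }

    // Non-blocking: returning false lets the router reject/throttle the call
    // instead of queueing it behind a slow namenode.
    public boolean tryAcquire(String nameservice) {
        return permits
            .computeIfAbsent(nameservice, ns -> new Semaphore(permitsPerNameservice))
            .tryAcquire();
    }

    public void release(String nameservice) {
        Semaphore s = permits.get(nameservice);
        if (s != null) {
            s.release();
        }
    }

    public static void main(String[] args) {
        PerNameservicePermits p = new PerNameservicePermits(1);
        System.out.println(p.tryAcquire("ns0")); // true
        System.out.println(p.tryAcquire("ns0")); // false: ns0 is saturated
        System.out.println(p.tryAcquire("ns1")); // true: ns1 is unaffected
    }
}
```

The key property is the last line of the demo: saturating one nameservice leaves calls to the other nameservices unaffected, which is exactly the isolation the ticket asks for.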
[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions
[ https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938979#comment-16938979 ] CR Hota commented on HDFS-14284: [~hemanthboyina] [~inigoiri] [~ayushtkn] Thanks for the discussion so far. The overall approach looks fine. Can we separate RIOEx out of hadoop-common, and have StandbyException not extend from RIOEx? It's best not to change hadoop-common directly for this feature. RIOEx can be added in the hdfs-rbf project, and StandbyException can be used directly to construct the error message containing the router id before the standby exception is created. In any case, standby handling already has client-side failover logic, so the standby log will automatically output the router id that was used when the exception was created on the server. > RBF: Log Router identifier when reporting exceptions > > > Key: HDFS-14284 > URL: https://issues.apache.org/jira/browse/HDFS-14284 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Íñigo Goiri > Assignee: hemanthboyina > Priority: Major > Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch > > > The typical setup is to use multiple Routers through ConfiguredFailoverProxyProvider. > In a regular HA Namenode setup, it is easy to know which NN was used. However, in RBF, any Router can be the one reporting the exception and it is hard to know which one it was. > We should have a way to identify which Router/Namenode was the one triggering the exception. > This would also apply to Observer Namenodes.
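The suggestion above, constructing the error message with the router id before the standby exception is created, could look roughly like this. RouterStandbyException is a hypothetical name for illustration, not a class from hadoop-hdfs-rbf:

```java
// Illustrative sketch: embed the router's own identifier in the exception
// message before it is thrown back to the client, so client-side failover logs
// show which router surfaced the failure. Names here are assumptions.
public class RouterExceptionTagging {
    static class RouterStandbyException extends Exception {
        RouterStandbyException(String routerId, String cause) {
            // The router id travels inside the message, so no client-side
            // protocol change is needed to surface it in logs.
            super("Router " + routerId + ": " + cause);
        }
    }

    public static void main(String[] args) {
        try {
            throw new RouterStandbyException("router-1.example.com:8888",
                "Namenode nn0 is in standby state");
        } catch (RouterStandbyException e) {
            System.out.println(e.getMessage());
        }
    }
}
```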
[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test
[ https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938970#comment-16938970 ] CR Hota commented on HDFS-14461: [~hexiaoqiao] This looks so much better. Thanks for getting this through the finish line. +1 for v5. > RBF: Fix intermittently failing kerberos related unit test > -- > > Key: HDFS-14461 > URL: https://issues.apache.org/jira/browse/HDFS-14461 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14461.001.patch, HDFS-14461.002.patch, > HDFS-14461.003.patch, HDFS-14461.004.patch, HDFS-14461.005.patch > > > TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It > may be due to some race condition before using the keytab that's created for > testing. > > {code:java} > Failed > org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken > Failing for the past 1 build (Since > [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! > #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] ) > [Took 89 > ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history] > > Error Message > org.apache.hadoop.security.KerberosAuthException: failure to login: for > principal: router/localh...@example.com from keytab > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab > javax.security.auth.login.LoginException: Integrity check on decrypted field > failed (31) - PREAUTH_FAILED > h3. 
Stacktrace > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.security.KerberosAuthException: failure to login: for > principal: router/localh...@example.com from keytab > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab > javax.security.auth.login.LoginException: Integrity check on decrypted field > failed (31) - PREAUTH_FAILED at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) > at > org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at > 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: org.apache.hadoop.security.KerberosAuthException: failure to > login: for principal: router/localh...@example.com from keytab >
[jira] [Commented] (HDFS-14851) WebHdfs Returns 200 Status Code for Open of Files with Corrupt Blocks
[ https://issues.apache.org/jira/browse/HDFS-14851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933784#comment-16933784 ] CR Hota commented on HDFS-14851: [Íñigo Goiri|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=elgoiri] Thanks for tagging me. [Danny Becker|http://jira/secure/ViewProfile.jspa?name=dannytbecker] Thanks for working on this. Yes, we do use webhdfs but haven't come across a scenario like this yet. The change looks quite expensive performance-wise: it pays the cost on every call just to fix the response code. Iterating through all blocks to find which ones are corrupted looks expensive, especially since a file can have up to 1048576 blocks. We may instead want to expose an API through the InputStream that returns the list of corrupted blocks (just as it exposes getAllBlocks); if that list is non-empty, this web call can throw BlockMissingException. Cc [~xkrogen] [~jojochuang] > WebHdfs Returns 200 Status Code for Open of Files with Corrupt Blocks > - > > Key: HDFS-14851 > URL: https://issues.apache.org/jira/browse/HDFS-14851 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs > Reporter: Danny Becker > Assignee: Danny Becker > Priority: Minor > Attachments: HDFS-14851.001.patch > > > WebHdfs returns a 200 status code for Open operations on files with missing or corrupt blocks.
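The alternative proposed above, exposing the list of corrupt blocks and failing fast when it is non-empty, can be sketched as follows. BlockInfo, getMissingBlocks, and this BlockMissingException are simplified stand-ins for the real HDFS types, not the actual API:

```java
import java.util.List;

// Illustrative sketch: instead of the open handler scanning every block to fix
// the HTTP status, expose the already-known missing blocks and fail fast when
// that list is non-empty, mapping the exception to a non-200 status.
public class CorruptBlockCheck {
    record BlockInfo(long blockId, int liveReplicas) {}

    static class BlockMissingException extends Exception {
        BlockMissingException(String msg) { super(msg); }
    }

    // Hypothetical accessor mirroring getAllBlocks: blocks with no live replicas.
    static List<BlockInfo> getMissingBlocks(List<BlockInfo> allBlocks) {
        return allBlocks.stream().filter(b -> b.liveReplicas() == 0).toList();
    }

    // A WebHDFS-style open handler would call this before streaming data.
    static void checkReadable(List<BlockInfo> allBlocks) throws BlockMissingException {
        List<BlockInfo> missing = getMissingBlocks(allBlocks);
        if (!missing.isEmpty()) {
            throw new BlockMissingException("File has " + missing.size() + " missing block(s)");
        }
    }

    public static void main(String[] args) throws Exception {
        checkReadable(List.of(new BlockInfo(1, 3), new BlockInfo(2, 2))); // healthy: no exception
        try {
            checkReadable(List.of(new BlockInfo(3, 0)));
        } catch (BlockMissingException e) {
            System.out.println(e.getMessage());
        }
    }
}
```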
[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test
[ https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933626#comment-16933626 ] CR Hota commented on HDFS-14461: [~elgoiri] Thanks for the comment. This is a common issue with jiras that are dependent. We should commit HDFS-14609, as it's meant to solve a specific problem, and continue working on this. Meanwhile, [~hexiaoqiao], can you please backport the change from HDFS-14609 to your workspace and fix this issue? Happy to help if you need anything. > RBF: Fix intermittently failing kerberos related unit test > -- > > Key: HDFS-14461 > URL: https://issues.apache.org/jira/browse/HDFS-14461 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: CR Hota > Assignee: He Xiaoqiao > Priority: Major > Attachments: HDFS-14461.001.patch, HDFS-14461.002.patch > > > TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It may be due to some race condition before using the keytab that's created for testing.
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932738#comment-16932738 ] CR Hota commented on HDFS-14090: Hey [~elgoiri] Should we commit this? Most of the folks had already reviewed the patch earlier. Thoughts? > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090
[jira] [Commented] (HDFS-14431) RBF: Rename with multiple subclusters should fail if no eligible locations
[ https://issues.apache.org/jira/browse/HDFS-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930830#comment-16930830 ] CR Hota commented on HDFS-14431: [~elgoiri] Many thanks for all the work done so far. Took a look at the patch, and the approach seems error-prone because the operations, taken together, are NOT atomic. Filesystems are not transactional in nature. Since rename is very hard to get right, may I suggest we approach it as we did with some other features: let's come up with a design doc and write down the issues, the possible approaches, and which use cases we can and cannot solve. We can all collaborate; please count me in. For someone new, it's very hard to get the context of what is being solved and which use cases are not covered. On a side note, given the lack of atomic renames, here is how we are approaching renames in the short term: most query engines (e.g. Hive) are equipped to handle a rename failure by initiating a copy, so in the scenario where the rename is across clusters, Hive is instructed to invoke a copy operation. FYI [~ayushtkn] [~xuzq_zander] > RBF: Rename with multiple subclusters should fail if no eligible locations > -- > > Key: HDFS-14431 > URL: https://issues.apache.org/jira/browse/HDFS-14431 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Íñigo Goiri > Assignee: Íñigo Goiri > Priority: Major > Attachments: HDFS-14431-HDFS-13891.001.patch, HDFS-14431-HDFS-13891.002.patch, HDFS-14431-HDFS-13891.003.patch, HDFS-14431-HDFS-13891.004.patch, HDFS-14431-HDFS-13891.005.patch, HDFS-14431-HDFS-13891.006.patch, HDFS-14431-HDFS-13891.007.patch > > > Currently, the rename fails with FileNotFoundException, which is not clear to the user. > The operation should fail stating that the reason is that there are no eligible destinations.
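The short-term workaround described above, falling back to a copy when a non-atomic cross-subcluster rename is refused, can be sketched like this. The in-memory "filesystem" and the subcluster rule are stand-ins for the real FileSystem API and router mount-table logic:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: attempt a rename, and if the cross-subcluster rename is
// refused (no eligible destination), fall back to copy-then-delete, as query
// engines like Hive do. Not atomic: a crash between copy and delete leaves
// both paths, which is exactly the non-transactional caveat raised above.
public class RenameWithCopyFallback {
    final Map<String, String> files = new HashMap<>();

    // Pretend renames across different top-level "subclusters" are unsupported.
    boolean rename(String src, String dst) {
        if (!subcluster(src).equals(subcluster(dst))) {
            return false; // no eligible destination in the target subcluster
        }
        files.put(dst, files.remove(src));
        return true;
    }

    void moveWithFallback(String src, String dst) {
        if (!rename(src, dst)) {
            files.put(dst, files.get(src)); // copy
            files.remove(src);              // then delete the source
        }
    }

    private static String subcluster(String path) {
        return path.split("/")[1]; // first path component, e.g. "ns0" in "/ns0/a"
    }

    public static void main(String[] args) {
        RenameWithCopyFallback fs = new RenameWithCopyFallback();
        fs.files.put("/ns0/a", "data");
        fs.moveWithFallback("/ns0/a", "/ns1/b"); // rename refused, copy fallback used
        System.out.println(fs.files); // {/ns1/b=data}
    }
}
```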
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928913#comment-16928913 ] CR Hota commented on HDFS-14090: [~elgoiri] Thanks for the final review. [~brahmareddy] [~aajisaka] [~xkrogen] [~hexiaoqiao] [~linyiqun] [~tanyuxin] Gentle ping. Let me know if you folks have any final thoughts on v014.patch. I am trying to see if we can target this for the 3.3 release. > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928774#comment-16928774 ] CR Hota commented on HDFS-14090: [~elgoiri] Thanks for the review. Uploaded v014. > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.014.patch > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928750#comment-16928750 ] CR Hota commented on HDFS-14090: [~elgoiri] Thanks a lot for the clarification. Have taken care of all review comments in the latest v013 patch. > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Attachments: HDFS-14090-HDFS-13891.001.patch, HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, HDFS-14090.012.patch, HDFS-14090.013.patch, RBF_ Isolation design.pdf
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928747#comment-16928747 ] CR Hota commented on HDFS-14609: [~tasanuma] You are correct. HDFS-14461 was created to address the issue. > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: CR Hota > Assignee: Chen Zhang > Priority: Major > Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, HDFS-14609.006.patch > > > We worked on router-based federation security as part of HDFS-13532 and kept it compatible with the way the namenode works. However, with HADOOP-16314 and HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests to fail. > Changes are needed in RBF accordingly, mainly fixing broken tests.
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.013.patch
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928195#comment-16928195 ] CR Hota commented on HDFS-14609: [~zhangchen] Thanks for the clarification. I think we are good. Anyway, InvalidToken is captured in other tests as well. +1 for 006.patch.
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927242#comment-16927242 ] CR Hota commented on HDFS-14609: [~zhangchen] Thanks for the ping and the patch. After cancellation of the token, we should try to renew it and get an InvalidToken exception. How does the test validate the InvalidToken exception? Am I missing something?
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926816#comment-16926816 ] CR Hota commented on HDFS-14090: [~elgoiri] Thanks for the review. Sorry, I couldn't understand the second point. The idea of the test is to make sure all threads finish execution and shut down gracefully, and only then are the metrics analyzed, based on whether fairness is enabled or disabled.
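The test structure described in the comment above — run client threads, wait until every one of them finishes cleanly, and only then inspect the metrics — can be sketched as follows (hypothetical helper, not the actual test class from the patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FairnessTestSketch {
  /**
   * Start n client threads running the given work, join them all so no
   * thread is still mutating state, and return how many completed.
   * Metrics assertions belong after this returns.
   */
  public static int runClients(int n, Runnable work) throws InterruptedException {
    AtomicInteger completed = new AtomicInteger();
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      Thread t = new Thread(() -> {
        work.run();
        completed.incrementAndGet();
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      t.join(); // graceful shutdown: only inspect metrics after every thread exits
    }
    return completed.get();
  }
}
```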
[jira] [Commented] (HDFS-14774) RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
[ https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926050#comment-16926050 ] CR Hota commented on HDFS-14774: Hey [~jojochuang], Do you have any follow-up questions, or shall we close this? > RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling > - > > Key: HDFS-14774 > URL: https://issues.apache.org/jira/browse/HDFS-14774 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Wei-Chiu Chuang > Assignee: CR Hota > Priority: Minor > > HDFS-13972 added the following code: > {code} > try { > dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE); > } catch (IOException e) { > LOG.error("Cannot get the datanodes from the RPC server", e); > } finally { > // Reset ugi to remote user for remaining operations. > RouterRpcServer.resetCurrentUser(); > } > HashSet<DatanodeInfo> excludes = new HashSet<>(); > if (excludeDatanodes != null) { > Collection<String> collection = > getTrimmedStringCollection(excludeDatanodes); > for (DatanodeInfo dn : dns) { > if (collection.contains(dn.getName())) { > excludes.add(dn); > } > } > } > {code} > If {{rpcServer.getDatanodeReport()}} throws an exception, {{dns}} will become null. This doesn't look like the best way to handle the exception. Should the router retry upon exception? Does it perform the retry automatically under the hood? > [~crh] [~brahmareddy]
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.012.patch
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924427#comment-16924427 ] CR Hota commented on HDFS-14090: [~elgoiri] Thanks for the reviews. Some thoughts below. {quote}My main issue is that PermitAllocationException is too generic. As you mention, it currently covers both (1) not enough handlers and (2) misconfigured nameservices. I think they should be two separate exceptions. The #1 case makes sense but the other one seems more like an IllegalArgumentException{quote} Both are theoretically misconfigurations, hence I wanted to keep them under the same umbrella of PermitAllocationException, which all implementations should throw if allocation fails; this failure will happen due to misconfiguration. {quote}BTW, should we also add the fairness per user to the Router RPC server? It would go to a separate JIRA though.{quote} Fairness at the user level can still be enabled via FairCallQueue; we don't need to add anything separate from the Router's perspective. With HADOOP-16268 already checked in, fairness along with balancing across routers is taken care of to a large extent.
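The single-umbrella design discussed above — one PermitAllocationException covering both an undersized handler pool and an unknown nameservice — could look roughly like this. The class and method names are assumed for illustration; the real patch's signatures may differ:

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical checked exception for any failed permit allocation. */
class PermitAllocationException extends Exception {
  PermitAllocationException(String msg) { super(msg); }
}

/** Validate a static permit assignment against the available handlers. */
class StaticPermitValidator {
  static void validate(Map<String, Integer> assigned,
                       Set<String> knownNameservices,
                       int totalHandlers) throws PermitAllocationException {
    int sum = 0;
    for (Map.Entry<String, Integer> e : assigned.entrySet()) {
      if (!knownNameservices.contains(e.getKey())) {
        // Misconfiguration case 2: unknown nameservice id
        throw new PermitAllocationException("Unknown nameservice: " + e.getKey());
      }
      sum += e.getValue();
    }
    if (sum > totalHandlers) {
      // Misconfiguration case 1: not enough handlers for the assignment
      throw new PermitAllocationException(
          "Assigned " + sum + " permits but only " + totalHandlers + " handlers");
    }
  }
}
```

Splitting case 2 into a subclass (or an IllegalArgumentException, as suggested in the review) would only change the `throw` in the first branch, which is why the discussion treats it as a refactoring that can wait.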
[jira] [Commented] (HDFS-14784) Add more methods to WebHdfsTestUtil to support tests outside of package
[ https://issues.apache.org/jira/browse/HDFS-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923845#comment-16923845 ] CR Hota commented on HDFS-14784: [~elgoiri] Thanks for pointing that out. +1 for 002.patch. > Add more methods to WebHdfsTestUtil to support tests outside of package > --- > > Key: HDFS-14784 > URL: https://issues.apache.org/jira/browse/HDFS-14784 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Chen Zhang > Assignee: Chen Zhang > Priority: Major > Attachments: HDFS-14784.001.patch, HDFS-14784.002.patch, HDFS-14784.002.patch > > > Before HDFS-14434, we could access a secure cluster over WebHDFS using the user.name parameter and {{PseudoAuthenticationHandler}} without Kerberos authentication, which is quite useful in some test situations. > HDFS-14434 ignores the user.name query parameter in secure WebHDFS when using WebHdfsFileSystem, so the only way to use the user.name parameter is to access by URL. > This Jira tries to add more methods to WebHdfsTestUtil to support unit tests outside the package that exercise WebHDFS in customized ways. > For more background and discussion, see HDFS-14609.
[jira] [Commented] (HDFS-14784) Add more methods to WebHdfsTestUtil to support tests outside of package
[ https://issues.apache.org/jira/browse/HDFS-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923827#comment-16923827 ] CR Hota commented on HDFS-14784: [~zhangchen] Thanks for the latest patch. Looks good to me too. Yes, maybe it's overkill to add a test for the wrapper. When we use the function, we will obviously add a test which will automatically use it.
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919031#comment-16919031 ] CR Hota commented on HDFS-14090: [~elgoiri] Thanks for the comments. Uploaded 011.patch. I have taken care of the nits except the ones below. * I think we can make PermitAllocationException more specific (right now it just takes whatever string). I think it would be nice to have the messages in PermitAllocationException itself and we would just pass the number of handlers, the min and the nsId as a parameter. This is already nice in PermitLimitExceededException. * StaticFairnessPolicyController#184 can fit in one line. Actually, this might be better to have as a different exception (which can be a subclass of PermitAllocationException). PermitAllocationException can happen not just for misconfigured handlers but also for misconfigured nameservices, hence I deliberately kept it generic with a String message as the parameter. I also didn't create a subclass of PermitAllocationException, to keep things simple in the beginning. I suggest we re-look at refactoring these based on how the dynamic allocation work shapes up.
[jira] [Comment Edited] (HDFS-14774) RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
[ https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919011#comment-16919011 ] CR Hota edited comment on HDFS-14774 at 8/29/19 10:32 PM: -- [~jojochuang] Thanks for reporting this. This is OK at this point. The reason is that the router has two layers: the server facing external clients, and the client to downstream namenodes. The client to downstream namenodes (aka RouterRpcClient) is configured to retry multiple times based on failures from the downstream namenode. It also has logic to fail over and try the standby namenode if the standby becomes active, etc. So yes, retries are present before dns comes back as null. And if it does come back as null, the parent method sends back an appropriate IOException. {code:java} if (dn == null) { throw new IOException("Failed to find datanode, suggest to check cluster" + " health. excludeDatanodes=" + excludeDatanodes); } {code} Let me know if this helps.
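The two-layer behavior described in the comment — retries and failover handled in the RPC client layer, with the caller converting a still-null result into a clear IOException — can be sketched like this. The types and method names are simplified and hypothetical, not the actual RouterWebhdfsMethods code:

```java
import java.io.IOException;

/** Sketch of the retry-then-guard pattern (hypothetical types). */
class DatanodeChooser {
  /** Stand-in for the RPC client that reports live datanodes. */
  interface Report {
    String[] liveDatanodes() throws IOException;
  }

  static String chooseFirst(Report rpc, int retries) throws IOException {
    String[] dns = null;
    for (int i = 0; i <= retries && dns == null; i++) {
      try {
        dns = rpc.liveDatanodes(); // the real client also fails over to standby
      } catch (IOException e) {
        // logged and retried; dns stays null if every attempt fails
      }
    }
    if (dns == null || dns.length == 0) {
      // the guard the comment quotes: surface a clear error to the caller
      throw new IOException("Failed to find datanode, suggest to check cluster health.");
    }
    return dns[0];
  }
}
```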
[jira] [Assigned] (HDFS-14774) RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
[ https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota reassigned HDFS-14774: -- Assignee: CR Hota
[jira] [Commented] (HDFS-14784) Add more methods to WebHdfsTestUtil to support tests outside of package
[ https://issues.apache.org/jira/browse/HDFS-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919003#comment-16919003 ] CR Hota commented on HDFS-14784: [~zhangchen] Thanks for working on this. Overall the patch looks fine. Is it possible to add a test for WebHdfsFileSystem#convertJsonToDelegationToken? This is the new method we added here.
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.011.patch > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, RBF_ > Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact on clients connecting to healthy clusters vs unhealthy > clusters. > For example - if there are 2 name nodes downstream, and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single rpc queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature.
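The static-allocation idea described above — giving each downstream namenode a bounded share of router handler capacity so a slow namenode can only exhaust its own share — can be sketched with one semaphore per nameservice. This is an illustrative toy, not the actual {{FairnessPolicyController}} from the patch; the class name and permit counts are invented:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Toy sketch of static isolation: each downstream nameservice gets a fixed
// number of permits; a router handler must acquire one before proxying a
// call, so back pressure from one overloaded namenode cannot consume the
// handler threads reserved for the others.
public class NameserviceLimiter {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

  public NameserviceLimiter(Map<String, Integer> allocation) {
    allocation.forEach((ns, n) -> permits.put(ns, new Semaphore(n)));
  }

  /** Try to reserve a handler for the given nameservice; false = reject. */
  public boolean tryAcquire(String ns) {
    Semaphore s = permits.get(ns);
    return s != null && s.tryAcquire();
  }

  /** Return the handler's permit once the downstream call completes. */
  public void release(String ns) {
    permits.get(ns).release();
  }

  public static void main(String[] args) {
    NameserviceLimiter l = new NameserviceLimiter(Map.of("ns1", 2, "ns2", 4));
    System.out.println(l.tryAcquire("ns1")); // prints true
  }
}
```

With, say, two permits for ns1, a third concurrent call to ns1 is rejected immediately instead of queuing behind the slow namenode, while calls to ns2 proceed untouched — the "drop/reject after a certain threshold" behavior the description mentions.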
[jira] [Updated] (HDFS-14793) BlockTokenSecretManager should LOG block token range it operates on.
[ https://issues.apache.org/jira/browse/HDFS-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14793: --- Summary: BlockTokenSecretManager should LOG block token range it operates on. (was: BlockTokenSecretManager should LOG block tokaen range it operates on.) > BlockTokenSecretManager should LOG block token range it operates on. > > > Key: HDFS-14793 > URL: https://issues.apache.org/jira/browse/HDFS-14793 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Priority: Major > > At startup, log enough information to identify the range of block token keys > for the NameNode. This should make it easier to debug issues with block > tokens.
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916840#comment-16916840 ] CR Hota commented on HDFS-14090: Hey [~elgoiri] [~brahmareddy] [~aajisaka] [~xkrogen], Could you help take a final look and commit 010.patch? I have already broken this Jira down into static and dynamic. Once this is committed we can focus on designing the dynamic allocation model.
[jira] [Commented] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916836#comment-16916836 ] CR Hota commented on HDFS-14760: [~jojochuang] Thanks for the review. Could you help commit 002.patch? > Log INFO mode if snapshot usage and actual usage differ > --- > > Key: HDFS-14760 > URL: https://issues.apache.org/jira/browse/HDFS-14760 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14760.001.patch, HDFS-14760.002.patch > > > DirectoryWithQuotaFeature#checkStoragespace logs at ERROR level > without throwing any exception or taking any action, which pollutes the logs. This should be > at INFO level. > {code} > private void checkStoragespace(final INodeDirectory dir, final long > computed) { > if (-1 != quota.getStorageSpace() && usage.getStorageSpace() != computed) > { > NameNode.LOG.error("BUG: Inconsistent storagespace for directory " > + dir.getFullPathName() + ". Cached = " + usage.getStorageSpace() > + " != Computed = " + computed); > } > } > {code}
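The condition being debated here reduces to a simple predicate: a storage-space quota is set (not -1) and the cached usage disagrees with the freshly recomputed value. Restated as a hypothetical standalone helper (not actual HDFS code), which makes the proposed change concrete — same check, just logged below ERROR since no exception is thrown and nothing is corrected:

```java
// Hypothetical restatement of the condition inside
// DirectoryWithQuotaFeature#checkStoragespace, pulled out as a pure
// predicate so it can be exercised in isolation.
public class StoragespaceCheck {
  /** True when a quota is set and cached usage != recomputed usage. */
  public static boolean isInconsistent(long quotaSpace, long cachedSpace,
      long computedSpace) {
    return quotaSpace != -1 && cachedSpace != computedSpace;
  }

  public static void main(String[] args) {
    if (isInconsistent(1024, 500, 512)) {
      // The patch under review keeps this message but demotes it from
      // ERROR, since the method takes no corrective action.
      System.out.println("Inconsistent storagespace: cached=500 computed=512");
    }
  }
}
```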
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916101#comment-16916101 ] CR Hota commented on HDFS-14609: [~zhangchen] Thanks for the ping and clarifications. It makes sense why hdfs changes are needed; let's still do the hdfs changes in a separate Jira and then fix these tests after. > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch > > > We worked on router based federation security as part of HDFS-13532. We kept > it compatible with the way namenode works. However, with HADOOP-16314 and > HDFS-16354 in trunk, auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests.
[jira] [Commented] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913826#comment-16913826 ] CR Hota commented on HDFS-14760: [~jojochuang] Thanks! Seems 002.patch is safe to commit.
[jira] [Updated] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14760: --- Attachment: HDFS-14760.002.patch
[jira] [Commented] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913522#comment-16913522 ] CR Hota commented on HDFS-14760: [~xkrogen] Thanks for the review. 'WARN' makes sense too. Honestly I haven't been able to wrap my head around the whole feature yet and how to handle these cases. But at this point, our hdfs installation wants to make sure no 'ERROR' is logged if it's not really an error that should/can be acted upon.
[jira] [Commented] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911837#comment-16911837 ] CR Hota commented on HDFS-14760: Thanks [~jojochuang] Adding some more folks for context. [~ayushtkn] [~xkrogen] [~RANith] [~brahmareddy].
[jira] [Updated] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14403: --- Description: *strong text*HADOOP-15016 initially described extensions to the Hadoop FairCallQueue encompassing both cost-based analysis of incoming RPCs, as well as support for reservations of RPC capacity for system/platform users. This JIRA intends to track the former, as HADOOP-15016 was repurposed to more specifically focus on the reservation portion of the work. (was: HADOOP-15016 initially described extensions to the Hadoop FairCallQueue encompassing both cost-based analysis of incoming RPCs, as well as support for reservations of RPC capacity for system/platform users. This JIRA intends to track the former, as HADOOP-15016 was repurposed to more specifically focus on the reservation portion of the work.) > Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode >Reporter: Erik Krogen >Assignee: Christopher Gregorian >Priority: Major > Labels: qos, rpc > Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: CostBasedFairCallQueueDesign_v0.pdf, > HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, > HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, > HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, > HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, > HDFS-14403.012.patch, HDFS-14403.013.patch, HDFS-14403.branch-2.8.patch > > > *strong text*HADOOP-15016 initially described extensions to the Hadoop > FairCallQueue encompassing both cost-based analysis of incoming RPCs, as well > as support for reservations of RPC capacity for system/platform users. This > JIRA intends to track the former, as HADOOP-15016 was repurposed to more > specifically focus on the reservation portion of the work. 
[jira] [Updated] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14403: --- Description: HADOOP-15016 initially described extensions to the Hadoop FairCallQueue encompassing both cost-based analysis of incoming RPCs, as well as support for reservations of RPC capacity for system/platform users. This JIRA intends to track the former, as HADOOP-15016 was repurposed to more specifically focus on the reservation portion of the work. (was: *strong text*HADOOP-15016 initially described extensions to the Hadoop FairCallQueue encompassing both cost-based analysis of incoming RPCs, as well as support for reservations of RPC capacity for system/platform users. This JIRA intends to track the former, as HADOOP-15016 was repurposed to more specifically focus on the reservation portion of the work.)
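The idea tracked by HDFS-14403 is that not all RPCs are equal: instead of scheduling purely by call count, each call is charged a cost (for example its processing time), and a user's accumulated share of total cost determines which FairCallQueue priority level its next call lands in. A toy model of that bookkeeping — the thresholds and class name below are invented for illustration; the real logic lives in the DecayRpcScheduler changes and the attached design doc:

```java
import java.util.HashMap;
import java.util.Map;

// Toy cost-based scheduler: charge each user the processing cost of its
// calls and map the user's share of total cost to a priority level, so one
// user issuing a few expensive calls is deprioritized just like one issuing
// many cheap ones.
public class CostBasedPriority {
  private final Map<String, Long> costByUser = new HashMap<>();
  private long totalCost = 0;

  public void charge(String user, long cost) {
    costByUser.merge(user, cost, Long::sum);
    totalCost += cost;
  }

  /** 0 = highest priority. Invented thresholds: >50% share -> 2, >25% -> 1. */
  public int priorityOf(String user) {
    if (totalCost == 0) return 0;
    double share = costByUser.getOrDefault(user, 0L) / (double) totalCost;
    if (share > 0.50) return 2;
    if (share > 0.25) return 1;
    return 0;
  }

  public static void main(String[] args) {
    CostBasedPriority p = new CostBasedPriority();
    p.charge("heavy", 900); // one expensive recursive listing
    p.charge("light", 100); // many cheap calls, summed
    System.out.println(p.priorityOf("heavy")); // prints 2
  }
}
```

Swapping "cost" for "count" in this accumulation step is essentially what distinguishes the cost-based FairCallQueue from the original call-count-based one.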
[jira] [Commented] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911816#comment-16911816 ] CR Hota commented on HDFS-14760: [~jojochuang] Could you help take a look at this? I'm not very familiar with why this check was historically added, or why it takes no action but logs at ERROR level.
[jira] [Updated] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14760: --- Attachment: HDFS-14760.001.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
[ https://issues.apache.org/jira/browse/HDFS-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14760: --- Description: In DirectoryWithQuotaFeature#checkStoragespace code logs in error mode without throwing any exceptions or action and pollutes logs. This should be in INFO mode. {code} private void checkStoragespace(final INodeDirectory dir, final long computed) { if (-1 != quota.getStorageSpace() && usage.getStorageSpace() != computed) { NameNode.LOG.error("BUG: Inconsistent storagespace for directory " + dir.getFullPathName() + ". Cached = " + usage.getStorageSpace() + " != Computed = " + computed); } } {code} was: {code} private void checkStoragespace(final INodeDirectory dir, final long computed) { if (-1 != quota.getStorageSpace() && usage.getStorageSpace() != computed) { NameNode.LOG.error("BUG: Inconsistent storagespace for directory " + dir.getFullPathName() + ". Cached = " + usage.getStorageSpace() + " != Computed = " + computed); } } {code} The above code logs in error mode without throwing any exceptions or action and pollutes logs. This should be in INFO mode.
[jira] [Created] (HDFS-14760) Log INFO mode if snapshot usage and actual usage differ
CR Hota created HDFS-14760: -- Summary: Log INFO mode if snapshot usage and actual usage differ Key: HDFS-14760 URL: https://issues.apache.org/jira/browse/HDFS-14760 Project: Hadoop HDFS Issue Type: Improvement Reporter: CR Hota Assignee: CR Hota {code} private void checkStoragespace(final INodeDirectory dir, final long computed) { if (-1 != quota.getStorageSpace() && usage.getStorageSpace() != computed) { NameNode.LOG.error("BUG: Inconsistent storagespace for directory " + dir.getFullPathName() + ". Cached = " + usage.getStorageSpace() + " != Computed = " + computed); } } {code} The above code logs at ERROR level without throwing any exception or taking any action, and pollutes the logs. This should be at INFO level.
[jira] [Commented] (HDFS-12510) RBF: Add security to UI
[ https://issues.apache.org/jira/browse/HDFS-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910726#comment-16910726 ] CR Hota commented on HDFS-12510: Thanks [~elgoiri] > RBF: Add security to UI > --- > > Key: HDFS-12510 > URL: https://issues.apache.org/jira/browse/HDFS-12510 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: CR Hota >Priority: Major > Labels: RBF > > HDFS-12273 implemented the UI for Router Based Federation without security.
[jira] [Resolved] (HDFS-12510) RBF: Add security to UI
[ https://issues.apache.org/jira/browse/HDFS-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota resolved HDFS-12510. Resolution: Resolved
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Parent: (was: HDFS-14603) Issue Type: New Feature (was: Sub-task)
[jira] [Created] (HDFS-14750) RBF: Improved isolation for downstream name nodes. {Dynamic}
CR Hota created HDFS-14750: -- Summary: RBF: Improved isolation for downstream name nodes. {Dynamic} Key: HDFS-14750 URL: https://issues.apache.org/jira/browse/HDFS-14750 Project: Hadoop HDFS Issue Type: Improvement Reporter: CR Hota Assignee: CR Hota This Jira tracks the work around dynamic allocation of resources in routers for downstream hdfs clusters.
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Summary: RBF: Improved isolation for downstream name nodes. {Static} (was: RBF: Improved isolation for downstream name nodes.)
[jira] [Updated] (HDFS-14749) RBF: Isolation across multiple downstream hdfs clusters
[ https://issues.apache.org/jira/browse/HDFS-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14749: --- Summary: RBF: Isolation across multiple downstream hdfs clusters (was: Isolation across multiple downstream hdfs clusters) > RBF: Isolation across multiple downstream hdfs clusters > --- > > Key: HDFS-14749 > URL: https://issues.apache.org/jira/browse/HDFS-14749 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > > This parent Jira tracks all work done within the context of router isolation > across multiple downstream hdfs clusters. > # Phase 1 will be static allocation of resources. > # Phase 2 will introduce a more dynamic, preemption-based approach.
[jira] [Created] (HDFS-14749) Isolation across multiple downstream hdfs clusters
CR Hota created HDFS-14749: -- Summary: Isolation across multiple downstream hdfs clusters Key: HDFS-14749 URL: https://issues.apache.org/jira/browse/HDFS-14749 Project: Hadoop HDFS Issue Type: Sub-task Reporter: CR Hota Assignee: CR Hota This parent Jira tracks all work done within the context of router isolation across multiple downstream hdfs clusters. # Phase 1 will be static allocation of resources. # Phase 2 will introduce a more dynamic, preemption-based approach.
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910708#comment-16910708 ] CR Hota commented on HDFS-14090: [~hexiaoqiao] Thanks for the review. The points you raised are very valid. In the design doc also I have mentioned that at some point we need to introduce and look into preemption/dynamic allocation. Yes, but for Phase 1 the current patch will help installations move forward with the concept of isolation. Dynamic/Preemption will obviously be a separate implementation of {{FairnessPolicyController}}. I will open a ticket to track this next phase. This would also need a through design analysis and review. Lets wait for [~elgoiri] [~brahmareddy] [~aajisaka] [~xkrogen] to review the 010 patch. > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, HDFS-14090.010.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures, should > help minimize impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example - If there are 2 name nodes downstream, and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same with start reflecting on the router. As a result of this, clients > connecting to healthy/faster name nodes will also slow down as same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by router to connect to all name nodes. 
> Currently the router uses one single RPC queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. Another, simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/return errors for requests after a > certain threshold. > This won't be a simple change, as the router's 'Server' layer would need redesign > and implementation. Currently this layer is the same as the name node's. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
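The permit idea from the isolation design can be illustrated with one semaphore per downstream nameservice. This is a hypothetical sketch, not the actual HDFS-14090 patch code; the class and method names here are invented:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Illustrative permit-based controller: each downstream nameservice gets a
// fixed share of the router's handler threads, guarded by a semaphore.
class StaticPermitController {
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    // allocation: nameservice id -> number of handlers dedicated to it
    StaticPermitController(Map<String, Integer> allocation) {
        allocation.forEach((ns, n) -> permits.put(ns, new Semaphore(n)));
    }

    // Try to reserve a handler for the given nameservice; a false return
    // means the caller should make the client back off instead of queueing
    // behind a slow name node.
    boolean acquirePermit(String ns) {
        Semaphore s = permits.get(ns);
        return s != null && s.tryAcquire();
    }

    void releasePermit(String ns) {
        Semaphore s = permits.get(ns);
        if (s != null) {
            s.release();
        }
    }
}
```

A router handler would call acquirePermit before forwarding a call and releasePermit in a finally block; a false return maps to the back-off/reject path discussed above.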
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910618#comment-16910618 ] CR Hota commented on HDFS-14609: [~zhangchen] Thanks for the patch. A few points: # Let's not add hadoop-hdfs and rbf changes together. Any changes needed in hadoop-hdfs that rbf depends on can be done first through a separate jira. Feel free to create one. # It's not clear why hadoop-hdfs changes are needed in this context. # Let's use the configs from DFSConfigKeys instead of defining them again in the test class, for example: private static final String HTTP_KERBEROS_PRINCIPAL_CONF_KEY = "hadoop.http.authentication.kerberos.principal"; > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch > > > We worked on router-based federation security as part of HDFS-13532. We kept > it compatible with the way the namenode works. However, with HADOOP-16314 and > HADOOP-16354 in trunk, auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests.
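The review point about reusing config keys can be sketched as follows. The shared constants class here is a hypothetical stand-in for Hadoop's real config-keys class; only the key string itself comes from the comment above:

```java
// Hypothetical shared constants class (standing in for the project's real
// config-keys class); tests reference it instead of redefining the literal.
class SharedAuthKeys {
    static final String HTTP_KERBEROS_PRINCIPAL_CONF_KEY =
        "hadoop.http.authentication.kerberos.principal";
}

class FilterTestSketch {
    // A test would pass the shared key into its Configuration rather than
    // declaring its own private copy of the string, so a key rename only
    // has to happen in one place.
    static String principalKey() {
        return SharedAuthKeys.HTTP_KERBEROS_PRINCIPAL_CONF_KEY;
    }
}
```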
[jira] [Commented] (HDFS-14744) RBF: Non secured routers should not log in error mode when UGI is default.
[ https://issues.apache.org/jira/browse/HDFS-14744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910186#comment-16910186 ] CR Hota commented on HDFS-14744: [~ayushtkn] Thanks for the review. > RBF: Non secured routers should not log in error mode when UGI is default. > -- > > Key: HDFS-14744 > URL: https://issues.apache.org/jira/browse/HDFS-14744 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14744.001.patch > > > RouterClientProtocol#getMountPointStatus logs an error when groups are not found > for the default web user dr.who. The line should be logged at "error" level for a > secured cluster; for unsecured clusters, we may want to use "debug", > or else logs fill up with this non-critical line: > {{ERROR org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: > Cannot get the remote user: There is no primary group for UGI dr.who > (auth:SIMPLE)}}
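The behavior proposed in HDFS-14744 can be sketched as a level choice driven by the security mode. This is an illustrative sketch with invented names; the real code would consult UserGroupInformation.isSecurityEnabled() and an SLF4J logger:

```java
// Illustrative sketch of the proposed fix: pick the log level for the
// "no primary group" message based on whether security is enabled, instead
// of always logging at ERROR.
class MountPointStatusLogging {
    static String levelFor(boolean securityEnabled) {
        // Secured cluster: a missing group for a real principal is worth
        // flagging loudly. SIMPLE auth: the default web user dr.who triggers
        // this constantly, so demote it to DEBUG to keep logs clean.
        return securityEnabled ? "ERROR" : "DEBUG";
    }
}
```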
[jira] [Commented] (HDFS-12510) RBF: Add security to UI
[ https://issues.apache.org/jira/browse/HDFS-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909449#comment-16909449 ] CR Hota commented on HDFS-12510: [~elgoiri] [~brahmareddy] Should we mark this done? We can revisit if any issues are reported in the future. > RBF: Add security to UI > --- > > Key: HDFS-12510 > URL: https://issues.apache.org/jira/browse/HDFS-12510 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: CR Hota >Priority: Major > Labels: RBF > > HDFS-12273 implemented the UI for Router-Based Federation without security.
[jira] [Updated] (HDFS-14744) RBF: Non secured routers should not log in error mode when UGI is default.
[ https://issues.apache.org/jira/browse/HDFS-14744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14744: --- Attachment: HDFS-14744.001.patch Status: Patch Available (was: Open)
[jira] [Created] (HDFS-14744) RBF: Non secured routers should not log in error mode when UGI is default.
CR Hota created HDFS-14744: -- Summary: RBF: Non secured routers should not log in error mode when UGI is default. Key: HDFS-14744 URL: https://issues.apache.org/jira/browse/HDFS-14744 Project: Hadoop HDFS Issue Type: Sub-task Reporter: CR Hota Assignee: CR Hota
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908758#comment-16908758 ] CR Hota commented on HDFS-14090: [~xkrogen] [~elgoiri] Many thanks for the detailed reviews. Very helpful :) I have incorporated almost all the points you folks mentioned. At a high level, the changes are: # "permit" is still the word being used. # One configuration controls the feature; {{NoFairnessPolicyController}} is a dummy, whereas {{StaticFairnessPolicyController}} is the fairness implementation. # The whole start-up will fail if fairness class loading has issues. Test cases are changed appropriately to reflect that. # {{NoPermitAvailableException}} is renamed to {{PermitLimitExceededException}}. To [~xkrogen]'s observations: {quote}I was considering the scenario where there are two routers R1 and R2, and two NameNodes N1 and N2. Assume most clients need to access both N1 and N2. What happens in the situation when all of R1's N1-handlers are full (but N2-handlers mostly empty), and all of R2's N2-handlers are full (but N1-handlers mostly empty)? I'm not sure if this is a situation that is likely to arise, or if the system will easily self-heal based on the backoff behavior. Maybe worth thinking about a little--not a blocking concern for me, more of a thought experiment.{quote} It should ideally not happen that all handlers of a specific router are busy while other handlers are completely free, since clients are expected to use a random order while connecting. However, from the beginning the design focuses on getting the system to self-heal as much as possible, to eventually get similar traffic across all routers in a cluster. {quote}The configuration for this seems like it will be really tricky to get right, particularly knowing how many fan-out handlers to allocate. I imagine as an administrator, my thought process would be like: I want 35% allocated to NN1 and 65% allocated to NN2, since NN2 is about 2x as loaded as NN1. This part is fairly intuitive. Then I encounter the fan-out configuration... What am I supposed to do with it? Are there perhaps any heuristics we can provide for reasonable values?{quote} Yes, configuration values are something users have to pay attention to, especially for concurrent calls. In the documentation sub-jira HDFS-14558, I plan to write more about concurrent calls and some points for users to focus on. Also, configurations may need to be changed by users based on new use cases, load on downstream clusters, etc. [~aajisaka] [~brahmareddy] [~linyiqun] [~hexiaoqiao] FYI.
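The administrator's percentage intuition quoted in the review ("35% to NN1, 65% to NN2") can be sketched as a mapping from percentages to concrete handler counts. This helper is hypothetical; the actual patch reads its allocation from configuration keys:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing how per-nameservice percentages could map to
// concrete handler counts out of the router's total handler pool.
class HandlerAllocation {
    static Map<String, Integer> fromPercents(int totalHandlers,
                                             Map<String, Integer> percents) {
        Map<String, Integer> out = new LinkedHashMap<>();
        int assigned = 0;
        String last = null;
        for (Map.Entry<String, Integer> e : percents.entrySet()) {
            int n = totalHandlers * e.getValue() / 100;  // floor division
            out.put(e.getKey(), n);
            assigned += n;
            last = e.getKey();
        }
        // Hand any rounding remainder to the last nameservice so the whole
        // handler pool stays allocated.
        if (last != null) {
            out.put(last, out.get(last) + totalHandlers - assigned);
        }
        return out;
    }
}
```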
[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908758#comment-16908758 ] CR Hota edited comment on HDFS-14090 at 8/16/19 5:58 AM: - [~xkrogen] [~elgoiri] Many thanks for the detailed reviews. Very helpful :) I have incorporated almost all the points you folks mentioned in 010.patch. At a high level, the changes are: # "permit" is still the word being used. # One configuration controls the feature; {{NoFairnessPolicyController}} is a dummy, whereas {{StaticFairnessPolicyController}} is the fairness implementation. # The whole start-up will fail if fairness class loading has issues. Test cases are changed appropriately to reflect that. # {{NoPermitAvailableException}} is renamed to {{PermitLimitExceededException}}. To [~xkrogen]'s observations: {quote}I was considering the scenario where there are two routers R1 and R2, and two NameNodes N1 and N2. Assume most clients need to access both N1 and N2. What happens in the situation when all of R1's N1-handlers are full (but N2-handlers mostly empty), and all of R2's N2-handlers are full (but N1-handlers mostly empty)? I'm not sure if this is a situation that is likely to arise, or if the system will easily self-heal based on the backoff behavior. Maybe worth thinking about a little--not a blocking concern for me, more of a thought experiment.{quote} It should ideally not happen that all handlers of a specific router are busy while other handlers are completely free, since clients are expected to use a random order while connecting. However, from the beginning the design focuses on getting the system to self-heal as much as possible, to eventually get similar traffic across all routers in a cluster. {quote}The configuration for this seems like it will be really tricky to get right, particularly knowing how many fan-out handlers to allocate. I imagine as an administrator, my thought process would be like: I want 35% allocated to NN1 and 65% allocated to NN2, since NN2 is about 2x as loaded as NN1. This part is fairly intuitive. Then I encounter the fan-out configuration... What am I supposed to do with it? Are there perhaps any heuristics we can provide for reasonable values?{quote} Yes, configuration values are something users have to pay attention to, especially for concurrent calls. In the documentation sub-jira HDFS-14558, I plan to write more about concurrent calls and some points for users to focus on. Also, configurations may need to be changed by users based on new use cases, load on downstream clusters, etc. [~aajisaka] [~brahmareddy] [~linyiqun] [~hexiaoqiao] FYI.
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.010.patch
[jira] [Commented] (HDFS-8631) WebHDFS : Support setQuota
[ https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908496#comment-16908496 ] CR Hota commented on HDFS-8631: --- [~csun] Please ignore the TestRouter* test cases; they are tracked in HDFS-14609. > WebHDFS : Support setQuota > -- > > Key: HDFS-8631 > URL: https://issues.apache.org/jira/browse/HDFS-8631 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.7.2 >Reporter: nijel >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, > HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, > HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch > > > Users are able to do quota management from the filesystem object. The same operation can > be allowed through the REST API.
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906462#comment-16906462 ] CR Hota commented on HDFS-14090: [~aajisaka] [~elgoiri] "Quota" may confuse admins/readers/developers with the actual quota system present in router/hdfs?
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905421#comment-16905421 ] CR Hota commented on HDFS-14090: [~xkrogen] :)
[jira] [Commented] (HDFS-13123) RBF: Add a balancer tool to move data across subcluster
[ https://issues.apache.org/jira/browse/HDFS-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905411#comment-16905411 ] CR Hota commented on HDFS-13123: [~hemanthboyina] Thanks for the initial patch. We may need a final design doc for this task, explaining some of the points below. # How is atomicity in distcp taken into account here? If distcp fails, the destination cluster may have unused files lying around unaudited. Maybe the user can specify an atomicity flag through admin tooling. # Will all the actual work be done by a common yarn queue belonging to "router", irrespective of user? # How are multiple rebalancings going to work if executed concurrently? Should the admin maintain a state of which rebalancings are in progress and which have completed? Some basic auditing at least. # How does this rebalancing work play with overall user quota management? # Rebalancing across secured clusters? etc. > RBF: Add a balancer tool to move data across subcluster > > > Key: HDFS-13123 > URL: https://issues.apache.org/jira/browse/HDFS-13123 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei Yan >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS Router-Based Federation Rebalancer.pdf, > HDFS-13123.patch > > > Follow the discussion in HDFS-12615. This jira is to track the effort of > building a rebalancer tool, used by router-based federation to move data > among subclusters.
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905391#comment-16905391 ] CR Hota commented on HDFS-14609: [~zhangchen] Thanks for the update, and appreciate you digging into this. It may be a good idea to just work on trunk, not look into HDFS-13891 anymore, and see how these tests can be fixed. As part of the filter work done in HADOOP-16314 and HADOOP-16354, there are some test case examples; you may want to take a look at those for reference.
[jira] [Resolved] (HDFS-14715) RBF: Fix RBF failed tests
[ https://issues.apache.org/jira/browse/HDFS-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota resolved HDFS-14715. Resolution: Duplicate > RBF: Fix RBF failed tests > - > > Key: HDFS-14715 > URL: https://issues.apache.org/jira/browse/HDFS-14715 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > > including: > hadoop.hdfs.server.federation.router.TestRouterWithSecureStartup > hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken
[jira] [Commented] (HDFS-14715) RBF: Fix RBF failed tests
[ https://issues.apache.org/jira/browse/HDFS-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904194#comment-16904194 ] CR Hota commented on HDFS-14715: [~elgoiri] Yeah, it's a duplicate. [~zhangchen] I have assigned HDFS-14609 to you and will be happy to help you get it going.
[jira] [Assigned] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota reassigned HDFS-14609: -- Assignee: Chen Zhang (was: CR Hota)
[jira] [Commented] (HDFS-14705) Remove unused configuration dfs.min.replication
[ https://issues.apache.org/jira/browse/HDFS-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901737#comment-16901737 ] CR Hota commented on HDFS-14705: [~jojochuang] Thanks for the review. Should we commit this? > Remove unused configuration dfs.min.replication > --- > > Key: HDFS-14705 > URL: https://issues.apache.org/jira/browse/HDFS-14705 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: CR Hota >Priority: Trivial > Attachments: HDFS-14705.001.patch > > > A few HDFS tests set the configuration property dfs.min.replication. It is > not used anywhere in the code, and it doesn't seem like a leftover from > legacy code either. Better to clean them out.
[jira] [Updated] (HDFS-14705) Remove unused configuration dfs.min.replication
[ https://issues.apache.org/jira/browse/HDFS-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14705: --- Attachment: HDFS-14705.001.patch
[jira] [Updated] (HDFS-14705) Remove unused configuration dfs.min.replication
[ https://issues.apache.org/jira/browse/HDFS-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14705: --- Status: Patch Available (was: Open) > Remove unused configuration dfs.min.replication > --- > > Key: HDFS-14705 > URL: https://issues.apache.org/jira/browse/HDFS-14705 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: CR Hota >Priority: Trivial > Attachments: HDFS-14705.001.patch > > > A few HDFS tests sets a configuration property dfs.min.replication. This is > not being used anywhere in the code. It doesn't seem like a leftover from > legacy code either. Better to clean them out. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14705) Remove unused configuration dfs.min.replication
[ https://issues.apache.org/jira/browse/HDFS-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901438#comment-16901438 ] CR Hota commented on HDFS-14705: [~jojochuang] Found this in just 2 tests, and yes, as you said, it's not being used. Not sure why it was added in the first place. Can you help take a look at the patch? > Remove unused configuration dfs.min.replication > --- > > Key: HDFS-14705 > URL: https://issues.apache.org/jira/browse/HDFS-14705 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: CR Hota >Priority: Trivial > Attachments: HDFS-14705.001.patch > > > A few HDFS tests sets a configuration property dfs.min.replication. This is > not being used anywhere in the code. It doesn't seem like a leftover from > legacy code either. Better to clean them out. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14705) Remove unused configuration dfs.min.replication
[ https://issues.apache.org/jira/browse/HDFS-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota reassigned HDFS-14705: -- Assignee: CR Hota > Remove unused configuration dfs.min.replication > --- > > Key: HDFS-14705 > URL: https://issues.apache.org/jira/browse/HDFS-14705 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: CR Hota >Priority: Trivial > > A few HDFS tests sets a configuration property dfs.min.replication. This is > not being used anywhere in the code. It doesn't seem like a leftover from > legacy code either. Better to clean them out. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14705) Remove unused configuration dfs.min.replication
[ https://issues.apache.org/jira/browse/HDFS-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901332#comment-16901332 ] CR Hota commented on HDFS-14705: [~jojochuang] Thanks for creating this. May I assign this to myself? Will be good to know how this property has been moving around historically. > Remove unused configuration dfs.min.replication > --- > > Key: HDFS-14705 > URL: https://issues.apache.org/jira/browse/HDFS-14705 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Priority: Trivial > > A few HDFS tests sets a configuration property dfs.min.replication. This is > not being used anywhere in the code. It doesn't seem like a leftover from > legacy code either. Better to clean them out. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14704) RBF: NnId should not be null in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901300#comment-16901300 ] CR Hota commented on HDFS-14704: [~xuzq_zander] Thanks for the comment. The reason I think it's better to put the null check outside of the method is to avoid redundant null checks and also to let method callers decide what params to pass. It also helps fail fast. createLocalNamenodeHeartbeatService already has a check for nnId == null; the code can utilize this existing check and not call createNamenodeHeartbeatService if nnId is null. > RBF: NnId should not be null in NamenodeHeartbeatService > > > Key: HDFS-14704 > URL: https://issues.apache.org/jira/browse/HDFS-14704 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-14704-trunk-001.patch > > > NnId should not be null in NamenodeHeartbeatService. > If NnId is null, it will also print the error message like: > {code:java} > 2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService > (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception > updating NN registration for ns1:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > 
org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14704) RBF: NnId should not be null in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14704: --- Issue Type: Sub-task (was: Improvement) Parent: HDFS-14603 > RBF: NnId should not be null in NamenodeHeartbeatService > > > Key: HDFS-14704 > URL: https://issues.apache.org/jira/browse/HDFS-14704 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-14704-trunk-001.patch > > > NnId should not be null in NamenodeHeartbeatService. > If NnId is null, it will also print the error message like: > {code:java} > 2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService > (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception > updating NN registration for ns1:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14704) RBF: NnId should not be null in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14704: --- Summary: RBF: NnId should not be null in NamenodeHeartbeatService (was: RBF:NnId should not be null in NamenodeHeartbeatService) > RBF: NnId should not be null in NamenodeHeartbeatService > > > Key: HDFS-14704 > URL: https://issues.apache.org/jira/browse/HDFS-14704 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-14704-trunk-001.patch > > > NnId should not be null in NamenodeHeartbeatService. > If NnId is null, it will also print the error message like: > {code:java} > 2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService > (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception > updating NN registration for ns1:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14704) RBF:NnId should not be null in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota reassigned HDFS-14704: -- Assignee: xuzq > RBF:NnId should not be null in NamenodeHeartbeatService > --- > > Key: HDFS-14704 > URL: https://issues.apache.org/jira/browse/HDFS-14704 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-14704-trunk-001.patch > > > NnId should not be null in NamenodeHeartbeatService. > If NnId is null, it will also print the error message like: > {code:java} > 2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService > (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception > updating NN registration for ns1:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14704) RBF:NnId should not be null in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900671#comment-16900671 ] CR Hota commented on HDFS-14704: [~xuzq_zander] Thanks for reporting this and the patch. It may be good to add the check before the method is called; an nsId check is already present, and the nnId check can be clubbed with it. It will look something like below.
{code:java}
if (nsId != null && nnId != null) {
  NamenodeHeartbeatService heartbeatService =
      createNamenodeHeartbeatService(nsId, nnId);
  if (heartbeatService != null) {
    ret.put(heartbeatService.getNamenodeDesc(), heartbeatService);
  }
}
{code}
Can you also add a test for this? > RBF:NnId should not be null in NamenodeHeartbeatService > --- > > Key: HDFS-14704 > URL: https://issues.apache.org/jira/browse/HDFS-14704 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: xuzq >Priority: Major > Attachments: HDFS-14704-trunk-001.patch > > > NnId should not be null in NamenodeHeartbeatService. 
> If NnId is null, it will also print the error message like: > {code:java} > 2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService > (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception > updating NN registration for ns1:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14702) Datanode.ReplicaMap memory leak
[ https://issues.apache.org/jira/browse/HDFS-14702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900467#comment-16900467 ] CR Hota commented on HDFS-14702: [~hexiaoqiao] Thanks for reporting this. Can you try to backport HDFS-8859 to the 2.7.1 installation you have and let us know what the heap dump looks like for the datanode? > Datanode.ReplicaMap memory leak > --- > > Key: HDFS-14702 > URL: https://issues.apache.org/jira/browse/HDFS-14702 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: He Xiaoqiao >Priority: Major > > DataNode memory is occupied by ReplicaMaps, causing high GC frequency and then > degraded write performance. > There are about 600K block replicas located at the DataNode, but when dumping the heap, > there are over 8M items in ReplicaMaps with a footprint over 500MB. It seems > to be a memory leak. One more observation: the block w/r ops rate is very high. > HDFS-8859 has not been tested and it is unclear if it can solve this issue. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897621#comment-16897621 ] CR Hota commented on HDFS-14090: [~aajisaka] [~elgoiri] [~fengnanli] [~linyiqun] [~hexiaoqiao] Thanks for the previous reviews. Took care of most of the comments in 009.patch. Below are 3 things that are still not changed. # I could not remove the log line from assignHandlersToNameservices before the exception is thrown, since there is no other way to test how the assignment fails when the instance is created. Mainly to test the root cause and error message. # Did not add AssertJ APIs; the dependencies need to be added in the pom and various places changed. I suggest we introduce the AssertJ API in a separate Jira for easier review. Will create one. # Concurrent permits won't have any default value. Users are expected to specify the values based on the load of the clusters. The default value assigned at run time is actually the total handler thread count divided by the number of nameservices + 1. > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example - If there are 2 name nodes downstream, and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same will start reflecting on the router. 
As a result of this, clients > connecting to healthy/faster name nodes will also slow down as the same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single rpc queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node's. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
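The run-time default mentioned in point 3 of the comment above (all handler threads divided by the number of nameservices plus one) can be sketched as follows. This is a hedged illustration only; the class and method names below are made up for this sketch and are not the actual RBF fairness-controller API.

```java
// Sketch of the default permit assignment described in the comment:
// when no per-nameservice permit counts are configured, each of the N
// nameservices plus one extra slot for "concurrent" (fan-out) calls
// gets an equal share of the router's handler threads.
public class PermitSketch {

    // handlers divided by (nameservices + 1); the +1 reserves a share
    // for concurrent calls that touch every nameservice at once.
    static int defaultPermitsPerNameservice(int handlerCount, int nameserviceCount) {
        return handlerCount / (nameserviceCount + 1);
    }

    public static void main(String[] args) {
        // e.g. 20 handler threads shared across 3 nameservices -> 5 permits each
        System.out.println(defaultPermitsPerNameservice(20, 3));
    }
}
```

With this scheme, operators who know one nameservice is hotter than the others would override the even split with explicit per-nameservice values, as the comment suggests.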
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.009.patch > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures, should > help minimize impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example - If there are 2 name nodes downstream, and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same with start reflecting on the router. As a result of this, clients > connecting to healthy/faster name nodes will also slow down as same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by router to connect to all name nodes. > Currently router uses one single rpc queue for all calls. Lets discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from current call queue, immediately identify > downstream name node and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after > certain threshold. > This won’t be a simple change as router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as name node. 
> Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.008.patch > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, RBF_ > Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures, should > help minimize impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example - If there are 2 name nodes downstream, and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same with start reflecting on the router. As a result of this, clients > connecting to healthy/faster name nodes will also slow down as same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by router to connect to all name nodes. > Currently router uses one single rpc queue for all calls. Lets discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from current call queue, immediately identify > downstream name node and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after > certain threshold. > This won’t be a simple change as router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as name node. 
> Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
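The simpler rate-limiter idea in the description above (a configured threshold per name node, with requests rejected once it is exceeded) can be sketched with a per-nameservice semaphore. This is a hedged illustration of the general technique, not the RBF implementation; all class and method names here are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Illustrative permit gate: each nameservice gets a bounded number of
// in-flight requests; tryAcquire() fails fast instead of letting a slow
// downstream name node consume the whole shared handler pool.
public class NameserviceGate {
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    public NameserviceGate(Map<String, Integer> permitsPerNs) {
        permitsPerNs.forEach((ns, n) -> permits.put(ns, new Semaphore(n)));
    }

    // True if a handler may proceed to this nameservice; false means the
    // caller should reject the request (or return a retriable error).
    public boolean tryAcquire(String nsId) {
        Semaphore s = permits.get(nsId);
        return s != null && s.tryAcquire();
    }

    // Called when the downstream invocation completes, freeing the permit.
    public void release(String nsId) {
        Semaphore s = permits.get(nsId);
        if (s != null) {
            s.release();
        }
    }
}
```

A handler would wrap each proxied call in tryAcquire/release, so overload on one nameservice surfaces as rejections for that nameservice only rather than queueing delay for everyone.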
[jira] [Comment Edited] (HDFS-14678) Allow triggerBlockReport to a specific namenode
[ https://issues.apache.org/jira/browse/HDFS-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895754#comment-16895754 ] CR Hota edited comment on HDFS-14678 at 7/30/19 4:16 AM: - [~LeonG] This is a very important issue. Thanks for creating the ticket. Looking forward to the fix. was (Author: crh): [~LeonG] This is a very important. Thanks for creating the ticket. Looking forward to the fix. > Allow triggerBlockReport to a specific namenode > --- > > Key: HDFS-14678 > URL: https://issues.apache.org/jira/browse/HDFS-14678 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.8.2 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > > In our largest prod cluster (running 2.8.2) we have >3k hosts. Every time > when rolling restarting NNs we will need to wait for block report which takes > >2.5 hours for each NN. > One way to make it faster is to manually trigger a full block report from all > datanodes. [HDFS-7278|https://issues.apache.org/jira/browse/HDFS-7278]. > However, the current triggerBlockReport command will trigger a block report > on all NNs which will flood the active NN as well. > A quick solution will be adding an option to specify a NN that the manually > triggered block report will go to, something like: > *_hdfs dfsadmin [-triggerBlockReport [-incremental] ] > [-namenode] _* > So when doing a restart of standby NN or observer NN we can trigger an > aggressive block report to a specific NN to exit safemode faster without > risking active NN performance. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14678) Allow triggerBlockReport to a specific namenode
[ https://issues.apache.org/jira/browse/HDFS-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895754#comment-16895754 ] CR Hota commented on HDFS-14678: [~LeonG] This is a very important. Thanks for creating the ticket. Looking forward to the fix. > Allow triggerBlockReport to a specific namenode > --- > > Key: HDFS-14678 > URL: https://issues.apache.org/jira/browse/HDFS-14678 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.8.2 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > > In our largest prod cluster (running 2.8.2) we have >3k hosts. Every time > when rolling restarting NNs we will need to wait for block report which takes > >2.5 hours for each NN. > One way to make it faster is to manually trigger a full block report from all > datanodes. [HDFS-7278|https://issues.apache.org/jira/browse/HDFS-7278]. > However, the current triggerBlockReport command will trigger a block report > on all NNs which will flood the active NN as well. > A quick solution will be adding an option to specify a NN that the manually > triggered block report will go to, something like: > *_hdfs dfsadmin [-triggerBlockReport [-incremental] ] > [-namenode] _* > So when doing a restart of standby NN or observer NN we can trigger an > aggressive block report to a specific NN to exit safemode faster without > risking active NN performance. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation
[ https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14558: --- Status: Patch Available (was: Open) > RBF: Isolation/Fairness documentation > - > > Key: HDFS-14558 > URL: https://issues.apache.org/jira/browse/HDFS-14558 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14558.001.patch > > > Documentation is needed to make users aware of this feature HDFS-14090. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation
[ https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14558: --- Attachment: HDFS-14558.001.patch > RBF: Isolation/Fairness documentation > - > > Key: HDFS-14558 > URL: https://issues.apache.org/jira/browse/HDFS-14558 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14558.001.patch > > > Documentation is needed to make users aware of this feature HDFS-14090. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895503#comment-16895503 ] CR Hota commented on HDFS-14090: [~brahmareddy] [~aajisaka] [~linyiqun] [~hexiaoqiao] Gentle ping. Please help review 007.patch. I am thinking of adding this feature to the 3.3 release. > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example, if there are 2 name nodes downstream and one of them is > heavily loaded with calls spiking RPC queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same RPC queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single RPC queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. 
> This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
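The "rate limiter per name node" alternative described in the ticket can be sketched with plain JDK concurrency primitives: one permit pool per downstream nameservice, so a slow name node exhausts only its own permits instead of the shared handler pool. This is an illustrative model with hypothetical class and method names, not the actual HDFS-14090 patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Caps concurrent router-to-namenode calls per downstream nameservice. */
public class PerNameserviceLimiter {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  private final int maxConcurrentPerNs;

  public PerNameserviceLimiter(int maxConcurrentPerNs) {
    this.maxConcurrentPerNs = maxConcurrentPerNs;
  }

  /** Try to take a call slot; false means the router should reject or queue the call. */
  public boolean acquire(String nsId) {
    return permits
        .computeIfAbsent(nsId, k -> new Semaphore(maxConcurrentPerNs))
        .tryAcquire();
  }

  /** Return the slot once the downstream call completes. */
  public void release(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }
}
```

A call that cannot get a permit for an overloaded nameservice fails fast at the router instead of occupying a shared handler thread, leaving healthy nameservices unaffected.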
[jira] [Commented] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
[ https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893127#comment-16893127 ] CR Hota commented on HDFS-12748: [~cheersyang] We deployed this change in our clusters and it helped resolve the NN memory leak issue. I can work on the GETFILEBLOCKLOCATIONS issue on branch-2. Was a new ticket created? I can create one if not. > NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY > > > Key: HDFS-12748 > URL: https://issues.apache.org/jira/browse/HDFS-12748 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.2 >Reporter: Jiandan Yang >Assignee: Weiwei Yang >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-12748-branch-3.1.01.patch, HDFS-12748.001.patch, > HDFS-12748.002.patch, HDFS-12748.003.patch, HDFS-12748.004.patch, > HDFS-12748.005.patch > > > In our production environment, the standby NN often does full GC; through MAT we > found that the largest object is FileSystem$Cache, which contains 7,844,890 > DistributedFileSystem instances. > By viewing the call hierarchy of FileSystem.get(), I found that only > NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why it creates a > different DistributedFileSystem every time instead of getting a FileSystem from the > cache. > {code:java} > case GETHOMEDIRECTORY: { > final String js = JsonUtil.toJsonString("Path", > FileSystem.get(conf != null ? conf : new Configuration()) > .getHomeDirectory().toUri().getPath()); > return Response.ok(js).type(MediaType.APPLICATION_JSON).build(); > } > {code} > When we close the FileSystem after GETHOMEDIRECTORY, the NN doesn't do full GC. > {code:java} > case GETHOMEDIRECTORY: { > FileSystem fs = null; > try { > fs = FileSystem.get(conf != null ?
conf : new Configuration()); > final String js = JsonUtil.toJsonString("Path", > fs.getHomeDirectory().toUri().getPath()); > return Response.ok(js).type(MediaType.APPLICATION_JSON).build(); > } finally { > if (fs != null) { > fs.close(); > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
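To see why the missing close() matters, here is a toy stand-in for the cache behavior described above (plain JDK, hypothetical class name; Hadoop's real FileSystem$Cache keys on scheme, authority, and UGI): every request that resolves to a new key adds an entry that only close() removes, so uncached callers leak one entry per request.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of FileSystem$Cache: one entry per distinct (scheme, authority, ugi) key. */
class ToyFsCache {
  private final Map<String, Object> cache = new HashMap<>();

  /** Models FileSystem.get(): creates and caches a new "filesystem" per unseen key. */
  Object get(String key) {
    return cache.computeIfAbsent(key, k -> new Object());
  }

  /** Models FileSystem.close(): removes the entry, making it collectible. */
  void close(String key) {
    cache.remove(key);
  }

  int size() {
    return cache.size();
  }
}
```

In the webhdfs case each request carries a distinct proxy user, so each GETHOMEDIRECTORY call behaved like a new key here: the cache grew monotonically until the standby NN spent its time in full GC.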
[jira] [Updated] (HDFS-14670) RBF: Create secret manager instance using FederationUtil#newInstance.
[ https://issues.apache.org/jira/browse/HDFS-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14670: --- Description: Since HDFS-14577 is done, as discussed in that ticket, security and isolation work will use this. This ticket is tracking the work around security class instantiation. (was: Since HDFS-14577 is done, as discussed in tha ticket, security and isolation work will use this. This ticket is tracking the work for around security class instantiation.) > RBF: Create secret manager instance using FederationUtil#newInstance. > - > > Key: HDFS-14670 > URL: https://issues.apache.org/jira/browse/HDFS-14670 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > > Since HDFS-14577 is done, as discussed in that ticket, security and isolation > work will use this. This ticket is tracking the work around security class > instantiation. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14670) RBF: Create secret manager instance using FederationUtil#newInstance.
[ https://issues.apache.org/jira/browse/HDFS-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893113#comment-16893113 ] CR Hota commented on HDFS-14670: [~ayushtkn] [~elgoiri] Created a PR for this: [https://github.com/apache/hadoop/pull/1162] > RBF: Create secret manager instance using FederationUtil#newInstance. > - > > Key: HDFS-14670 > URL: https://issues.apache.org/jira/browse/HDFS-14670 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > > Since HDFS-14577 is done, as discussed in that ticket, security and isolation > work will use this. This ticket is tracking the work around security class > instantiation. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14670) RBF: Create secret manager instance using FederationUtil#newInstance.
CR Hota created HDFS-14670: -- Summary: RBF: Create secret manager instance using FederationUtil#newInstance. Key: HDFS-14670 URL: https://issues.apache.org/jira/browse/HDFS-14670 Project: Hadoop HDFS Issue Type: Sub-task Reporter: CR Hota Assignee: CR Hota Since HDFS-14577 is done, as discussed in that ticket, security and isolation work will use this. This ticket is tracking the work around security class instantiation. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14603) Über-JIRA: HDFS RBF stabilization phase II
[ https://issues.apache.org/jira/browse/HDFS-14603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893019#comment-16893019 ] CR Hota commented on HDFS-14603: All developers, for the next JIRAs please use PRs from the beginning. Anything that already started as patches can stay as is. > Über-JIRA: HDFS RBF stabilization phase II > -- > > Key: HDFS-14603 > URL: https://issues.apache.org/jira/browse/HDFS-14603 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Brahma Reddy Battula >Priority: Major > > To track the pending issues/any new issues after HDFS-13891. (Even for > grouping all the RBF issues, which will be easier for tracking/maintenance.) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test
[ https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893016#comment-16893016 ] CR Hota commented on HDFS-14461: [~hexiaoqiao] Thanks for working on this. There are a couple of things. Sleep may not be the best way to solve this. What we should look at is: there are many places in current Hadoop which use MiniKdc, but the main difference that I see is that for each Hadoop test the MiniKdc is instantiated and shut down immediately. In the case of the router tests, it's static (due to SecurityUtil) and not shut down after the test finishes. If multiple tests are run in the same JVM this will cause issues, and that's what was happening based on what I had seen earlier when I opened the ticket. We may want to follow the same logic of creating and destroying MiniKdc as all other tests. [~elgoiri] I am going to put your comments about PR/patch in the parent ticket for reference. > RBF: Fix intermittently failing kerberos related unit test > -- > > Key: HDFS-14461 > URL: https://issues.apache.org/jira/browse/HDFS-14461 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14461.001.patch > > > TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It > may be due to some race condition before using the keytab that's created for > testing. > > {code:java} > Failed > org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken > Failing for the past 1 build (Since > [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png!
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] ) > [Took 89 > ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history] > > Error Message > org.apache.hadoop.security.KerberosAuthException: failure to login: for > principal: router/localh...@example.com from keytab > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab > javax.security.auth.login.LoginException: Integrity check on decrypted field > failed (31) - PREAUTH_FAILED > h3. Stacktrace > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.security.KerberosAuthException: failure to login: for > principal: router/localh...@example.com from keytab > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab > javax.security.auth.login.LoginException: Integrity check on decrypted field > failed (31) - PREAUTH_FAILED at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) > at > org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at >
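The per-test lifecycle suggested in the comment above (create and destroy MiniKdc per test rather than keeping a static, never-stopped instance) could look roughly like this. A sketch only: it assumes the hadoop-minikdc test artifact and JUnit 4 on the classpath, and the work directory, keytab name, and principal are hypothetical:

```java
import java.io.File;
import org.apache.hadoop.minikdc.MiniKdc;
import org.junit.After;
import org.junit.Before;

// Sketch: give each test its own KDC, mirroring how other Hadoop tests use MiniKdc.
public class TestRouterSecurityBase {
  private MiniKdc kdc;
  private File keytab;

  @Before
  public void startKdc() throws Exception {
    File workDir = new File("target/test/kdc");
    kdc = new MiniKdc(MiniKdc.createConf(), workDir);
    kdc.start();
    // Fresh keytab per test avoids reusing credentials across JVM-shared state.
    keytab = new File(workDir, "router.keytab");
    kdc.createPrincipal(keytab, "router/localhost");
  }

  @After
  public void stopKdc() {
    if (kdc != null) {
      kdc.stop();  // tear the KDC down so the next test starts clean
    }
  }
}
```

Stopping the KDC in @After is what keeps multiple tests in the same JVM from interfering, which is exactly the failure mode the static instance creates.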
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892252#comment-16892252 ] CR Hota commented on HDFS-14090: Thanks [~elgoiri] for the review. Will wait for others to also help review and then take another stab at the patch. > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example, if there are 2 name nodes downstream and one of them is > heavily loaded with calls spiking RPC queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same RPC queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single RPC queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. 
> This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891999#comment-16891999 ] CR Hota commented on HDFS-14090: [~linyiqun] [~hexiaoqiao] Thanks for the reviews. I was out of office, hence the delay in putting up the new patches. I have taken care of all review comments and kept the code as simple as possible for easy understanding. Let me know if you have any further comments. [~elgoiri] [~brahmareddy] [~aajisaka] Could you also help review? > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example, if there are 2 name nodes downstream and one of them is > heavily loaded with calls spiking RPC queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same RPC queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single RPC queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. 
Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.007.patch > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example, if there are 2 name nodes downstream and one of them is > heavily loaded with calls spiking RPC queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same RPC queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single RPC queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. 
> Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14090: --- Attachment: HDFS-14090.006.patch > RBF: Improved isolation for downstream name nodes. > -- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example, if there are 2 name nodes downstream and one of them is > heavily loaded with calls spiking RPC queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same RPC queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single RPC queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature. 
> -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876690#comment-16876690 ] CR Hota commented on HDFS-14609: [~tasanuma] Highly appreciate your effort. Sure, I will help you get ramped up with all the security related changes in the router. This is what is strange and needed deeper digging, as I mentioned. The tests were fine for a very long period of time. This was added as part of HDFS-14052; as you can see, all subsequent changes made to the HDFS-13891 branch were fine till we merged these changes to trunk a few days back. BTW, what you tried above is also strange, because you can see that HADOOP-16354 is also pulled in even though you checked out a relatively old commit id, which is 506d0734825f01daa7bc4ef93664d450b03f0890. > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > > We worked on router based federation security as part of HDFS-13532. We kept > it compatible with the way the namenode works. However with HADOOP-16314 and > HADOOP-16354 in trunk, the auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876513#comment-16876513 ] CR Hota commented on HDFS-14609: [~eyang] Thanks for the detailed explanation. Apologies for the delayed response. For TestRouterWithSecureStartup#testStartupWithoutSpnegoPrincipal: since the test was fine earlier, we will just remove the test, as it won't make much sense now considering the generic hadoop.http.authentication.kerberos.principal is to be used to grab the spnego principal. In any case, it's still unclear to me why this was working just fine earlier with the same version of AbstractService. This would need some more digging. For TestRouterHttpDelegationToken: we wanted to make sure that for webhdfs some tests were done to see if tokens could be generated by the router's security manager. This was NOT intended to be an E2E security test. Again, the router works just fine as it inherits the namenode implementation, but we may need to modify the test to inject an appropriate no-auth filter and bypass auth to maintain the rationale behind the test. [~tasanuma] Do you have any cycles to help with this? I will be out of office soon, but I will be happy to help review and guide you. Feel free to assign this to yourself if you take it up. > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > > We worked on router based federation security as part of HDFS-13532. We kept > it compatible with the way the namenode works. However with HADOOP-16314 and > HADOOP-16354 in trunk, the auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872872#comment-16872872 ] CR Hota commented on HDFS-14609: [~eyang] Thanks for chiming in. TestRouterHttpDelegationToken (all 3 tests) and TestRouterWithSecureStartup#testStartupWithoutSpnegoPrincipal are the ones currently failing. > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > > We worked on router based federation security as part of HDFS-13532. We kept > it compatible with the way the namenode works. However with HADOOP-16314 and > HADOOP-16354 in trunk, the auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CR Hota updated HDFS-14609: --- Description: We worked on router based federation security as part of HDFS-13532. We kept it compatible with the way the namenode works. However with HADOOP-16314 and HADOOP-16354 in trunk, the auth filters seem to have been changed, causing tests to fail. Changes are needed appropriately in RBF, mainly fixing broken tests. was: We worked on router based federation as part of HDFS-13532. We kept it compatible with the way the namenode works. However with HADOOP-16314 and HADOOP-16354 in trunk, the auth filters seem to have been changed, causing tests to fail. Changes are needed appropriately in RBF, mainly fixing broken tests. > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > > We worked on router based federation security as part of HDFS-13532. We kept > it compatible with the way the namenode works. However with HADOOP-16314 and > HADOOP-16354 in trunk, the auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org