[jira] [Commented] (HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
[ https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755198#comment-17755198 ]

Hector Sandoval Chaverri commented on HDFS-17128:
--------------------------------------------------

[~slfan1989] gentle ping on the request above :) The patch I provided should apply cleanly on branch-3.3. Thanks for all your help!

> RBF: SQLDelegationTokenSecretManager should use version of tokens updated by
> other routers
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-17128
>                 URL: https://issues.apache.org/jira/browse/HDFS-17128
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>            Reporter: Hector Sandoval Chaverri
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-17128-branch-3.3.patch
>
> The SQLDelegationTokenSecretManager keeps tokens that it has interacted with
> in a memory cache. This prevents routers from connecting to the SQL server
> for each token operation, improving performance.
>
> We've noticed issues with some tokens being loaded in one router's cache and
> later renewed on a different one. If clients try to use the token in the
> outdated router, it will throw an "Auth failed" error when the cached token's
> expiration has passed.
>
> This can also affect cancelation scenarios since a token can be removed from
> one router's cache and still exist in another one.
>
> A possible solution is already implemented on the
> ZKDelegationTokenSecretManager, which consists of having an executor
> refreshing each router's cache on a periodic basis. We should evaluate
> whether this will work with the volume of tokens expected to be handled by
> the SQLDelegationTokenSecretManager.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
[ https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754175#comment-17754175 ]

Hector Sandoval Chaverri commented on HDFS-17128:
--------------------------------------------------

[~slfan1989] could you help commit the attached patch to branch-3.3? [^HDFS-17128-branch-3.3.patch]
[jira] [Updated] (HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
[ https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HDFS-17128:
--------------------------------------------
    Attachment: HDFS-17128-branch-3.3.patch
[jira] [Created] (HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
Hector Sandoval Chaverri created HDFS-17148:
-----------------------------------------------

             Summary: RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
                 Key: HDFS-17148
                 URL: https://issues.apache.org/jira/browse/HDFS-17148
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: rbf
            Reporter: Hector Sandoval Chaverri

The SQLDelegationTokenSecretManager fetches tokens from SQL and stores them temporarily in a memory cache with a short TTL. The ExpiredTokenRemover in AbstractDelegationTokenSecretManager runs periodically to clean up any expired tokens from the cache, but by then most tokens have already been evicted automatically per the TTL configuration. This leads to many expired tokens accumulating in the SQL database that should be cleaned up.

The SQLDelegationTokenSecretManager should find expired tokens in SQL instead of in the memory cache when running the periodic cleanup.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
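The SQL-side cleanup proposed in HDFS-17148 above can be sketched as follows. This is a hypothetical simulation, not the Hadoop implementation: a map stands in for the SQL token table, and the cleanup deletes expired rows directly from the store (a real version would issue something like `DELETE FROM Tokens WHERE renewDate < ?`) instead of iterating the mostly-empty memory cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cleanup runs against the authoritative store rather
// than the short-TTL memory cache. The map stands in for the SQL table.
public class SqlExpiredTokenCleaner {

  // Token id -> expiration timestamp in millis; stand-in for the SQL table.
  private final Map<String, Long> sqlTable = new ConcurrentHashMap<>();

  public void put(String tokenId, long expirationMs) {
    sqlTable.put(tokenId, expirationMs);
  }

  /** Delete every token whose expiration is before 'now'; return rows removed. */
  public int cleanupExpired(long now) {
    int before = sqlTable.size();
    sqlTable.values().removeIf(exp -> exp < now);
    return before - sqlTable.size();
  }

  public int size() {
    return sqlTable.size();
  }
}
```

Because the store, not the cache, is scanned, tokens that were evicted from memory long before they expired are still found and removed.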
[jira] [Commented] (HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
[ https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748845#comment-17748845 ]

Hector Sandoval Chaverri commented on HDFS-17128:
--------------------------------------------------

[~slfan1989] Would you be able to help review this PR, or recommend someone who can help commit?
[jira] [Updated] (HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
[ https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HDFS-17128:
--------------------------------------------
    Description: 
The SQLDelegationTokenSecretManager keeps tokens that it has interacted with in a memory cache. This prevents routers from connecting to the SQL server for each token operation, improving performance.

We've noticed issues with some tokens being loaded in one router's cache and later renewed on a different one. If clients try to use the token in the outdated router, it will throw an "Auth failed" error when the cached token's expiration has passed.

This can also affect cancelation scenarios since a token can be removed from one router's cache and still exist in another one.

A possible solution is already implemented on the ZKDelegationTokenSecretManager, which consists of having an executor refreshing each router's cache on a periodic basis. We should evaluate whether this will work with the volume of tokens expected to be handled by the SQLDelegationTokenSecretManager.

  was:
The SQLDelegationTokenSecretManager keeps tokens that it has interacted with in a memory cache. This prevents routers from connecting to the SQL server for each token operation.

We've noticed issues with some tokens being loaded in one router's cache and later renewed on a different one. If clients try to use the token in the outdated router, it will throw an "Auth failed" error when the cached token's expiration has passed.

This can also affect cancelation scenarios since a token can be removed from one router's cache and still exist in another one.

A possible solution is already implemented on the ZKDelegationTokenSecretManager, which consists of having an executor refreshing each router's cache on a periodic basis. We should evaluate whether this will work with the volume of tokens expected to be handled by the SQLDelegationTokenSecretManager.
[jira] [Updated] (HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
[ https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HDFS-17128:
--------------------------------------------
    Summary: RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers  (was: SQLDelegationTokenSecretManager should use version of tokens updated by other routers)
[jira] [Created] (HDFS-17128) SQLDelegationTokenSecretManager should use version of tokens updated by other routers
Hector Sandoval Chaverri created HDFS-17128:
-----------------------------------------------

             Summary: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
                 Key: HDFS-17128
                 URL: https://issues.apache.org/jira/browse/HDFS-17128
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: rbf
            Reporter: Hector Sandoval Chaverri

The SQLDelegationTokenSecretManager keeps tokens that it has interacted with in a memory cache. This prevents routers from connecting to the SQL server for each token operation.

We've noticed issues with some tokens being loaded in one router's cache and later renewed on a different one. If clients try to use the token in the outdated router, it will throw an "Auth failed" error when the cached token's expiration has passed.

This can also affect cancelation scenarios since a token can be removed from one router's cache and still exist in another one.

A possible solution is already implemented on the ZKDelegationTokenSecretManager, which consists of having an executor refreshing each router's cache on a periodic basis. We should evaluate whether this will work with the volume of tokens expected to be handled by the SQLDelegationTokenSecretManager.
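The executor-based cache refresh proposed for HDFS-17128 above (the approach already used by ZKDelegationTokenSecretManager) can be sketched as follows. This is a hypothetical simulation, not the Hadoop code: a ConcurrentHashMap stands in for the SQL token store, and the class and method names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a background executor reconciles the router's local
// token cache against the authoritative store, so renewals and cancellations
// performed by other routers become visible without a per-operation SQL call.
public class TokenCacheRefresher {

  // Token id -> expiration timestamp (millis); stand-ins for real token state.
  private final Map<String, Long> authoritativeStore = new ConcurrentHashMap<>();
  private final Map<String, Long> localCache = new ConcurrentHashMap<>();

  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor();

  /** One refresh pass: drop cancelled tokens, pick up renewals from the store. */
  public void refreshCacheOnce() {
    // Remove tokens cancelled in the store by another router.
    localCache.keySet().retainAll(authoritativeStore.keySet());
    // Overwrite stale expirations with the versions other routers wrote.
    localCache.putAll(authoritativeStore);
  }

  public void start(long periodMs) {
    refresher.scheduleWithFixedDelay(
        this::refreshCacheOnce, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  public void stop() {
    refresher.shutdownNow();
  }

  Map<String, Long> store() { return authoritativeStore; }
  Map<String, Long> cache() { return localCache; }
}
```

The refresh interval would bound how long a router can serve a stale expiration, which is the trade-off the ticket asks to evaluate against the expected token volume in SQL.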
[jira] [Commented] (HDFS-17026) RBF: NamenodeHeartbeatService should update JMX report with configurable frequency
[ https://issues.apache.org/jira/browse/HDFS-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729013#comment-17729013 ]

Hector Sandoval Chaverri commented on HDFS-17026:
--------------------------------------------------

[~hexiaoqiao] added PR for branch-3.3: [https://github.com/apache/hadoop/pull/5714] Thanks!

> RBF: NamenodeHeartbeatService should update JMX report with configurable
> frequency
> ------------------------------------------------------------------------
>
>                 Key: HDFS-17026
>                 URL: https://issues.apache.org/jira/browse/HDFS-17026
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>            Reporter: Hector Sandoval Chaverri
>            Assignee: Hector Sandoval Chaverri
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: HDFS-17026-branch-3.3.patch
>
> The NamenodeHeartbeatService currently calls each of the Namenode's JMX
> endpoint every time it wakes up (default value is every 5 seconds).
>
> In a cluster with 40 routers, we have observed service degradation on some of
> the Namenodes, since the JMX request obtains Datanode status and blocks
> other RPC requests. However, JMX report data doesn't seem to be used for
> critical paths on the routers.
>
> We should configure the NamenodeHeartbeatService so it updates the JMX
> reports on a slower frequency than the Namenode states or to disable the
> reports completely.
>
> The class calls out the JMX request being optional even though there is no
> implementation to turn it off:
> {noformat}
> // Read the stats from JMX (optional)
> updateJMXParameters(webAddress, report);{noformat}
[jira] [Commented] (HDFS-17026) RBF: NamenodeHeartbeatService should update JMX report with configurable frequency
[ https://issues.apache.org/jira/browse/HDFS-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728021#comment-17728021 ]

Hector Sandoval Chaverri commented on HDFS-17026:
--------------------------------------------------

[~hexiaoqiao] would you be able to commit the attached patch to branch-3.3?
[jira] [Commented] (HDFS-17026) RBF: NamenodeHeartbeatService should update JMX report with configurable frequency
[ https://issues.apache.org/jira/browse/HDFS-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727661#comment-17727661 ]

Hector Sandoval Chaverri commented on HDFS-17026:
--------------------------------------------------

[~elgoiri] I added a patch for branch-3.3 if you can take a look as well
[jira] [Updated] (HDFS-17026) RBF: NamenodeHeartbeatService should update JMX report with configurable frequency
[ https://issues.apache.org/jira/browse/HDFS-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HDFS-17026:
--------------------------------------------
    Attachment: HDFS-17026-branch-3.3.patch
[jira] [Created] (HDFS-17026) NamenodeHeartbeatService should update JMX report with configurable frequency
Hector Sandoval Chaverri created HDFS-17026:
-----------------------------------------------

             Summary: NamenodeHeartbeatService should update JMX report with configurable frequency
                 Key: HDFS-17026
                 URL: https://issues.apache.org/jira/browse/HDFS-17026
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: rbf
            Reporter: Hector Sandoval Chaverri

The NamenodeHeartbeatService currently calls each Namenode's JMX endpoint every time it wakes up (by default every 5 seconds).

In a cluster with 40 routers, we have observed service degradation on some of the Namenodes, since the JMX request obtains Datanode status and blocks other RPC requests. However, JMX report data doesn't seem to be used for critical paths on the routers.

We should configure the NamenodeHeartbeatService so it updates the JMX reports on a slower frequency than the Namenode states, or to disable the reports completely.

The class calls out the JMX request being optional even though there is no implementation to turn it off:
{noformat}
// Read the stats from JMX (optional)
updateJMXParameters(webAddress, report);{noformat}
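The decoupling requested in HDFS-17026 above can be sketched as follows. This is a hypothetical sketch, not the Hadoop implementation: the heartbeat still runs every cycle, but the expensive JMX fetch only happens when its own, longer interval has elapsed, and a non-positive interval disables it entirely. Names like `jmxIntervalMs` are illustrative, not real Hadoop configuration keys.

```java
// Hypothetical sketch: throttle the JMX report to a slower, configurable
// frequency than the per-cycle Namenode state refresh.
public class HeartbeatJmxThrottle {

  private final long jmxIntervalMs;     // <= 0 disables JMX updates entirely
  private Long lastJmxUpdateMs = null;  // null until the first JMX fetch
  int jmxUpdates = 0;                   // counter, visible for demonstration

  public HeartbeatJmxThrottle(long jmxIntervalMs) {
    this.jmxIntervalMs = jmxIntervalMs;
  }

  /** Called once per heartbeat cycle; fetches JMX only when it is due. */
  public void periodicInvoke(long nowMs) {
    updateNamenodeState();  // cheap HA-state refresh, every cycle
    boolean jmxDue = jmxIntervalMs > 0
        && (lastJmxUpdateMs == null || nowMs - lastJmxUpdateMs >= jmxIntervalMs);
    if (jmxDue) {
      updateJmxReport();
      lastJmxUpdateMs = nowMs;
    }
  }

  private void updateNamenodeState() {
    // Stand-in for querying the Namenode's HA service state.
  }

  private void updateJmxReport() {
    jmxUpdates++;  // stand-in for the expensive JMX scrape
  }
}
```

With a 5 s heartbeat and a 15 s JMX interval, only every third cycle touches JMX; with the interval set to 0 the report is skipped on every cycle, matching the "disable completely" option in the ticket.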
[jira] [Created] (HDFS-16895) NamenodeHeartbeatService should use credentials of logged in user
Hector Sandoval Chaverri created HDFS-16895:
-----------------------------------------------

             Summary: NamenodeHeartbeatService should use credentials of logged in user
                 Key: HDFS-16895
                 URL: https://issues.apache.org/jira/browse/HDFS-16895
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: rbf
            Reporter: Hector Sandoval Chaverri

NamenodeHeartbeatService has been found to log errors when querying protected Namenode JMX APIs. We have been able to work around this by running kinit with the DFS_ROUTER_KEYTAB_FILE_KEY and DFS_ROUTER_KERBEROS_PRINCIPAL_KEY on the router. While investigating a solution, we found that issuing the request as part of a UserGroupInformation.getLoginUser().doAs() call doesn't require running kinit beforehand.

The error logged is:
{noformat}
2022-08-16 21:35:00,265 ERROR org.apache.hadoop.hdfs.server.federation.router.FederationUtil: Cannot parse JMX output for Hadoop:service=NameNode,name=FSNamesystem* from server ltx1-yugiohnn03-ha1.grid.linkedin.com:50070
org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: http://ltx1-yugiohnn03-ha1.grid.linkedin.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem*
	at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:219)
	at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:350)
	at org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:186)
	at org.apache.hadoop.hdfs.server.federation.router.FederationUtil.getJmx(FederationUtil.java:82)
	at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateJMXParameters(NamenodeHeartbeatService.java:352)
	at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:295)
	at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:218)
	at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:172)
	at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:360)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:204)
	... 15 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:336)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:310)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.
[jira] [Created] (HDFS-16895) NamenodeHeartbeatService should use credentials of logged in user
Hector Sandoval Chaverri created HDFS-16895:
-----------------------------------------------

Summary: NamenodeHeartbeatService should use credentials of logged in user
Key: HDFS-16895
URL: https://issues.apache.org/jira/browse/HDFS-16895
Project: Hadoop HDFS
Issue Type: Bug
Components: rbf
Reporter: Hector Sandoval Chaverri

NamenodeHeartbeatService has been found to log errors when querying protected Namenode JMX APIs. We have been able to work around this by running kinit with the DFS_ROUTER_KEYTAB_FILE_KEY and DFS_ROUTER_KERBEROS_PRINCIPAL_KEY on the router.

While investigating a solution, we found that issuing the request as part of a UserGroupInformation.getLoginUser().doAs() call doesn't require running kinit beforehand. The error logged is:

{noformat}
2022-08-16 21:35:00,265 ERROR org.apache.hadoop.hdfs.server.federation.router.FederationUtil: Cannot parse JMX output for Hadoop:service=NameNode,name=FSNamesystem* from server ltx1-yugiohnn03-ha1.grid.linkedin.com:50070
org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: http://ltx1-yugiohnn03-ha1.grid.linkedin.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem*
    at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:219)
    at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:350)
    at org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:186)
    at org.apache.hadoop.hdfs.server.federation.router.FederationUtil.getJmx(FederationUtil.java:82)
    at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateJMXParameters(NamenodeHeartbeatService.java:352)
    at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:295)
    at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:218)
    at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:172)
    at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:360)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:204)
    ... 15 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:336)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:310)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.
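The doAs() workaround described above can be sketched with plain JDK APIs. This is a hedged illustration of the pattern, not the actual NamenodeHeartbeatService code: in Hadoop the subject would come from UserGroupInformation.getLoginUser() rather than a bare Subject, and fetchJmx is a hypothetical stand-in for the real JMX-over-HTTP call in FederationUtil.getJmx.

```java
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

public class DoAsSketch {
    // Hypothetical stand-in for the JMX-over-HTTP request; the real code goes
    // through AuthenticatedURL, which performs the SPNEGO handshake.
    static String fetchJmx(String url) {
        return "{\"beans\":[]}"; // hypothetical response body
    }

    public static void main(String[] args) throws Exception {
        // In Hadoop this would be the login user's Kerberos subject, obtained via
        // UserGroupInformation.getLoginUser(). An empty Subject is used here only
        // to keep the sketch self-contained.
        Subject loginSubject = new Subject();

        // Running the request inside doAs() makes the login credentials available
        // to the SPNEGO sequence, so no external kinit is needed.
        String report = Subject.doAs(loginSubject,
            (PrivilegedExceptionAction<String>) () ->
                fetchJmx("http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem*"));
        System.out.println(report);
    }
}
```

The design point is that credentials are resolved from the executing security context, so the keytab login performed at router startup is reused instead of relying on a ticket cache populated out of band.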
[jira] [Updated] (HADOOP-18535) Implement token storage solution based on MySQL
[ https://issues.apache.org/jira/browse/HADOOP-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HADOOP-18535:
----------------------------------------------
Description: 
Hadoop RBF supports custom implementations of secret managers. At the moment, the only available implementation is ZKDelegationTokenSecretManagerImpl, which stores tokens and delegation keys in Zookeeper.

During our investigation, we found that the performance of routers is limited by the writes to the Zookeeper token store, which impacts requests for token creation, renewal and cancellation. An alternative secret manager implementation has been created, based on MySQL, to handle a higher number of writes.

We measured the throughput of each token operation (create/renew/cancel) on different setups and obtained the following results:
# Sending requests directly to Namenode (no RBF):
Token creations: 290 reqs per sec
Token renewals: 86 reqs per sec
Token cancellations: 97 reqs per sec
# Sending requests to routers using Zookeeper based secret manager:
Token creations: 31 reqs per sec
Token renewals: 29 reqs per sec
Token cancellations: 40 reqs per sec
# Sending requests to routers using SQL based secret manager:
Token creations: 241 reqs per sec
Token renewals: 103 reqs per sec
Token cancellations: 114 reqs per sec

We noticed a significant improvement when using a SQL secret manager, comparable to the throughput offered by Namenodes.

was:
Hadoop RBF supports custom implementations of secret managers. At the moment, the only available implementation is ZKDelegationTokenSecretManagerImpl, which stores tokens and delegation keys in Zookeeper.

During our investigation, we found that the performance of routers is limited by the writes to the Zookeeper token store, which impacts requests for token creation, renewal and cancellation. An alternative secret manager implementation has been created, based on MySQL, to handle a higher number of writes.

We measured the throughput of each token operation (create/renew/cancel) on different setups and obtained the following results:
# Sending requests directly to Namenode (no RBF):
Token creations: 290 reqs per sec
Token renewals: 86 reqs per sec
Token cancellations: 97 reqs per sec
# Sending requests to routers using Zookeeper based secret manager:
Token creations: 31 reqs per sec
Token renewals: 29 reqs per sec
Token cancellations: 40 reqs per sec
# Sending requests to routers using SQL based secret manager:
Token creations: 241 reqs per sec
Token renewals: 103 reqs per sec
Token cancellations: 114 reqs per sec

We noticed a significant improvement when using a SQL secret manager, comparable to the throughput offered by Namenodes. For this reason,

> Implement token storage solution based on MySQL
> -----------------------------------------------
>
> Key: HADOOP-18535
> URL: https://issues.apache.org/jira/browse/HADOOP-18535
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Hector Sandoval Chaverri
> Assignee: Hector Sandoval Chaverri
> Priority: Major
> Labels: pull-request-available
>
> Hadoop RBF supports custom implementations of secret managers. At the moment, the only available implementation is ZKDelegationTokenSecretManagerImpl, which stores tokens and delegation keys in Zookeeper.
> During our investigation, we found that the performance of routers is limited by the writes to the Zookeeper token store, which impacts requests for token creation, renewal and cancellation. An alternative secret manager implementation has been created, based on MySQL, to handle a higher number of writes.
> We measured the throughput of each token operation (create/renew/cancel) on different setups and obtained the following results:
> # Sending requests directly to Namenode (no RBF):
> Token creations: 290 reqs per sec
> Token renewals: 86 reqs per sec
> Token cancellations: 97 reqs per sec
> # Sending requests to routers using Zookeeper based secret manager:
> Token creations: 31 reqs per sec
> Token renewals: 29 reqs per sec
> Token cancellations: 40 reqs per sec
> # Sending requests to routers using SQL based secret manager:
> Token creations: 241 reqs per sec
> Token renewals: 103 reqs per sec
> Token cancellations: 114 reqs per sec
> We noticed a significant improvement when using a SQL secret manager, comparable to the throughput offered by Namenodes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18535) Implement token storage solution based on MySQL
[ https://issues.apache.org/jira/browse/HADOOP-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HADOOP-18535:
----------------------------------------------
Description: 
Hadoop RBF supports custom implementations of secret managers. At the moment, the only available implementation is ZKDelegationTokenSecretManagerImpl, which stores tokens and delegation keys in Zookeeper.

During our investigation, we found that the performance of routers is limited by the writes to the Zookeeper token store, which impacts requests for token creation, renewal and cancellation. An alternative secret manager implementation has been created, based on MySQL, to handle a higher number of writes.

We measured the throughput of each token operation (create/renew/cancel) on different setups and obtained the following results:
# Sending requests directly to Namenode (no RBF):
Token creations: 290 reqs per sec
Token renewals: 86 reqs per sec
Token cancellations: 97 reqs per sec
# Sending requests to routers using Zookeeper based secret manager:
Token creations: 31 reqs per sec
Token renewals: 29 reqs per sec
Token cancellations: 40 reqs per sec
# Sending requests to routers using SQL based secret manager:
Token creations: 241 reqs per sec
Token renewals: 103 reqs per sec
Token cancellations: 114 reqs per sec

We noticed a significant improvement when using a SQL secret manager, comparable to the throughput offered by Namenodes. For this reason,

was:
Hadoop RBF supports custom implementations of secret managers. At the moment, the only available implementation is ZKDelegationTokenSecretManagerImpl, which stores tokens and delegation keys in Zookeeper.

During our investigation, we found that the performance of routers is limited by the writes to the Zookeeper token store, which impacts requests for token creation, renewal and cancellation. An alternative secret manager implementation will be made available, based on MySQL, to handle a higher number of writes.

> Implement token storage solution based on MySQL
> -----------------------------------------------
>
> Key: HADOOP-18535
> URL: https://issues.apache.org/jira/browse/HADOOP-18535
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Hector Sandoval Chaverri
> Assignee: Hector Sandoval Chaverri
> Priority: Major
> Labels: pull-request-available
>
> Hadoop RBF supports custom implementations of secret managers. At the moment, the only available implementation is ZKDelegationTokenSecretManagerImpl, which stores tokens and delegation keys in Zookeeper.
> During our investigation, we found that the performance of routers is limited by the writes to the Zookeeper token store, which impacts requests for token creation, renewal and cancellation. An alternative secret manager implementation has been created, based on MySQL, to handle a higher number of writes.
> We measured the throughput of each token operation (create/renew/cancel) on different setups and obtained the following results:
> # Sending requests directly to Namenode (no RBF):
> Token creations: 290 reqs per sec
> Token renewals: 86 reqs per sec
> Token cancellations: 97 reqs per sec
> # Sending requests to routers using Zookeeper based secret manager:
> Token creations: 31 reqs per sec
> Token renewals: 29 reqs per sec
> Token cancellations: 40 reqs per sec
> # Sending requests to routers using SQL based secret manager:
> Token creations: 241 reqs per sec
> Token renewals: 103 reqs per sec
> Token cancellations: 114 reqs per sec
> We noticed a significant improvement when using a SQL secret manager, comparable to the throughput offered by Namenodes. For this reason,

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18535) Implement token storage solution based on MySQL
Hector Sandoval Chaverri created HADOOP-18535:
-----------------------------------------------

Summary: Implement token storage solution based on MySQL
Key: HADOOP-18535
URL: https://issues.apache.org/jira/browse/HADOOP-18535
Project: Hadoop Common
Issue Type: Improvement
Reporter: Hector Sandoval Chaverri
Assignee: Hector Sandoval Chaverri

Hadoop RBF supports custom implementations of secret managers. At the moment, the only available implementation is ZKDelegationTokenSecretManagerImpl, which stores tokens and delegation keys in Zookeeper.

During our investigation, we found that the performance of routers is limited by the writes to the Zookeeper token store, which impacts requests for token creation, renewal and cancellation. An alternative secret manager implementation will be made available, based on MySQL, to handle a higher number of writes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18264) ZKDelegationTokenSecretManager should handle duplicate Token sequenceNums
[ https://issues.apache.org/jira/browse/HADOOP-18264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HADOOP-18264:
----------------------------------------------
Priority: Minor  (was: Major)

> ZKDelegationTokenSecretManager should handle duplicate Token sequenceNums
> -------------------------------------------------------------------------
>
> Key: HADOOP-18264
> URL: https://issues.apache.org/jira/browse/HADOOP-18264
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Reporter: Hector Sandoval Chaverri
> Priority: Minor
>
> The ZKDelegationTokenSecretManager relies on the TokenIdentifier sequenceNumber to identify each Token in the ZK Store. It's possible for multiple TokenIdentifiers to share the same sequenceNumber, as this is an int that can overflow.
> The AbstractDelegationTokenSecretManager uses a Map<TokenIdent, DelegationTokenInformation>, so all properties in the TokenIdentifier must match. ZKDelegationTokenSecretManager should follow the same logic.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18264) ZKDelegationTokenSecretManager should handle duplicate Token sequenceNums
Hector Sandoval Chaverri created HADOOP-18264:
-----------------------------------------------

Summary: ZKDelegationTokenSecretManager should handle duplicate Token sequenceNums
Key: HADOOP-18264
URL: https://issues.apache.org/jira/browse/HADOOP-18264
Project: Hadoop Common
Issue Type: Bug
Components: security
Reporter: Hector Sandoval Chaverri

The ZKDelegationTokenSecretManager relies on the TokenIdentifier sequenceNumber to identify each Token in the ZK Store. It's possible for multiple TokenIdentifiers to share the same sequenceNumber, as this is an int that can overflow.

The AbstractDelegationTokenSecretManager uses a Map<TokenIdent, DelegationTokenInformation>, so all properties in the TokenIdentifier must match. ZKDelegationTokenSecretManager should follow the same logic.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
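The overflow mentioned above is plain Java int wrap-around and is easy to demonstrate. The helper below is a hypothetical stand-in for the secret manager's sequence-number increment, not the actual Hadoop code.

```java
public class SequenceNumOverflow {
    // Hypothetical counterpart of the secret manager's sequence-number counter:
    // a Java int wraps from Integer.MAX_VALUE to Integer.MIN_VALUE, so after
    // enough token creations two live tokens can carry the same sequenceNumber.
    static int incrementSequenceNum(int current) {
        return current + 1;
    }

    public static void main(String[] args) {
        int seq = Integer.MAX_VALUE;
        int wrapped = incrementSequenceNum(seq);
        System.out.println(wrapped); // prints -2147483648 (Integer.MIN_VALUE)
    }
}
```

This is why keying tokens by sequenceNumber alone is unsafe, and why matching on the full TokenIdentifier (as the in-memory Map does) avoids collisions.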
[jira] [Created] (HDFS-16591) StateStoreZooKeeper fails to initialize
Hector Sandoval Chaverri created HDFS-16591:
-----------------------------------------------

Summary: StateStoreZooKeeper fails to initialize
Key: HDFS-16591
URL: https://issues.apache.org/jira/browse/HDFS-16591
Project: Hadoop HDFS
Issue Type: Bug
Components: rbf
Reporter: Hector Sandoval Chaverri

MembershipStore and MountTableStore are failing to initialize, logging the following errors on the Router logs:

{noformat}
2022-05-23 16:43:01,156 ERROR org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService: Cannot get version for class org.apache.hadoop.hdfs.server.federation.store.MembershipStore
org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException: Cached State Store not initialized, MembershipState records not valid
    at org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.checkCacheAvailable(CachedRecordStore.java:106)
    at org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.getCachedRecords(CachedRecordStore.java:227)
    at org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.getStateStoreVersion(RouterHeartbeatService.java:131)
    at org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.updateStateStore(RouterHeartbeatService.java:92)
    at org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.periodicInvoke(RouterHeartbeatService.java:159)
    at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}

After investigating, we noticed that ZKDelegationTokenSecretManager normally initializes the properties ZooKeeper clients need to connect using SASL/Kerberos. If ZKDelegationTokenSecretManager is replaced with a new SecretManager, the SASL properties don't get configured and any StateStores that connect to ZooKeeper fail with the above error.

A potential way to fix this is by setting the JaasConfiguration (currently done in ZKDelegationTokenSecretManager) as part of the StateStoreZooKeeperImpl initialization method.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
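The proposed fix can be sketched with the standard javax.security.auth.login API: register a programmatic JAAS section before the ZooKeeper client is created, similar in spirit to what ZKDelegationTokenSecretManager does. Class and option names below are illustrative assumptions, not the actual StateStoreZooKeeperImpl patch.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class ZkJaasSketch {
    // Programmatic JAAS section for the ZooKeeper client's SASL/Kerberos login,
    // so no external jaas.conf file is required.
    static class RouterJaasConfiguration extends Configuration {
        private final String principal;
        private final String keytab;

        RouterJaasConfiguration(String principal, String keytab) {
            this.principal = principal;
            this.keytab = keytab;
        }

        @Override
        public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
            Map<String, String> options = new HashMap<>();
            options.put("useKeyTab", "true");
            options.put("storeKey", "true");
            options.put("keyTab", keytab);
            options.put("principal", principal);
            return new AppConfigurationEntry[] {
                new AppConfigurationEntry(
                    "com.sun.security.auth.module.Krb5LoginModule",
                    AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                    options)
            };
        }
    }

    public static void main(String[] args) {
        // Register the JAAS section before any ZooKeeper client is constructed,
        // so SASL works even when ZKDelegationTokenSecretManager is not in use.
        // Principal and keytab path here are placeholders.
        Configuration.setConfiguration(new RouterJaasConfiguration(
            "router/host@EXAMPLE.COM", "/etc/security/keytabs/router.keytab"));
        System.out.println("JAAS configuration registered");
    }
}
```

Doing this inside the StateStoreZooKeeperImpl init path would remove the hidden dependency on the secret manager having run first.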
[jira] [Updated] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hector Sandoval Chaverri updated HADOOP-18167:
----------------------------------------------
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add metrics to track delegation token secret manager operations
> ---------------------------------------------------------------
>
> Key: HADOOP-18167
> URL: https://issues.apache.org/jira/browse/HADOOP-18167
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Hector Sandoval Chaverri
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-18167-branch-2.10-2.patch, HADOOP-18167-branch-2.10-3.patch, HADOOP-18167-branch-2.10-4.patch, HADOOP-18167-branch-2.10.patch, HADOOP-18167-branch-3.3.patch
>
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> New metrics to track operations that store, update and remove delegation tokens in implementations of AbstractDelegationTokenSecretManager. This will help evaluate the impact of using different secret managers and add optimizations.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532413#comment-17532413 ]

Hector Sandoval Chaverri commented on HADOOP-18167:
---------------------------------------------------
[~ayushtkn] I haven't been able to repro by starting ResourceManager, but I see this can happen if there's a call to MetricsSystemImpl#init. I've created HADOOP-18222 to track this issue.

> Add metrics to track delegation token secret manager operations
> ---------------------------------------------------------------
>
> Key: HADOOP-18167
> URL: https://issues.apache.org/jira/browse/HADOOP-18167
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Hector Sandoval Chaverri
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-18167-branch-2.10-2.patch, HADOOP-18167-branch-2.10-3.patch, HADOOP-18167-branch-2.10-4.patch, HADOOP-18167-branch-2.10.patch, HADOOP-18167-branch-3.3.patch
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> New metrics to track operations that store, update and remove delegation tokens in implementations of AbstractDelegationTokenSecretManager. This will help evaluate the impact of using different secret managers and add optimizations.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times
Hector Sandoval Chaverri created HADOOP-18222:
-----------------------------------------------

Summary: Prevent DelegationTokenSecretManagerMetrics from registering multiple times
Key: HADOOP-18222
URL: https://issues.apache.org/jira/browse/HADOOP-18222
Project: Hadoop Common
Issue Type: Improvement
Reporter: Hector Sandoval Chaverri

After committing HADOOP-18167, we received reports of the following error when ResourceManager is initialized:

{noformat}
Caused by: java.io.IOException: Problem starting http server
    at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389)
    at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475)
    ... 4 more
Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source DelegationTokenSecretManagerMetrics already exists!
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
    at org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.<init>(AbstractDelegationTokenSecretManager.java:152)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.<init>(DelegationTokenManager.java:72)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.<init>(DelegationTokenManager.java:122)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180)
    at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53)
{noformat}

This can happen if MetricsSystemImpl#init is called and multiple metrics are registered with the same name. A proposed solution is to declare the metrics in AbstractDelegationTokenSecretManager as a singleton, which would prevent multiple instances of DelegationTokenSecretManagerMetrics from being registered.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
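The singleton proposal can be sketched as follows. Names and structure are illustrative assumptions, not the actual HADOOP-18222 patch; the point is that the metrics source is created (and, in the real code, registered with the metrics system) at most once, so constructing a second secret manager cannot register the same source name twice.

```java
public class MetricsSingletonSketch {
    static final class SecretManagerMetrics {
        private static volatile SecretManagerMetrics instance;
        static int registrations = 0; // stand-in counter for MetricsSystem.register() calls

        private SecretManagerMetrics() {
            // The real code would call DefaultMetricsSystem.instance().register(...)
            // here, exactly once over the lifetime of the process.
            registrations++;
        }

        // Double-checked locking: safe because `instance` is volatile.
        static SecretManagerMetrics getInstance() {
            if (instance == null) {
                synchronized (SecretManagerMetrics.class) {
                    if (instance == null) {
                        instance = new SecretManagerMetrics();
                    }
                }
            }
            return instance;
        }
    }

    public static void main(String[] args) {
        // Simulate two secret managers being constructed, as happens when
        // ResourceManager initializes multiple authentication filters.
        SecretManagerMetrics a = SecretManagerMetrics.getInstance();
        SecretManagerMetrics b = SecretManagerMetrics.getInstance();
        System.out.println(a == b);                          // prints "true"
        System.out.println(SecretManagerMetrics.registrations); // prints "1"
    }
}
```

The trade-off of a process-wide singleton is that all secret manager instances share one metrics source, which matches the fixed source name that caused the collision in the first place.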
[jira] [Created] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times
Hector Sandoval Chaverri created HADOOP-18222: - Summary: Prevent DelegationTokenSecretManagerMetrics from registering multiple times Key: HADOOP-18222 URL: https://issues.apache.org/jira/browse/HADOOP-18222 Project: Hadoop Common Issue Type: Improvement Reporter: Hector Sandoval Chaverri After committing HADOOP-18167, we received reports of the following error when ResourceManager is initialized: {noformat} Caused by: java.io.IOException: Problem starting http server at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475) ... 4 more Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source DelegationTokenSecretManagerMetrics already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) at org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.(AbstractDelegationTokenSecretManager.java:152) at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.(DelegationTokenManager.java:72) at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.(DelegationTokenManager.java:122) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130) at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53){noformat} This can happen if MetricsSystemImpl#init is called and multiple metrics are registered with the same name. A proposed solution is to declare the metrics in AbstractDelegationTokenSecretManager as a singleton, which would prevent multiple instances of DelegationTokenSecretManagerMetrics from being registered. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
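The singleton approach proposed above can be sketched as follows. This is a hedged, self-contained illustration of the pattern, not the actual HADOOP-18222 patch: the class name and the registration counter are invented stand-ins for Hadoop's metrics-system call.

```java
// Hypothetical sketch: guard metrics creation behind a singleton so that
// constructing several secret managers registers the metrics source only
// once, instead of calling MetricsSystem.register twice with the same name.
class DelegationTokenMetricsSingleton {
    private static DelegationTokenMetricsSingleton instance;
    private static int registrations = 0; // stand-in for MetricsSystem.register calls

    private DelegationTokenMetricsSingleton() {
        registrations++; // in Hadoop, this is where register() would run
    }

    static synchronized DelegationTokenMetricsSingleton getInstance() {
        if (instance == null) {
            instance = new DelegationTokenMetricsSingleton();
        }
        return instance;
    }

    static int getRegistrationCount() {
        return registrations;
    }
}
```

With this shape, every secret-manager constructor asks for the shared instance, so the "Metrics source already exists" exception can no longer be triggered by repeated construction.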
[jira] [Updated] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri updated HADOOP-18167: -- Attachment: HADOOP-18167-branch-2.10-4.patch > Add metrics to track delegation token secret manager operations > --- > > Key: HADOOP-18167 > URL: https://issues.apache.org/jira/browse/HADOOP-18167 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-18167-branch-2.10-2.patch, > HADOOP-18167-branch-2.10-3.patch, HADOOP-18167-branch-2.10-4.patch, > HADOOP-18167-branch-2.10.patch, HADOOP-18167-branch-3.3.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > New metrics to track operations that store, update and remove delegation > tokens in implementations of AbstractDelegationTokenSecretManager. This will > help evaluate the impact of using different secret managers and add > optimizations. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri updated HADOOP-18167: -- Attachment: HADOOP-18167-branch-3.3.patch > Add metrics to track delegation token secret manager operations > --- > > Key: HADOOP-18167 > URL: https://issues.apache.org/jira/browse/HADOOP-18167 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-18167-branch-2.10-2.patch, > HADOOP-18167-branch-2.10-3.patch, HADOOP-18167-branch-2.10.patch, > HADOOP-18167-branch-3.3.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > New metrics to track operations that store, update and remove delegation > tokens in implementations of AbstractDelegationTokenSecretManager. This will > help evaluate the impact of using different secret managers and add > optimizations. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri updated HADOOP-18167: -- Attachment: HADOOP-18167-branch-2.10-3.patch > Add metrics to track delegation token secret manager operations > --- > > Key: HADOOP-18167 > URL: https://issues.apache.org/jira/browse/HADOOP-18167 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-18167-branch-2.10-2.patch, > HADOOP-18167-branch-2.10-3.patch, HADOOP-18167-branch-2.10.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > > New metrics to track operations that store, update and remove delegation > tokens in implementations of AbstractDelegationTokenSecretManager. This will > help evaluate the impact of using different secret managers and add > optimizations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522530#comment-17522530 ] Hector Sandoval Chaverri commented on HADOOP-18167: --- [~fengnanli] / [~jing9] / [~inigoiri] Would you be able to help review this, since Owen is out for next week? Thank you! > Add metrics to track delegation token secret manager operations > --- > > Key: HADOOP-18167 > URL: https://issues.apache.org/jira/browse/HADOOP-18167 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-18167-branch-2.10-2.patch, > HADOOP-18167-branch-2.10.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > New metrics to track operations that store, update and remove delegation > tokens in implementations of AbstractDelegationTokenSecretManager. This will > help evaluate the impact of using different secret managers and add > optimizations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri updated HADOOP-18167: -- Attachment: HADOOP-18167-branch-2.10-2.patch > Add metrics to track delegation token secret manager operations > --- > > Key: HADOOP-18167 > URL: https://issues.apache.org/jira/browse/HADOOP-18167 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-18167-branch-2.10-2.patch, > HADOOP-18167-branch-2.10.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > New metrics to track operations that store, update and remove delegation > tokens in implementations of AbstractDelegationTokenSecretManager. This will > help evaluate the impact of using different secret managers and add > optimizations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri updated HADOOP-18167: -- Attachment: HADOOP-18167-branch-2.10.patch Status: Patch Available (was: Open) > Add metrics to track delegation token secret manager operations > --- > > Key: HADOOP-18167 > URL: https://issues.apache.org/jira/browse/HADOOP-18167 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-18167-branch-2.10.patch > > Time Spent: 20m > Remaining Estimate: 0h > > New metrics to track operations that store, update and remove delegation > tokens in implementations of AbstractDelegationTokenSecretManager. This will > help evaluate the impact of using different secret managers and add > optimizations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18167) Add metrics to track delegation token secret manager operations
[ https://issues.apache.org/jira/browse/HADOOP-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510910#comment-17510910 ] Hector Sandoval Chaverri commented on HADOOP-18167: --- Hi [~ste...@apache.org], I saw that a few classes, such as S3AInstrumentation, implement IOStatisticsSource and use IOStatisticsStore to track different counters. Is this the approach that you think we should follow? Could you also help explain what consumes the IOStatistics? > Add metrics to track delegation token secret manager operations > --- > > Key: HADOOP-18167 > URL: https://issues.apache.org/jira/browse/HADOOP-18167 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > New metrics to track operations that store, update and remove delegation > tokens in implementations of AbstractDelegationTokenSecretManager. This will > help evaluate the impact of using different secret managers and add > optimizations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18167) Add metrics to track delegation token secret manager operations
Hector Sandoval Chaverri created HADOOP-18167: - Summary: Add metrics to track delegation token secret manager operations Key: HADOOP-18167 URL: https://issues.apache.org/jira/browse/HADOOP-18167 Project: Hadoop Common Issue Type: Improvement Reporter: Hector Sandoval Chaverri New metrics to track operations that store, update and remove delegation tokens in implementations of AbstractDelegationTokenSecretManager. This will help evaluate the impact of using different secret managers and add optimizations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
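The kind of metrics the issue describes can be illustrated with a minimal sketch. The class and method names below are invented for the example and are not Hadoop's actual DelegationTokenSecretManagerMetrics: one counter per secret-manager operation, incremented where the real code would call into the metrics framework.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: per-operation counters for a delegation token
// secret manager (store, update, remove), as the issue proposes tracking.
class TokenOpMetrics {
    private final AtomicLong storeCount = new AtomicLong();
    private final AtomicLong updateCount = new AtomicLong();
    private final AtomicLong removeCount = new AtomicLong();

    void onStoreToken()  { storeCount.incrementAndGet(); }
    void onUpdateToken() { updateCount.incrementAndGet(); }
    void onRemoveToken() { removeCount.incrementAndGet(); }

    long getStoreCount()  { return storeCount.get(); }
    long getUpdateCount() { return updateCount.get(); }
    long getRemoveCount() { return removeCount.get(); }
}
```

Counters like these make it possible to compare the operation rates of different secret-manager backends, which is the stated goal of the issue.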
[jira] [Created] (HADOOP-17819) Add extensions to ProtobufRpcEngine RequestHeaderProto
Hector Sandoval Chaverri created HADOOP-17819: - Summary: Add extensions to ProtobufRpcEngine RequestHeaderProto Key: HADOOP-17819 URL: https://issues.apache.org/jira/browse/HADOOP-17819 Project: Hadoop Common Issue Type: Improvement Components: common Reporter: Hector Sandoval Chaverri The header used in ProtobufRpcEngine messages doesn't allow for new properties to be added by child classes. We can add a range of extensions that can be useful for proto classes that need to extend RequestHeaderProto. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-17819) Add extensions to ProtobufRpcEngine RequestHeaderProto
Hector Sandoval Chaverri created HADOOP-17819: - Summary: Add extensions to ProtobufRpcEngine RequestHeaderProto Key: HADOOP-17819 URL: https://issues.apache.org/jira/browse/HADOOP-17819 Project: Hadoop Common Issue Type: Improvement Components: common Reporter: Hector Sandoval Chaverri The header used in ProtobufRpcEngine messages doesn't allow for new properties to be added by child classes. We can add a range of extensions that can be useful for proto classes that need to extend RequestHeaderProto. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17680) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HADOOP-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338645#comment-17338645 ] Hector Sandoval Chaverri commented on HADOOP-17680: --- Thanks for taking a look [~shv]. I've made the following changes: # Moved this Jira to the Hadoop Common project. # Reverted the changes to make members protected and only added the getters that are needed. There are still warnings about pre-existing issues, regarding the number of parameters in the Invoker constructor. > Allow ProtobufRpcEngine to be extensible > > > Key: HADOOP-17680 > URL: https://issues.apache.org/jira/browse/HADOOP-17680 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations > to extend some of its inner classes (e.g. Invoker and > Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such > that overriding them would result in a lot of code duplication (e.g. > Invoker#invoke and Server.ProtoBufRpcInvoker#call). > When implementing a new RpcEngine, it would be helpful to reuse most of the > code already in ProtobufRpcEngine. This would allow new fields to be added to > the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-17680) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HADOOP-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri reassigned HADOOP-17680: - Component/s: (was: hdfs) common Key: HADOOP-17680 (was: HDFS-15912) Assignee: (was: Hector Sandoval Chaverri) Project: Hadoop Common (was: Hadoop HDFS) > Allow ProtobufRpcEngine to be extensible > > > Key: HADOOP-17680 > URL: https://issues.apache.org/jira/browse/HADOOP-17680 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations > to extend some of its inner classes (e.g. Invoker and > Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such > that overriding them would result in a lot of code duplication (e.g. > Invoker#invoke and Server.ProtoBufRpcInvoker#call). > When implementing a new RpcEngine, it would be helpful to reuse most of the > code already in ProtobufRpcEngine. This would allow new fields to be added to > the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320603#comment-17320603 ] Hector Sandoval Chaverri commented on HDFS-15912: - The following changes are proposed on ProtobufRpcEngine and ProtobufRpcEngine2: # Change Invoker class, its constructors and fields from private to protected. # Move creation of RpcProtobufRequest object out of the Invoker#invoke method and into a new Invoker#constructRpcRequest method that can be overridden. # Create overload of the Server.ProtoBufRpcInvoker#call method that can be invoked after the RPC request is obtained. > Allow ProtobufRpcEngine to be extensible > > > Key: HDFS-15912 > URL: https://issues.apache.org/jira/browse/HDFS-15912 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations > to extend some of its inner classes (e.g. Invoker and > Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such > that overriding them would result in a lot of code duplication (e.g. > Invoker#invoke and Server.ProtoBufRpcInvoker#call). > When implementing a new RpcEngine, it would be helpful to reuse most of the > code already in ProtobufRpcEngine. This would allow new fields to be added to > the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
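The three proposed changes amount to a template-method refactor. A minimal hedged sketch of that shape, with simplified types that stand in for the real ProtobufRpcEngine classes:

```java
// Simplified stand-in for ProtobufRpcEngine's Invoker: invoke() keeps the
// shared logic, while constructRpcRequest() is a protected hook that a
// custom RpcEngine can override to add header fields without duplicating
// the whole invoke() body.
class Invoker {
    public final String invoke(String method) {
        String request = constructRpcRequest(method);
        return "sent[" + request + "]"; // stands in for the actual RPC call
    }

    protected String constructRpcRequest(String method) {
        return "header:" + method;
    }
}

class ExtendedInvoker extends Invoker {
    @Override
    protected String constructRpcRequest(String method) {
        // add an extra header field on top of the default request
        return super.constructRpcRequest(method) + ";ext=1";
    }
}
```

The subclass changes only how the request is built; the surrounding send logic is inherited unchanged, which is exactly the reuse the issue asks for.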
[jira] [Updated] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri updated HDFS-15912: Status: Patch Available (was: In Progress) > Allow ProtobufRpcEngine to be extensible > > > Key: HDFS-15912 > URL: https://issues.apache.org/jira/browse/HDFS-15912 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations > to extend some of its inner classes (e.g. Invoker and > Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such > that overriding them would result in a lot of code duplication (e.g. > Invoker#invoke and Server.ProtoBufRpcInvoker#call). > When implementing a new RpcEngine, it would be helpful to reuse most of the > code already in ProtobufRpcEngine. This would allow new fields to be added to > the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-15912 started by Hector Sandoval Chaverri. --- > Allow ProtobufRpcEngine to be extensible > > > Key: HDFS-15912 > URL: https://issues.apache.org/jira/browse/HDFS-15912 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations > to extend some of its inner classes (e.g. Invoker and > Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such > that overriding them would result in a lot of code duplication (e.g. > Invoker#invoke and Server.ProtoBufRpcInvoker#call). > When implementing a new RpcEngine, it would be helpful to reuse most of the > code already in ProtobufRpcEngine. This would allow new fields to be added to > the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Sandoval Chaverri reassigned HDFS-15912: --- Assignee: Hector Sandoval Chaverri > Allow ProtobufRpcEngine to be extensible > > > Key: HDFS-15912 > URL: https://issues.apache.org/jira/browse/HDFS-15912 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations > to extend some of its inner classes (e.g. Invoker and > Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such > that overriding them would result in a lot of code duplication (e.g. > Invoker#invoke and Server.ProtoBufRpcInvoker#call). > When implementing a new RpcEngine, it would be helpful to reuse most of the > code already in ProtobufRpcEngine. This would allow new fields to be added to > the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
Hector Sandoval Chaverri created HDFS-15912: --- Summary: Allow ProtobufRpcEngine to be extensible Key: HDFS-15912 URL: https://issues.apache.org/jira/browse/HDFS-15912 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: Hector Sandoval Chaverri The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations to extend some of its inner classes (e.g. Invoker and Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such that overriding them would result in a lot of code duplication (e.g. Invoker#invoke and Server.ProtoBufRpcInvoker#call). When implementing a new RpcEngine, it would be helpful to reuse most of the code already in ProtobufRpcEngine. This would allow new fields to be added to the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15623) Respect configured values of rpc.engine
[ https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219841#comment-17219841 ] Hector Sandoval Chaverri commented on HDFS-15623: - [Pull Request #2403|https://github.com/apache/hadoop/pull/2403] updates {{RPC.setProtocolEngine()}} to prevent configured values of {{rpc.engine}} from being overwritten. With the new behavior, we can use implementations of {{RpcEngine}} other than the default ({{ProtobufRpcEngine2}}) by specifying them in the Configuration (e.g. {{core-default.xml}}). > Respect configured values of rpc.engine > --- > > Key: HDFS-15623 > URL: https://issues.apache.org/jira/browse/HDFS-15623 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The HDFS Configuration allows users to specify the RPCEngine implementation > to use when communicating with Datanodes and Namenodes. However, the value is > overwritten to ProtobufRpcEngine.class in different classes. As an example in > NameNodeRpcServer: > {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, > ProtobufRpcEngine.class);}} > {{The configured value of rpc.engine.[protocolName] should be respected to > allow for other implementations of RPCEngine to be used}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
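The behavior described above can be illustrated with a small sketch. This is a hedged, self-contained mock, not the actual PR #2403 code: the `Conf` class below stands in for Hadoop's Configuration (whose real API does include a `setIfUnset` method), and the selector name is invented.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal mock of Hadoop's Configuration, for illustration only.
class Conf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    void setIfUnset(String key, String value) { props.putIfAbsent(key, value); }
    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

class RpcEngineSelector {
    static final String ENGINE_PROP = "rpc.engine";

    // Like RPC.setProtocolEngine, but respects a value the user already
    // configured (e.g. in core-default.xml) instead of overwriting it.
    static void setDefaultEngine(Conf conf, String protocol, String engineClass) {
        conf.setIfUnset(ENGINE_PROP + "." + protocol, engineClass);
    }
}
```

With set-if-unset semantics, a user-supplied `rpc.engine.[protocolName]` value survives server initialization, while callers that configured nothing still get the default engine.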
[jira] [Commented] (HDFS-15623) Respect configured values of rpc.engine
[ https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218606#comment-17218606 ] Hector Sandoval Chaverri commented on HDFS-15623: - Thanks [~shv], I can add the {{rpc.engine}} implementations to {{core-default.xml}} and remove the RPC.setProtocolEngine() calls. To clarify, should we only do this on {{NameNodeRpcServer}} or on every instance? > Respect configured values of rpc.engine > --- > > Key: HDFS-15623 > URL: https://issues.apache.org/jira/browse/HDFS-15623 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Priority: Major > > The HDFS Configuration allows users to specify the RPCEngine implementation > to use when communicating with Datanodes and Namenodes. However, the value is > overwritten to ProtobufRpcEngine.class in different classes. As an example in > NameNodeRpcServer: > {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, > ProtobufRpcEngine.class);}} > {{The configured value of rpc.engine.[protocolName] should be respected to > allow for other implementations of RPCEngine to be used}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15623) Respect configured values of rpc.engine
[ https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216901#comment-17216901 ] Hector Sandoval Chaverri commented on HDFS-15623: - Hi [~jianghuazhu], Thanks for looking into this. I'm not sure if I understand your comments completely, but this is what I see: # There are multiple implementations of RpcEngine, such as ProtobufRpcEngine, ProtobufRpcEngine2 and WritableRpcEngine. Users can also create their own custom implementations, but won't be able to make use of them without this fix. # The RPC.setProtocolEngine method updates the configuration with the specified class value: {code:java} conf.setClass(ENGINE_PROP + "." + protocol.getName(), engine, RpcEngine.class){code} In the current state, is there a proper way to use a custom implementation of RpcEngine? > Respect configured values of rpc.engine > --- > > Key: HDFS-15623 > URL: https://issues.apache.org/jira/browse/HDFS-15623 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Priority: Major > > The HDFS Configuration allows users to specify the RPCEngine implementation > to use when communicating with Datanodes and Namenodes. However, the value is > overwritten to ProtobufRpcEngine.class in different classes. As an example in > NameNodeRpcServer: > {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, > ProtobufRpcEngine.class);}} > {{The configured value of rpc.engine.[protocolName] should be respected to > allow for other implementations of RPCEngine to be used}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15623) Respect configured values of rpc.engine
Hector Sandoval Chaverri created HDFS-15623: --- Summary: Respect configured values of rpc.engine Key: HDFS-15623 URL: https://issues.apache.org/jira/browse/HDFS-15623 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: Hector Sandoval Chaverri The HDFS Configuration allows users to specify the RPCEngine implementation to use when communicating with Datanodes and Namenodes. However, the value is overwritten to ProtobufRpcEngine.class in different classes. As an example in NameNodeRpcServer: {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, ProtobufRpcEngine.class);}} {{The configured value of rpc.engine.[protocolName] should be respected to allow for other implementations of RPCEngine to be used}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org