[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599835#comment-15599835 ]

Xiaoyu Yao edited comment on HDFS-10757 at 10/23/16 3:28 PM:
-------------------------------------------------------------

Thanks [~brahma] for reporting the issue. HADOOP-13748 is a test bug that was surfaced by this change. I've reverted this one from trunk, branch-2 and branch-2.8 and converted it to hadoop common. I will recommit it after the unit test fix for HADOOP-13748 is in.

was (Author: xyao):
    Thanks [~brahma] for reporting the issue. HADOOP-13748 is a test bug that was surfaced with this change. Once we have the fix for HADOOP-13748 in, I will revert this one and convert it to hadoop common.

> KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-10757
>                 URL: https://issues.apache.org/jira/browse/HDFS-10757
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Xiaoyu Yao
>            Priority: Critical
>             Fix For: 2.8.0, 3.0.0-alpha2
>
>         Attachments: HDFS-10757.00.patch, HDFS-10757.01.patch, HDFS-10757.02.patch, HDFS-10757.03.patch
>
>
> ClientContext::get gets the context from CACHE via a config setting based name, then KeyProviderCache stored in ClientContext gets the key provider cached by URI from the configuration, too. These would return the same KeyProvider regardless of current UGI.
> KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular that all the users of DFS with KMSClientProvider in a process will get the KMS token (along with other credentials) of the first user, via the above cache.
> Either KMSClientProvider shouldn't store the UGI, or one of the caches should be UGI-aware, like the FS object cache.
> Side note: the comment in createConnection that purports to handle the different UGI doesn't seem to cover what it says it covers. In our case, we have two unrelated UGIs with no auth (createRemoteUser) with a bunch of tokens, including a KMS token, added.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
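The caching interaction described in the issue can be illustrated with a small toy model. This is only a sketch and does not use Hadoop classes: `User`, `Provider`, `CACHE`, `lookupAs` and the KMS host name are hypothetical stand-ins for UserGroupInformation, KMSClientProvider, KeyProviderCache and a real KMS endpoint. It assumes, as the description states, that the provider captures the caller's identity in its constructor and that the cache is keyed by URI only.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the reported behaviour; none of these types are Hadoop's. */
public class StaleUgiDemo {

    /** Hypothetical stand-in for UserGroupInformation: just a user name. */
    static final class User {
        final String name;
        User(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the notion of a current user. */
    static final ThreadLocal<User> CURRENT = new ThreadLocal<>();

    /** Stand-in for KMSClientProvider: captures the caller in the constructor. */
    static final class Provider {
        private final User actualUser;
        Provider() { this.actualUser = CURRENT.get(); }
        /** Name of the user whose credentials a request would carry. */
        String userOnWire() { return actualUser.name; }
    }

    /** Stand-in for KeyProviderCache: keyed by URI only, not by user. */
    static final Map<URI, Provider> CACHE = new ConcurrentHashMap<>();

    static Provider get(URI uri) {
        return CACHE.computeIfAbsent(uri, u -> new Provider());
    }

    /** Looks up the provider as the given user and reports the identity it uses. */
    static String lookupAs(String userName, URI uri) {
        CURRENT.set(new User(userName));
        try {
            return get(uri).userOnWire();
        } finally {
            CURRENT.remove();
        }
    }

    public static void main(String[] args) {
        // The host below is made up for the example.
        URI kms = URI.create("kms://http@kms.example.com:9600/kms");
        System.out.println(lookupAs("alice", kms)); // prints "alice"
        System.out.println(lookupAs("bob", kms));   // prints "alice": the cached provider is reused
    }
}
```

In this model the second caller silently inherits the first caller's identity, which is the behaviour the reporter describes for two unrelated UGIs in one process.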
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429781#comment-15429781 ]

Arun Suresh edited comment on HDFS-10757 at 8/21/16 4:21 PM:
-------------------------------------------------------------

bq. Now when Server1 instantiates a client2 to make a call to Server2 it should not use ugi1 because the authentication context in ugi1 is not relevant for this call. In my opinion a new ugi2 should be explicitly setup, which has the right credentials.

[~jnp], I agree with you. I also feel maybe the ugi should be set up outside of the KMSCP. This would also simplify the code. We could
# either modify the APIs to include a ugi argument, with the caller ensuring that the ugi has the credentials (this would probably be equivalent to documenting somewhere that the client has to ensure that the keyprovider APIs are always called inside a ugi.doAs())
# or create a new {{KeyProviderExtension}} implementation that takes an existing KeyProvider and a ugi, and invokes all of the keyprovider's APIs via the provided ugi.doAs() context.

Option 2 might actually be easier to implement. Thoughts?

was (Author: asuresh):
    bq. Now when Server1 instantiates a client2 to make a call to Server2 it should not use ugi1 because the authentication context in ugi1 is not relevant for this call. In my opinion a new ugi2 should be explicitly setup, which has the right credentials.

    [~jnp], I agree with you. I also feel maybe the ugi should be set up outside of the KMSCP. This would also simplify the code. We could
    # either modify the APIs to include a ugi argument, with the caller ensuring that the ugi has the credentials (this would probably be equivalent to documenting somewhere that the client has to ensure that the keyprovider APIs are always called inside a ugi.doAs())
    # or create a new {{KeyProviderExtension}} implementation that takes an existing KeyProvider and a ugi, and invokes all of the keyprovider's APIs via the provided ugi.doAs() context.
Option 2 might actually be easier to implement.
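Option 2 in the comment above (a wrapper that runs every key provider call inside the supplied ugi's doAs()) could look roughly like the following. This is a hedged, self-contained sketch rather than Hadoop code: `User.doAs`, `KeyOps`, `SharedProvider` and `DoAsProvider` are hypothetical names, and the real UserGroupInformation.doAs() takes a privileged action and sets up a security context, which is only imitated here with a thread-local.

```java
import java.util.function.Supplier;

/** Toy model of a provider wrapper that pins each call to a given user. */
public class DoAsProviderSketch {

    /** Hypothetical stand-in for the notion of a current user. */
    static final ThreadLocal<User> CURRENT = new ThreadLocal<>();

    /** Hypothetical stand-in for UserGroupInformation. */
    static final class User {
        final String name;
        User(String name) { this.name = name; }

        /** Runs the action with this user as the current user, then restores the previous one. */
        <T> T doAs(Supplier<T> action) {
            User previous = CURRENT.get();
            CURRENT.set(this);
            try {
                return action.get();
            } finally {
                if (previous == null) {
                    CURRENT.remove();
                } else {
                    CURRENT.set(previous);
                }
            }
        }
    }

    /** Minimal slice of a key provider API, made up for this example. */
    interface KeyOps {
        String getCurrentKey(String keyName);
    }

    /** A shared (cacheable) provider that uses whoever is current at call time. */
    static final class SharedProvider implements KeyOps {
        @Override
        public String getCurrentKey(String keyName) {
            User caller = CURRENT.get();
            return keyName + " fetched as " + (caller == null ? "nobody" : caller.name);
        }
    }

    /** The wrapper: delegates every call inside user.doAs(). */
    static final class DoAsProvider implements KeyOps {
        private final KeyOps delegate;
        private final User user;

        DoAsProvider(KeyOps delegate, User user) {
            this.delegate = delegate;
            this.user = user;
        }

        @Override
        public String getCurrentKey(String keyName) {
            return user.doAs(() -> delegate.getCurrentKey(keyName));
        }
    }

    public static void main(String[] args) {
        KeyOps shared = new SharedProvider();
        KeyOps asAlice = new DoAsProvider(shared, new User("alice"));
        KeyOps asBob = new DoAsProvider(shared, new User("bob"));
        System.out.println(asAlice.getCurrentKey("key1")); // prints "key1 fetched as alice"
        System.out.println(asBob.getCurrentKey("key1"));   // prints "key1 fetched as bob"
    }
}
```

The design point is that the underlying provider can stay shared while the identity travels with the thin wrapper, so nothing identity-specific ends up in the cached object.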
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423519#comment-15423519 ]

Xiaoyu Yao edited comment on HDFS-10757 at 8/17/16 7:02 PM:
------------------------------------------------------------

Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original thread-leak issue in the title of HDFS-7718 was fixed by HADOOP-11368 before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the ClientContext. Maybe we should revisit the goal of KeyProviderCache, which seems to be one of the sources of the problem.

KeyProviderCache contains a map keyed by KMS URI. When it is combined with KMSClientProvider, which caches a UGI (actualUgi), the wrong context may be used, as in the example [~jnp] mentioned above.

HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the currentUGI contains a kms-dt, but only for a non-proxy currentUGI. Correct me if I'm wrong: when the currentUGI is a new proxy user with a kms-dt, I don't think we should use the stale actualUGI here.

In a recent change to KMSClientProvider by HADOOP-13155, we can see that the KeyProviderCache is bypassed by creating a new instance of KMSClientProvider for each renew/cancel operation.

was (Author: xyao):
    Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original thread-leak issue in the title of HDFS-7718 was fixed by HADOOP-11368 before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the ClientContext. Maybe we should revisit the goal of KeyProviderCache, which seems to be one of the sources of the problem.

    KeyProviderCache contains a map keyed by KMS URI. When it is combined with KMSClientProvider, which caches a UGI (actualUgi), the wrong context may be used, as in the example [~jnp] mentioned above.

    HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the currentUGI contains a kms-dt, but only for a non-proxy currentUGI.
Correct me if I'm wrong: when the currentUGI is a new proxy user with kms-dt, I don't think we should use the stale actualUGI here.
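Since the comment above points at the URI-only map key as one source of the problem, and the issue description suggests making one of the caches UGI-aware, a user-aware key might be sketched as below. This is an illustration under assumptions, not the committed fix: `Key`, `Provider` and `get` are hypothetical, and using the user name as the identity is a simplification (a real cache would have to decide what UGI equality should mean).

```java
import java.net.URI;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of a provider cache keyed by (URI, user) instead of URI alone. */
public class UserAwareCacheSketch {

    /** Composite cache key; the user name stands in for a UGI. */
    static final class Key {
        final URI uri;
        final String userName;

        Key(URI uri, String userName) {
            this.uri = Objects.requireNonNull(uri);
            this.userName = Objects.requireNonNull(userName);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return uri.equals(other.uri) && userName.equals(other.userName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(uri, userName);
        }
    }

    /** Hypothetical provider that is bound to one user for its lifetime. */
    static final class Provider {
        final String boundUser;
        Provider(String boundUser) { this.boundUser = boundUser; }
    }

    static final Map<Key, Provider> CACHE = new ConcurrentHashMap<>();

    /** Returns the provider for this (URI, user) pair, creating it on first use. */
    static Provider get(URI uri, String userName) {
        return CACHE.computeIfAbsent(new Key(uri, userName), k -> new Provider(k.userName));
    }

    public static void main(String[] args) {
        // The host below is made up for the example.
        URI kms = URI.create("kms://http@kms.example.com:9600/kms");
        System.out.println(get(kms, "alice").boundUser);             // prints "alice"
        System.out.println(get(kms, "bob").boundUser);               // prints "bob"
        System.out.println(get(kms, "alice") == get(kms, "alice")); // prints "true"
    }
}
```

The trade-off is one cached provider per user per KMS URI, so a long-running multi-user process would hold more instances than with the URI-only key.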
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423519#comment-15423519 ]

Xiaoyu Yao edited comment on HDFS-10757 at 8/17/16 6:49 PM:
------------------------------------------------------------

Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original thread-leak issue in the title of HDFS-7718 was fixed by HADOOP-11368 before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the ClientContext. Maybe we should revisit the goal of KeyProviderCache, which seems to be one of the sources of the problem.

KeyProviderCache contains a map keyed by KMS URI. When it is combined with KMSClientProvider, which caches a UGI (actualUgi), the wrong context may be used, as in the example [~jnp] mentioned above.

HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the currentUGI contains a kms-dt, but only for a non-proxy currentUGI. Correct me if I'm wrong: when the currentUGI is a new proxy user with a kms-dt, I don't think we should use the stale actualUGI here.

was (Author: xyao):
    Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original thread-leak issue in the title of HDFS-7718 was fixed by HADOOP-11368 before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the ClientContext. Maybe we should revisit the goal of KeyProviderCache, which seems to be one of the sources of the problem.

    KeyProviderCache contains a map keyed by KMS URI. When it is combined with KMSClientProvider, which caches a UGI (actualUgi), the wrong context may be used, as in the example [~jnp] mentioned above.

    HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the currentUGI contains a kms-dt, but only for a non-proxy currentUGI. Correct me if I'm wrong: when the currentUGI is a new proxy user with a kms-dt, I don't think we should use the stale actualUGI here.
Also, we have a few KMS operations (such as add, and renew/cancel delegation token from HADOOP-13155) that don't go through KMSClientProvider#createConnection() but use the cached actualUGI. It will cause similar issue when using with KeyProviderCache enabled.
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419234#comment-15419234 ]

Arun Suresh edited comment on HDFS-10757 at 8/12/16 6:09 PM:
-------------------------------------------------------------

Hmmm... I think I remember the context for why it was implemented as such.

bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi.

That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor.

was (Author: asuresh):
    Hmmm... I think I remember the context for why it was implemented as such.

    bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi.

    That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP.
If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{ UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor.
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418422#comment-15418422 ]

Arun Suresh edited comment on HDFS-10757 at 8/12/16 6:37 AM:
-------------------------------------------------------------

[~jnp],
> If the currentUGI has a realUgi, use the realUgi as actualUgi or use the currentUgi as the actualUgi

If currentUgi is a proxy user, the latter case won't work... right? As Xiao commented, the actualUgi was added to support proxy users.

was (Author: asuresh):
    [~jnp],
    > If the currentUGI has a realUgi, use the realUgi as actualUgi or use the currentUgi as the actualUgi

    if currentUgi is a proxy user, the later case wont wont. As Xiao commented, the actualUgi was added to support proxy users.
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418409#comment-15418409 ]

Jitendra Nath Pandey edited comment on HDFS-10757 at 8/12/16 6:30 AM:
----------------------------------------------------------------------

I think storing the {{actualUgi}} in KMSClientProvider is incorrect because the providers are cached for a long time, and the currentUGI may be completely different from the actualUGI. Therefore, it may be a good idea to consider removing actualUgi from KMSClientProvider.

I am inclined to say that setting up the UGI should be done by the client code using the FileSystem. On every call, the KMSClientProvider should only check the following: if the currentUGI has a realUgi, use the realUgi as actualUgi; otherwise use the currentUgi as the actualUgi.

I may not have the whole context on why actualUgi was added in the constructor of KMSClientProvider, but would like to understand.

was (Author: jnp):
    I think storing the {{actualUgi}} in KMSClientProvider is incorrect because the providers are cached for a long time, and the currentUGI may be completely different from the actualUGI. Therefore, it may be a good idea to consider removing actualUgi from KMSClientProvider.

    I am inclined to say that setting up of the UGI should be done by client code using the FileSystem. The KMSClientProvider on every call should only check following: If the currentUGI has a realUgi, us the realUgi as actualUgi or use the currentUgi as the actualUgi.

    I may not have the whole context on why actualUgi was added in the constructor of KMSClientProvider, but would like to understand.
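The per-call rule proposed in the comment above (use the real user if the current user has one, otherwise the current user itself) is small enough to sketch directly. The types here are hypothetical stand-ins, not Hadoop's: `User.realUser` imitates what UserGroupInformation.getRealUser() returns, and `selectActualUser` is a made-up helper name.

```java
/** Toy model of choosing the effective identity on every call instead of caching it. */
public class PerCallUgiSketch {

    /** Hypothetical stand-in for UserGroupInformation. */
    static final class User {
        final String name;
        /** Non-null only for proxy users; imitates getRealUser(). */
        final User realUser;

        private User(String name, User realUser) {
            this.name = name;
            this.realUser = realUser;
        }

        /** A plain user with no real user behind it. */
        static User remote(String name) {
            return new User(name, null);
        }

        /** A proxy user acting on behalf of someone, backed by realUser. */
        static User proxy(String name, User realUser) {
            return new User(name, realUser);
        }
    }

    /** The proposed rule, evaluated per call: real user if present, else the current user. */
    static User selectActualUser(User current) {
        return current.realUser != null ? current.realUser : current;
    }

    public static void main(String[] args) {
        User service = User.remote("service");
        User proxied = User.proxy("alice", service);
        System.out.println(selectActualUser(service).name); // prints "service"
        System.out.println(selectActualUser(proxied).name); // prints "service"
    }
}
```

The sketch only covers the selection step. Whether the selected user actually holds usable credentials is a separate question that this model does not address.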