[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-10-23 Thread Xiaoyu Yao (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599835#comment-15599835 ]

Xiaoyu Yao edited comment on HDFS-10757 at 10/23/16 3:28 PM:
-

Thanks [~brahma] for reporting the issue. HADOOP-13748 is a test bug that was
surfaced by this change.
I've reverted this one from trunk, branch-2 and branch-2.8 and converted it to
Hadoop Common. I will recommit it after the unit test fix for HADOOP-13748 is
in.


was (Author: xyao):
Thanks [~brahma] for reporting the issue. HADOOP-13748 is a test bug that was 
surfaced with this change. 
Once we have the fix for HADOOP-13748 in, I will revert this one and convert it 
to hadoop common.

> KMSClientProvider combined with KeyProviderCache can result in wrong UGI 
> being used
> ---
>
> Key: HDFS-10757
> URL: https://issues.apache.org/jira/browse/HDFS-10757
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Xiaoyu Yao
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10757.00.patch, HDFS-10757.01.patch, 
> HDFS-10757.02.patch, HDFS-10757.03.patch
>
>
> ClientContext::get gets the context from CACHE via a name based on a config
> setting, then the KeyProviderCache stored in ClientContext gets the key
> provider cached by URI from the configuration, too. These would return the
> same KeyProvider regardless of the current UGI.
> KMSClientProvider caches the UGI (actualUgi) in its constructor; in
> particular, this means that all the users of DFS with KMSClientProvider in a
> process will get the KMS token (along with other credentials) of the first
> user, via the above cache.
> Either KMSClientProvider shouldn't store the UGI, or one of the caches should
> be UGI-aware, like the FS object cache.
> Side note: the comment in createConnection that purports to handle the
> different UGI doesn't seem to cover what it says it covers. In our case, we
> have two unrelated UGIs with no auth (createRemoteUser) with a bunch of
> tokens, including a KMS token, added.
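The description suggests making one of the caches UGI-aware, like the FS object cache (whose cache key includes the UGI). A minimal sketch of that idea, assuming a simplified model in which a user is just a String and a provider is any object built for a (URI, user) pair; `ProviderCacheKey` and `UgiAwareProviderCache` are hypothetical names, not Hadoop classes.

```java
import java.net.URI;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical cache key: the provider URI plus the user it was built for.
// A plain String stands in for a UGI in this simplified model.
final class ProviderCacheKey {
    final URI uri;
    final String user;

    ProviderCacheKey(URI uri, String user) {
        this.uri = Objects.requireNonNull(uri);
        this.user = Objects.requireNonNull(user);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ProviderCacheKey)) return false;
        ProviderCacheKey other = (ProviderCacheKey) o;
        return uri.equals(other.uri) && user.equals(other.user);
    }

    @Override
    public int hashCode() {
        return Objects.hash(uri, user);
    }
}

// Hypothetical UGI-aware cache: two users hitting the same KMS URI get
// separate provider instances, so neither sees the other's identity.
final class UgiAwareProviderCache<P> {
    private final ConcurrentHashMap<ProviderCacheKey, P> cache = new ConcurrentHashMap<>();
    private final BiFunction<URI, String, P> factory;

    UgiAwareProviderCache(BiFunction<URI, String, P> factory) {
        this.factory = factory;
    }

    P get(URI uri, String user) {
        // Build the provider once per (URI, user) pair and reuse it afterwards.
        return cache.computeIfAbsent(new ProviderCacheKey(uri, user),
                key -> factory.apply(key.uri, key.user));
    }

    int size() {
        return cache.size();
    }
}
```

The cost of this design is one provider (and its connections) per user rather than per URI, which is the same trade-off the FS object cache makes.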



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-21 Thread Arun Suresh (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429781#comment-15429781 ]

Arun Suresh edited comment on HDFS-10757 at 8/21/16 4:21 PM:
-

bq. Now when Server1 instantiates a client2 to make a call to Server2 it should
not use ugi1, because the authentication context in ugi1 is not relevant for
this call. In my opinion a new ugi2, which has the right credentials, should be
explicitly set up.
[~jnp], I agree with you. I also feel the ugi should perhaps be set up outside
of the KMSCSP. This would also simplify the code. We could either
# modify the APIs to include a ugi argument, with the caller ensuring that the
ugi has the credentials (this would probably be equivalent to documenting
somewhere that the client has to ensure that the keyprovider APIs are always
called inside a ugi.doAs()), or
# create a new {{KeyProviderExtension}} implementation that takes an existing
KeyProvider and a ugi, and invokes all of the keyprovider's APIs via the
provided ugi.doAs() context.
Option 2 might actually be easier to implement.

Thoughts?
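Option 2 can be sketched as a delegating wrapper. This is a minimal model, assuming a hypothetical `Identity` class whose `doAs` makes it the current identity for the duration of an action (a stand-in for `UserGroupInformation.doAs`), and a hypothetical one-method `SimpleKeyProvider` interface in place of the real `KeyProvider` API.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a UGI: doAs() makes this identity "current"
// on the calling thread for the duration of the action, then restores
// whatever identity was current before.
final class Identity {
    private static final ThreadLocal<Identity> CURRENT = new ThreadLocal<>();
    final String name;

    Identity(String name) {
        this.name = name;
    }

    static Identity current() {
        return CURRENT.get();
    }

    <T> T doAs(Supplier<T> action) {
        Identity previous = CURRENT.get();
        CURRENT.set(this);
        try {
            return action.get();
        } finally {
            CURRENT.set(previous);
        }
    }
}

// Hypothetical one-method provider interface (not Hadoop's KeyProvider).
interface SimpleKeyProvider {
    String getKey(String keyName);
}

// Wraps an existing provider and runs every call inside ugi.doAs(), so the
// caller decides the identity instead of the provider caching one.
final class DoAsKeyProvider implements SimpleKeyProvider {
    private final SimpleKeyProvider delegate;
    private final Identity ugi;

    DoAsKeyProvider(SimpleKeyProvider delegate, Identity ugi) {
        this.delegate = delegate;
        this.ugi = ugi;
    }

    @Override
    public String getKey(String keyName) {
        return ugi.doAs(() -> delegate.getKey(keyName));
    }
}
```

A real extension would need the same wrapping on every KeyProvider method; the sketch shows a single method to keep the pattern visible.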


was (Author: asuresh):
bq. Now when Server1 instantiates a client2 to make a call to Server2 it should
not use ugi1, because the authentication context in ugi1 is not relevant for
this call. In my opinion a new ugi2, which has the right credentials, should be
explicitly set up.
[~jnp], I agree with you. I also feel the ugi should perhaps be set up outside
of the KMSCSP. This would also simplify the code. We could either
# modify the APIs to include a ugi argument, with the caller ensuring that the
ugi has the credentials (this would probably be equivalent to documenting
somewhere that the client has to ensure that the keyprovider APIs are always
called inside a ugi.doAs()), or
# create a new {{KeyProviderExtension}} implementation that takes an existing
KeyProvider and a ugi, and invokes all of the keyprovider's APIs via the
provided ugi.doAs() context.
Option 2 might actually be easier to implement.



[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-17 Thread Xiaoyu Yao (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423519#comment-15423519 ]

Xiaoyu Yao edited comment on HDFS-10757 at 8/17/16 7:02 PM:


Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original
thread-leak issue named in the title of HDFS-7718 was fixed by HADOOP-11368
before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the
ClientContext. Maybe we should revisit the goal of KeyProviderCache, which
seems to be one of the sources of the problem. KeyProviderCache contains a map
keyed by KMS URI. When combined with KMSClientProvider, which caches the UGI
(actualUgi), the wrong context may be used, as in the example [~jnp] mentioned
above.

HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the
currentUGI contains a kms-dt, but only for a non-proxy currentUGI. Correct me if
I'm wrong: when the currentUGI is a new proxy user with a kms-dt, I don't think
we should use the stale actualUGI here.

In a recent change to KMSClientProvider (HADOOP-13155), the KeyProviderCache is
bypassed by creating a new instance of KMSClientProvider for each renew/cancel
operation.
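The createConnection() behaviour described above, and the alternative suggested for proxy users, can be modelled as two small selection functions. This is a simplified reading of the comment, not the actual KMSClientProvider code: `Caller` and `ConnectionUgiChooser` are hypothetical, a caller is just a name, an optional real user (non-null for a proxy user) and a set of token kinds, and the string "kms-dt" stands for a KMS delegation token.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a UGI: a name, an optional real user (non-null only
// for proxy users) and the kinds of tokens it holds.
final class Caller {
    final String name;
    final Caller realUser;
    final Set<String> tokenKinds;

    Caller(String name, Caller realUser, Set<String> tokenKinds) {
        this.name = name;
        this.realUser = realUser;
        this.tokenKinds = Collections.unmodifiableSet(new HashSet<>(tokenKinds));
    }

    boolean isProxy() {
        return realUser != null;
    }

    boolean hasKmsDt() {
        return tokenKinds.contains(ConnectionUgiChooser.KMS_DT);
    }
}

final class ConnectionUgiChooser {
    static final String KMS_DT = "kms-dt";

    // Behaviour as described in the comment: only a non-proxy current user
    // holding a kms-dt is used; otherwise the cached actualUgi is used.
    static Caller asDescribed(Caller current, Caller cachedActualUgi) {
        if (!current.isProxy() && current.hasKmsDt()) {
            return current;
        }
        return cachedActualUgi;
    }

    // Suggested alternative: a current user holding a kms-dt is used even
    // when it is a proxy user, instead of the possibly stale cached actualUgi.
    static Caller asSuggested(Caller current, Caller cachedActualUgi) {
        if (current.hasKmsDt()) {
            return current;
        }
        return cachedActualUgi;
    }
}
```

The two functions differ only for a proxy current user that holds its own kms-dt, which is exactly the case the comment raises.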


was (Author: xyao):
Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original
thread-leak issue named in the title of HDFS-7718 was fixed by HADOOP-11368
before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the
ClientContext. Maybe we should revisit the goal of KeyProviderCache, which
seems to be one of the sources of the problem. KeyProviderCache contains a map
keyed by KMS URI. When combined with KMSClientProvider, which caches the UGI
(actualUgi), the wrong context may be used, as in the example [~jnp] mentioned
above.

HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the
currentUGI contains a kms-dt, but only for a non-proxy currentUGI. Correct me if
I'm wrong: when the currentUGI is a new proxy user with a kms-dt, I don't think
we should use the stale actualUGI here.



[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-17 Thread Xiaoyu Yao (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423519#comment-15423519 ]

Xiaoyu Yao edited comment on HDFS-10757 at 8/17/16 6:49 PM:


Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original
thread-leak issue named in the title of HDFS-7718 was fixed by HADOOP-11368
before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the
ClientContext. Maybe we should revisit the goal of KeyProviderCache, which
seems to be one of the sources of the problem. KeyProviderCache contains a map
keyed by KMS URI. When combined with KMSClientProvider, which caches the UGI
(actualUgi), the wrong context may be used, as in the example [~jnp] mentioned
above.

HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the
currentUGI contains a kms-dt, but only for a non-proxy currentUGI. Correct me if
I'm wrong: when the currentUGI is a new proxy user with a kms-dt, I don't think
we should use the stale actualUGI here.


was (Author: xyao):
Thanks [~xiaochen], [~asuresh] and [~jnp] for the discussion. The original
thread-leak issue named in the title of HDFS-7718 was fixed by HADOOP-11368
before HDFS-7718 was resolved. HDFS-7718 introduced KeyProviderCache to the
ClientContext. Maybe we should revisit the goal of KeyProviderCache, which
seems to be one of the sources of the problem. KeyProviderCache contains a map
keyed by KMS URI. When combined with KMSClientProvider, which caches the UGI
(actualUgi), the wrong context may be used, as in the example [~jnp] mentioned
above.

HADOOP-13381 changed KMSClientProvider#createConnection() to check whether the
currentUGI contains a kms-dt, but only for a non-proxy currentUGI. Correct me if
I'm wrong: when the currentUGI is a new proxy user with a kms-dt, I don't think
we should use the stale actualUGI here. Also, a few KMS operations (such as
add, and renew/cancel delegation token from HADOOP-13155) don't go through
KMSClientProvider#createConnection() but use the cached actualUGI. They will
cause a similar issue when used with KeyProviderCache enabled.



[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-12 Thread Arun Suresh (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419234#comment-15419234 ]

Arun Suresh edited comment on HDFS-10757 at 8/12/16 6:09 PM:
-

Hmmm... I think I remember the context for why it was implemented this way.

bq. If the currentUgi is a proxy user it will have a real UGI.
currentUgi.getRealUser() should give us the actual ugi.
That is true, but the KMSCP was being implemented around the same time as
HADOOP-10835. That JIRA was meant to plumb proxy users through HTTP. If you look
at this
[snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267]
of code, you will notice that if the currentUser is authenticated via a
delegation token, the realUser is actually a dummy user created via
{{UserGroupInformation.createRemoteUser()}} that does not have any credentials
to create the connection, which, I guess, is why it was decided to create a
loginUgi/actualUgi in the KMSCP constructor.


was (Author: asuresh):
Hmmm... I think I remember the context for why it was implemented as such.

bq. If the currentUgi is a proxy user it will have a real UGI. 
currentUgi.getRealUser() should give us the actual ugi.
That is true, but the KMSCP was being implemented around the same time as 
HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look 
at this 
[snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267]
 of code, you will notice that if the currentUser is authenticated via a 
delegation token, the realUser is actually a dummy user created via {{ 
UserGroupInformation.createRemoteUser()}} and does not have any credentials to 
create the connection, which is why I guess it was decided to have a 
loginUgi/actualUgi created in the KMSCP constructor.



[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-12 Thread Arun Suresh (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418422#comment-15418422 ]

Arun Suresh edited comment on HDFS-10757 at 8/12/16 6:37 AM:
-

[~jnp],
> If the currentUGI has a realUgi, use the realUgi as actualUgi or use the 
> currentUgi as the actualUgi
If currentUgi is a proxy user, the latter case won't work... right?

As Xiao commented, the actualUgi was added to support proxy users. 



was (Author: asuresh):
[~jnp],
> If the currentUGI has a realUgi, use the realUgi as actualUgi or use the 
> currentUgi as the actualUgi
if currentUgi is a proxy user, the later case wont wont.

As Xiao commented, the actualUgi was added to support proxy users. 




[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-12 Thread Jitendra Nath Pandey (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418409#comment-15418409 ]

Jitendra Nath Pandey edited comment on HDFS-10757 at 8/12/16 6:30 AM:
--

I think storing the {{actualUgi}} in KMSClientProvider is incorrect because the
providers are cached for a long time, and the currentUGI may be completely
different from the actualUGI. Therefore, it may be a good idea to consider
removing actualUgi from KMSClientProvider. I am inclined to say that setting up
the UGI should be done by the client code using the FileSystem. On every call,
KMSClientProvider should only check the following: if the currentUGI has a
realUgi, use the realUgi as the actualUgi; otherwise use the currentUgi as the
actualUgi.
I may not have the whole context on why actualUgi was added in the constructor
of KMSClientProvider, but would like to understand.
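The per-call rule proposed above can be contrasted with caching the identity in the constructor. A minimal sketch under a simplified model: `SimpleUgi`, `CtorCachedProvider` and `PerCallProvider` are hypothetical names, and a `Supplier` stands in for looking up the current user (as `UserGroupInformation.getCurrentUser()` would).

```java
import java.util.function.Supplier;

// Hypothetical model of a UGI: realUser is non-null only for proxy users.
final class SimpleUgi {
    final String name;
    final SimpleUgi realUser;

    SimpleUgi(String name, SimpleUgi realUser) {
        this.name = name;
        this.realUser = realUser;
    }

    // The rule proposed above: the real user if there is one, else this user.
    SimpleUgi actualUgi() {
        return realUser != null ? realUser : this;
    }
}

// Resolves the identity once, at construction time, and keeps it for the
// provider's whole lifetime (the pattern the comment argues against).
final class CtorCachedProvider {
    private final SimpleUgi actualUgi;

    CtorCachedProvider(Supplier<SimpleUgi> currentUser) {
        this.actualUgi = currentUser.get().actualUgi();
    }

    String connectAs() {
        return actualUgi.name;
    }
}

// Resolves the identity on every call, so a long-lived cached provider
// follows whichever user is current at the time of the call.
final class PerCallProvider {
    private final Supplier<SimpleUgi> currentUser;

    PerCallProvider(Supplier<SimpleUgi> currentUser) {
        this.currentUser = currentUser;
    }

    String connectAs() {
        return currentUser.get().actualUgi().name;
    }
}
```

This sketch does not address the concern raised elsewhere in this thread: when the caller was authenticated with a delegation token, the real user may be a credential-less user created by UserGroupInformation.createRemoteUser(), so connecting as that user may not work.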


was (Author: jnp):
  I think storing the {{actualUgi}} in KMSClientProvider is incorrect because 
the providers are cached for a long time, and the currentUGI may be completely 
different from the actualUGI.  Therefore, it may be a good idea to consider 
removing actualUgi from KMSClientProvider. I am inclined to say that setting up 
of the UGI should be done by client code using the FileSystem. The 
KMSClientProvider on every call should only check following: If the currentUGI 
has a realUgi, us the realUgi as actualUgi or use the currentUgi as the 
actualUgi. 
  I may not have the whole context on why actualUgi was added in the 
constructor of KMSClientProvider, but would like to understand.
