[jira] [Comment Edited] (HDFS-12357) Let NameNode to bypass external attribute provider for special user

2017-09-06 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155925#comment-16155925
 ] 

Manoj Govindassamy edited comment on HDFS-12357 at 9/6/17 8:13 PM:
---

Thanks for working on the patch revision, [~yzhangal]. Overall it looks good
to me. +1, with a few nits.

1. {{FSDirectory.java#initUsersToBypassExtProvider}}
{noformat}
List<String> bpUserList = new ArrayList<String>();
for (int i = 0; i < bypassUsers.length; i++) {
  String tmp = bypassUsers[i].trim();
  if (!tmp.isEmpty()) {
    bpUserList.add(tmp);
  }
}
if (bpUserList.size() > 0) {
  usersToBypassExtAttrProvider = new HashSet<String>();
  for (String user : bpUserList) {
    LOG.info("Add user " + user + " to the list that will bypass external"
        + " attribute provider.");
    usersToBypassExtAttrProvider.add(user.trim());
  }
}
{noformat}

The two {{for}} loops above can be simplified into one: trimming each entry,
checking for empty strings, and adding to _usersToBypassExtAttrProvider_ can
all be done in the same block.
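For example, something along these lines (just a sketch; it assumes we keep
lazily creating the set, as the current patch does):
{code}
for (String user : bypassUsers) {
  String trimmed = user.trim();
  if (!trimmed.isEmpty()) {
    if (usersToBypassExtAttrProvider == null) {
      usersToBypassExtAttrProvider = new HashSet<String>();
    }
    LOG.info("Add user " + trimmed
        + " to the list that will bypass external attribute provider.");
    usersToBypassExtAttrProvider.add(trimmed);
  }
}
{code}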

2. {{TestINodeAttributeProvider}}
{noformat}
String[] bypassUsers = {"u2", "u3"};
{noformat}
* Can we please add "u4" to this list as well? Since that user is not in the
bypass list, its getFileStatus result will differ from the other users', and
adding a non-bypass user makes the test complete.
* Also, it would be great if we could verify the base HDFS permission
directly instead of assuming it matches the CALLED map contents. A rough
sketch of both suggestions follows this list.
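Roughly what I have in mind (a sketch only; the path and the
{{PROVIDER_OWNER}}/{{HDFS_OWNER}} constants are stand-ins for whatever the
existing test harness actually expects):
{code}
String[] users = {"u2", "u3", "u4"}; // "u4" is NOT in the bypass list
for (final String user : users) {
  UserGroupInformation ugi =
      UserGroupInformation.createUserForTesting(user, new String[]{"g1"});
  ugi.doAs(new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws Exception {
      FileSystem fs = FileSystem.get(miniDFS.getConfiguration(0));
      FileStatus status = fs.getFileStatus(new Path("/tmp/foo"));
      if (user.equals("u4")) {
        // Non-bypass user: attributes still come from the provider.
        Assert.assertEquals(PROVIDER_OWNER, status.getOwner());
      } else {
        // Bypass users: verify the base HDFS owner directly instead of
        // inferring it from the CALLED map.
        Assert.assertEquals(HDFS_OWNER, status.getOwner());
      }
      return null;
    }
  });
}
{code}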


> Let NameNode to bypass external attribute provider for special user
> ---
>
> Key: HDFS-12357
> URL: https://issues.apache.org/jira/browse/HDFS-12357
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-12357.001a.patch, HDFS-12357.001b.patch, 
> HDFS-12357.001.patch, HDFS-12357.002.patch, HDFS-12357.003.patch, 
> HDFS-12357.004.patch, HDFS-12357.005.patch, HDFS-12357.006.patch
>
>
> This is a third proposal to solve the problem described in HDFS-12202.
> The problem is, when we do distcp from one cluster to another (or within the 
> same cluster), in addition to copying file data, we copy the metadata from 
> source to target. If an external attribute provider is enabled, the
> metadata may be read from the provider, and thus provider data read from
> the source may be saved to the target HDFS.
> We want to avoid saving metadata from external provider to HDFS, so we want 
> to bypass external provider when doing the distcp (or hadoop fs -cp) 
> operation.
> Two alternative approaches were proposed earlier, one in HDFS-12202, the 
> other in HDFS-12294. The proposal here is the third one.
> The idea is, we introduce a new config that specifies a special user (or a
> list of users), and let the NN bypass the external provider when the
> current user is one of the special users.
> If applications that need data from the external attribute provider run as
> the special user, they won't work. So the constraint of this approach is
> that the special users must not run applications that need data from the
> external provider.
> Thanks [~asuresh] for proposing this idea and [~chris.douglas], [~daryn], 
> [~manojg] for the discussions in the other jiras. 
> I'm creating this one to discuss further.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HDFS-12357) Let NameNode to bypass external attribute provider for special user

2017-09-01 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151047#comment-16151047
 ] 

Yongjun Zhang edited comment on HDFS-12357 at 9/1/17 9:36 PM:
--

Thanks [~chris.douglas] and [~manojg].

Sorry for the lengthy reply:

{quote}
Would a filter implementation wrapping the configured, external attribute 
provider suffice?
{quote}
The current patch implements this logic inline (like an inlined version of a
wrapper class in the C++ world). If we move this logic into a wrapper class,
I can see some issues:

1. The wrapper needs to create two provider objects, one for the default
(HDFS) attributes and one for the external provider, and to switch between
the two. However, in the existing code the default provider object is not
always created. See 2.a below.

The default value of the following config is empty, which means no default 
provider will be created.
{code}
<property>
  <name>dfs.namenode.inode.attributes.provider.class</name>
  <value></value>
  <description>
    Name of class to use for delegating HDFS authorization.
  </description>
</property>
{code}
Not sure whether we should have the default provider configured here.
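(For context, the bypass-user list this patch introduces would be a sibling
property along these lines; the key name and values below are illustrative
only, see the patch for the actual key:)
{code}
<property>
  <name>dfs.namenode.inode.attributes.provider.bypass.users</name>
  <value>hdfs,distcpUser</value>
  <description>
    Comma-separated list of users for whom the external attribute
    provider is bypassed. (Illustrative name and values.)
  </description>
</property>
{code}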

2. Currently there are two places that decide whether to consult the
external attribute provider.
2.a.
{code}
  INodeAttributes getAttributes(INodesInPath iip)
  throws FileNotFoundException {
INode node = FSDirectory.resolveLastINode(iip);
int snapshot = iip.getPathSnapshotId();
INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
if (attributeProvider != null) {
  // permission checking sends the full components array including the
  // first empty component for the root.  however file status
  // related calls are expected to strip out the root component according
  // to TestINodeAttributeProvider.
  byte[][] components = iip.getPathComponents();
  components = Arrays.copyOfRange(components, 1, components.length);
  nodeAttrs = attributeProvider.getAttributes(components, nodeAttrs);
}
return nodeAttrs;
  }
{code}
Here we have already fetched the attributes from HDFS, and then decide
whether to overwrite them with the provider's data. The easiest approach is
to check whether the current user is a special user, and if so, not ask the
provider for data at all. If we do this in a wrapper class, we always have
to fetch some attributes, which may or may not come from HDFS. That is not a
clean implementation and may incur runtime cost.
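Concretely, the inline check amounts to something like this (a sketch;
{{isUserBypassingExtAttrProvider()}} stands in for the patch's helper, which
would check NameNode.getRemoteUser() against the configured list):
{code}
INodeAttributes getAttributes(INodesInPath iip)
    throws FileNotFoundException {
  INode node = FSDirectory.resolveLastINode(iip);
  int snapshot = iip.getPathSnapshotId();
  INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
  // For bypass users, skip the provider entirely and return the plain
  // HDFS attributes unchanged.
  if (attributeProvider != null && !isUserBypassingExtAttrProvider()) {
    byte[][] components = iip.getPathComponents();
    components = Arrays.copyOfRange(components, 1, components.length);
    nodeAttrs = attributeProvider.getAttributes(components, nodeAttrs);
  }
  return nodeAttrs;
}
{code}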

2.b
{code}
@VisibleForTesting
FSPermissionChecker getPermissionChecker(String fsOwner, String superGroup,
    UserGroupInformation ugi) throws AccessControlException {
  return new FSPermissionChecker(
      fsOwner, superGroup, ugi, attributeProvider);
}
{code}
Here we need to pass either null or the configured external
attributeProvider to the permission checker. If we move this logic into the
wrapper provider, we need an API on the wrapper class that returns either
the external provider or null, and pass that as the "attributeProvider"
parameter in the code above, like:
{code}
return new FSPermissionChecker(
fsOwner, superGroup, ugi, attributeProvider.getRealAttributeProvider());
{code}
We would need to add this getRealAttributeProvider() API to the base
provider class, which is a bit weird because the API is only meaningful in
the wrapper layer. And changing the provider API is what we are trying to
avoid here. See the sketch below.
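To make that concrete, the wrapper would have to look roughly like this
(hypothetical sketch; {{isBypassUser()}} would check NameNode.getRemoteUser()
against the configured list):
{code}
class BypassingAttributeProvider extends INodeAttributeProvider {
  private final INodeAttributeProvider external;

  BypassingAttributeProvider(INodeAttributeProvider external) {
    this.external = external;
  }

  @Override
  public void start() { external.start(); }

  @Override
  public void stop() { external.stop(); }

  @Override
  public INodeAttributes getAttributes(String[] pathElements,
      INodeAttributes defaultAttrs) {
    // Bypass users simply get the plain HDFS attributes back.
    return isBypassUser()
        ? defaultAttrs : external.getAttributes(pathElements, defaultAttrs);
  }

  // The awkward part: getPermissionChecker() still needs the raw external
  // provider (or null), so this method would have to leak into the base
  // class even though it is only meaningful for the wrapper.
  INodeAttributeProvider getRealAttributeProvider() {
    return isBypassUser() ? null : external;
  }

  private boolean isBypassUser() {
    // Hypothetical: compare NameNode.getRemoteUser() against the
    // configured bypass-user list.
    return false;
  }
}
{code}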

Thoughts?

Thanks.



[jira] [Comment Edited] (HDFS-12357) Let NameNode to bypass external attribute provider for special user

2017-08-30 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148139#comment-16148139
 ] 

Yongjun Zhang edited comment on HDFS-12357 at 8/31/17 1:14 AM:
---

Thank you all for the review and comments!

[~atm]: good point, will remove the static. Thanks.

[~daryn], thanks for your comments. Some thoughts:
1. Deciding which attributes to reveal based on the user/path combination is
indeed more fine-grained. However, it adds complexity, and every provider
would have to supply an implementation. Could you give an example where we
would want to decide based on the user/path combination?
2. Currently I use NameNode.getRemoteUser() to tell which user it is. If we
push this bypass logic into the provider, the provider needs to know who the
current user is, so we would either have to change the provider API or add
new methods in parallel to pass the user information; see the sketch below.
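For example (hypothetical signatures; this is exactly the kind of provider
API change we would like to avoid):
{code}
// Today's provider API:
public abstract INodeAttributes getAttributes(
    String[] pathElements, INodeAttributes inode);

// What the provider would need in order to bypass per user:
public abstract INodeAttributes getAttributes(
    String[] pathElements, INodeAttributes inode, String remoteUser);
{code}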

[~manojg], regarding making SnapshotDiff bypass the provider: the caller
would need to tell the provider to do that, so a new API would be needed,
right? Thanks.

Looking forward to your further thoughts and comments!

Thanks a lot.



