Artur Myseliuk created HDFS-17114:
-------------------------------------

             Summary: HDFS Directory Level Access
                 Key: HDFS-17114
                 URL: https://issues.apache.org/jira/browse/HDFS-17114
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs
            Reporter: Artur Myseliuk


Problem: Currently, checking and setting ACLs at the file level is time-consuming 
and API-intensive for large HDFS clusters with billions of files, particularly 
for use cases where permissions and ACLs should be uniform across all files 
nested within a directory. For example, Hive table files and directories should 
have the same permissions and ACLs. 

A solution like default ACLs doesn’t work in two cases:
 # A user moves or renames a directory with nested files. The moved directory 
and its files don’t inherit the default ACLs of the new location.
 # A user wants to change access to all files under some path prefix. The user 
then needs to update permissions and ACLs for every file under the prefix, 
which takes hours or even days when there are millions of files.

 

Proposed solution: 

Use the POSIX permissions and ACLs of an ancestor directory to check access to 
files. When a user tries to access the file “/a/b/c.txt”, the new model uses 
the ACLs and permissions of the closest ancestor directory “/a/b” to check 
access to “c.txt”. If the user doesn’t have access to the directory, there are 
2 options:
 # Fall back to the standard HDFS POSIX permission and ACL check on the file 
itself. The user then has access to the file when: [the user has access to the 
ancestor directory] OR [the user has access to the file].
 # Throw an AccessControlException.
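The two options above can be sketched as follows. This is a minimal, simplified model (plain dicts stand in for real POSIX permission and ACL evaluation, and the helper names are hypothetical); actual HDFS integration would live inside the NameNode's permission checker.

```python
import posixpath

def check_access(path, user, dir_acls, file_acls, fallback_to_file=True):
    """Return True if `user` may access `path` under the proposed model.

    dir_acls / file_acls are plain dicts mapping path -> set of users,
    standing in for real POSIX permission + ACL evaluation.
    """
    parent = posixpath.dirname(path)          # closest ancestor, e.g. /a/b
    if user in dir_acls.get(parent, set()):   # directory grants access
        return True
    if fallback_to_file:                      # option 1: OR with the file-level check
        return user in file_acls.get(path, set())
    return False                              # option 2: deny (AccessControlException)

dir_acls = {"/a/b": {"alice"}}
file_acls = {"/a/b/c.txt": {"bob"}}
print(check_access("/a/b/c.txt", "alice", dir_acls, file_acls))  # True: directory grants it
print(check_access("/a/b/c.txt", "bob", dir_acls, file_acls))    # True: file-level fallback
print(check_access("/a/b/c.txt", "bob", dir_acls, file_acls,
                   fallback_to_file=False))                      # False: option 2
```

The `fallback_to_file` flag captures the choice between the two options: option 1 is a logical OR of the directory and file checks, option 2 makes the ancestor directory the sole authority.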

 

The feature can be enabled via configuration, either for selected path prefixes 
or for all files in the HDFS cluster.
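A configuration for this might look like the fragment below in hdfs-site.xml. The property names here are purely illustrative assumptions; the actual keys would be defined when the feature is implemented.

```xml
<!-- Hypothetical property names for illustration only -->
<property>
  <name>dfs.namenode.directory-level-access.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Empty value could mean "apply cluster-wide" -->
  <name>dfs.namenode.directory-level-access.prefixes</name>
  <value>/warehouse/hive,/data/shared</value>
</property>
```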

 

Alternative solutions:
 # Use a federated authorization model for HDFS path prefixes. Implementation: 
[Apache Ranger|https://ranger.apache.org/index.html] and [Apache 
Sentry|https://sentry.apache.org/] provide an AuthZ plugin to check access to 
files. The check is implemented by matching the file path to a managed resource 
with a path prefix; all files under the prefix use the resource policy managed 
by the framework. The plugin defaults to HDFS permissions and ACLs if there is 
no matching prefix. Cons:
 ## Requires setting up an external service to manage policies.
 ## Adding an external dependency impacts HDFS NameNode availability. 
 # Similar to the Sentry and Ranger solution, but use native HDFS directory 
permissions and ACLs instead of federated policies. The problem is finding 
which directory's permissions/ACLs to check for the requested file. There are 2 
options:
 ## Maintain a list of prefixes, as the Ranger plugin does, and use the 
permissions and ACLs of the prefix directory to check access to all nested 
files and directories.
 ## Use a flag on the directory ([HDFS-15638]), for example set via HDFS 
Extended Attributes. When a user tries to access a file, HDFS traverses the 
ancestors and checks whether any directory carries the flag. If a flagged 
directory exists, use that directory's permissions; otherwise default to the 
file's permissions.
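The flag-based lookup can be sketched as an ancestor walk. The xattr name below is a hypothetical placeholder, and a dict stands in for the real xattr store; actual code would walk INodes inside the NameNode.

```python
import posixpath

FLAG_XATTR = "user.dir-level-access"  # hypothetical xattr name, for illustration

def find_flagged_ancestor(path, xattrs):
    """Walk the ancestors of `path` toward the root; return the closest
    directory carrying the flag xattr, or None if no ancestor is flagged.
    `xattrs` maps directory path -> set of xattr names set on it."""
    cur = posixpath.dirname(path)
    while True:
        if FLAG_XATTR in xattrs.get(cur, set()):
            return cur                       # use this directory's permissions
        if cur == "/":
            return None                      # no flag: default to file permissions
        cur = posixpath.dirname(cur)

xattrs = {"/warehouse/hive": {FLAG_XATTR}}
print(find_flagged_ancestor("/warehouse/hive/t1/part-0", xattrs))  # /warehouse/hive
print(find_flagged_ancestor("/tmp/x", xattrs))                     # None
```

Returning the closest flagged ancestor (rather than the first found from the root) keeps the semantics local: the nearest flagged directory governs its subtree.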



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
