Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20221 )

Change subject: IMPALA-12291 impala checks hdfs ranger policy
......................................................................


Patch Set 9:

> Patch Set 9:
>
> Hi all, based on our discussion so far, there are 2 approaches discussed. Any 
> any better alternative is also appreciated!
> - Set the access level of a loaded table to both READ and WRITE as long as 
> Ranger is used as the authorization provider.
> - Introduce a startup flag to allow the administrator to decide whether to 
> skip the file system permissions check during table loading.
>
> There are some implementation details to consider for each approach.
> - How to determine whether Ranger is enabled (corresponding to the 1st 
> approach)? It seems checking the value of the key 
> 'dfs.namenode.inode.attributes.provider.class' in core-site.xml via the 
> instance of Configuration as done in the patch set 9 could not be verified 
> easily via a new test due to HDFS Ranger plug-in not being configured 
> correctly. To be more specific, if we try to add the following configuration 
> via  
> https://github.com/apache/impala/blob/master/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py,
>  the name node of HDFS could not be started. More plumbing has to be done to 
> set up the plug-in. On the other hand, using the instance of BackendConfig 
> allows us to add an end-to-end test to briefly verify Impala's behavior after 
> the patch more easily.
>
> <property>
>  <name>dfs.namenode.inode.attributes.provider.class</name>
>  <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
> </property>
>
> - What should be the default behavior with respect to setting the access 
> level of a loaded table after this patch (corresponding to the 2nd approach)? 
> It looks like making Impala assume the READ and WRITE access by default may 
> be better especially in the legacy catalog mode. This relieves the Impala 
> administrator of having to manually tweak the HDFS access control entries of 
> the target HDFS paths that are not writable to the Impala service every time 
> when an end user encounters such a problem.
>
> I have also collected the related tests that need to be revised if we decide 
> to adopt the 2nd approach and make Impala assume the READ and WRITE access by 
> default.
> - 
> https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_permissions.py#L56
>  (TestHdfsPermissions.test_insert_into_read_only_table()).
> - 
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L563
>  (TestInsertBehaviour.test_multiple_group_acls).
> - 
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L331
>  (TestInsertBehaviour.test_readonly_table_dir).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L362
>  (TestInsertBehaviour.test_insert_acl_permissions).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L439
>  (TestInsertBehaviour.test_load_permissions).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L252
>  (TestInsertBehaviour.test_mixed_partition_permissions).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L202
>  (TestInsertBehaviour.test_insert_file_permissions).
> -  
> https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4008
>  (AnalyzeStmtsTest.TestLoadData).
> -  
> https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4042-L4047
>  (AnalyzeStmtsTest.TestInsert).
> - 
> https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java#L110-L119
>  (CatalogObjectToFromThriftTest.TestPartitionedTable).

The first option sounds good to me:
- Set the access level of a loaded table to both READ and WRITE as long as 
Ranger is used as the authorization provider.

The second option of introducing a startup flag would create confusion because 
it would only apply to the legacy catalog mode, not the local catalog mode. 
Further, for the cloud object stores, we are in any case doing the first 
approach.

The downside of the first approach is that it will defer the full access level 
checks to execution time and suppose some partitions allow the write operation 
while other partitions don't, we can end up with partial writes into the table. 
 However, this issue already exists with the local catalog mode (which is the 
default mode).


--
To view, visit http://gerrit.cloudera.org:8080/20221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9
Gerrit-Change-Number: 20221
Gerrit-PatchSet: 9
Gerrit-Owner: Halim Kim <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Halim Kim <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Fri, 05 Apr 2024 23:37:30 +0000
Gerrit-HasComments: No

Reply via email to