Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/20221 )
Change subject: IMPALA-12291 impala checks hdfs ranger policy
......................................................................

Patch Set 9:

(2 comments)

Thanks for the help, Halim! I left comments in HdfsTable.java and Frontend.java. Let us know what you think about them.

http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@852
PS5, Line 852:       location = location.getParent();
> Thank you for your detailed review.
Thanks for the suggestion, Quanlong! I think skipping the permissions check for now is not a bad idea. Specifically, we could change the if-statement at https://gerrit.cloudera.org/c/20221/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#846 to something like the following, as Halim suggested above.

Note that instead of introducing a new variable denoting whether HDFS' Ranger plug-in is enabled, I just use the BackendConfig instance to check whether the authorization provider is Ranger. Thus, when Impala's Ranger plug-in is not enabled and the table is not stored on one of the listed cloud file systems, the permissions check would still be performed. So if there is a permission-related issue during execution by Impala's backend, we will get a runtime error after this patch, as Quanlong mentioned, but by skipping the permissions check we avoid a performance regression, especially when a table has a large number of partitions.

  if (assumeReadWriteAccess(fs)
      || BackendConfig.INSTANCE.getAuthorizationProvider().equalsIgnoreCase("ranger")) {
    return TAccessLevel.READ_WRITE;
  }

I would also suggest we add the following end-to-end test in https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py to verify the Impala frontend's behavior after this patch. I verified that it works with the change suggested above.

  @pytest.mark.execute_serially
  @SkipIfFS.hdfs_acls
  @CustomClusterTestSuite.with_args(
      impalad_args=IMPALAD_ARGS, catalogd_args=CATALOGD_ARGS)
  def test_insert_with_catalog_v1(self, unique_name):
    """
    Test that when Ranger is the authorization provider in the legacy catalog mode,
    Impala does not throw an AnalysisException when an authorized user tries to
    execute an INSERT query against a table which is not writable according to
    Impala's FsPermissionChecker.
    """
    user = getuser()
    admin_client = self.create_impala_client()
    unique_database = unique_name + "_db"
    unique_table = unique_name + "_tbl"
    table_path = "test-warehouse/{0}.db/{1}".format(unique_database, unique_table)
    try:
      admin_client.execute("drop database if exists {0} cascade"
                           .format(unique_database), user=ADMIN)
      admin_client.execute("create database {0}".format(unique_database), user=ADMIN)
      admin_client.execute("create table {0}.{1} (x int)"
                           .format(unique_database, unique_table), user=ADMIN)
      admin_client.execute("grant insert on table {0}.{1} to user {2}"
                           .format(unique_database, unique_table, user))
      # Change the owner user and group of the HDFS path corresponding to the table
      # so that, according to Impala's FsPermissionChecker, the table is not writable
      # to the user that loads the table. This user is usually the one representing
      # the Impala service.
      self.hdfs_client.chown(table_path, "another_user", "another_group")
      # Invalidate the table metadata to force the catalog server to reload the HDFS
      # table.
      admin_client.execute("invalidate metadata {0}.{1}"
                           .format(unique_database, unique_table), user=ADMIN)
      # Verify that the user granted the INSERT privilege on the table does not hit
      # the AnalysisException that would have been thrown if we had not skipped the
      # permissions check in HdfsTable#getAvailableAccessLevel().
      self._run_query_as_user("insert into {0}.{1} values (1)"
                              .format(unique_database, unique_table), user, True)
    finally:
      admin_client.execute("revoke insert on table {0}.{1} from user {2}"
                           .format(unique_database, unique_table, user))
      admin_client.execute("drop database if exists {0} cascade"
                           .format(unique_database), user=ADMIN)

http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2109
PS5, Line 2109:       LOG.warn("Analysis Exception query {}: {}",
              :           queryCtx.client_request.stmt, errorMsg);
I'm not sure if I missed something. Since we rethrow the exception that is caught, is it necessary to log or even catch the exception here?
admin_client.execute("invalidate metadata {0}.{1}" .format(unique_database, unique_table), user=ADMIN) # Verify that the user granted the INSERT privilege on the table does not encounter # the AnalysisException that would have been thrown if we had not skipped the # permissions checking in HdfsTable#getAvailableAccessLevel(). self._run_query_as_user("insert into {0}.{1} values (1)" .format(unique_database, unique_table), user, True) finally: admin_client.execute("revoke insert on table {0}.{1} from user {2}" .format(unique_database, unique_table, user)) admin_client.execute("drop database if exists {0} cascade" .format(unique_database), user=ADMIN) http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2109 PS5, Line 2109: LOG.warn("Analysis Exception query {}: {}", : queryCtx.client_request.stmt, errorMsg); Not very sure if I missed something. Since we will throw the exception that is caught, is it necessary to log or even catch the exception here? -- To view, visit http://gerrit.cloudera.org:8080/20221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9 Gerrit-Change-Number: 20221 Gerrit-PatchSet: 9 Gerrit-Owner: Halim Kim <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Wed, 03 Apr 2024 00:46:18 +0000 Gerrit-HasComments: Yes
