Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/20221 )

Change subject: IMPALA-12291 impala checks hdfs ranger policy
......................................................................


Patch Set 9:

(2 comments)

Thanks for the help, Halim! I left some comments in HdfsTable.java. Please let 
us know what you think about them.

http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@852
PS5, Line 852:         location = location.getParent();
> Thank you for your detailed review.
Thanks for the suggestion, Quanlong!

I think skipping the permission check for now is not a bad idea. Specifically, 
we could change the if-statement at 
https://gerrit.cloudera.org/c/20221/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#846
to something like the following, as Halim suggested above.

Note that instead of introducing a new variable denoting whether HDFS' Ranger 
plug-in is enabled, I just use the BackendConfig instance to check whether the 
authorization provider is Ranger. Thus, when Impala's Ranger plug-in is not 
enabled and the table is not stored on one of the listed cloud file systems, 
the permission check still occurs.

So if there is any permission-related issue during execution by Impala's 
backend, we will get a runtime error after this patch, as Quanlong mentioned. 
However, by skipping the permission check we avoid a performance regression, 
especially when a table has a large number of partitions.

    if (assumeReadWriteAccess(fs) ||
        BackendConfig.INSTANCE.getAuthorizationProvider().equalsIgnoreCase("ranger")) {
      return TAccessLevel.READ_WRITE;
    }

I would also suggest adding the following end-to-end test to 
https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py 
to verify the Impala frontend's behavior after this patch. I verified that the 
test works with the change suggested above.

  @pytest.mark.execute_serially
  @SkipIfFS.hdfs_acls
  @CustomClusterTestSuite.with_args(
    impalad_args=IMPALAD_ARGS, catalogd_args=CATALOGD_ARGS)
  def test_insert_with_catalog_v1(self, unique_name):
    """
    Test that when Ranger is the authorization provider in the legacy catalog
    mode, Impala does not throw an AnalysisException when an authorized user
    tries to execute an INSERT query against a table which is not writable
    according to Impala's FsPermissionChecker.
    """
    user = getuser()
    admin_client = self.create_impala_client()
    unique_database = unique_name + "_db"
    unique_table = unique_name + "_tbl"
    table_path = "test-warehouse/{0}.db/{1}".format(unique_database, unique_table)
    try:
      admin_client.execute("drop database if exists {0} cascade"
          .format(unique_database), user=ADMIN)
      admin_client.execute("create database {0}".format(unique_database),
          user=ADMIN)
      admin_client.execute("create table {0}.{1} (x int)"
          .format(unique_database, unique_table), user=ADMIN)
      admin_client.execute("grant insert on table {0}.{1} to user {2}"
                           .format(unique_database, unique_table, user))

      # Change the owner user and group of the HDFS path corresponding to the
      # table so that, according to Impala's FsPermissionChecker, the table is
      # not writable to the user that loads the table. This user is usually the
      # one representing the Impala service.
      self.hdfs_client.chown(table_path, "another_user", "another_group")

      # Invalidate the table metadata to force the catalog server to reload the
      # HDFS table.
      admin_client.execute("invalidate metadata {0}.{1}"
          .format(unique_database, unique_table), user=ADMIN)

      # Verify that the user granted the INSERT privilege on the table does not
      # encounter the AnalysisException that would have been thrown if we had
      # not skipped the permission check in HdfsTable#getAvailableAccessLevel().
      self._run_query_as_user("insert into {0}.{1} values (1)"
          .format(unique_database, unique_table), user, True)
    finally:
      admin_client.execute("revoke insert on table {0}.{1} from user {2}"
          .format(unique_database, unique_table, user))
      admin_client.execute("drop database if exists {0} cascade"
          .format(unique_database), user=ADMIN)


http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/20221/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2109
PS5, Line 2109:           LOG.warn("Analysis Exception query {}: {}",
              :               queryCtx.client_request.stmt, errorMsg);
I am not sure if I missed something. Since we re-throw the exception that is 
caught, is it necessary to log, or even catch, the exception here?
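
To illustrate the question, here is a minimal Python sketch (hypothetical 
names, not Impala code): catching an exception only to log it and re-throw it 
unchanged leaves the caller's behavior identical to not catching it at all, 
while adding a log entry that the ultimate handler will likely duplicate.

```python
import logging

logging.basicConfig(level=logging.WARNING)
LOG = logging.getLogger("frontend")


def analyze(stmt):
    # Stand-in for the analysis step that raises the exception.
    raise ValueError("analysis failed for: " + stmt)


def catch_log_rethrow(stmt):
    # Pattern under discussion: catch, log, then re-throw unchanged.
    try:
        analyze(stmt)
    except ValueError as e:
        LOG.warning("Analysis Exception query %s: %s", stmt, e)
        raise


def let_it_propagate(stmt):
    # Alternative: no try/except; the caller sees the same exception.
    analyze(stmt)
```

Both functions raise the same exception to the caller; the only difference is 
the extra log line in the first one.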



--
To view, visit http://gerrit.cloudera.org:8080/20221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9
Gerrit-Change-Number: 20221
Gerrit-PatchSet: 9
Gerrit-Owner: Halim Kim <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 03 Apr 2024 00:46:18 +0000
Gerrit-HasComments: Yes
