Fang-Yu Rao created IMPALA-11871:
------------------------------------

             Summary: INSERT statement does not respect Ranger policies for HDFS
                 Key: IMPALA-11871
                 URL: https://issues.apache.org/jira/browse/IMPALA-11871
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Fang-Yu Rao
            Assignee: Fang-Yu Rao


In a cluster with Ranger authorization enabled (and in legacy catalog mode), even 
if you grant RWX on cm_hdfs -> all-path to the user impala, inserting into a table 
whose HDFS POSIX permissions happen to exclude impala fails with:
{noformat}
"AnalysisException: Unable to INSERT into target table (default.t1) because 
Impala does not have WRITE access to HDFS location: 
hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
The ACLs on the table directory are:
{noformat}
[root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl /warehouse/tablespace/external/hive/t1

file: /warehouse/tablespace/external/hive/t1 
owner: hive 
group: supergroup
user::rwx
user:impala:rwx #effective:r-x
group::rwx #effective:r-x
mask::r-x
other::---
default:user::rwx
default:user:impala:rwx
default:group::rwx
default:mask::rwx
default:other::--- {noformat}
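For reference, the #effective column above follows from the ACL mask: named-user and group entries are intersected with mask::r-x, so user:impala:rwx is reduced to r-x and a POSIX-level check sees no write bit for impala. A minimal Java sketch of that rule follows; the class and method names are illustrative only and are not HDFS or Impala APIs.

```java
// Sketch only: models how an ACL mask limits a named-user or group entry.
// AclMaskSketch, effective() and canWrite() are hypothetical names.
final class AclMaskSketch {
    /** Keeps a permission bit only if both the ACL entry and the mask grant it. */
    static String effective(String entry, String mask) {
        StringBuilder sb = new StringBuilder(3);
        for (int i = 0; i < 3; i++) {
            char e = entry.charAt(i);
            sb.append(e != '-' && mask.charAt(i) != '-' ? e : '-');
        }
        return sb.toString();
    }

    /** True if the effective permission string carries the write bit. */
    static boolean canWrite(String effective) {
        return effective.charAt(1) == 'w';
    }

    public static void main(String[] args) {
        // Values taken from the getfacl output: user:impala:rwx with mask::r-x.
        String eff = effective("rwx", "r-x");
        System.out.println("user:impala effective=" + eff + " write=" + canWrite(eff));
    }
}
```

With the values from the listing, the effective permission is r-x, which matches the #effective annotation printed by getfacl.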
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ANALYSIS

Stack trace from a version of Cloudera's distribution of Impala (impalad 
version 3.4.0-SNAPSHOT RELEASE (build 
{*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
{noformat}
at org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
at org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
The exception is thrown at analysis time, before any data is written, so I tested 
writing directly into that directory, which succeeded:
{noformat}
[root@nightly-71x-vx-3 ~]# hdfs dfs -touchz /warehouse/tablespace/external/hive/t1/test
[root@nightly-71x-vx-3 ~]# hdfs dfs -ls /warehouse/tablespace/external/hive/t1/
Found 8 items
rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 /warehouse/tablespace/external/hive/t1/000000_0
rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 /warehouse/tablespace/external/hive/t1/000000_0_copy_1
rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 /warehouse/tablespace/external/hive/t1/000000_0_copy_2
rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 /warehouse/tablespace/external/hive/t1/000000_0_copy_3
rw-rw---+ 3 impala hive 355 2023-01-27 17:17 /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d00000000_2029811630_data.0.parq
rw-rw---+ 3 impala hive 355 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c147800000000_574471191_data.0.parq
drwxrwx---+ - impala hive 0 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/_impala_insert_staging
rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 /warehouse/tablespace/external/hive/t1/test{noformat}
Reviewing the code [1], I traced the {{TAccessLevel}} to catalogd. If I add the 
user impala to the group supergroup on the catalogd host, the query gets past 
the write-access check.

Additionally, this query does not fail during analysis when catalog v2 is 
enabled, because the method {{getFirstLocationWithoutWriteAccess()}} is not 
implemented there yet and always returns null [2].

[1] [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504]

[2] [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298]
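To make the difference between the two catalog modes concrete, here is a simplified Java model of the behavior described above. It is not Impala's actual code: the class, the two lookup methods, and the boolean-per-location map are hypothetical stand-ins, and only the error text is taken from the message quoted earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the analysis-time check; not Impala's real classes.
final class WriteAccessCheckSketch {
    /** Legacy-catalog-like lookup: first location whose access level lacks write, or null. */
    static String legacyFirstLocationWithoutWriteAccess(Map<String, Boolean> writableByLocation) {
        for (Map.Entry<String, Boolean> e : writableByLocation.entrySet()) {
            if (!e.getValue()) return e.getKey();
        }
        return null;
    }

    /** Catalog-v2-like lookup as described in this issue: not implemented, always null. */
    static String localFirstLocationWithoutWriteAccess(Map<String, Boolean> writableByLocation) {
        return null;
    }

    /** Analysis step: fails only when a location without write access is reported. */
    static void checkWriteAccess(String firstLocationWithoutWriteAccess) {
        if (firstLocationWithoutWriteAccess != null) {
            throw new IllegalStateException(
                "Impala does not have WRITE access to HDFS location: "
                    + firstLocationWithoutWriteAccess);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> locations = new LinkedHashMap<>();
        // POSIX-derived access level only; Ranger policies are not consulted here.
        locations.put("/warehouse/tablespace/external/hive/t1", false);

        checkWriteAccess(localFirstLocationWithoutWriteAccess(locations)); // passes
        try {
            checkWriteAccess(legacyFirstLocationWithoutWriteAccess(locations));
        } catch (IllegalStateException ex) {
            System.out.println("legacy catalog: " + ex.getMessage());
        }
    }
}
```

In this model the same table passes analysis with the v2-like lookup and fails with the legacy-like lookup, even though neither path looks at Ranger policies.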

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ideally, when Ranger authorization is in place, we should do one of the following:
1) Not check the access level during analysis
2) Incorporate Ranger ACLs into the check during analysis
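
The two options can be compared with a small decision function. This is only a sketch under stated assumptions: the names are hypothetical, the inputs are plain booleans rather than real Ranger or HDFS lookups, and option 2 assumes that either a Ranger grant or a POSIX grant is sufficient for write access.

```java
// Hypothetical decision sketch for the two proposals; not Impala or Ranger code.
final class RangerAwareCheckSketch {
    enum Option { SKIP_CHECK, USE_RANGER_ACLS }

    /** Returns true if analysis should reject the INSERT for lack of write access. */
    static boolean rejectInsert(boolean rangerEnabled, Option option,
                                boolean posixWritable, boolean rangerAllowsWrite) {
        if (!rangerEnabled) {
            // Without Ranger, keep the existing POSIX-based behavior.
            return !posixWritable;
        }
        switch (option) {
            case SKIP_CHECK:
                // Option 1: defer entirely to HDFS at write time.
                return false;
            case USE_RANGER_ACLS:
                // Option 2 (assumption): a grant from either source is enough.
                return !(posixWritable || rangerAllowsWrite);
            default:
                throw new IllegalArgumentException("Unknown option: " + option);
        }
    }

    public static void main(String[] args) {
        // The situation in this issue: POSIX denies write, Ranger grants it.
        System.out.println(rejectInsert(true, Option.SKIP_CHECK, false, true));
        System.out.println(rejectInsert(true, Option.USE_RANGER_ACLS, false, true));
    }
}
```

Under either option the INSERT from this report would no longer be rejected at analysis time. Option 2 would still reject a write that neither POSIX permissions nor Ranger policies allow.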


