Fang-Yu Rao created IMPALA-11871:
------------------------------------

Summary: INSERT statement does not respect Ranger policies for HDFS
Key: IMPALA-11871
URL: https://issues.apache.org/jira/browse/IMPALA-11871
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao

In a cluster with Ranger authorization enabled (and with legacy catalog mode), even if the user impala is granted RWX on cm_hdfs -> all-path, inserting into a table whose HDFS POSIX permissions happen to exclude impala fails with the following error:

{noformat}
"AnalysisException: Unable to INSERT into target table (default.t1) because Impala does not have WRITE access to HDFS location: hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"
{noformat}

{noformat}
[root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl /warehouse/tablespace/external/hive/t1
file: /warehouse/tablespace/external/hive/t1
owner: hive
group: supergroup
user::rwx
user:impala:rwx #effective:r-x
group::rwx #effective:r-x
mask::r-x
other::---
default:user::rwx
default:user:impala:rwx
default:group::rwx
default:mask::rwx
default:other::---
{noformat}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ANALYSIS

Stack trace from a version of Cloudera's distribution of Impala (impalad version 3.4.0-SNAPSHOT RELEASE (build {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):

{noformat}
at org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
at org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155)
{noformat}

The exception is thrown at analysis time, so I tested writing directly into that directory, and it succeeded:

{noformat}
[root@nightly-71x-vx-3 ~]# hdfs dfs -touchz /warehouse/tablespace/external/hive/t1/test
[root@nightly-71x-vx-3 ~]# hdfs dfs -ls /warehouse/tablespace/external/hive/t1/
Found 8 items
-rw-rw----+ 3 hive supergroup 417 2023-01-27 17:37 /warehouse/tablespace/external/hive/t1/000000_0
-rw-rw----+ 3 hive supergroup 417 2023-01-27 17:44 /warehouse/tablespace/external/hive/t1/000000_0_copy_1
-rw-rw----+ 3 hive supergroup 417 2023-01-27 17:49 /warehouse/tablespace/external/hive/t1/000000_0_copy_2
-rw-rw----+ 3 hive supergroup 417 2023-01-27 17:53 /warehouse/tablespace/external/hive/t1/000000_0_copy_3
-rw-rw----+ 3 impala hive 355 2023-01-27 17:17 /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d00000000_2029811630_data.0.parq
-rw-rw----+ 3 impala hive 355 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c147800000000_574471191_data.0.parq
drwxrwx---+ - impala hive 0 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/_impala_insert_staging
-rw-rw----+ 3 impala supergroup 0 2023-01-27 18:01 /warehouse/tablespace/external/hive/t1/test
{noformat}

Reviewing the code[1], I traced the origin of the {{TAccessLevel}} to the catalogd. If I add the user impala to the group supergroup on the catalogd host, the query gets past this access check.

Additionally, the query does not fail during analysis when catalog v2 is enabled, because the method {{getFirstLocationWithoutWriteAccess()}} is not yet implemented there and always returns null[2].

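As a side note on the {{getfacl}} output for the table directory: the {{#effective:r-x}} annotation on {{user:impala:rwx}} follows from the standard ACL rule that a named-user entry is filtered through the mask entry. The sketch below only illustrates that rule. It is not Impala or Hadoop code, and the class and method names ({{AclMaskSketch}}, {{effective}}, {{canWrite}}) are hypothetical. A check that looks only at these POSIX-style bits would conclude that impala lacks WRITE, regardless of what Ranger grants.

```java
// Illustrative only: models how an ACL mask filters a named-user entry.
// AclMaskSketch and its methods are hypothetical names, not Impala/Hadoop APIs.
final class AclMaskSketch {
    private static final String SYMBOLS = "rwx";

    // Converts a three-character permission string such as "r-x" to a 3-bit value.
    static int parse(String perm) {
        if (perm.length() != 3) {
            throw new IllegalArgumentException("expected 3 characters: " + perm);
        }
        int bits = 0;
        for (int i = 0; i < 3; i++) {
            char c = perm.charAt(i);
            if (c == SYMBOLS.charAt(i)) {
                bits |= 1 << (2 - i);
            } else if (c != '-') {
                throw new IllegalArgumentException("bad permission: " + perm);
            }
        }
        return bits;
    }

    // Converts a 3-bit value back to its "rwx"-style string.
    static String format(int bits) {
        StringBuilder sb = new StringBuilder(3);
        for (int i = 0; i < 3; i++) {
            sb.append((bits & (1 << (2 - i))) != 0 ? SYMBOLS.charAt(i) : '-');
        }
        return sb.toString();
    }

    // Effective permission of a named-user or group entry: entry AND mask.
    static String effective(String entry, String mask) {
        return format(parse(entry) & parse(mask));
    }

    // True if the effective permission includes the write bit.
    static boolean canWrite(String entry, String mask) {
        return (parse(entry) & parse(mask) & 0b010) != 0;
    }

    public static void main(String[] args) {
        // Values taken from the getfacl output: user:impala:rwx with mask::r-x.
        System.out.println(effective("rwx", "r-x")); // r-x
        System.out.println(canWrite("rwx", "r-x"));  // false
    }
}
```

With the directory's mask of {{r-x}}, the write bit is masked out of the impala entry, which matches the effective permissions that {{getfacl}} reports.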
[1] [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504]
[2] [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ideally, when Ranger authorization is in place, we should:
1) Not check access level during analysis
2) Incorporate Ranger ACLs during analysis

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org