Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16976 )
Change subject: IMPALA-9234: Support Ranger row filtering policies ...................................................................... Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java: http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@175 PS2, Line 175: // TODO(IMPALA-10483): Column-masking/Row-filtering expressions may have subqueries Done. Added IMPALA-10483 in the comment. > I think we might also have to be careful about exactly what subqueries are > allowed. E.g. if would be weird if you had a correlated references between > the row filter subquery and the outer query, or if you could have a relative > table reference that referred to different tables depending on which DB it > was executed in. Yeah, currently we depend on users to verify the rules. I think Hive does so. Copy this to comments of IMPALA-10483 so we can take care of it once we support subquery filters. http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java: http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@473 PS2, Line 473: given tables. This is used > This is inaccurate - this method doesn't check whether row filtering is ena Done http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@476 PS2, Line 476: throwIfRowFilterin > Rename this method to reflect new behaviour, e.g. throwIfRowFilteringRequir Done. Also rename the authorizeColumnMask() method name. http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java: http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java@139 PS2, Line 139: // TODO: Whether we should use lowercase or uppercase accessType? > How would we decide this? What does Hive do? AccessTypes (e.g. SELECT, INSERT, MASK, ROW_FILTER) in Hive audit events are all in uppercase. They inherit from some constant strings witch are uppercase, e.g. result.getMaskType(). However, AccessTypes in Impala audit events are always converted to lowercase. I think it's not neccessary. But not sure if changing them to uppercase will break other tools, which should be considered as a breaking change. Note that the AccessType field is only used for observerbility. http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java File fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java: http://gerrit.cloudera.org:8080/#/c/16976/2/fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java@454 PS2, Line 454: // Two row filter policies will be added. The first one affects 'user_' and keeps rows > Add a brief comment describing the row filter policies added. Done http://gerrit.cloudera.org:8080/#/c/16976/2/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test File testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test: http://gerrit.cloudera.org:8080/#/c/16976/2/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test@3 PS2, Line 3: # Row-filtering policy keeps rows with "id % 3 = 0". > Is the expected behaviour that row filters are applied before column masks? Yes, column masking applies on result data. So the row filtering happens first. I mention this in the commit message. Also copied it here for readability. http://gerrit.cloudera.org:8080/#/c/16976/2/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test File testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test: http://gerrit.cloudera.org:8080/#/c/16976/2/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@1 PS2, Line 1: ==== > Can you add a test where the row filter references a column not in the tabl Good point! But this corner case won't happen, because 'alltypes' will be replaced by an inline view and the filter is added inside the view. The unresolved column won't be resolved as correlated reference. To be specifit, let's say table 'alltypes' has a row filter policy of "test_id = id". Table 'jointbl' has a column 'test_id' which doesn't exist in 'alltypes'. The query select * from jointbl where exists(select * from alltypes) will be analyzed as select * from jointbl where exists(select * from (select id, bool_col, ..., month from alltypes where test_id = id) v ) and will hit an error in resolving 'test_id'. Added a similar test to cover this. -- To view, visit http://gerrit.cloudera.org:8080/16976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I580517be241225ca15e45686381b78890178d7cc Gerrit-Change-Number: 16976 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 19 Feb 2021 09:39:13 +0000 Gerrit-HasComments: Yes
