Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17185 )
Change subject: IMPALA-10483(part-2): Support subqueries in Ranger masking policies ...................................................................... Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/17185/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17185/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-10483(part-2): Support subqueries in Ranger masking policies I think the code changes in the patch are straightforward. Regarding testing, one of the consequences of changing the doTableMasking to false as discussed in the other patch review is that we may want to increase test coverage to cover a few more query patterns such that wherever a FROM clause can appear it can be checked and substituted by table masking. e.g it can appear in the SELECT list subquery, HAVING clause subquery etc. How would something like COMPUTE STATS <columns> work with row filtering ? e.g if there's a row filter defined on a particular column, will that affect it's NDV, null count etc ? Was this something that came up during column masking before ? http://gerrit.cloudera.org:8080/#/c/17185/2/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test File testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test: http://gerrit.cloudera.org:8080/#/c/17185/2/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@167 PS2, Line 167: # Test subquery row filter "id = (select min(id) from functional.alltypesagg)". A few questions/comments: On the left side if an explicit CAST is added CAST(id as bigint) would that qualify for the row filter ? Also, are there restrictions on the expressions on id (e.g arithmetic exprs) ? I made a comment in the other related patch about Explain plans. The same applies here. http://gerrit.cloudera.org:8080/#/c/17185/2/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/17185/2/tests/authorization/test_ranger.py@1232 PS2, Line 1232: row_filter_tmpl = """c_nationkey in ( In this row filter would a correlation condition contained entirely within the row filter be ok ? e.g ..select n_nationkey from nation n1 where n_name in (select n_name from nation n2 where n1.n_regionkey = n2.n_regionkey). Assuming the above works (since decorrelation will be done on the table masked statement), could we also add a negative test where the correlation is to a table in the parent query, not in the row filter itself. That one is expected to fail. (Maybe you already have this test .. if so, feel free to ignore). -- To view, visit http://gerrit.cloudera.org:8080/17185 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I254df9f684c95c660f402abd99ca12dded7e764f Gerrit-Change-Number: 17185 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Wed, 17 Mar 2021 03:20:37 +0000 Gerrit-HasComments: Yes
