Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17185 )

Change subject: IMPALA-10483(part-2): Support subqueries in Ranger masking 
policies
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17185/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17185/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-10483(part-2): Support subqueries in Ranger masking policies
I think the code changes in the patch are straightforward.  Regarding testing, 
one of the consequences of changing the doTableMasking to false as discussed in 
the other patch review is that we may want to increase test coverage to cover a 
few more query patterns such that wherever a FROM clause can appear it can be 
checked and substituted by table masking.  e.g it can appear in the SELECT list 
subquery, HAVING clause subquery etc.

How would something like COMPUTE STATS <columns> work with row filtering ? e.g 
if there's a row filter defined on a particular column, will that affect it's 
NDV, null count etc ? Was this something that came up during column masking 
before ?


http://gerrit.cloudera.org:8080/#/c/17185/2/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test:

http://gerrit.cloudera.org:8080/#/c/17185/2/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@167
PS2, Line 167: # Test subquery row filter "id = (select min(id) from 
functional.alltypesagg)".
A few questions/comments:
On the left side if an explicit CAST is added CAST(id as bigint) would that 
qualify for the row filter ?
Also, are there restrictions on the expressions on id (e.g arithmetic exprs) ?
I made a comment in the other related patch about Explain plans.  The same 
applies here.


http://gerrit.cloudera.org:8080/#/c/17185/2/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/17185/2/tests/authorization/test_ranger.py@1232
PS2, Line 1232:       row_filter_tmpl = """c_nationkey in (
In this row filter would a correlation condition contained entirely within the 
row filter be ok ?
 e.g  ..select n_nationkey from nation n1 where n_name in (select n_name from 
nation n2 where n1.n_regionkey = n2.n_regionkey).

Assuming the above works (since decorrelation will be done on the table masked 
statement), could we also add a negative test where the correlation is to a 
table in the parent query, not in the row filter itself.  That one is expected 
to fail.  (Maybe you already have this test .. if so, feel free to ignore).



--
To view, visit http://gerrit.cloudera.org:8080/17185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I254df9f684c95c660f402abd99ca12dded7e764f
Gerrit-Change-Number: 17185
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 17 Mar 2021 03:20:37 +0000
Gerrit-HasComments: Yes

Reply via email to