Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13095 )

Change subject: IMPALA-8444: Fix performance regression when building privilege 
name
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13095/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13095/2//COMMIT_MSG@26
PS2, Line 26: PrivilegeNameBenchmark.fast  avgt   25  ≈ 10⁻³           ms/op
            : PrivilegeNameBenchmark.slow  avgt   25   0.001 ±  0.001  ms/op
The numbers seem weird, as 10?³ == 0.001, so to only difference seems to be the 
error, while I expect the new implementation to be at least 2x faster.


http://gerrit.cloudera.org:8080/#/c/13095/2//COMMIT_MSG@36
PS2, Line 36: This patch removes the synchronized keyword in
            : SentryAuthorizationPolicy.listPrivileges() in order to not cause 
the
            : operation to run in serial in a highly concurrent workload. This 
has a
            : trade off of stale authorization metadata, which may be 
acceptable to
            : get a better performance.
Note that I do not have a solid understanding about integration with Sentry, so 
I may have basic misconceptions.

How can this make the metadata stale (er)? I didn't see any place where 
SentryAuthorizationPolicy was used as a lock with the exception of 
listPrivileges() functions, but these seem to be "read functions" without side 
effect, so I do not see the merit of locking. Locking on authzPolicy_ would be 
a different case, as AuthorizationPolicy is designed to be thread safe and 
locking on it will not allow concurrent modifications of the metadata.

I think that if the metadata can change during listPrivileges(), the results 
can be worse than staleness:

0. role R1 has no privilege P while role R2 has P, and group G has both R1 and 
R2 roles
1. Sentry: privilege P is granted for role R1
2. Sentry: privilege P is revoked for role R2
3. Impala: listPrivileges() starts for group G, R1's privileges are collected, 
and P is not included
4. the two changes on Sentry are synchronized with Impala in one refresh
5. Impala: listPrivileges() continues, R2's privileges are collected, and P is 
not included

So group G transiently doesn't have privilege P in Impala, while it always has 
it in Sentry. I do not know if it is a design goal to avoid such (transient) 
issues.

If we would like to see a snapshot of SentryAuthorizationPolicy during 
listPrivileges(), then we could move the logic of the functions to 
SentryAuthorizationPolicy, and use a read/write lock to ensure that concurrent 
readers do not block each other. A more sophisticated way would be to use 
version numbers to check if the relevant metadata changed during 
listPrivileges().


http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java
File 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java:

http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java@67
PS2, Line 67:           if (privilegeName == null) {
            :             if (LOG.isTraceEnabled()) {
            :               LOG.trace("Ignoring invalid privilege: " + 
privilegeName);
            :             }
            :             continue;
            :           }
I think that we do not need to handle the null case, as the keys cannot be null 
in the underlying map by design. PrincipalPrivilege.buildPrivilegeName() seems 
to handle it, but most call sites that used it expected non-null even before 
https://gerrit.cloudera.org/#/c/11509

If there are places where nullness should be handled, then those are the places 
where TPrivileges from external sources are added/removed, e.g. 
SentryProxy.refreshPrivilegesInCatalog().


http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java@91
PS2, Line 91:           if (privilegeName == null) {
            :             if (LOG.isTraceEnabled()) {
            :               LOG.trace("Ignoring invalid privilege: " + 
privilegeName);
            :             }
            :             continue;
            :           }
See comment at line 67.


http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
File fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java:

http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java@67
PS2, Line 67:    * The case sensitivity name will depend on the value of 
'caseInsensitiveKeys'.
nit: missing 'of'?


http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/Principal.java
File fe/src/main/java/org/apache/impala/catalog/Principal.java:

http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/Principal.java@63
PS2, Line 63:     // URIs are case sensitive.
            :     return principalPrivileges_.add(privilege,
            :         privilege.getPrivilege().getScope() != 
TPrivilegeScope.URI);
I would prefer to change the whole principalPrivileges_ to be case sensitive 
based on the following reasoning:
- if non-URI parts of the name are always lower case, then it doesn't matter if 
we treat them as case insensitive or not
- if non-URI parts of the name can be upper case (including 'server'), then 
this add() will also treat the 'server' part of URIs in case sensitive way, 
which seems wrong

Note that doing it case sensitive way is faster, as the toLower() calls are 
skipped.


http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/Principal.java@97
PS2, Line 97:     // URIs are case sensitive.
            :     return principalPrivileges_.remove(privilegeName, 
!privilegeName.contains("->uri="));
See comment at line 63.


http://gerrit.cloudera.org:8080/#/c/13095/2/tests/authorization/test_grant_revoke.py
File tests/authorization/test_grant_revoke.py:

http://gerrit.cloudera.org:8080/#/c/13095/2/tests/authorization/test_grant_revoke.py@389
PS2, Line 389: test_uri_privilege_case_sensitive
Is it possible to add two uri privileges that only differ in case sensitivity? 
If there are not tests for this case, then we could test it here.



--
To view, visit http://gerrit.cloudera.org:8080/13095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5
Gerrit-Change-Number: 13095
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 24 Apr 2019 10:29:02 +0000
Gerrit-HasComments: Yes

Reply via email to