Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13095 )
Change subject: IMPALA-8444: Fix performance regression when building privilege name ...................................................................... Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/13095/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13095/2//COMMIT_MSG@26 PS2, Line 26: PrivilegeNameBenchmark.fast avgt 25 ≈ 10⁻³ ms/op : PrivilegeNameBenchmark.slow avgt 25 0.001 ± 0.001 ms/op The numbers seem weird, as 10?³ == 0.001, so to only difference seems to be the error, while I expect the new implementation to be at least 2x faster. http://gerrit.cloudera.org:8080/#/c/13095/2//COMMIT_MSG@36 PS2, Line 36: This patch removes the synchronized keyword in : SentryAuthorizationPolicy.listPrivileges() in order to not cause the : operation to run in serial in a highly concurrent workload. This has a : trade off of stale authorization metadata, which may be acceptable to : get a better performance. Note that I do not have a solid understanding about integration with Sentry, so I may have basic misconceptions. How can this make the metadata stale (er)? I didn't see any place where SentryAuthorizationPolicy was used as a lock with the exception of listPrivileges() functions, but these seem to be "read functions" without side effect, so I do not see the merit of locking. Locking on authzPolicy_ would be a different case, as AuthorizationPolicy is designed to be thread safe and locking on it will not allow concurrent modifications of the metadata. I think that if the metadata can change during listPrivileges(), the results can be worse than staleness: 0. role R1 has no privilege P while role R2 has P, and group G has both R1 and R2 roles 1. Sentry: privilege P is granted for role R1 2. Sentry: privilege P is revoked for role R2 3. Impala: listPrivileges() starts for group G, R1's privileges are collected, and P is not included 4. the two changes on Sentry are synchronized with Impala in one refresh 5. Impala: listPrivileges() continues, R2's privileges are collected, and P is not included So group G transiently doesn't have privilege P in Impala, while it always has it in Sentry. I do not know if it is a design goal to avoid such (transient) issues. If we would like to see a snapshot of SentryAuthorizationPolicy during listPrivileges(), then we could move the logic of the functions to SentryAuthorizationPolicy, and use a read/write lock to ensure that concurrent readers do not block each other. A more sophisticated way would be to use version numbers to check if the relevant metadata changed during listPrivileges(). http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java File fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java: http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java@67 PS2, Line 67: if (privilegeName == null) { : if (LOG.isTraceEnabled()) { : LOG.trace("Ignoring invalid privilege: " + privilegeName); : } : continue; : } I think that we do not need to handle the null case, as the keys cannot be null in the underlying map by design. PrincipalPrivilege.buildPrivilegeName() seems to handle it, but most call sites that used it expected non-null even before https://gerrit.cloudera.org/#/c/11509 If there are places where nullness should be handled, then those are the places where TPrivileges from external sources are added/removed, e.g. SentryProxy.refreshPrivilegesInCatalog(). http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java@91 PS2, Line 91: if (privilegeName == null) { : if (LOG.isTraceEnabled()) { : LOG.trace("Ignoring invalid privilege: " + privilegeName); : } : continue; : } See comment at line 67. http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java File fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java: http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java@67 PS2, Line 67: * The case sensitivity name will depend on the value of 'caseInsensitiveKeys'. nit: missing 'of'? http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/Principal.java File fe/src/main/java/org/apache/impala/catalog/Principal.java: http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/Principal.java@63 PS2, Line 63: // URIs are case sensitive. : return principalPrivileges_.add(privilege, : privilege.getPrivilege().getScope() != TPrivilegeScope.URI); I would prefer to change the whole principalPrivileges_ to be case sensitive based on the following reasoning: - if non-URI parts of the name are always lower case, then it doesn't matter if we treat them as case insensitive or not - if non-URI parts of the name can be upper case (including 'server'), then this add() will also treat the 'server' part of URIs in case sensitive way, which seems wrong Note that doing it case sensitive way is faster, as the toLower() calls are skipped. http://gerrit.cloudera.org:8080/#/c/13095/2/fe/src/main/java/org/apache/impala/catalog/Principal.java@97 PS2, Line 97: // URIs are case sensitive. : return principalPrivileges_.remove(privilegeName, !privilegeName.contains("->uri=")); See comment at line 63. http://gerrit.cloudera.org:8080/#/c/13095/2/tests/authorization/test_grant_revoke.py File tests/authorization/test_grant_revoke.py: http://gerrit.cloudera.org:8080/#/c/13095/2/tests/authorization/test_grant_revoke.py@389 PS2, Line 389: test_uri_privilege_case_sensitive Is it possible to add two uri privileges that only differ in case sensitivity? If there are not tests for this case, then we could test it here. -- To view, visit http://gerrit.cloudera.org:8080/13095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5 Gerrit-Change-Number: 13095 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Wed, 24 Apr 2019 10:29:02 +0000 Gerrit-HasComments: Yes
