Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12020 )
Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala ...................................................................... Patch Set 7: (7 comments) The idea of refactoring auth system dependencies out of Impala is an excellent one. Very nice work in teasing the authorization layer out of the "core" Impala code. In addition to figuring out how to fit in one specific auth system, might be good to think about a variety of systems. Perhaps some major user has their own system they want to leverage. (Doing so is common in other products.) Also, think about how to leverage this separation to empower detailed unit testing of the auth system and its "driver" separate from the rest of Impala. This project can really help us do much deeper, more thorough testing. In this round I mostly focused on design: how might the boxes around abstractions be defined for the cleanest separation and easiest testing. We can chat about that, after which I'll dive into some of the implementation details. http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java File fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java: http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java@32 PS7, Line 32: public abstract class AuthorizationChecker { This class is now just a wrapper around an authorization call. I wonder if it would be structured something like: class PrivilegeRequest ... public PrivlegeRequest(Authorizer auth, ...) public authorize() ... The idea is that the request does the work against an authorization system (which may be the null system if no security is enabled.) The details of the level of granularity, the role of roles (pardon the pun) and so on is hidden behind this facade. The authorize() method can do the translation from security system to formatted error message that is currently done here. http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java@56 PS7, Line 56: if (!hasAccess(user, privilegeRequest)) { While you're in here, might as well simplify this: if (hasAccess(...)) return; ... http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java File fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java: http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java@23 PS7, Line 23: public interface AuthorizationConfig { I wonder if this interface is a bit too leaky. Seems we're making an assumption about how a system might be configured. I'd think configuration would be private to Sentry, Ranger or a customer's own system. http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java File fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java: http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java@55 PS7, Line 55: * to the authorization provider that cannot be protected against. Are we making a rather strong assumption here? That Impala will cache something about security? If so, then the catalog has to know the structure of the security system, which radically leaks abstractions. It forces a system to convert its rules to Impala's format. Instead, should the security manager do its own caching so that it is the manager which ensures atomic operations? http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Authorizer.java File fe/src/main/java/org/apache/impala/authorization/Authorizer.java: http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Authorizer.java@32 PS7, Line 32: public interface Authorizer { I wonder if the name could be improved. Seems that this is not so much an authorizer per-se, but an interface to alter an authorization system. I would expect an "authorizer" to authorize an action such as "can I query table x?" http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Privilege.java File fe/src/main/java/org/apache/impala/authorization/Privilege.java: http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Privilege.java@23 PS7, Line 23: * List of Impala privileges. Per comments elsewhere, I wonder about the model. The set of privileges is defined by the auth system. Impala, it would seem, has a set of actions it can ask about. Maybe some system groups together the ADD, DROP and UPDATE actions into a MODIFY privilege. These items might be good for the role statements, these are what we have syntax to handle. As part of this work, should we figure out what Impala requires (as set of actions, say) and what the auth system defines (the detailed way it works out who can do what to whom.) Also, I wonder if the User concept should be owned by the auth system. LDAP has a particular form of identifier. Hadoop has another idea. Kerberos another. By having a User interface, but auth-system specific implementations, it might be easier to implement each system. Something like: class AuthSystem { User authenticateUser(String name, ...) That way if user "Fredy" is the super-user in some auth system. that user object could, for example, be tagged as such to simplify further authorization within the auth system. http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java: http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java@52 PS7, Line 52: * For example: As noted elsewhere, should Impala see this? Isn't this a private detail of the authorization system? It may be that we need a library of components that an auth system can use to implement the Impala side. IMHO, the auth system should know about: * Users * A defined set of actions (query, delete, modify, etc.) * A set of objects (db, table, column, function) Then, a request is simply of the form: can user x perform action y on object z? All auth system must handle this much. Then, if an auth system supports roles (LDAP does not), or columns (maybe not if only table-level is supported) and so on is left to the auth system itself. This also means, by the way, that the auth provider can be extensively tested in a test harness, including caching and invalidation, outside of the rest of Impala. -- To view, visit http://gerrit.cloudera.org:8080/12020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f Gerrit-Change-Number: 12020 Gerrit-PatchSet: 7 Gerrit-Owner: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Wed, 09 Jan 2019 06:45:11 +0000 Gerrit-HasComments: Yes
