[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala

Paul Rogers (Code Review) Tue, 08 Jan 2019 22:46:19 -0800

Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12020 )


Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala
......................................................................


Patch Set 7:

(7 comments)

The idea of refactoring auth system dependencies out of Impala is an excellent 
one. Very nice work in teasing the authorization layer out of the "core" Impala 
code.

In addition to figuring out how to fit in one specific auth system, might be 
good to think about a variety of systems. Perhaps some major user has their own 
system they want to leverage. (Doing so is common in other products.)

Also, think about how to leverage this separation to empower detailed unit 
testing of the auth system and its "driver" separate from the rest of Impala. 
This project can really help us do much deeper, more thorough testing.

In this round I mostly focused on design: how might the boxes around 
abstractions be defined for the cleanest separation and easiest testing. We can 
chat about that, after which I'll dive into some of the implementation details.

http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java
File fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java:

http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java@32
PS7, Line 32: public abstract class AuthorizationChecker {
This class is now just a wrapper around an authorization call. I wonder if it 
would be structured something like:

class PrivilegeRequest ...

  public PrivlegeRequest(Authorizer auth, ...)
  public authorize() ...

The idea is that the request does the work against an authorization system 
(which may be the null system if no security is enabled.) The details of the 
level of granularity, the role of roles (pardon the pun) and so on is hidden 
behind this facade.

The authorize() method can do the translation from security system to formatted 
error message that is currently done here.


http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java@56
PS7, Line 56:     if (!hasAccess(user, privilegeRequest)) {
While you're in here, might as well simplify this:

if (hasAccess(...)) return;
...


http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java
File fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java:

http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java@23
PS7, Line 23: public interface AuthorizationConfig {
I wonder if this interface is a bit too leaky. Seems we're making an assumption 
about how a system might be configured. I'd think configuration would be 
private to Sentry, Ranger or a customer's own system.


http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java
File fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java:

http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java@55
PS7, Line 55:  * to the authorization provider that cannot be protected against.
Are we making a rather strong assumption here? That Impala will cache something 
about security? If so, then the catalog has to know the structure of the 
security system, which radically leaks abstractions. It forces a system to 
convert its rules to Impala's format.

Instead, should the security manager do its own caching so that it is the 
manager which ensures atomic operations?


http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Authorizer.java
File fe/src/main/java/org/apache/impala/authorization/Authorizer.java:

http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Authorizer.java@32
PS7, Line 32: public interface Authorizer {
I wonder if the name could be improved. Seems that this is not so much an 
authorizer per-se, but an interface to alter an authorization system. I would 
expect an "authorizer" to authorize an action such as "can I query table x?"


http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Privilege.java
File fe/src/main/java/org/apache/impala/authorization/Privilege.java:

http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/authorization/Privilege.java@23
PS7, Line 23:  * List of Impala privileges.
Per comments elsewhere, I wonder about the model. The set of privileges is 
defined by the auth system. Impala, it would seem, has a set of actions it can 
ask about. Maybe some system groups together the ADD, DROP and UPDATE actions 
into a MODIFY privilege.

These items might be good for the role statements, these are what we have 
syntax to handle.

As part of this work, should we figure out what Impala requires (as set of 
actions, say) and what the auth system defines (the detailed way it works out 
who can do what to whom.)

Also, I wonder if the User concept should be owned by the auth system. LDAP has 
a particular form of identifier. Hadoop has another idea. Kerberos another. By 
having a User interface, but auth-system specific implementations, it might be 
easier to implement each system.

Something like:

class AuthSystem {
  User authenticateUser(String name, ...)

That way if user "Fredy" is the super-user in some auth system. that user 
object could, for example, be tagged as such to simplify further authorization 
within the auth system.


http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java:

http://gerrit.cloudera.org:8080/#/c/12020/7/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java@52
PS7, Line 52:  * For example:
As noted elsewhere, should Impala see this? Isn't this a private detail of the 
authorization system? It may be that we need a library of components that an 
auth system can use to implement the Impala side.

IMHO, the auth system should know about:

* Users
* A defined set of actions (query, delete, modify, etc.)
* A set of objects (db, table, column, function)

Then, a request is simply of the form: can user x perform action y on object z? 
All auth system must handle this much.

Then, if an auth system supports roles (LDAP does not), or columns (maybe not 
if only table-level is supported) and so on is left to the auth system itself.

This also means, by the way, that the auth provider can be extensively tested 
in a test harness, including caching and invalidation, outside of the rest of 
Impala.



--
To view, visit http://gerrit.cloudera.org:8080/12020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f
Gerrit-Change-Number: 12020
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Wed, 09 Jan 2019 06:45:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala

Reply via email to