Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12632 )

Change subject: IMPALA-8100: Add initial support for Ranger
......................................................................


Patch Set 11:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/12632/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12632/9//COMMIT_MSG@30
PS9, Line 30: - Ran all FE tests
            : - Ran all E2E authorization tests
> did you run these with ranger configured? I'm surprised they would pass sin
For FE, it runs with both Ranger and Sentry via JUnit parameterized test. I ran 
E2E authorization tests to make sure no regression in the Sentry implementation.


http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@41
PS9, Line 41:     
"org.apache.impala.authorization.sentry.SentryAuthorizationFactory",
> I wonder if we should switch this to be a more human-friendly flag instead
Phil brought up the same thing. I have a plan to make it more human-friendly in 
a different JIRA: https://issues.apache.org/jira/browse/IMPALA-8309


http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@46
PS9, Line 46:     " uniquely identify the application that communicates with 
Ranger.");
> Worth noting whether this is required, considering it has an empty default
Done


http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@47
PS9, Line 47: DEFINE_string(server_name, "", "(Required) The name to use for 
securing this impalad "
> we probably can't rename the existing sentry-specific flags (for compat rea
Done. server_name is also required for Ranger.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/pom.xml
File fe/pom.xml:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/pom.xml@112
PS9, Line 112:       <exclusions>
> it seems that ranger-plugins-common has a pom dependency on mysql-connector
Yes it's a service model. Running mvn dependency:tree on Ranger gave me this 
output. I think it's pretty light after excluding some of its transitive 
dependencies.

[INFO] +- org.apache.ranger:ranger-plugins-common:jar:1.2.0:compile             
                                                                                
                                                                                
                                                               
[INFO] |  \- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile
[INFO] +- org.apache.ranger:ranger-plugins-audit:jar:1.2.0:compile


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java@126
PS9, Line 126:         Privilege.ANY);
> I'm not 100% sure about this part of the change. If I don't have the 'VIEW_
If I don't have the 'VIEW_METADATA' privilege but I do have permissions on one 
or more columns, should I still be able to use 'DESCRIBE' in all cases?

Yes, however the output of DESCRIBE will vary depending on whether or not the 
user has VIEW_METADATA privilege (the second level check). Filtering out the 
DESCRIBE output is part of this JIRA: 
https://issues.apache.org/jira/browse/IMPALA-6479.

This behavior is pretty similar to "show databases/tables". A user can always 
execute "show databases/tables", however only databases/tables that the user 
has access to (with VIEW_METADATA privilege) will be displayed.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@77
PS9, Line 77:             .table(authorizable.getTableName());
            :         break;
            :       case COLUMN:
            :         builder.server(config_.getServerName())
            :             .database(authorizable.getDbName());
            :         // * in Ranger means "all". For example to check access
> I'm not understanding this. When you are saying "Any column access" do you
Done. I agree it was confusing.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@106
PS9, Line 106:             .function(authorizable.getFnName());
> does this need to be special cased? It seems all the other privileges alrea
Ah yeah you're right.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@117
PS9, Line 117:     if (request.getPrivilege() == Privilege.ANY) {
> does this result in multiple audit events, one per implied privilege?
Yes.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@131
PS9, Line 131:
> can you use Collections.emptySet?
Good catch, meant to leave it as a todo, but it's easy enough that i'll just do 
it i this patch.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java@38
PS9, Line 38:  RangerConf
> nit: Preconditions.checkNotNull(foo) returns 'foo', so you can combine thes
Done


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java@55
PS9, Line 55:     Preconditions.checkArgument(!Strings.isNullOrEmpty(
> is this the only validation that happens on this config? If so, we should m
Ah yeah, good point. Done.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java@37
PS9, Line 37: r(String s
> can we add Preconditions.checkNotNull() for all parameters in this file? Is
Done


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java@52
PS9, Line 52:   public RangerImpalaResourceBuilder column(String columnName) {
> general question about ranger semantics -- are resources case sensitive? sh
That'll be handled in Ranger side, more specifically the service definition: 
https://gerrit.cloudera.org/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json#142


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@90
PS9, Line 90:   private static final String RANGER_SERVICE_TYPE = "impala";
            :   private static final String RANGER_APP_ID = "impala";
> can these be pulled from the config instead to avoid duplication?
Ranger needs service type and app ID to create a Ranger config.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@110
PS9, Line 110: System.getenv("IMPALA_HOME") + 
"/fe/src/test/resources/sentry-site.
> this is a little sketchy. Shouldn't this resource be on the classpath and t
Yes, we could. This is some remnant of the old stuff. The 
createHadoopGroupAuthConfig doesn't understand reading a resource from a 
classpath. I plan to clean up a lot of authorization tests in this JIRA: 
https://issues.apache.org/jira/browse/IMPALA-8248


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@178
PS9, Line 178:     SentryAuthorizationFactory authzFactory = new 
SentryAuthorizationFactory(
> should we be parametrizing this test to be able to run against ranger as we
There's a separate JIRA to refactor all the authorization tests to basically 
separate Sentry and Ranger specific tests and put the tests that are common to 
both: https://issues.apache.org/jira/browse/IMPALA-8248. I don't want this 
patch to get too big.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@308
PS9, Line 308:           @Override
             :           public boolean isEnabled() { return true; }
             :           @Override
             :           public AuthorizationProvider getProvider() {
             :             return AuthorizationProvider.NONE;
             :           }
> I'm not sure what this means, semantically -- authorization enabled but NON
Yes, it should be DUMMY or ALWAYS_TRUE provider, but I don't think this 
warrants for an additional enum value :) This is pretty much a hack to make 
non-authorization unit tests (see AnalyzerTest and ToSqlTest) easier because 
some statements, such as REFRESH AUTHORIZATION can only be executed when 
authorization is enabled, but we also don't want to run it with Sentry or 
Ranger.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml
File fe/src/test/resources/ranger-impala-security.xml:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@21
PS9, Line 21:     <value>test_impala</value>
> how does this interact with the gflag for "service id"?
service ID is defined in service definition: 
https://gerrit.cloudera.org/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json#50


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@37
PS9, Line 37:     <value>/tmp</value>
> can we auto-configure this based on other existing scratch dir configs, at
The scratch_dirs flag takes multiple directories, do we just pick any scratch 
directory randomly?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@48
PS9, Line 48:       Polling interval in milliseconds to poll for changes in 
policies.
> are these strings copied from somewhere? It's weird to see a question mark
Yeah I copied this config file from Ranger. I will update all the descriptions.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@56
PS9, Line 56:       RangerRestClient connection timeout in milliseconds.
> milliseconds
Done


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@64
PS9, Line 64:       RangerRestClient read timeout in milliseconds.
> milliseconds
Done


http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/bin/create-load-data.sh@308
PS9, Line 308:   perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : 
$&/eg' \
> you don't want to die() if $ENV{$1} is not defined here? leaving it unsubst
Done


http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json
File testdata/cluster/ranger/setup/impala_servicedef.json:

http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json@1
PS9, Line 1: {
> I'm a little curious, at a high level, how this interacts with any ranger p
This is something that I don't quite know the answer whether Impala should have 
the same service definition as Hive. We can't reuse the existing Hive service 
definition because it contains specific things to Hive, such as Hive JDBC 
driver, different access types/privileges, etc. Unlike with Sentry, the Ranger 
model prefers for each service to have their own service definition. IOW, there 
will not be shared policies between Impala and Hive.

If we need to have shared policies between Impala and Hive, we need to come up 
with a new service definition, but that also means the existing Hive Ranger 
plugin may need to be updated. Let me know what you think?

I also haven't spent much time on the deployment. I believe only the service 
definition will need to be deployed. All other files here are purely for 
testing. I'll look into this more once everything is in a better shape. For 
now, we can consider all these files here for testing.



--
To view, visit http://gerrit.cloudera.org:8080/12632
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8cad9e609d20aae1ff645c84fd58a02afee70276
Gerrit-Change-Number: 12632
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Austin Nobis <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Thu, 14 Mar 2019 23:54:28 +0000
Gerrit-HasComments: Yes

Reply via email to