Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12632 )
Change subject: IMPALA-8100: Add initial support for Ranger ...................................................................... Patch Set 9: (25 comments) http://gerrit.cloudera.org:8080/#/c/12632/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12632/9//COMMIT_MSG@30 PS9, Line 30: - Ran all FE tests : - Ran all E2E authorization tests did you run these with ranger configured? I'm surprised they would pass since best I can tell this patch doesn't attempt to support GRANT/REVOKE statements etc, right? http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@41 PS9, Line 41: "org.apache.impala.authorization.sentry.SentryAuthorizationFactory", I wonder if we should switch this to be a more human-friendly flag instead of a class name, before we ship it? http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@46 PS9, Line 46: "identify the application that communicates with Ranger."); Worth noting whether this is required, considering it has an empty default http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@47 PS9, Line 47: DEFINE_string(server_name, "", "The name to use for securing this impalad " we probably can't rename the existing sentry-specific flags (for compat reasons) but perhaps we can amend the help text for each of these flags to say something like "only relevant if sentry is enabled" or something? It might be worth hard-coding some checks to make sure GetCommandLineFlagInfoOrDie("server_name").is_default for these flags if ranger is set, and vice-versa for ranger-specific flags, so that people don't accidentally cross-pollinate their configs http://gerrit.cloudera.org:8080/#/c/12632/9/fe/pom.xml File fe/pom.xml: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/pom.xml@112 PS9, Line 112: <exclusions> it seems that ranger-plugins-common has a pom dependency on mysql-connector-java. Is that really necessary? I thought ranger used a "service" model rather than the plugin directly talking to any database? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java File fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java@126 PS9, Line 126: Privilege.ANY); I'm not 100% sure about this part of the change. If I don't have the 'VIEW_METADATA' privilege but I do have permissions on one or more columns, should I still be able to use 'DESCRIBE' in all cases? For example, 'DESCRIBE EXTENDED' on a view will show the view text and other metadata, but maybe I'm not supposed to have that capability if I don't have VIEW_METADATA privileges? Do we have some good documentation on the intended semantics here? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@77 PS9, Line 77: // Any column access is special in Ranger. For example if we want to check if : // we have access to any column on a particular table, we need to build a resource : // without the column resource. : // For example: : // access type: RangerPolicyEngine.ANY_ACCESS : // resources: [server=server1, database=foo, table=bar] I'm not understanding this. When you are saying "Any column access" do you mean "authorization checks for wildcard columns" or something? I think expanding upon the example would be useful. http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@106 PS9, Line 106: if (request.getPrivilege() == Privilege.VIEW_METADATA) { does this need to be special cased? It seems all the other privileges already have an "implied" set which is equal to just the privilege itself. (ANY probably does need to be special cased still) http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@117 PS9, Line 117: if (authorize(plugin, resource, user, privilege)) { does this result in multiple audit events, one per implied privilege? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@131 PS9, Line 131: new HashSet can you use Collections.emptySet? Also is this a TODO? Should we be doing group resolution here? Or will Ranger fill in its own groups? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java@38 PS9, Line 38: serviceType nit: Preconditions.checkNotNull(foo) returns 'foo', so you can combine these assignments with the checks http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java@55 PS9, Line 55: Preconditions.checkArgument(!Strings.isNullOrEmpty(backendConfig.getRangerAppId())); is this the only validation that happens on this config? If so, we should make sure the exception has a clear message pointing back to the flag so the user doesn't need to go pull up the source to track down which config is throwing IllegalArgumentException http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java File fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java@37 PS9, Line 37: serverName can we add Preconditions.checkNotNull() for all parameters in this file? Is it valid to ever pass null? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java@52 PS9, Line 52: rangerAccessResource.setValue(COLUMN, columnName); general question about ranger semantics -- are resources case sensitive? should we be canonicalizing to lower case anywhere? Or should we be Preconditioning that its' already canonicalized? Or does that get handled on the ranger side? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@90 PS9, Line 90: private static final String RANGER_SERVICE_TYPE = "impala"; : private static final String RANGER_APP_ID = "impala"; can these be pulled from the config instead to avoid duplication? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@110 PS9, Line 110: System.getenv("IMPALA_HOME") + "/fe/src/test/resources/sentry-site. this is a little sketchy. Shouldn't this resource be on the classpath and thus just loadable by its name? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@178 PS9, Line 178: SentryAuthorizationFactory authzFactory = new SentryAuthorizationFactory( should we be parametrizing this test to be able to run against ranger as well? Or is it too specific to Sentry? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@308 PS9, Line 308: @Override : public boolean isEnabled() { return true; } : @Override : public AuthorizationProvider getProvider() { : return AuthorizationProvider.NONE; : } I'm not sure what this means, semantically -- authorization enabled but NONE? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml File fe/src/test/resources/ranger-impala-security.xml: http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@21 PS9, Line 21: <value>test_impala</value> how does this interact with the gflag for "service id"? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@37 PS9, Line 37: <value>/tmp</value> can we auto-configure this based on other existing scratch dir configs, at least as a default? http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@48 PS9, Line 48: How often to poll for changes in policies? are these strings copied from somewhere? It's weird to see a question mark in a description. http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@56 PS9, Line 56: RangerRestClient Connection Timeout in Milli Seconds. milliseconds http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@64 PS9, Line 64: RangerRestClient read Timeout in Milli Seconds. milliseconds http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/bin/create-load-data.sh@308 PS9, Line 308: perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg' \ you don't want to die() if $ENV{$1} is not defined here? leaving it unsubstituted seems like it'll just make a harder-to-troubleshoot error later http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json File testdata/cluster/ranger/setup/impala_servicedef.json: http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json@1 PS9, Line 1: { I'm a little curious, at a high level, how this interacts with any ranger polciies defined for hive. Shouldn't we have a single service definition which applies to both hive and impala, rather than our own service def? Also, what's the eventual deployment model? This is in our test data, but in real life we'd expect these files to be deployed onto the ranger server? Maybe a README file in this directory would be useful. -- To view, visit http://gerrit.cloudera.org:8080/12632 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8cad9e609d20aae1ff645c84fd58a02afee70276 Gerrit-Change-Number: 12632 Gerrit-PatchSet: 9 Gerrit-Owner: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Austin Nobis <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Wed, 13 Mar 2019 22:23:10 +0000 Gerrit-HasComments: Yes
