Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )
Change subject: [authz] new SentryAuthzProvider's caching strategy ...................................................................... Patch Set 4: (12 comments) Looks good for the most part; I've got a few nits focused around clarity and making parts of the implementation more explicit through comments. http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8 PS4, Line 8: : This patch updates the way how the privilege cache in : SentryAuthzProvider is populated. Prior to this patch, only one entry : per sanitized Sentry's response was created. With this patch, : a response may be split into two entries: one contains SERVER- and : DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope : privileges. Of course, it also changes the lookup process: now it's : necessary to search for two entries in the cache if looking up for : an entry with privileges for an authorizable of the TABLE scope. : : The new caching strategy leverages the fact that Sentry includes : information on privileges granted on authorizables of higher scopes : in the hierarchy, if any. The new strategy is beneficial in cases : when a user has privileges granted on DATABASE. In that case, once : there was a request to authorize an action on a table or a column : of that table, next request to authorize an action on the database : itself will hit the cache, avoiding an extra RPC sent to Sentry. : Another example that benefits from the new caching scheme are : scenarios like AuthorizeDropTable(tableA) followed by : AuthorizeCreateTable(tableA). > Thanks for the detailed explanation. Could we further "micro-optimize" by s I think you're referring to optimizing which branch we copy? I don't think it would make a difference; seems we're copying into an empty branch in a loop. Also, instead of being a part of the commit message, I wonder if this should instead go somewhere in SentryAuthzFetcher. http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc File src/kudu/master/sentry_authz_provider-test.cc: http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729 PS4, Line 729: Status s = sentry_authz_provider_->AuthorizeDropTable("db.t0", kTestUser); : ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString(); : s = sentry_authz_provider_->AuthorizeDropTable("db.t1", kTestUser); : ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString(); : s = sentry_authz_provider_->AuthorizeDropTable("db.other_table", kTestUser); : ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString(); : ASSERT_EQ(3, GetTasksSuccessful()); nit: I think it would be clearer what this is trying to test if we used Authorize() directly on a TABLE authorizable. http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776 PS4, Line 776: // Same story for requests for 'db1.t0', ..., 'db1.t19'. : for (int idx = 0; idx < 20; ++idx) { : const auto table_name = Substitute("db1.t$0", idx); : ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable( : table_name, kTestUser, kTestUser)); : } nit: I think it would be clearer what this is trying to test if we used Authorize() directly on a DATABASE authorizable. http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h File src/kudu/master/sentry_privileges_fetcher.h: http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h@127 PS4, Line 127: // Add/merge privileges from other instance of SentryPrivilegesBranch. It's probably worth documenting that, while we don't explicitly enforce or check for it, we expect that the 'other' branch is logically partitioned from the privileges in this branch. E.g. that this branch has TABLE and COLUMN scoped privileges, and 'other' has DATABASE and SERVER or vice versa. http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc File src/kudu/master/sentry_privileges_fetcher.cc: http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@240 PS4, Line 240: // Generate the 'raw' key sequence: a sequence of keys for the whole authz : // scope hierarchy based on the scope hierarchy in 'authorizable' : // for the specified 'user'. Would be helpful to add an example, especially since this can easily be conflated with GenerateKeySequence(). http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@247 PS4, Line 247: The lookup sequence : // might differ from the 'raw' sequence: it all depends on how the cache : // is populated given sanitized ListPrivileges() Sentry's response. I'm not sure what this means. I think you're trying to be vague in case this changes in the futuer, but I don't think it's worth it. If we want to change the behavior later, we can just change the comment. So could you just document what the current behavior is? What can a user of this function expect to get back and in what order? http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@250 PS4, Line 250: As of now, the cache stores data only under DATABASE- and TABLE-scope keys. I think this means, "The returned key sequence will only contain DATABASE- and TABLE-scoped keys." If so, just say that. Vagueness helps decouple this from implementation, making it slightly easier to change later, but hurts understandability, which I value more in the context of security-related code. http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@255 PS4, Line 255: // Convert Sentry authz scope into index of the raw key sequence. Reword to indicate a bit more why this is useful: Convert the Sentry authz scope to an index in the list: { SERVER, DATABASE, TABLE, COLUMN }. This is useful for generating keys comprised of authorizable names at different scopes. http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@273 PS4, Line 273: if (!scope) { : // The flattened key without scope level restrictions : // is the very last element of the key_sequence_. : return raw_key_sequence_.back(); : } Is this used? http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@315 PS4, Line 315: : // As of now, nit: while this makes this more future proof, it also somewhat clutters comments and distracts from the more important details of what's implemented. http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@329 PS4, Line 329: if (sequence.empty()) { : sequence.emplace_back(raw_sequence.back()); : } This seems to be here in case we request server-level privileges. I don't think it's important? http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@629 PS4, Line 629: From the other side, the information on relevant : // Kudu tables will be fetched as soon as it's requested directly and the : // cache is missing the required entry. In other words, information on : // privileges for authorizables of the SentryAuthorizableScope::TABLE scope : // are fetched from Sentry explicitly. Reword: If we need table-level privileges, we should request them explicitly; upon doing so, if none are found in the cache, we will fetch them to Sentry. -- To view, visit http://gerrit.cloudera.org:8080/13069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608 Gerrit-Change-Number: 13069 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 22 Apr 2019 22:18:49 +0000 Gerrit-HasComments: Yes
