Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 4:

(12 comments)

Looks good for the most part; I've got a few nits focused around clarity and 
making parts of the implementation more explicit through comments.

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8:
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one 
entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains SERVER- and
           : DATABASE-scope privileges, and another contains TABLE- and 
COLUMN-scope
           : privileges.  Of course, it also changes the lookup process: now 
it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the TABLE scope.
           :
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on DATABASE.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(tableA).
> Thanks for the detailed explanation. Could we further "micro-optimize" by s
I think you're referring to optimizing which branch we copy? I don't think it 
would make a difference; seems we're copying into an empty branch in a loop.

Also, instead of being a part of the commit message, I wonder if this should 
instead go somewhere in SentryAuthzFetcher.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729
PS4, Line 729:   Status s = sentry_authz_provider_->AuthorizeDropTable("db.t0", 
kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.t1", 
kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = 
sentry_authz_provider_->AuthorizeDropTable("db.other_table", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   ASSERT_EQ(3, GetTasksSuccessful());
nit: I think it would be clearer what this is trying to test if we used 
Authorize() directly on a TABLE authorizable.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776
PS4, Line 776:   // Same story for requests for 'db1.t0', ..., 'db1.t19'.
             :   for (int idx = 0; idx < 20; ++idx) {
             :     const auto table_name = Substitute("db1.t$0", idx);
             :     ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable(
             :         table_name, kTestUser, kTestUser));
             :   }
nit: I think it would be clearer what this is trying to test if we used 
Authorize() directly on a DATABASE authorizable.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h
File src/kudu/master/sentry_privileges_fetcher.h:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h@127
PS4, Line 127:   // Add/merge privileges from other instance of 
SentryPrivilegesBranch.
It's probably worth documenting that, while we don't explicitly enforce or 
check for it, we expect that the 'other' branch is logically partitioned from 
the privileges in this branch. E.g. that this branch has TABLE and COLUMN 
scoped privileges, and 'other' has DATABASE and SERVER or vice versa.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@240
PS4, Line 240:   // Generate the 'raw' key sequence: a sequence of keys for the 
whole authz
             :   // scope hierarchy based on the scope hierarchy in 
'authorizable'
             :   // for the specified 'user'.
Would be helpful to add an example, especially since this can easily be 
conflated with GenerateKeySequence().


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@247
PS4, Line 247:  The lookup sequence
             :   // might differ from the 'raw' sequence: it all depends on how 
the cache
             :   // is populated given sanitized ListPrivileges() Sentry's 
response.
I'm not sure what this means. I think you're trying to be vague in case this 
changes in the futuer, but I don't think it's worth it. If we want to change 
the behavior later, we can just change the comment. So could you just document 
what the current behavior is? What can a user of this function expect to get 
back and in what order?


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@250
PS4, Line 250: As of now, the cache stores data only under DATABASE- and 
TABLE-scope keys.
I think this means, "The returned key sequence will only contain DATABASE- and 
TABLE-scoped keys." If so, just say that. Vagueness helps decouple this from 
implementation, making it slightly easier to change later, but hurts 
understandability, which I value more in the context of security-related code.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@255
PS4, Line 255: // Convert Sentry authz scope into index of the raw key sequence.
Reword to indicate a bit more why this is useful:

Convert the Sentry authz scope to an index in the list: { SERVER, DATABASE, 
TABLE, COLUMN }. This is useful for generating keys comprised of authorizable 
names at different scopes.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@273
PS4, Line 273:   if (!scope) {
             :     // The flattened key without scope level restrictions
             :     // is the very last element of the key_sequence_.
             :     return raw_key_sequence_.back();
             :   }
Is this used?


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@315
PS4, Line 315:
             : // As of now,
nit: while this makes this more future proof, it also somewhat clutters 
comments and distracts from the more important details of what's implemented.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@329
PS4, Line 329:   if (sequence.empty()) {
             :     sequence.emplace_back(raw_sequence.back());
             :   }
This seems to be here in case we request server-level privileges. I don't think 
it's important?


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@629
PS4, Line 629:  From the other side, the information on relevant
             :     // Kudu tables will be fetched as soon as it's requested 
directly and the
             :     // cache is missing the required entry. In other words, 
information on
             :     // privileges for authorizables of the 
SentryAuthorizableScope::TABLE scope
             :     // are fetched from Sentry explicitly.
Reword:

If we need table-level privileges, we should request them explicitly; upon 
doing so, if none are found in the cache, we will fetch them to Sentry.



--
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 22 Apr 2019 22:18:49 +0000
Gerrit-HasComments: Yes

Reply via email to