[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Andrew Wong (Code Review) Sat, 20 Apr 2019 00:31:44 -0700

Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

Patch Set 4:

(6 comments)

Another partial review, will do a fuller pass, but posting in case my comment
on the commit message is helpful to other reviewers.

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8:
: This patch updates the way how the privilege cache in
: SentryAuthzProvider is populated. Prior to this patch, only one
entry
: per sanitized Sentry's response was created. With this patch,
: a response may be split into two entries: one contains SERVER- and
: DATABASE-scope privileges, and another contains TABLE- and
COLUMN-scope
: privileges. Of course, it also changes the lookup process: now
it's
: necessary to search for two entries in the cache if looking up for
: an entry with privileges for an authorizable of the TABLE scope.
:
: The new caching strategy leverages the fact that Sentry includes
: information on privileges granted on authorizables of higher scopes
: in the hierarchy, if any. The new strategy is beneficial in cases
: when a user has privileges granted on DATABASE. In that case, once
: there was a request to authorize an action on a table or a column
: of that table, next request to authorize an action on the database
: itself will hit the cache, avoiding an extra RPC sent to Sentry.
: Another example that benefits from the new caching scheme are
: scenarios like AuthorizeDropTable(tableA) followed by
: AuthorizeCreateTable(tableA).
There are a bunch of details in here, but I think it'd be easier to follow, and
in turn, review the rest of this patch, with roughly the following structure:
1. some context explaining what was wrong with the existing caching strategy
2. a concrete example depicting what was bad
3. what is the high-level approach that fixes this problem?
4. how the problem in #2 will work now.
5. other implications/examples, if any seem appropriate.

Here's my attempt based on my understanding of this patch (feel free to correct
anything that's off):

The API that Kudu uses to fetch privileges from Sentry will return privileges
corresponding to all ancestors and descendants of the requested authorizable in
Sentry's authorizable scope hierarchy tree. For instance, if requesting
database-level privileges on "s.d", the API will return privileges on "s",
"s.d", privileges on all tables within "s.d", and privileges on all columns
therein. This is unsettling because a single database may have many tables and
columns, and we would currently end up caching all of them, even those that may
not be tables and columns stored in Kudu.

Additionally, the existing cache-lookup policy is very simple and doesn't allow
the usage of usable privileges that exist in the cache where it could. For
instance, the existing policy might store the following in the cache:

{ /s/d/t => ["metadata" on /s, "create" on /s/d, "insert" on /s/d/t, "select"
on /s/d/t/c] }

A cache lookup for privileges on /s/d would yield an expensive database-scoped
request to Sentry, even though, if we were smarter about how we structured our
privileges in cache, we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges into two
entries: those for database-scope and above (keyed by database), and those for
table-scope and below (keyed by table). For cache lookups for a given
authorizable, this allows us to use existing privileges for higher scopes from
other authorizables where possible, avoiding expensive calls to Sentry for
server- or database-scoped privileges. Now, for the same set of privileges, the
cache will store:

{ /s/d => ["metadata" on /s, "create" on /s/d],
/s/d/t => ["insert" on /s/d/t, "select" on /s/d/t/c] }

With the above in cache, a lookup on /s/d would thus be able to leverage the
existing entry. The trade-off here is that lookups on /s/d/t will now require
two lookups, one on /s/d and one on /s/d/t, to return the complete lists of
privileges in /s/d/t's hierarchy tree.

Why database and above, and table and below? Why not separate them further?
Based on our current privilege model, we only expect to request privileges at
the database scope (e.g. for CreateTable) or table scope (e.g. OpenTable), so
this separation seems natural while limiting the number of cache lookups as
much as possible.

As a further optimization on cache size, because there aren't constraints that
the privileges in Sentry when requesting privileges at the database scope
correspond to things stored in Kudu, we will now ignore privileges below the
database scope when making such requests when caching.
(maybe this belongs in a separate patch)

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@541
PS2, Line 541: HECK(kScopeDabase.Implies(scope));
: // Not expecting requests for authorizables of scope narrower
than TABLE.
: DCHECK(scope.Implies(kScopeTable));
: // Do not query Sentry for authz scopes narrower than 'TABLE'.
: const auto& key = aggregate_key.GetFlattened
> No, it does not -- those are queries to the cache.
Oof, yeah pretty obvious in retrospect. Rushed code review is bad code review.

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@610
PS2, Line 610: omes for an authorizable of the TABLE scope.
> OK, so far I used Implies(), but I'm thinking of adding comparison operatio
Yeah, I feel you. Maybe it's worth just being explicit and doing
requested_scope == TABLE || requested_scope == COLUMN

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@534
PS4, Line 534: scope(requested_scope);
nit: would you mind reversing the names so that we use requested_scope
throughout? That way it's explicitly obvious when reading below what scope
we're referring to.

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@539
PS4, Line 539: // TODO(aserbin): uncomment when it's guaranteed we are not
getting
: // requests for authorizables of scope wider
than DATABASE.
I think this is already true; we never request privileges at the server scope.

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@545
PS4, Line 545: kScopeTable.scope())
nit: I like these in that it means we don't have to do annoying things like
write out SentryAuthorizableScope(SentryAuthorizableScope::TABLE) all the time,
and I know it should be obvious, but when pulling out the enum like this, IMO
this takes extra cycles to read. Could we keep SentryAuthorizableScope::TABLE
here and L617 and L637?

--
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Apr 2019 07:30:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Reply via email to