Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12833 )

Change subject: WIP [master] introduced SentryAuthzCache
......................................................................


Patch Set 6:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/12833/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12833/2//COMMIT_MSG@7
PS2, Line 7: WIP [master] introduced SentryAuthzCache
> Yeah, I didn't mean that to say, "We should focus on DML instead of DDL," o
Ah, I see.  Thank you for the reminder and the explanation!


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider-test.cc@211
PS5, Line 211:   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
> Doesn't the superclass do this?
Indeed -- it seems that left from some earlier revision.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider-test.cc@341
PS5, Line 341: // works as expected.
             : class TestAuthzHierarchy :
             :     public SentryAuthzProviderTest,
             :     public ::testing::WithParamInterface<
             :
> I think if a user wanted that behavior, they could disable the cache, no? S
I think it's a good thing that using cache allows to avoid service 
interruptions when Sentry goes down for some reason.  As of now, I think people 
can follow Andrew's suggestion and disable authz caching at all if they want 
'hard failures' when Sentry is not around.

>From the other side, we can introduce some feature where we monitor Sentry's 
>presence and invalidating the cache as soon as we detected Sentry's down and 
>allow to toggle that as an option.  However, I don't see why such a behavior 
>might be desirable at all.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider-test.cc@503
PS5, Line 503:
             :
> Do you intend to extend this with more test cases?
Yep, sure.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.h
File src/kudu/master/sentry_authz_provider.h:

http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.h@30
PS5, Line 30: #include "kudu/sentry/sentry_client.h"
> With some minor rework you can avoid this inclusion, which would be nice be
Right, that's needed for TTLCache declaration.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.h@102
PS5, Line 102:
> function
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc
File src/kudu/master/sentry_authz_provider.cc:

http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@118
PS5, Line 118:
> Nit: "A value"
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@130
PS5, Line 130: }
> The explanation here doesn't match the runtime behavior. AFAICT, finest_sco
Good catch -- it seems I forgot to implement the coarsening of the requested 
scope.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@133
PS5, Line 133:
> Nit: "for a"
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@134
PS5, Line 134:
             :     : metric_entity_(std::move(metric_entity)) {
> Nit: "and the query led to a cache miss,"
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@141
PS5, Line 141:
             : Status SentryAuthzProvider::Start() {
             :   vector<HostPort> addresses;
> Nit: "which means the cache will fetch exactly the authz scope level it was
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@276
PS5, Line 276:
> nit: A
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@280
PS5, Line 280:
> nit: Maybe name it 'user' instead? Or add a comment that this is used as a
Sure, if it makes it clearer then let's use 'user' and 'authorizable' instead.  
For some reason I thought that 'subject' and 'object' are easier to comprehend 
in the context of authorizing some actions.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@294
PS5, Line 294:
> Nit: will
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@295
PS5, Line 295:
> Nit: drop 'the'
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@304
PS5, Line 304:
> Nit: "The same"
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@305
PS5, Line 305:
> privileges
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@307
PS5, Line 307:
             :
             :
             :
             :
             :
             :
             :
> nit: May be slightly more readable starting at 0, eg:
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@367
PS5, Line 367:
> This check seems unnecessary.
I'm just paranoid :)

Removed


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@368
PS5, Line 368:
> Should note that this is an approximation; we're trying to gauge the size o
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@369
PS5, Line 369:
> Do we need to add sizeof(p) as well?
I think that's already accounted at line 368, no?


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@464
PS5, Line 464:
> After discussion with Andrew offline, I agree that we actually should sanit
Thanks for the productive discussion.

I think we will end up caching some search-optimized structures instead of raw 
responses from Sentry, sure.  With that, since it will be necessary to do all 
those transformations anyways, we can do any filtering and sanitize the 
responses as needed.  I think Andrew already posted a patch for that.

Maybe, we can start from that point and see how we can improve that further.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@492
PS5, Line 492:
> Note that these checks are being applied both to the cached and non-cached
I guess Hao already mentioned in one of comments above that it's assumed that 
Sentry sends sane responses, otherwise that must have been a bug that should be 
fixed.  However, after discussion it seems the consensus is that we want to 
sanitize responses since we are about to filter out anything which is not 
Kudu-related anyways.

As for the cached responses for this current approach, we need to do so because 
it might be a response for a higher scope of authz hierarchy and it might 
contain information for other branches which is not relevant.  And even worse 
-- using information from those branches might result in wrong authz decisions. 
 So, it's necessary to filter other unrelated branches out.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@524
PS5, Line 524:
> Isn't this trivially true by reading the 10 lines just above?
Not necessarily.  This verification is meant to spot anomalies like the 
following:

authorizable = /userA/serverS/databaseD
privilege = /userA/serverS///columnC

or

authorizable = /userA/serverS/databaseD
priviledge = /userA/serverS//tableT

I added a comment; hopefully it's clearer now.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@525
PS5, Line 525:
             :
             :
             :
             :
             :
             :
> Nit: more concise as a for loop:
Done


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@601
PS5, Line 601:
> You could also use SentryAUthorizableScope::FromString() and that'll be cas
Done.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@603
PS5, Line 603:
> Did you explore having Put() usage mirror Get()? That is, rather than using
Yeah, that's an interesting idea.  But as I understand that would be 
semantically wrong: inserting information, say, for the authz branch of 
particular table up to the server level will miss info for other branches down 
from the server level, and that's incorrect.

We discussed various caching approaches during our latest authz sync-up meeting 
and it seems we are about to choose among few.  Let's see how it goes.


http://gerrit.cloudera.org:8080/#/c/12833/5/src/kudu/master/sentry_authz_provider.cc@608
PS5, Line 608:
             :
             :
> I don't feel too strongly about it, but I agree and another benefit would b
I figured that if doing proper authz scope coarsening, then keeping two 
separate HasPrivilege() calls makes more sense.

As for direct call to GetSentryPrivileges(), I think it's back and forth 
movements: one of the intermediate version (not published, though) had that, 
but then I moved everything at authz_provider level.  Now it's back in new PS 
for review.



--
To view, visit http://gerrit.cloudera.org:8080/12833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaefacd50736f1f152dae34e76778e17b2e84cbe
Gerrit-Change-Number: 12833
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 02 Apr 2019 01:52:59 +0000
Gerrit-HasComments: Yes

Reply via email to