Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11956 )

Change subject: [sentry] add mini Sentry to the external mini cluster
......................................................................


Patch Set 7:

(42 comments)

http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@13
PS2, Line 13: one appro
> nit: this reads a bit like this is what's implemented. Mind changing it to
Done


http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@15
PS2, Line 15:   1. Start the Sentry service. Find out which port it's on. At 
this
> Good explanation, but I'd make two small tweaks to improve readability:
Done


http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@21
PS2, Line 21:
> Nit: where
Done


http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@22
PS2, Line 22: ver, there could be
> Nit: "same port during the restart in step 3."
Done


http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@24
PS2, Line 24: SO_REUSEPORT socket option, which permits multiple sockets to be 
bound
            : to an identical socket address
> Nit: I'd rewrite as "Although both Sentry and HMS are written in Java, only
Done


http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@26
PS2, Line 26: [1]. Meth
> Nit: "they are hacky"
Done


http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@33
PS2, Line 33: collision.
> Wrong link here; didn't you mean the SO_REUSEPORT anchor?
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.h
File src/kudu/hms/mini_hms.h:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.h@54
PS2, Line 54: d EnableSentry(const HostPort& sentry_address,
> nit: doc what 'sentry_service_principal' is? What happens when it's boost::
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc
File src/kudu/hms/mini_hms.cc:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@81
PS2, Line 81: ess_ = sentry_address.ToSt
> nit: maybe add the sentry address?
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@84
PS2, Line 84:
> If you're going to go through the trouble to wrap sentry_service_principal
Sorry for the confusion. I found it is more straightforward to always pass the 
service_principal and use keytab_file as an indicator of if Kerberos is 
enabled, the same as all other places are doing.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@164
PS2, Line 164: if (!wait.ok()) {
> nit: Can you add a comment about why `none` is appropriate here?
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@164
PS2, Line 164:   if (!wait.ok()) {
> Should use /*foo=*/ inline comments to help explain what 'none' means.
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@306
PS2, Line 306:
             :   string sentry_properties;
             :   if (IsAuthorizationEnabled()) {
             :
> Shouldn't this also be conditioned on IsAuthorizationEnabled? Otherwise the
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@318
PS2, Line 318: authorization validation.
             :     //
> I'm not sure I understand this. What is a "read result" in this context?
I am not super clear on this config as well, I cannot find any formal 
documentations on this. By reading the code, it only adds filtering for 
SHOWDATABASES and SHOWTABLES. I will make it more clear based on my 
understanding.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@322
PS2, Line 322: which synchronizes the notification logs with the Sentry
             :     //     service. So that Sentry service is
> Just making sure I understand this, this is configured to ensure the HMS wi
Yeah.

For 'What is considered an authorization metadata operation?', do you mean what 
kinds of operation are considered to consult for authorization metadata?  
Similar to what we do in Kudu, mostly are DDL and DML operations. A complete 
list can be found here: 
https://www.cloudera.com/documentation/enterprise/latest/topics/sentry_privileges_hive_impala.html.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@327
PS2, Line 327:
             :   <name>hive.sentr
> Just making sure I understand this, this is synchronizing authz metadata en
This is to sync events in the HMS with Sentry service, so that Sentry knows 
about the tables has been renamed, or the location of the table has changed. 
And it updates these changes accordingly in Sentry. Update the comment to 
explain in more detail.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@328
PS2, Line 328: ve.sentr
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@328
PS2, Line 328:   <name>hive.sentry.conf.url</name>
> Nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@355
PS2, Line 355:                                          keytab_file_,
> Where is this substituted? I don't see a $7 in kHiveFileTemplate.
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@359
PS2, Line 359:
> HMS
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@359
PS2, Line 359:
> HMS
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@363
PS2, Line 363:     //     server-level privileges in Sentry.
             :     //
> Nit: I'd make this a bit more explicit "Set of service users whose access w
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@366
PS2, Line 366:     //     Set of service users whose access will be excluded 
from
> The indentation here is pretty confusing w.r.t. the IsAuthorizationEnabled
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc
File src/kudu/integration-tests/master_sentry-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@55
PS2, Line 55:   const ExternalMiniCluster::BindMode bind_mode;
> Hrm, where is this used?
This is used in L83-84 and L86-L87.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@86
PS2, Line 86: macOS, it's not possible to have unique loopback interfaces.
            : // Thus, binds to loopback ip address "127.0.0.1".
> Mind commenting what the behavior for this test is for mac?
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@103
PS2, Line 103: chema::INT32);
> nit: maybe pull this out into a kTableIdentifier?
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@117
PS2, Line 117:
             :   thrift::ClientOptions hms_client_opts;
             :   const auto& params = GetParam();
             :   if (params.enable_kerberos) {
             :     // Create a principal 'kudu' and configure to use it.
             :     ASSERT_OK(cluster_->kdc()->CreateUserPrincipal("kudu"));
             :     ASSERT_OK(cluster_->kdc()->Kinit("kudu"));
             :     ASSERT_OK(cluster_->kdc()->SetKrb5Environment());
             :     hms_client_opts.enable_kerberos = true;
             :     hms_client_opts.service_principal = "hive";
             :   }
             :   hms::HmsClient hms_client(cluster_->hms()->address(), 
hms_client_opts);
             :   ASSERT_OK(hms_client.Start());
> Are there authz metadata entries we should be checking here?
I don't think that is necessary. The main reason to check the HMS entries it to 
ensure metadata are in sync between Kudu and the HMS.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h
File src/kudu/mini-cluster/external_mini_cluster.h:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@159
PS2, Line 159:   //
> This should probably be reflected in the CLI tool, in the control shell.
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@159
PS2, Line 159:   //
> nit: Is there a default worth mentioning here?
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@160
PS2, Line 160:   // Default: false.
> Will we need a SentryMode to support all three cases? That is, "no sentry",
We need tri mode in HMS integration because in certain test cases, we need to 
make some updates in HMS but do not want them to be reflected in Kudu. I don't 
see the same needs for Sentry integration tests so far.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@225
PS2, Line 225:
> Nit: provide an example of what an "external server" is (like kdc or whatev
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@419
PS2, Line 419:
> I'm a little confused about this variable. Usually an index is a pointer in
Yeah, my bad, this is a track the index used for GetBindIpForExternalServer. 
Though I will remove it based on Adar's suggestions.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@419
PS2, Line 419:
             :   DISALLOW_COPY_AND_ASSIGN(ExternalMiniCluster)
> Seems like overkill for now, given that Sentry (and maybe HMS) are the only
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.cc
File src/kudu/mini-cluster/external_mini_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.cc@257
PS2, Line 257:     }
> You can avoid the restart altogether:
Unfortunately, this is not an option. As surprisedly, HMS always does wildcard 
binding and doesn't provide an option to bind to a specific host, 
https://issues.apache.org/jira/browse/HIVE-18998.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.cc@226
PS2, Line 226: // with the other's IP/port.
             :   // 1. Pick a bind IP and start the Sentry service with the 
picked IP and
             :   //    port 0. Find out which port it's on by polling lsof.
             :   // 2. Start the HMS, configured to talk to the Sentry service. 
Find out
             :   //    which port it's on.
             :   // 3. Restart the Sentry service on the same port, 
reconfigured to talk
             :   //    to the HMS's port.
             :   // Note that the restart step can be avoid if the HMS provides 
an option
             :   // to bind to a specific IP (HIVE-18998).
             :
             :   // Start Sentry.
             :   if (opts_.enable_sentry) {
             :     string host = GetBindIpForExternalServer(0);
             :     HostPort sentry_address(host, 0);
             :     RETURN_NOT_OK_PREPEND(StartSentry(sentry_address),
             :                           "Failed to start the Sentry service");
             :   }
             :
             :   // Start the HMS.
             :   if (opts_.hms_mode == HmsMode::ENABLE_HIVE_METASTORE ||
             :       opts_.hms_mode == HmsMode::ENABLE_METASTORE_INTEGRATION) {
             :     hms_.reset(new hms::MiniHms());
             :     hms_->SetDataRoot(opts_.cluster_root);
             :
             :     if (opts_.enable_kerberos) {
             :       string spn = Substitute("hive/$0", hms_->address().host());
             :       string ktpath;
             :       RETURN_NOT_OK_PREPEND(kdc_->CreateServiceKeytab(spn, 
&ktpath),
             :                             "could not create keytab");
             :       hms_->EnableKerberos(kdc_->GetEnvVars()["KRB5_CONFIG"], 
spn, ktpath,
             :                            rpc::SaslProtection::kAuthentication);
             :     }
             :
             :     if (opts_.enable_sentry) {
             :       string sentry_spn = Substitute("sentry/$0...@krbtest.com", 
sentry_->address().host());
             :       hms_->EnableSentry(sentry_->address(), sentry_spn);
             :     }
             :
             :     RETURN_NOT_OK_PREPEND(hms_->Start(),
             :
> You kind of allude to what this is doing in the commit message, but I think
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/mini_cluster.cc
File src/kudu/mini-cluster/mini_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/mini_cluster.cc@53
PS2, Line 53:       // Partition the index space 'kServersMaxNum' into three 
portions, one for each
            :       // daemon type, to get unique address. If a daemon index 
spans over the portion
            :       // reserved for another type, then duplicate address can be 
generated. Though this
            :       // should be enough for our current tests.
            :       switch (type) {
            :         case MASTER:
            :           idx = kServersMaxNum - index;
            :           break;
            :         case TSERVER:
            :           idx = index + 1;
            :           break;
            :         case EXTERNAL_SERVER:
            :
> Seems like the goal of this is to partition the 256k "index space" into thr
In that case, I think duplicate IP can be generated.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.h
File src/kudu/sentry/mini_sentry.h:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.h@93
PS2, Line 93:
> Does this inline initialization in the class declaration work? I thought it
Yeah, in class member initialization is allowed in C++11, 
https://isocpp.org/wiki/faq/cpp11-language-classes#member-init.


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.cc
File src/kudu/sentry/mini_sentry.cc:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.cc@251
PS2, Line 251:
> Doc this.
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.h
File src/kudu/util/test_util.h:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.h@139
PS2, Line 139:
> Should doc the new field.
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc
File src/kudu/util/test_util.cc:

http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc@410
PS2, Line 410:   // The first line is the pid. We ignore it.
> Should probably add that in each pair, we ignore the first half of the pair
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc@415
PS2, Line 415:  ad
> Nit: "we use", to maintain the same voice as in the beginning of this parag
Done


http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc@416
PS2, Line 416:
> Nit: matches
Done



--
To view, visit http://gerrit.cloudera.org:8080/11956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f02e6085bd239570d10ec629f48856d37ed6e59
Gerrit-Change-Number: 11956
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 10 Dec 2018 02:11:46 +0000
Gerrit-HasComments: Yes

Reply via email to