Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11956 )
Change subject: [sentry] add mini Sentry to the external mini cluster ...................................................................... Patch Set 7: (42 comments) http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@13 PS2, Line 13: one appro > nit: this reads a bit like this is what's implemented. Mind changing it to Done http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@15 PS2, Line 15: 1. Start the Sentry service. Find out which port it's on. At this > Good explanation, but I'd make two small tweaks to improve readability: Done http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@21 PS2, Line 21: > Nit: where Done http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@22 PS2, Line 22: ver, there could be > Nit: "same port during the restart in step 3." Done http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@24 PS2, Line 24: SO_REUSEPORT socket option, which permits multiple sockets to be bound : to an identical socket address > Nit: I'd rewrite as "Although both Sentry and HMS are written in Java, only Done http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@26 PS2, Line 26: [1]. Meth > Nit: "they are hacky" Done http://gerrit.cloudera.org:8080/#/c/11956/2//COMMIT_MSG@33 PS2, Line 33: collision. > Wrong link here; didn't you mean the SO_REUSEPORT anchor? Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.h File src/kudu/hms/mini_hms.h: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.h@54 PS2, Line 54: d EnableSentry(const HostPort& sentry_address, > nit: doc what 'sentry_service_principal' is? What happens when it's boost:: Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc File src/kudu/hms/mini_hms.cc: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@81 PS2, Line 81: ess_ = sentry_address.ToSt > nit: maybe add the sentry address? Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@84 PS2, Line 84: > If you're going to go through the trouble to wrap sentry_service_principal Sorry for the confusion. I found it is more straightforward to always pass the service_principal and use keytab_file as an indicator of if Kerberos is enabled, the same as all other places are doing. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@164 PS2, Line 164: if (!wait.ok()) { > nit: Can you add a comment about why `none` is appropriate here? Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@164 PS2, Line 164: if (!wait.ok()) { > Should use /*foo=*/ inline comments to help explain what 'none' means. Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@306 PS2, Line 306: : string sentry_properties; : if (IsAuthorizationEnabled()) { : > Shouldn't this also be conditioned on IsAuthorizationEnabled? Otherwise the Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@318 PS2, Line 318: authorization validation. : // > I'm not sure I understand this. What is a "read result" in this context? I am not super clear on this config as well, I cannot find any formal documentations on this. By reading the code, it only adds filtering for SHOWDATABASES and SHOWTABLES. I will make it more clear based on my understanding. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@322 PS2, Line 322: which synchronizes the notification logs with the Sentry : // service. So that Sentry service is > Just making sure I understand this, this is configured to ensure the HMS wi Yeah. For 'What is considered an authorization metadata operation?', do you mean what kinds of operation are considered to consult for authorization metadata? Similar to what we do in Kudu, mostly are DDL and DML operations. A complete list can be found here: https://www.cloudera.com/documentation/enterprise/latest/topics/sentry_privileges_hive_impala.html. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@327 PS2, Line 327: : <name>hive.sentr > Just making sure I understand this, this is synchronizing authz metadata en This is to sync events in the HMS with Sentry service, so that Sentry knows about the tables has been renamed, or the location of the table has changed. And it updates these changes accordingly in Sentry. Update the comment to explain in more detail. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@328 PS2, Line 328: ve.sentr > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@328 PS2, Line 328: <name>hive.sentry.conf.url</name> > Nit: indentation Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@355 PS2, Line 355: keytab_file_, > Where is this substituted? I don't see a $7 in kHiveFileTemplate. Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@359 PS2, Line 359: > HMS Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@359 PS2, Line 359: > HMS Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@363 PS2, Line 363: // server-level privileges in Sentry. : // > Nit: I'd make this a bit more explicit "Set of service users whose access w Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/hms/mini_hms.cc@366 PS2, Line 366: // Set of service users whose access will be excluded from > The indentation here is pretty confusing w.r.t. the IsAuthorizationEnabled Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc File src/kudu/integration-tests/master_sentry-itest.cc: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@55 PS2, Line 55: const ExternalMiniCluster::BindMode bind_mode; > Hrm, where is this used? This is used in L83-84 and L86-L87. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@86 PS2, Line 86: macOS, it's not possible to have unique loopback interfaces. : // Thus, binds to loopback ip address "127.0.0.1". > Mind commenting what the behavior for this test is for mac? Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@103 PS2, Line 103: chema::INT32); > nit: maybe pull this out into a kTableIdentifier? Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/integration-tests/master_sentry-itest.cc@117 PS2, Line 117: : thrift::ClientOptions hms_client_opts; : const auto& params = GetParam(); : if (params.enable_kerberos) { : // Create a principal 'kudu' and configure to use it. : ASSERT_OK(cluster_->kdc()->CreateUserPrincipal("kudu")); : ASSERT_OK(cluster_->kdc()->Kinit("kudu")); : ASSERT_OK(cluster_->kdc()->SetKrb5Environment()); : hms_client_opts.enable_kerberos = true; : hms_client_opts.service_principal = "hive"; : } : hms::HmsClient hms_client(cluster_->hms()->address(), hms_client_opts); : ASSERT_OK(hms_client.Start()); > Are there authz metadata entries we should be checking here? I don't think that is necessary. The main reason to check the HMS entries it to ensure metadata are in sync between Kudu and the HMS. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h File src/kudu/mini-cluster/external_mini_cluster.h: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@159 PS2, Line 159: // > This should probably be reflected in the CLI tool, in the control shell. Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@159 PS2, Line 159: // > nit: Is there a default worth mentioning here? Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@160 PS2, Line 160: // Default: false. > Will we need a SentryMode to support all three cases? That is, "no sentry", We need tri mode in HMS integration because in certain test cases, we need to make some updates in HMS but do not want them to be reflected in Kudu. I don't see the same needs for Sentry integration tests so far. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@225 PS2, Line 225: > Nit: provide an example of what an "external server" is (like kdc or whatev Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@419 PS2, Line 419: > I'm a little confused about this variable. Usually an index is a pointer in Yeah, my bad, this is a track the index used for GetBindIpForExternalServer. Though I will remove it based on Adar's suggestions. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.h@419 PS2, Line 419: : DISALLOW_COPY_AND_ASSIGN(ExternalMiniCluster) > Seems like overkill for now, given that Sentry (and maybe HMS) are the only Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.cc File src/kudu/mini-cluster/external_mini_cluster.cc: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.cc@257 PS2, Line 257: } > You can avoid the restart altogether: Unfortunately, this is not an option. As surprisedly, HMS always does wildcard binding and doesn't provide an option to bind to a specific host, https://issues.apache.org/jira/browse/HIVE-18998. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/external_mini_cluster.cc@226 PS2, Line 226: // with the other's IP/port. : // 1. Pick a bind IP and start the Sentry service with the picked IP and : // port 0. Find out which port it's on by polling lsof. : // 2. Start the HMS, configured to talk to the Sentry service. Find out : // which port it's on. : // 3. Restart the Sentry service on the same port, reconfigured to talk : // to the HMS's port. : // Note that the restart step can be avoid if the HMS provides an option : // to bind to a specific IP (HIVE-18998). : : // Start Sentry. : if (opts_.enable_sentry) { : string host = GetBindIpForExternalServer(0); : HostPort sentry_address(host, 0); : RETURN_NOT_OK_PREPEND(StartSentry(sentry_address), : "Failed to start the Sentry service"); : } : : // Start the HMS. : if (opts_.hms_mode == HmsMode::ENABLE_HIVE_METASTORE || : opts_.hms_mode == HmsMode::ENABLE_METASTORE_INTEGRATION) { : hms_.reset(new hms::MiniHms()); : hms_->SetDataRoot(opts_.cluster_root); : : if (opts_.enable_kerberos) { : string spn = Substitute("hive/$0", hms_->address().host()); : string ktpath; : RETURN_NOT_OK_PREPEND(kdc_->CreateServiceKeytab(spn, &ktpath), : "could not create keytab"); : hms_->EnableKerberos(kdc_->GetEnvVars()["KRB5_CONFIG"], spn, ktpath, : rpc::SaslProtection::kAuthentication); : } : : if (opts_.enable_sentry) { : string sentry_spn = Substitute("sentry/$0...@krbtest.com", sentry_->address().host()); : hms_->EnableSentry(sentry_->address(), sentry_spn); : } : : RETURN_NOT_OK_PREPEND(hms_->Start(), : > You kind of allude to what this is doing in the commit message, but I think Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/mini_cluster.cc File src/kudu/mini-cluster/mini_cluster.cc: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/mini-cluster/mini_cluster.cc@53 PS2, Line 53: // Partition the index space 'kServersMaxNum' into three portions, one for each : // daemon type, to get unique address. If a daemon index spans over the portion : // reserved for another type, then duplicate address can be generated. Though this : // should be enough for our current tests. : switch (type) { : case MASTER: : idx = kServersMaxNum - index; : break; : case TSERVER: : idx = index + 1; : break; : case EXTERNAL_SERVER: : > Seems like the goal of this is to partition the 256k "index space" into thr In that case, I think duplicate IP can be generated. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.h File src/kudu/sentry/mini_sentry.h: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.h@93 PS2, Line 93: > Does this inline initialization in the class declaration work? I thought it Yeah, in class member initialization is allowed in C++11, https://isocpp.org/wiki/faq/cpp11-language-classes#member-init. http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.cc File src/kudu/sentry/mini_sentry.cc: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/sentry/mini_sentry.cc@251 PS2, Line 251: > Doc this. Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.h File src/kudu/util/test_util.h: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.h@139 PS2, Line 139: > Should doc the new field. Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc File src/kudu/util/test_util.cc: http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc@410 PS2, Line 410: // The first line is the pid. We ignore it. > Should probably add that in each pair, we ignore the first half of the pair Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc@415 PS2, Line 415: ad > Nit: "we use", to maintain the same voice as in the beginning of this parag Done http://gerrit.cloudera.org:8080/#/c/11956/2/src/kudu/util/test_util.cc@416 PS2, Line 416: > Nit: matches Done -- To view, visit http://gerrit.cloudera.org:8080/11956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f02e6085bd239570d10ec629f48856d37ed6e59 Gerrit-Change-Number: 11956 Gerrit-PatchSet: 7 Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 10 Dec 2018 02:11:46 +0000 Gerrit-HasComments: Yes