[kudu-CR] KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8312 ) Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration .. Patch Set 3: (14 comments) This is going to be significantly reworked in the near future. Leaving open for now because I do want to integrate the feedback. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc File src/kudu/integration-tests/master_hms-itest.cc: http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@58 PS2, Line 58: using hms::HmsClient; > warning: using decl 'KuduTable' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@65 PS2, Line 65: class MasterHmsTest : public KuduTest { > warning: using decl 'MiniHms' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@67 PS2, Line 67: public: > warning: using decl 'set' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@71 PS2, Line 71: void SetUp() override { > warning: using decl 'Split' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@72 PS2, Line 72: KuduTest::SetUp(); > warning: using decl 'Substitute' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@112 PS2, Line 112: > warning: passing result of std::move() as a const reference argument; no mo ??? http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc File src/kudu/master/hms_catalog.cc: http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@43 PS2, Line 43: using strings::CharSet; > warning: using decl 'function' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@47 PS2, Line 47: > warning: using decl 'Substitute' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@54 PS2, Line 54: > warning: initializer for member 'lock_' is redundant [readability-redundant Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@161 PS2, Line 161: rpc.callback = s.AsStdStatusCallback(); > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@182 PS2, Line 182: table->parameters = { > warning: parameter 'schema' is unused [misc-unused-parameters] this will be used in a later rev once were handling columns correctly. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@250 PS2, Line 250: } > warning: the parameter 's' is copied for each invocation but only used as a Done http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc File src/kudu/master/hms_catalog.cc: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@203 PS3, Line 203: for (int idx = 0; idx < table.size(); idx++) { > I feel like this code could be written more concisely using some of the stu perhaps we should just do the split on '.', and send the two parts to hive to validate. We would probably just screw it up if we tried to recreate their rules. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/util/net/net_util.h File src/kudu/util/net/net_util.h: http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/util/net/net_util.h@133 PS2, Line 133: bool ValidateAddressListFlag(const char* flag_name, const std::string& addr_list); > warning: function 'kudu::ValidateAddressListFlag' has a definition with dif Done -- To view, visit http://gerrit.cloudera.org:8080/8312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf Gerrit-Change-Number: 8312 Gerrit-PatchSet: 3 Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 13 Nov 2017 22:54:33 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8312 ) Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration .. Patch Set 3: (11 comments) curious about timeouts - I dont see any timeouts set here but I'm guessing they're necessary (what if you kill -STOP the minihms, what happens to our calls?) http://gerrit.cloudera.org:8080/#/c/8312/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8312/3//COMMIT_MSG@23 PS3, Line 23: master_hms-itest before this is committed: any plan for fault tests? eg crashing either the HMS or the master at some point in the middle of these sequences? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/catalog_manager.cc@231 PS3, Line 231: "Addresses of Hive Metastore"); a bit more docs here, eg what the multiple addrs mean, etc. Also does the HMS expose more than one port? (eg an HTTP interface?) If so we should clarify that it's the Thrift addr http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/catalog_manager.cc@1419 PS3, Line 1419: SetupError(resp->mutable_error(), MasterErrorPB::HIVE_METASTORE_ERROR, s); I think it's worth logging such errors as well, since it may be an end user that gets the response, and then they'll probably complain to the admin who will look in the logs for more details. We should probably be especially verbose in these logs because some failures at this point can cause orphaned pointers in the HMS (eg a timeout may be thrown even if the op succeeded) (same below) http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc File src/kudu/master/hms_catalog.cc: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@88 PS3, Line 88:const Schema& schema) { worth a CHECK(running_) in these functions? otherwise it seems like they would just hang forever http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@198 PS3, Line 198: "Kudu table name must contain only lower-case ASCII characters, " no digits allowed? that's surprising http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@203 PS3, Line 203: for (int idx = 0; idx < table.size(); idx++) { I feel like this code could be written more concisely using some of the stuff from gutil/strings/util.h. For example, you could use Split() on '.' and then ensure that it has 1 or 2 results, and IsIdentifier() returns true for all parts. Or use AdvanceIdentifier up to twice? Or match the whole string against a class that includes '.' and then use strcount('.') to see that there is at most one dot? I think the 'split' is probably most straight forward http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@228 PS3, Line 228: for (const HostPort& address : addresses_) { I'd think we might need to store an index in the class so that we swap to the new HMS when one has an issue. Otherwise what if an HMS is "up" (ie accepts thrift connections) but times out all calls, or "up" but for some reason is unable to talk to its backing RDBMS or whatever? I'd guess we'd want to reconnect to the other one. http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@289 PS3, Line 289: cond_.TimedWait(MonoDelta::FromMilliseconds(wait)); worth some logging here? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@304 PS3, Line 304: Status s = rpc.run(_.get()); I think wrapping this in a TRACE_EVENT scope and a LOG_IF_SLOW and such would be helpful. Perhaps a TRACE message and/or a TRACE_COUNTER_INCREMENT too to keep track of how much time is spent in calls to HMS for a given CreateTable call. That will also require propagating the current Trace object into the 'Rpc' object. If you'd rather defer this work please add a TODO. I'm happy to do it if you want to assign me the todo http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@308 PS3, Line 308: client_ = boost::none; is there a more explicit Shutdown we should do on the client? or does calling the dtor like this suffice? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/util/net/net_util.h File src/kudu/util/net/net_util.h: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/util/net/net_util.h@133 PS3, Line 133: bool ValidateAddressListFlag(const char* flag_name, const std::string& addr_list); doc this? specifically I think this is meant to be used as a gflag VALIDATOR and thus the somewhat odd signature -- To view, visit http://gerrit.cloudera.org:8080/8312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment
[kudu-CR] KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8312 ) Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration .. Patch Set 3: (36 comments) http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/CMakeLists.txt@79 PS2, Line 79: ADD_KUDU_TEST(master_hms-itest) Nit: sort order. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc File src/kudu/integration-tests/master_hms-itest.cc: http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@75 PS2, Line 75: e = true Maybe use ExternalMiniClusterITestBase? http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@79 PS2, Line 79: ASSERT_OK(cluster_->Start()); Can this be omitted? http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@144 PS2, Line 144: ASSERT_OK(hms_client_->GetAllTables(hms_database_name, )); I think you can use ContainsKey here, from map-util.h. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@155 PS2, Line 155: } // namespace kudu ContainsKey here too. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.h@33 PS2, Line 33: #include Not used? http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.cc@231 PS2, Line 231: "Addresses of Hive Metastore"); Should probably beef this up a bit to explain what setting this actually does. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.cc@634 PS2, Line 634: CHECK_OK(HostPort::ParseStrings(FLAGS_hive_metastore_addresses, 9083, )); 9083 is the default HMS port? Probably deserves more visibility than inlined in catalog_manager.cc. Maybe as a constant in hms_client.h? Also, could you add a comment justifying the CHECK_OK? It's guaranteed because of the gflag validator, right? http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.cc@1066 PS2, Line 1066: if (hms_catalog_) { The order of operations in CatalogManager::Shutdown is tricky and has bitten us before in various ways. How did you arrive at this precise placement of stopping hms_catalog? What constraints are there (if any) about where it should happen? http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.cc@1415 PS2, Line 1415: // Create the table in the HMS. It's worth noting what happens to this table if step e fails. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.cc@1598 PS2, Line 1598: if (hms_catalog_) { Why is it OK if this succeeds but step 3 fails? I can take my answer in the form of a code comment. http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/catalog_manager.cc@2092 PS2, Line 2092: Status s = hms_catalog_->AlterTable(table->id(), table_name, new_name, new_schema); Shouldn't some aspect of this be conditioned on one (or several) of has_foo_changes? For example, if neither the table name nor the schema have changed, why send the alter to HMS? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.h File src/kudu/master/hms_catalog.h: http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.h@29 PS3, Line 29: #include "kudu/hms/hms_client.h" Seems like you may be able to forward declare HmsClient. http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.h@42 PS3, Line 42: struct Rpc { Maybe define this as a private nested class of HmsCatalog, to make it clear that public users of HmsCatalog aren't expected to use it? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.h@51 PS3, Line 51: class HmsCatalog { Please doc the class and its methods. I'm especially interested in the synchronization, the reason for a dedicated thread, etc. http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.h@70 PS3, Line 70: Schema Can you get away with forward declaring Schema? http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.h@90 PS3, Line 90: // Only mutated by the worker thread. Maybe it shouldn't be a class member then? It could be declared on the stack in Run(). http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.h@91 PS3, Line 91: boost::optional client_; Worth a comment explaining why this is optional. http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc File src/kudu/master/hms_catalog.cc:
[kudu-CR] KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8312 to look at the new patch set (#3). Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration .. KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration This commit adds a basic integration with the Hive Metastore, which can be enabled with a new flag: 'hive-metastore-addresses'. When the flag is provided, the Master will push DDL changes in its catalog to the Hive Metastore. In particular: - Create table, delete table, and alter table operations in the Kudu master will synchronously update the corresponding entry in the HMS. - New table and column names are required to be valid according to the Hive identifier rules, which are much stricter than Kudu's native identifier rules. WIP: This patch doesn't yet have sufficient error handling logic, nor test coverage of errors. Here are the test cases I intend to add to master_hms-itest before this is committed: - Rename a Kudu table to a new name which already has an entry in the HMS (should fail). - Rename a Kudu table when the original name doesn't have an entry in the HMS (should create an entry with the new name). - Rename a Kudu table when the HMS is unreachable (should fail). - Create a Kudu table with a name which already exists in the HMS (should fail). - Create a Kudu table when the HMS is unreachable (should fail). - Drop a Kudu table when the corresponding entry in the HMS does not exist (should succeed). - Drop a Kudu table when the HMS is unreachable (should fail?). Additionally, this patch as written overwrites the entire HMS table entry during table rename operations. This is bad because it will wipe out any stats that query engines like Impala have attached to the table. Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf --- M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/master_hms-itest.cc M src/kudu/master/CMakeLists.txt M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h A src/kudu/master/hms_catalog.cc A src/kudu/master/hms_catalog.h M src/kudu/master/master.proto M src/kudu/mini-cluster/external_mini_cluster.cc M src/kudu/util/net/net_util.cc M src/kudu/util/net/net_util.h 11 files changed, 649 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/8312/3 -- To view, visit http://gerrit.cloudera.org:8080/8312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf Gerrit-Change-Number: 8312 Gerrit-PatchSet: 3 Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8312 to look at the new patch set (#2). Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration .. KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration This commit adds a basic integration with the Hive Metastore, which can be enabled with a new flag: 'hive-metastore-addresses'. When the flag is provided, the Master will push DDL changes in its catalog to the Hive Metastore. In particular: - Create table, delete table, and alter table operations in the Kudu master will synchronously update the corresponding entry in the HMS. - New table and column names are required to be valid according to the Hive identifier rules, which are much stricter than Kudu's native identifier rules. WIP: This patch doesn't yet have sufficient error handling logic, nor test coverage of errors. Here are the test cases I intend to add to master_hms-itest before this is committed: - Rename a Kudu table to a new name which already has an entry in the HMS (should fail). - Rename a Kudu table when the original name doesn't have an entry in the HMS (should create an entry with the new name). - Rename a Kudu table when the HMS is unreachable (should fail). - Create a Kudu table with a name which already exists in the HMS (should fail). - Create a Kudu table when the HMS is unreachable (should fail). - Drop a Kudu table when the corresponding entry in the HMS does not exist (should succeed). - Drop a Kudu table when the HMS is unreachable (should fail?). Additionally, this patch as written overwrites the entire HMS table entry during table rename operations. This is bad because it will wipe out any stats that query engines like Impala have attached to the table. Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf --- M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/master_hms-itest.cc M src/kudu/master/CMakeLists.txt M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h A src/kudu/master/hms_catalog.cc A src/kudu/master/hms_catalog.h M src/kudu/master/master.proto M src/kudu/mini-cluster/external_mini_cluster.cc M src/kudu/util/net/net_util.cc M src/kudu/util/net/net_util.h 11 files changed, 664 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/8312/2 -- To view, visit http://gerrit.cloudera.org:8080/8312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf Gerrit-Change-Number: 8312 Gerrit-PatchSet: 2 Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration
Dan Burkert has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8312 Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration .. KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration This commit adds a basic integration with the Hive Metastore, which can be enabled with a new flag: 'hive-metastore-addresses'. When the flag is provided, the Master will push DDL changes in its catalog to the Hive Metastore. In particular: - Create table, delete table, and alter table operations in the Kudu master will synchronously update the corresponding entry in the HMS. - New table and column names are required to be valid according to the Hive identifier rules, which are much stricter than Kudu's native identifier rules. WIP: This patch doesn't yet have sufficient error handling logic, nor test coverage of errors. Here are the test cases I intend to add to master_hms-itest before this is committed: - Rename a Kudu table to a new name which already has an entry in the HMS (should fail). - Rename a Kudu table when the original name doesn't have an entry in the HMS (should create an entry with the new name). - Rename a Kudu table when the HMS is unreachable (should fail). - Create a Kudu table with a name which already exists in the HMS (should fail). - Create a Kudu table when the HMS is unreachable (should fail). - Drop a Kudu table when the corresponding entry in the HMS does not exist (should succeed). - Drop a Kudu table when the HMS is unreachable (should fail?). Additionally, this patch as written overwrites the entire HMS table entry during table rename operations. This is bad because it will wipe out any stats that query engines like Impala have attached to the table. Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf --- M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/master_hms-itest.cc M src/kudu/master/CMakeLists.txt M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h A src/kudu/master/hms_catalog.cc A src/kudu/master/hms_catalog.h M src/kudu/master/master.proto M src/kudu/mini-cluster/external_mini_cluster.cc M src/kudu/util/net/net_util.cc M src/kudu/util/net/net_util.h 11 files changed, 664 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/8312/1 -- To view, visit http://gerrit.cloudera.org:8080/8312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf Gerrit-Change-Number: 8312 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Burkert