Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18866 )
Change subject: [master] KUDU-3322 / KUDU-3319 HMS event logging ...................................................................... Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/18866/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18866/2//COMMIT_MSG@15 PS2, Line 15: Was hesitant on crashing the : master and chose to log an ERROR as in the case an out of order event : id received from HMS might lead to unnecessary crash of the Kudu : master What parts of the system are still functional after this happening? I guess DDL operations do not work after this, but reading/writing data from existing tables is still possible, right? Related question: what is the way to recover from this? Should there be a way to reset the latest HMS event identifier stored in the catalog in such case? http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/hms/hms_catalog.h File src/kudu/hms/hms_catalog.h: http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/hms/hms_catalog.h@147 PS2, Line 147: Status GetCurrentNotificationEventId(int64_t* event_id) WARN_UNUSED_RESULT; nit: could this method be constant? http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/hms/mini_hms.h File src/kudu/hms/mini_hms.h: http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/hms/mini_hms.h@75 PS2, Line 75: // Delete the HMS database directory nit: follow the style of the comments in the rest of this file -- add a period in the end. http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/hms/mini_hms.cc File src/kudu/hms/mini_hms.cc: http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/hms/mini_hms.cc@188 PS2, Line 188: RETURN_NOT_OK(Env::Default()->DeleteRecursively(JoinPathSegments(data_root_, metadb_subdir_))); : return Status::OK(); nit: could be more concise return Env::Default()->DeleteRecursively(...); http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/integration-tests/master_hms-itest.cc File src/kudu/integration-tests/master_hms-itest.cc: http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/integration-tests/master_hms-itest.cc@88 PS2, Line 88: opts.extra_master_flags.emplace_back("--logtostderr=false"); Maybe, create its own class to change this behavior just for the sake of a single test scenario? My concern is that with not writing into stderr it might be harder to troubleshoot test failures when tests are run via dist-test. http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/integration-tests/master_hms-itest.cc@240 PS2, Line 240: rename it a couple of times to generate 3 Why is it needed? Why not to go with just 'default.a' -- that would be one HMS event already, right? Would be great to add comment to clarify. http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/integration-tests/master_hms-itest.cc@263 PS2, Line 263: line.find("latest event ID in HMS(0)") != std::string::npos Why not to search for the whole error message as at line 264? http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/integration-tests/master_hms-itest.cc@271 PS2, Line 271: nit: the indent seems to be messed up http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/master/catalog_manager.cc@1454 PS2, Line 1454: nit: why is this change? I guess the 'if ...' line should have been shifted instead http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/master/catalog_manager.cc@1457 PS2, Line 1457: LOG(INFO) << "Last processed Hive Metastore notification event ID: " << hms_notification_log_event_id_; Maybe, move this into CatalogManager::InitLatestNotificationLogEventId() ? Also, isn't this line too long? http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/master/hms_notification_log_listener.cc File src/kudu/master/hms_notification_log_listener.cc: http://gerrit.cloudera.org:8080/#/c/18866/2/src/kudu/master/hms_notification_log_listener.cc@295 PS2, Line 295: LOG(ERROR) << "Found the last seen event ID in the local Kudu master("<< processed_event_id << : ") to be greater than the latest event ID in HMS("<< event_id << : "). Was there any backup or restore done on HMS recently?"; nit: maybe, use Substitute() to make this more readable? -- To view, visit http://gerrit.cloudera.org:8080/18866 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I242c9cedf170ed621867b905f1cb1875347d5887 Gerrit-Change-Number: 18866 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 06 Sep 2022 23:13:31 +0000 Gerrit-HasComments: Yes
