Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/22602 )
Change subject: KUDU-3649: Fix hms_notification_log_event_id_ not being loaded correctly ...................................................................... Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/22602/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22602/1//COMMIT_MSG@16 PS1, Line 16: It's value remains the default (-1), leading kudu-master to polling all : notification events from Hive Metastore. With or without this patch, when HMS notification is enabled and system catalog starts first time, hms_notification_log_event_id_ is always -1 because nothing was stored in the system catalog table. http://gerrit.cloudera.org:8080/#/c/22602/1//COMMIT_MSG@27 PS1, Line 27: Tests : - Verified in a local build that hms_notification_log_event_id_ is : loaded as expected. As you understand, this doesn't count, especially if you claim there was some sort of race: it might be loading as expected once (BTW, what was it? -1, as before, right), but next time you'd see the unexpected behavior. There should be a proper automated test for this. BTW, if you tried running with DEBUG bits, you'd see it crashing. Many pre-commit tests for this patch are failing due to DCHECK() assertions triggered. For example, take a look at ToolTest.TestHmsList scenario in master_hms-itest. http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/hms/hms_client.cc File src/kudu/hms/hms_client.cc: http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/hms/hms_client.cc@368 PS1, Line 368: get $0 nit: 'get $0' --> get up to $0' http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/hms/hms_client.cc@375 PS1, Line 375: Substitute("failed to get $0 Hive Metastore events since ID $1", : max_events, last_event_id)); This new message is now a bit misleading: it might be interpreted as if it was an attempt to fetch N events, and it failed to fetch so many. I guess 'max_events' isn't relevant here. Maybe, change the message to something like: "failed to get Hive Metastore events since ID $0" ? http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/master/catalog_manager.cc@1526 PS1, Line 1526: if (hms::HmsCatalog::IsEnabled()) { This does nothing to prevent a logical race w.r.t. calling HmsNotificationLogListenerTask::Poll() which fetches notification events from HMS and loading information on last seen event ID from the system catalog. So, the race is still there, this patch is not a fix. Even more: with this patch the code is now crashing with DCHECK() assertions triggered in many other HMS-related tests -- take a closer look at the pre-commit tests. I think a proper fix might be simply adding a call to InitLatestNotificationLogEventId() into CatalogManager::Init() under the 'if (hms::HmsCatalog::IsEnabled()) { ... }' clause, just under the 'leader_lock_'-based guard (i.e. into catalog_manager.cc, line 1054). -- To view, visit http://gerrit.cloudera.org:8080/22602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3678692e0df0311c6831bcaf2786f657a2fe5625 Gerrit-Change-Number: 22602 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Mon, 10 Mar 2025 22:52:18 +0000 Gerrit-HasComments: Yes
