Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22602 )

Change subject: KUDU-3649: Fix hms_notification_log_event_id_ not being loaded 
correctly
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22602/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22602/1//COMMIT_MSG@16
PS1, Line 16: It's value remains the default (-1), leading kudu-master to 
polling all
            : notification events from Hive Metastore.
With or without this patch, when HMS notification is enabled and system catalog 
starts first time, hms_notification_log_event_id_ is always -1 because nothing 
was stored in the system catalog table.


http://gerrit.cloudera.org:8080/#/c/22602/1//COMMIT_MSG@27
PS1, Line 27: Tests
            :  - Verified in a local build that hms_notification_log_event_id_ 
is
            :    loaded as expected.
As you understand, this doesn't count, especially if you claim there was some 
sort of race: it might be loading as expected once (BTW, what was it? -1, as 
before, right), but next time you'd see the unexpected behavior.  There should 
be a proper automated test for this.

BTW, if you tried running with DEBUG bits, you'd see it crashing.  Many 
pre-commit tests for this patch are failing due to DCHECK() assertions 
triggered.  For example, take a look at ToolTest.TestHmsList scenario in 
master_hms-itest.


http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/hms/hms_client.cc
File src/kudu/hms/hms_client.cc:

http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/hms/hms_client.cc@368
PS1, Line 368: get $0
nit: 'get $0' -->
get up to $0'


http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/hms/hms_client.cc@375
PS1, Line 375:                  Substitute("failed to get $0 Hive Metastore 
events since ID $1",
             :                             max_events, last_event_id));
This new message is now a bit misleading: it might be interpreted as if it was 
an attempt to fetch N events, and it failed to fetch so many.  I guess 
'max_events' isn't relevant here.

Maybe, change the message to something like:
  "failed to get Hive Metastore events since ID $0" ?


http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/22602/1/src/kudu/master/catalog_manager.cc@1526
PS1, Line 1526:     if (hms::HmsCatalog::IsEnabled()) {
This does nothing to prevent a logical race w.r.t. calling  
HmsNotificationLogListenerTask::Poll() which fetches notification events from 
HMS and loading information on last seen event ID from the system catalog.

So, the race is still there, this patch is not a fix.  Even more: with this 
patch the code is now crashing with DCHECK() assertions triggered in many other 
HMS-related tests -- take a closer look at the pre-commit tests.

I think a proper fix might be simply adding a call to 
InitLatestNotificationLogEventId() into CatalogManager::Init() under the 'if 
(hms::HmsCatalog::IsEnabled()) { ... }' clause, just under the 
'leader_lock_'-based guard (i.e. into catalog_manager.cc, line 1054).



--
To view, visit http://gerrit.cloudera.org:8080/22602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3678692e0df0311c6831bcaf2786f657a2fe5625
Gerrit-Change-Number: 22602
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Mon, 10 Mar 2025 22:52:18 +0000
Gerrit-HasComments: Yes

Reply via email to