[Impala-ASF-CR] IMPALA-12785: Add commands to control event-processor status

Quanlong Huang (Code Review) Wed, 22 Jan 2025 19:25:32 -0800

Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22250 )

Change subject: IMPALA-12785: Add commands to control event-processor status
......................................................................

Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@10
PS8, Line 10: EventProcessor. E.g. to pause the EventProcessor:
: impala-shell> :event_processor('pause');
: to resume the EventProcessor:
: impala-shell> :event_processor('start');
> Why not implement this as two separate command: event_processor_pause and e
I thought about this approach and the current approach is more compatible. When
we support new actions, we just need code changes in catalogd side. Coordinator
in an old version can still use the new action (since it's just a string) to
control catalogd of a new version.

http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@20
PS8, Line 20: pause, start, status
> Do we have server-side validation for this possible strings?
Yeah, it's validated in catalogd side, in
CatalogServiceCatalog#setEventProcessorStatus().

http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@27
PS8, Line 27: LastSyncedEventId: 34489. LatestEventId: 34489.
> Can you document in commit message what is the difference between LastSynce
Done. The output of all actions will have these status info.

http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@34
PS8, Line 34: Note that resuming EventProcessor back to a previous event id is
only
: allowed when it's not in the ACTIVE state.
The restriction comes from the existing codes of
MetastoreEventsProcessor#start(long). Let me move this sentense above.
This patch targets to expose the control to users so I try not changing
MetastoreEventsProcessor. We can investigate the restriction and see if we want
to relax it.

> Can :event_processor('status'); fail?

Yeah, e.g. if coordinator can't connect to catalogd.

http://gerrit.cloudera.org:8080/#/c/22250/8/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/22250/8/common/thrift/CatalogService.thrift@758
PS8, Line 758: required string action
> Document what is the possible actions are.
The possible actions are Java enum values of EventProcessorCmdType. Try to
avoid using thrift enum to avoid incompatible thrift changes in the future. If
we use thrift enum, when we support new actions in the future, the coordinator
in the old version can't send the request to catalogd in the new version. Note
that we support sharing catalogd across several Impala clusters (IMPALA-13208).
It's possible that the Impala clusters are in different version (e.g. upgrade
blocked by some reasons). So we try to provide maximum compatibility between
catalogd and coordinators.

http://gerrit.cloudera.org:8080/#/c/22250/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/22250/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3802
PS8, Line 3802: String err = "EventProcessor is active. Failed to set
last synced event id " +
> What happens if the user pauses EP and then sets it to an already processed
That's possible. So this is a command that only admins can run. For the
concern, I think admins should be careful to use this command, which is the
same as other commands like restarting a server. EventProcessor is a thread in
catalogd, the commands provide the ability to restart it individually. If
somehow admins messed up things in catalogd, they can only restart it to
recover. But that's better than having an only choice to restart catalogd.

http://gerrit.cloudera.org:8080/#/c/22250/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3816
PS8, Line 3816: ep.start(params.getEvent_id());
> I think it's still possible at this point that the status is already ACTIVE
Yeah. When there is a big lag in event processing, admins can manually
invalidate/refresh all (or some critical) tables to resolve the stale metadata.
HMS events generated before that can be ignored. So admins can use this command
to let EventProcessor skip them. I'll mention this in the commit message.

http://gerrit.cloudera.org:8080/#/c/22250/8/tests/metadata/test_event_processing.py
File tests/metadata/test_event_processing.py:

http://gerrit.cloudera.org:8080/#/c/22250/8/tests/metadata/test_event_processing.py@272
PS8, Line 272: # 5. Test restarting on a future event id
> Maybe it would be a good idea to put this subcase last so that the skipped
That's a good point. I actually forgot this should be the last case..

http://gerrit.cloudera.org:8080/#/c/22250/8/tests/metadata/test_event_processing.py@289
PS8, Line 289:
> flake8: W292 no newline at end of file
Done

--
To view, visit http://gerrit.cloudera.org:8080/22250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5a19f67264cfe06a1819a22c0c4f0cf174c9b958
Gerrit-Change-Number: 22250
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Thu, 23 Jan 2025 03:25:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12785: Add commands to control event-processor status

Reply via email to