Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22250 )
Change subject: IMPALA-12785: Add commands to control event-processor status ...................................................................... Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@10 PS8, Line 10: EventProcessor. E.g. to pause the EventProcessor: : impala-shell> :event_processor('pause'); : to resume the EventProcessor: : impala-shell> :event_processor('start'); > Why not implement this as two separate command: event_processor_pause and e I thought about this approach and the current approach is more compatible. When we support new actions, we just need code changes in catalogd side. Coordinator in an old version can still use the new action (since it's just a string) to control catalogd of a new version. http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@20 PS8, Line 20: pause, start, status > Do we have server-side validation for this possible strings? Yeah, it's validated in catalogd side, in CatalogServiceCatalog#setEventProcessorStatus(). http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@27 PS8, Line 27: LastSyncedEventId: 34489. LatestEventId: 34489. > Can you document in commit message what is the difference between LastSynce Done. The output of all actions will have these status info. http://gerrit.cloudera.org:8080/#/c/22250/8//COMMIT_MSG@34 PS8, Line 34: Note that resuming EventProcessor back to a previous event id is only : allowed when it's not in the ACTIVE state. The restriction comes from the existing codes of MetastoreEventsProcessor#start(long). Let me move this sentense above. This patch targets to expose the control to users so I try not changing MetastoreEventsProcessor. We can investigate the restriction and see if we want to relax it. > Can :event_processor('status'); fail? Yeah, e.g. if coordinator can't connect to catalogd. http://gerrit.cloudera.org:8080/#/c/22250/8/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/22250/8/common/thrift/CatalogService.thrift@758 PS8, Line 758: required string action > Document what is the possible actions are. The possible actions are Java enum values of EventProcessorCmdType. Try to avoid using thrift enum to avoid incompatible thrift changes in the future. If we use thrift enum, when we support new actions in the future, the coordinator in the old version can't send the request to catalogd in the new version. Note that we support sharing catalogd across several Impala clusters (IMPALA-13208). It's possible that the Impala clusters are in different version (e.g. upgrade blocked by some reasons). So we try to provide maximum compatibility between catalogd and coordinators. http://gerrit.cloudera.org:8080/#/c/22250/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/22250/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3802 PS8, Line 3802: String err = "EventProcessor is active. Failed to set last synced event id " + > What happens if the user pauses EP and then sets it to an already processed That's possible. So this is a command that only admins can run. For the concern, I think admins should be careful to use this command, which is the same as other commands like restarting a server. EventProcessor is a thread in catalogd, the commands provide the ability to restart it individually. If somehow admins messed up things in catalogd, they can only restart it to recover. But that's better than having an only choice to restart catalogd. http://gerrit.cloudera.org:8080/#/c/22250/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3816 PS8, Line 3816: ep.start(params.getEvent_id()); > I think it's still possible at this point that the status is already ACTIVE Yeah. When there is a big lag in event processing, admins can manually invalidate/refresh all (or some critical) tables to resolve the stale metadata. HMS events generated before that can be ignored. So admins can use this command to let EventProcessor skip them. I'll mention this in the commit message. http://gerrit.cloudera.org:8080/#/c/22250/8/tests/metadata/test_event_processing.py File tests/metadata/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/22250/8/tests/metadata/test_event_processing.py@272 PS8, Line 272: # 5. Test restarting on a future event id > Maybe it would be a good idea to put this subcase last so that the skipped That's a good point. I actually forgot this should be the last case.. http://gerrit.cloudera.org:8080/#/c/22250/8/tests/metadata/test_event_processing.py@289 PS8, Line 289: > flake8: W292 no newline at end of file Done -- To view, visit http://gerrit.cloudera.org:8080/22250 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5a19f67264cfe06a1819a22c0c4f0cf174c9b958 Gerrit-Change-Number: 22250 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Thu, 23 Jan 2025 03:25:19 +0000 Gerrit-HasComments: Yes
