Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20131 )
Change subject: IMPALA-12152: Add query options to wait for HMS events sync up ...................................................................... Patch Set 31: (12 comments) http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@7 PS30, Line 7: IMPALA-12152: Add query options to wait for HMS events sy > nit: IMPALA-12152: Add query options to wait for HMS events sync up Done http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@20 PS30, Line 20: SYNC_HMS_EVENTS_WAIT_TIME_S > nit: SYNC_HMS_EVENTS_WAIT_TIME_S Done. Currently, no limits. Users can set abitrary values. http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@22 PS30, Line 22: SYNC_HMS_EVENTS_STRICT_MODE > nit: SYNC_HMS_EVENTS_STRICT_MODE Done http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@37 PS30, Line 37: The wait time can : be set to the largest lag of event processing that has been observed in : the cluster. > How to observe this? can you mention it in commit message? Done. Usually users have some charts to track the lag. It's shown in the catalogd WebUI and metrics. http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@51 PS30, Line 51: Compilation: 1s321ms > "Failed" might be interpreted by user as mistake. "Why query still running Maybe "Continuing without syncing Metastore events" is better. We don't know if there are unprocessed events. We could be lucky that there are none. Using the new message when strict mode is off. http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@57 PS30, Line 57: : A ne > nit: is added to CatalogService API so that coordinator can wait Done http://gerrit.cloudera.org:8080/#/c/20131/30/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/20131/30/be/src/catalog/catalog-server.cc@465 PS30, Line 465: status = catalog_server_->catalog()->WaitForHmsEvent(req, &resp); > 1. Not wait indefinitely? It's a common problem for all catalog RPCs, e.g. execDdl, execResetMetadata, getPartialCatalogObject, etc. I think some work in this patch might help: https://gerrit.cloudera.org/c/21803/ . E.g. coordinator can send a cancel request to catalogd to cancel all threads of a query id, including this waiting thread in catalogd. > 2. Not filling up / holding resources of CatalogService API with multiple > sleeping RPC thread? That's a good point that we should take care of. I think we need some stress tests to measure the overhead of these. If we want to control the concurrency, we can add a semaphore like partialObjectFetchAccess_ in CatalogServiceCatalog (IMPALA-7626). Maybe sleeping threads won't consume too much resource. If it's OK so far, I'll file a follow-up JIRA for this. http://gerrit.cloudera.org:8080/#/c/20131/30/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20131/30/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@4232 PS30, Line 4232: req.timeout_s * 1000L > Should this also account for time spent sending TWaitForHmsEventRequest tho That's a fair point. The network latency might be trivial and can be ignored since this is suggested to be set to several seconds/minutes. However, once we have the blocking mechanism (like semaphore mentioned in another comment), the time between coordinator sending the request and catalogd goes into this method could be significant. We can consider it after that. Another point is that we suggest setting this timeout to the max lag of event processing. So here we get the latest event id and start waiting for it to be synced. If we account for the time coordinator sending the request, we need to consider the max time of that. Maybe we can do it in the same future patch. http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1415 PS30, Line 1415: > Add 'vector' parameter and create a new client based on vector.get_value('p Good point! I just realized there are some test dimensions for this class. I don't think we need to test on them. Moving this new test to tests/metadata/test_event_processing.py. The other reason is we don't need custom flags here. http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1523 PS30, Line 1523: > Add 'vector' parameter and create a new client based on vector.get_value('p Done. Try using the created client of the specific protocol. http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1530 PS30, Line 1530: > Don't check against QUERY_STATES directly. It will break this test if somed Done. I think this is a custom cluster test that runs serially so it's ok to use the existing clients. Any concerns on using them? http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1541 PS30, Line 1541: > Can you also validate if the duration is aligned with given sync_hms_events The query will fail immediately regardless of the wait time since catalogd found EventProcessor is disabled. We are currently lack of a test for a real timeout. Added it in the next patch set. -- To view, visit http://gerrit.cloudera.org:8080/20131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36ac941bb2c2217b09fcfa2eb567b011b38efa2a Gerrit-Change-Number: 20131 Gerrit-PatchSet: 31 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Tue, 21 Jan 2025 13:35:54 +0000 Gerrit-HasComments: Yes
