Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21383 )
Change subject: IMPALA-13040: Add waiting mechanism in UpdateFilterFromRemote ...................................................................... Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/runtime/query-exec-mgr.cc@85 PS1, Line 85: {FLAGS_krpc_port == 27000 ? "coordinator" : "executor"})); > This seems like an odd way to check when is_coordinator and is_executor exi This test is tricky because: - No fragment instances nor RuntimeFilterBank active until QueryState pass initialization. - In 3 nodes impala minicluster, all of them are is_coordinator and is_executor. Only our py.test code enforce that query execution of EE test go to the first impalad, that is the one with krpc_port=27000 - The test scenario ideally just want to delay the randomly-selected intermediate aggregator node, but we won't be able to determine if this impalad is it or not until QueryState is initialized. That is why SLEEP is injected for all but first impalad. - The test rely on JOIN BUILD assigned in first impalad (27000) to send UpdateFilterFromRemote to either the second or the third that is selected as filter aggregator. This is only possible if this DebugAction does not trigger in the first impalad. But even so, I still find undeterminism happen due to data partitioning schedule. That test scenario that I mention about only work if the build scanner fragment and join build fragment both colocated at the first impalad and build scanner does not exchange to the other two impalads. So far, I have not found any way to make scheduling and exchange direction more deterministic. http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/service/data-stream-service.cc File be/src/service/data-stream-service.cc: http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/service/data-stream-service.cc@126 PS1, Line 126: const UpdateFilterParamsPB* req, UpdateFilterResultPB* resp, RpcContext* context) { > What thread does this run in? Do we need to worry about it blocking a queue It share the same RPC queue with TransmitData (KRPC exchanges). There is a possibility of blocked queue, just as it does with TransmitData. http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/service/data-stream-service.cc@166 PS1, Line 166: PrintId(ProtoToQueryId(req->query_id())), query_found ? "has done" : "not found", > I'm not clear what "Query State for query_id=... has done after N ms" would Ack. "no longer running" sound better. http://gerrit.cloudera.org:8080/#/c/21383/1/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/21383/1/tests/query_test/test_runtime_filters.py@555 PS1, Line 555: """Test that distributed runtime filter aggregation still works > Can we also test for the log message? assert_impalad_log_contains would wor The longer SLEEP debug action should log error. I'll see if I can add that. -- To view, visit http://gerrit.cloudera.org:8080/21383 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I156d1f0c694b91ba34be70bc53ae9bacf924b3b9 Gerrit-Change-Number: 21383 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Comment-Date: Wed, 01 May 2024 23:30:51 +0000 Gerrit-HasComments: Yes