Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21383 )

Change subject: IMPALA-13040: Add waiting mechanism in UpdateFilterFromRemote
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/runtime/query-exec-mgr.cc@85
PS1, Line 85:       {FLAGS_krpc_port == 27000 ? "coordinator" : "executor"}));
> This seems like an odd way to check when is_coordinator and is_executor exi
This test is tricky because:
- No fragment instances nor RuntimeFilterBank active until QueryState pass 
initialization.
- In 3 nodes impala minicluster, all of them are is_coordinator and 
is_executor. Only our py.test code enforce that query execution of EE test go 
to the first impalad, that is the one with krpc_port=27000
- The test scenario ideally just want to delay the randomly-selected 
intermediate aggregator node, but we won't be able to determine if this impalad 
is it or not until QueryState is initialized. That is why SLEEP is injected for 
all but first impalad.
- The test rely on JOIN BUILD assigned in first impalad (27000) to send 
UpdateFilterFromRemote to either the second or the third that is selected as 
filter aggregator. This is only possible if this DebugAction does not trigger 
in  the first impalad.

But even so, I still find undeterminism happen due to data partitioning 
schedule. That test scenario that I mention about only work if the build 
scanner fragment and join build fragment both colocated at the first impalad 
and build scanner does not exchange to the other two impalads. So far, I have 
not found any way to make scheduling and exchange direction more deterministic.


http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/service/data-stream-service.cc@126
PS1, Line 126:     const UpdateFilterParamsPB* req, UpdateFilterResultPB* resp, 
RpcContext* context) {
> What thread does this run in? Do we need to worry about it blocking a queue
It share the same RPC queue with TransmitData (KRPC exchanges). There is a 
possibility of blocked queue, just as it does with TransmitData.


http://gerrit.cloudera.org:8080/#/c/21383/1/be/src/service/data-stream-service.cc@166
PS1, Line 166:       PrintId(ProtoToQueryId(req->query_id())), query_found ? 
"has done" : "not found",
> I'm not clear what "Query State for query_id=... has done after N ms" would
Ack. "no longer running" sound better.


http://gerrit.cloudera.org:8080/#/c/21383/1/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/21383/1/tests/query_test/test_runtime_filters.py@555
PS1, Line 555:     """Test that distributed runtime filter aggregation still 
works
> Can we also test for the log message? assert_impalad_log_contains would wor
The longer SLEEP debug action should log error. I'll see if I can add that.



--
To view, visit http://gerrit.cloudera.org:8080/21383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I156d1f0c694b91ba34be70bc53ae9bacf924b3b9
Gerrit-Change-Number: 21383
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 May 2024 23:30:51 +0000
Gerrit-HasComments: Yes

Reply via email to