Hello Jason Fehr, Surya Hebbar, Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24123
to look at the new patch set (#19).
Change subject: IMPALA-14796: Show effective runtime filter targets in profile
......................................................................
IMPALA-14796: Show effective runtime filter targets in profile
This patch adds an "Eff. Tgt. Node(s)" (Effective Target Node(s)) column
to the "Final filter table" in the query profile. This shows which scan
nodes actually had rows rejected by each runtime filter, distinguishing
filters that were effective from those that were applied but rejected no
data. E.g.
ID  Src. Node  Tgt. Node(s)  Eff. Tgt. Node(s)  Target type     ...
-------------------------------------------------------------------
10  6          2             2                  LOCAL           ...
8   7          1             1                  REMOTE          ...
5   8          2             2                  LOCAL           ...
4   8          0             N                  REMOTE          ...
2   9          0, 3          0, 3               REMOTE, REMOTE  ...
0   10         4             4                  REMOTE          ...
In the example above, filter 4 shows "N" (None) in the "Eff. Tgt.
Node(s)" column, meaning it rejected no data at any of its target
nodes. All the other filters are effective.
Implementation
- In ScanNode::Close(), collect the effective runtime filter ids by
checking the "rejected" counters of all the FilterStats. These
counters correspond to "Files rejected", "RowGroups rejected", "Rows
rejected", "Splits rejected" in the query profile. If any of them is
non-zero, the filter has rejected some data so it's effective.
- Executor reports this info to coordinator via ReportExecStatus RPCs.
A list of (filter_id, scan_node_id) pairs is added in
ReportExecStatusRequestPB to carry this info.
- Coordinator aggregates the effective filter targets when processing
the status reports.
- In FilterDebugString(), add a column to show the node ids where the
runtime filter is effective.
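The flow above can be sketched as follows. This is a minimal, standalone
illustration and not the code in this patch: FilterStatsSketch,
IsEffective(), CollectEffectivePairs(), MergeReport() and
EffectiveTargets are hypothetical names, and the four counters simply
mirror the "rejected" counters named above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-filter counters kept by a scan node.
struct FilterStatsSketch {
  int filter_id;
  int64_t files_rejected;
  int64_t row_groups_rejected;
  int64_t rows_rejected;
  int64_t splits_rejected;
};

// A filter counts as effective if any "rejected" counter is non-zero.
bool IsEffective(const FilterStatsSketch& s) {
  return s.files_rejected > 0 || s.row_groups_rejected > 0 ||
         s.rows_rejected > 0 || s.splits_rejected > 0;
}

// Executor side (assumed shape): when a scan node closes, collect the
// (filter_id, scan_node_id) pairs for filters that rejected some data.
std::vector<std::pair<int, int>> CollectEffectivePairs(
    int scan_node_id, const std::vector<FilterStatsSketch>& stats) {
  std::vector<std::pair<int, int>> pairs;
  for (const FilterStatsSketch& s : stats) {
    if (IsEffective(s)) pairs.emplace_back(s.filter_id, scan_node_id);
  }
  return pairs;
}

// Coordinator side (assumed shape): merge the pairs carried by one status
// report into a map of filter id -> set of effective scan node ids.
using EffectiveTargets = std::map<int, std::set<int>>;

void MergeReport(const std::vector<std::pair<int, int>>& pairs,
                 EffectiveTargets* agg) {
  for (const auto& p : pairs) (*agg)[p.first].insert(p.second);
}
```

In this sketch a filter that never rejects anything has no entry in the
aggregated map, which is the case the profile shows as "N".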
Other minor changes
- In coordinator.cc, move the code that sets the "Final filter table"
from ReleaseExecResources() to ComputeQuerySummary() so that the table
is built after the final status reports from all backends have arrived.
- Removed temp_object_pool and temp_mem_tracker from
FilterDebugString() as they have been unused since commit a985e11.
- Replaced boost::lexical_cast<string> with std::to_string for
int-to-string conversion, which is more efficient.
- Sort node ids in "Tgt. Node(s)" and "Eff. Tgt. Node(s)" columns to
make the output consistent across different runs.
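As an illustration of the last two items, a column cell could be
rendered roughly like this. NodeIdsToString() is a hypothetical helper,
not the function in this patch. It assumes the node ids are kept in a
std::set (so iteration is already in ascending order) and that an empty
set is printed as "N", as in the "Eff. Tgt. Node(s)" column.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: joins node ids in ascending order with ", ".
// An empty set is rendered as "N" (no effective target node).
std::string NodeIdsToString(const std::set<int>& node_ids) {
  if (node_ids.empty()) return "N";
  std::string out;
  for (int id : node_ids) {  // std::set iterates in sorted order
    if (!out.empty()) out += ", ";
    out += std::to_string(id);
  }
  return out;
}
```

Because the ids come out of an ordered container, the rendered cell does
not depend on the order in which backends reported them.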
Limitation
- The Kudu scanner doesn't expose metrics that reflect the effect of
individual filters, so effective runtime filters can't be detected on
KuduScanNode. For such filters the "Eff. Tgt. Node(s)" column currently
always shows "N" (IMPALA-15002).
Tests
- Added an e2e test for TPCH-Q5, where some filters are ineffective. It
runs in both profile modes configured by gen_experimental_profile.
- Added checks in runtime_filters.test for queries that have only one
runtime filter.
- Updated in_list_filters.test for the new column.
- Ran tests on both the original planner and the calcite planner.
Assisted-by: Claude Sonnet 4.5
Change-Id: Iccf4b87ac4579a70273f3306ec7b58850f06b17c
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M common/protobuf/control_service.proto
M testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
A testdata/workloads/tpch/queries/effective-runtime-filter.test
M tests/custom_cluster/test_observability.py
M tests/query_test/test_runtime_filters.py
16 files changed, 242 insertions(+), 29 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/24123/19
--
To view, visit http://gerrit.cloudera.org:8080/24123
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iccf4b87ac4579a70273f3306ec7b58850f06b17c
Gerrit-Change-Number: 24123
Gerrit-PatchSet: 19
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>