Hello Riza Suminto, Noemi Pap-Takacs, Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/22873
to look at the new patch set (#7).
Change subject: IMPALA-14014: Fix COMPUTE STATS with TABLESAMPLE clause
......................................................................
IMPALA-14014: Fix COMPUTE STATS with TABLESAMPLE clause
Since IMPALA-13737, COMPUTE STATS with a TABLESAMPLE clause did a full
scan on Iceberg tables. Before this patch, ComputeStatsStmt used
FeFsTable.Utils.getFilesSample(), which only works correctly on FS
tables that have their file descriptors loaded. Since IMPALA-13737
the internal FS table of an Iceberg table does not have file
descriptor information, so FeFsTable.Utils.getFilesSample() returned
an empty map, which turned off table sampling for COMPUTE STATS.
We did not have proper testing for COMPUTE STATS with table sampling,
so we did not catch the regression.
This patch adds proper table sampling logic for Iceberg tables that
can be used for COMPUTE STATS. The algorithm previously found in
IcebergScanNode.getFilesSample() has been moved to
FeIcebergTable.Utils.getFilesSample().
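For reviewers unfamiliar with byte-based file sampling, the general idea
can be sketched as below. This is a minimal illustration under stated
assumptions, not the code in this patch: the class FileSampleSketch, the
DataFile type, and the parameters percentBytes, minSampleBytes and
randomSeed are hypothetical names chosen for the example, and the real
FeIcebergTable.Utils.getFilesSample() may differ in signature and details.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of byte-based file sampling; not Impala's actual code.
final class FileSampleSketch {
  // Hypothetical stand-in for a data file: a path and a length in bytes.
  static final class DataFile {
    final String path;
    final long length;
    DataFile(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Picks files in a seeded random order until the selected bytes reach
  // max(percentBytes% of the total bytes, minSampleBytes).
  static List<DataFile> sampleFiles(List<DataFile> files, long percentBytes,
      long minSampleBytes, long randomSeed) {
    if (percentBytes < 0 || percentBytes > 100) {
      throw new IllegalArgumentException("percentBytes must be in [0, 100]");
    }
    long totalBytes = 0;
    for (DataFile f : files) totalBytes += f.length;
    // Overflow of totalBytes * percentBytes is ignored in this sketch.
    long targetBytes = Math.max(totalBytes * percentBytes / 100, minSampleBytes);

    // The same seed always yields the same order, so the sample is repeatable.
    List<DataFile> shuffled = new ArrayList<>(files);
    Collections.shuffle(shuffled, new Random(randomSeed));

    List<DataFile> sample = new ArrayList<>();
    long selectedBytes = 0;
    for (DataFile f : shuffled) {
      if (selectedBytes >= targetBytes) break;
      sample.add(f);
      selectedBytes += f.length;
    }
    return sample;
  }
}
```

In this sketch an empty input list produces an empty sample. That is the
situation the commit message describes for the old code path: with no file
descriptors to choose from, no sample could be built and sampling was
turned off.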
Testing
* added e2e tests
Change-Id: Ie59d5fc1374ab69209a74f2488bcb9a7d510b782
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge-insert-only.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-resources.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-compute-stats-table-sampling.test
M tests/query_test/test_iceberg.py
16 files changed, 538 insertions(+), 157 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/22873/7
--
To view, visit http://gerrit.cloudera.org:8080/22873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie59d5fc1374ab69209a74f2488bcb9a7d510b782
Gerrit-Change-Number: 22873
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>