Hello Riza Suminto, Noemi Pap-Takacs, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/22873

to look at the new patch set (#6).

Change subject: IMPALA-14014: Fix COMPUTE STATS with TABLESAMPLE clause
......................................................................

IMPALA-14014: Fix COMPUTE STATS with TABLESAMPLE clause

Since IMPALA-13737, COMPUTE STATS with a TABLESAMPLE clause has done a
full scan on Iceberg tables. Before this patch, ComputeStatsStmt used
FeFsTable.Utils.getFilesSample(), which only works correctly on FS
tables that have their file descriptors loaded. Since IMPALA-13737 the
internal FS table of an Iceberg table does not have file descriptor
information, so FeFsTable.Utils.getFilesSample() returned an empty map,
which turned off table sampling for COMPUTE STATS.

We did not have proper testing for COMPUTE STATS with table sampling,
so we did not catch the regression.

This patch adds proper table sampling logic for Iceberg tables that
can be used for COMPUTE STATS. The algorithm previously found in
IcebergScanNode.getFilesSample() has been moved to
FeIcebergTable.Utils.getFilesSample().
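
For readers unfamiliar with file-level table sampling, here is a minimal
sketch of the general idea: shuffle the candidate files with a fixed seed
and keep picking files until the selected bytes reach the requested
percentage of the total. This is not the code in
FeIcebergTable.Utils.getFilesSample(); the class name, the method
signature, and the parameters (percentBytes, minSampleBytes, randomSeed)
are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of byte-based file sampling; not Impala's implementation.
class FileSampleSketch {
  /**
   * Returns a random subset of file paths whose combined size is at least
   * max(percentBytes% of the total size, minSampleBytes), or all files if
   * that target cannot be reached. 'fileSizes' maps a file path to its size
   * in bytes and should have a stable iteration order so that the same seed
   * yields the same sample.
   */
  static List<String> sampleFiles(Map<String, Long> fileSizes, long percentBytes,
      long minSampleBytes, long randomSeed) {
    if (percentBytes < 0 || percentBytes > 100) {
      throw new IllegalArgumentException("percentBytes must be in [0, 100]");
    }
    long totalBytes = 0;
    for (long size : fileSizes.values()) totalBytes += size;
    long targetBytes = Math.max(totalBytes * percentBytes / 100, minSampleBytes);

    // Shuffle deterministically so repeated runs with one seed agree.
    List<String> candidates = new ArrayList<>(fileSizes.keySet());
    Collections.shuffle(candidates, new Random(randomSeed));

    List<String> sample = new ArrayList<>();
    long selectedBytes = 0;
    for (String path : candidates) {
      if (selectedBytes >= targetBytes) break;
      sample.add(path);
      selectedBytes += fileSizes.get(path);
    }
    return sample;
  }

  public static void main(String[] args) {
    Map<String, Long> files = new LinkedHashMap<>();
    files.put("data/f1.parquet", 100L);
    files.put("data/f2.parquet", 100L);
    files.put("data/f3.parquet", 100L);
    files.put("data/f4.parquet", 100L);
    // Ask for roughly half of the bytes with a fixed seed.
    System.out.println(sampleFiles(files, 50, 0, 42L));
  }
}
```

Note that in this sketch an empty result simply means the byte target was
zero. A caller that treats an empty sample as "no sampling" would fall
back to scanning everything, which is the failure mode described above.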

Testing
 * added e2e tests

Change-Id: Ie59d5fc1374ab69209a74f2488bcb9a7d510b782
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge-insert-only.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-resources.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-compute-stats-table-sampling.test
M tests/query_test/test_iceberg.py
15 files changed, 462 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/22873/6
--
To view, visit http://gerrit.cloudera.org:8080/22873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie59d5fc1374ab69209a74f2488bcb9a7d510b782
Gerrit-Change-Number: 22873
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
