Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21959 )


Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if available
......................................................................

IMPALA-13370: Read Puffin stats from metadata.json property if available

When Trino writes Puffin stats for a column, it includes the NDV as a
property (with key "ndv") in the "statistics" section of the
metadata.json file, in addition to the Theta sketch in the Puffin file.
When we are only reading the stats and not writing/updating them, it is
enough to read this property if it is present.

After this change, Impala only opens and reads a Puffin stats file if it
contains stats for at least one column for which the "ndv" property is
not set in the metadata.json file.
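
The decision described above can be sketched roughly as follows. This is
only an illustration, not the code in PuffinStatsLoader.java: the function
name puffin_files_to_read is hypothetical, and the JSON keys
("statistics-path", "blob-metadata", "fields", "properties") are assumed
to follow the table statistics layout of the Iceberg spec.

```python
import json


def puffin_files_to_read(metadata_json):
    """Hypothetical helper: return (ndv_by_field, paths_to_open).

    ndv_by_field maps a field id to the NDV taken from the "ndv" property.
    paths_to_open lists the Puffin files that still have to be read because
    at least one of their blobs has no "ndv" property in metadata.json.
    """
    metadata = json.loads(metadata_json)
    ndv_by_field = {}
    paths_to_open = []
    for stats_file in metadata.get("statistics", []):
        needs_read = False
        for blob in stats_file.get("blob-metadata", []):
            ndv = blob.get("properties", {}).get("ndv")
            if ndv is None:
                # No NDV in metadata.json: the Theta sketch is needed.
                needs_read = True
                continue
            for field_id in blob.get("fields", []):
                # Property values are assumed to be strings.
                ndv_by_field[field_id] = int(ndv)
        if needs_read:
            paths_to_open.append(stats_file["statistics-path"])
    return ndv_by_field, paths_to_open
```

With this sketch, a stats file whose blobs all carry the "ndv" property
never shows up in paths_to_open, so it would not be opened at all.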

Testing:
 - added a test in test_iceberg_with_puffin.py that verifies that the
   Puffin stats file is not read if the metadata.json file contains
   the NDV property.

Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
---
M fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java
M java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java
A testdata/ice_puffin/generated/corrupt_file_metadata_ndv_ok.stats
A testdata/ice_puffin/generated/metadata_ndv_ok_stats_file_corrupt.metadata.json
M tests/custom_cluster/test_iceberg_with_puffin.py
5 files changed, 406 insertions(+), 34 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/21959/1
--
To view, visit http://gerrit.cloudera.org:8080/21959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
Gerrit-Change-Number: 21959
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
