Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21959 )
Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if available
......................................................................

IMPALA-13370: Read Puffin stats from metadata.json property if available

When Trino writes Puffin stats for a column, it includes the NDV as a
property (with key "ndv") in the "statistics" section of the
metadata.json file, in addition to the Theta sketch in the Puffin file.
When we are only reading the stats and not writing/updating them, it is
enough to read this property if it is present.

After this change, Impala only opens and reads a Puffin stats file if it
contains stats for at least one column for which the "ndv" property is
not set in the metadata.json file.

Testing:
 - added a test in test_iceberg_with_puffin.py that verifies that the
   Puffin stats file is not read if the metadata.json file contains the
   NDV property.

Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
---
M fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java
M java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java
A testdata/ice_puffin/generated/corrupt_file_metadata_ndv_ok.stats
A testdata/ice_puffin/generated/metadata_ndv_ok_stats_file_corrupt.metadata.json
M tests/custom_cluster/test_iceberg_with_puffin.py
5 files changed, 406 insertions(+), 34 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/21959/1
--
To view, visit http://gerrit.cloudera.org:8080/21959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
Gerrit-Change-Number: 21959
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
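
For readers who want the decision rule from the commit message in concrete form, here is a minimal Python sketch. It is not the code in PuffinStatsLoader.java. It assumes one entry of the "statistics" list from metadata.json has already been parsed into a dict, that the entry follows the Iceberg layout ("blob-metadata" items carrying "fields" and an optional "properties" map with string values), and that only single-column blobs matter. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_blobs_by_ndv_property(statistics_entry: dict) -> Tuple[Dict[int, int], List[int]]:
    """Hypothetical helper: separate field ids whose NDV is available directly
    in metadata.json from those that would need the Puffin file."""
    ndv_by_field: Dict[int, int] = {}
    fields_needing_puffin: List[int] = []
    for blob in statistics_entry.get("blob-metadata", []):
        fields = blob.get("fields", [])
        if len(fields) != 1:
            # Assumption: only single-column blobs are considered here.
            continue
        field_id = fields[0]
        # "properties" may be absent or null; treat both as empty.
        ndv = (blob.get("properties") or {}).get("ndv")
        if ndv is not None:
            # Property values are stored as strings in metadata.json.
            ndv_by_field[field_id] = int(ndv)
        else:
            fields_needing_puffin.append(field_id)
    return ndv_by_field, fields_needing_puffin


def must_open_puffin_file(statistics_entry: dict) -> bool:
    """The Puffin file is needed only if at least one column lacks "ndv"."""
    _, missing = split_blobs_by_ndv_property(statistics_entry)
    return bool(missing)
```

With this rule, a metadata.json entry in which every blob carries an "ndv" property never triggers a read of the stats file, which appears to be the situation the new corrupt-stats-file test data exercises.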
