Mihaly Szjatinya has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24041 )


Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables
......................................................................

IMPALA-14583: Support partial RPC dispatch for Iceberg tables

This patch extends IMPALA-11402 to support partial RPC dispatch for
Iceberg tables in local catalog mode. IMPALA-11402 added this support
for partitioned HDFS tables: catalogd can truncate the response of
getPartialCatalogObject at partition boundaries when the file count
exceeds catalog_partial_fetch_max_files.
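
For orientation, the partition-boundary truncation described above
might look roughly like the following Python sketch. It is not the
IMPALA-11402 code: the function name, the (name, files) tuples, and the
rule of always returning at least one partition are assumptions made
for illustration, and the real cut-off rule may differ.

```python
def truncate_at_partition_boundary(partitions, max_files):
    """Return the leading partitions whose combined file count fits max_files.

    partitions: list of (partition_name, list_of_files) tuples (hypothetical shape).
    max_files:  stands in for catalog_partial_fetch_max_files.
    """
    included = []
    file_count = 0
    for name, files in partitions:
        # Assumption: always return at least one partition so the caller
        # makes progress even if a single partition exceeds the limit.
        if included and file_count + len(files) > max_files:
            break
        included.append((name, files))
        file_count += len(files)
    return included
```

The cut can only fall between partitions, so a response may carry
fewer files than the limit allows.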

For Iceberg tables, the file list is not organized by partition; it is
stored as a flat list of data and delete files. This patch implements
offset-based pagination so that catalogd can truncate the response at
any point in the file list, not just at partition boundaries.
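
The idea of offset-based pagination over a flat list can be sketched in
a few lines of Python. This is only an illustration under assumed
names: slice_file_list and its return shape are hypothetical and do not
mirror the actual IcebergContentFileStore API.

```python
def slice_file_list(files, offset, limit):
    """Return one page of a flat file list plus paging state.

    files:  flat list holding data files and delete files together.
    offset: index of the first file to return.
    limit:  maximum number of files per response.
    """
    page = files[offset:offset + limit]
    next_offset = offset + len(page)
    has_more = next_offset < len(files)
    return page, next_offset, has_more


# Example: 150 files with a limit of 100 need two pages.
files = ["file_%d" % i for i in range(150)]
first_page, next_offset, has_more = slice_file_list(files, 0, 100)
second_page, end_offset, still_more = slice_file_list(files, next_offset, 100)
```

Because the cut point is an index rather than a partition boundary,
every response except the last can be filled exactly to the limit.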

Implementation details:
- Added iceberg_file_offset field to TTableInfoSelector thrift struct
- IcebergContentFileStore.toThriftPartial() supports pagination with
  offset and limit parameters
- IcebergContentFileStore uses a reverse lookup table
  (icebergFileOffsetToContentFile_) for efficient offset-based access to
  files
- IcebergTable.getPartialInfo() enforces the file limit configured by
  catalog_partial_fetch_max_files (reusing the flag from IMPALA-11402)
- CatalogdMetaProvider.loadIcebergTableWithRetry() implements the retry
  loop on the coordinator side, sending follow-up requests with
  incremented offsets until all files are fetched
- The coordinator detects catalog version changes between requests and
  throws InconsistentMetadataFetchException to trigger query replanning
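
The coordinator-side loop and version check listed above can be
pictured with the following Python sketch. It is a simplified model,
not the CatalogdMetaProvider code: FakeCatalogd, the response
dictionary, its "total" field, and the exception class are all
hypothetical stand-ins.

```python
class InconsistentMetadataFetchError(Exception):
    """Stand-in for InconsistentMetadataFetchException."""


class FakeCatalogd:
    """Hypothetical catalogd that serves at most max_files per request."""

    def __init__(self, files, max_files, version=1):
        self.files = files
        self.max_files = max_files
        self.version = version

    def get_partial(self, offset):
        page = self.files[offset:offset + self.max_files]
        return {"version": self.version, "files": page,
                "total": len(self.files)}


def fetch_all_files(catalogd):
    """Request pages with increasing offsets until every file is fetched."""
    files = []
    first_version = None
    requests = 0
    while True:
        resp = catalogd.get_partial(offset=len(files))
        requests += 1
        if first_version is None:
            first_version = resp["version"]
        elif resp["version"] != first_version:
            # The table changed between requests: pages cannot be combined.
            raise InconsistentMetadataFetchError("catalog version changed")
        files.extend(resp["files"])
        if len(files) >= resp["total"]:
            return files, requests
        if not resp["files"]:
            # Guard against looping forever on an empty, incomplete page.
            raise RuntimeError("empty page before all files were fetched")
```

With 150 files and a limit of 100, this loop issues two requests, which
matches the scenario in test_incomplete_iceberg_file_list.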

Key differences from IMPALA-11402:
- Offset-based pagination instead of partition-based (can split
  anywhere)
- Single flat file list instead of per-partition file lists
- Works with both data files and delete files (Iceberg v2)

Tests:
- Added two custom-cluster tests in TestAllowIncompleteData:
  * test_incomplete_iceberg_file_list: 150 data files with limit=100
  * test_iceberg_with_delete_files: 60+ data+delete files with limit=50
- Both tests verify partial fetch across multiple requests and proper
  log messages for truncation warnings and request counts

Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
---
M be/src/catalog/catalog-server.h
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M tests/custom_cluster/test_local_catalog.py
7 files changed, 372 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/24041/1
--
To view, visit http://gerrit.cloudera.org:8080/24041
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
Gerrit-Change-Number: 24041
Gerrit-PatchSet: 1
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
