Hello Quanlong Huang, Jason Fehr, Zoltan Borok-Nagy, Noemi Pap-Takacs, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24049

to look at the new patch set (#14).

Change subject: WIP IMPALA-14805: LocalIcebergTable loads files in coordinator
......................................................................

WIP IMPALA-14805: LocalIcebergTable loads files in coordinator

If load_iceberg_files_in_coordinator=true:
- catalogd will skip listing files while loading
  Iceberg tables
- coordinator will load the files instead and cache
  them

load_iceberg_files_in_coordinator is an experimental flag
and false by default. Most tests pass with it, but some
Iceberg DMLs still use the file list in the catalog.
IMPALA-15113 tracks fixing these.
Note that load_iceberg_files_in_coordinator=true does not
work with legacy catalog mode.
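To make the split of responsibilities concrete, here is a minimal, hypothetical sketch of which side lists the data files depending on the flag. The class, record and method names (`IcebergFileLoadingSketch`, `CatalogdTableInfo`, `catalogdLoad`, `coordinatorFiles`) are illustrative assumptions and do not come from the patch. The flag is passed as a plain boolean instead of being read from the backend config, and the coordinator-side caching described above is left out.

```java
import java.util.List;

/** Hypothetical sketch: which side lists Iceberg data files, per the flag. */
public class IcebergFileLoadingSketch {
  /** Simplified stand-in for what catalogd ships to the coordinator. */
  record CatalogdTableInfo(String tableName, List<String> files) {}

  /** catalogd side: lists the data files only when the flag is off. */
  static CatalogdTableInfo catalogdLoad(
      String tableName, List<String> filesInStorage, boolean loadInCoordinator) {
    // Flag on: skip listing, so no file descs travel to the coordinator.
    List<String> files = loadInCoordinator ? List.of() : List.copyOf(filesInStorage);
    return new CatalogdTableInfo(tableName, files);
  }

  /** Coordinator side: uses catalogd's list, or lists the files itself. */
  static List<String> coordinatorFiles(
      CatalogdTableInfo info, List<String> filesInStorage, boolean loadInCoordinator) {
    if (!loadInCoordinator) return info.files();
    // Per the description the coordinator also caches this result (omitted).
    return List.copyOf(filesInStorage);
  }

  public static void main(String[] args) {
    List<String> storage = List.of("a.parquet", "b.parquet");
    for (boolean flag : new boolean[] {false, true}) {
      CatalogdTableInfo info = catalogdLoad("db.tbl", storage, flag);
      System.out.println("flag=" + flag + " shipped=" + info.files().size()
          + " seen=" + coordinatorFiles(info, storage, flag).size());
    }
  }
}
```

Either way the coordinator ends up with the same file list; only the place where the listing happens, and therefore what has to be shipped over the RPC, changes.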

Instead of caching TPartialTableInfo, this solution caches
IcebergFileContentStore + hostIndex + partition stats.
This would also be suitable for a REST catalog if the cache
key contained the snapshot ID instead of the catalog version.
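A rough sketch of the caching scheme, assuming a key of table name + catalog version. `Key`, `Entry` and the loader function are hypothetical stand-ins, not the patch's types. `Entry` only mimics the shape of IcebergFileContentStore + hostIndex + partition stats. The map is unbounded for brevity, and a real cache would need eviction. Because the version is part of the key, the entry for an older version stays loadable after a newer one has been cached.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of a coordinator-side cache of loaded Iceberg file lists. */
public class IcebergFileCacheSketch {
  /** Key: table + catalog version (a REST catalog variant could use the snapshot ID). */
  record Key(String fullTableName, long version) {}

  /** Stand-in for IcebergFileContentStore + hostIndex + partition stats. */
  record Entry(List<String> dataFiles, List<String> hostIndex,
      Map<String, Long> partitionStats) {}

  // Unbounded for brevity; a real cache would need eviction.
  private final Map<Key, Entry> cache = new ConcurrentHashMap<>();

  /** Returns the cached entry, running the loader only on a miss. */
  Entry get(Key key, Function<Key, Entry> loader) {
    return cache.computeIfAbsent(key, loader);
  }

  int size() { return cache.size(); }

  public static void main(String[] args) {
    IcebergFileCacheSketch cache = new IcebergFileCacheSketch();
    AtomicInteger loads = new AtomicInteger();
    Function<Key, Entry> loader = k -> {
      loads.incrementAndGet();  // counts simulated file listings
      return new Entry(List.of("file-v" + k.version()), List.of("host1"),
          Map.of("p=1", 1L));
    };
    cache.get(new Key("db.tbl", 5), loader);
    cache.get(new Key("db.tbl", 5), loader);  // hit: no new listing
    cache.get(new Key("db.tbl", 6), loader);  // new version: new entry
    // The entry for version 5 is still present alongside version 6.
    System.out.println("loads=" + loads.get() + " entries=" + cache.size());
  }
}
```

Running `main` prints `loads=2 entries=2`: the repeated request for version 5 is a hit, and version 6 adds a second entry without displacing the first.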

Incremental table loading is implemented by storing a
weak reference to the last used IcebergFileContentStore for
each table in the cache. When a new IcebergFileContentStore
is requested for the table, this weak reference is looked
up and, if the old store is still reachable, its partition
and file lists are reused, similarly to the existing
implementation in catalogd.
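A simplified sketch of the weak-reference idea, assuming hypothetical names (`FileStore`, `load`, `descriptorsCreated`) that are not from the patch. It reuses individual file descriptors keyed by path, whereas the description says the real code reuses partition and file lists. The per-table `WeakReference` does not keep an old store alive on its own. If the cache has dropped the store and it was garbage-collected, `get()` returns null and the load falls back to building every descriptor.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of incremental loading via a per-table weak reference. */
public class IncrementalFileStoreSketch {
  /** Stand-in for IcebergFileContentStore: file path -> file descriptor. */
  static final class FileStore {
    final Map<String, String> filesByPath;
    FileStore(Map<String, String> filesByPath) { this.filesByPath = filesByPath; }
  }

  private final Map<String, WeakReference<FileStore>> lastStoreByTable = new HashMap<>();
  int descriptorsCreated = 0;  // counts descriptors built from scratch

  /** Builds a store for the current files, reusing the last store's descriptors if reachable. */
  synchronized FileStore load(String table, List<String> currentPaths) {
    WeakReference<FileStore> ref = lastStoreByTable.get(table);
    FileStore previous = ref == null ? null : ref.get();  // null if collected
    Map<String, String> files = new LinkedHashMap<>();
    for (String path : currentPaths) {
      String old = previous == null ? null : previous.filesByPath.get(path);
      if (old != null) {
        files.put(path, old);  // unchanged file: reuse the existing descriptor
      } else {
        descriptorsCreated++;
        files.put(path, "desc(" + path + ")");  // new file: build a descriptor
      }
    }
    FileStore store = new FileStore(files);
    lastStoreByTable.put(table, new WeakReference<>(store));
    return store;
  }

  public static void main(String[] args) {
    IncrementalFileStoreSketch s = new IncrementalFileStoreSketch();
    FileStore v1 = s.load("db.tbl", List.of("a.parq", "b.parq"));
    FileStore v2 = s.load("db.tbl", List.of("a.parq", "b.parq", "c.parq"));
    System.out.println(s.descriptorsCreated);  // only c.parq was new on reload
    System.out.println(v1.filesByPath.get("a.parq") == v2.filesByPath.get("a.parq"));
  }
}
```

Here `v1` is still strongly referenced during the second load, so the output is deterministic: `3`, then `true`. The second line shows that `a.parq`'s descriptor is the same object in both stores.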

Examples are for an Iceberg table with 1M files and 25K
partitions on my dev machine.

Pros of load_iceberg_files_in_coordinator=true:
- File descs are not transferred in the getPartialCatalogObject RPC.
- Fewer getPartialCatalogObject RPCs are needed to load a table (4->2).
- Probably less need to worry about inconsistent metadata
  exceptions, as the old file list remains loadable even after
  catalogd has been updated to a newer version.

Cons:
- Multiple coordinators will each load the files, increasing
  NameNode pressure.

Testing:
- kept load_iceberg_files_in_coordinator's default as false,
  as some EE tests were not passing
- added a custom cluster test to run a subset of Iceberg tests
  with load_iceberg_files_in_coordinator=true

Change-Id: I6732af76a2e040fa57e39260302951466037b934
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/IcebergMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProviderDecorator.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A tests/custom_cluster/test_iceberg_load_on_coordinator.py
12 files changed, 285 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/24049/14
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6732af76a2e040fa57e39260302951466037b934
Gerrit-Change-Number: 24049
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jason Fehr
Gerrit-Reviewer: Noemi Pap-Takacs
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Zoltan Borok-Nagy
