Hello Zoltan Borok-Nagy, Noemi Pap-Takacs, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24049

to look at the new patch set (#7).

Change subject: POC IMPALA-14805: LocalIcebergTable loads files in coordinator
......................................................................

POC IMPALA-14805: LocalIcebergTable loads files in coordinator

If load_iceberg_files_in_coordinator=true:
- catalogd will skip listing files while loading
  Iceberg tables
- coordinator will load the files instead and cache
  them

This assumes that local catalog mode is used; I didn't
test without it.

Instead of caching TPartialTableInfo, this solution caches
IcebergContentFileStore + hostIndex + partition stats.
This would be suitable for REST catalog too if the key
contained the snapshot ID instead of the catalog version.
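
A minimal sketch of the cache key idea described above. The class
FileStoreCacheKey and its fields are hypothetical and are not taken from
the patch; the only point is that the key pairs the table with a single
version number, which could be either the catalog version or the Iceberg
snapshot ID.

```java
import java.util.Objects;

// Hypothetical cache key, for illustration only (not the patch's actual class).
final class FileStoreCacheKey {
  private final String tableName;
  // Assumption: one long identifies the table state. With catalogd this would
  // be the catalog version; with a REST catalog it could be the snapshot ID.
  private final long version;

  FileStoreCacheKey(String tableName, long version) {
    this.tableName = Objects.requireNonNull(tableName);
    this.version = version;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof FileStoreCacheKey)) return false;
    FileStoreCacheKey other = (FileStoreCacheKey) o;
    return version == other.version && tableName.equals(other.tableName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(tableName, version);
  }
}
```

With such a key, two lookups for the same table and version hit the same
cache entry, while a new version or snapshot gets its own entry.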

Incremental table loading is implemented by storing, for
each table in the cache, a weak pointer to the last used
IcebergContentFileStore. When a new IcebergContentFileStore
is requested for the table, this weak pointer is looked
up and, if it is still valid, the old partition and file lists
are reused, similarly to the existing implementation in catalogd.
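
The weak pointer approach above can be sketched roughly as follows. This
is an illustration under assumptions, not the code in the patch:
FileStore and IncrementalFileStoreLoader are hypothetical stand-ins, and
a file descriptor is reduced to a plain string.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real file store: file path -> descriptor.
final class FileStore {
  final long version;
  final Map<String, String> filesByPath;

  FileStore(long version, Map<String, String> filesByPath) {
    this.version = version;
    this.filesByPath = filesByPath;
  }
}

// Hypothetical loader that reuses descriptors from the last store of a table.
final class IncrementalFileStoreLoader {
  // Weak references: this map alone never keeps an evicted store alive.
  private final Map<String, WeakReference<FileStore>> lastUsed = new HashMap<>();
  // Counts descriptors that had to be built from scratch.
  int freshLoads = 0;

  synchronized FileStore load(String table, long version, Iterable<String> currentPaths) {
    WeakReference<FileStore> ref = lastUsed.get(table);
    // null if there was no previous store or it was already garbage collected.
    FileStore old = (ref == null) ? null : ref.get();

    Map<String, String> files = new HashMap<>();
    for (String path : currentPaths) {
      String desc = (old == null) ? null : old.filesByPath.get(path);
      if (desc == null) {
        // Placeholder for the expensive part (building a real file descriptor).
        desc = "desc:" + path;
        freshLoads++;
      }
      files.put(path, desc);
    }

    FileStore store = new FileStore(version, files);
    lastUsed.put(table, new WeakReference<>(store));
    return store;
  }
}
```

If the previous store is still referenced by the cache, only files that
are new in the requested version are loaded; if it was evicted and
collected, the loader falls back to a full load.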

The numbers below are for an Iceberg table with 1M files and
25K partitions, measured on my dev machine.

Pros:
- File descs are not transferred in getPartialCatalogObject RPC
- Fewer getPartialCatalogObject calls are needed to load a table (5->3)
- Size of cache objects seems to decrease (based on JAMM):
  Before:
   543MB org.apache.impala.thrift.TPartialTableInfo
   0.5MB org.apache.iceberg.BaseTable
  After:
   435MB org.apache.impala.catalog.IcebergContentFileStore
   0.5MB org.apache.iceberg.BaseTable
   <10KB org.apache.impala.thrift.TPartialTableInfo
- Planning looks faster because the construction of
  IcebergContentFileStore is skipped:
  DESCRIBE t: 1s->10ms
  EXPLAIN SELECT * FROM t : 3s->1.5s
  EXPLAIN SELECT * FROM t WHERE part_col=100: 1s->0.3s
- Inconsistent metadata exceptions are probably less of a
  concern, as the old file list remains loadable even after
  catalogd has updated to a newer version.

Cons:
- Each coordinator will load the files on its own, increasing
  NameNode pressure.

Change-Id: I6732af76a2e040fa57e39260302951466037b934
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/IcebergMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProviderDecorator.java
M fe/src/main/java/org/apache/impala/catalog/local/MultiMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/local/MetaProviderDecoratorTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
M tests/webserver/test_web_pages.py
21 files changed, 366 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/24049/7
--
To view, visit http://gerrit.cloudera.org:8080/24049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6732af76a2e040fa57e39260302951466037b934
Gerrit-Change-Number: 24049
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
