Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16159 )


Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
......................................................................

IMPALA-3127: Support incremental metadata updates in partition level

Currently, partitions are tightly integrated into the HdfsTable objects.
Catalogd has to transmit the entire table metadata even when only a few
partitions change. This wastes resources and can cause OOM errors when
transmitting large tables due to the 2GB JVM array limit.

This patch makes HdfsPartition extend CatalogObject so the catalogd can
send partitions as individual catalog objects. Consequently, table
objects in the catalog topic update can have empty partition maps, which
reduces the thrift object size for large tables. The catalog object key
of HdfsPartition consists of db name, table name and partition name.
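
As an illustration of the key described above, here is a minimal sketch.
The class name, the "HDFS_PARTITION" prefix and the separators are
assumptions made for this example; the patch only states that the key
consists of db name, table name and partition name.

```java
// Hypothetical sketch: the exact key layout is an assumption, not
// necessarily the format used by the patch.
final class PartitionKeySketch {
  // Builds a catalog object key from the three components named in the
  // commit message: db name, table name and partition name.
  static String partitionKey(String db, String table, String partitionName) {
    return "HDFS_PARTITION:" + db + "." + table + ":" + partitionName;
  }
}
```

Because the key contains the partition name rather than the partition
id, a new instance of the same partition maps to the same key.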

In "full" topic mode (catalog_topic_mode=full), catalogd only sends
changed partitions together with the latest state of their tables. The
latest table state is a table object whose partition map is replaced by
a partition list, i.e. an empty partition map and a non-empty partition
list. Legacy coordinators use the partition list to pick up unchanged
partitions from the existing table object and new partitions from the
catalog update.
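
The merge done by a legacy coordinator could look roughly like the
sketch below. It assumes the partition list identifies partitions by id
and uses plain strings as stand-ins for partition metadata; the class
and method names are hypothetical and not taken from the patch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of how a legacy coordinator might rebuild a
// table's partition map from an incremental update.
final class PartitionMergeSketch {
  // latestIds: partition ids listed in the latest table state.
  // existing: partitions of the table object the coordinator already has.
  // fromUpdate: partition objects received in the same catalog update.
  static Map<Long, String> rebuild(List<Long> latestIds,
      Map<Long, String> existing, Map<Long, String> fromUpdate) {
    Map<Long, String> result = new HashMap<>();
    for (Long id : latestIds) {
      // Prefer the partition sent in this update, else reuse the
      // unchanged one from the existing table object.
      String part = fromUpdate.containsKey(id)
          ? fromUpdate.get(id) : existing.get(id);
      if (part == null) {
        throw new IllegalStateException("Missing partition id=" + id);
      }
      result.put(id, part);
    }
    return result;
  }
}
```

Partitions that exist in the old table object but are absent from the
partition list are dropped implicitly, since only listed ids are copied.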

Currently, partition instances are immutable: all partition
modifications are implemented by deleting the old instance and adding a
new one with a new partition id. Since partition ids are generated by a
global counter, newer partition instances have larger partition ids.
Catalogd therefore maintains a watermark for each table: the max
partition id sent so far. Partition instances with ids larger than the
watermark are new partitions that should be sent in the next catalog
update. Deleted partition instances are kept in a per-table set until
the next catalog update. If there is no update on the same partition
name, catalogd sends a deletion for the partition.
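
The watermark bookkeeping described above can be sketched as follows.
This is a simplified, single-threaded model under stated assumptions:
partition ids are supplied by the caller (standing in for the global
counter), partitions are represented by their names only, and all class,
field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical per-table tracker for incremental partition updates.
final class TableUpdateTrackerSketch {
  // Result of collecting one catalog update for the table.
  static final class Update {
    final List<Long> newPartitionIds;
    final Set<String> deletedNames;
    Update(List<Long> ids, Set<String> deleted) {
      newPartitionIds = ids;
      deletedNames = deleted;
    }
  }

  private final TreeMap<Long, String> partitions = new TreeMap<>();
  private final Set<String> droppedNames = new HashSet<>();
  private long maxSentPartitionId = -1L;  // the watermark

  void addPartition(long id, String name) { partitions.put(id, name); }

  void dropPartition(long id) {
    String name = partitions.remove(id);
    if (name != null) droppedNames.add(name);
  }

  Update nextUpdate() {
    // Partitions with ids strictly above the watermark are unsent.
    NavigableMap<Long, String> fresh =
        partitions.tailMap(maxSentPartitionId, false);
    List<Long> newIds = new ArrayList<>(fresh.keySet());
    // Send a deletion only if no new instance reuses the same name.
    Set<String> deleted = new TreeSet<>(droppedNames);
    deleted.removeAll(fresh.values());
    if (!newIds.isEmpty()) maxSentPartitionId = partitions.lastKey();
    droppedNames.clear();
    return new Update(newIds, deleted);
  }
}
```

For example, if partition year=2010 is replaced (old instance dropped,
new instance added) and partition year=2009 is dropped between two
updates, the next update carries the new year=2010 instance and a single
deletion for year=2009.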

For dropped or invalidated tables, catalogd will still send deletions
for their partitions. Although coordinators do not use them
(coordinators delete the partitions when they delete the table
instances), these deletions prevent topic entries from leaking in the
statestore catalog topic.

In "minimal" topic mode (catalog_topic_mode=minimal), catalogd only
sends invalidations on tables and deleted partition instances. Each
partition instance is identified by its partition id. LocalCatalog
coordinators use the partition invalidations to evict stale partitions
in time. For instance, let's say partition(year=2010) is updated in
catalogd. This is done by deleting the old partition instance
partition(id=0, year=2010) and adding a new partition instance
partition(id=1, year=2010). Catalogd will send invalidations on the
table and partition instance with id=0, but not the one with id=1. A
LocalCatalog coordinator will invalidate the partition instance(id=0) if
it's in the cache. If the partition instance(id=1) is cached, it's
already the latest version since partition instances are immutable. So
we don't need to invalidate it.
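
The year=2010 example above can be modelled with a small cache keyed by
partition id. This is only a sketch: the class is hypothetical, strings
stand in for cached partition metadata, and the actual cache in
CatalogdMetaProvider may be organized differently.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a LocalCatalog coordinator's partition cache.
final class PartitionCacheSketch {
  private final Map<Long, String> cache = new HashMap<>();

  void put(long partitionId, String metadata) {
    cache.put(partitionId, metadata);
  }

  boolean contains(long partitionId) {
    return cache.containsKey(partitionId);
  }

  // Evicts only the deleted partition instances. Instances with other
  // ids are left alone: since instances are immutable, a cached entry
  // for a live id is already the latest version.
  void invalidate(Collection<Long> deletedPartitionIds) {
    for (Long id : deletedPartitionIds) {
      cache.remove(id);
    }
  }
}
```

With instances id=0 and id=1 both cached, an invalidation for id=0
removes only that entry and keeps id=1.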

Tests
 - Run exhaustive tests.
 - Run exhaustive test_ddl.py in LocalCatalog mode.
 - (TODO) Add tests with a long statestore update interval so that
   several table changes are sent in the same topic update.
 - (TODO) Add tests for straggler coordinators that need to process
   several incremental updates at once.
 - (TODO) Add tests verifying that no statestore topic entries leak.

Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
---
M be/src/catalog/catalog-util.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
12 files changed, 443 insertions(+), 52 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16159/1
--
To view, visit http://gerrit.cloudera.org:8080/16159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <[email protected]>
