Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12591 )


Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates
......................................................................

IMPALA-7972 Detect self-events to avoid unnecessary invalidates

This patch adds support for detecting events that were generated by the
catalog itself (self-events). This is used to avoid unnecessary
invalidation of tables in response to such self-events. Currently, the
alter_table, alter_partition, add_partition and drop_partition event
types can invalidate the table metadata.

Originally, we planned to rely on global version number support in the
metastore (see HIVE-21115). Since that work is not yet complete, we
rely on a combination of other identifiers to determine whether an
event is self-generated. These self-event identifiers consist of three
values from the table/partition parameters: transient_lastDDLTime, a
uuid and a version number. The uuid is generated by each catalogservice
when it comes up and is added to the table/partition parameters under
the key "impala.CatalogServiceId". The catalog version number is added
under the key "impala.CatalogVersion". Since we want the metastore to
update transient_lastDDLTime, we remove this parameter before the
catalog issues an alterTable or alterPartition DDL operation to the
metastore.
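
As a rough illustration of the parameter handling described above, the
following minimal sketch stamps a copy of the table/partition parameters
before an alter operation. The parameter keys are the ones named in this
commit message; the class name, method name and signature are hypothetical
and are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not the code in this patch.
final class SelfEventParamsSketch {
  // Keys as named in the commit message.
  static final String SERVICE_ID_KEY = "impala.CatalogServiceId";
  static final String VERSION_KEY = "impala.CatalogVersion";
  static final String LAST_DDL_TIME_KEY = "transient_lastDDLTime";

  private SelfEventParamsSketch() {}

  // Returns a copy of the parameters that carries the self-event
  // identifiers and no longer contains transient_lastDDLTime, so that
  // the metastore assigns a fresh value when the alter is applied.
  static Map<String, String> prepareForAlter(
      Map<String, String> params, String serviceId, long catalogVersion) {
    Map<String, String> result = new HashMap<>(params);
    result.put(SERVICE_ID_KEY, serviceId);
    result.put(VERSION_KEY, Long.toString(catalogVersion));
    result.remove(LAST_DDL_TIME_KEY);
    return result;
  }
}
```

The sketch copies the map rather than mutating it in place; that is a
choice made here for clarity, not something the patch is known to do.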

When an event is generated, we fetch the values of these parameters from
the event and from the catalog and compare them as follows:
1. If the transient_lastDDLTime of the table (or of the partition, for
partition events) in the event is strictly less than the value of
transient_lastDDLTime in the parameters of the corresponding catalog
object, we ignore the event. If it is strictly greater, we process the
event.
2. If transient_lastDDLTime is equal to the value in the catalog, we
rely on the serviceId and catalog version to resolve the conflict. If
the serviceId matches the serviceId of the catalog, the version numbers
are compared. If it does not match, the event was generated by another
catalog and should be processed.
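
The comparison rules above can be sketched roughly as follows. All class
and method names are hypothetical. The commit message does not spell out
how the version numbers are compared when the serviceId matches, so the
sketch assumes that an event whose version is not newer than the catalog's
version is a self-event and can be ignored; that rule is an assumption,
not the patch's documented behavior.

```java
import java.util.Objects;

// Hypothetical sketch of the self-event check; not the code in this patch.
final class SelfEventDecisionSketch {
  enum Action { IGNORE, PROCESS }

  // The three self-event identifiers read from table/partition parameters.
  static final class Ids {
    final long lastDdlTime;
    final String serviceId;
    final long catalogVersion;

    Ids(long lastDdlTime, String serviceId, long catalogVersion) {
      this.lastDdlTime = lastDdlTime;
      this.serviceId = serviceId;
      this.catalogVersion = catalogVersion;
    }
  }

  private SelfEventDecisionSketch() {}

  static Action decide(Ids event, Ids catalog) {
    // Rule 1: a strictly smaller or larger transient_lastDDLTime decides.
    if (event.lastDdlTime < catalog.lastDdlTime) return Action.IGNORE;
    if (event.lastDdlTime > catalog.lastDdlTime) return Action.PROCESS;
    // Rule 2: on a tie, an event from another catalog is always processed.
    if (!Objects.equals(event.serviceId, catalog.serviceId)) {
      return Action.PROCESS;
    }
    // ASSUMPTION: same catalog and event version not newer than the
    // catalog's version means the change is already reflected locally.
    return event.catalogVersion <= catalog.catalogVersion
        ? Action.IGNORE
        : Action.PROCESS;
  }
}
```

Under these assumptions, an event carrying the same transient_lastDDLTime,
serviceId and version as the catalog object is ignored, while the same
event with a different serviceId is processed.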

In the case of a drop_partition event, the partition object is not
available in the event, so we cannot determine whether it is a
self-event. In such cases we currently always issue an invalidate
command. This is a known limitation and will be improved in IMPALA-7973.

The patch adds new tests that trigger alter table/partition DDLs from
Impala and verify that the table is not invalidated.

Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
6 files changed, 1,021 insertions(+), 198 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/1
--
To view, visit http://gerrit.cloudera.org:8080/12591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
Gerrit-Change-Number: 12591
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Krishna <bhar...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
