Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 )
Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ...................................................................... Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift File common/thrift/BackendGflags.thrift: http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift@219 PS1, Line 219: 97: required bool incremental_refresh_acid nit: rename it to hms_event_incremental_refresh or hms_event_incremental_refresh_transactional_table Also where are we setting the default value of this flag? http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2582 PS1, Line 2582: List<TPartitionKeyValue> tPartSpec, long writeId, boolean loadFileMetadata, I feel it would be cleaner if we don't add writeId and eventId as parameters here and instead call this method only if these checks pass in the caller. Otherwise the caller of this method would need to know writeId and eventId before calling this method which does not seem right. We can add loadFileMetadata here though. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java File fe/src/main/java/org/apache/impala/catalog/TableWriteId.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java@29 PS1, Line 29: this.tableName = TableName.parse(tableName); We are probably expecting tableName to be fullyQualified but not enforcing that check here. Instead of TableName, we can store String DbName, String tableName separately. . http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@20 PS1, Line 20: import com.amazonaws.services.servicequotas.model.MetricInfo; nit: remove it if not being used? http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@184 PS1, Line 184: switch (metastoreEventType) { this seems incomplete. Are we missing anything here? We didn't create objs for ALLOC_WRITE_ID_EVENT, COMMIT_TXN and ABORT_TXN anywhere in this method. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1458 PS1, Line 1458: public static class AcidAddPartitionEvent extends AddPartitionEvent { 1. Where is AcidAddPartitionEvent getting initialized? 2. Instead of creating a new event i.e AcidAddPartitionEvent, we can modify the process() logic in AddPartitionEvent itself based on the incremental refresh flag http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1717 PS1, Line 1717: if (tbl_.getCreateEventId() > eventId_) { IMO, there is no need to acquire read lock on table to read createEventId since it is set only once when creating table in db. Instead, we can make createEventId member variable in table as volatile to make sure we are always reading most recent value. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1735 PS1, Line 1735: if (msTbl_ == null) { can we move the check tbl_.getCreateEventId() > eventId_ here ? As msTbl_ is not being used anywhere else, better to avoid it. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1738 PS1, Line 1738: List<Long> writeIds = txnToWriteIdList_.stream() txnToWriteIdList_ does not seem to be set anywhere in the event. Where are we reading its values from? http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1786 PS1, Line 1786: throw new CatalogException("Failed to get write event infos", e); nit: Add txnId in the message? http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1789 PS1, Line 1789: commitTxnMessage_.addWriteEventInfo(writeEventInfoList); Didn't understand the purpose of this. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1793 PS1, Line 1793: throw new CatalogException("Commit event processing error", e); nit: add more info like txnId etc. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1800 PS1, Line 1800: Set<TableWriteId> tableWriteIds = catalog_.getWriteIds(txnId_); if the tableWriteIds don't exist in catalog cache (say because of cache eviction etc), we should fetch it from HMS. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1861 PS1, Line 1861: Set<TableWriteId> tableWriteIds = catalog_.getWriteIds(txnId_); Same as my previous comment in CommitTxnEvent, if txnId does not exist in catalog cache, we should fetch it from HMS. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 1 Gerrit-Owner: Yu-Wen Lai <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sourabh Goyal <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Yu-Wen Lai <[email protected]> Gerrit-Comment-Date: Thu, 23 Sep 2021 00:48:53 +0000 Gerrit-HasComments: Yes
