Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16159 )
Change subject: IMPALA-3127: Support incremental metadata updates in partition level ...................................................................... Patch Set 3: (13 comments) Sorry for the delay in the review. I took another pass and overall the approach makes sense to me. I think it would be useful if we could have an exhaustive test (may be in a separate jira) to make sure that we are not leaking partitions in statestore. The test could add/drop partitions along with multiple add/invalidate/drop table commands and make sure that the number of partition keys in the statestore is as per our expectation. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@712 PS3, Line 712: if (!delete && obj.type == HDFS_PARTITION) return; Its unclear to me that when we generate the minimalObject when delete flag is false, we set the prev_id in the partObject.setId on line 746. However, that object is never sent as per this line. How do we invalidate the updated partitions when topicMode is minimal? Also, would be good to add a comment here (or somewhere else if more appropriate) explaining what we send in case of partitions in both v1 and v2 modes since it is not trivial to understand the subtle differences. For instance, as I understand: 1. In v1 mode (topicMode = full), we only send the partitionIds in the thrift table which represents the current list of the partitions. Additionally, for each newly added/removed partition we send a THdfsPartition in the same topic update. However, coordinators detect the removal of any partitions by absence of a id partitionIds in the table object. 2. In v2 mode (topicMode = minimal), local coordinators only load what they need and hence we only send deleted partitionIds. Updated partitions are also treated as a special case of deleted partitions by sending the previous partitionId for such partitions so that local coordinators invalidate them proactively. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@742 PS3, Line 742: if (isDelete) { : partObject.setId(obj.hdfs_partition.id); : } else if (obj.hdfs_partition.prev_id != HdfsPartition.INITIAL_PARTITION_ID - 1) { : // For updates, coordinators can invalidate the old partition instance. : partObject.setId(obj.hdfs_partition.prev_id); : } This looks a bit hacky to me. Do you think it would be more readable by adding a explicit field to THdfsPartition called prev_id or a boolean indicating that the id is previous id and let local coordinator decide how to process the partition. This implementation on the catalogd side assumes the knowledge of how local catalog invalidates stuff and I think it would cleaner to keep them independent as much as possible. If you decide to have prev_id field in the THdfsPartition you can keep a default value of field to -1 so that its not serialized unless its set. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1295 PS3, Line 1295: catalogTbl.setTable(((HdfsTable) tbl).toThriftWithPartitionIds()); wouldn't this line be called for both fullUpdate and a incremental update? The javadoc of the method toThriftWithPartitionIds suggests this is called for incremental updates which seems confusing. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1321 PS3, Line 1321: if (maxSentId < catalogPart.getHdfs_partition().getId()) { : maxSentId = catalogPart.getHdfs_partition().getId(); : } nit, perhaps this is more readable? maxSentId = Math.max(maxSentId, catalogPart.getHdfs_partition().getId()); http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1329 PS3, Line 1329: f (!ctx.isFullUpdate()) Can you clarify why this is needed only in case of incremental updates? What happens on the coordinators when a statestore is restarted and then it requests for a full-update? http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@90 PS3, Line 90: HdfsPartition I think it is worth documenting that even though this extends CatalogObjectImpl, we don't explicitly have a different catalog version for Partitions. All the partitions of a HdfsTable will have the same catalogVersion as its parent table. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@273 PS3, Line 273: cached nit, replace with "known" instead of "cached" since local coordinators don't necessarily cache these partitions. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@278 PS3, Line 278: HdfsPartition nit, Do we need to store the full HdfsPartition if it is dropped? Seems like we can only store a minimal HdfsPartition. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java: http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@208 PS3, Line 208: Map<TableName, List<THdfsPartition>> newPartitionsByTable = new HashMap<>(); Instead of tracking the partitions separately in this map do you think it would be more readable to modify the ObjectSequencer to order the updates such that the order is topLevelObject, Table, HdfsPartition (using a PriorityQueue)? The advantage of doing that would be that we will have separate methods for addPartition, removePartition which fits nicely with the existing logic instead of sending over this map to all the table methods. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@466 PS3, Line 466: Detected by : // non-empty partition list and empty partition map. can you please update this comment? http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@492 PS3, Line 492: !tHdfsTable.partitions.containsKey(tPart.id) Why is this not a Preconditions.checkState? http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1378 PS3, Line 1378: useless nit, may be replace with "obsolete"? http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1451 PS3, Line 1451: partition(id=%d) of table %s.%s nit, may be more helpful to print the partitionNames here instead of id? Also, may be reduce the verbosity the string using perhaps dbname.tblname.partitionName format? -- To view, visit http://gerrit.cloudera.org:8080/16159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870 Gerrit-Change-Number: 16159 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Anurag Mantripragada <anu...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Mon, 20 Jul 2020 17:18:29 +0000 Gerrit-HasComments: Yes