Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/20145 )
Change subject: IMPALA-12256: Fix DDLs losing partition-level create event ids in reloads ...................................................................... Patch Set 4: Code-Review+1 (3 comments) Thanks for the analysis and the prompt fix Quanlong! I only have some minor questions mostly because I know very little around this area. http://gerrit.cloudera.org:8080/#/c/20145/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/20145/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1326 PS4, Line 1326: loss nit: Should we use "lose" instead? http://gerrit.cloudera.org:8080/#/c/20145/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/20145/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2904 PS4, Line 2904: partBuilder.setCreateEventId(oldPartition.getCreateEventId()); : partBuilder.setLastCompactionId(oldPartition.getLastCompactionId()); I have 2 questions. 1. At https://gerrit.cloudera.org/c/20145/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#1329 we also populate the field 'lastRefreshEventId_'. I wonder if we also need to pick up the value of 'lastRefreshEventId_' like the following. Or we do not really need to do this because we already populate the field 'lastRefreshEventId_' somewhere else and this field will be available when the event processor is processing a reload event (e.g., at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L3891)? partBuilder.setLastRefreshEventId(oldPartition.getLastRefreshEventId()); 2. Here we also pick up the value of 'lastCompactionId_'. Will this piece of information be required to fix the issue reported in this JIRA? Or we just want to keep this field in sync with the same field in the partition that was just updated by the DDL command? http://gerrit.cloudera.org:8080/#/c/20145/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20145/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4544 PS4, Line 4544: // if the partition has been created since the event was generated, skip : // the stale event. : boolean isStale = hdfsPartition.getCreateEventId() > eventId; : LOG.info("{} partition {} of table {} since it's create event id {} is {} than " + : "eventid {}", : isStale ? "Not dropping" : "Dropping", : hdfsPartition.getPartitionName(), hdfsTable.getFullName(), : hdfsPartition.getCreateEventId(), isStale ? "higher" : "not higher", : eventId); : return !isStale; : At https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L3891 we have the following check within CatalogServiceCatalog.java#isPartitionLoadedAfterEvent(). I wonder if it would also be good for us to log the reload event? HdfsPartition hdfsPartition = getHdfsPartition(dbName, tableName, msPartition); if (hdfsPartition.getLastRefreshEventId() > eventId) { return true; } -- To view, visit http://gerrit.cloudera.org:8080/20145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I052faa093bda69fb16db0d424c1478bba103dad9 Gerrit-Change-Number: 20145 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Sun, 02 Jul 2023 00:06:33 +0000 Gerrit-HasComments: Yes
