Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17308 )
Change subject: IMPALA-10502: Handle CREATE/DROP events correctly ...................................................................... Patch Set 6: (24 comments) http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@a689 PS5, Line 689: > Will removing these break third party extensions? If not, we can remove the yes, you are right. Thanks for catching that! http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@874 PS5, Line 874: loadFileMetadataForPartitions(client, addedPartBuilders, /*isRefresh=*/false); > Can we refactor this with the above method? The major logics are the same. Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@883 PS5, Line 883: : private HdfsPartition.Builder > nit: can fit into one line Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2436 PS5, Line 2436: value. > nit: I think we prefer ++i and some spaces Done. I never realized that we prefer ++i v/s i++ in the loops. A quick grep does indeed so ++i has 319 instances and i++ has only 61. Did some reading to know if this is more of a stylistic preference or something more :) http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@78 PS5, Line 78: synchronized (metastoreAccessLock_) { > hmm.. not related to this patch, I think we don't need this anymore. We hav yeah, I agree. I filed IMPALA-10706 http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@112 PS5, Line 112: throw new TableLoadingExceptio > I'm not clear on the purpose of this. Do we depend on the createEventId whe The table has a new field createEventId which is used to track the event. When a table is created from Impala we create a IncompleteTable which has the createEventId set to corresponding CREATE_TABLE event id from the HMS. But it is possible that when the table is loaded, the table was dropped and recreated outside Impala. Events processor doesn't update the createEventId of the table since this table is in unloaded state. Hence we need to update the eventId to the latest CREATE_TABLE event id when we load the table so that the drop and create table event which events processor receives on this loaded table is ignored. I will update the comment with more details to make it more readable. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@744 PS5, Line 744: LOG.debug("EventId: {} Table {} was not added since " > nit: Log table name as well? Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@787 PS5, Line 787: Table tblBefore = null; > nit: could you move this to line 805 since it's only used there? Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@809 PS5, Line 809: origin > nit: "original" makes more sense for me :) Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1305 PS5, Line 1305: * Alters an existing view's definition in the metastore. Throws an exception > Can we refactor this with the above method? The major logics are the same. Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1388 PS5, Line 1388: * Updates table property 'impala.lastComputeStatsTime' for COMPUTE (INCREMENTAL) STATS, > nit: redundant blank line Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1773 PS5, Line 1773: ((CreateDatabaseEvent) event).getDatabase()); : } catch (MetastoreNotificationException e) { : throw new CatalogException("Unable to create a metastore event ", e); : } : } : : /** > Yeah, I think handling the IF NOT EXISTS case in this situation is a nice-t It was not too complex. I implemented it for create table and create database. For partitions it was already being handled. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1802 PS5, Line 1802: .get(events.get(events.size() - 1)); : Preconditions.checkState(event instanceof CreateTableEvent); : > nit: this can be one-line statement Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2257 PS5, Line 2257: } else { > Does it make more sense to either yeah, I could have done either of those but I found that the code was becoming a lot more convoluted because of the need to handle the TException if I move this out of the try-with-resources block or if I pass the client to the getNextMetastoreEvents. I thought this perf benefit is unlikely to be the common case and hence wasn't worth the extra complexity. The only time I can see this would benefit is when the dropDatabase is using the last metastore client of the pool and getNextMetastoreEvents is creating a new client. However, at that point it is clear that catalogd needs more clients than it is configured with due to possibly increased concurrency of ddls and this kind of bottleneck would be observed at many places. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2260 PS5, Line 2260: } > Why are we filtering DROP_TABLE_EVENT_TYPE here? Isn't it being handled at we are interested in filtering out the drop table events also here because this drop database operation could be a drop database cascade command which would drop the underlying tables as well. line 2548 is to handle the drop table command. In both the cases we need to update the deleteEventLog. Hope that makes sense. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3093 PS5, Line 3093: .equalsIgnoreCase(event.getTableName())); > Just confirming: For any ddl operations, HMS adds entry to NotificationEven yes, HMS events are generated as part of the RDBMS transaction which adds the table to the HMS. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3198 PS5, Line 3198: e > Thoughts? Yeah, I think we should change it to .equals() But perhaps in a separate change so that we can port it to any other branches if needed. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3209 PS5, Line 3209: if (cacheOp != null && cacheOp.isSet_cached()) { > Thoughts? yeah ,I have updated my patch to remove this additional getTable call. I will see if this passes the tests to make sure. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3255 PS5, Line 3255: "ifNotExists is true.", tableName)); > nit: redundant blank line Done http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4015 PS5, Line 4015: + " > nit: can we remove this and the parameter in the method if we don't need it makes sense. http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/17308/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java@185 PS5, Line 185: "Fatal error while ini > Throwing the execption will fail the process. Based on the aboved logs, it Thanks for pointing this out. I think it is valid to throw CatalogException here since the user configured to use the events processor but we cannot instantiate it. I will update the log message. http://gerrit.cloudera.org:8080/#/c/17308/5/tests/custom_cluster/test_event_processing.py File tests/custom_cluster/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/17308/5/tests/custom_cluster/test_event_processing.py@285 PS5, Line 285: t Exception, e: > No needed? It seems 'q' is already a concrete string. Thanks for catching that. Done. http://gerrit.cloudera.org:8080/#/c/17308/5/tests/custom_cluster/test_event_processing.py@286 PS5, Line 286: print("Failed > nit: might be helpful to print the exception as well Done http://gerrit.cloudera.org:8080/#/c/17308/5/tests/custom_cluster/test_event_processing.py@591 PS5, Line 591: r is not updated. Inste > nit: process it Done -- To view, visit http://gerrit.cloudera.org:8080/17308 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a Gerrit-Change-Number: 17308 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sourabh Goyal <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Tue, 18 May 2021 00:36:52 +0000 Gerrit-HasComments: Yes
