Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 )
Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints ...................................................................... Patch Set 15: (35 comments) http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG@7 PS11, Line 7: Sync db/table in catalog cache to latest HMS event id when performing : DDL operations via catalog HMS endpoints > If this patch is close to getting merged, now is a good to add more details Yes I will add more details in the commit message. Will fix message format styles as well. http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7 PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing : DDL operations via catalog HMS endpoints > Since this patch is not a WIP anymore, can you please follow the commit mes Ack http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457 PS6, Line 457: tableInfo); : } : int tableIndex=-1, versionLockCount = 0; : try { : for(tableIndex = 0; tableIndex < numTables; tableIndex++) { : Table tbl = tables[tableIndex]; : if (!tryWriteLock(tbl)) { : LOG.debug("Could not acquire write lock on table: " + tbl.getFullName()); : return false; : } : versionLockCount += 1; : } : // in case of success, release version write lock for all tables except last : if (tableIndex == numTables) { : > I think the versionLock.writeLock().unlock() should be moved out finally bl Currently tryWriteLock(tbl) does not throw an exception. But it still makes sense to release version lock in a finally block. Will fix it. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3652 PS6, Line 3652: > annotate with @VisibleForTesting if this was for testing. It is used in testing as well as in JniCatalog (after initializing catalogServiceCatalog object) http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3658 PS6, Line 3658: > nit, missing newline. Ack http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@459 PS11, Line 459: leIndex=-1, > I had left some comments in the older gerrit url for this patch. Can you pl Sure http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457 PS12, Line 457: tableInfo); : } : int tableIndex=-1, versionLockCount = 0; : try { : for(tableIndex = 0; tableIndex < numTables; tableIndex++) { : Table tbl = tables[tableIndex]; : if (!tryWriteLock(tbl)) { : LOG.debug("Could not acquire write lock on table: " + tbl.getFullName()); : return false; : } : versionLockCount += 1; : } : // in case of success, release version write lock for all tables except last : if (tableIndex == numTables) { : > a RuntimeException thrown at line 459 will not release the table locks as w Makes sense. I didn't think about the RunTimeException. Will address it. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672 PS12, Line 2672: > why do we need to override this method? I think the intention for this change was - currently for removeTable() api , we acquire global write lock. So the read for the table should acquire global read lock. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS6, Line 115: > need this? The intention was - by making lastSyncedEventId_ volatile, the get and set apis for this variable would become thread safe. Otherwise to read this, one would have to take read lock. http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115 PS6, Line 115: > please add a comment here explaining what the value of this field signifies Ack http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170 PS12, Line 170: initMetrics > since these metrics are not getting exposed anywhere. can we remove these? Are you suggesting to pass metrics as null in MetastoreEventsProcessor.syncToLatestEventId(catalog_, table, catalog_.getEventFactoryForSyncToLatestEvent(), metrics_, false); ? If yes, then we need to handle null condition for metrics in MetastoreEvents.java http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@168 PS12, Line 168: // St > do we need these changes to this class? Not really. I will revert back to original behavior. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461 PS12, Line 461: shouldSkipWhenSyncingToLatestEventId > why are we not just using isSelfEvent() here? Would it not be cleaner to ch Sorry I didn't understand your comment http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@576 PS12, Line 576: if (c > is this commented for a reason? No. Forgot to clean it up. Will do it. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@800 PS12, Line 800: return true; : } : } catch (DatabaseNotFoundException e) { : infoLog("Skipping on table {} because db {} not found in cache", tblName, : > if the table is not present do we need to check if it is in the deletelog? How would checking in deletelog help? Even if it is present, should we not skip processing this event? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@990 PS12, Line 990: > I think we should implement a check for lastSyncedEventId based self-event MetastoreTableEvent class already has a default implementation for skipping the event if the table is already synced till this event id. Do we need a custom implementation here? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1175 PS12, Line 1175: the ALTER_TABLE event is due a table rename, this method removes the old table : * and creates a new table with the new name. Else, this just issues a refr > It is not clear to me why we are checking on both the before and after tabl This check is done for rename table. Already have an explanation for this in method's description // If the alter table event is generated because of table rename then event // should *NOT* be skipped if old table is not synced this till event AND // new table doesn't exist in cache. Skip otherwise Revisiting the logic and I think we can skip this logic for rename and simply rely on renameTableFromEvent() when actually processing the event. Thoughts? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1699 PS12, Line 1699: > Why does this class not implement shouldSkipWhenSyncingToLatestEventId? MetastoreTableEvent class already has the default implementation for this. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@328 PS12, Line 328: shouldBeAlreadyLocked > Is there a caller which passes this as false? Yes, during full table reload from TableLoader. WriteLock is not required there since we are doing reload on a fresh table Rethinking about this - It probably makes more sense to acquire write lock on a newly created table in TableLoader and get rid of shouldBeAlreadyLocked. Thoughts? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@329 PS12, Line 329: Preconditions.checkState(tbl.isWriteLockedByCurrentThread(), : String.format("Write lock is not held on table %s by current thread", : tbl.getFullName())); > I think this preconditions check should be done in any case to catch bugs w Sorry I couldn't understand your comment. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@386 PS12, Line 386: tbl.setLastSyncedEventId(currentEvent.getEventId()); > can this logic be moved to the event.processIfEnabled() itself? Actual event processing logic like: addPartitionsIfNotRemovedLater, reloadPartition() (for alter partition) is already taking care of setting event id in table/db. I think we shouldn't move this to event.processIfEnabled() because the actual implementation of process() api should decide whether a given event id should be set in table/db or not http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@458 PS12, Line 458: TODO: should we ignore case > yes, case should be ignored. Also I think we should make sure that the cata Do you mean we support only "hive" catalog? If yes should the filter be modified to if (event.getCatName() != null && event.getDbName() != null && event.getTableName() != null) { return event.getCatName().equalsIgnoreCase("hive") && tbl.getDb().getName().equalsIgnoreCase(event.getDbName()) && tbl.getName().equalsIgnoreCase(event.getTableName()); Same question for db filter as well. Should we check for catalog name? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@475 PS12, Line 475: return db.getName().equalsIgnoreCase(event.getDbName()); Should we have catalog name check here? Ignore events if catalog name is not hive? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@478 PS12, Line 478: event.getTableName() == null > not sure I get this part. Why do we need to do this? The check is to ignore any table specific events since those events would have table name as non null and accept the rest http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@533 PS12, Line 533: storeEventFactory.get > can we use the singleton way to get the factory here? I tried using singleton object after our discussion but that introduced some tests failures. Most likely some race conditions. I couldn't repro them on local. Also since the logs from pre commit tests were missing upstream I couldn't debug it further. Therefore switched to using non singleton object of event factory. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@283 PS12, Line 283: dropDbIfExists > why not sync to latest event here? My thought was - when the db is eventually going to be dropped, why sync it to latest event i.e drop event and then drop the db? We can by pass the sync and drop the db right away. Thoughts? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@857 PS12, Line 857: syncToLatestEventId(srcTbl, apiName); : syncToLatestEventId(destinationTbl, apiName); : } catch (Exception e) { : rethrowException(e, apiName); > Can this be removed now? Ack http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1105 PS12, Line 1105: > why not sync to latest event here? Same reasoning as dropDbIfExists. Currently when syncing to latest event if we encounter drop table/db event then we drop the db/table and stop processing further events (if any). So why not drop the table right away? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1279 PS12, Line 1279: ename = !db > Do we need a preconditions check to confirm that table is locked by this th Yes we should have. For now, I have removed this method and moved the logic in the caller since there is only one caller of this method. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@909 PS12, Line 909: if (dbToAlter == null) { : LOG.debug("Event id: {}, not altering db {} since it does not exist in catalogd", : eventId, dbName); : > check deleteEventLog here? How would that help? Even if db is present in deleteEventLog, is it not already deleted? http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1611 PS12, Line 1611: if (!BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls()) { : assertEquals(EventProcessorStatus.NEEDS_INVALIDATE, : eventsProcessor_.getStatus()); : } > not sure I fully understand this. When sync to latest event id is enabled, any event on a non existing table is skipped (check shouldSkipWhenSyncingToLatestEventId in MetastoreTableEvent). Regarding this change, I am actually not sure if this is causing any regression but the behavior in case the flag is turned on seems to be right. Let me know your thoughts http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java File fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java: http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@199 PS12, Line 199: prevSyncedEventId); > Will this cause flakiness of the test since the HMS is shared among multipl Yes the check will fail in that case. Thanks for pointing it out. I will modify the check at all places in this file to getLastSyncedEventId() > prevSyncedId http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@240 PS12, Line 240: // added partitions should not reflect in table : // stored in catalog cache : assertTrue(tbl.getPartitions().size() == 0); > not sure I understand this. The add_partitions API in HMS handler should sy addPartitionsInHms() method does not use catalogHmsClient and uses normal hms client to add partitions to metastore. Therefore if cataligHmsClient is by passed, then that ddl operation should not affect the cache. http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@550 PS12, Line 550: catalogHmsClient_.dropTable(TEST_DB_NAME, tblName, true, true); > can you also assert here that lastSyncedEventId > createEventId Ack http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@561 PS12, Line 561: EventsProcessor prevEventProcessor = > not sure what is the goal of this test? This is testing functionality which Thats correct. However tests in MetastoreEventsProcessorTest currently do not test if table has been synced to latest event id. We can modify those tests. May be run all metastore events processor tests as parameterized tests with and without sync to latest event id patch ? Thoughts? -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 15 Gerrit-Owner: Sourabh Goyal <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sourabh Goyal <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Yu-Wen Lai <[email protected]> Gerrit-Comment-Date: Thu, 07 Oct 2021 20:03:39 +0000 Gerrit-HasComments: Yes
