Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12118 )
Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate ...................................................................... Patch Set 14: (21 comments) Nice improvements. Comments mostly relate to ways to simplify the processing logic. Feel free to ping me directly to discuss. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@293 PS14, Line 293: if (metastoreEventProcessor_ != null) { The var != null pattern is error prone. Would be great to have the MetastoreEventsProcessor be an interface with two implementations: a null implementation (disabled) and a live implementation. Then, you just call the various methods without a check for null or not. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@325 PS14, Line 325: } Can this move to a static factory method on the MetastoreEventsProcessor implementation? Doing so puts all the logic for that abstraction in one place. All this method would do here is choose between the dummy and live versions. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1292 PS14, Line 1292: return addDb(dbName, msDb) != null; In master, addDb() returns a DB. This method returns a boolean. Is something amiss here? Is this really an "addOrGetDb"? Do you want the DB object -- either an existing one or a new one? Else, if you call this, get true, then look up the DB, is there a race condition? http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1344 PS14, Line 1344: public Table addTableIfNotExists(String dbName, String tblName, Checked. We do have existing methods called getOrLoadTable(). So, calling this one getOrAddTable might be more consistent. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1345 PS14, Line 1345: Reference<Boolean> dbWasFound, Reference<Boolean> tblWasFound) { This is rather an odd C++ way to do things. A more typical Java way is to declare a class that will return the required values.. Or, you can declare a Pair<Table,TableStatus> with TableStatus an enum of {CREATED, ALREADY_EXISTS, DB_NOT_FOUND}. The same enum could be returned for other add-or-get operations. Pair<Table,AddOrGetStatus> addOrGetTable(String dbName, String tblName) ... http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1359 PS14, Line 1359: } else { The else is not necessary since prior block returned. Not a bug, just not necessary. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1365 PS14, Line 1365: versionLock_.writeLock().unlock(); This is old school. We should change this so we can use try-with-resources: try(LockInstance lock =versionLock.writeLock()) { ... } LockInstance would implement AutoCloseable to make this happen. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1462 PS14, Line 1462: Reference<Boolean> tblWasfound, Reference<Boolean> tblMatched) { Same issue with the references. We already have a removeTable() method which is implemented to do a "remove table if exists". Is there commonality that should be exploited here? http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1792 PS14, Line 1792: versionLock_.writeLock().lock(); Race condition? We got the DB above before the lock. The DB could be invalid now that we have the lock. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1804 PS14, Line 1804: return db.getTable(tblName); The table was created above. We released the lock, then looked up the table again. Race conditions? http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java File fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java: http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@52 PS14, Line 52: public static final String ALTER_PARTITION_EVENT_TYPE = "ALTER_PARTITION"; Given that we use the event names not just to translate from Thrift event to handler, perhaps this should be an enum. public EventType { CREATE_TABLE("CREATE_TABLE") ... } Then we can do: if (event == EventType.CREATE_TABLE) ... The enum form is both faster and more robust. Oh, and to get an enum value from a string you can define a static getEventType(String name) method to do the lookup. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@62 PS14, Line 62: public static class MetastoreEventHandlerFactory { I'm a bit confused by the structure. Looks like the event handlers are created once up-front. That means that the event is handled by a method that must take all its parameters via its "do something" method. A more typical approach is to create an instance per event, with the instance holding all the required data. Indeed, if you do this, then you can pass the event object into your addOrGet methods which can fill in the "already exists", "just created", "table pointer" and other values. The overhead of creating one more Java object is a small price to pay for the resulting simplicity. Also, in this model, you don't need the EventType enum suggested above: the event instance is its own enum. There is no checking of "if the event is of type x" because each event has its own handler class. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@74 PS14, Line 74: private final ConcurrentHashMap<String, MetastoreEventHandler> eventhandlers_ = Also, if you use enums, each has an ordinal. You can then populate the table simply with a loop that populates an array: MetaStoreEventHandler eventHandlers[] = new MetaStoreEventHandler[EventType.values().size]. for (int I = 0; I < EventType.values.size(); i++) { eventHandlers[I] = ...( EventType.getValue(i) ); The method names may be off; typed above from memory. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@87 PS14, Line 87: public MetastoreEventHandler getOrCreate(String eventType) { With the array, this becomes as simple as: return eventHandler[ EventType.getEvent(eventType) ]; http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@135 PS14, Line 135: protected final Logger LOG = Logger.getLogger(this.getClass().getName()); Do you need a logger per class if they will all report the same class name? Or, do you want protected static final Logger LOG = Logger.getLogger(MetastoreEventHandler.class); http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@160 PS14, Line 160: String expectedType, NotificationEvent event) { This need not be a util method if we store the Db and Table name as members of each event object. Just call this to parse out the value and fill in the corresponding fields. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@269 PS14, Line 269: JSONAlterTableMessage alterTableMessage = This would be cleaner with the event-handler-per-event model. The factory would parse the event type, do the cast, an each handler would receive the subclass that it is designed to handle. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@49 PS14, Line 49: * such events on the catalogD we can sync to external metadata operations by taking Have the goals changed? Or, is this description jumping a few steps? Are we not keeping in sync by flushing the cache after each metastore change? We are not replaying the metastore changes onto our local cache? http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@53 PS14, Line 53: * Maybe put a <pre> ... </pre> around this to keep Eclipse and IntelliJ from treating this block as text when displaying JavaDoc. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@81 PS14, Line 81: * . A object in catalog is stale if and only if its creationTime is <= creationTime of Nit: move period to previous line. http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@117 PS14, Line 117: * object will make sure that the catalog is stale before applying any such event. Thanks for the very detailed explanation. Very helpful! -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 14 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Fri, 11 Jan 2019 00:59:26 +0000 Gerrit-HasComments: Yes
