Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12118 )
Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate ...................................................................... Patch Set 9: (39 comments) This will be a great feature. Detailed review from the perspective of 1) synchronization and 2) diagnosing issues when they occur in the field. This is a partial review, will review remaining files separately. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@302 PS9, Line 302: if (BackendConfig.INSTANCE.getHMSPollingFrequencyInSeconds() <= 0) { Good check. Better to define this as behavior: to disable the polling, set the interval to 0 (or, trivially <0). This would be a good fall-back safety-valve for support cases, tests, etc. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@313 PS9, Line 313: } catch (TException e) { Nice use of try-with-catch. No need for nested tries. Remove inner catch, move catch to outer try. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2317 PS9, Line 2317: MetastoreEventsProcessor getMetastoreEventProcessor() { Should this be public? If only protected, please explicitly label it protected. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java File fe/src/main/java/org/apache/impala/catalog/Db.java: http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java@112 PS9, Line 112: synchronized (thriftDb_) { thriftDb_.setMetastore_db(msDb); } What is the synchronization model? As written, we lock on the prior DB to set the new one. This means access is not synchronized. Better to add synchronized to the method itself, here and accessor. But, more broadly, if we synchronize only at the DB level, we'll have all manner of race conditions. One client will grab the DB with one version the thrift object, then another will be slotted in underneath. This probably needs more thought. Is the model documented in the write-up? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java@159 PS9, Line 159: synchronized (thriftDb_) { return thriftDb_.deepCopy(); } How often do we do this? Do we want this to block on an HMS call about parameters? Maybe a better model is for the RPC calls to do this: - Obtain a lock - Get a pointer to the current thrift object - Release the lock - Make the RPC Not that there is no need to copy the thrift object above: the logic here implies that the Thrift object is immutable once set. All we are doing is making sure we use the same object throughout the RPC. Still, not convinced we have the right locking semantics. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java@164 PS9, Line 164: synchronized (thriftDb_) { return thriftDb_.getDb_name(); } Now getting the name is synchronized on an HMS call. That can't be good for performance. The planner/analyzer gets the name all the time, so we really don't want to do this. The name should be invariant or all heck breaks loose in the planner. So, if we swap out Thrift objects, pull the name outside the thrift object and cache it in a final field. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@50 PS9, Line 50: * in Hive metastore using the the public API <code>get_next_notification</code> These events could Impala generally does not use Javadoc markup, sadly. But, if you want to use it (we really should), then a reference to a Java method should be: {@link Class#method}. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@63 PS9, Line 63: * <li>CREATE_TABLE, CREATE_DATABASE</li> Wrong form for HTML lists. I think you want a definition list: <dl> <dt>CREATE_TABLE, CREATE_DATABASE<dt> <dd>A new table/database is ... </dd> ... That sad, if Vuk was here, he'd have you rip all this out. So, a broader question is: do we want our Javadoc to follow C++ doc standard so the C++ folks can follow it, or Javadoc standard so the documentation looks correct when formatted? I prefer the Javadoc approach, so would encourage you to keep the (corrected) formatting. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@91 PS9, Line 91: private static final Logger LOG = Logger.getLogger(MetastoreEventsProcessor.class); Statics generally go before instance members. Move this up with the constant string. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@106 PS9, Line 106: ExtendedJSONMessageFactory.getInstance(); Same as above, and below. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@118 PS9, Line 118: // know if we need to ignore some of events which are already applied in catalogD To avoid race conditions, separate creation from starting. Have a separate method that starts the timer, to be called once the system is ready for action. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@137 PS9, Line 137: void scheduleAtFixedDelayRate(long pollingFrequencyInSec) { Please put explicit public/private/protected declarations before each class and method so we can determine your intent. Shouldn't this one be private? Or, if we make the race-avoiding change above, it would be public, called once. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@145 PS9, Line 145: LOG.warn(String.format( LOG.error? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@158 PS9, Line 158: void processHMSNotificationEvents() throws ImpalaException { Please document the timer semantics supported here. This is not synchronized. Say we poll every 5 seconds. Say we get a very slow HMS message and the previous poll is still in progress when the next tick occurs. Would we get two poll events overlapping? Or, will the timer skip a tick if the previous has not yet completed? That is, does the scheduler handle this case, or must you handle it here? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@177 PS9, Line 177: throw new MetastoreNotificationException( To whom are we throwing this? This method was called by the scheduler, wasn't it? Does the scheduler know what to do with this? Or, should we log the error here? What happens if the errors continue? (HMS server was shut down on host x and moved to host y. It is never coming back up.) Should we retry for x times, then turn off polling? Use an exponential back-off? Whatever we do, please document the error handling strategy. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@186 PS9, Line 186: .format("Unable to process notification event %d due to %s. Ignoring this event...", If we get this in an escalation, we'll need to debug. At the very least, dump some info for the kind of event. Ideally dump things like table name, type of change, etc. Next, document why it is OK to ignore an event and continue. Since we are invalidating, the worst that will happen is that something is not invalidated that should be. But, if we ever wanted to replay events, failing on one would be fatal, continuing would corrupt the metadata. A fall-back there would be a global invalidate for the DB in question. Should we use that as the fall-back here? Given the severity, this should be LOG.error. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@198 PS9, Line 198: LOG.info("Processing event " + event.getEventId() + " : " + event.getEventType() Use this info in the error message above. In a production server, the info events may be turned off, we'll get only the error/warn events. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@202 PS9, Line 202: Db db = catalog_.getDb(dbName); Are there any events for which the DB name will be missing? System-wide events? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@204 PS9, Line 204: switch (event.getEventType()) { Is the event type provided as an enum or just a string? String comparisons work, are slower than enum/int comparisons. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@209 PS9, Line 209: invalidateCatalogTable(table); No need to invalidate the entire table (potentially 1000s of partitions) because one changed. Talk to Todd to see if he can recommend a more focused invalidation. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@212 PS9, Line 212: catalog_.addTable(dbName, tblName); No need to clutter up the catalog with new tables that no query has requested. Maybe just ignore these events. We'll get the table data on the first query for that table. This is particularly true for temp tables and other transient items. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@221 PS9, Line 221: invalidateCatalogTable(table); Should the fall-back logic be implemented inside the rename table event method? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@225 PS9, Line 225: catalog_.removeTable(dbName, tblName); This one prompts questions about synchronization. Suppose 10 tables are removed. Suppose we're planning a query at the same time. The way this works, for each removal, we'll lock the catalog, make the change, and unlock. A planning session will see multiple changes. The current logic will (I believe) fail the plan and restart. Since we do the 10 removes (or other changes) one-by-one, it means we will fail and restart planning for any in-flight queries once for each change (10 times here) rather than once for overall poll. Do we like this? Does this help with the goal of performance? Can we do something else? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@235 PS9, Line 235: catalog_.addDb(dbName, msDb); Don't add the DB for the same reason cited for tables. If this Impala is used for Sales, we really don't need to take space in the cache for DBs created by Production, say. Also, if this is a remove, we should remove the old DB, then add a new one. Does this avoid the need for the db Thrift switch code discussed in the DB class? Reason: we need to remove all tables for the DB, and all partitions for all those tables. If a DB is dropped and replaced, none of the prior tables survive. We should not retain them in our cache. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@238 PS9, Line 238: if (db != null) { Let's think a bit. We're checking for null db on every event. We don't care about a new DB. So, should we do the "we don't have the DB in our cache so we don't care about this event" check before the switch statement? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@261 PS9, Line 261: LOG.warn( We'll get thousands of these. Set this to Log.detail/debug (whatever the finest-grain level is). We know it is OK to ignore them because you wrote the code stating that this is OK, not much the user can do with the warnings other than call for help. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@271 PS9, Line 271: * table in-place We're just invalidating, not renaming. So, just remove the old table. Impala will load the new one on demand when the next query arrives that uses it. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@281 PS9, Line 281: event.getMessage()); Can deserialization fail? If so, what does that exception due to the scheduler when propagated back up through the event thread? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@290 PS9, Line 290: return false; Wait, then why did we receive this event? Did something else about the table change? Should we just invalidate the table to be safe if we don't know what changed? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@296 PS9, Line 296: catalog_.addTable(newTable.getDbName(), newTable.getTableName()); Skip this step for reasons given above. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@299 PS9, Line 299: throw new MetastoreNotificationException(e); How is this handled? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@304 PS9, Line 304: * Deserializes the event message and create a database object out of it. nit: Deserializes ... create[s]... http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@315 PS9, Line 315: if ("CREATE_DATABASE".equalsIgnoreCase(event.getEventType())) { The switch statement already parsed out the event type. Looks like we're doing it here again so we can share some code at the top and bottom. Better: one method per event. Put common code into methods, called from the handler methods. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@318 PS9, Line 318: .getCreateDatabaseMessage(event.getMessage()); This code block appears multiple times. For each I'll ask error handling strategy. Cries out for being moved to its own method. Each returns a different class. This can be handled with Java generics and passing the class of the expected message type. Which suggests that the method either check the message class type before casting, or handle the class cast exception. This will make the code more resilient to unexpected changes to the RPC protocol since the protocol itself is not versioned (I believe.) http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@326 PS9, Line 326: if (!checkSupportedCasesForAlterDatabaseEvent(msDbBefore, msDb)) { Good error handling. Can be made better. Change the method to "validateAlterDatabaseEvent". Have it return void. In the method, parse out each error case: if (first problem occurs) { throw Something("describing the first problem" } else if (second problem occurs) { ... That way, we'll log the precise failure mode making it easier to diagnose issues in the field when all we have are logs. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@343 PS9, Line 343: throw new MetastoreNotificationException(e); Tests should trigger each error condition to see how they are reported in the log. Determine if we could track down the problem from the log messages. Error injection is helpful in such cases. Do we have a preferred mechanism? Else, we can invent something. http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@365 PS9, Line 365: if (table == null) { Why would we call this for a null table? Preconditions instead? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@373 PS9, Line 373: + " invalidated because the table was recreated in metastore"); Is this method only for recreation? Looks like it is called for ALTER_TABLE, which is not recreation. Better log message? Also, how the heck does the catalog know if the table was removed or added? Is this the right method to be calling here? Does that method call out to HMS to learn about the table? If so, should we just invalidate instead? http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@401 PS9, Line 401: static class MetastoreNotificationException extends ImpalaException { You should have gotten a warning here about lack of a serialization ID, I you are using an IDE. To solve that, add the suppress warnings annotation that your IDE suggests. Also, where do we expect to catch the exception? If inside this code only, then a nested class is fine. If outside, then moving the class to a separate file is a good idea. Finally, nested classes generally should appear at the top of the file. Mark the class as public unless you really want to hide if from other modules. -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 27 Dec 2018 20:06:35 +0000 Gerrit-HasComments: Yes
