Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
......................................................................


Patch Set 14:

(21 comments)

Nice improvements. Comments mostly relate to ways to simplify the processing 
logic. Feel free to ping me directly to discuss.

http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@293
PS14, Line 293:     if (metastoreEventProcessor_ != null) {
The var != null pattern is error prone. Would be great to have the 
MetastoreEventsProcessor be an interface with two implementations: a null 
implementation (disabled) and a live implementation. Then, you just call the 
various methods without a check for null or not.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@325
PS14, Line 325:     }
Can this move to a static factory method on the MetastoreEventsProcessor 
implementation? Doing so puts all the logic for that abstraction in one place. 
All this method would do here is choose between the dummy and live versions.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1292
PS14, Line 1292:         return addDb(dbName, msDb) != null;
In master, addDb() returns a DB. This method returns a boolean. Is something 
amiss here?

Is this really an "addOrGetDb"? Do you want the DB object -- either an existing 
one or a new one? Else, if you call this, get true, then look up the DB, is 
there a race condition?


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1344
PS14, Line 1344:   public Table addTableIfNotExists(String dbName, String 
tblName,
Checked. We do have existing methods called getOrLoadTable(). So, calling this 
one getOrAddTable might be more consistent.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1345
PS14, Line 1345:       Reference<Boolean> dbWasFound, Reference<Boolean> 
tblWasFound) {
This is rather an odd C++ way to do things. A more typical Java way is to 
declare a class that will return the required values.. Or, you can declare a 
Pair<Table,TableStatus> with TableStatus an enum of {CREATED, ALREADY_EXISTS, 
DB_NOT_FOUND}. The same enum could be returned for other add-or-get operations.

Pair<Table,AddOrGetStatus> addOrGetTable(String dbName, String tblName) ...


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1359
PS14, Line 1359:       } else {
The else is not necessary since prior block returned. Not a bug, just not 
necessary.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1365
PS14, Line 1365:       versionLock_.writeLock().unlock();
This is old school. We should change this so we can use try-with-resources:

try(LockInstance lock =versionLock.writeLock()) {
   ...
}

LockInstance would implement AutoCloseable to make this happen.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1462
PS14, Line 1462:       Reference<Boolean> tblWasfound, Reference<Boolean> 
tblMatched) {
Same issue with the references.

We already have a removeTable() method which is implemented to do a "remove 
table if exists". Is there commonality that should be exploited here?


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1792
PS14, Line 1792:     versionLock_.writeLock().lock();
Race condition? We got the DB above before the lock. The DB could be invalid 
now that we have the lock.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1804
PS14, Line 1804:     return db.getTable(tblName);
The table was created above. We released the lock, then looked up the table 
again. Race conditions?


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java
File fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java:

http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@52
PS14, Line 52:   public static final String ALTER_PARTITION_EVENT_TYPE = 
"ALTER_PARTITION";
Given that we use the event names not just to translate from Thrift event to 
handler, perhaps this should be an enum.

public EventType {
  CREATE_TABLE("CREATE_TABLE") ...
}

Then we can do:

  if (event == EventType.CREATE_TABLE) ...

The enum form is both faster and more robust.

Oh, and to get an enum value from a string you can define a static 
getEventType(String name) method to do the lookup.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@62
PS14, Line 62:   public static class MetastoreEventHandlerFactory {
I'm a bit confused by the structure. Looks like the event handlers are created 
once up-front. That means that the event is handled by a method that must take 
all its parameters via its "do something" method.

A more typical approach is to create an instance per event, with the instance 
holding all the required data. Indeed, if you do this, then you can pass the 
event object into your addOrGet methods which can fill in the "already exists", 
"just created", "table pointer" and other values.

The overhead of creating one more Java object is a small price to pay for the 
resulting simplicity.

Also, in this model, you don't need the EventType enum suggested above: the 
event instance is its own enum. There is no checking of "if the event is of 
type x" because each event has its own handler class.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@74
PS14, Line 74:     private final ConcurrentHashMap<String, 
MetastoreEventHandler> eventhandlers_ =
Also, if you use enums, each has an ordinal. You can then populate the table 
simply with a loop that populates an array:

MetaStoreEventHandler eventHandlers[] = new 
MetaStoreEventHandler[EventType.values().size].
for (int I = 0; I < EventType.values.size(); i++) {
  eventHandlers[I] = ...( EventType.getValue(i) );

The method names may be off; typed above from memory.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@87
PS14, Line 87:     public MetastoreEventHandler getOrCreate(String eventType) {
With the array, this becomes as simple as:

return eventHandler[ EventType.getEvent(eventType) ];


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@135
PS14, Line 135:     protected final Logger LOG = 
Logger.getLogger(this.getClass().getName());
Do you need a logger per class if they will all report the same class name? Or, 
do you want

  protected static final Logger LOG = 
Logger.getLogger(MetastoreEventHandler.class);


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@160
PS14, Line 160:         String expectedType, NotificationEvent event) {
This need not be a util method if we store the Db and Table name as members of 
each event object. Just call this to parse out the value and fill in the 
corresponding fields.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@269
PS14, Line 269:       JSONAlterTableMessage alterTableMessage =
This would be cleaner with the event-handler-per-event model. The factory would 
parse the event type, do the cast, an each handler would receive the subclass 
that it is designed to handle.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java
File fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@49
PS14, Line 49:  * such events on the catalogD we can sync to external metadata 
operations by taking
Have the goals changed? Or, is this description jumping a few steps? Are we not 
keeping in sync by flushing the cache after each metastore change? We are not 
replaying the metastore changes onto our local cache?


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@53
PS14, Line 53:  *
Maybe put a <pre> ... </pre> around this to keep Eclipse and IntelliJ from 
treating this block as text when displaying JavaDoc.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@81
PS14, Line 81:  * . A object in catalog is stale if and only if its 
creationTime is <= creationTime of
Nit: move period to previous line.


http://gerrit.cloudera.org:8080/#/c/12118/14/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@117
PS14, Line 117:  * object will make sure that the catalog is stale before 
applying any such event.
Thanks for the very detailed explanation. Very helpful!



--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Fri, 11 Jan 2019 00:59:26 +0000
Gerrit-HasComments: Yes

Reply via email to