Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
......................................................................


Patch Set 13:

(24 comments)

Firstly thanks for the clean-up. The new class structure makes it much clearer 
IMO. Publishing a bunch of minor comments. Please bear with my nits.

Also, I had a brief chat with Vihang offline. It looks like the event handling 
races is still a WIP. My general concern was around the following 3 areas.

- Races with conflicting local events.
- What if Catalog already has the latest HMS state (due to a previous 
invalidation)
- Failure scenarios if some event processing errors out. (Should we stop and 
invalidate everything, should be abort the thread etc.)

Vihang mentioned that he is looking into HMS side versioning for objects to 
detect if the Catalog server already has the latest state and we can skip 
applying certain events. That probably requires some Hive side changes. Also we 
discussed about incorporating some of his thought process into class comments.

http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@304
PS8, Line 304:
             :   private void initializeMetastoreEventProcessor() {
             :     long eventPollingInterval = 
BackendConfig.INSTANCE.getHMSPolli
> done. Did you mean changing the log level to fatal as well?
fatal() IIRC aborts the process. If metastore polling is configured  and we are 
not able to init here, it looks to me like we should abort (instead of silently 
swallowing) since that can result in correctness issues later and is unexpected 
from a user POV. Thoughts?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@292
PS13, Line 292: // start polling for metastore events
              :     if (metastoreEventProcessor_ != null) {
              :       metastoreEventProcessor_.start();
              :     }
Any reason for not moving this to initializeMetastoreEventProcessor()?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@308
PS13, Line 308: debug
nit: Make this info?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1460
PS13, Line 1460:       if (tblToBeRemoved != null) {
nit: invert the logic?

if (tblToBeRemoved == null) return null;
....


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java
File fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java:

http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@38
PS13, Line 38:   static abstract class MetastoreEventHandler {
Add doc about the thread-safety expectations here.


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@39
PS13, Line 39: CatalogServiceCatalog catalog,
Any reason for passing this around? Can't we instantiate this once and apply 
changes to it?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@42
PS13, Line 42:     protected String dbName;
Document what it means to have these set. It looks like the dbName is always 
set but tblName is optional


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@43
PS13, Line 43:     protected String tblName;
nit: _ for class members


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@47
PS13, Line 47:     protected void validateAndInit(String expectedType, 
NotificationEvent event) {
method doc.


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@49
PS13, Line 49: event.getTableName()
wouldn't this be null in CREATED_DB/DROP_DB case?

Also, is it worth adding a TRACE log that dumps the event debug data? (Impala 
allows dynamically switching log levels for debugging).


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@74
PS13, Line 74:   static class CreateTableEventHandler extends 
MetastoreEventHandler {
method docs for each event handlers?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java@78
PS13, Line 78: CREATE_TABLE
nit: Define these event types as static final globally and reuse them? I think 
that makes is more readable.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java
File fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@182
PS8, Line 182: econditions.checkState(pollingFrequencyInSec >= 0);
             :     pollingFrequencyInSec_ = pollingFrequencyInSec;
             :   }
             :
> The getCurrentNotificationEventId is much cheaper (and hence faster too) th
Fair point.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@347
PS8, Line 347:
> How do you envision this mechanism? Metastore does not provide any API to f
Nvm, I thought this was HiveClient configuration. Thanks for the correction.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@402
PS8, Line 402:
> This is not a client side property but metastore server side config. The ge
Gotcha. I thought this was client side configuration.


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java
File fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@68
PS13, Line 68:  * Events which are not self-events, should be applied on a 
best-effort basis. All the
             :  * operations which change the state of catalog cache while 
processing a certain event
             :  * type must be atomic in nature. We rely on taking a write lock 
on version object in
             :  * CatalogServiceCatalog to make sure that readers are blocked 
while the metadata update
             :  * operation is being performed. Since the events are generated 
post-metastore operations,
             :  * such catalog updates do not need to update the state in Hive 
Metastore
             :  *
             :  * Additionally, in case of CREATE and DROP events, it is 
important to make sure that past
             :  * event information is not used to modify the future state of 
the cache. Eg. consider a
             :  * operation sequence of CREATE, DROP and CREATE on a table. In 
this case, when the first
             :  * two events (CREATE, DROP) are being processed, catalog will 
already have a table of the
             :  * same name in its cache. In case of such conflicts, event 
processor should look at the
             :  * creationTimeStamp of the metastore object to make sure that a 
object from future is not
             :  * modified by object from past.
             :  *
             :  * Following actions are currently taken based on the type of 
events received from
             :  * metastore.
             :  * <dl>
             :  * <dt>CREATE_TABLE, CREATE_DATABASE</dt>
             :  * <dd>A new table/database is created in Catalog respectively. 
In case a
             :  * table[database] with the same name already exists in catalog, 
we compare the createTime
             :  * of the table[database] to resolve the conflict. The newly 
created table/database are
             :  * Incomplete and should get loaded lazily when needed</dd>
             :  * <dt>DROP_TABLE, DROP_DATABASE</dt>
             :  * <dd>The table/database is dropped from catalog if they are 
present and if the
             :  * createTime of the object matches with the object present in 
the event</dd>
             :  * <dt>ALTER_TABLE</dt>
             :  * <dd>In case of alter table event, currently the code issues a 
invalidate table
             :  * command.
             :  * There is a special case of this event in case of renames, 
where the old table is
             :  * removed and a new IncompleteTable with the new name is 
created.</dd>
             :  * <dt>ALTER_DATABASE</dt>
             :  * <dd>In case of alter database events, currently only the case 
of changing default
             :  * location,owner and change in database description is 
supported.</dd>
             :  * <dt>ADD_PARTITION, ALTER_PARTITION and DROP_PARTITION</dt>
             :  * <dd>Currently, in case of these events, we issue invalidate 
on the table. This can be
             :  * optimized by issuing add/refresh/drop the partition at the 
Table level</dd>
             :  * <dt>All other events are currently ignored</dt>
             :  * </dl>
Move the  content from here to the event handlers impl?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@139
PS13, Line 139: CreateDatabaseEventHandler
Should we make them singletons?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@195
PS13, Line 195:     if (isInitialized_) {
nit: use AtomicBoolean#compareAndSet(). The current impl is theoretically racy.


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@236
PS13, Line 236: if (currentEventId <= lastSyncedEventId_) {
              :         return;
              :       }
nit: single line without braces.


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@251
PS13, Line 251: if (eventHandlers.containsKey(event.getEventType())) {
              :           
eventHandlers.get(event.getEventType()).process(catalog_, event);
              :         } else {
              :           
eventHandlers.get(DEFAULT_EVENT_HANDLER_KEY).process(catalog_, event);
              :         }
I think you can simplify this with, 

eventHandlers.getOrDefault(event.getEventType(), defaultHandler).process();


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@262
PS13, Line 262:         // Make sure to update lastSyncedEventId_ in case there 
are errors while
One question around this error handling, is it safe to continue at this point 
if the subsequent events depend on the failed event?


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@283
PS13, Line 283:   /**
nit: newline


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@298
PS13, Line 298:     if (INSTANCE == null) {
how about reversing this. (fewer indents)

if (INSTANCE != null) return INSTANCE;
return INSTANCE = new MetaStore...();


http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreNotificationException.java
File 
fe/src/main/java/org/apache/impala/catalog/MetastoreNotificationException.java:

http://gerrit.cloudera.org:8080/#/c/12118/13/fe/src/main/java/org/apache/impala/catalog/MetastoreNotificationException.java@25
PS13, Line 25: class MetastoreNotificationException extends ImpalaException {
nit: The general practice is to explicitly qualify objects with public/private 
keywords. Looks like it is needed in multiple places in the patch. (AFAIK 
Impala doesn't follow the package-private semantics)



--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 13
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Sat, 05 Jan 2019 00:51:50 +0000
Gerrit-HasComments: Yes

Reply via email to