Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for automatic invalidates by polling 
metastore events
......................................................................


Patch Set 8:

(46 comments)

Posting my initial set of comments. Haven't looked at the tests yet. Will take 
another pass after code refactoring. (Also, please update your profile name in 
gerrit settings).

http://gerrit.cloudera.org:8080/#/c/12118/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12118/8//COMMIT_MSG@7
PS8, Line 7: IMPALA-7970 : Add support for automatic invalidates by polling 
metastore events
nit: wrap lines to less than 72 chars.


http://gerrit.cloudera.org:8080/#/c/12118/8//COMMIT_MSG@21
PS8, Line 21:
Is the self-event detection coming in a separate patch?


http://gerrit.cloudera.org:8080/#/c/12118/8/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/12118/8/be/src/common/global-flags.cc@246
PS8, Line 246: invalidate or refresh
This only invalidates right? (refresh has a different meaning in Impala's 
context)


http://gerrit.cloudera.org:8080/#/c/12118/8/be/src/common/global-flags.cc@247
PS8, Line 247: cached metadata about tables
nit: rephrase to  ...cached table metadata based on...


http://gerrit.cloudera.org:8080/#/c/12118/8/be/src/common/global-flags.cc@251
PS8, Line 251:
..has..


http://gerrit.cloudera.org:8080/#/c/12118/8/be/src/common/global-flags.cc@253
PS8, Line 253:     "Additionally, "
Since this is a user-facing message, we should probably clarify that this is 
for "external" events.


http://gerrit.cloudera.org:8080/#/c/12118/8/be/src/common/global-flags.cc@258
PS8, Line 258: To enable this feature, a "
             :     "non-zero "
             :     "value of the flag must be applied to catalogd and impalad.
Looks like this is already mentioned above. Also, is it needed on the 
coordinators? Doesn't look like it.


http://gerrit.cloudera.org:8080/#/c/12118/8/be/src/common/global-flags.cc@260
PS8, Line 260:     "value of the flag must be applied to catalogd and 
impalad.");
nit: format to fewer lines like others.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@229
PS8, Line 229:   private MetastoreEventsProcessor metastoreEventProcessor;
Add a oneliner comment. Also, we use an "_" prefix for class variables. foo_


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@292
PS8, Line 292: if configured
nit: sounds vague, rephrase?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@292
PS8, Line 292: In order to determine if
             :    * event polling is enabled, this method looks at the value of 
HMS polling frequency in
             :    * <code>BackendConfig</code>.
I think we can get rid of the implementation detail here, remove?

Also, I don't think you are doing that check here? Return early (and log) if it 
is disabled?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@304
PS8, Line 304: LOG.error(
             :             "Unable to fetch the current notification event id 
from metastore."
             :                 + "Metastore event processing will be 
disabled.");
Isn't this a fatal error? (Include the stack trace in the error log)


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java
File fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@45
PS8, Line 45: events
Probably good to preface this with a what an "event" is with a pointer to 
Hive's class. Basically when is an event generated etc (since that isn't quite 
obvious for Impala folks). Also probably worth mentioning that the goal is to 
apply changes external to Impala.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@47
PS8, Line 47: The polling of the events is from a
            :  * given start event id at the construction time. Subsequently, 
it updates the last event
            :  * id until which it has already synced so that next poll is from 
the last synced event
            :  * id.
Maybe just rephrase to "we keep track of last synced event in each iteration..."


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@51
PS8, Line 51: 1000
MAX_EVENTS....


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@51
PS8, Line 51: In case there is a use-case in the future, we can change the 
batch size to a
            :  * configurable value. The polling frequency can be configured 
using the backend config
            :  * hms_event_polling_frequency_s
Remove?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@56
PS8, Line 56:  * metastore. In case of CREATE_TABLE/CREATE_DATABASE events, a 
new table/database is
nit: Make this bullet points instead? (supported / non-supported)


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@64
PS8, Line 64: In order to function correctly, it assumes that Hive metastore is 
configured correctly
            :  * to generate the events. Following configurations should be set 
in metastore
            :  * hive-site.xml so that events have sufficient information and 
actions can be taken base
            :  * on them.
            :  *
            :  * <code>hive.metastore.notifications.add.thrift.objects</code> 
should be set to
            :  * <code>true</code> so that event messages contain thrift object 
and can be used to
            :  * create new objects in Catalog 
<code>hive.metastore.alter.notifications.basic</code>
            :  * should be set to <code>true</code> so that all the needed 
ALTER events are generated
I think this is implementation detail can be omitted in the class comment 
(since you already mentioned it in the init methods).


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@84
PS8, Line 84: METASTORE_NOTIFICATIONS_ADD_THRIFT_OBJECTS
_CONFIG_KEY


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@84
PS8, Line 84:   static final String METASTORE_NOTIFICATIONS_ADD_THRIFT_OBJECTS =
for consistency, use public/private?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@88
PS8, Line 88:   private long lastSyncedEventId_;
Observability: expose this as a catalog server metric? (look at 
<catalog-webui>/metrics). Also helps with testing.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@93
PS8, Line 93:   private final ScheduledExecutorService scheduler =
nit: "_" for class members (multiple places)


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@109
PS8, Line 109:   private static final int EVENTS_BATCH_SIZE = 1000;
oneliner comment.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@113
PS8, Line 113: Preconditions.checkNotNull(catalog);
             :     this.catalog_ = catalog;
this.catalog_ = Preconditions.checkNotNull(catalog);


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@122
PS8, Line 122: if (pollingFrequencyInSec > 0) {
             :       scheduleAtFixedDelayRate(pollingFrequencyInSec);
             :     }
Probably worth moving this check to the parent class and skip the c'tor if this 
is not set.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@128
PS8, Line 128: of
update.


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@139
PS8, Line 139: Starting metastore event polling with interval " + 
pollingFrequencyInSec
             :         + " seconds."
nit: String.format()


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@147
PS8, Line 147: );
include stack trace


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@156
PS8, Line 156:   private long getCurrentNotificationID() {
Looks like the only caller is processHMSNotificationEvents(), can we reuse the 
msClient? (do we need a separate method? can't we just move it to the try-catch 
block inside processHMSNotificationEvents()?)


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@174
PS8, Line 174: lastSyncedEventId_ =
             :         lastSyncedEventId_ == -1 ? getCurrentNotificationID() : 
lastSyncedEventId_;
             :     if (lastSyncedEventId_ == -1) {
             :       LOG.warn(
             :           "Unable to fetch current notification event id. Cannot 
sync with metastore");
             :
Like I commented below, it appears to me that  (lastSyncedEventId_ != -1) is a 
precondition. Thoughts?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@182
PS8, Line 182: CurrentNotificationEventId currentNotificationEventId =
             :           
msClient.getHiveClient().getCurrentNotificationEventId();
             :       long currentEventId = 
currentNotificationEventId.getEventId();
             :       if (currentEventId > lastSyncedEventId_) {
Can we skip this and directly do a single RPC to fetch the latest events?

events = getEvents();
if (events.size() == 0) break;


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@187
PS8, Line 187: EVENTS_BATCH_SIZE
Looks like we are only doing only a single batch per iteration. Should we 
rename this to MAX_EVENTS_PER_RPC or something?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@191
PS8, Line 191:           // update the lastSyncedEventId before processing the 
event itself
Looks like this event is lost forever. Any idea how this can be surfaced back 
to the user?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@196
PS8, Line 196:           processEvent(event);
If a single event processing throws an exception, we seem to be breaking out of 
the loop. Does that mean everything after that is lost too?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@212
PS8, Line 212: dbName
This seems to be optional in the thrift spec and the next line there is a (db 
!= null) check. Are there events where this is not set in the response? Should 
we add those tests?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@215
PS8, Line 215:       case "ADD_PARTITION":
Strings look risky, is there no EventType enum or some such?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@221
PS8, Line 221:       case "CREATE_TABLE":
This seems to contain too much code and branches. Can we define an EventHandler 
interface (or abstract class) and implement it for various Events and register 
them at init? That simplifies the code here to dispatch. Thoughts?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@222
PS8, Line 222:         catalog_.addTable(dbName, tblName);
What if this races with an addTable() from the local cluster?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@235
PS8, Line 235:         catalog_.removeTable(dbName, tblName);
Same as above. What if there are races? (multiple cases below)


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@257
PS8, Line 257: /*if (db != null) {
             :           msDb = 
MetastoreEventsProcessor.getDatabaseFromMessage(event);
             :           db.setMSDb(msDb);
             :         }*/
Is this intentional? (given you changed the Db class locking protocol)


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@287
PS8, Line 287:   private boolean processRenameTableEvent(NotificationEvent 
event)
It looks like to me like these should call into the CatalogOpExecutor (modulo 
apply*() calls) primarily for

- Locking considerations (races)
- To avoid code duplication


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@347
PS8, Line 347: Check if %s is set to true in metastore configuration
Lets force this check?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@401
PS8, Line 401: startSyncFromId
If startSyncFromId == -1, shouldn't that be a fatal error?


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@402
PS8, Line 402:     if (INSTANCE == null) {
Do we need to validate the HiveClient config?

property>
  <name>hive.metastore.notifications.add.thrift.objects</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.alter.notifications.basic</name>
  <value>false</value>
</property>


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@412
PS8, Line 412:   static class MetastoreNotificationException extends 
ImpalaException {
Move to its own class file


http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/test/resources/postgresql-hive-site.xml.template
File fe/src/test/resources/postgresql-hive-site.xml.template:

http://gerrit.cloudera.org:8080/#/c/12118/8/fe/src/test/resources/postgresql-hive-site.xml.template@223
PS8, Line 223: <!-- This property is required to issue invalidates based on 
metastore events. See IMPALA-7954 for details -->
nit:wrap lines.



--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 26 Dec 2018 02:23:46 +0000
Gerrit-HasComments: Yes

Reply via email to