Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
......................................................................


Patch Set 20:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@307
PS21, Line 307: BackendConfig.INSTANCE.getHMSPollingIntervalInSeconds()
> eventPollingInterval
Done


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@319
PS21, Line 319:  LOG.error("Unable to fetch the current notification event id 
from metastore."
              :               + "Metastore event processing will be disabled.",
              :           e);
> nit: Multiple places in the code, try to format in fewer lines. Something l
I used the command given in 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 as 
advised by some of the other members of the team. Unfortunately, it doesn't 
like some of these lines.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@324
PS21, Line 324:           e);
> Will the code that handles this exception write to the log both the message
The LOG.error above logs the trace as well since it takes in the underlying 
cause exception as the second argument. The exception is uncaught in this 
particular case on the caller side, since we want CatalogD to not come up if it 
is configured to using event processing but for some reasons isn't able to 
instantiate one.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1206
PS21, Line 1206:     // Even though we get the current notification event id 
before stopping the event
> clarification: reset() is equivalent to invalidating everything and fetchin
This is similar to Tim's comment and our discussion. The line 1216 gets the 
currentEventID (latest) from HMS. And the event processing begins from that id. 
So the event processing jumps to the latest event id once the reset() is 
complete. There is still a slight chance that we will reprocess some of the 
events which are generated during reset() execution. It is better than missing 
those events which would lead to inconsistent state between catalog and HMS as 
far as event processing is concerned. The comment gives details of that case.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1216
PS21, Line 1216:     long currentEventId = 
metastoreEventProcessor_.getCurrentEventId();
> Should this be here? Or in the event processor itself? If here, why is it n
It needs to be here to avoid the race condition which can lead to possible 
missed events. We want to get the latestEventId before triggering the reset() 
so that we can then restart the event processing after reset from this eventId. 
If we move the code in the start() method, there is a chance that we will miss 
processing some of the events which happen during after reset is complete, but 
before we started the event processing. See Tim's comment in the earlier review 
comments which gives a nice example.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1360
PS21, Line 1360:       Reference<Boolean> dbWasFound, Reference<Boolean> 
tblWasFound) {
> I still think that using the reference is non-standard Java. A simple solut
used the second option of return boolean and throw exception when db is not 
found.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1363
PS21, Line 1363:     Db db = getDb(dbName);
> Let's think about this. We return the flags because we want to know if the
if getting db before acquiring the lock has a race then it looks like addTable 
and renameTable also has a race. I will create separate JIRAs for fixing them


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1372
PS21, Line 1372:         tblWasFound.setRef(true);
> I wonder, do we really need to know if the table existed or not? Is it even
consider a case where user do create, drop, create with the same table name 
from Impala. In case of if the create event from the first create statement, 
the table presented in the event is stale and should not be used to add to the 
catalog. I think it is useful to log this information to make sure that the 
create event was ignored in such a case.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1475
PS21, Line 1475:       Reference<Boolean> tblWasfound, Reference<Boolean> 
tblMatched) {
> Same comments as above.
Moved the getDb() call in the block where lock is acquired first. I would like 
to defer the suggestion of changing the Reference to later patch in the 
interest of time.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1804
PS21, Line 1804:     Table incompleteTable = 
IncompleteTable.createUninitializedTable(db, tblName);
> move it after L1807?
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java:

http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@130
PS20, Line 130:   public static abstract class MetastoreEvent {
> The new event class hierarchy looks very good.
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@138
PS20, Line 138:     protected final Logger LOG = 
Logger.getLogger(this.getClass());
> protected static final ... MetastoreEvent.class);
this.getClass() gives us the ability to log the implementation class name which 
is what I thought would be useful to debug. Else you will have to grep the 
statement in this class to locate the line of the code.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@160
PS20, Line 160:       if 
(!event.getEventType().equalsIgnoreCase("CREATE_DATABASE")
> Move this code to the constructor for the suggested MetastoreTableEvent cla
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@182
PS20, Line 182:     }
> Would move to the suggested MetastoreTableEvent class.
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@199
PS20, Line 199:     private final String tblName_;
> These two fields shadow fields of the same name in the base class. This is
yeah, I missed cleaning this up in one of the many refactors this class went 
through.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@241
PS20, Line 241:         LOG.info(String.format("Added a table %s.%s after 
processing event "
> Nit: to make it easier to read/parse log records, maybe:
introduced a debugString method. Bharath had suggested the same.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@271
PS20, Line 271:         throw new MetastoreNotificationException("Unable to 
parse the alter table "
> This is a round-about way to do error handling. Generally, you EITHER
Would like to defer this to later in the interest of time.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@293
PS20, Line 293:           if (!invalidateCatalogTable(dbName_, tblName_)) {
> If this is a method, then no need to pass the fields since the method has v
missed that during refactor


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@318
PS20, Line 318:           throw new 
MetastoreNotificationException(String.format("Could not rename "
> Let's think about error handling a bit. There are two broad cases:
If the table rename did not happen, there is something very unexpected going on 
in the catalog. I think its better to halt the event processing in such a case. 
If the rename did not happen and the table in catalog still has the old name, 
it may be possible the future events act on the old name leading to 
inconsistent catalog state.

There is lot more to do in this feature and I would like to move on to the next 
sub-tasks. I would like to defer this suggestion to later unless you think it 
is critical for functionality of the patch.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@333
PS20, Line 333:         throw new MetastoreNotificationException(e);
> Suggestion: have MetastoreNotificationException take this class, or the mes
Good suggestion but in the interest of time the debugString method which does 
something similar is a good substitute for what your are suggesting here.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java:

http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@155
PS21, Line 155: event %d of type %s on table %s
> I think it is cleaner to define a debugString(NotificationEvent event) {} t
Good suggestion. Added a util method which gets the debug string for a given 
list of message and arguments. It appends EventId and EventType information to 
the given message.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@160
PS21, Line 160: if (!event.getEventType().equalsIgnoreCase("CREATE_DATABASE")
              :           && 
!event.getEventType().equalsIgnoreCase("DROP_DATABASE")) {
              :         tblName_ = 
Preconditions.checkNotNull(event.getTableName());
              :       } else {
              :         tblName_ = null;
              :       }
> nit: Reformat to the following? (fewer branches and use enum)
The tblName_ is a final object so you can assign to it only once.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@181
PS21, Line 181: tblName_
> checkNotNull()?
Done


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@198
PS21, Line 198:    private final String dbName_;
              :     private final String tblName_;
> aren't these set in the parent c'tor?
yes, Looks like I missed removing it during one of the many iterations of the 
patch. Thanks for catching this.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@347
PS21, Line 347: droppedTable_
> isn't this more like tblTobeDropped_ ?
It depends on whether you see it as a object which is dropped in metastore or 
which needs to be dropped in Catalog. Since this object is retrieved by parsing 
the NotificationEvent and the type of the object is metastore.api.Table not 
impala.catalog.Table, I thought the name made sense.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@434
PS21, Line 434: //
> nit: remove, you are already in a comment block.
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@205
PS20, Line 205:       eventProcessorStatus_ = EventProcessorStatus.ACTIVE;
> Should we write something to the log, perhaps at DEBUG or TRACE levels, tha
added a info log.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@265
PS20, Line 265:       return 
metaStoreClient.getHiveClient().getCurrentNotificationEventId().getEventId();
> How is this used? How is it synchronized?
Initially this was part of start() method. But we needed to get the 
currentNotificationID from metastore in catalogServiceCatalog reset() method. 
So instead of duplicating the code, I created a method which is called from 
start() method.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@285
PS20, Line 285:       LOG.info(String.format("Metastore event processing 
restarted. Last synced event id "
> Isn't start() called only once? If not, what is the use case for pausing an
the start() method starts the event processing from whatever is the current 
event id in metastore. This is done when catalog service is started. 
start(eventId) resumes the event processing from the given eventId. THis is 
done from the reset() method. WHen a user issues invalidate metadata, the 
catalog state is cleared and we can jump to the latest eventId. Its also good 
to stop event processing during reset() to avoid races between events from 
reset()


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@360
PS20, Line 360:       Preconditions.checkState(lastProcessedEvent != null);
> The catch-block handles two types of exceptions. It also handles two source
good point. However, adding two catch blocks is not sufficient since event 
process can also throws MetastoreNotificationException. I added a simple check 
of lastProcessedEvent to find out if the getNextMetastoreEvents failed or the 
event.process()


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@371
PS20, Line 371:   private void dumpEventInfoOnErrors(NotificationEvent event) {
> The name indicates why the method might be called. But, the method itself d
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@373
PS20, Line 373:                             .append("Event id : " + 
event.getEventId() + "\n")
> Nit: If using a StringBuilder, write each field as a separate .append() cal
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@376
PS20, Line 376:                             .append("Database name : " + 
event.getDbName() + "\n");
> Another nit: colons usually have no leading space: "Database name: foo", no
Done


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@397
PS20, Line 397:   public static synchronized ExternalEventsProcessor 
getOrCreate(
> typically called "instance()" or "getInstance()".
renamed the method and changed the INSTANCE to instance. However, I do think 
that this method needs to be synchronized to avoid the risk of double 
instantiation in the future this is called from multiple threads


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@310
PS21, Line 310: if (currentEventId <= lastSyncedEventId_) {
              :         return Collections.emptyList();
              :       }
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@373
PS21, Line 373:                             .append("Event id : " + 
event.getEventId() + "\n")
> weird formatting.
Done



--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 20
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Sat, 19 Jan 2019 01:54:56 +0000
Gerrit-HasComments: Yes

Reply via email to