Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
......................................................................


Patch Set 21:

(32 comments)

Very nice improvements: the event classes greatly simplify the logic.

This round focused a bit on error reporting and logging. Many specific 
comments. As always, these are just questions and suggestions, you should 
decide which make sense.

http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@324
PS21, Line 324:           e);
Will the code that handles this exception write to the log both the message 
from this exception AND the message from the "cause" exception?

The goal is that, if a failure occurs, we get the detailed error message as 
well as the "summary" message.

If we log the stack trace, then we're good.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1216
PS21, Line 1216:     long currentEventId = 
metastoreEventProcessor_.getCurrentEventId();
Should this be here? Or in the event processor itself? If here, why is it 
needed other than to pass it along to the event processor?

Best not to leak implementation details if possible.


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1360
PS21, Line 1360:       Reference<Boolean> dbWasFound, Reference<Boolean> 
tblWasFound) {
I still think that using the reference is non-standard Java. A simple solution 
is:

public Pair<Boolean,Boolean> addTableIfNotExists(String dbName, String tblName) 
...

Better would be:

enum TableStatus (DB_NOT_FOUND, TBL_NOT_FOUND, OK)

public TableStatus addTableIfNotExists(String dbName, String tblName) ...

Or, suppose we do this, since this method currently returns void:

* Throw an exception if the DB does not exist.
* Return true/false to indicate if the table was created (true) or already 
existed (false).


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1363
PS21, Line 1363:     Db db = getDb(dbName);
Let's think about this. We return the flags because we want to know if the DB 
was found or not. We could just as easily do:

DB db = catalog.get(dbName_);
if (db == null) { handle error; }

The reason we don't do the above is we want to prevent race conditions. But... 
We do the check here outside the lock. So, we have a race condition anyway.

Should either 1) the DB lookup be done by the caller to simplify the code, or 
2) be done in the lock to prevent race conditions?


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1372
PS21, Line 1372:         tblWasFound.setRef(true);
I wonder, do we really need to know if the table existed or not? Is it even 
useful? Consider.

* Impala creates a table. It updates its own cache and informs HMS.
* HMS sends a notification. In processing the notification, we determine that 
the table already exists.

Or

* A table is added in HMS.
* The user immediately kicks of a query in Impala, causing a load of metadata.
* Impala polls for notifications, learns that the table is new, tries to add 
it, and finds that the table already exists.

In this cases, do we get anything by logging that the table exists? If not, do 
we need to return the exists-or-not result?


http://gerrit.cloudera.org:8080/#/c/12118/21/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1475
PS21, Line 1475:       Reference<Boolean> tblWasfound, Reference<Boolean> 
tblMatched) {
Same comments as above.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java:

http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@130
PS20, Line 130:   public static abstract class MetastoreEvent {
The new event class hierarchy looks very good.

One nit: some events are DB-level events, others are table-level events. This 
base class includes the table-level stuff and the reader (and code) must know 
what table stuff is available and when it is not.

So, a further small improvement would be to create another abstract class, 
MetastoreTableEvent, to hold the common table stuff. The table class extends 
this one. DB-level events also extend this one.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@138
PS20, Line 138:     protected final Logger LOG = 
Logger.getLogger(this.getClass());
protected static final ... MetastoreEvent.class);


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@160
PS20, Line 160:       if 
(!event.getEventType().equalsIgnoreCase("CREATE_DATABASE")
Move this code to the constructor for the suggested MetastoreTableEvent class 
to avoid redundantly checking the event type.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@182
PS20, Line 182:     }
Would move to the suggested MetastoreTableEvent class.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@199
PS20, Line 199:     private final String tblName_;
These two fields shadow fields of the same name in the base class. This is 
generally considered undesirable.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@207
PS20, Line 207:       tblName_ = 
Preconditions.checkNotNull(event.getTableName());
OK to do the checks, but not super necessary since the code they check is part 
of this file...


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@241
PS20, Line 241:         LOG.info(String.format("Added a table %s.%s after 
processing event "
Nit: to make it easier to read/parse log records, maybe:

Event %d, (%s): ...

Where the two args are event id and type. That way all the messages follow a 
common format in the log.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@271
PS20, Line 271:         throw new MetastoreNotificationException("Unable to 
parse the alter table "
This is a round-about way to do error handling. Generally, you EITHER

1) Use Preconditions to do the check and throw the exception, or
2) Do the check and throw the exception yourself.

Here, would be better to say:

tableBefore_ = alterTableMessage.getTableObjBefore();
if (tableBefore_ == null) {
  throw new ...Exception("Event %d (%s): Invalid message, missing 
tableObjBefore.")
}

The additional info will be helpful when someone changes the message on HMS 
without changing Impala and we have to track down the problem based on a long 
file.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@293
PS20, Line 293:           if (!invalidateCatalogTable(dbName_, tblName_)) {
If this is a method, then no need to pass the fields since the method has 
visibility to the fields.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@318
PS20, Line 318:           throw new 
MetastoreNotificationException(String.format("Could not rename "
Let's think about error handling a bit. There are two broad cases:

1. The system went off the rails and got into an unexpected state. 
Preconditions are great for this.
2. The system system is in a bad state due to environment factors. The user has 
to clean up the mess.
3. The user provided bad input. The user will fix it and try again.

Given this framework, are the errors here of the 1st or 2nd category?

For example, this one, not being able to rename a table, seems to put the 
system in a bad state. But, the following error, trying to rename a table not 
in the cache, seems relatively benign.

Should we log some errors, but continue, while throwing a fatal error for 
others? Do we need a MetastoreNotificationFailureException vs. a 
MetastoreNotificationWarningException? Or, just log warnings? If only log 
warnings, then exceptions are for the "something is terribly wrong, stop the 
world" kind of error.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@333
PS20, Line 333:         throw new MetastoreNotificationException(e);
Suggestion: have MetastoreNotificationException take this class, or the message 
class, as a constructor argument. Have the exception class create a full 
message by adding the message ID and type as suggested for logs. That way, if 
the exception message appears in a log, we'll have more context about what went 
wrong:

super(String.format("Event % (%s): %s", id, messageName, e.getMessage), e)


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@383
PS20, Line 383:       if (removedTable == null) {
Nit:

if (removedTable != null) {
  ...
else if (!tblMatched.getRef()) {
 ...
else if (!tblWasFound.getRef()) {
  ...
}

That is, you can reword this to a chain of if/then rather than a nested if.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@389
PS20, Line 389:           LOG.debug(String.format("Table %s from event id %d 
was not removed since it "
Here, for example, is a nice "you might want to know" kind of message that does 
not throw an exception.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@440
PS20, Line 440:     public void process() {
I really like how this turned out. Each event is clearly separated. Very easy 
to follow what's happening. The comments are also easy to follow because they 
focus on one specific event. Nicely done.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@205
PS20, Line 205:       eventProcessorStatus_ = EventProcessorStatus.ACTIVE;
Should we write something to the log, perhaps at DEBUG or TRACE levels, that 
capture the major steps?

And, should we write something at, say, INFO level to confirm that this feature 
is, in fact, started? Will help with "is the feature really turned on" kinds of 
Support questions.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@215
PS20, Line 215:     statusLock_.readLock().lock();
Seems that statusLock_ is used only within this class. The only read lock is 
for this function, else we use a write lock.

I wonder, is this over complicating the code? Can't we just use "synchronized" 
on each method where we need a write lock?

And, since the status itself is a pointer, just returning it without 
synchronization is good enough: you'll always get a valid pointer. There is no 
race condition because the caller holds no lock: with or without a lock the 
reader gets a value at a point in time that may change in the next instant.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@229
PS20, Line 229:     return lastSyncedEventId_;
Here is an example where we're returning a value without locking, even though 
the value itself is changed in a write lock. This is why the above method also 
does not need a lock.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@251
PS20, Line 251:       eventProcessorStatus_ = EventProcessorStatus.STOPPED;
Should we check if the processor is in the run state before moving to stopped? 
Or, at least, before emitting a log message?


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@253
PS20, Line 253:           "Event processing is paused. Last synced event id is 
%d", lastSyncedEventId_));
Nit: Paused or stopped? Can we move from STOPPED state back to run state 
(paused) or is STOPPED terminal (stopped)?


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@265
PS20, Line 265:       return 
metaStoreClient.getHiveClient().getCurrentNotificationEventId().getEventId();
How is this used? How is it synchronized?

if this is simply retrieved by the client, then passed to start(), then 
shouldn't start() make this call instead?


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@285
PS20, Line 285:       LOG.info(String.format("Metastore event processing 
restarted. Last synced event id "
Isn't start() called only once? If not, what is the use case for pausing and 
resuming?

If started the first time, there will be no previous value. Do we want to say 
"updated from 0 to 100234"? Or, would it be better to say "started at 1000234"?


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@360
PS20, Line 360:       Preconditions.checkState(lastProcessedEvent != null);
The catch-block handles two types of exceptions. It also handles two sources of 
exceptions. I wonder if the catch-block mixes the two conditions.

The call to getNextMetastoreEvents() could fail due to, say, an RPC error. In 
this case, there is no lastProcessedEvent. On the other hand, processing the 
event could fail and there would be a lastProcessedEvent value.

Should the catch block handle both cases? Different catch blocks for the two 
kinds of errors?


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@371
PS20, Line 371:   private void dumpEventInfoOnErrors(NotificationEvent event) {
The name indicates why the method might be called. But, the method itself does 
not really care about that. Maybe call it "dumpEventToLog()" or some such.


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@373
PS20, Line 373:                             .append("Event id : " + 
event.getEventId() + "\n")
Nit: If using a StringBuilder, write each field as a separate .append() call. 
Otherwise, Java creates a temporary string builder, appends the two strings, 
converts the result to a string, and then appends that to your string builder.

Or, you may find it easier to use String.format().


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@376
PS20, Line 376:                             .append("Database name : " + 
event.getDbName() + "\n");
Another nit: colons usually have no leading space: "Database name: foo", not 
"Database name : foo".


http://gerrit.cloudera.org:8080/#/c/12118/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@397
PS20, Line 397:   public static synchronized ExternalEventsProcessor 
getOrCreate(
typically called "instance()" or "getInstance()".

Also, since the instance is created here, INSTANCE is not a constant, and so 
should not be capitalized.

Finally, what are the intended semantics? Is start-up multi-threaded? Or, is 
it, as is typical, single-threaded? If single threaded, no "synchronized" 
keyword is needed.



--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 21
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Fri, 18 Jan 2019 02:35:03 +0000
Gerrit-HasComments: Yes

Reply via email to