Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
......................................................................


Patch Set 15:

(35 comments)

http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17859/11//COMMIT_MSG@7
PS11, Line 7: Sync db/table in catalog cache to latest HMS event id when 
performing
            : DDL operations via catalog HMS endpoints
> If this patch is close to getting merged, now is a good to add more details
Yes I will add more details in the commit message.  Will fix message format 
styles as well.


http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7
PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event 
id when performing
            : DDL operations via catalog HMS endpoints
> Since this patch is not a WIP anymore, can you please follow the commit mes
Ack


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457
PS6, Line 457:           tableInfo);
             :     }
             :     int tableIndex=-1, versionLockCount = 0;
             :     try {
             :       for(tableIndex = 0; tableIndex < numTables; tableIndex++) {
             :         Table tbl = tables[tableIndex];
             :         if (!tryWriteLock(tbl)) {
             :           LOG.debug("Could not acquire write lock on table: " + 
tbl.getFullName());
             :           return false;
             :         }
             :         versionLockCount += 1;
             :       }
             :       // in case of success, release version write lock for all 
tables except last
             :       if (tableIndex == numTables) {
             :
> I think the versionLock.writeLock().unlock() should be moved out finally bl
Currently tryWriteLock(tbl) does not throw an exception. But it still makes 
sense to release version lock in a finally block. Will fix it.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3652
PS6, Line 3652:
> annotate with @VisibleForTesting if this was for testing.
It is used in testing as well as in JniCatalog (after initializing 
catalogServiceCatalog object)


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3658
PS6, Line 3658:
> nit, missing newline.
Ack


http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/11/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@459
PS11, Line 459: leIndex=-1,
> I had left some comments in the older gerrit url for this patch. Can you pl
Sure


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@457
PS12, Line 457:           tableInfo);
              :     }
              :     int tableIndex=-1, versionLockCount = 0;
              :     try {
              :       for(tableIndex = 0; tableIndex < numTables; tableIndex++) 
{
              :         Table tbl = tables[tableIndex];
              :         if (!tryWriteLock(tbl)) {
              :           LOG.debug("Could not acquire write lock on table: " + 
tbl.getFullName());
              :           return false;
              :         }
              :         versionLockCount += 1;
              :       }
              :       // in case of success, release version write lock for all 
tables except last
              :       if (tableIndex == numTables) {
              :
> a RuntimeException thrown at line 459 will not release the table locks as w
Makes sense. I didn't think about the RunTimeException. Will address it.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672
PS12, Line 2672:
> why do we need to override this method?
I think the intention for this change was - currently for removeTable() api , 
we acquire global write lock. So the read for the table should acquire global 
read lock.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS6, Line 115:
> need this?
The intention was - by making lastSyncedEventId_ volatile, the get and set apis 
for this variable would become thread safe. Otherwise to read this, one would 
have to take read lock.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS6, Line 115:
> please add a comment here explaining what the value of this field signifies
Ack


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170
PS12, Line 170: initMetrics
> since these metrics are not getting exposed anywhere. can we remove these?
Are you suggesting to pass metrics as null in
 MetastoreEventsProcessor.syncToLatestEventId(catalog_, table,
            catalog_.getEventFactoryForSyncToLatestEvent(), metrics_, false); ?
If yes, then we need to handle null condition for metrics in 
MetastoreEvents.java


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@168
PS12, Line 168: // St
> do we need these changes to this class?
Not really. I will revert back to original behavior.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461
PS12, Line 461: shouldSkipWhenSyncingToLatestEventId
> why are we not just using isSelfEvent() here? Would it not be cleaner to ch
Sorry I didn't understand your comment


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@576
PS12, Line 576:   if (c
> is this commented for a reason?
No. Forgot to clean it up. Will do it.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@800
PS12, Line 800:           return true;
              :         }
              :       } catch (DatabaseNotFoundException e) {
              :         infoLog("Skipping on table {} because db {} not found 
in cache", tblName,
              :
> if the table is not present do we need to check if it is in the deletelog?
How would checking in deletelog help? Even if it is present, should we not skip 
processing this event?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@990
PS12, Line 990:
> I think we should implement a check for lastSyncedEventId based self-event
MetastoreTableEvent class already has a default implementation for skipping the 
event if the table is already synced till this event id. Do we need a custom 
implementation here?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1175
PS12, Line 1175:  the ALTER_TABLE event is due a table rename, this method 
removes the old table
               :      * and creates a new table with the new name. Else, this 
just issues a refr
> It is not clear to me why we are checking on both the before and after tabl
This check is done for rename table. Already have an explanation for this in 
method's description

 // If the alter table event is generated because of table rename then event
 // should *NOT* be skipped if old table is not synced this till event AND
 //  new table doesn't exist in cache. Skip otherwise

Revisiting the logic and I think we can skip this logic for rename and simply 
rely on renameTableFromEvent() when actually processing the event.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1699
PS12, Line 1699:
> Why does this class not implement shouldSkipWhenSyncingToLatestEventId?
MetastoreTableEvent class already has the default implementation for this.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@328
PS12, Line 328: shouldBeAlreadyLocked
> Is there a caller which passes this as false?
Yes, during full table reload from TableLoader. WriteLock is not required there 
since we are doing reload on a fresh table
Rethinking about this - It probably makes more sense to acquire write lock on a 
newly created table in TableLoader  and get rid of shouldBeAlreadyLocked.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@329
PS12, Line 329: Preconditions.checkState(tbl.isWriteLockedByCurrentThread(),
              :           String.format("Write lock is not held on table %s by 
current thread",
              :               tbl.getFullName()));
> I think this preconditions check should be done in any case to catch bugs w
Sorry I couldn't understand your comment.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@386
PS12, Line 386: tbl.setLastSyncedEventId(currentEvent.getEventId());
> can this logic be moved to the event.processIfEnabled() itself?
Actual event processing logic like: addPartitionsIfNotRemovedLater, 
reloadPartition() (for alter partition) is already taking care of setting event 
id in table/db.
I think we shouldn't move this to event.processIfEnabled() because the actual 
implementation of process() api should decide whether a given event id should 
be set in table/db or not


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@458
PS12, Line 458: TODO: should we ignore case
> yes, case should be ignored. Also I think we should make sure that the cata
Do you mean we support only "hive" catalog? If yes should the filter be 
modified to

if (event.getCatName() != null && event.getDbName() != null && 
event.getTableName() != null) {
          return event.getCatName().equalsIgnoreCase("hive") && 
tbl.getDb().getName().equalsIgnoreCase(event.getDbName()) &&
              tbl.getName().equalsIgnoreCase(event.getTableName());

Same question for db filter as well. Should we check for catalog name?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@475
PS12, Line 475:           return 
db.getName().equalsIgnoreCase(event.getDbName());
Should we have catalog name check here? Ignore events if catalog name is not 
hive?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@478
PS12, Line 478: event.getTableName() == null
> not sure I get this part. Why do we need to do this?
The check is to ignore any table specific events since those events would have 
table name as non null and accept the rest


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@533
PS12, Line 533: storeEventFactory.get
> can we use the singleton way to get the factory here?
I tried using singleton object after our discussion but that introduced some 
tests failures. Most likely some race conditions. I couldn't repro them on 
local. Also since the logs from pre commit tests were missing upstream I 
couldn't debug it further. Therefore switched to using non singleton object of 
event factory.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@283
PS12, Line 283: dropDbIfExists
> why not sync to latest event here?
My thought was - when the db is eventually going to be dropped, why sync it to 
latest event i.e drop event and then  drop the db? We can by pass the sync and 
drop the db right away.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@857
PS12, Line 857: syncToLatestEventId(srcTbl, apiName);
              :       syncToLatestEventId(destinationTbl, apiName);
              :     } catch (Exception e) {
              :       rethrowException(e, apiName);
> Can this be removed now?
Ack


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1105
PS12, Line 1105:
> why not sync to latest event here?
Same reasoning as dropDbIfExists.

Currently when syncing to latest event if we encounter drop table/db event then 
we drop the db/table and stop processing further events (if any). So why not 
drop the table right away?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@1279
PS12, Line 1279: ename = !db
> Do we need a preconditions check to confirm that table is locked by this th
Yes we should have.
For now, I have removed this method and moved the logic in the caller since 
there is only one caller of this method.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@909
PS12, Line 909:   if (dbToAlter == null) {
              :       LOG.debug("Event id: {}, not altering db {} since it does 
not exist in catalogd",
              :           eventId, dbName);
              :
> check deleteEventLog here?
How would that help? Even if db is present in deleteEventLog, is it not already 
deleted?


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1611
PS12, Line 1611:         if 
(!BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls()) {
               :           assertEquals(EventProcessorStatus.NEEDS_INVALIDATE,
               :               eventsProcessor_.getStatus());
               :         }
> not sure I fully understand this.
When sync to latest event id is enabled, any  event on a non existing table is 
skipped (check shouldSkipWhenSyncingToLatestEventId in MetastoreTableEvent).
 Regarding this change, I am actually not sure if this is causing any 
regression but the behavior in case the flag is turned on seems to be right.
Let me know your thoughts


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@199
PS12, Line 199: prevSyncedEventId);
> Will this cause flakiness of the test since the HMS is shared among multipl
Yes the check will fail in that case. Thanks for pointing it out. I will modify 
the check at all places in this file to

getLastSyncedEventId() > prevSyncedId


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@240
PS12, Line 240: // added partitions should not reflect in table
              :             // stored in catalog cache
              :             assertTrue(tbl.getPartitions().size() == 0);
> not sure I understand this. The add_partitions API in HMS handler should sy
addPartitionsInHms() method does not use catalogHmsClient and uses normal hms 
client to add partitions to metastore. Therefore if cataligHmsClient is by 
passed, then that ddl operation should not affect the cache.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@550
PS12, Line 550:             catalogHmsClient_.dropTable(TEST_DB_NAME, tblName, 
true, true);
> can you also assert here that lastSyncedEventId > createEventId
Ack


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@561
PS12, Line 561: EventsProcessor prevEventProcessor =
> not sure what is the goal of this test? This is testing functionality which
Thats correct. However tests in MetastoreEventsProcessorTest currently do not 
test if table has been synced to latest event id. We can modify those tests. 
May be run all metastore events processor tests as parameterized tests with and 
without sync to latest event id patch ?

Thoughts?



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 15
Gerrit-Owner: Sourabh Goyal <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sourabh Goyal <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Yu-Wen Lai <[email protected]>
Gerrit-Comment-Date: Thu, 07 Oct 2021 20:03:39 +0000
Gerrit-HasComments: Yes

Reply via email to