Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
......................................................................


Patch Set 20:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7
PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event 
id when performing
            : DDL operations via catalog HMS endpoints
> ping
@Vihang: I already have this in my to-do list. Will write a detailed commit 
message.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672
PS12, Line 2672: lse;
> I am not sure I fully understand. If this change is not needed for the patc
I understand and as discussed over call, I will remove this method


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@288
PS19, Line 288: syncToLatestEventFactory_
> Based on my understanding this field here is unnecessary and is complicatin
@Vihang: We do need to pass event factory object to syncToLatestEventId() in 
metastoreEventProcessor since it is a static method. However I agree that we 
should not create a new event factory and instead modify isSelfEvent() to 
accomodate sync to latest event id flag. One issue that I see is - if event 
processing is disabled then MetastoreEventProcessor is not initialized and 
there would be no way to access eventFactory object from it.

Few ways to solve this:
1. Decouple MetastoreEventProcessor and EventFactory creation. In JniCatalog, 
we can create a common event factory that would be set in EventProcessor as 
well as used elsewhere. Doing so, we need to make sure that the factory is 
thread safe.

2. Make MetastoreEventFactory singleton and use it from all the places. This 
would avoid JniCatalog route. I had tried this approach in the past and 
encountered some test failures. Didn't investigate the failures in depth but it 
appeared to be race conditions issues. I can take a shot at it again if the 
approach seems cleaner.

Let me know your thoughts.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@316
PS19, Line 316:
> this is unused and can be removed.
Ack


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS6, Line 115:
> All the places where I see this variable getting accessed, I see that it is
As discussed over call, we will keep it as volatile so that EventProcessor can 
check if it needs to process an event or not without acquiring readlock on 
db/table.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@134
PS19, Line 134:     LOG.debug("createEventId_ for db: {} set to: {}", 
getName(), createEventId_);
              :     if (lastSyncedEventId_ < eventId) {
              :       setLastSyncedEventId(eventId);
              :     }
> Pls remove if not needed anymore.
Sure. I had added this check earlier but then saw some failures in the tests 
related to ddls from Impala shell . For now, I will add a TODO comment and we 
can address it in a follow up jira.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@153
PS19, Line 153:   /**
              :    * Creates a Db object with no tables based on the given 
TDatabase thrift struct.
              :    */
              :   public static Db fromTDatabase(TDatabase db) {
              :     ret
> pls remove if not needed.
Same as previous comment


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186
PS12, Line 186:
> Thanks for adding the comment. But similar to the Db.java's volatile keywor
As discussed, we will keep the variable as volatile so that event processor can 
read it (to check if event should be skipped or not) without acquiring read 
lock on table object.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@212
PS12, Line 212:   crea
> nit, single line space.
Ack


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@60
PS19, Line 60:
> please remove this
Ack


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@77
PS19, Line 77: Logger LOG
> this is not needed if you remove line 60
Ack


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@220
PS19, Line 220:   }
              :
              :   public long getLastSyncedEventId(
> the createEventId is set when the table is created, I am not sure when woul
I did see an occurrence of this scenario in catalogOpExecutor code but don't 
remember clearly now.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@234
PS19, Line 234:    * Returns if the given HMS table is an external table (uses 
table type if
              :    * available or else uses table properties). Implementation 
is based on org.apache
              :    * 
.hadoop.hive.metastore.utils.MetaStoreUtils.isExternalTable()
              :    */
              :   publi
> pls remove if not needed.
Sure I will add a TODO for now and we can take it up in a follow up jira.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170
PS12, Line 170: alidate met
> I was suggesting that we should not pass the metrics in which case we would
I understand. However reusing MetastoreEventProcessor's metrics seems to be 
tricky because:

What if event processing is disabled? In that case eventProcessor will be an 
instance of NoOpEventProcessor. Also ExternalEventProcessor currently does not 
have getMetrics() api. We can expose this api  and create a default metrics for 
NoOpEventProcessor as well.

Let me know what you think.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@34
PS19, Line 34: import
> remove if not necessary? In fact all the changes to this file can be remove
Sure.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461
PS12, Line 461:  return dbName_; }
> Actually, I see why you have a new abstract method now. However, instead of
As discussed over call, we will keep this method separate from isSelfEvent() 
because when sync to latest event id is enabled by default, we can then get rid 
of self event logic


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@875
PS19, Line 875:
> Why do we need to logically AND with this condition here?
As discussed, I will add a comment for the same.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@371
PS19, Line 371:    * @param db
> if we are stopping at the drop event it would be weird I feel. For instance
As discussed over call, there are few complications in processing events beyond 
drop event. For example - as a result of drop event, the table would be dropped 
from the cache. If we continue processing next events, the first such event 
would be create_table. While processing that event, we would again have to 
acquire write lock on the new table. But this method assumes that the write 
lock is already acquired.
For now, we have discussed to break after drop event.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1939
PS19, Line 1939:  pro
> I am surprised we are not hitting NPE due to this and other similar calls b
It is because we are not processing these events. But I agree that we shouldn't 
pass null metrics.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java@40
PS19, Line 40: import org.apache.impala.catalog.events.ExternalEventsProcessor;
             : import org.apache.impala.catalog.events.MetastoreEven
> please remove the unused imports
Sure.



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 18:01:10 +0000
Gerrit-HasComments: Yes

Reply via email to