[email protected] has posted comments on this change. ( http://gerrit.cloudera.org:8080/21031 )
Change subject: IMPALA-12709: Add support for hierarchical metastore event processing ...................................................................... Patch Set 25: (20 comments) Rework http://gerrit.cloudera.org:8080/#/c/21031/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21031/19//COMMIT_MSG@28 PS19, Line 28: related > nit: related Done http://gerrit.cloudera.org:8080/#/c/21031/20//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21031/20//COMMIT_MSG@31 PS20, Line 31: > Please describe the pseudo events in the commit message. Done http://gerrit.cloudera.org:8080/#/c/21031/19/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/21031/19/be/src/catalog/catalog-server.cc@235 PS19, Line 235: db_event_executors > IMO, db_event_executors and table_event_executors could be a query option f Changing db_event_executors and table_event_executors requires resizing the existing db and table executors which are already in use(i.e., when db processors and table processors are already processing the events dispatched to them). IMHO, To keep it simple, it is good to keep them as start up flags similar to enable_hierarchical_event_processing flag. http://gerrit.cloudera.org:8080/#/c/21031/20/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/21031/20/be/src/catalog/catalog-server.cc@217 PS20, Line 217: hms_event_polling_interval_ms > Instead of adding a new flag the old hms_event_polling_interval could be ch For backward compatility, have not modified the existing flag. Have updated the description of existing seconds precision flag to describe about this new milliseconds precision flag. http://gerrit.cloudera.org:8080/#/c/21031/20/be/src/catalog/catalog-server.cc@235 PS20, Line 235: DEFINE_int32(db_event_executors, 5, > Please mention these new flags in the commit message so we can find them in Done http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java@658 PS19, Line 658: > Why did you feel the need to introduce a new pseudo class? CommitTxnEvent and AbortTxnEvent can involve multiple tables in a transaction and processing these events modifies multiple table objects. Pseudo events are introduced such that a pseudo event is created for each table involved in the transaction and these pseudo events are processed independently at respective table processor. Have described about the pseudo events in commit message now. http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@846 PS19, Line 846: LOG.info("EventId: {} EventType: COMMIT_TXN transaction id: {}", getEventId(), > I think we'll be removing writeIds for the transaction if the event is self We do not use Catalog.txnToWriteIds_ for self event. We add them to catalog in AllocWriteIdEvent and remove them upon AbortTxnEvent and CommitTxnEvent. These are used to track table write ids for a transaction and used in processing o those events. So we can hold the reference to table write Ids here and use them during processing. These are not the validWriteIds_ in table object. http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@966 PS19, Line 966: tableName, wr > What happens when different table event executors try to modify different t When tables are processed in different table event executors, they are executing in different threads. They are processed concurrently. http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@4081 PS19, Line 4081: return resp; > nit: adding? Done http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java File fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java: http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java@24 PS19, Line 24: ache.impala.ca > nit: Need some explanation regarding what is the objective of this new clas Done http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java@55 PS19, Line 55: private final MetastoreDatabaseEvent actualEvent_; // database event > nit: should we log the counter of 'expectedProceedCount_' whenever we incr Done http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/DBEventExecutor.java File fe/src/main/java/org/apache/impala/catalog/events/DBEventExecutor.java: http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/DBEventExecutor.java@288 PS20, Line 288: executorName_ = "DBEventExecutor-" + name; : ses_ = Executors.newSingleThreadScheduledExecutor( : new ThreadFactoryBuilder().s > I am confused about locking - only a few public functions are synchronized Had missed locking at some places. Have added them now. http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java@52 PS19, Line 52: * processor status is active. > Why is it any different from pause state? Event processor status is ACTIVE. But event processor do not read and process HMS events when it is put on hold. It is used in performance measurement. Hold helps to ensure that we do not process events until all the necessary events are generated in test. Once required events are generated, event processing can be resumed to process all events. http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@832 PS20, Line 832: } > can you add some comments about the role of the pseudo events? Done. Have added the description at class and commit message. http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@863 PS20, Line 863: dbEventExecutor.enqueue(new PseudoAbortTxnEvent(abortTxnEvent, : entry.getKey().get > Can we add synchronization for the drop and create events? I.e. the create Yes. Have added a RenameTableBarrierEvent to synchronize the source and target table processors to process rename table event. http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@863 PS20, Line 863: dbEventExecutor.enqueue(new PseudoAbortTxnEvent(abortTxnEvent, : entry.getKey().get > I am not completely convinced that this is a good idea. Processing the drop Yes. Have added a RenameTableBarrierEvent to synchronize the source and target table processors to process rename table event. This ensures that the new version of table is not created until the old version of table is dropped. http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@882 PS20, Line 882: pseudoCreateTableEvent.setEventId(event.getEventId()); : > Shouldn't these use AfterTable instead of BeforeTable? Thanks. Thats a bug. Have fixed it. http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@885 PS20, Line 885: new DropTableEvent(event.getCatalogOpExecutor(), e > Can't it confuse other parts of the code if to events have the same event i Have synchronized the processing of drop and create for rename with RenameTableBarrierEvent http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java File fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java: http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java@223 PS20, Line 223: > Do I understand correctly that the tableProcessors_ will have a fixed order Do I understand correctly that the tableProcessors_ will have a fixed order, and the events will be processed in this order instead of event id order? Yes. Does this also apply to DROP/CREATE table events? In case of renames, what guarantees that the CREATE event won't be processed before the DROP event if the the new table name is earlier among the table processors? Have added RenameTableBarrierEvent to synchronize the source and target table processors to process rename table event. This ensures that the new version of table is not created until the old version of table is dropped. http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java@222 PS19, Line 222: BackendConfig.INSTANCE.getHMSPollingIntervalInSeconds() * 1000; > Even with this feature disabled, do we want to deal with polling interval i Yes. Added support for ms precision event polling interval. It works independent of enable_hierarchical_event_processing flag. -- To view, visit http://gerrit.cloudera.org:8080/21031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6 Gerrit-Change-Number: 21031 Gerrit-PatchSet: 25 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Wed, 23 Oct 2024 16:25:24 +0000 Gerrit-HasComments: Yes
