Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
after the DDL operation for all DDLS from Impala clients
......................................................................


Patch Set 5:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@8
PS4, Line 8: after the DD
> Can you add to the title whether Impala syncs the events before or after th
Ack


http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@19
PS4, Line 19: n: when the enable_sync_to_latest_event_on_ddls flag is
            : set to true, we do t
> This is stale now.
Ack


http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@20
PS4, Line 20:  DDL operation first, i.e., perform HMS operation
            : and then sync the db/table in the catalogD's cache to the latest 
event
            : in HMS for the corresponding db/table. Currently we fetch all 
events
> Won't there be always a pending event, as the DDL from Impala creates a new
Agreed. But we may also fetch events that are from other databases/tables.
Ideally, we want to fetch only events that only required for the current 
metadata object.


http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396
PS3, Line 1396:     Preconditions.checkNotNull(topicUpdateEntry);
> I think that we also need to hold versionLock_ before calling this, as the
I have to remove this method. This is not required anymore.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1045
PS4, Line 1045:           alterTableOrViewRename(tbl,
              :               
TableName.fromThrift(params.getRename_params().getNew_table_name()),
              :               newCatalogVersion, wantMinimalResult, response);
              :           return;
> I don't see this change applied to renames, while the commit message sugges
Rename is a complex operation, it has to change the catalogD's cache 
immediately after the HMS operation without waiting for 
MetastoreEventsProcessor to sync the metadata object's events. I'll update the 
commit message accordingly, regarding operations that don't depend upon the 
event processor to update the cache.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1113
PS4, Line 1113:           reloadMetadata = false;
> All the code paths that set 'reloadMetadata' to false have updated the cata
reloadMetadata is set to false for addPartitions() -- 
https://gerrit.cloudera.org/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#4393
 and dropPartitions() -- 
https://gerrit.cloudera.org/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#5265
I have also added tests CatalogHmsSyncToLatestEventIdTest to verify the same.
Do you see any issues with not modifying the cache in place? Should I change 
this back to the original state for addPartitions() and dropPartitions()? 
Please let me know.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1263
PS4, Line 1263:       // Make sure we won't forget finalizing the modification.
              :       if (tbl.hasInProgressModification()) 
Preconditions.checkState(reloadMetadata);
              :       if (!trySyncToLatestEventId(tbl) && reloadMetadata) {
              :         loadTableMetadata(tbl, newCatalogVersion, 
reloadFileMetadata,
              :             reloadTableSchema, null, "ALTER TABLE " + params
> refactor idea: this could be moved to a function like CatalogOpExecutor.try
Ack


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1270
PS4, Line 1270:         catalog_.addVersionsForInflightEvents(false, tbl, 
newCatalogVersion);
> This seems valid even in the syncToLatestEventId case.
Ack


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276
PS4, Line 1276:       }
> Are we using in flight event lists for anything when enableSyncToLatestEven
This is used to identify if the events are from other Impala services.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1658
PS4, Line 1658: Id(eventId);
> What kind of guarantees do we have for the state of the event processor whe
I understand your concern. By setting the syncToLatestEventId on the metadata 
object (tbl/database) during to syncToLatestEventId() operation, these events 
are skipped by the event processor by comparing the current event id to the 
lastSyncEventId on the metadata object. E.g.: See 
CatalogHmsSyncToLatestEventIdTestL#246 where alter database Event is skipped on 
the event processor once the eventId is synced.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004
PS4, Line 7004:           // Add catalog service id and the 'newCatalogVersion' 
to the table parameters.
              :           // This way we can avoid reloading the table on 
self-events (Iceberg generates
              :           // an ALTER TABLE statement to set the new 
metadata_location).
              :           
IcebergCatalogOpExecutor.addCatalogVersionToTxn(iceTxn,
              :               catalog_.getCatalogServiceId(), 
newCatalogVersion);
              :     
> Allowing createInsertEvents() to fail is more problematic in case of enable
Ack. Reverted the change for insert operations.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031
PS4, Line 7031:     }
> Can't we skip this too if the events will by synced for the table?
I wasn't too sure about Iceberg tables so I didn't touch this part. Also, 
lastSyncEventId doesn't exist for the Iceberg table definition, right?


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040
PS4, Line 7040:   /**
              :    * Populates insert event data and calls fireInsertEvents() 
if external event
              :    * processing is enabled.
              :    * This method is replicating what Hive does in case a table 
or partition is inserts
              :    * into. There are 2 cases:
              :    * 1. If the table is transactional, we should first generate 
ADD_PARTITION events
              :    * for new partitions which are generated. This is taken care 
of in the updateCatalog
              :    * method. Additionally, for each partition including 
existing partitions which were
              :    * inserted into, this method creates an ACID_WRITE event 
using the HMS API
              :    * ad
> The commit message and the flag name suggests that we only apply this for D
IMPALA-10926, 
https://gerrit.cloudera.org/c/17859/42/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java#1139
Addressed this for truncate operations also.
So IMO, it makes sense to revert the change for the insert operation and only 
do this for the truncate operation. I'll update the commit message to notify 
which DML statements (truncate) are updating the caching. Do you agree with 
this?


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1546
PS4, Line 1546:       alterTableRename(testTblName, newTblName, TEST_DB_NAME);
              :       eventsProcessor_.processEvents();
> What will happen if a rename is detected during a DDL and not in eventsProc
Added a test for the above-suggested scenario and confirmed that the event 
processor is active.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Mon, 30 Oct 2023 23:56:17 +0000
Gerrit-HasComments: Yes

Reply via email to