Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG@10
PS2, Line 10: After compaction happened in Hive(HIVE ACID table), queries made in
            : Impala possibly fail with a FileNotFoundException if files already
            : removed by the Hive cleaner.
> Can you confirm if Impala open's a transaction for select queries for ACID
IIRC, Impala only opens transactions for DDL/DML operations. Do you know how
long after compaction Hive removes the files?


http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@898
PS2, Line 898: List<PartitionRef> stalePartitions = directProvider_.checkLatestCompaction(
             :         refImpl.dbName_, refImpl.tableName_, refImpl, refToMeta);
> looks like this is going to be called during each query's compilation when
I think this introduces several HMS RPCs per query (some queries may call this
multiple times). Maybe we can add a query option or table property to disable
the check, so that ACID tables that are rarely updated or compacted avoid the
extra RPCs. For those tables we can use a notification-based solution (in
follow-up JIRAs).
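
A rough sketch of the opt-out suggested above, only to illustrate the idea. The
class, the property name "impala.skip.compaction.check" and the boolean
standing in for a query option are all hypothetical; none of them exist in
Impala, and the real wiring in CatalogdMetaProvider would look different.

```java
import java.util.Map;

/** Hypothetical helper, not part of Impala. */
final class CompactionCheckPolicy {
  // Hypothetical table property name, chosen only for illustration.
  static final String SKIP_PROP = "impala.skip.compaction.check";

  /**
   * Returns true if the per-query compaction check should run.
   * queryOptionEnabled stands in for a hypothetical query option.
   */
  static boolean shouldCheckCompaction(
      Map<String, String> tblProperties, boolean queryOptionEnabled) {
    // The query option turns the check off for the whole query.
    if (!queryOptionEnabled) return false;
    // The table property turns it off per table; a missing value parses as false.
    return !Boolean.parseBoolean(tblProperties.get(SKIP_PROP));
  }
}
```

The caller would evaluate this before issuing the HMS RPCs, so a table that
opts out costs only a map lookup per query.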


http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@951
PS2, Line 951:     req.table_info_selector.valid_write_ids = table.validWriteIds_;
With this change, catalogd will check the latest compaction ids for each
request. I think we need a follow-up JIRA for a perf test to measure the
overhead, especially for tables with a large number of partitions.



--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sourabh Goyal <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Yu-Wen Lai <[email protected]>
Gerrit-Comment-Date: Thu, 25 Nov 2021 08:20:11 +0000
Gerrit-HasComments: Yes
