Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18043 )
Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG@10
PS2, Line 10: After compaction happened in Hive(HIVE ACID table), queries made in
            : Impala possibly fail with a FileNotFoundException if files already
            : removed by the Hive cleaner.
> IIRC, Impala only open transactions for DDL/DML operations. Do you know how
Thanks to Vihang and Quanlong for pointing out the problem. Impala does NOT open transactions for SELECT queries, so this approach does not work in all cases. Hive has a config that can delay the cleaner for some period of time, but we don't know exactly how long the delay should be.

Given that this is time sensitive, I'm thinking we could make this feature optional for now, behind a flag (say, auto_check_compaction). If the flag is set, Impala opens transactions for all queries on ACID tables and does the compaction check. Any thoughts?


http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@898
PS2, Line 898: List<PartitionRef> stalePartitions = directProvider_.checkLatestCompaction(
             :             refImpl.dbName_, refImpl.tableName_, refImpl, refToMeta);
> I think this introduces several HMS RPCs per query (some queries may call t
Taking the performance numbers on DWX as an example, this API call currently takes 10 ~ 40 ms per table, depending on the number of partitions. I will have a fix on the HMS side for an issue with this API: we currently need to pass all the partition names. That should bring the API execution time close to 10 ms in all cases.

Even though we can improve this API, I understand that it still introduces overhead that might not be negligible. It might be better to introduce this feature behind a flag, together with a table property to skip this check, as Quanlong suggested.


-- 
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sourabh Goyal <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Yu-Wen Lai <[email protected]>
Gerrit-Comment-Date: Mon, 29 Nov 2021 02:56:40 +0000
Gerrit-HasComments: Yes
