Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21072 )
Change subject: IMPALA-12831: Fix HdfsTable.toMinimalTCatalogObject() failed by concurrent modification ...................................................................... Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21072/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/21072/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2133 PS1, Line 2133: /** > Would it be hard to synchronize access to partitionMap_? We recently fixed a performance regression on INVALIDATE that it triggers table loading and could be blocked by concurrent DDLs (IMPALA-11501). Adding synchronization here means INVALIDATE could again be blocked by concurrent DDLs. I think the executiing time is more important than memory usage. So the current patch is best effort to get the partition list to optimize the memory usage. If the read lock can't be acquired, we still have the same performance since tryReadLock() has a zero timeout. BTW, the partition ID changes when a partition is reloaded (though the partition name unchanged) since HdfsPartition objects are immutable and are assigned an incremental id (IMPALA-9778). To avoid INVALIDATE be blocked by concurrent DDLs, maybe another approach is using ConcurrentHashMap as partitionMap_. But in some places we do clear the map, e.g. https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L706 So we could send an empty map here before all partitions are loaded. What do you think? http://gerrit.cloudera.org:8080/#/c/21072/1/tests/custom_cluster/test_concurrent_ddls.py File tests/custom_cluster/test_concurrent_ddls.py: http://gerrit.cloudera.org:8080/#/c/21072/1/tests/custom_cluster/test_concurrent_ddls.py@215 PS1, Line 215: refresh > refresh_handle would be clearer ase there are other operations in parallel Done -- To view, visit http://gerrit.cloudera.org:8080/21072 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9f8e65c0bd24000241eedf8ca765c1e4e14fdb3 Gerrit-Change-Number: 21072 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Tue, 27 Feb 2024 08:48:46 +0000 Gerrit-HasComments: Yes
