Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/23113 )
Change subject: IMPALA-13267: Display number of partitions for Iceberg tables ...................................................................... Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/23113/12/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/23113/12/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@242 PS12, Line 242: synchronized (loadedIcebergPartitions_) { : partitionId = loadedIcebergPartitions_.get(partition); : if (partitionId == null) { : partitionId = loadedIcebergPartitions_.size(); : loadedIcebergPartitions_.put(partition, partitionId); : } : } > We can increase concurrency with this construct: Can't we have concurrency problems with all of these approaches? In the original one: First we query the size of the map, then we insert an element. Because it is a concurrent map, it is guaranteed that these are "atomic" individually, but not together: after we query the size, another thread may insert or remove elements before we do the insertion. Note that it is not guaranteed (and probably not true) that the map uses its own default monitor for implementing its methods thread-safely, so having the two statements in a synchronized block does not help. Also, we have very similar code on L189 without a synchronized block. The computeIfAbsent() approach: It can still collide with the code on L189, or any other code: if somebody modifies the map concurrently, the counter could be out-of-sync. Are this and L189 the only places where we modify this map? If so, then maybe we don't need a ConcurrentHashMap, instead we could manually synchronize a normal map. The problem is complicated by the getIcebergPartitions() method, which returns a reference to this map. Do we know whether access to the map through getIcebergPartitions() can happen simultaneously with these modifications? If not, we could simply return a constant view of the map (or not even bother because we're not modifying it anyway). If access and modification can be concurrent, we could consider returning a (shallow) copy in getIcebergPartitions(), although that may lead to worse performance if this map is huge. Alternatively, we could return a constant view of a ConcurrentHashMap so that reading the map (by users of getIcebergPartitions()) is thread-safe in case we concurrently modify it (here in this class), but in the code within this class we don't need to worry about some other code (the caller of getIcebergPartitions()) modifying the map. -- To view, visit http://gerrit.cloudera.org:8080/23113 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb2f654bc6c9bdf9cfafc27b38b5ca2f7b6b4872 Gerrit-Change-Number: 23113 Gerrit-PatchSet: 12 Gerrit-Owner: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 11 Sep 2025 13:07:20 +0000 Gerrit-HasComments: Yes