Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/23113 )
Change subject: IMPALA-13267: Display number of partitions for Iceberg tables ...................................................................... Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/23113/12/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/23113/12/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@242 PS12, Line 242: synchronized (loadedIcebergPartitions_) { : partitionId = loadedIcebergPartitions_.get(partition); : if (partitionId == null) { : partitionId = loadedIcebergPartitions_.size(); : loadedIcebergPartitions_.put(partition, partitionId); : } : } > Can't we have concurrency problems with all of these approaches? With the current control flow I don't think so. > In the original one: First we query the size of the map, then we insert an element. Because it is a concurrent map, it is guaranteed that these are "atomic" individually, but not together: after we query the size, another thread may insert or remove elements before we do the insertion. Note that it is not guaranteed (and probably not true) that the map uses its own default monitor for implementing its methods thread-safely, so having the two statements in a synchronized block does not help. Also, we have very similar code on L189 without a synchronized block. L189-190 runs on a single thread, before we start doing things in parallel, so it is in a happens-before relationship with this code. We only have this method that can run in parallel (with another instance of itself) and modify the map, but the synchronized block gives enough protection. > The computeIfAbsent() approach: It can still collide with the code on L189, or any other code: if somebody modifies the map concurrently, the counter could be out-of-sync. L189-190 doesn't run concurrently with this, it happens before. > Are this and L189 the only places where we modify this map? If so, then maybe > we don't need a ConcurrentHashMap, instead we could manually synchronize a > normal map. The problem is complicated by the getIcebergPartitions() method, > which returns a reference to this map. Do we know whether access to the map > through getIcebergPartitions() can happen simultaneously with these > modifications? If not, we could simply return a constant view of the map (or > not even bother because we're not modifying it anyway). Manually synchronizing a normal map allows much less concurrency than a ConcurrentHashMap. getIcebergPartitions() will only be called after the IcebergFileMetadataLoader finished with load(). A ConcurrentHashMap might have a little memory and CPU time overhead (because of the synchronization), but I'm not sure if it justifies the creation of a new map. > If access and modification can be concurrent, we could consider returning a > (shallow) copy in getIcebergPartitions(), although that may lead to worse > performance if this map is huge. Alternatively, we could return a constant > view of a ConcurrentHashMap so that reading the map (by users of > getIcebergPartitions()) is thread-safe in case we concurrently modify it > (here in this class), but in the code within this class we don't need to > worry about some other code (the caller of getIcebergPartitions()) modifying > the map. The current code uses loadedIcebergPartitions_ in a read-only fashion after it has been built (i.e. after load() returned). It should be possible to return an ImmutableMap that wraps loadedIcebergPartitions_, but I don't know the perf implications. -- To view, visit http://gerrit.cloudera.org:8080/23113 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb2f654bc6c9bdf9cfafc27b38b5ca2f7b6b4872 Gerrit-Change-Number: 23113 Gerrit-PatchSet: 12 Gerrit-Owner: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 11 Sep 2025 15:39:09 +0000 Gerrit-HasComments: Yes