[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901357#comment-17901357 ] ASF GitHub Bot commented on HADOOP-19346: - ctrezzo merged PR #7187: URL: https://github.com/apache/hadoop/pull/7187 > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. > Right now, we are using a ReentrantReadWriteLock to protect concurrent > updates to InnerCache.map(). > As a performance optimization, to reduce acquiring writeLock, we first > acquire readLock, check existence of the key and then acquire writeLock, to > insert the new key. > {code:java} > FileSystem get(URI uri, Configuration config) throws IOException { > Key key = new Key(uri); > FileSystem fs = null; > try { > rwLock.readLock().lock(); > fs = map.get(key); > if (fs != null) { > return fs; > } > } finally { > rwLock.readLock().unlock(); > } > try { > rwLock.writeLock().lock(); > fs = map.get(key); > if (fs != null) { > return fs; > } > fs = fsCreator.getNewInstance(uri, config); > map.put(key, fs); > return fs; > } finally { > rwLock.writeLock().unlock(); > } > } > {code} > The above function is already available as computeIfAbsent() from > ConcurrentHashMap, which atomically checks existence of a key and inserts the > new key if not there. This greatly simplifies the code. On the other hand, it > should improve performance as well: concurrentHashMap supports finer-grain > concurrency vs. a single ReadWrite Lock. > We can replace the above code with the following one and it is much simpler. > {code:java} > FileSystem get(URI uri, Configuration config) throws IOException { > Key key = new Key(uri); > FileSystem fs = map.computeIfAbsent(key, k -> getNewFileSystem(uri, > config)); > if (fs == null) { > throw new IOException("Failed to create new FileSystem instance for " > + uri); > } > return fs; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901347#comment-17901347 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2502383248 Added more details to jira ticket. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. > Right now, we are using a ReentrantReadWriteLock to protect concurrent > updates to InnerCache.map(). > As a performance optimization, to reduce acquiring writeLock, we first > acquire readLock, check existence of the key and then acquire writeLock, to > insert the new key. > {code:java} > FileSystem get(URI uri, Configuration config) throws IOException { > Key key = new Key(uri); > FileSystem fs = null; > try { > rwLock.readLock().lock(); > fs = map.get(key); > if (fs != null) { > return fs; > } > } finally { > rwLock.readLock().unlock(); > } > try { > rwLock.writeLock().lock(); > fs = map.get(key); > if (fs != null) { > return fs; > } > fs = fsCreator.getNewInstance(uri, config); > map.put(key, fs); > return fs; > } finally { > rwLock.writeLock().unlock(); > } > } > {code} > The above function is already available as computeIfAbsent() from > ConcurrentHashMap, which atomically checks existence of a key and inserts the > new key if not there. This greatly simplifies the code. On the other hand, it > should improve performance as well: concurrentHashMap supports finer-grain > concurrency vs. a single ReadWrite Lock. > We can replace the above code with the following one and it is much simpler. > {code:java} > FileSystem get(URI uri, Configuration config) throws IOException { > Key key = new Key(uri); > FileSystem fs = map.computeIfAbsent(key, k -> getNewFileSystem(uri, > config)); > if (fs == null) { > throw new IOException("Failed to create new FileSystem instance for " > + uri); > } > return fs; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901337#comment-17901337 ] ASF GitHub Bot commented on HADOOP-19346: - sjlee commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2502207410 +1 from me. Since I haven't committed PRs in a long while, it would take me a long time to get familiar with the steps. @ctrezzo, can you please review and commit if you approve? Thanks. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901331#comment-17901331 ] ASF GitHub Bot commented on HADOOP-19346: - hadoop-yetus commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2502185597 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 19s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 59s | | trunk passed | | +1 :green_heart: | compile | 8m 57s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 8m 3s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | checkstyle | 0m 43s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 2s | | trunk passed | | +1 :green_heart: | javadoc | 0m 49s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | spotbugs | 1m 32s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 17s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 33s | | the patch passed | | +1 :green_heart: | compile | 8m 37s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 8m 37s | | the patch passed | | +1 :green_heart: | compile | 8m 9s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | javac | 8m 9s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 40s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 53s | | the patch passed | | +1 :green_heart: | javadoc | 0m 43s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 36s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | spotbugs | 1m 34s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 10s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 43s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. | | | | 137m 20s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/7187 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux f7df674b2134 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / bbfa1e121b6c3a1944b3caef468f42f5305c92eb | | Default Java | Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/4/testReport/ | | Max. process+thread count | 3154 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/4/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | |
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901314#comment-17901314 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2502014477 > What you're saying is, your internal version has the RWL code removed and you are seeing a correctness issue. . You're NOT saying the upstream version has a correctness issue, correct? Correct, our internal version simply did not have that PR, which added the lock. and it has correctness issue. Trunk version is good. We are not trying to improve the performance here. The concurrentHashMap is a cleaner approach than the old lock based approach. You already mentioned concurrentHashMap provides finer-grain concurrency and should improve performance. So, it is a win-win in both code-readable/maintainability and performance. tbh, I did not want to spend time benchmarking RWLock vs concurrentHashMap. Both are standard constructs that we are supposed to use. This is client-side code: most of the overhead will be from communicating with NN/DN, vs competing for locks here. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901307#comment-17901307 ] ASF GitHub Bot commented on HADOOP-19346: - sjlee commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501967926 I see. What you're saying is, your internal version has the RWL code removed and you are seeing a correctness issue. You're **NOT** saying the upstream version has a correctness issue, correct? From my cursory look, I think the existing code is correct in a concurrent setting. If that's the case, my original comment basically stands. From the community standpoint, it would not be fixing a correctness issue but rather improving implementations and/or performance. While I think the new version is probably a better one, it would be good to see how both versions behave in a pretty high-concurrency situation. It shouldn't be too difficult to test quickly. WDYT? > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901302#comment-17901302 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501947487 > Oh so was this due to a correctness bug not performance that led you to this? It wasn't very clear from the JIRA description. yes, correctness issue: we are getting stale/incorrect reads from observer nodes (fileLen=0 for non-zero bytes files which has been closed by the same FileSystem object). We finally narrowed it down the RC: we are using two different dfs clients/ObserverReadProxyProviders, though the key (scheme/authority) to InnerCache.map.get() is the same. Then, we compared trunk and our internal version. We found in our internal version (based off 2.10.0), we don't have the lock to prevent concurrent updates to InnerCache.map. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901304#comment-17901304 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501947853 > Oh so was this due to a correctness bug not performance that led you to this? It wasn't very clear from the JIRA description. yes, correctness issue: we are getting stale/incorrect reads from observer nodes (fileLen=0 for non-zero bytes files which has been closed by the same FileSystem object). We finally narrowed it down the RC: we are using two different dfs clients/ObserverReadProxyProviders, though the key (scheme/authority) to InnerCache.map.get() is the same. Then, we compared trunk and our internal version. We found in our internal version (based off 2.10.0), we don't have the lock to prevent concurrent updates to InnerCache.map. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901303#comment-17901303 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501947548 > Oh so was this due to a correctness bug not performance that led you to this? It wasn't very clear from the JIRA description. yes, correctness issue: we are getting stale/incorrect reads from observer nodes (fileLen=0 for non-zero bytes files which has been closed by the same FileSystem object). We finally narrowed it down the RC: we are using two different dfs clients/ObserverReadProxyProviders, though the key (scheme/authority) to InnerCache.map.get() is the same. Then, we compared trunk and our internal version. We found in our internal version (based off 2.10.0), we don't have the lock to prevent concurrent updates to InnerCache.map. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901293#comment-17901293 ] ASF GitHub Bot commented on HADOOP-19346: - sjlee commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501797003 > We are running a hadoop version without a lock protecting the InnerCache.get() and basically rediscovered this bug, after two months' investigation. > Oh so was this due to a correctness bug not performance that led you to this? It wasn't very clear from the JIRA description. > Performance-wise, I think they should be similar? Only one thread will be allowed to create the new FileSystem instance for each missing key in both cases. ConcurrentHashMap is better because RWL locks globally whenever there is a mutation whereas ConcurrentHashMap can be more granular. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901275#comment-17901275 ] ASF GitHub Bot commented on HADOOP-19346: - hadoop-yetus commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501637949 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 34m 10s | | trunk passed | | +1 :green_heart: | compile | 9m 53s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 8m 19s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | checkstyle | 0m 40s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 54s | | trunk passed | | +1 :green_heart: | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 30s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | spotbugs | 1m 25s | | trunk passed | | +1 :green_heart: | shadedclient | 24m 58s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 32s | | the patch passed | | +1 :green_heart: | compile | 8m 34s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 8m 34s | | the patch passed | | +1 :green_heart: | compile | 8m 12s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | javac | 8m 12s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 39s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/3/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 36 unchanged - 0 fixed = 38 total (was 36) | | +1 :green_heart: | mvnsite | 0m 58s | | the patch passed | | +1 :green_heart: | javadoc | 0m 44s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 36s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | spotbugs | 1m 35s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 22s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 38s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. | | | | 144m 31s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/7187 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 29151b5fdcec 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 1e173116998c0d29fbe64f37bdbdf63fb08c8b9a | | Default Java | Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/3/testReport/ | | Max. process+thread c
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901261#comment-17901261 ] ASF GitHub Bot commented on HADOOP-19346: - hadoop-yetus commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501503792 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 19s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 44s | | trunk passed | | +1 :green_heart: | compile | 8m 58s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 8m 11s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | checkstyle | 0m 42s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 1s | | trunk passed | | +1 :green_heart: | javadoc | 0m 48s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 31s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | spotbugs | 1m 32s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 17s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 8m 52s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 8m 52s | | the patch passed | | +1 :green_heart: | compile | 8m 23s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | javac | 8m 23s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 35s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/2/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 36 unchanged - 0 fixed = 38 total (was 36) | | +1 :green_heart: | mvnsite | 0m 56s | | the patch passed | | +1 :green_heart: | javadoc | 0m 43s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | -1 :x: | spotbugs | 1m 33s | [/new-spotbugs-hadoop-common-project_hadoop-common.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/2/artifact/out/new-spotbugs-hadoop-common-project_hadoop-common.html) | hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 :green_heart: | shadedclient | 23m 46s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 16m 57s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 39s | | The patch does not generate ASF License warnings. | | | | 140m 22s | | | | Reason | Tests | |---:|:--| | SpotBugs | module:hadoop-common-project/hadoop-common | | | ViewFileSystem$InnerCache$Key is incompatible with expected argument type org.apache.hadoop.fs.FileSystem in org.apache.hadoop.fs.viewfs.ViewFileSystem$InnerCache.get(URI, Configuration) At ViewFileSystem.java:argument type org.apache.hadoop.fs.FileSystem in org.apache.hadoop.fs.viewfs.ViewFileSystem$InnerCache.get(URI, Configuration) At ViewFileSystem.java:[line 141] | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/7187 | | Optional Tests | dupname asflicense compile javac java
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901250#comment-17901250 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501272174 > Can you share some rationale or tabletop concurrent benchmark tests to show the improvements with the new approach from the old? I do think the ConcurrentHashMap is a fine implementation, but I'm curious what led you to look into the ReadWriteLock implementation. We are running a hadoop version without a lock protecting the InnerCache.get() and basically rediscovered this bug, after two months' investigation. I was about to backport the ReadWriteLock approach from trunk. Then, a team member suggested that we could just use ConcurrentHashMap here, which provides a more clean approach than the current ReadWriteLock approach. Performance-wise, I think they should be similar? Only one thread will be allowed to create the new FileSystem instance for each missing key in both cases. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901248#comment-17901248 ] ASF GitHub Bot commented on HADOOP-19346: - sjlee commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501246215 Can you share some rationale or tabletop concurrent benchmark tests to show the improvements with the new approach from the old? I do think the ConcurrentHashMap is a fine implementation, but I'm curious what led you to look into the ReadWriteLock implementation. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901246#comment-17901246 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858811126 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,35 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); private FsGetter fsCreator; -private ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); InnerCache(FsGetter fsCreator) { this.fsCreator = fsCreator; } +// computeIfAbsent() does not support a mapping function which throws IOException. +// Wrap fsCreator.getNewInstance() to not throw IOException and return null instead. +FileSystem getNewFileSystem(FsGetter fsCreator, URI uri, Configuration config) { + try { +return fsCreator.getNewInstance(uri, config); + } catch (IOException e) { +LOG.error("Failed to create new FileSystem instance for " + uri, e); +return null; + } +} + FileSystem get(URI uri, Configuration config) throws IOException { Key key = new Key(uri); - FileSystem fs = null; - try { -rwLock.readLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} - } finally { -rwLock.readLock().unlock(); + if (map.contains(key)) { Review Comment: This version is now much better than the original one. :) Thanks for your help. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901244#comment-17901244 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858806435 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,35 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); Review Comment: Fixed. thanks, > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901243#comment-17901243 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858805112 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,35 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); private FsGetter fsCreator; -private ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); InnerCache(FsGetter fsCreator) { this.fsCreator = fsCreator; } +// computeIfAbsent() does not support a mapping function which throws IOException. +// Wrap fsCreator.getNewInstance() to not throw IOException and return null instead. +FileSystem getNewFileSystem(FsGetter fsCreator, URI uri, Configuration config) { + try { +return fsCreator.getNewInstance(uri, config); + } catch (IOException e) { +LOG.error("Failed to create new FileSystem instance for " + uri, e); +return null; + } +} + FileSystem get(URI uri, Configuration config) throws IOException { Key key = new Key(uri); - FileSystem fs = null; - try { -rwLock.readLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} - } finally { -rwLock.readLock().unlock(); + if (map.contains(key)) { Review Comment: Good point! Did not realize that. Fixed. thanks, > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901242#comment-17901242 ] ASF GitHub Bot commented on HADOOP-19346: - sjlee commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858789781 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,35 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); Review Comment: Minor nit: the interface type ConcurrentMap is slightly preferred in the declaration over the concrete implementation type ConcurrentHashMap. Just a better idiomatic style. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901240#comment-17901240 ] ASF GitHub Bot commented on HADOOP-19346: - sjlee commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2501200727 +1 to @pjfanning's comments. In fact, in principle all you need is just one call to computeIfAbsent() which takes care of thread safety, concurrency, and at-most-once cost for the computation. No need to call contains() or get() in addition. The original commit is not correct in a concurrent situation FYI. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901237#comment-17901237 ] ASF GitHub Bot commented on HADOOP-19346: - pjfanning commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858759097 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,35 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); private FsGetter fsCreator; -private ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); InnerCache(FsGetter fsCreator) { this.fsCreator = fsCreator; } +// computeIfAbsent() does not support a mapping function which throws IOException. +// Wrap fsCreator.getNewInstance() to not throw IOException and return null instead. +FileSystem getNewFileSystem(FsGetter fsCreator, URI uri, Configuration config) { + try { +return fsCreator.getNewInstance(uri, config); + } catch (IOException e) { +LOG.error("Failed to create new FileSystem instance for " + uri, e); +return null; + } +} + FileSystem get(URI uri, Configuration config) throws IOException { Key key = new Key(uri); - FileSystem fs = null; - try { -rwLock.readLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} - } finally { -rwLock.readLock().unlock(); + if (map.contains(key)) { Review Comment: This should be unnecessary. The computeIfAbsent call returns existing mapped value if one exists. The whole point of the high level calls is that it is easy to make code thread safe when you just need 1 API call. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901236#comment-17901236 ] ASF GitHub Bot commented on HADOOP-19346: - pjfanning commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858759097 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,35 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); private FsGetter fsCreator; -private ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); InnerCache(FsGetter fsCreator) { this.fsCreator = fsCreator; } +// computeIfAbsent() does not support a mapping function which throws IOException. +// Wrap fsCreator.getNewInstance() to not throw IOException and return null instead. +FileSystem getNewFileSystem(FsGetter fsCreator, URI uri, Configuration config) { + try { +return fsCreator.getNewInstance(uri, config); + } catch (IOException e) { +LOG.error("Failed to create new FileSystem instance for " + uri, e); +return null; + } +} + FileSystem get(URI uri, Configuration config) throws IOException { Key key = new Key(uri); - FileSystem fs = null; - try { -rwLock.readLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} - } finally { -rwLock.readLock().unlock(); + if (map.contains(key)) { Review Comment: This should be unnecessary. The computed absent call returns existing mapped value if one exists. The whole point of the high level calls is that it is easy to make code thread safe when you just need 1 API call. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901226#comment-17901226 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858693600 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,22 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); private FsGetter fsCreator; -private ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); InnerCache(FsGetter fsCreator) { this.fsCreator = fsCreator; } FileSystem get(URI uri, Configuration config) throws IOException { Key key = new Key(uri); - FileSystem fs = null; - try { -rwLock.readLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} - } finally { -rwLock.readLock().unlock(); - } - try { -rwLock.writeLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} -fs = fsCreator.getNewInstance(uri, config); -map.put(key, fs); -return fs; - } finally { -rwLock.writeLock().unlock(); + if (map.contains(key)) { +return map.get(key); } + + FileSystem fs = fsCreator.getNewInstance(uri, config); Review Comment: Hi @pjfanning, Thanks for your comments. Replaced with computeIfAbsent(). Could you take another look? > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901140#comment-17901140 ] ASF GitHub Bot commented on HADOOP-19346: - pjfanning commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858126001 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,22 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); private FsGetter fsCreator; -private ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); InnerCache(FsGetter fsCreator) { this.fsCreator = fsCreator; } FileSystem get(URI uri, Configuration config) throws IOException { Key key = new Key(uri); - FileSystem fs = null; - try { -rwLock.readLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} - } finally { -rwLock.readLock().unlock(); - } - try { -rwLock.writeLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} -fs = fsCreator.getNewInstance(uri, config); -map.put(key, fs); -return fs; - } finally { -rwLock.writeLock().unlock(); + if (map.contains(key)) { +return map.get(key); } + + FileSystem fs = fsCreator.getNewInstance(uri, config); Review Comment: I would also suggest looking at computeIfAbsent instead of putIfAbsent. https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function- With computeIfAbsent, you may be able to avoid the lock but still have the behaviour that only one fs is created per key. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901139#comment-17901139 ] ASF GitHub Bot commented on HADOOP-19346: - pjfanning commented on code in PR #7187: URL: https://github.com/apache/hadoop/pull/7187#discussion_r1858120454 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java: ## @@ -118,38 +118,22 @@ protected FsGetter fsGetter() { * Caching children filesystems. HADOOP-15565. */ static class InnerCache { -private Map map = new HashMap<>(); +private ConcurrentHashMap map = new ConcurrentHashMap<>(); private FsGetter fsCreator; -private ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); InnerCache(FsGetter fsCreator) { this.fsCreator = fsCreator; } FileSystem get(URI uri, Configuration config) throws IOException { Key key = new Key(uri); - FileSystem fs = null; - try { -rwLock.readLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} - } finally { -rwLock.readLock().unlock(); - } - try { -rwLock.writeLock().lock(); -fs = map.get(key); -if (fs != null) { - return fs; -} -fs = fsCreator.getNewInstance(uri, config); -map.put(key, fs); -return fs; - } finally { -rwLock.writeLock().unlock(); + if (map.contains(key)) { +return map.get(key); } + + FileSystem fs = fsCreator.getNewInstance(uri, config); Review Comment: * this does not have the same concurrency semantics as the existing code * fsCreator.getNewInstance should only happen once in existing code with lock * in new code, concurrent threads could call fsCreator.getNewInstance multiple times * once the putIfAbsent is called, then future threads should no longer call fsCreator.getNewInstance * the new behaviour may be acceptable - but it is important to analyse if it is ok to allow multiple fsCreator.getNewInstance calls * do you have evidence to show that there is a performance bottleneck with the existing code? * one option is to keep the lock but use a concurrent map nonetheless - this code would be similar to your change - the main diff is ``` private ConcurrentHashMap map = new ConcurrentHashMap<>(); private ReentrantLock lock = new ReentrantLock(); if (map.contains(key)) { return map.get(key); } try { lock.lock(); // need to check if another thread has put an fs for this key while we waited for the lock FileSystem fs = map.get(key); if (fs == null) { fs = fsCreator.getNewInstance(uri, config); map.put(key, fs); } return fs; } finally { lock.unlock(); } ``` This avoids the read locks and the write lock only happens when the key has no fs stored for it. > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901107#comment-17901107 ] ASF GitHub Bot commented on HADOOP-19346: - hadoop-yetus commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2499833346 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 18s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 53s | | trunk passed | | +1 :green_heart: | compile | 9m 2s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 8m 8s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | checkstyle | 0m 45s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 59s | | trunk passed | | +1 :green_heart: | javadoc | 0m 47s | | trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | spotbugs | 1m 32s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 9s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 30s | | the patch passed | | +1 :green_heart: | compile | 8m 39s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 8m 39s | | the patch passed | | +1 :green_heart: | compile | 8m 15s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | +1 :green_heart: | javac | 8m 15s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 35s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/1/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 1 new + 36 unchanged - 0 fixed = 37 total (was 36) | | +1 :green_heart: | mvnsite | 0m 55s | | the patch passed | | +1 :green_heart: | javadoc | 0m 43s | | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 36s | | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga | | -1 :x: | spotbugs | 1m 36s | [/new-spotbugs-hadoop-common-project_hadoop-common.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/1/artifact/out/new-spotbugs-hadoop-common-project_hadoop-common.html) | hadoop-common-project/hadoop-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | +1 :green_heart: | shadedclient | 20m 58s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 17m 34s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7187/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. | | | | 136m 50s | | | | Reason | Tests | |---:|:--| | SpotBugs | module:hadoop-common-project/hadoop-common | | | ViewFileSystem$InnerCache$Key is incompatible with expected argument type org.apache.hadoop.fs.FileSystem in org.apache.hadoop.fs.viewfs.ViewFileSystem$InnerCache.get(URI, Configuration) At ViewFileSystem.java:argument type org.apache.hadoop.fs.FileSystem in org.apache.hadoop.fs.viewfs.ViewFileSystem$InnerCache.get(URI, Configuration) At ViewFileSystem.java:[line 130] | | | Return value of putIfAbsent is ignored, but fs is reused in org.apache.hadoop.fs.viewfs.ViewFileSystem$InnerCache.get(URI, Configuration) At ViewFileS
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901088#comment-17901088 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin commented on PR #7187: URL: https://github.com/apache/hadoop/pull/7187#issuecomment-2499661831 @ctrezzo: Could you take a look? thanks, > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19346) ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
[ https://issues.apache.org/jira/browse/HADOOP-19346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901087#comment-17901087 ] ASF GitHub Bot commented on HADOOP-19346: - xinglin opened a new pull request, #7187: URL: https://github.com/apache/hadoop/pull/7187 ### Description of PR HADOOP-19346. ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent() ### How was this patch tested? mvn package -Pdist -PskipTests ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with > ConcurrentHashMap/putIfAbsent() > --- > > Key: HADOOP-19346 > URL: https://issues.apache.org/jira/browse/HADOOP-19346 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > > Use of ReentrantReadWriteLock + Map can be replaced/simplified with a > ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org