[jira] [Commented] (HDFS-17451) RBF: fix spotbugs for redundant nullcheck of dns.
[ https://issues.apache.org/jira/browse/HDFS-17451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835584#comment-17835584 ]

ASF GitHub Bot commented on HDFS-17451:
---

KeeProMise commented on code in PR #6697:
URL: https://github.com/apache/hadoop/pull/6697#discussion_r1558805351

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
@@ -1090,7 +1090,7 @@ DatanodeInfo[] getCachedDatanodeReport(DatanodeReportType type)
       throws IOException {
     try {
       DatanodeInfo[] dns = this.dnCache.get(type);
-      if (dns == null) {
+      if (dns.length == 0) {

Review Comment: @simbadzina (screenshot: https://github.com/apache/hadoop/assets/38941777/d91a9e69-14a6-44ac-b983-65364ae4dd8c)

> RBF: fix spotbugs for redundant nullcheck of dns.
> -
>
> Key: HDFS-17451
> URL: https://issues.apache.org/jira/browse/HDFS-17451
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
>
> h2. Dodgy code Warnings
> ||Code||Warning||
> |RCN|Redundant nullcheck of dns, which is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)|
> | |[Bug type RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE (click for details)|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6655/8/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf-warnings.html#RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE]
> In class org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer
> In method org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)
> Value loaded from dns
> Return value of org.apache.hadoop.thirdparty.com.google.common.cache.LoadingCache.get(Object) of type Object
> Redundant null check at RouterRpcServer.java:[line 1093]|

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
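Background on the review question above: Guava's `LoadingCache.get(key)` never returns null, because a `CacheLoader` that computes null makes Guava throw `InvalidCacheLoadException` rather than caching the null. That is why SpotBugs flags `dns == null` as a redundant null check. The sketch below is a minimal stand-in, using plain JDK `Map.computeIfAbsent` in place of Guava's cache and hypothetical names rather than the Router code, to show why the emptiness test replaces the null test:

```java
import java.util.EnumMap;
import java.util.Map;

public class CacheNullCheckDemo {
    enum DatanodeReportType { ALL, LIVE, DEAD }

    // Stand-in for the Router's dnCache. Like a Guava LoadingCache whose
    // loader never returns null, computeIfAbsent always yields a non-null
    // array here, so a "dns == null" branch would be dead code (the RCN).
    private static final Map<DatanodeReportType, String[]> DN_CACHE =
        new EnumMap<>(DatanodeReportType.class);

    static String[] getCachedDatanodeReport(DatanodeReportType type) {
        String[] dns = DN_CACHE.computeIfAbsent(type, t -> new String[0]);
        // The meaningful condition is emptiness, not nullity:
        if (dns.length == 0) {
            // Nothing cached yet; a real Router would fall back to an
            // uncached fetch here.
            return new String[0];
        }
        return dns;
    }

    public static void main(String[] args) {
        String[] report = getCachedDatanodeReport(DatanodeReportType.LIVE);
        if (report == null) {
            throw new AssertionError("computeIfAbsent never returns null here");
        }
        System.out.println("cached entries: " + report.length);
    }
}
```

The remaining question in the thread, whether an empty array and "no cached report yet" should be distinguished, is a semantic choice for the patch, not something the null check could have captured.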
[jira] [Commented] (HDFS-17453) IncrementalBlockReport can have race condition with Edit Log Tailer
[ https://issues.apache.org/jira/browse/HDFS-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835585#comment-17835585 ] ASF GitHub Bot commented on HDFS-17453: --- hadoop-yetus commented on PR #6708: URL: https://github.com/apache/hadoop/pull/6708#issuecomment-2046497984 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 11m 37s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 46m 2s | | trunk passed | | +1 :green_heart: | compile | 1m 22s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 1m 14s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 1m 12s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 26s | | trunk passed | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 3m 18s | | trunk passed | | +1 :green_heart: | shadedclient | 35m 44s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 13s | | the patch passed | | +1 :green_heart: | compile | 1m 15s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 1m 15s | | the patch passed | | +1 :green_heart: | compile | 1m 7s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 1m 7s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 58s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 15s | | the patch passed | | +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 37s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 3m 17s | | the patch passed | | +1 :green_heart: | shadedclient | 35m 59s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 230m 8s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 46s | | The patch does not generate ASF License warnings. 
| | | | 384m 30s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6708/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6708 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 3d8c054a2c10 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 7e36db7affc66da101cfda141c9f35afc82aa606 | | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6708/8/testReport/ | | Max. process+thread count | 4658 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6708/8/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > IncrementalBlockReport can have race
[jira] [Created] (HDFS-17459) [FGL] Summarize this feature
ZanderXu created HDFS-17459: --- Summary: [FGL] Summarize this feature Key: HDFS-17459 URL: https://issues.apache.org/jira/browse/HDFS-17459 Project: Hadoop HDFS Issue Type: Sub-task Reporter: ZanderXu Assignee: ZanderXu Write a doc to summarize this feature so we can merge it into the trunk.
[jira] [Resolved] (HDFS-17445) [FGL] All remaining operations support fine-grained locking
[ https://issues.apache.org/jira/browse/HDFS-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei resolved HDFS-17445. Resolution: Fixed > [FGL] All remaining operations support fine-grained locking > --- > > Key: HDFS-17445 > URL: https://issues.apache.org/jira/browse/HDFS-17445 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDFS-17445) [FGL] All remaining operations support fine-grained locking
[ https://issues.apache.org/jira/browse/HDFS-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835565#comment-17835565 ] ASF GitHub Bot commented on HDFS-17445: --- ferhui merged PR #6715: URL: https://github.com/apache/hadoop/pull/6715 > [FGL] All remaining operations support fine-grained locking > --- > > Key: HDFS-17445 > URL: https://issues.apache.org/jira/browse/HDFS-17445 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDFS-17445) [FGL] All remaining operations support fine-grained locking
[ https://issues.apache.org/jira/browse/HDFS-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835566#comment-17835566 ] ASF GitHub Bot commented on HDFS-17445: --- ferhui commented on PR #6715: URL: https://github.com/apache/hadoop/pull/6715#issuecomment-2046331457 Thanks for the contribution. Merged. > [FGL] All remaining operations support fine-grained locking > --- > > Key: HDFS-17445 > URL: https://issues.apache.org/jira/browse/HDFS-17445 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDFS-17451) RBF: fix spotbugs for redundant nullcheck of dns.
[ https://issues.apache.org/jira/browse/HDFS-17451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835543#comment-17835543 ] ASF GitHub Bot commented on HDFS-17451: --- hadoop-yetus commented on PR #6697: URL: https://github.com/apache/hadoop/pull/6697#issuecomment-2046132702 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 12m 3s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 56m 4s | | trunk passed | | +1 :green_heart: | compile | 0m 44s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 0m 31s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 46s | | trunk passed | | +1 :green_heart: | javadoc | 0m 44s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 0m 30s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | -1 :x: | spotbugs | 1m 28s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6697/3/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf-warnings.html) | hadoop-hdfs-project/hadoop-hdfs-rbf in trunk has 1 extant spotbugs warnings. 
| | +1 :green_heart: | shadedclient | 36m 35s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 32s | | the patch passed | | +1 :green_heart: | compile | 0m 35s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 0m 30s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 20s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 35s | | the patch passed | | +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 0m 27s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 1m 33s | | hadoop-hdfs-project/hadoop-hdfs-rbf generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) | | +1 :green_heart: | shadedclient | 38m 45s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 29m 33s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 39s | | The patch does not generate ASF License warnings. 
| | | | 189m 11s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6697/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6697 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 2033f43410ac 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / f43a1c75f464c081d3de3b807f23bec27fc173cb | | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Test Results |
[jira] [Commented] (HDFS-17453) IncrementalBlockReport can have race condition with Edit Log Tailer
[ https://issues.apache.org/jira/browse/HDFS-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835540#comment-17835540 ]

ASF GitHub Bot commented on HDFS-17453:
---

dannytbecker commented on code in PR #6708:
URL: https://github.com/apache/hadoop/pull/6708#discussion_r1558331856

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java:
@@ -95,16 +95,27 @@ void removeAllMessagesForDatanode(DatanodeDescriptor dn) {
 void enqueueReportedBlock(DatanodeStorageInfo storageInfo, Block block,
     ReplicaState reportedState) {
+  long genStamp = block.getGenerationStamp();
+  Queue<ReportedBlockInfo> queue = null;
   if (BlockIdManager.isStripedBlockID(block.getBlockId())) {
     Block blkId = new Block(BlockIdManager.convertToStripedID(block
         .getBlockId()));
-    getBlockQueue(blkId).add(
-        new ReportedBlockInfo(storageInfo, new Block(block), reportedState));
+    queue = getBlockQueue(blkId);
   } else {
     block = new Block(block);
-    getBlockQueue(block).add(
-        new ReportedBlockInfo(storageInfo, block, reportedState));
+    queue = getBlockQueue(block);
   }
+  // We only want the latest non-future reported block to be queued for each
+  // DataNode. Otherwise, there can be a race condition that causes an old
+  // reported block to be kept in the queue until the SNN switches to ANN and
+  // the old reported block will be processed and marked as corrupt by the ANN.
+  // See HDFS-17453
+  int size = queue.size();
+  if (queue.removeIf(rbi -> rbi.storageInfo.equals(storageInfo) &&

Review Comment: I added these null checks, but I needed to remove the reportedState null check because the null value is used by removeStoredBlock.

> IncrementalBlockReport can have race condition with Edit Log Tailer
> ---
>
> Key: HDFS-17453
> URL: https://issues.apache.org/jira/browse/HDFS-17453
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: auto-failover, ha, hdfs, namenode
> Affects Versions: 3.3.0, 3.3.1, 2.10.2, 3.3.2, 3.3.5, 3.3.4, 3.3.6
> Reporter: Danny Becker
> Assignee: Danny Becker
> Priority: Major
> Labels: pull-request-available
>
> h2. Summary
> There is a race condition between IncrementalBlockReports (IBR) and the EditLogTailer in the Standby NameNode (SNN) which can lead to leaked IBRs and false corrupt blocks after an HA failover. The race condition occurs when the SNN loads the edit logs before it receives the block reports from the DataNode (DN).
> h2. Example
> In the following example there is a block (b1) with 3 generation stamps (gs1, gs2, gs3).
> # SNN1 loads edit logs for b1gs1 and b1gs2.
> # DN1 sends the IBR for b1gs1 to SNN1.
> # SNN1 will determine that the reported block b1gs1 from DN1 is corrupt and it will be queued for later. [BlockManager.java|https://github.com/apache/hadoop/blob/6ed73896f6e8b4b7c720eff64193cb30b3e77fb2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3447C1-L3464C6]
> {code:java}
> BlockToMarkCorrupt c = checkReplicaCorrupt(
>     block, reportedState, storedBlock, ucState, dn);
> if (c != null) {
>   if (shouldPostponeBlocksFromFuture) {
>     // If the block is an out-of-date generation stamp or state,
>     // but we're the standby, we shouldn't treat it as corrupt,
>     // but instead just queue it for later processing.
>     // Storing the reported block for later processing, as that is what
>     // comes from the IBR / FBR and hence what we should use to compare
>     // against the memory state.
>     // See HDFS-6289 and HDFS-15422 for more context.
>     queueReportedBlock(storageInfo, block, reportedState,
>         QUEUE_REASON_CORRUPT_STATE);
>   } else {
>     toCorrupt.add(c);
>   }
>   return storedBlock;
> } {code}
> # DN1 sends the IBRs for b1gs2 and b1gs3 to SNN1.
> # SNN1 processes b1gs2 and updates the blocks map.
> # SNN1 queues b1gs3 for later because it determines that b1gs3 is a future genstamp.
> # SNN1 loads the b1gs3 edit logs and processes the queued reports for b1.
> # SNN1 processes b1gs1 first and puts it back in the queue.
> # SNN1 processes b1gs3 next and updates the blocks map.
> # Later, SNN1 becomes the Active NameNode (ANN) during an HA failover.
> # SNN1 will catch up to the latest edit logs, then process all queued block reports to become the ANN.
> # ANN1 will process b1gs1 and mark it as corrupt.
> If the example above happens for every DN which stores b1, then when the HA failover happens, b1 will be incorrectly marked as corrupt. This will be fixed when the first DN sends a
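The deduplication step discussed in the review can be sketched in isolation. This is a toy model with a hypothetical `Report` type and `enqueueLatest` helper, not the actual `PendingDataNodeMessages` code, and it assumes the intent stated in the patch's comment: before enqueueing a report, drop any queued report from the same storage that the incoming report supersedes, so a stale b1gs1 cannot linger behind b1gs2 and later be marked corrupt after failover.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class LatestReportQueueDemo {
    // Hypothetical stand-in for ReportedBlockInfo: which storage reported,
    // and at which generation stamp.
    static final class Report {
        final String storageId;
        final long genStamp;
        Report(String storageId, long genStamp) {
            this.storageId = storageId;
            this.genStamp = genStamp;
        }
    }

    // Sketch of the patch's idea: remove superseded reports from the same
    // storage before adding the new one, so only the latest reported state
    // per storage survives in the queue.
    static void enqueueLatest(Queue<Report> queue, Report incoming) {
        queue.removeIf(r -> r.storageId.equals(incoming.storageId)
            && r.genStamp <= incoming.genStamp);
        queue.add(incoming);
    }

    public static void main(String[] args) {
        Queue<Report> q = new ArrayDeque<>();
        enqueueLatest(q, new Report("DN1-storage", 1)); // b1gs1
        enqueueLatest(q, new Report("DN1-storage", 2)); // supersedes b1gs1
        enqueueLatest(q, new Report("DN2-storage", 1)); // different storage, kept
        System.out.println("queued reports: " + q.size()); // 2, not 3
    }
}
```

The real patch keys on `storageInfo` equality (and, per the truncated condition, presumably the generation stamp as well); the toy keeps only that shape of the logic.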
[jira] [Commented] (HDFS-17451) RBF: fix spotbugs for redundant nullcheck of dns.
[ https://issues.apache.org/jira/browse/HDFS-17451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835515#comment-17835515 ]

ASF GitHub Bot commented on HDFS-17451:
---

simbadzina commented on code in PR #6697:
URL: https://github.com/apache/hadoop/pull/6697#discussion_r1558177674

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
@@ -1090,7 +1090,7 @@ DatanodeInfo[] getCachedDatanodeReport(DatanodeReportType type)
       throws IOException {
     try {
       DatanodeInfo[] dns = this.dnCache.get(type);
-      if (dns == null) {
+      if (dns.length == 0) {

Review Comment: @KeeProMise where do you see that `get` has a `non-null` annotation?

> RBF: fix spotbugs for redundant nullcheck of dns.
> -
>
> Key: HDFS-17451
> URL: https://issues.apache.org/jira/browse/HDFS-17451
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
>
> h2. Dodgy code Warnings
> ||Code||Warning||
> |RCN|Redundant nullcheck of dns, which is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)|
> | |[Bug type RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE (click for details)|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6655/8/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf-warnings.html#RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE]
> In class org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer
> In method org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)
> Value loaded from dns
> Return value of org.apache.hadoop.thirdparty.com.google.common.cache.LoadingCache.get(Object) of type Object
> Redundant null check at RouterRpcServer.java:[line 1093]|
[jira] [Commented] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835412#comment-17835412 ]

ASF GitHub Bot commented on HDFS-17455:
---

Hexiaoqiao commented on code in PR #6710:
URL: https://github.com/apache/hadoop/pull/6710#discussion_r1557595628

## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInputStream.java:
@@ -287,4 +300,69 @@ public void testReadWithoutPreferredCachingReplica() throws IOException {
       cluster.shutdown();
     }
   }
+
+  @Test
+  public void testCreateBlockReaderWhenInvalidBlockTokenException() throws
+      IOException, InterruptedException, TimeoutException {
+    GenericTestUtils.setLogLevel(DFSClient.LOG, Level.DEBUG);
+    Configuration conf = new Configuration();
+    DFSClientFaultInjector oldFaultInjector = DFSClientFaultInjector.get();
+    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build()) {
+      cluster.waitActive();
+      DistributedFileSystem fs = cluster.getFileSystem();
+      String file = "/testfile";
+      Path path = new Path(file);
+      long fileLen = 1024 * 64;
+      EnumSet createFlags = EnumSet.of(CREATE);
+      FSDataOutputStream out = fs.create(path, FsPermission.getFileDefault(), createFlags,

Review Comment: +1

> Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
> -
>
> Key: HDFS-17455
> URL: https://issues.apache.org/jira/browse/HDFS-17455
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> When the client reads data and connects to a datanode whose access token is invalid, an InvalidBlockTokenException is thrown. The subsequent call to the fetchBlockAt method then throws java.lang.IndexOutOfBoundsException, causing the read to fail.
> *Root cause:*
> * The HDFS file contains only one RBW block, with a block data size of 2048KB.
> * The client opens this file and seeks to the offset of 1024KB to read data.
> * The DFSInputStream#getBlockReader method connects to the datanode; because the datanode access token is invalid it throws InvalidBlockTokenException, and the subsequent call to DFSInputStream#fetchBlockAt throws java.lang.IndexOutOfBoundsException.
> {code:java}
> private synchronized DatanodeInfo blockSeekTo(long target)
>     throws IOException {
>   if (target >= getFileLength()) {
>     // the target is smaller than fileLength (completeBlockSize +
>     // lastBlockBeingWrittenLength); here target is 1024 and getFileLength is 2048
>     throw new IOException("Attempted to read past end of file");
>   }
>   ...
>   while (true) {
>     ...
>     try {
>       blockReader = getBlockReader(targetBlock, offsetIntoBlock,
>           targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
>           storageType, chosenNode);
>       if (connectFailedOnce) {
>         DFSClient.LOG.info("Successfully connected to " + targetAddr +
>             " for " + targetBlock.getBlock());
>       }
>       return chosenNode;
>     } catch (IOException ex) {
>       ...
>       } else if (refetchToken > 0 && tokenRefetchNeeded(ex, targetAddr)) {
>         refetchToken--;
>         // Here will catch InvalidBlockTokenException.
>         fetchBlockAt(target);
>       } else {
>         ...
>       }
>     }
>   }
> }
>
> private LocatedBlock fetchBlockAt(long offset, long length, boolean useCache)
>     throws IOException {
>   maybeRegisterBlockRefresh();
>   synchronized (infoLock) {
>     // Here locatedBlocks contains only one locatedBlock; at this time the
>     // offset is 1024 and fileLength is 0, so the targetBlockIdx is -2
>     int targetBlockIdx = locatedBlocks.findBlock(offset);
>     if (targetBlockIdx < 0) { // block is not cached
>       targetBlockIdx = LocatedBlocks.getInsertIndex(targetBlockIdx);
>       // Here the targetBlockIdx is 1
>       useCache = false;
>     }
>     if (!useCache) { // fetch blocks
>       final LocatedBlocks newBlocks = (length == 0)
>           ? dfsClient.getLocatedBlocks(src, offset)
>           : dfsClient.getLocatedBlocks(src, offset, length);
>       if (newBlocks == null || newBlocks.locatedBlockCount() == 0) {
>         throw new EOFException("Could not find target position " + offset);
>       }
>       // Update the LastLocatedBlock, if offset is for last block.
>       if (offset >= locatedBlocks.getFileLength()) {
>         setLocatedBlocksFields(newBlocks, getLastBlockLength(newBlocks));
>       } else {
>         locatedBlocks.insertRange(targetBlockIdx,
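The index arithmetic in the quoted code follows the JDK binary-search convention that `LocatedBlocks.findBlock` and `getInsertIndex` build on: a missed key is reported as `-(insertionPoint) - 1`. A self-contained sketch, with hypothetical offsets rather than the `DFSInputStream` code, reproduces how an offset past the only cached block yields -2, and why the recovered insertion point 1 is out of bounds for a one-element list:

```java
import java.util.Collections;
import java.util.List;

public class InsertIndexDemo {
    // Same convention as LocatedBlocks.getInsertIndex: recover the insertion
    // point from Collections.binarySearch's encoded miss value.
    static int getInsertIndex(int binSearchResult) {
        return binSearchResult >= 0 ? binSearchResult : -(binSearchResult + 1);
    }

    public static void main(String[] args) {
        // One cached block starting at offset 0 (the single RBW block).
        List<Long> blockStartOffsets = Collections.singletonList(0L);
        // Searching for offset 1024, which is absent, returns -(1) - 1 = -2.
        int result = Collections.binarySearch(blockStartOffsets, 1024L);
        System.out.println("binarySearch result: " + result);
        int insertIdx = getInsertIndex(result);
        System.out.println("insertion point: " + insertIdx);
        // blockStartOffsets.get(insertIdx) would now throw
        // IndexOutOfBoundsException: Index 1 out of bounds for length 1.
    }
}
```

This is why the insertion point must be re-validated against the list length before it is used to index into `locatedBlocks`.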
[jira] [Commented] (HDFS-17424) [FGL] DelegationTokenSecretManager supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835402#comment-17835402 ] ASF GitHub Bot commented on HDFS-17424: --- hadoop-yetus commented on PR #6696: URL: https://github.com/apache/hadoop/pull/6696#issuecomment-2045093108 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 19s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ HDFS-17384 Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 49s | | HDFS-17384 passed | | +1 :green_heart: | compile | 0m 43s | | HDFS-17384 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 0m 42s | | HDFS-17384 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 0m 38s | | HDFS-17384 passed | | +1 :green_heart: | mvnsite | 0m 44s | | HDFS-17384 passed | | +1 :green_heart: | javadoc | 0m 46s | | HDFS-17384 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 10s | | HDFS-17384 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 1m 45s | | HDFS-17384 passed | | +1 :green_heart: | shadedclient | 21m 23s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 37s | | the patch passed | | +1 :green_heart: | compile | 0m 40s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 0m 40s | | the patch passed | | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 0m 36s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 31s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 41s | | the patch passed | | +1 :green_heart: | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 1s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 1m 45s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 9s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 207m 17s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6696/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. 
| | | | 300m 27s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6696/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6696 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c8f24b173cce 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | HDFS-17384 / 50f97bf7cf488948649f5b41e934abec74f14928 | | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6696/3/testReport/ | | Max.
[jira] [Commented] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835355#comment-17835355 ] ASF GitHub Bot commented on HDFS-17455: --- Hexiaoqiao commented on PR #6710: URL: https://github.com/apache/hadoop/pull/6710#issuecomment-2044900199 Got it. My bad, the first feeling is out of scope will return -1 for binary search, but actually not. > Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt > - > > Key: HDFS-17455 > URL: https://issues.apache.org/jira/browse/HDFS-17455 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When the client read data, connect to the datanode, because at this time the > datanode access token is invalid will throw InvalidBlockTokenException. At > this time, when call fetchBlockAt method will throw > java.lang.IndexOutOfBoundsException causing read data failed. > *Root case:* > * The HDFS file contains only one RBW block, with a block data size of 2048KB. > * The client open this file and seeks to the offset of 1024KB to read data. > * Call DFSInputStream#getBlockReader method connect to the datanode, because > at this time the datanode access token is invalid will throw > InvalidBlockTokenException., and call DFSInputStream#fetchBlockAt will throw > java.lang.IndexOutOfBoundsException. > {code:java} > private synchronized DatanodeInfo blockSeekTo(long target) > throws IOException { >if (target >= getFileLength()) { >// the target size is smaller than fileLength (completeBlockSize + > lastBlockBeingWrittenLength), >// here at this time target is 1024 and getFileLength is 2048 > throw new IOException("Attempted to read past end of file"); >} >... >while (true) { > ... 
> try { >blockReader = getBlockReader(targetBlock, offsetIntoBlock, >targetBlock.getBlockSize() - offsetIntoBlock, targetAddr, >storageType, chosenNode); >if(connectFailedOnce) { > DFSClient.LOG.info("Successfully connected to " + targetAddr + > " for " + targetBlock.getBlock()); >} >return chosenNode; > } catch (IOException ex) { >... >} else if (refetchToken > 0 && tokenRefetchNeeded(ex, targetAddr)) { > refetchToken--; > // Here will catch InvalidBlockTokenException. > fetchBlockAt(target); >} else { > ... >} > } >} > } > private LocatedBlock fetchBlockAt(long offset, long length, boolean useCache) > throws IOException { > maybeRegisterBlockRefresh(); > synchronized(infoLock) { > // Here the locatedBlocks only contains one locatedBlock, at this time > the offset is 1024 and fileLength is 0, > // so the targetBlockIdx is -2 > int targetBlockIdx = locatedBlocks.findBlock(offset); > if (targetBlockIdx < 0) { // block is not cached > targetBlockIdx = LocatedBlocks.getInsertIndex(targetBlockIdx); > // Here the targetBlockIdx is 1; > useCache = false; > } > if (!useCache) { // fetch blocks > final LocatedBlocks newBlocks = (length == 0) > ? dfsClient.getLocatedBlocks(src, offset) > : dfsClient.getLocatedBlocks(src, offset, length); > if (newBlocks == null || newBlocks.locatedBlockCount() == 0) { > throw new EOFException("Could not find target position " + offset); > } > // Update the LastLocatedBlock, if offset is for last block. 
> if (offset >= locatedBlocks.getFileLength()) { > setLocatedBlocksFields(newBlocks, getLastBlockLength(newBlocks)); > } else { > locatedBlocks.insertRange(targetBlockIdx, > newBlocks.getLocatedBlocks()); > } > } > // Here the locatedBlocks only contains one locatedBlock, so will throw > java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1 > return locatedBlocks.get(targetBlockIdx); > } > } > {code} > The client exception: > {code:java} > java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1 > at > java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) > at > java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) > at > java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266) > at java.base/java.util.Objects.checkIndex(Objects.java:359) > at java.base/java.util.ArrayList.get(ArrayList.java:427) > at > org.apache.hadoop.hdfs.protocol.LocatedBlocks.get(LocatedBlocks.java:87) > at >
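The failure above boils down to how a negative binary-search result is decoded into an insert index. A minimal standalone sketch with plain `java.util.Arrays` (not the actual Hadoop code; `getInsertIndex` here mirrors the decoding that `LocatedBlocks.getInsertIndex` is described to perform):

```java
import java.util.Arrays;

public class InsertIndexDemo {
    // Decode a negative binarySearch result (-(insertionPoint) - 1)
    // back into the insertion point, as LocatedBlocks.getInsertIndex does.
    static int getInsertIndex(int binSearchResult) {
        return -(binSearchResult + 1);
    }

    public static void main(String[] args) {
        // One cached block starting at offset 0, analogous to the single
        // cached LocatedBlock in this issue.
        long[] blockStartOffsets = {0L};

        // Offset 1024 is greater than every cached start offset, so
        // binarySearch returns -(1) - 1 = -2, which decodes to insert
        // index 1 -- one past the end of a list of length 1, hence the
        // IndexOutOfBoundsException in locatedBlocks.get(targetBlockIdx).
        int res = Arrays.binarySearch(blockStartOffsets, 1024L);
        System.out.println(res);                  // -2
        System.out.println(getInsertIndex(res));  // 1
    }
}
```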
[jira] [Commented] (HDFS-17424) [FGL] DelegationTokenSecretManager supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835340#comment-17835340 ] ASF GitHub Bot commented on HDFS-17424: --- ZanderXu commented on PR #6696: URL: https://github.com/apache/hadoop/pull/6696#issuecomment-2044641581 > For sync edit logging, it may cause corruption by interspersing edits with the end/start segment edits. As HDFS-13112 noted, rollEdits is not thread-safe, so it added hasReadLock(). But if we can make rollEdits thread-safe, DelegationTokenSecretManager would not need to use the FSLock. We can mark this as an improvement and complete it in milestone II. > [FGL] DelegationTokenSecretManager supports fine-grained lock > - > > Key: HDFS-17424 > URL: https://issues.apache.org/jira/browse/HDFS-17424 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: Yuanbo Liu >Priority: Major > Labels: pull-request-available > > DelegationTokenSecretManager supports fine-grained lock -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17424) [FGL] DelegationTokenSecretManager supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835334#comment-17835334 ] ASF GitHub Bot commented on HDFS-17424: --- ZanderXu commented on code in PR #6696: URL: https://github.com/apache/hadoop/pull/6696#discussion_r1557344652 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java: ## @@ -401,7 +402,10 @@ protected void logExpireToken(final DelegationTokenIdentifier dtId) // closes the edit log files. Doing this inside the // fsn lock will prevent being interrupted when stopping // the secret manager. - namesystem.readLockInterruptibly(); + // TODO: delegation token is a very independent system, so + // it's proper to use a separate r/w lock instead of the fs lock + // for getting/renewing/expiring/canceling tokens or updating the master key. Review Comment: `logUpdateMasterKey` and `logExpireDelegationToken` need to write the edit log, so HDFS-13112 added `hasReadLock` for these methods. So I think this FSLock is needed until we understand why `rollEdits` is not thread-safe, as HDFS-13112 said. > [FGL] DelegationTokenSecretManager supports fine-grained lock > - > > Key: HDFS-17424 > URL: https://issues.apache.org/jira/browse/HDFS-17424 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: Yuanbo Liu >Priority: Major > Labels: pull-request-available > > DelegationTokenSecretManager supports fine-grained lock
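The TODO in the patch above suggests guarding token operations with their own r/w lock rather than the namesystem lock. A minimal sketch of that idea, using a hypothetical simplified token store (names and fields invented for illustration, not the actual DelegationTokenSecretManager API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: token state guarded by its own read/write lock,
// so token get/renew/expire/cancel traffic does not contend with FS
// operations behind the namesystem lock.
public class TokenStoreSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Long> tokenExpiry = new HashMap<>();

    // Mutations take the write lock.
    public void renew(String tokenId, long newExpiry) {
        lock.writeLock().lock();
        try {
            tokenExpiry.put(tokenId, newExpiry);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Lookups take only the read lock, so they can proceed concurrently.
    public Long expiryOf(String tokenId) {
        lock.readLock().lock();
        try {
            return tokenExpiry.get(tokenId);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

As the review comment notes, this only works once the edit-log writes done by `logUpdateMasterKey` and `logExpireDelegationToken` no longer require holding the FS read lock.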
[jira] [Commented] (HDFS-17445) [FGL] All remaining operations support fine-grained locking
[ https://issues.apache.org/jira/browse/HDFS-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835333#comment-17835333 ] ASF GitHub Bot commented on HDFS-17445: --- ZanderXu commented on PR #6715: URL: https://github.com/apache/hadoop/pull/6715#issuecomment-2044575542 The failed UT `hadoop.hdfs.tools.TestDFSAdmin` passes locally. > [FGL] All remaining operations support fine-grained locking > --- > > Key: HDFS-17445 > URL: https://issues.apache.org/jira/browse/HDFS-17445 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDFS-17445) [FGL] All remaining operations support fine-grained locking
[ https://issues.apache.org/jira/browse/HDFS-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835326#comment-17835326 ] ASF GitHub Bot commented on HDFS-17445: --- hadoop-yetus commented on PR #6715: URL: https://github.com/apache/hadoop/pull/6715#issuecomment-2044556675 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 46s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ HDFS-17384 Compile Tests _ | | +1 :green_heart: | mvninstall | 52m 22s | | HDFS-17384 passed | | +1 :green_heart: | compile | 1m 22s | | HDFS-17384 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 1m 12s | | HDFS-17384 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 1m 14s | | HDFS-17384 passed | | +1 :green_heart: | mvnsite | 1m 24s | | HDFS-17384 passed | | +1 :green_heart: | javadoc | 1m 9s | | HDFS-17384 passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 42s | | HDFS-17384 passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 3m 20s | | HDFS-17384 passed | | +1 :green_heart: | shadedclient | 40m 27s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 12s | | the patch passed | | +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 1m 17s | | the patch passed | | +1 :green_heart: | compile | 1m 7s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 1m 7s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 4s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 414 unchanged - 5 fixed = 414 total (was 419) | | +1 :green_heart: | mvnsite | 1m 12s | | the patch passed | | +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 3m 18s | | the patch passed | | +1 :green_heart: | shadedclient | 40m 18s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 266m 17s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6715/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. 
| | | | 425m 19s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6715/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6715 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 65d8160b42e7 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | HDFS-17384 / 3f65a1e1117d75ba75394e6591960548a52db170 | | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6715/3/testReport/ | | Max. process+thread count | 2790 (vs. ulimit of
[jira] [Commented] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835310#comment-17835310 ] ASF GitHub Bot commented on HDFS-17455: --- ZanderXu commented on PR #6710: URL: https://github.com/apache/hadoop/pull/6710#issuecomment-2044493479 > I am a little confused with it why here return -2 which is binary search for collections contains only one element, IIUC, it will return -1 for this case. Please correct me if i missed something. The binary search returns -1 if the offset is less than the smallest offset in the list, and -(size + 1) if the offset is greater than the maximum offset in the list; with a single cached block, that is -2. We can reproduce it with the following steps: 1. Assume one file contains 20 completed blocks and 1 UC block. 2. The current `locatedBlocks` only caches blocks 10 ~ 20. 3. The binary search returns -1 if the offset is less than the offset of the first cached block, such as an offset within the first ten blocks. 4. The binary search returns -(size + 1) if the offset is greater than the offset of the last cached block, such as an offset within the last UC block, which decodes to an insert index past the end of the cached list. And another bug is that the `currentBlock` should be relocated after `locatedBlocks` is refreshed. > Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt > - > > Key: HDFS-17455 > URL: https://issues.apache.org/jira/browse/HDFS-17455 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available
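The reproduction steps above can be sketched with simplified offsets (a standalone illustration, not the actual `LocatedBlocks.findBlock` code): a file of 21 blocks of 128 bytes each, with only the start offsets of blocks 10..20 cached.

```java
import java.util.Arrays;

public class CachedRangeDemo {
    // Start offsets of the cached blocks 10..20 (hypothetical 128-byte blocks).
    static long[] cachedStarts() {
        long[] starts = new long[11];
        for (int i = 0; i < 11; i++) {
            starts[i] = (10 + i) * 128L;  // 1280, 1408, ..., 2560
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] cached = cachedStarts();

        // Offset inside the first (uncached) ten blocks: below the smallest
        // cached offset, so binarySearch returns -(0) - 1 = -1.
        System.out.println(Arrays.binarySearch(cached, 100L));    // -1

        // Offset inside the last UC block: above the largest cached offset,
        // so binarySearch returns -(11) - 1 = -12, which decodes to insert
        // index 11 -- one past the end of the 11-element cached list.
        System.out.println(Arrays.binarySearch(cached, 99999L));  // -12
    }
}
```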
[jira] [Commented] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835294#comment-17835294 ] ASF GitHub Bot commented on HDFS-17455: --- Hexiaoqiao commented on PR #6710: URL: https://github.com/apache/hadoop/pull/6710#issuecomment-2044371352 @haiyang1987 Thanks for your work. In the description I see that you mentioned, ``` // Here the locatedBlocks only contains one locatedBlock, at this time the offset is 1024 and fileLength is 0, // so the targetBlockIdx is -2 int targetBlockIdx = locatedBlocks.findBlock(offset); ``` This is the root cause here, right? I am a little confused why it returns -2; IIUC, a binary search over a collection containing only one element would return -1 in this case. Please correct me if I missed something. > Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt > - > > Key: HDFS-17455 > URL: https://issues.apache.org/jira/browse/HDFS-17455 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available
[jira] [Updated] (HDFS-17458) Remove unnecessary BP lock in ReplicaMap
[ https://issues.apache.org/jira/browse/HDFS-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] farmmamba updated HDFS-17458: - Description: In HDFS-16429 we made LightWeightResizableGSet thread-safe, and in HDFS-16511 we changed some methods in ReplicaMap to acquire the read lock instead of the write lock. This PR tries to further remove the unnecessary block-pool read lock. Recently, I performed stress tests on datanodes to measure their read/write operations per second. Before removing these locks, a datanode could only achieve ~2K write ops; after optimizing, it achieves more than 5K write ops. > Remove unnecessary BP lock in ReplicaMap > > > Key: HDFS-17458 > URL: https://issues.apache.org/jira/browse/HDFS-17458 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > > In HDFS-16429 we made LightWeightResizableGSet thread-safe, and in HDFS-16511 we changed some methods in ReplicaMap to acquire the read lock instead of the write lock. > This PR tries to further remove the unnecessary block-pool read lock. > Recently, I performed stress tests on datanodes to measure their read/write operations per second. > Before removing these locks, a datanode could only achieve ~2K write ops; after optimizing, it achieves more than 5K write ops.
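The optimization can be illustrated with a simplified stand-in (hypothetical names; the real ReplicaMap is keyed by block pool and backed by LightWeightResizableGSet, not ConcurrentHashMap): once the underlying per-pool map is itself thread-safe, point lookups no longer need to take the block-pool read lock at all.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a replica map whose reads need no explicit lock:
// the thread-safe map provides the visibility guarantees the block-pool
// read lock used to provide for simple lookups.
public class ReplicaMapSketch {
    // blockPoolId -> (blockId -> replica description)
    private final Map<String, Map<Long, String>> map = new ConcurrentHashMap<>();

    public void add(String bpid, long blockId, String replica) {
        map.computeIfAbsent(bpid, k -> new ConcurrentHashMap<>())
           .put(blockId, replica);
    }

    // No lock acquired here: the concurrent map makes the read safe,
    // which is what allows the higher write-op throughput under stress.
    public String get(String bpid, long blockId) {
        Map<Long, String> perPool = map.get(bpid);
        return (perPool == null) ? null : perPool.get(blockId);
    }
}
```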
[jira] [Created] (HDFS-17458) Remove unnecessary BP lock in ReplicaMap
farmmamba created HDFS-17458: Summary: Remove unnecessary BP lock in ReplicaMap Key: HDFS-17458 URL: https://issues.apache.org/jira/browse/HDFS-17458 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.4.0 Reporter: farmmamba Assignee: farmmamba