[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506345 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 18:57
Start Date: 29/Oct/20 18:57
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #2421:
URL: https://github.com/apache/hadoop/pull/2421#issuecomment-718955616

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 37m 33s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 43s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 23m 2s | | trunk passed |
| +1 :green_heart: | compile | 21m 21s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | compile | 18m 9s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | checkstyle | 2m 54s | | trunk passed |
| +1 :green_heart: | mvnsite | 26m 3s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 44s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 7m 17s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javadoc | 7m 30s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +0 :ok: | spotbugs | 34m 27s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 37m 33s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 24m 10s | | the patch passed |
| +1 :green_heart: | compile | 21m 2s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javac | 21m 2s | | the patch passed |
| +1 :green_heart: | compile | 18m 22s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | javac | 18m 22s | | the patch passed |
| +1 :green_heart: | checkstyle | 2m 49s | | the patch passed |
| +1 :green_heart: | mvnsite | 20m 47s | | the patch passed |
| +1 :green_heart: | shellcheck | 0m 0s | | There were no new shellcheck issues. |
| +1 :green_heart: | shelldocs | 0m 14s | | There were no new shelldocs issues. |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 16m 44s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 7m 19s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javadoc | 7m 33s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | findbugs | 37m 52s | | the patch passed |
|||| _ Other Tests _ |
| -1 :x: | unit | 615m 2s | [/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2421/2/artifact/out/patch-unit-root.txt) | root in the patch passed. |
| +1 :green_heart: | asflicense | 1m 30s | | The patch does not generate ASF License warnings. |
| | | 991m 20s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell |
| | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
| | hadoop.yarn.server.nodemanager.TestNodeManagerReboot |
| | hadoop.yarn.server.nodemanager.TestNodeManagerResync |
| | hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch |
| | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown |
| | hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor |
| | hadoop.security.TestLdapGroupsMapping |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.TestReadStripedFileWithDNFailure |
| | hadoop.hdfs.TestDFSUpgradeFromImage |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.TestReconstructStripedFile |
| |
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506058=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506058 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 04:49
Start Date: 29/Oct/20 04:49
Worklog Time Spent: 10m
Work Description: aajisaka commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513966454

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java ##

@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
I could reproduce even without `-Pparallel-tests`
```
$ pwd
/home/aajisaka/hadoop/hadoop-hdfs-project/hadoop-hdfs
$ mvn test -Dtest=TestFileChecksum -Pnative
```

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 506058)
Time Spent: 3h 40m (was: 3.5h)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Critical
> Labels: pull-request-available
> Attachments: TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, org.apache.hadoop.hdfs.TestFileChecksum-output.txt, org.apache.hadoop.hdfs.TestFileChecksum.txt
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The
> following is a sample of the stack trace in two of them Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at
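A note on reading the exception quoted above: the `LocatedStripedBlock` reports eight locations with `indices=[1, 2, 3, 4, 5, 6, 7, 8]`. The sketch below shows one way to work out which internal block indices are absent from such a list. The group size of 9 (6 data + 3 parity blocks) is an assumption and is not stated in the trace; the class and method names are hypothetical and not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingStripedIndices {

    /**
     * Returns every index in [0, totalBlocks) that does not appear in
     * the reported array. Out-of-range entries are ignored.
     */
    public static List<Integer> missing(int[] reported, int totalBlocks) {
        boolean[] seen = new boolean[totalBlocks];
        for (int idx : reported) {
            if (idx >= 0 && idx < totalBlocks) {
                seen[idx] = true;
            }
        }
        List<Integer> absent = new ArrayList<>();
        for (int i = 0; i < totalBlocks; i++) {
            if (!seen[i]) {
                absent.add(i);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Indices copied from the exception message in the issue description.
        int[] reported = {1, 2, 3, 4, 5, 6, 7, 8};
        // ASSUMPTION: 9 internal blocks per group (6 data + 3 parity).
        int totalBlocks = 9;
        System.out.println(missing(reported, totalBlocks)); // prints [0]
    }
}
```

Under that assumption only index 0 is missing, which would be a data block and would fit the `WithMissedDataBlocks` test names; the trace alone does not confirm the layout.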
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506054=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506054 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 04:40
Start Date: 29/Oct/20 04:40
Worklog Time Spent: 10m
Work Description: aajisaka commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513963136

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java ##

@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
I could reproduce the failure locally:
```
$ ./start-build-env.sh
$ mvn clean install -DskipTests -Pnative
$ cd hadoop-hdfs-project/hadoop-hdfs
$ mvn test -Pnative -Pparallel-tests
```
Attached the stdout in the JIRA: https://issues.apache.org/jira/secure/attachment/13014321/org.apache.hadoop.hdfs.TestFileChecksum-output.txt

Issue Time Tracking
-------------------

Worklog Id: (was: 506054)
Time Spent: 3.5h (was: 3h 20m)
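The hunk under review adds a fixed `Thread.sleep(STALE_INTERVAL)` after the DataNode is shut down. A common alternative for this kind of timing-sensitive test is to poll a condition until it holds or a timeout expires. The sketch below is a dependency-free illustration of that pattern, not the change made in the pull request; `PollingWait`, `waitFor`, and the stand-in condition are hypothetical names. Hadoop's own test utilities include a comparable helper (`GenericTestUtils.waitFor`), which is left out here so the example compiles on its own.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class PollingWait {

    /**
     * Re-evaluates check every intervalMs milliseconds until it returns
     * true. Throws TimeoutException if timeoutMs elapses first.
     */
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            // Subtraction keeps the comparison correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after about 200 ms.
        // HYPOTHETICAL: in the test this would instead ask the cluster
        // whether the stopped DataNode has left the block locations.
        waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 5_000);
        System.out.println("condition met");
    }
}
```

Compared with a fixed sleep, polling returns as soon as the condition holds and fails with an explicit timeout when it never does, though it still depends on choosing a condition that actually reflects the updated block locations.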
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506034 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 03:38
Start Date: 29/Oct/20 03:38
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #2421:
URL: https://github.com/apache/hadoop/pull/2421#issuecomment-718338898

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 31s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 30m 14s | | trunk passed |
| +1 :green_heart: | compile | 1m 18s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | compile | 1m 13s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | checkstyle | 0m 49s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s | | trunk passed |
| +1 :green_heart: | shadedclient | 17m 11s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 55s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javadoc | 1m 28s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 :ok: | spotbugs | 3m 3s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 0s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 12s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javac | 1m 12s | | the patch passed |
| +1 :green_heart: | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | javac | 1m 5s | | the patch passed |
| +1 :green_heart: | checkstyle | 0m 39s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 11s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 14m 44s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 47s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | findbugs | 3m 7s | | the patch passed |
|||| _ Other Tests _ |
| -1 :x: | unit | 59m 22s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2421/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| -1 :x: | asflicense | 0m 37s | [/patch-asflicense-problems.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2421/1/artifact/out/patch-asflicense-problems.txt) | The patch generated 4 ASF License warnings. |
| | | 145m 17s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
| | hadoop.hdfs.TestDFSStorageStateRecovery |
| | hadoop.hdfs.TestSafeModeWithStripedFile |
| | hadoop.hdfs.TestFileCreationClient |
| | hadoop.hdfs.tools.TestDFSZKFailoverController |
| | hadoop.hdfs.TestErasureCodingPolicies |
| | hadoop.hdfs.TestDecommissionWithStriped |
| | hadoop.hdfs.TestMiniDFSCluster |
| | hadoop.hdfs.TestMultipleNNPortQOP |
| | hadoop.hdfs.TestDFSStripedInputStream |
| | hadoop.hdfs.TestFileAppend2 |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.TestDistributedFileSystem |
| | hadoop.hdfs.TestDatanodeDeath |
| | hadoop.hdfs.TestErasureCodingMultipleRacks |
| | hadoop.hdfs.TestSnapshotCommands |
| | hadoop.hdfs.TestReadStripedFileWithDecoding |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.TestDFSClientSocketSize |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506020 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 02:49
Start Date: 29/Oct/20 02:49
Worklog Time Spent: 10m
Work Description: amahussein commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718325314

> I think the `keepLongStdio` option can be used https://www.jenkins.io/doc/pipeline/steps/junit/
> The option can be enabled by updating the `./Jenkinsfile` as follows:
>
> ```diff
> -junit "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
> +junit keepLongStdio: true, testResults: "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
> ```

Thanks @aajisaka! I am going to try it out.

Issue Time Tracking
-------------------

Worklog Id: (was: 506020)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506006=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506006 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 02:11
Start Date: 29/Oct/20 02:11
Worklog Time Spent: 10m
Work Description: aajisaka commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718313974

> do you guys know if it is possible to see the full logs of the unit test?

I think the `keepLongStdio` option can be used https://www.jenkins.io/doc/pipeline/steps/junit/
The option can be enabled by updating the `./Jenkinsfile` as follows:

```diff
-junit "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
+junit keepLongStdio: true, testResults: "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
```

Issue Time Tracking
-------------------

Worklog Id: (was: 506006)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505986 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 01:11
Start Date: 29/Oct/20 01:11
Worklog Time Spent: 10m
Work Description: amahussein opened a new pull request #2421:
URL: https://github.com/apache/hadoop/pull/2421

## NOTICE

Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

Issue Time Tracking
-------------------

Worklog Id: (was: 505986)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505969 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:37
Start Date: 28/Oct/20 23:37
Worklog Time Spent: 10m
Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513824033

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java ##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
Thanks, @goiri. Those are the logs I was looking at.
All the logs in `TestFileChecksum` and `TestFileChecksumCompositeCrc` are truncated and miss the last 9 seconds prior to the failure.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 505969)
Time Spent: 2h 40m (was: 2.5h)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Critical
> Labels: pull-request-available
> Attachments: TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
> getBlockSize()=37748736; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
> indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505966 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:31
Start Date: 28/Oct/20 23:31
Worklog Time Spent: 10m
Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513822007

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java ##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
I see there is truncation there too though.
We may want to make the test a little less verbose.

Issue Time Tracking
-------------------

Worklog Id: (was: 505966)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505965=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505965 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:29
Start Date: 28/Oct/20 23:29
Worklog Time Spent: 10m
Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513821613

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java ##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
I am not sure exactly which test you are interested in, but here you can see the full logs of one of the failed tests:
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/testReport/org.apache.hadoop.hdfs/TestFileChecksum/testStripedFileChecksumWithMissedDataBlocksRangeQuery11/
Here are all the tests:
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/testReport/org.apache.hadoop.hdfs/

Issue Time Tracking
-------------------

Worklog Id: (was: 505965)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505962 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:13
Start Date: 28/Oct/20 23:13
Worklog Time Spent: 10m
Work Description: amahussein commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718261099

@goiri and @aajisaka, do you guys know if it is possible to see the full logs of the unit test?
The Yetus console and test reports show truncated logs, so I cannot see the sequence of events that leads to the Exception and the stack trace.
I cannot reproduce the failure locally either :/

```
...[truncated 896674 chars]...
sed]. Total timeout mills is 48, 479813 millis timeout left.
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:351)
```

Issue Time Tracking
-------------------

Worklog Id: (was: 505962)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505794 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 16:25
Start Date: 28/Oct/20 16:25
Worklog Time Spent: 10m
Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513584215

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java ##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
I see. The problem is that I cannot reproduce it on my local machine. However, it seems to fail consistently on Yetus.
If it is not a real bug, I wonder if volumeScanner could be a factor in randomly slowing down the DNs. I see many log messages from the volume scanner when I run locally.

Issue Time Tracking
-------------------

Worklog Id: (was: 505794)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505776 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:44
Start Date: 28/Oct/20 15:44
Worklog Time Spent: 10m
Work Description: amahussein commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718021914

That's trickier than what I thought. Need another iteration.

Issue Time Tracking
-------------------

Worklog Id: (was: 505776)
Time Spent: 1h 50m (was: 1h 40m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Critical
> Labels: pull-request-available
> Attachments: TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The
> following is a sample of the stack trace in two of them Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
> getBlockSize()=37748736; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
> indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505763 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:32
Start Date: 28/Oct/20 15:32
Worklog Time Spent: 10m
Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513541376

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java ##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
I am not very close to this part of the code, but there must be ways to force the statistics to update.
Not sure who can help with this part of the code.

Issue Time Tracking
-------------------

Worklog Id: (was: 505763)
Time Spent: 1h 40m (was: 1.5h)
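The review comments on `Thread.sleep(STALE_INTERVAL)` ask what the test could wait for instead of a fixed delay. A minimal sketch of the polling pattern follows, in plain Java so that it stands alone. The class name `WaitForCondition`, the helper `waitFor`, and the stand-in condition in `main` are all hypothetical; the thread does not settle which cluster condition is the right one to poll, so none is assumed here. Hadoop's own test utilities offer a comparable helper (`GenericTestUtils.waitFor`), which a real patch would more likely use.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a condition instead of sleeping for a fixed time. */
public class WaitForCondition {

  /**
   * Polls {@code check} every {@code intervalMillis} until it returns true.
   * Throws TimeoutException if it is still false after {@code timeoutMillis}.
   */
  public static void waitFor(BooleanSupplier check, long intervalMillis,
      long timeoutMillis) throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      // Compare via subtraction so the check stays correct if nanoTime wraps.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(intervalMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // Stand-in condition (an assumption for illustration only): becomes true
    // after about 200 ms. In the test, this would be whatever observable state
    // shows that the block locations have been updated.
    waitFor(() -> System.currentTimeMillis() - start >= 200, 20, 5_000);
    System.out.println("condition met");
  }
}
```

Compared with a fixed sleep, the test returns as soon as the condition holds on a fast machine and fails with a clear timeout on a slow one, which may help separate timing problems on the CI hosts from a real bug.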
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505538 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 04:03
Start Date: 28/Oct/20 04:03
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-717681661

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 2s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 47s | | trunk passed |
| +1 :green_heart: | compile | 1m 19s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | compile | 1m 11s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | checkstyle | 0m 46s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 21s | | trunk passed |
| +1 :green_heart: | shadedclient | 18m 54s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 52s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javadoc | 1m 22s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 :ok: | spotbugs | 3m 10s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 8s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 11s | | the patch passed |
| +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javac | 1m 16s | | the patch passed |
| +1 :green_heart: | compile | 1m 6s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | javac | 1m 6s | | the patch passed |
| +1 :green_heart: | checkstyle | 0m 40s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 13s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 16m 14s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | findbugs | 3m 18s | | the patch passed |
|||| _ Other Tests _ |
| -1 :x: | unit | 114m 44s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
| | | 206m 56s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.TestFileChecksumCompositeCrc |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetCache |
| | hadoop.hdfs.TestMultipleNNPortQOP |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2408 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1055ba63ccdd 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ae74407ac43 |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Multi-JDK versions |
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505495 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 02:30
Start Date: 28/Oct/20 02:30
Worklog Time Spent: 10m
Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513142714

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
## @@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
Yes, I agree with you @goiri. I experimented with waiting for the number of live replicas, but this did not work: the count stayed at 8 and did not go back to 9. Do you have a suggestion for which condition we should wait for?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 505495)
Time Spent: 1h 20m (was: 1h 10m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Critical
> Labels: pull-request-available
> Attachments: TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The following is a sample of the stack trace in two of them Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at
[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505491 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 01:58
Start Date: 28/Oct/20 01:58
Worklog Time Spent: 10m
Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513134020

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
## @@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
       dnIdxToDie = getDataNodeToKill(filePath);
       DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
       shutdownDataNode(dnToDie);
+      // wait enough time for the locations to be updated.
+      Thread.sleep(STALE_INTERVAL);

Review comment:
Should we waitFor instead?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 505491)
Time Spent: 1h 10m (was: 1h)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Critical
> Labels: pull-request-available
> Attachments: TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The following is a sample of the stack trace in two of them Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail to get block checksum for LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; getBlockSize()=37748736; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at
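
Editorial sketch of the review suggestion in this message ("Should we waitFor instead?"): the idea is to poll for a condition with a timeout rather than sleep for a fixed `STALE_INTERVAL`. The reviewer presumably means Hadoop's `GenericTestUtils.waitFor` test helper; to stay self-contained, the sketch below defines its own stand-in with a similar shape. The class name `WaitForSketch`, the `reportedLocations` counter, and the 9-to-8 transition are hypothetical illustrations, not code from the pull request, and the thread leaves open which cluster condition is the right one to wait for.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class WaitForSketch {

  /**
   * Stand-in for a waitFor-style test helper (an assumption, not Hadoop code):
   * polls the check every checkEveryMillis and throws TimeoutException if the
   * check is still false once waitForMillis has elapsed.
   */
  public static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.get()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical stand-in for cluster state: the number of block locations
    // reported for the file, which drops some time after a DataNode stops.
    AtomicInteger reportedLocations = new AtomicInteger(9);

    Thread updater = new Thread(() -> {
      try {
        Thread.sleep(50); // simulated delay before the state is refreshed
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      reportedLocations.set(8);
    });
    updater.start();

    // Returns as soon as the condition holds instead of always sleeping
    // for the full interval; fails loudly if it never holds.
    waitFor(() -> reportedLocations.get() == 8, 10, 5000);
    updater.join();
    System.out.println("locations=" + reportedLocations.get());
  }
}
```

Compared with a fixed sleep, a polling wait finishes early when the state changes quickly and reports a timeout instead of silently proceeding when it never changes. It only helps if the polled condition actually reflects the state the test depends on, which is the open question raised in the reply to this comment.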