[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506345
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 18:57
Start Date: 29/Oct/20 18:57
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2421:
URL: https://github.com/apache/hadoop/pull/2421#issuecomment-718955616


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:-------|:--------:|:-------:|
   | +0 :ok: |  reexec  |  37m 33s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 43s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 54s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  26m  3s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 44s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   7m 17s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   7m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |  34m 27s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |  37m 33s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  24m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m  2s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  21m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  18m 22s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 49s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |  20m 47s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m  0s |  |  There were no new 
shellcheck issues.  |
   | +1 :green_heart: |  shelldocs  |   0m 14s |  |  There were no new 
shelldocs issues.  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  shadedclient  |  16m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   7m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   7m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |  37m 52s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 615m  2s | 
[/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2421/2/artifact/out/patch-unit-root.txt)
 |  root in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 991m 20s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShell |
   |   | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
   |   | hadoop.yarn.server.nodemanager.TestNodeManagerReboot |
   |   | hadoop.yarn.server.nodemanager.TestNodeManagerResync |
   |   | 
hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch |
   |   | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown |
   |   | 
hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor |
   |   | hadoop.security.TestLdapGroupsMapping |
   |   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
   |   | hadoop.hdfs.TestReadStripedFileWithDNFailure |
   |   | hadoop.hdfs.TestDFSUpgradeFromImage |
   |   | hadoop.hdfs.TestFileChecksum |
   |   | hadoop.hdfs.TestReconstructStripedFile |
   |   | 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506058
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 04:49
Start Date: 29/Oct/20 04:49
Worklog Time Spent: 10m 
  Work Description: aajisaka commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513966454



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   I could reproduce it even without `-Pparallel-tests`:
   ```
   $ pwd
   /home/aajisaka/hadoop/hadoop-hdfs-project/hadoop-hdfs
   $ mvn test -Dtest=TestFileChecksum -Pnative
   ```
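   
   The change under review bridges the race with a fixed
   `Thread.sleep(STALE_INTERVAL)` after `shutdownDataNode(dnToDie)`. A
   condition-based wait is a common alternative; the following is a minimal
   sketch only, assuming the test's `MiniDFSCluster` field and a hypothetical
   `expectedLiveNodes` count, and polling with Hadoop's
   `GenericTestUtils.waitFor` until the NameNode has registered the DataNode's
   death:
   ```java
   import java.util.concurrent.TimeoutException;

   import org.apache.hadoop.hdfs.MiniDFSCluster;
   import org.apache.hadoop.test.GenericTestUtils;

   class DeadNodeWait {
     // Poll until the NameNode's live-node count drops to the expected value,
     // instead of always sleeping the full stale interval.
     static void waitForLiveDataNodes(MiniDFSCluster cluster,
         int expectedLiveNodes, long staleIntervalMs)
         throws TimeoutException, InterruptedException {
       GenericTestUtils.waitFor(
           () -> cluster.getNameNode().getNamesystem().getNumLiveDataNodes()
               == expectedLiveNodes,
           500,                           // check every 500 ms
           (int) (staleIntervalMs * 3));  // bounded, so the test cannot hang
     }
   }
   ```
   A poll like this returns as soon as the condition holds and fails fast with
   a `TimeoutException` when it never does, instead of paying the full stale
   interval on every run.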





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 506058)
Time Spent: 3h 40m  (was: 3.5h)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, 
> org.apache.hadoop.hdfs.TestFileChecksum-output.txt, 
> org.apache.hadoop.hdfs.TestFileChecksum.txt
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 
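
For orientation, the stack trace above enters the checksum path through
`DistributedFileSystem.getFileChecksum`. A minimal sketch of that client call,
with `dfs` and `rangeLength` assumed for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class ChecksumCall {
  // Resolves the file's block groups and gathers per-block checksums from the
  // DataNodes; the PathIOException above is raised when a block group's
  // checksum cannot be collected from the remaining nodes.
  static FileChecksum rangeChecksum(DistributedFileSystem dfs, long rangeLength)
      throws IOException {
    return dfs.getFileChecksum(new Path("/striped/stripedFileChecksum1"),
        rangeLength);
  }
}
```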

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506054
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 04:40
Start Date: 29/Oct/20 04:40
Worklog Time Spent: 10m 
  Work Description: aajisaka commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513963136



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   I could reproduce the failure locally:
   ```
   $ ./start-build-env.sh
   $ mvn clean install -DskipTests -Pnative
   $ cd hadoop-hdfs-project/hadoop-hdfs
   $ mvn test -Pnative -Pparallel-tests
   ```
   I attached the stdout to the JIRA: 
https://issues.apache.org/jira/secure/attachment/13014321/org.apache.hadoop.hdfs.TestFileChecksum-output.txt





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 506054)
Time Spent: 3.5h  (was: 3h 20m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, 
> org.apache.hadoop.hdfs.TestFileChecksum-output.txt, 
> org.apache.hadoop.hdfs.TestFileChecksum.txt
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506034
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 03:38
Start Date: 29/Oct/20 03:38
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2421:
URL: https://github.com/apache/hadoop/pull/2421#issuecomment-718338898


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:-------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   0m 49s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  17m 11s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +0 :ok: |  spotbugs  |   3m  3s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m  0s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  shadedclient  |  14m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m  7s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  59m 22s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2421/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | -1 :x: |  asflicense  |   0m 37s | 
[/patch-asflicense-problems.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2421/1/artifact/out/patch-asflicense-problems.txt)
 |  The patch generated 4 ASF License warnings.  |
   |  |   | 145m 17s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
   |   | hadoop.hdfs.TestDFSStorageStateRecovery |
   |   | hadoop.hdfs.TestSafeModeWithStripedFile |
   |   | hadoop.hdfs.TestFileCreationClient |
   |   | hadoop.hdfs.tools.TestDFSZKFailoverController |
   |   | hadoop.hdfs.TestErasureCodingPolicies |
   |   | hadoop.hdfs.TestDecommissionWithStriped |
   |   | hadoop.hdfs.TestMiniDFSCluster |
   |   | hadoop.hdfs.TestMultipleNNPortQOP |
   |   | hadoop.hdfs.TestDFSStripedInputStream |
   |   | hadoop.hdfs.TestFileAppend2 |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   |   | hadoop.hdfs.TestDistributedFileSystem |
   |   | hadoop.hdfs.TestDatanodeDeath |
   |   | hadoop.hdfs.TestErasureCodingMultipleRacks |
   |   | hadoop.hdfs.TestSnapshotCommands |
   |   | hadoop.hdfs.TestReadStripedFileWithDecoding |
   |   | hadoop.hdfs.TestFileChecksum |
   |   | hadoop.hdfs.TestDFSClientSocketSize |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
  

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506020
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 02:49
Start Date: 29/Oct/20 02:49
Worklog Time Spent: 10m 
  Work Description: amahussein commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718325314


   > I think the `keepLongStdio` option can be used: https://www.jenkins.io/doc/pipeline/steps/junit/
   > The option can be enabled by updating the `./Jenkinsfile` as follows:
   > 
   > ```diff
   > -junit "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
   > +junit keepLongStdio: true, testResults: "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
   > ```
   
   Thanks @aajisaka !
   I am going to try it out.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 506020)
Time Spent: 3h 10m  (was: 3h)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=506006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506006
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 02:11
Start Date: 29/Oct/20 02:11
Worklog Time Spent: 10m 
  Work Description: aajisaka commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718313974


   > do you guys know if it is possible to see the full logs of the unit test?
   
   I think the `keepLongStdio` option can be used: https://www.jenkins.io/doc/pipeline/steps/junit/
   The option can be enabled by updating the `./Jenkinsfile` as follows:
   ```diff
   -junit "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
   +junit keepLongStdio: true, testResults: "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 506006)
Time Spent: 3h  (was: 2h 50m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505986
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Oct/20 01:11
Start Date: 29/Oct/20 01:11
Worklog Time Spent: 10m 
  Work Description: amahussein opened a new pull request #2421:
URL: https://github.com/apache/hadoop/pull/2421


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.)
   For more details, please see 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 505986)
Time Spent: 2h 50m  (was: 2h 40m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505969
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:37
Start Date: 28/Oct/20 23:37
Worklog Time Spent: 10m 
  Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513824033



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   Thanks @goiri 
   Those are the logs I was looking at. All the logs in `TestFileChecksum` and 
`TestFileChecksumCompositeCrc` are truncated, cutting off the last 9 seconds 
before the failure.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 505969)
Time Spent: 2h 40m  (was: 2.5h)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505966
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:31
Start Date: 28/Oct/20 23:31
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513822007



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   I see there is truncation there too though.
   We may want to make the test a little less verbose.
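   One way to do that, sketched here as a suggestion rather than part of the
   PR, is to raise the noisiest logger to WARN in the test setup (assuming the
   slf4j `DataNode.LOG` and `GenericTestUtils.setLogLevel`):
   ```java
   import org.apache.hadoop.hdfs.server.datanode.DataNode;
   import org.apache.hadoop.test.GenericTestUtils;
   import org.slf4j.event.Level;

   class QuietLogs {
     static void quietDataNodeLogs() {
       // WARN keeps real problems visible while dropping the per-block
       // chatter that pushes the surefire output past the truncation limit.
       GenericTestUtils.setLogLevel(DataNode.LOG, Level.WARN);
     }
   }
   ```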





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 505966)
Time Spent: 2.5h  (was: 2h 20m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505965
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:29
Start Date: 28/Oct/20 23:29
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513821613



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   I am not sure exactly which test you are interested in, but here you can 
see the full logs of one of the failed tests:
   
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/testReport/org.apache.hadoop.hdfs/TestFileChecksum/testStripedFileChecksumWithMissedDataBlocksRangeQuery11/
   
   Here are all the tests:
   
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/testReport/org.apache.hadoop.hdfs/





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 505965)
Time Spent: 2h 20m  (was: 2h 10m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505962
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 23:13
Start Date: 28/Oct/20 23:13
Worklog Time Spent: 10m 
  Work Description: amahussein commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718261099


   @goiri and @aajisaka, do you guys know if it is possible to see the full 
logs of the unit test?
   The Yetus console and test reports show truncated logs, so I cannot see the 
sequence of events that leads to the exception and the stack trace. I cannot 
reproduce the failure locally either :/
   
   ```
   ...[truncated 896674 chars]...
   sed]. Total timeout mills is 48, 479813 millis timeout left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:351)
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 505962)
Time Spent: 2h 10m  (was: 2h)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505794
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 16:25
Start Date: 28/Oct/20 16:25
Worklog Time Spent: 10m 
  Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513584215



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##########
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   I see. The problem is that I cannot reproduce it on my local machine. 
However, it seems to fail in a consistent way on Yetus.
   If it is not a real bug, I wonder if the VolumeScanner could be a factor in 
randomly slowing down the DNs. I see many log messages from the volume scanner 
when I run locally. 
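   
   If the VolumeScanner is a suspect, it could be ruled in or out by disabling
   block scanning in the MiniDFSCluster configuration; a minimal sketch, on
   the assumption that a negative scan period disables the scanner:
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hdfs.DFSConfigKeys;

   class NoBlockScanner {
     static Configuration confWithoutBlockScanner() {
       Configuration conf = new Configuration();
       // dfs.datanode.scan.period.hours < 0 turns the block scanner off.
       conf.setInt(DFSConfigKeys.DFS_DATANODE_SCAN_PERIOD_HOURS_KEY, -1);
       return conf;
     }
   }
   ```
   Rerunning on Yetus with the scanner disabled would show whether its extra
   I/O is what randomly slows the DataNodes down.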





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

Worklog Id: (was: 505794)
Time Spent: 2h  (was: 1h 50m)

> TestFileChecksumCompositeCrc fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15643
> URL: https://issues.apache.org/jira/browse/HDFS-15643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 
> TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases 
> {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The 
> following is a sample of the stack trace in two of them, Query7 and Query8.
> {code:bash}
> org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail 
> to get block checksum for 
> LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001;
>  getBlockSize()=37748736; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]];
>  indices=[1, 2, 3, 4, 5, 6, 7, 8]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295)
>   at 
> org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505776
 ]

ASF GitHub Bot logged work on HDFS-15643:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:44
Start Date: 28/Oct/20 15:44
Worklog Time Spent: 10m 
  Work Description: amahussein commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-718021914


   That's trickier than I thought. It needs another iteration.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505776)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505763
 ]

ASF GitHub Bot logged work on HDFS-15643:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 15:32
Start Date: 28/Oct/20 15:32
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513541376



##
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   I am not very close to this part of the code, but there must be a way to force the statistics to update.
   Not sure who can help with this part of the code.
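
   For illustration only, a minimal sketch of the kind of condition-based wait being discussed, not the committed fix. It assumes the test's DistributedFileSystem and the dead node's UUID are at hand, leans on GenericTestUtils.waitFor from hadoop-common's test utilities, and the 500 ms poll interval and 60 s timeout are made-up values:

{code:java}
import java.io.IOException;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.test.GenericTestUtils;

/**
 * Hypothetical helper: poll until the NameNode stops reporting the dead
 * datanode among the block locations of filePath, instead of sleeping a
 * fixed STALE_INTERVAL.
 */
static void waitForLocationsToExclude(DistributedFileSystem fs,
    String filePath, String deadDnUuid)
    throws TimeoutException, InterruptedException {
  GenericTestUtils.waitFor(() -> {
    try {
      for (LocatedBlock lb : fs.getClient()
          .getLocatedBlocks(filePath, 0).getLocatedBlocks()) {
        for (DatanodeInfo loc : lb.getLocations()) {
          if (deadDnUuid.equals(loc.getDatanodeUuid())) {
            return false; // stale location still reported; keep polling
          }
        }
      }
      return true; // no reported location points at the dead datanode
    } catch (IOException e) {
      return false; // transient lookup failure; retry until timeout
    }
  }, 500, 60_000);
}
{code}

   A bounded poll like this fails fast with a TimeoutException instead of racing silently, which is usually the safer shape for a flaky test than a fixed sleep.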





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505763)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505538
 ]

ASF GitHub Bot logged work on HDFS-15643:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 04:03
Start Date: 28/Oct/20 04:03
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#issuecomment-717681661


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  2s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m 54s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +0 :ok: |  spotbugs  |   3m 10s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m  8s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  shadedclient  |  16m 14s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m 18s |  |  the patch passed  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 114m 44s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 206m 56s |  |  |


   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestFsck |
   |   | hadoop.hdfs.TestRollingUpgrade |
   |   | hadoop.hdfs.TestFileChecksumCompositeCrc |
   |   | hadoop.hdfs.TestFileChecksum |
   |   | hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate |
   |   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetCache |
   |   | hadoop.hdfs.TestMultipleNNPortQOP |


   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2408/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2408 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 1055ba63ccdd 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ae74407ac43 |
   | Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505495
 ]

ASF GitHub Bot logged work on HDFS-15643:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 02:30
Start Date: 28/Oct/20 02:30
Worklog Time Spent: 10m 
  Work Description: amahussein commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513142714



##
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   Yes, I agree with you @goiri.
   I experimented with waiting on the number of live replicas: it did not work. The count stayed at 8 and never went back to 9.
   Do you have a suggestion for which condition we should wait on?
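
   One possible direction, sketched under assumptions rather than verified against this test: instead of watching the replica count, explicitly declare the killed node dead on the NameNode and wait for it to drop out of the live-datanode view. This leans on MiniDFSCluster#setDataNodeDead and GenericTestUtils#waitFor from the test utilities, assumes the test's existing cluster and dnToDie fields, and the 100 ms / 30 s poll values are illustrative:

{code:java}
import org.apache.hadoop.test.GenericTestUtils;

// Capture the live count before the kill, then force the NameNode to
// re-evaluate heartbeats rather than waiting out the expiry interval.
final int liveBefore = cluster.getNamesystem().getNumLiveDataNodes();
shutdownDataNode(dnToDie);
cluster.setDataNodeDead(dnToDie.getDatanodeId()); // mark dead on the NN
GenericTestUtils.waitFor(
    () -> cluster.getNamesystem().getNumLiveDataNodes() < liveBefore,
    100, 30_000);
{code}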





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505495)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (HDFS-15643) TestFileChecksumCompositeCrc fails intermittently

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15643?focusedWorklogId=505491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505491
 ]

ASF GitHub Bot logged work on HDFS-15643:
-

Author: ASF GitHub Bot
Created on: 28/Oct/20 01:58
Start Date: 28/Oct/20 01:58
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #2408:
URL: https://github.com/apache/hadoop/pull/2408#discussion_r513134020



##
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksum.java
##
@@ -575,6 +596,8 @@ private FileChecksum getFileChecksum(String filePath, int range,
   dnIdxToDie = getDataNodeToKill(filePath);
   DataNode dnToDie = cluster.getDataNodes().get(dnIdxToDie);
   shutdownDataNode(dnToDie);
+  // wait enough time for the locations to be updated.
+  Thread.sleep(STALE_INTERVAL);

Review comment:
   Should we waitFor instead?
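
   As a rough sketch of that substitution (assuming GenericTestUtils from hadoop-common's test utilities and a hypothetical locationsUpdated() predicate the test would have to define):

{code:java}
import org.apache.hadoop.test.GenericTestUtils;

// Instead of: Thread.sleep(STALE_INTERVAL);
// Poll the condition every 100 ms, giving up after 60 s.
GenericTestUtils.waitFor(() -> locationsUpdated(), 100, 60_000);
{code}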





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 505491)
Time Spent: 1h 10m  (was: 1h)
