[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-11-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232588#comment-17232588
 ] 

Hadoop QA commented on HDFS-15240:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 2 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
3s{color} |  | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
20s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m  
2s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
13s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
32s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
11s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m 11s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
1s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
4s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
15s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m  
1s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} |  | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
50s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 
23s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 19m 
23s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
12s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 
12s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 31s{color} | 
[/results-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/302/artifact/out/results-checkstyle-root.txt]
 | {color:orange} root: The patch generated 5 new + 27 unchanged - 0 fixed = 32 
total (was 27) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
8s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
24m 50s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
6s{color} |  | {color:green} the patch passed 

[jira] [Updated] (HDFS-15680) Disable Broken Azure Junits

2020-11-15 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15680:
---
Target Version/s: 3.2.2, 3.3.1, 3.4.0, 3.2.3

update: mark 3.2.2/3.2.3/3.3.1/3.4.0 as target version.

> Disable Broken Azure Junits
> ---
>
> Key: HDFS-15680
> URL: https://issues.apache.org/jira/browse/HDFS-15680
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are 6 test classes have been failing on Yetus for several months. 
> They contributed to more than 41 failing tests which makes reviewing Yetus 
> reports every a pain in the neck. Another point is to save the resources and 
> avoiding utilization of ports, memory, and CPU.
> Over the last month, there was some effort to bring the Yetus back to a 
> stable state. However, there is no progress in addressing Azure failures.
> Generally, I do not like to disable failing tests, but for this specific 
> case, I do not assume that it makes any sense to have 41 failing tests from 
> one module for several months. Whenever someone finds that those tests are 
> useful, then they can re-enable the tests on Yetus *_After_* the test is 
> fixed.
> Following a PR, I have to  review that my patch does not cause any failures 
> (include changing error messages in existing tests). A thorough review takes 
> a considerable amount of time browsing the nightly builds and Github reports.
> So, please consider how much time is being spent to review those stack trace 
> over the last months.
> Finally, this is one of the reasons developers tend to ignore the reports, 
> because it would take too much time to review; and by default, the errors are 
> considered irrelevant.
> CC: [~aajisaka], [~elgoiri], [~weichiu], [~ayushtkn]
> {code:bash}
>   hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked 
>hadoop.fs.azure.TestNativeAzureFileSystemMocked 
>hadoop.fs.azure.TestBlobMetadata 
>hadoop.fs.azure.TestNativeAzureFileSystemConcurrency 
>hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck 
>hadoop.fs.azure.TestNativeAzureFileSystemContractMocked 
>hadoop.fs.azure.TestWasbFsck 
>hadoop.fs.azure.TestOutOfBandAzureBlobOperations 
> {code}
> {code:bash}
> org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata
> org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata
> org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata
> org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testNoTempBlobsVisible
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testLinkBlobs
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatusRootDir
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryMoveToExistingDirectory
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatus
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryAsExistingDirectory
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameToDirWithSamePrefixAllowed
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testLSRootDir
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testDeleteRecursively
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck.testWasbFsck
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testChineseCharactersFolderRename
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListingWithZeroByteRenameMetadata
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListing
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testUriEncoding
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testDeepFileCreation
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testListDirectory
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderRenameInProgress
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameImplicitFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testStoreDeleteFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRename
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatus
> 

[jira] [Comment Edited] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-11-15 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232477#comment-17232477
 ] 

HuangTao edited comment on HDFS-15240 at 11/16/20, 3:01 AM:


Thanks [~ferhui] [~umamaheswararao] for your comments. I upload 
HDFS-15240.007.patch for addressing your comments.

With regard to StripingChunkReadResult.OUTDATED, I think we should keep it, 
which can avoid scheduling an useless block reader.

About reader.closeBlockReader(), 
{code:java}
 reconstructor.freeBuffer(reader.getReadBuffer());
 // thread may stop here, before closeBlockReader, StripedBlockReader may write 
data into the released buffer. 
 reader.freeReadBuffer();
 reader.closeBlockReader();{code}
 I add UT `testAbnormallyCloseDoesNotWriteBufferAgain` which can detect the 
dirty buffer without any fix.

In addition, I add
{code:java}
 entry.getValue().clear();{code}
in ElasticByteBufferPool#getBuffer to ensure the buffer is clear before return.


was (Author: marvelrock):
Thanks [~ferhui] [~umamaheswararao] for your comments. I upload 
HDFS-15240.007.patch for addressing your comments.

With regard to StripingChunkReadResult.OUTDATED, I think we should keep it, 
which can avoid scheduling an useless block reader.

About reader.closeBlockReader(), I add UT 
`testAbnormallyCloseDoesNotWriteBufferAgain`. 
{code:java}
 reconstructor.freeBuffer(reader.getReadBuffer());
 // thread may stop here, before closeBlockReader, StripedBlockReader may write 
data into the released buffer. 
 reader.freeReadBuffer();
 reader.closeBlockReader();{code}
 

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, 
> HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, 
> HDFS-15240.006.patch, HDFS-15240.007.patch, 
> image-2020-07-16-15-56-38-608.png, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, 
> test-HDFS-15240.006.patch
>
>
> When read some lzo files we found some blocks were broken.
> I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from 
> DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') 
> blocks. And find the longest common sequenece(LCS) between b6'(decoded) and 
> b6(read from DN)(b7'/b7 and b8'/b8).
> After selecting 6 blocks of the block group in combinations one time and 
> iterating through all cases, I find one case that the length of LCS is the 
> block length - 64KB, 64KB is just the length of ByteBuffer used by 
> StripedBlockReader. So the corrupt reconstruction block is made by a dirty 
> buffer.
> The following log snippet(only show 2 of 28 cases) is my check program 
> output. In my case, I known the 3th block is corrupt, so need other 5 blocks 
> to decode another 3 blocks, then find the 1th block's LCS substring is block 
> length - 64kb.
> It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the 
> dirty buffer was used before read the 1th block.
> Must be noted that StripedBlockReader read from the offset 0 of the 1th block 
> after used the dirty buffer.
> EDITED for readability.
> {code:java}
> decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[6] and block[6'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4
> decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 65536
> CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest 
> common substring length is 27197440  # this one
> Check the first 131072 bytes between block[7] and block[7'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4{code}
> Now I know the dirty buffer causes reconstruction block error, but how does 
> the dirty buffer come about?
> After digging into the code and DN log, I found this following DN log is the 
> root reason.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] 

[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-11-15 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232477#comment-17232477
 ] 

HuangTao commented on HDFS-15240:
-

Thanks [~ferhui] [~umamaheswararao] for your comments. I upload 
HDFS-15240.007.patch for addressing your comments.

With regard to StripingChunkReadResult.OUTDATED, I think we should keep it, 
which can avoid scheduling an useless block reader.

About reader.closeBlockReader(), I add UT 
`testAbnormallyCloseDoesNotWriteBufferAgain`. 
{code:java}
 reconstructor.freeBuffer(reader.getReadBuffer());
 // thread may stop here, before closeBlockReader, StripedBlockReader may write 
data into the released buffer. 
 reader.freeReadBuffer();
 reader.closeBlockReader();{code}
 

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, 
> HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, 
> HDFS-15240.006.patch, HDFS-15240.007.patch, 
> image-2020-07-16-15-56-38-608.png, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, 
> test-HDFS-15240.006.patch
>
>
> When read some lzo files we found some blocks were broken.
> I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from 
> DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') 
> blocks. And find the longest common sequenece(LCS) between b6'(decoded) and 
> b6(read from DN)(b7'/b7 and b8'/b8).
> After selecting 6 blocks of the block group in combinations one time and 
> iterating through all cases, I find one case that the length of LCS is the 
> block length - 64KB, 64KB is just the length of ByteBuffer used by 
> StripedBlockReader. So the corrupt reconstruction block is made by a dirty 
> buffer.
> The following log snippet(only show 2 of 28 cases) is my check program 
> output. In my case, I known the 3th block is corrupt, so need other 5 blocks 
> to decode another 3 blocks, then find the 1th block's LCS substring is block 
> length - 64kb.
> It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the 
> dirty buffer was used before read the 1th block.
> Must be noted that StripedBlockReader read from the offset 0 of the 1th block 
> after used the dirty buffer.
> EDITED for readability.
> {code:java}
> decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[6] and block[6'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4
> decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 65536
> CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest 
> common substring length is 27197440  # this one
> Check the first 131072 bytes between block[7] and block[7'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4{code}
> Now I know the dirty buffer causes reconstruction block error, but how does 
> the dirty buffer come about?
> After digging into the code and DN log, I found this following DN log is the 
> root reason.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> 

[jira] [Commented] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception

2020-11-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232473#comment-17232473
 ] 

Hadoop QA commented on HDFS-15684:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} |  | {color:red} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
52s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  5s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
52s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 26s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
21s{color} |  | 

[jira] [Commented] (HDFS-15296) TestBPOfferService#testMissBlocksWhenReregister is flaky

2020-11-15 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232468#comment-17232468
 ] 

Masatake Iwasaki commented on HDFS-15296:
-

looks like duplicate of HDFS-15654 and HDFS-15674 is the follow-up

> TestBPOfferService#testMissBlocksWhenReregister is flaky
> 
>
> Key: HDFS-15296
> URL: https://issues.apache.org/jira/browse/HDFS-15296
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Mingliang Liu
>Priority: Major
>
> TestBPOfferService.testMissBlocksWhenReregister fails intermittently in 
> {{trunk}} branch, not sure about other branches. Example failures are
> - 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1964/4/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testMissBlocksWhenReregister/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/29175/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testMissBlocksWhenReregister/
> Sample exception stack is:
> {quote}
> Stacktrace
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-11-15 Thread HuangTao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuangTao updated HDFS-15240:

Attachment: HDFS-15240.007.patch

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, 
> HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, 
> HDFS-15240.006.patch, HDFS-15240.007.patch, 
> image-2020-07-16-15-56-38-608.png, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, 
> test-HDFS-15240.006.patch
>
>
> When read some lzo files we found some blocks were broken.
> I read back all internal blocks(b0-b8) of the block group(RS-6-3-1024k) from 
> DN directly, and choose 6(b0-b5) blocks to decode other 3(b6', b7', b8') 
> blocks. And find the longest common sequenece(LCS) between b6'(decoded) and 
> b6(read from DN)(b7'/b7 and b8'/b8).
> After selecting 6 blocks of the block group in combinations one time and 
> iterating through all cases, I find one case that the length of LCS is the 
> block length - 64KB, 64KB is just the length of ByteBuffer used by 
> StripedBlockReader. So the corrupt reconstruction block is made by a dirty 
> buffer.
> The following log snippet(only show 2 of 28 cases) is my check program 
> output. In my case, I known the 3th block is corrupt, so need other 5 blocks 
> to decode another 3 blocks, then find the 1th block's LCS substring is block 
> length - 64kb.
> It means (0,1,2,4,5,6)th blocks were used to reconstruct 3th block, and the 
> dirty buffer was used before read the 1th block.
> Must be noted that StripedBlockReader read from the offset 0 of the 1th block 
> after used the dirty buffer.
> EDITED for readability.
> {code:java}
> decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[6] and block[6'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4
> decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 65536
> CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest 
> common substring length is 27197440  # this one
> Check the first 131072 bytes between block[7] and block[7'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4{code}
> Now I know the dirty buffer causes reconstruction block error, but how does 
> the dirty buffer come about?
> After digging into the code and DN log, I found this following DN log is the 
> root reason.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> Reading from DN may timeout(hold by a future(F)) and output the INFO log, but 
> the futures that contains the future(F)  is cleared, 
> {code:java}
> return new 

[jira] [Updated] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception

2020-11-15 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-15684:
---
Status: Patch Available  (was: Open)

> EC: Call recoverLease on DFSStripedOutputStream close exception
> ---
>
> Key: HDFS-15684
> URL: https://issues.apache.org/jira/browse/HDFS-15684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient, ec
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Major
> Attachments: HDFS-15684.001.patch
>
>
> -HDFS-14694- add a feature that call recoverLease operation automatically 
> when DFSOutputSteam close encounters exception. When we wanted to apply this 
> feature to our cluster, we found that it does not support EC files. 
> I think this feature should take effect whether replica files or EC files. 
> This Jira proposes to make it effective when in the case of EC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception

2020-11-15 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15684:
-
Attachment: HDFS-15684.001.patch

> EC: Call recoverLease on DFSStripedOutputStream close exception
> ---
>
> Key: HDFS-15684
> URL: https://issues.apache.org/jira/browse/HDFS-15684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient, ec
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Major
> Attachments: HDFS-15684.001.patch
>
>
> -HDFS-14694- add a feature that call recoverLease operation automatically 
> when DFSOutputSteam close encounters exception. When we wanted to apply this 
> feature to our cluster, we found that it does not support EC files. 
> I think this feature should take effect whether replica files or EC files. 
> This Jira proposes to make it effective when in the case of EC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception

2020-11-15 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15684:
-
Description: 
-HDFS-14694- add a feature that call recoverLease operation automatically when 
DFSOutputSteam close encounters exception. When we wanted to apply this feature 
to our cluster, we found that it does not support EC files. 

I think this feature should take effect whether replica files or EC files. This 
Jira proposes to make it effective when in the case of EC files.

> EC: Call recoverLease on DFSStripedOutputStream close exception
> ---
>
> Key: HDFS-15684
> URL: https://issues.apache.org/jira/browse/HDFS-15684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient, ec
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Major
>
> -HDFS-14694- add a feature that call recoverLease operation automatically 
> when DFSOutputSteam close encounters exception. When we wanted to apply this 
> feature to our cluster, we found that it does not support EC files. 
> I think this feature should take effect whether replica files or EC files. 
> This Jira proposes to make it effective when in the case of EC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception

2020-11-15 Thread Hongbing Wang (Jira)
Hongbing Wang created HDFS-15684:


 Summary: EC: Call recoverLease on DFSStripedOutputStream close 
exception
 Key: HDFS-15684
 URL: https://issues.apache.org/jira/browse/HDFS-15684
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient, ec
Reporter: Hongbing Wang
Assignee: Hongbing Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org