[jira] [Updated] (HDFS-15260) Change from List to Set to improve the chooseTarget performance

2020-04-03 Thread HuangTao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuangTao updated HDFS-15260:

Attachment: (was: choose_target_flamegraph.png)

> Change from List to Set to improve the chooseTarget performance
> ---
>
> Key: HDFS-15260
> URL: https://issues.apache.org/jira/browse/HDFS-15260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15260) Change from List to Set to improve the chooseTarget performance

2020-04-03 Thread HuangTao (Jira)
HuangTao created HDFS-15260:
---

 Summary: Change from List to Set to improve the chooseTarget 
performance
 Key: HDFS-15260
 URL: https://issues.apache.org/jira/browse/HDFS-15260
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: HuangTao
Assignee: HuangTao









[jira] [Updated] (HDFS-15260) Change from List to Set to improve the chooseTarget performance

2020-04-03 Thread HuangTao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuangTao updated HDFS-15260:

Attachment: choose_target_flamegraph.png

> Change from List to Set to improve the chooseTarget performance
> ---
>
> Key: HDFS-15260
> URL: https://issues.apache.org/jira/browse/HDFS-15260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Minor
> Attachments: choose_target_flamegraph.png
>
>
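
The issue has no description beyond its title; the attached flamegraph is what
identifies the hot path. As a purely illustrative sketch of the general idea
(hypothetical code, not the actual HDFS-15260 patch), the benefit of switching
from a List to a Set is that the repeated contains() probes made while choosing
targets become O(1) hash lookups instead of O(n) scans:

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example: nodes already chosen or excluded are probed many times
// while picking replication targets. List.contains() scans the whole list on
// every probe; a HashSet answers the same question in expected constant time.
public class ExcludedNodesExample {
  public static void main(String[] args) {
    List<String> excludedList = new ArrayList<>();
    Set<String> excludedSet = new HashSet<>();
    for (int i = 0; i < 10_000; i++) {
      String dn = "datanode-" + i;
      excludedList.add(dn);   // O(1) add, O(n) contains
      excludedSet.add(dn);    // O(1) add, O(1) expected contains
    }
    // This containment check is what a chooseTarget-style loop repeats often.
    System.out.println(excludedList.contains("datanode-9999"));
    System.out.println(excludedSet.contains("datanode-9999"));
  }
}
{code}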







[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-04-03 Thread HuangTao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuangTao updated HDFS-15240:

Attachment: HDFS-15240.003.patch

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, 
> HDFS-15240.003.patch
>
>
> When reading some LZO files we found that some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from
> the DN directly, chose 6 blocks (b0-b5) to decode the other 3 (b6', b7', b8'),
> and computed the longest common substring (LCS) between b6' (decoded) and
> b6 (read from the DN), and likewise for b7'/b7 and b8'/b8.
> After selecting 6 blocks of the block group per combination and iterating
> through all cases, I found one case where the length of the LCS equals the
> block length minus 64KB; 64KB is exactly the length of the ByteBuffer used by
> StripedBlockReader. So the corrupt reconstruction block was produced by a dirty
> buffer.
> The following log snippet (showing only 2 of the 28 cases) is the output of my
> check program. In my case, I knew the 3rd block was corrupt, so I needed 5 other
> blocks to decode another 3 blocks, and then found that the 1st block's LCS is
> the block length minus 64KB.
> It means blocks (0,1,2,4,5,6) were used to reconstruct the 3rd block, and the
> dirty buffer was used before reading the 1st block.
> It must be noted that StripedBlockReader reads from offset 0 of the 1st block
> after the dirty buffer was used.
> {code:java}
> decode from [0, 2, 3, 4, 5, 7] -> [1, 6, 8]
> Check Block(1) first 131072 bytes longest common substring length 4
> Check Block(6) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4
> decode from [0, 2, 3, 4, 5, 6] -> [1, 7, 8]
> Check Block(1) first 131072 bytes longest common substring length 65536
> CHECK AGAIN: Block(1) all 27262976 bytes longest common substring length 
> 27197440  # this one
> Check Block(7) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4{code}
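
For readers who want to reproduce this kind of comparison, a minimal
longest-common-substring check between a decoded block and the block read back
from the DataNode could look like the sketch below. It is a toy illustration,
not the reporter's actual check program; the two-row dynamic program keeps
memory linear, but its quadratic running time makes it practical only on small
prefixes such as the first 128 KB.

{code:java}
import java.util.Arrays;

// Hypothetical checker: compares a decoded internal block against the one read
// back from the DataNode by computing the longest-common-substring length.
public class BlockLcsCheck {
  static int longestCommonSubstring(byte[] a, byte[] b) {
    int[] prev = new int[b.length + 1];
    int[] curr = new int[b.length + 1];
    int best = 0;
    for (int i = 1; i <= a.length; i++) {
      for (int j = 1; j <= b.length; j++) {
        curr[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
        best = Math.max(best, curr[j]);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
      Arrays.fill(curr, 0);
    }
    return best;
  }

  public static void main(String[] args) {
    byte[] decoded = {1, 2, 3, 4, 5, 6, 7, 8};
    byte[] readBack = {9, 9, 3, 4, 5, 6, 9, 9}; // middle 4 bytes match
    System.out.println(longestCommonSubstring(decoded, readBack)); // prints 4
  }
}
{code}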
> Now I know that a dirty buffer causes the reconstruction block error, but where
> does the dirty buffer come from?
> After digging into the code and the DN log, I found that the following DN log
> entry is the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> A read from the DN may time out (tracked by a future F) and emit the INFO log
> above, but by then the futures map that contained F has already been cleared:
> {code:java}
> return new StripingChunkReadResult(futures.remove(future),
> StripingChunkReadResult.CANCELLED); {code}
> futures.remove(future) then causes the NPE, so the EC reconstruction fails. In
> the finally phase, the code snippet in *getStripedReader().close()*
> {code:java}
> reconstructor.freeBuffer(reader.getReadBuffer());
> reader.freeReadBuffer();
> reader.closeBlockReader(); {code}
> frees the buffer first, but the StripedBlockReader still holds the buffer and
> writes into it.
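
As a toy illustration of the hazard described above (a sketch only, not the
HDFS-15240 fix), consider a pooled buffer that is recycled while a straggling
reader thread can still write into it; the next user of the pool then observes
the late write as dirty data. The safer ordering is to stop or close the reader
before releasing its buffer.

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration of the dirty-buffer hazard, not the HDFS code: a buffer is
// returned to the pool while a late reader can still write into it, so the
// next user of that buffer sees bytes it never wrote.
public class DirtyBufferHazard {
  static final Deque<ByteBuffer> POOL = new ArrayDeque<>();

  static ByteBuffer acquire() {
    ByteBuffer b = POOL.poll();
    return b != null ? b : ByteBuffer.allocate(64 * 1024);
  }

  static void release(ByteBuffer b) {
    b.clear();
    POOL.push(b);
  }

  public static void main(String[] args) throws Exception {
    ByteBuffer readBuffer = acquire();
    Thread lateWriter = new Thread(() -> {
      // Simulates a straggling read that completes after the buffer was freed.
      try {
        Thread.sleep(50);
      } catch (InterruptedException ignored) {
      }
      readBuffer.put(0, (byte) 0x7f); // writes into an already-recycled buffer
    });
    lateWriter.start();

    release(readBuffer);           // BUG: freed before the reader has finished
    ByteBuffer reused = acquire(); // same instance comes back out of the pool
    lateWriter.join();
    System.out.println(reused.get(0)); // almost certainly 127 instead of 0

    // Safer ordering: join/close the reader first, then release the buffer.
  }
}
{code}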




[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-04-03 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075021#comment-17075021
 ] 

HuangTao commented on HDFS-15240:
-

Fixed the checkstyle issues and the unit test case.

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, 
> HDFS-15240.003.patch
>
>
> When reading some LZO files we found that some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from
> the DN directly, chose 6 blocks (b0-b5) to decode the other 3 (b6', b7', b8'),
> and computed the longest common substring (LCS) between b6' (decoded) and
> b6 (read from the DN), and likewise for b7'/b7 and b8'/b8.
> After selecting 6 blocks of the block group per combination and iterating
> through all cases, I found one case where the length of the LCS equals the
> block length minus 64KB; 64KB is exactly the length of the ByteBuffer used by
> StripedBlockReader. So the corrupt reconstruction block was produced by a dirty
> buffer.
> The following log snippet (showing only 2 of the 28 cases) is the output of my
> check program. In my case, I knew the 3rd block was corrupt, so I needed 5 other
> blocks to decode another 3 blocks, and then found that the 1st block's LCS is
> the block length minus 64KB.
> It means blocks (0,1,2,4,5,6) were used to reconstruct the 3rd block, and the
> dirty buffer was used before reading the 1st block.
> It must be noted that StripedBlockReader reads from offset 0 of the 1st block
> after the dirty buffer was used.
> {code:java}
> decode from [0, 2, 3, 4, 5, 7] -> [1, 6, 8]
> Check Block(1) first 131072 bytes longest common substring length 4
> Check Block(6) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4
> decode from [0, 2, 3, 4, 5, 6] -> [1, 7, 8]
> Check Block(1) first 131072 bytes longest common substring length 65536
> CHECK AGAIN: Block(1) all 27262976 bytes longest common substring length 
> 27197440  # this one
> Check Block(7) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4{code}
> Now I know that a dirty buffer causes the reconstruction block error, but where
> does the dirty buffer come from?
> After digging into the code and the DN log, I found that the following DN log
> entry is the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> A read from the DN may time out (tracked by a future F) and emit the INFO log
> above, but by then the futures map that contained F has already been cleared:
> {code:java}
> return new StripingChunkReadResult(futures.remove(future),
> StripingChunkReadResult.CANCELLED); {code}
> futures.remove(future) then causes the NPE, so the EC reconstruction fails. In
> the finally phase, the code snippet in *getStripedReader().close()*
> {code:java}
> reconstructor.freeBuffer(reader.getReadBuffer());
> reader.freeReadBuffer();
> reader.closeBlockReader(); {code}
> frees the buffer first, but the StripedBlockReader still holds the buffer and
> writes into it.




[jira] [Created] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger

2020-04-03 Thread Yang Yun (Jira)
Yang Yun created HDFS-15259:
---

 Summary: Reduce useless log information in FSNamesystemAuditLogger
 Key: HDFS-15259
 URL: https://issues.apache.org/jira/browse/HDFS-15259
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: logging, namenode
Reporter: Yang Yun
Assignee: Yang Yun


For most operations, 'dst' is null; add a check before logging the 'dst'
information in FSNamesystemAuditLogger.
{code:java}
2020-04-03 16:34:40,021 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null 
proto=rpc
2020-04-03 16:35:16,329 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null 
proto=rpc
2020-04-03 16:35:16,362 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null 
perm=yang:supergroup:rwxr-xr-x proto=rpc{code}
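
A minimal sketch of the kind of change being proposed (a hypothetical
formatter, not the attached patch): only append the dst field when it is
non-null. Note the discussion later in the thread about downstream tools that
rely on the fixed audit-log layout for parsing.

{code:java}
// Hypothetical audit-line formatter showing the proposed behaviour; field
// names mirror the audit log lines above.
public class AuditLineExample {
  static String formatAuditLine(boolean allowed, String ugi, String ip,
      String cmd, String src, String dst, String perm, String proto) {
    StringBuilder sb = new StringBuilder();
    sb.append("allowed=").append(allowed)
      .append("\tugi=").append(ugi)
      .append("\tip=").append(ip)
      .append("\tcmd=").append(cmd)
      .append("\tsrc=").append(src);
    if (dst != null) { // the proposed check: skip "dst=null"
      sb.append("\tdst=").append(dst);
    }
    sb.append("\tperm=").append(perm)
      .append("\tproto=").append(proto);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(formatAuditLine(true, "user (auth:SIMPLE)", "/127.0.0.1",
        "listStatus", "/", null, null, "rpc"));
  }
}
{code}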






[jira] [Updated] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger

2020-04-03 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15259:

Attachment: HDFS-15259.001.patch
Status: Patch Available  (was: Open)

> Reduce useless log information in FSNamesystemAuditLogger
> -
>
> Key: HDFS-15259
> URL: https://issues.apache.org/jira/browse/HDFS-15259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15259.001.patch
>
>
> For most operations, 'dst' is null; add a check before logging the 'dst'
> information in FSNamesystemAuditLogger.
> {code:java}
> 2020-04-03 16:34:40,021 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,329 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,362 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null 
> perm=yang:supergroup:rwxr-xr-x proto=rpc{code}






[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger

2020-04-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074514#comment-17074514
 ] 

Hadoop QA commented on HDFS-15259:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  3m 
36s{color} | {color:red} root in trunk failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
9s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m  7s{color} | {color:orange} The patch fails to run checkstyle in hadoop-hdfs 
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
52s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
1m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m  6s{color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs generated 585 new + 0 unchanged - 
0 fixed = 585 total (was 0) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 44s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 182 new + 0 unchanged - 0 fixed = 182 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 87 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
3s{color} | {color:red} The patch has 600 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 23m  
8s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 101 new + 0 
unchanged - 0 fixed = 101 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 32s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
44s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerWithStripedBlocks |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.TestLeaseRecoveryStriped |
|   | hadoop.hdfs.server.namenode.TestFsck |
|   | hadoop.hdfs.server.namenode.TestAuditLogs |
|   | hadoop.hdfs.TestDecommissionWithBackoffMonitor |
|   | hadoop.hdfs.tools.TestECAdmin |
|   | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData |
|   | hadoop.hdfs.TestAclsEndToEnd |
|   | hadoop.hdfs.TestEncryptionZones |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | HDFS-15259 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-15258) RBF: Mark Router FSCK unstable

2020-04-03 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074669#comment-17074669
 ] 

Hudson commented on HDFS-15258:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18116 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18116/])
HDFS-15258. RBF: Mark Router FSCK unstable. (#1934) (github: rev 
1695d8d59c4b441448c16c6cee5c83347c4cb87f)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterFsck.java


> RBF: Mark Router FSCK unstable
> --
>
> Key: HDFS-15258
> URL: https://issues.apache.org/jira/browse/HDFS-15258
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: release-blocker
> Fix For: 3.3.0
>
>
> As per discussion in HDFS-15169, we should mark DFSRouter FSCK 
> public-evolving or public-unstable before 3.3.0 is released.
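
For context, Hadoop marks evolving or unstable APIs with its classification
annotations; a rough sketch of what that looks like on a class follows
(illustrative only; the exact annotations applied to RouterFsck are in the
commit referenced above).

{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Illustrative only: the annotations tell downstream users that the class is
// internal and that its contract may change without notice.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class RouterFsckExample {
  // ... fsck handling would go here ...
}
{code}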






[jira] [Updated] (HDFS-15258) RBF: Mark Router FSCK unstable

2020-04-03 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15258:
-
Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged the PR into trunk and branch-3.3.

> RBF: Mark Router FSCK unstable
> --
>
> Key: HDFS-15258
> URL: https://issues.apache.org/jira/browse/HDFS-15258
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: release-blocker
> Fix For: 3.3.0
>
>
> As per discussion in HDFS-15169, we should mark DFSRouter FSCK 
> public-evolving or public-unstable before 3.3.0 is released.






[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger

2020-04-03 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074718#comment-17074718
 ] 

Chao Sun commented on HDFS-15259:
-

[~hadoop_yangyun] I don't think you can do this, as many applications depend on
the tabular format to parse the audit log, and this would break them badly.

> Reduce useless log information in FSNamesystemAuditLogger
> -
>
> Key: HDFS-15259
> URL: https://issues.apache.org/jira/browse/HDFS-15259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15259.001.patch
>
>
> For most operations, 'dst' is null; add a check before logging the 'dst'
> information in FSNamesystemAuditLogger.
> {code:java}
> 2020-04-03 16:34:40,021 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,329 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,362 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null 
> perm=yang:supergroup:rwxr-xr-x proto=rpc{code}






[jira] [Updated] (HDFS-5242) Reduce contention on DatanodeInfo instances

2020-04-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-5242:
--
Target Version/s:   (was: )
  Resolution: Duplicate
  Status: Resolved  (was: Patch Available)

> Reduce contention on DatanodeInfo instances
> ---
>
> Key: HDFS-5242
> URL: https://issues.apache.org/jira/browse/HDFS-5242
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0-alpha1
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: HDFS-5242.patch, HDFS-5242.patch, HDFS-5242.patch, 
> HDFS-5242.patch
>
>
> Synchronization in {{DatanodeInfo}} instances causes unnecessary contention 
> between call handlers.
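
For background, one common pattern for this class of contention problem (a
hypothetical illustration, not the attached HDFS-5242 patches) is to replace
synchronized accessors over simple fields with volatile or atomic fields, so
RPC handler threads can read without serializing on the instance lock:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical node-stats holder: the synchronized accessors force handler
// threads to queue on the instance monitor, while the atomic field lets them
// read and write concurrently.
public class NodeStats {
  private long lockedLastUpdate;                     // guarded by "this"
  private final AtomicLong lastUpdate = new AtomicLong();

  public synchronized long getLockedLastUpdate() {   // handlers contend here
    return lockedLastUpdate;
  }

  public synchronized void setLockedLastUpdate(long t) {
    lockedLastUpdate = t;
  }

  public long getLastUpdate() {                      // contention-free read
    return lastUpdate.get();
  }

  public void setLastUpdate(long t) {
    lastUpdate.set(t);
  }
}
{code}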






[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-04-03 Thread HuangTao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HuangTao updated HDFS-15240:

Attachment: HDFS-15240.002.patch

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch
>
>
> When reading some LZO files we found that some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from
> the DN directly, chose 6 blocks (b0-b5) to decode the other 3 (b6', b7', b8'),
> and computed the longest common substring (LCS) between b6' (decoded) and
> b6 (read from the DN), and likewise for b7'/b7 and b8'/b8.
> After selecting 6 blocks of the block group per combination and iterating
> through all cases, I found one case where the length of the LCS equals the
> block length minus 64KB; 64KB is exactly the length of the ByteBuffer used by
> StripedBlockReader. So the corrupt reconstruction block was produced by a dirty
> buffer.
> The following log snippet (showing only 2 of the 28 cases) is the output of my
> check program. In my case, I knew the 3rd block was corrupt, so I needed 5 other
> blocks to decode another 3 blocks, and then found that the 1st block's LCS is
> the block length minus 64KB.
> It means blocks (0,1,2,4,5,6) were used to reconstruct the 3rd block, and the
> dirty buffer was used before reading the 1st block.
> It must be noted that StripedBlockReader reads from offset 0 of the 1st block
> after the dirty buffer was used.
> {code:java}
> decode from [0, 2, 3, 4, 5, 7] -> [1, 6, 8]
> Check Block(1) first 131072 bytes longest common substring length 4
> Check Block(6) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4
> decode from [0, 2, 3, 4, 5, 6] -> [1, 7, 8]
> Check Block(1) first 131072 bytes longest common substring length 65536
> CHECK AGAIN: Block(1) all 27262976 bytes longest common substring length 
> 27197440  # this one
> Check Block(7) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4{code}
> Now I know that a dirty buffer causes the reconstruction block error, but where
> does the dirty buffer come from?
> After digging into the code and the DN log, I found that the following DN log
> entry is the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> A read from the DN may time out (tracked by a future F) and emit the INFO log
> above, but by then the futures map that contained F has already been cleared:
> {code:java}
> return new StripingChunkReadResult(futures.remove(future),
> StripingChunkReadResult.CANCELLED); {code}
> futures.remove(future) then causes the NPE, so the EC reconstruction fails. In
> the finally phase, the code snippet in *getStripedReader().close()*
> {code:java}
> reconstructor.freeBuffer(reader.getReadBuffer());
> reader.freeReadBuffer();
> reader.closeBlockReader(); {code}
> frees the buffer first, but the StripedBlockReader still holds the buffer and
> writes into it.




[jira] [Commented] (HDFS-15207) VolumeScanner skip to scan blocks accessed during recent scan period

2020-04-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074851#comment-17074851
 ] 

Wei-Chiu Chuang commented on HDFS-15207:


Patch makes sense to me. Thanks for working on this, [~hadoop_yangyun].

In the test code:
{code}
  assertTrue("Should not run to here", false);
{code}
Can you use {{LambdaTestUtils#intercept()}} instead, or the more
traditional {{fail()}}?

{{VolumeScanner#runLoop()}} is quite long now; time to refactor it. That can be
a separate jira.
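
For reference, the two suggested alternatives look roughly like the sketch
below; doSomethingThatShouldThrow() is a placeholder for whatever operation the
test expects to fail, not the test's actual code.

{code:java}
import static org.junit.Assert.fail;

import java.io.IOException;
import org.apache.hadoop.test.LambdaTestUtils;

// Sketch of the two alternatives to assertTrue("Should not run to here", false).
public class InterceptExample {
  void doSomethingThatShouldThrow() throws IOException {
    throw new IOException("expected");
  }

  void withIntercept() throws Exception {
    // Asserts that the callable throws IOException and returns it for
    // further inspection.
    LambdaTestUtils.intercept(IOException.class, () -> {
      doSomethingThatShouldThrow();
      return null;
    });
  }

  void withFail() {
    try {
      doSomethingThatShouldThrow();
      fail("Expected an IOException");
    } catch (IOException expected) {
      // expected path
    }
  }
}
{code}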

> VolumeScanner skip to scan blocks accessed during recent scan period
> 
>
> Key: HDFS-15207
> URL: https://issues.apache.org/jira/browse/HDFS-15207
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15207.002.patch, HDFS-15207.003.patch, 
> HDFS-15207.patch, HDFS-15207.patch
>
>
> Check the access time of the block file to avoid scanning recently accessed
> blocks, reducing disk IO.






[jira] [Comment Edited] (HDFS-15207) VolumeScanner skip to scan blocks accessed during recent scan period

2020-04-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074851#comment-17074851
 ] 

Wei-Chiu Chuang edited comment on HDFS-15207 at 4/3/20, 7:38 PM:
-

Patch makes sense to me. Thanks for working on this, [~hadoop_yangyun].

In the test code:
{code}
  assertTrue("Should not run to here", false);
{code}

+1 after the test code change.

Can you use {{LambdaTestUtils#intercept()}} instead, or the more
traditional {{fail()}}?

{{VolumeScanner#runLoop()}} is quite long now; time to refactor it. That can be
a separate jira.


was (Author: jojochuang):
Patch makes sense to me. Thanks for working on this, [~hadoop_yangyun].

In the test code:
{code}
  assertTrue("Should not run to here", false);
{code}
Can you use {{LambdaTestUtils#intercept()}} instead, or the more
traditional {{fail()}}?

{{VolumeScanner#runLoop()}} is quite long now; time to refactor it. That can be
a separate jira.

> VolumeScanner skip to scan blocks accessed during recent scan period
> 
>
> Key: HDFS-15207
> URL: https://issues.apache.org/jira/browse/HDFS-15207
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15207.002.patch, HDFS-15207.003.patch, 
> HDFS-15207.patch, HDFS-15207.patch
>
>
> Check the access time of the block file to avoid scanning recently accessed
> blocks, reducing disk IO.






[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2020-04-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074946#comment-17074946
 ] 

Hadoop QA commented on HDFS-15240:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m  
9s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 34s{color} | {color:orange} root: The patch generated 11 new + 27 unchanged 
- 0 fixed = 38 total (was 27) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
58s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
3s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 39s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}236m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestReconstructStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | HDFS-15240 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12998786/HDFS-15240.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4ad954db57d2 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 

[jira] [Commented] (HDFS-15207) VolumeScanner skip to scan blocks accessed during recent scan period

2020-04-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074984#comment-17074984
 ] 

Íñigo Goiri commented on HDFS-15207:


Another thing: let's merge the else and the if, as that makes the code less
verbose and easier to reason about:
{code}
} else if (conf.skipRecentAccessed) {
  // Check the access time of block file to avoid scanning recently
  // changed blocks, reducing disk IO.
  ...
}
{code}

> VolumeScanner skip to scan blocks accessed during recent scan period
> 
>
> Key: HDFS-15207
> URL: https://issues.apache.org/jira/browse/HDFS-15207
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15207.002.patch, HDFS-15207.003.patch, 
> HDFS-15207.patch, HDFS-15207.patch
>
>
> Check the access time of the block file to avoid scanning recently accessed
> blocks, reducing disk IO.
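
A rough sketch of the idea (hypothetical code, not the attached patch): read
the block file's last access time and skip the block when that time falls
inside the most recent scan period.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: decide whether a block file was accessed within the
// given period and can therefore be skipped by the scanner.
public class SkipRecentlyAccessed {
  static boolean accessedWithin(Path blockFile, long periodMs) throws IOException {
    BasicFileAttributes attrs =
        Files.readAttributes(blockFile, BasicFileAttributes.class);
    long ageMs = System.currentTimeMillis() - attrs.lastAccessTime().toMillis();
    return ageMs < periodMs;
  }

  public static void main(String[] args) throws IOException {
    Path blockFile = Paths.get(args[0]);           // path to a block file
    long threeWeeksMs = 21L * 24 * 60 * 60 * 1000; // example scan period
    if (accessedWithin(blockFile, threeWeeksMs)) {
      System.out.println("recently accessed, skip scanning");
    } else {
      System.out.println("scan this block");
    }
  }
}
{code}

Note that how accurately the access time is maintained depends on the
filesystem mount options (e.g. relatime or noatime).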






[jira] [Updated] (HDFS-14978) In-place Erasure Coding Conversion

2020-04-03 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDFS-14978:
--
Target Version/s: 3.4.0

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where both the
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure code everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.


