[jira] [Updated] (HDFS-15260) Change from List to Set to improve the chooseTarget performance
[ https://issues.apache.org/jira/browse/HDFS-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

HuangTao updated HDFS-15260:
    Attachment: (was: choose_target_flamegraph.png)

> Change from List to Set to improve the chooseTarget performance
> ---
>
> Key: HDFS-15260
> URL: https://issues.apache.org/jira/browse/HDFS-15260
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Minor
>
[jira] [Created] (HDFS-15260) Change from List to Set to improve the chooseTarget performance
HuangTao created HDFS-15260:
---
Summary: Change from List to Set to improve the chooseTarget performance
Key: HDFS-15260
URL: https://issues.apache.org/jira/browse/HDFS-15260
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: HuangTao
Assignee: HuangTao
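No patch is included in this excerpt, so the following is only a minimal, self-contained sketch of the idea the title describes (all class and variable names are hypothetical, not the real BlockPlacementPolicy code): membership tests against an excluded-node collection cost O(n) on a List but O(1) on a HashSet, and chooseTarget performs such a test for every candidate datanode on every block allocation.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: why a Set beats a List for the excluded-node check inside a
// chooseTarget-style loop.
public class ExcludedNodesSketch {
  public static void main(String[] args) {
    int nodes = 10_000;
    List<String> excludedList = new ArrayList<>();
    for (int i = 0; i < nodes; i++) {
      excludedList.add("dn-" + i);
    }

    // List.contains() scans linearly: O(n) per candidate node.
    long t0 = System.nanoTime();
    boolean inList = excludedList.contains("dn-" + (nodes - 1));
    long listNs = System.nanoTime() - t0;

    // HashSet.contains() hashes: O(1) per candidate node.
    Set<String> excludedSet = new HashSet<>(excludedList);
    long t1 = System.nanoTime();
    boolean inSet = excludedSet.contains("dn-" + (nodes - 1));
    long setNs = System.nanoTime() - t1;

    System.out.printf("list=%b in %dns, set=%b in %dns%n",
        inList, listNs, inSet, setNs);
  }
}
{code}

The flamegraph attached to the issue presumably shows this lookup dominating; the snippet only illustrates the asymptotic difference, not the actual patch.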
[jira] [Updated] (HDFS-15260) Change from List to Set to improve the chooseTarget performance
[ https://issues.apache.org/jira/browse/HDFS-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

HuangTao updated HDFS-15260:
    Attachment: choose_target_flamegraph.png

> Change from List to Set to improve the chooseTarget performance
> ---
>
> Key: HDFS-15260
> URL: https://issues.apache.org/jira/browse/HDFS-15260
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Minor
> Attachments: choose_target_flamegraph.png
>
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

HuangTao updated HDFS-15240:
    Attachment: HDFS-15240.003.patch

> Erasure Coding: dirty buffer causes reconstruction block error
> ---
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, HDFS-15240.003.patch
>
> When reading some LZO files, we found some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from the DNs directly, chose 6 blocks (b0-b5) to decode the other 3 blocks (b6', b7', b8'), and computed the longest common sequence (LCS) between b6' (decoded) and b6 (read from the DN), and likewise for b7'/b7 and b8'/b8.
> After iterating through all combinations of 6 source blocks from the block group, I found one case where the LCS length is the block length - 64KB; 64KB is exactly the length of the ByteBuffer used by StripedBlockReader. So the corrupt reconstruction block was produced by a dirty buffer.
> The following log snippet (showing only 2 of the 28 cases) is my check program's output. In my case, I knew the 3rd block was corrupt, so the other 5 blocks were needed to decode another 3 blocks; I then found that the 1st block's LCS length is the block length - 64KB.
> It means blocks (0,1,2,4,5,6) were used to reconstruct the 3rd block, and the dirty buffer was used before reading the 1st block.
> It must be noted that StripedBlockReader reads from offset 0 of the 1st block after the dirty buffer was used.
> {code:java}
> decode from [0, 2, 3, 4, 5, 7] -> [1, 6, 8]
> Check Block(1) first 131072 bytes longest common substring length 4
> Check Block(6) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4
> decode from [0, 2, 3, 4, 5, 6] -> [1, 7, 8]
> Check Block(1) first 131072 bytes longest common substring length 65536
> CHECK AGAIN: Block(1) all 27262976 bytes longest common substring length 27197440 # this one
> Check Block(7) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4{code}
> Now I know the dirty buffer causes the reconstruction block error, but how does the dirty buffer come about?
> After digging into the code and the DN log, I found that the following DN log is the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/:52586 remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> Reading from a DN may time out (the read is held by a future F) and emit the INFO log above, but the futures map that contains future F has already been cleared, so in
> {code:java}
> return new StripingChunkReadResult(futures.remove(future), StripingChunkReadResult.CANCELLED); {code}
> futures.remove(future) returns null and causes the NPE, so the EC reconstruction fails. In the finally phase, the code snippet in *getStripedReader().close()*
> {code:java}
> reconstructor.freeBuffer(reader.getReadBuffer());
> reader.freeReadBuffer();
> reader.closeBlockReader(); {code}
> frees the buffer first, but the StripedBlockReader still holds the buffer and writes to it.
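A hedged sketch of one possible guard for the NPE path described above (the attached patches may fix it differently): when the timed-out future has already been dropped from the map, {{futures.remove(future)}} returns null, so the cancelled read should be reported without dereferencing the missing chunk index. This amends the one-line snippet quoted in the description; {{futures}} maps each pending Future to its chunk index.

{code:java}
// Sketch only, not the attached patch.
Integer chunkIndex = futures.remove(future);
if (chunkIndex == null) {
  // The map was already cleared; there is no chunk to attribute the
  // result to, so report a bare CANCELLED instead of crashing the
  // whole reconstruction task.
  return new StripingChunkReadResult(StripingChunkReadResult.CANCELLED);
}
return new StripingChunkReadResult(chunkIndex,
    StripingChunkReadResult.CANCELLED);
{code}

By the same reasoning, having *getStripedReader().close()* call {{reader.closeBlockReader()}} before the read buffer is freed would keep a still-running reader from writing into a buffer that has already been returned to the pool; whether the attached patches do exactly this is not shown here.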
[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075021#comment-17075021 ]

HuangTao commented on HDFS-15240:
---
Fixed checkstyle and the unit test.

> Erasure Coding: dirty buffer causes reconstruction block error
> ---
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch, HDFS-15240.003.patch
>
[jira] [Created] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger
Yang Yun created HDFS-15259:
---
Summary: Reduce useless log information in FSNamesystemAuditLogger
Key: HDFS-15259
URL: https://issues.apache.org/jira/browse/HDFS-15259
Project: Hadoop HDFS
Issue Type: Improvement
Components: logging, namenode
Reporter: Yang Yun
Assignee: Yang Yun

For most operations, 'dst' is null; add a check before logging the 'dst' information in FSNamesystemAuditLogger.
{code:java}
2020-04-03 16:34:40,021 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null proto=rpc
2020-04-03 16:35:16,329 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null proto=rpc
2020-04-03 16:35:16,362 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null perm=yang:supergroup:rwxr-xr-x proto=rpc{code}
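A minimal sketch of what such a check could look like (this is not the attached HDFS-15259.001.patch; the method and parameter names simply mirror the audit-log fields shown above):

{code:java}
// Sketch: build the audit entry, appending dst/perm only when present.
static String buildAuditEntry(boolean allowed, String ugi, String ip,
    String cmd, String src, String dst, String perm, String proto) {
  StringBuilder sb = new StringBuilder();
  sb.append("allowed=").append(allowed).append('\t');
  sb.append("ugi=").append(ugi).append('\t');
  sb.append("ip=").append(ip).append('\t');
  sb.append("cmd=").append(cmd).append('\t');
  sb.append("src=").append(src).append('\t');
  if (dst != null) {
    sb.append("dst=").append(dst).append('\t');   // skip "dst=null"
  }
  if (perm != null) {
    sb.append("perm=").append(perm).append('\t'); // skip "perm=null"
  }
  sb.append("proto=").append(proto);
  return sb.toString();
}
{code}

As a later comment on this issue points out, consumers often parse the audit log as a fixed tab-separated layout, so making fields conditional is a compatibility risk.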
[jira] [Updated] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger
[ https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Yun updated HDFS-15259:
    Attachment: HDFS-15259.001.patch
        Status: Patch Available (was: Open)

> Reduce useless log information in FSNamesystemAuditLogger
> ---
>
> Key: HDFS-15259
> URL: https://issues.apache.org/jira/browse/HDFS-15259
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: logging, namenode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15259.001.patch
>
[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger
[ https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074514#comment-17074514 ]

Hadoop QA commented on HDFS-15259:
--

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 3m 36s{color} | {color:red} root in trunk failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 7s{color} | {color:orange} The patch fails to run checkstyle in hadoop-hdfs {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 52s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 1m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 6s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 585 new + 0 unchanged - 0 fixed = 585 total (was 0) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 44s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 182 new + 0 unchanged - 0 fixed = 182 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 87 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 3s{color} | {color:red} The patch has 600 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 23m 8s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 42s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 101 new + 0 unchanged - 0 fixed = 101 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 32s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 44s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 43s{color} | {color:black} {color} |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerWithStripedBlocks |
| | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.TestLeaseRecoveryStriped |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.namenode.TestAuditLogs |
| | hadoop.hdfs.TestDecommissionWithBackoffMonitor |
| | hadoop.hdfs.tools.TestECAdmin |
| | hadoop.hdfs.TestFileCreation |
| | hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData |
| | hadoop.hdfs.TestAclsEndToEnd |
| | hadoop.hdfs.TestEncryptionZones |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | HDFS-15259 |
| JIRA Patch URL |
[jira] [Commented] (HDFS-15258) RBF: Mark Router FSCK unstable
[ https://issues.apache.org/jira/browse/HDFS-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074669#comment-17074669 ]

Hudson commented on HDFS-15258:
---
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18116 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18116/])
HDFS-15258. RBF: Mark Router FSCK unstable. (#1934) (github: rev 1695d8d59c4b441448c16c6cee5c83347c4cb87f)
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterFsck.java

> RBF: Mark Router FSCK unstable
> ---
>
> Key: HDFS-15258
> URL: https://issues.apache.org/jira/browse/HDFS-15258
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Akira Ajisaka
> Assignee: Akira Ajisaka
> Priority: Major
> Labels: release-blocker
> Fix For: 3.3.0
>
> As per discussion in HDFS-15169, we should mark DFSRouter FSCK public-evolving or public-unstable before 3.3.0 is released.
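The commit touches only RouterFsck.java and the diff is not shown here, but "marking unstable" in Hadoop conventionally means the stability annotations below, so a plausible sketch of the change is:

{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Sketch; the exact annotations in commit 1695d8d may differ. The effect
// is that the Router FSCK output format carries no compatibility promise.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class RouterFsck {
  // ... existing implementation unchanged ...
}
{code}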
[jira] [Updated] (HDFS-15258) RBF: Mark Router FSCK unstable
[ https://issues.apache.org/jira/browse/HDFS-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15258:
    Fix Version/s: 3.3.0
       Resolution: Fixed
           Status: Resolved (was: Patch Available)

Merged the PR into trunk and branch-3.3.

> RBF: Mark Router FSCK unstable
> ---
>
> Key: HDFS-15258
> URL: https://issues.apache.org/jira/browse/HDFS-15258
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Akira Ajisaka
> Assignee: Akira Ajisaka
> Priority: Major
> Labels: release-blocker
> Fix For: 3.3.0
>
> As per discussion in HDFS-15169, we should mark DFSRouter FSCK public-evolving or public-unstable before 3.3.0 is released.
[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger
[ https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074718#comment-17074718 ]

Chao Sun commented on HDFS-15259:
---
[~hadoop_yangyun] I don't think you can do this: many applications depend on the tabular format for parsing the audit log, and this change would break them badly.

> Reduce useless log information in FSNamesystemAuditLogger
> ---
>
> Key: HDFS-15259
> URL: https://issues.apache.org/jira/browse/HDFS-15259
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: logging, namenode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15259.001.patch
>
[jira] [Updated] (HDFS-5242) Reduce contention on DatanodeInfo instances
[ https://issues.apache.org/jira/browse/HDFS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-5242:
    Target Version/s: (was: )
          Resolution: Duplicate
              Status: Resolved (was: Patch Available)

> Reduce contention on DatanodeInfo instances
> ---
>
> Key: HDFS-5242
> URL: https://issues.apache.org/jira/browse/HDFS-5242
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 2.0.0-alpha, 3.0.0-alpha1
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Priority: Major
> Labels: BB2015-05-TBR
> Attachments: HDFS-5242.patch, HDFS-5242.patch, HDFS-5242.patch, HDFS-5242.patch
>
> Synchronization in {{DatanodeInfo}} instances causes unnecessary contention between call handlers.
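HDFS-5242 was resolved as a duplicate and its patches are not shown in this excerpt, so the following is only an illustrative sketch of one standard way to reduce this kind of contention (all names hypothetical): replace synchronized accessors over rarely-written fields with volatile reads, so RPC call handlers no longer serialize on the instance lock.

{code:java}
// Illustrative only; not the HDFS-5242 patch.
class DatanodeStatsSketch {
  // Before: private long capacity; synchronized long getCapacity() { ... }
  // After: a volatile field, written by one heartbeat thread and read
  // lock-free by many RPC handlers.
  private volatile long capacity;

  long getCapacity() {
    return capacity;              // lock-free volatile read
  }

  void updateHeartbeat(long newCapacity) {
    capacity = newCapacity;       // single-writer volatile store
  }
}
{code}

This trade-off only works when each field is independently consistent; readers that need several fields to be mutually consistent still require a lock or an immutable snapshot object.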
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

HuangTao updated HDFS-15240:
    Attachment: HDFS-15240.002.patch

> Erasure Coding: dirty buffer causes reconstruction block error
> ---
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Attachments: HDFS-15240.001.patch, HDFS-15240.002.patch
>
[jira] [Commented] (HDFS-15207) VolumeScanner: skip scanning blocks accessed during recent scan period
[ https://issues.apache.org/jira/browse/HDFS-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074851#comment-17074851 ]

Wei-Chiu Chuang commented on HDFS-15207:
---
Patch makes sense to me. Thanks for working on this, [~hadoop_yangyun].

In the test code:
{code}
assertTrue("Should not run to here", false);
{code}
Can you use {{LambdaTestUtils#intercept()}} instead, or the more traditional {{fail()}}?

{{VolumeScanner#runLoop()}} is quite long now. Time to refactor it; that can be a separate jira.

> VolumeScanner: skip scanning blocks accessed during recent scan period
> ---
>
> Key: HDFS-15207
> URL: https://issues.apache.org/jira/browse/HDFS-15207
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15207.002.patch, HDFS-15207.003.patch, HDFS-15207.patch, HDFS-15207.patch
>
> Check the access time of the block file to avoid scanning recently changed blocks, reducing disk IO.
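Sketched below are the two alternatives the review suggests, using JUnit's {{fail()}} and Hadoop's {{LambdaTestUtils#intercept()}}; the class and method bodies are placeholders, not the real VolumeScanner test.

{code:java}
import static org.junit.Assert.fail;

import java.io.IOException;
import org.apache.hadoop.test.LambdaTestUtils;

public class AssertionStyleSketch {

  void unreachableBranch() {
    // Instead of assertTrue("Should not run to here", false):
    fail("Should not run to here");
  }

  void expectedException() throws Exception {
    // When the unreachable branch really means "this call must throw",
    // intercept() asserts the exception type and returns it for further
    // inspection.
    LambdaTestUtils.intercept(IOException.class,
        () -> scanBlockPlaceholder());
  }

  String scanBlockPlaceholder() throws IOException {
    // Stands in for the real call under test.
    throw new IOException("placeholder for the real call");
  }
}
{code}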
[jira] [Comment Edited] (HDFS-15207) VolumeScanner: skip scanning blocks accessed during recent scan period
[ https://issues.apache.org/jira/browse/HDFS-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074851#comment-17074851 ]

Wei-Chiu Chuang edited comment on HDFS-15207 at 4/3/20, 7:38 PM:
---
Patch makes sense to me. Thanks for working on this, [~hadoop_yangyun].

In the test code:
{code}
assertTrue("Should not run to here", false);
{code}
+1 after the test code change. Can you use {{LambdaTestUtils#intercept()}} instead, or the more traditional {{fail()}}?

{{VolumeScanner#runLoop()}} is quite long now. Time to refactor it; that can be a separate jira.

> VolumeScanner: skip scanning blocks accessed during recent scan period
> ---
>
> Key: HDFS-15207
> URL: https://issues.apache.org/jira/browse/HDFS-15207
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15207.002.patch, HDFS-15207.003.patch, HDFS-15207.patch, HDFS-15207.patch
>
> Check the access time of the block file to avoid scanning recently changed blocks, reducing disk IO.
[jira] [Commented] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074946#comment-17074946 ]

Hadoop QA commented on HDFS-15240:
--

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 21m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 9s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 34s{color} | {color:orange} root: The patch generated 11 new + 27 unchanged - 0 fixed = 38 total (was 27) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 58s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 3s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 39s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}236m 56s{color} | {color:black} {color} |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.TestReconstructStripedFile |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | HDFS-15240 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12998786/HDFS-15240.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4ad954db57d2 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64
[jira] [Commented] (HDFS-15207) VolumeScanner: skip scanning blocks accessed during recent scan period
[ https://issues.apache.org/jira/browse/HDFS-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074984#comment-17074984 ]

Íñigo Goiri commented on HDFS-15207:
---
Another thing: let's merge the else and the if, as it makes the code less verbose and easier to reason about:
{code}
} else if (conf.skipRecentAccessed) {
  // Check the access time of the block file to avoid scanning recently
  // changed blocks, reducing disk IO.
  ...
}
{code}

> VolumeScanner: skip scanning blocks accessed during recent scan period
> ---
>
> Key: HDFS-15207
> URL: https://issues.apache.org/jira/browse/HDFS-15207
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15207.002.patch, HDFS-15207.003.patch, HDFS-15207.patch, HDFS-15207.patch
>
> Check the access time of the block file to avoid scanning recently changed blocks, reducing disk IO.
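A hedged sketch of the access-time check the patch describes (the configuration name {{skipRecentAccessed}} mirrors the comment above; everything else is assumed, and the real patch may read the attributes differently):

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.attribute.BasicFileAttributes;

public class RecentAccessCheckSketch {
  // Returns true when the block file was read within the last scan period,
  // in which case the scanner can skip it: recently accessed blocks were
  // implicitly verified by the read path, so rescanning mostly burns disk IO.
  static boolean accessedWithin(File blockFile, long periodMs)
      throws IOException {
    BasicFileAttributes attrs = Files.readAttributes(
        blockFile.toPath(), BasicFileAttributes.class);
    long lastAccessMs = attrs.lastAccessTime().toMillis();
    return System.currentTimeMillis() - lastAccessMs < periodMs;
  }
}
{code}

Note that this depends on the filesystem actually maintaining access times; on volumes mounted with noatime the check would silently never fire.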
[jira] [Updated] (HDFS-14978) In-place Erasure Coding Conversion
[ https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siyao Meng updated HDFS-14978:
    Target Version/s: 3.4.0

> In-place Erasure Coding Conversion
> ---
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: erasure-coding
> Affects Versions: 3.0.0
> Reporter: Wei-Chiu Chuang
> Assignee: Aravindan Vijayan
> Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses encoding algorithms to reduce disk space usage while retaining the redundancy necessary for data recovery. It was a huge amount of work, but it is only now getting adopted, almost 2 years later.
> One usability problem that's blocking users from adopting HDFS Erasure Coding is that existing replicated files have to be copied to an EC-enabled directory explicitly. Renaming a file/directory to an EC-enabled directory does not automatically convert the blocks. Therefore users typically perform the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy on it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary destination; delete the source file; move the destination to the source path.
> * Availability: there is a short period where nothing exists at the source path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where the source and destination files all exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the conversion, the source (replicated) files will be preserved in the cluster too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes the file inode number, modification time and access time. Erasure-coded files are supposed to store cold data, but this conversion makes the data "hot" again.
> * Bulky. It's either all or nothing. The directory may be partially erasure-coded, but this approach simply erasure-codes everything again.
> To ease data management, we should offer a utility tool to convert replicated files to erasure-coded files in-place.
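To make the workaround above concrete, here is a hedged Java sketch of the same four steps via the FileSystem API (the paths and policy name are examples, and a real cluster would typically use distcp for the copy step). It inherits every drawback the list describes: doubled disk usage during the copy, an availability gap between the delete and the rename, and new inode numbers and timestamps.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Example only; assumes fs.defaultFS points at an HDFS cluster.
public class NaiveEcConversion {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem fs =
        (DistributedFileSystem) new Path("/").getFileSystem(conf);
    Path src = new Path("/data/cold");
    Path tmp = new Path("/data/.cold-ec-tmp");

    fs.mkdirs(tmp);
    fs.setErasureCodingPolicy(tmp, "RS-6-3-1024k"); // new writes under tmp are EC
    FileUtil.copy(fs, src, fs, tmp, false, conf);   // copy re-encodes; doubles disk usage
    fs.delete(src, true);                           // availability gap starts here
    fs.rename(new Path(tmp, src.getName()), src);   // gap ends
    fs.delete(tmp, true);                           // drop the now-empty temp dir
  }
}
{code}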