[jira] [Commented] (HDFS-15168) ABFS driver enhancement - Allow customizable translation from AAD SPNs and security groups to Linux user and group
[ https://issues.apache.org/jira/browse/HDFS-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044177#comment-17044177 ]

Hadoop QA commented on HDFS-15168:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 4s | trunk passed |
| +1 | compile | 0m 31s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 0m 34s | trunk passed |
| +1 | shadedclient | 15m 0s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 27s | trunk passed |
| 0 | spotbugs | 0m 51s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 48s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 27s | the patch passed |
| +1 | compile | 0m 23s | the patch passed |
| +1 | javac | 0m 23s | the patch passed |
| -0 | checkstyle | 0m 16s | hadoop-tools/hadoop-azure: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1) |
| +1 | mvnsite | 0m 27s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 47s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 22s | the patch passed |
| +1 | findbugs | 0m 54s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 23s | hadoop-azure in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 57m 21s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1858/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1858 |
| JIRA Issue | HDFS-15168 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0ae39372a98f 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / dda00d3 |
| Default Java | 1.8.0_242 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1858/2/artifact/out/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1858/2/testReport/ |
| Max. process+thread count | 448 (vs. ulimit of 5500)
[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044148#comment-17044148 ]

Yao Guangdong commented on HDFS-15186:

[~ferhui], thanks for your review and advice. I have fixed it. Please check it again.

> Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Affects Versions: 3.0.3, 3.2.1, 3.1.3
> Reporter: Yao Guangdong
> Assignee: Yao Guangdong
> Priority: Critical
> Attachments: HDFS-15186.001.patch, HDFS-15186.002.patch, HDFS-15186.003.patch
>
> When I decommission more than one DataNode from a cluster, I find some parity blocks whose content is all 0. The probability is high (parts per thousand). This is a serious problem: if we read data from a zero parity block, or use a zero parity block to recover another block, we end up using wrong data without knowing it.
> Some cases are listed below:
> B: Busy DataNode,
> D: Decommissioning DataNode,
> Others are normal.
> 1. Group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 2. Group indices are [0(B,D), 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
>
> In the first case, where the block group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)], the DN may receive a block reconstruction command with liveIndices=[0, 1, 2, 3, 4, 5, 7, 8] and a targets field (in the class StripedReconstructionInfo) of length 2.
> A targets length of 2 means that, in the current code, the DataNode needs to recover 2 internal blocks. But from liveIndices we can find only 1 missing block, so the method StripedWriter#initTargetIndices uses 0 as the default block to recover, without checking whether index 0 is already among the source indices.
> The DN then uses source indices [0, 1, 2, 3, 4, 5] to recover indices [6, 0] with the EC algorithm. In this case index 0 is in both the source indices and the target indices, and the target buffer that the EC algorithm returns for index 6 is always 0. So I first thought this was the EC algorithm's problem, because it should be more fault tolerant. I tried to fix it there, but that is too hard because there are too many cases. The second case in the example above is another one (it uses source indices [1, 2, 3, 4, 5, 7] to recover indices [0, 6, 0]). So I changed my mind: invoke the EC algorithm with correct parameters, which means removing the duplicate target index 0 in this case. In the end, I fixed it this way.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
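The index bookkeeping described in the HDFS-15186 report can be illustrated with a small, self-contained sketch. It is not the Hadoop `StripedWriter#initTargetIndices` code and not the attached patch; the class `TargetIndexSketch` and its method names are hypothetical. It only shows how sizing the target array by the requested target count can leave a default `0` in it, and how returning just the genuinely missing indices avoids that.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch; not the Hadoop StripedWriter code or the attached patch. */
final class TargetIndexSketch {

  private static BitSet toBitSet(byte[] liveIndices) {
    BitSet live = new BitSet();
    for (byte i : liveIndices) {
      live.set(i);
    }
    return live;
  }

  /**
   * Problematic pattern: the array is sized by the requested target count,
   * so every slot without a genuinely missing index keeps Java's default 0.
   */
  static short[] naiveTargets(int totalBlocks, byte[] liveIndices, int targetCount) {
    BitSet live = toBitSet(liveIndices);
    short[] targets = new short[targetCount];
    int filled = 0;
    for (int i = 0; i < totalBlocks && filled < targetCount; i++) {
      if (!live.get(i)) {
        targets[filled++] = (short) i;
      }
    }
    return targets;
  }

  /** Safer pattern: return only the indices that are absent from liveIndices. */
  static short[] missingTargets(int totalBlocks, byte[] liveIndices, int targetCount) {
    BitSet live = toBitSet(liveIndices);
    short[] targets = new short[targetCount];
    int filled = 0;
    for (int i = 0; i < totalBlocks && filled < targetCount; i++) {
      if (!live.get(i)) {
        targets[filled++] = (short) i;
      }
    }
    // Trim the unused tail instead of leaving default zeros in place.
    return Arrays.copyOf(targets, filled);
  }
}
```

With the first case from the report (9 internal blocks, liveIndices [0, 1, 2, 3, 4, 5, 7, 8], 2 requested targets), `naiveTargets` yields [6, 0] while `missingTargets` yields [6]. With liveIndices [1, 2, 3, 4, 5, 7, 8] and 3 requested targets, `naiveTargets` yields [0, 6, 0], which matches the second case.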
[jira] [Updated] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yao Guangdong updated HDFS-15186:
    Attachment: HDFS-15186.003.patch
[jira] [Commented] (HDFS-15168) ABFS driver enhancement - Allow customizable translation from AAD SPNs and security groups to Linux user and group
[ https://issues.apache.org/jira/browse/HDFS-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044135#comment-17044135 ]

Hadoop QA commented on HDFS-15168:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 25m 10s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 28s | trunk passed |
| +1 | compile | 0m 32s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 0m 34s | trunk passed |
| +1 | shadedclient | 15m 6s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 26s | trunk passed |
| 0 | spotbugs | 0m 52s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 49s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 27s | the patch passed |
| +1 | compile | 0m 23s | the patch passed |
| +1 | javac | 0m 23s | the patch passed |
| -0 | checkstyle | 0m 17s | hadoop-tools/hadoop-azure: The patch generated 6 new + 1 unchanged - 0 fixed = 7 total (was 1) |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 49s | patch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 0m 23s | hadoop-tools_hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | findbugs | 0m 53s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 23s | hadoop-azure in the patch passed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 82m 25s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1858/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1858 |
| JIRA Issue | HDFS-15168 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4ee4629e5f6a 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / dda00d3 |
| Default Java | 1.8.0_242 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1858/1/artifact/out/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| javadoc |
[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044123#comment-17044123 ]

Fei Hui commented on HDFS-15186:

[~yaoguangdong] Thanks for your patch HDFS-15186.002.patch. The whole fix looks good. One minor comment:
{quote}
+//4. wait for decommissioning and not busy block to replicate
+Thread.sleep(3000);
{quote}
It may be better to use GenericTestUtils.waitFor here instead of a fixed sleep.
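The suggestion to replace a fixed `Thread.sleep(3000)` with a polling wait can be sketched as follows. The helper below is a self-contained stand-in written for illustration; it is not Hadoop's `GenericTestUtils.waitFor` implementation, and the class name `WaitForSketch` and its parameter names are assumptions.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; a stand-in for illustration, not Hadoop code. */
final class WaitForSketch {

  /**
   * Polls the condition every checkEveryMillis until it becomes true,
   * and fails with TimeoutException once waitForMillis has elapsed.
   */
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
```

A test would pass a condition such as "the block has been replicated" as the supplier. Compared with a fixed sleep, the test returns as soon as the condition holds and fails with a clear timeout when it never does.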
[jira] [Commented] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044114#comment-17044114 ]

Yang Yun commented on HDFS-15039:

Done, thanks for the review!

> Cache meta file length of FinalizedReplica to reduce call File.length()
>
> Key: HDFS-15039
> URL: https://issues.apache.org/jira/browse/HDFS-15039
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch
>
> When ReplicaCachingGetSpaceUsed is used to get the volume space used, it calls File.length() for every replica's meta file. That adds disk I/O, and we found the slow log below. For a finalized replica, the size of the meta file does not change, so I think we can cache the value.
> {code:java}
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: Refresh dfs used, bpid: BP-898717543-10.75.1.240-1519386995727 replicas size: 1166 dfsUsed: 72227113183 on volume: DS-3add8d62-d69a-4f5a-a29f-b7bbb400af2e duration: 17206ms{code}
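The caching idea in the HDFS-15039 description can be sketched in a few lines. This is a minimal illustration and not the attached patch or the real `FinalizedReplica` class; `CachedMetaLengthSketch` and `getMetaLength` are hypothetical names. It relies on the assumption stated in the report that the meta file of a finalized replica does not change in size.

```java
import java.io.File;

/** Hypothetical sketch of caching a meta file length; not the HDFS-15039 patch. */
final class CachedMetaLengthSketch {
  private final File metaFile;
  // -1 means "not read yet"; volatile so a value cached by one thread is visible to others.
  private volatile long cachedLength = -1;

  CachedMetaLengthSketch(File metaFile) {
    this.metaFile = metaFile;
  }

  /** Returns the meta file length, touching the disk only on the first call. */
  long getMetaLength() {
    long len = cachedLength;
    if (len < 0) {
      len = metaFile.length(); // the only File.length() call for this replica
      cachedLength = len;
    }
    return len;
  }
}
```

Two threads may both read the file on the very first call, which is harmless because they store the same value. The trade-off is that a later size change would go unnoticed, so the sketch is only appropriate where the file is immutable.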
[jira] [Updated] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Yun updated HDFS-15039:
    Attachment: HDFS-15039.patch
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Yun updated HDFS-15039:
    Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044110#comment-17044110 ]

Lisheng Sun commented on HDFS-15039:

hi [~hadoop_yangyun] It is recommended not to make unrelated changes.
{code:java}
import java.io.*;
{code}
[jira] [Comment Edited] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043133#comment-17043133 ]

Yao Guangdong edited comment on HDFS-15186 at 2/25/20 3:08 AM:

[~ferhui], [~ayushtkn], [~gjhkael] Thanks for your patient review. I agree with your point that this should be fixed on the NameNode side. I have one concern. In 3-replica mode, when we decommission a DN and the decommissioning DN is busy, we can copy blocks from other DNs. But in EC mode, if we don't reconstruct, we can only copy blocks from the decommissioning DN. If one DN has 1 million blocks, copying a block takes one second, and the hard limit is the default of 4, then decommissioning takes 69 hours (1m / 4 / 3600 = 69 hours). Won't decommissioning become very slow if we copy all blocks from the decommissioning DN and count decommissioning busy replicas in the live replica check? Is my understanding right?

was (Author: yaoguangdong):
[~ferhui], [~ayushtkn], [~gjhkael] Thanks for your patient review. I agree with your point that this should be fixed on the NameNode side. I have one concern. In 3-replica mode, when we decommission a DN and the decommissioning DN is busy, we can copy blocks from other DNs. But in EC mode, if we don't reconstruct, we can only copy blocks from the decommissioning DN. If one DN has 100W blocks, copying a block takes one second, and the hard limit is the default of 4, then decommissioning takes 69 hours (100W / 4 / 3600 = 69 hours). Won't decommissioning become very slow if we copy all blocks from the decommissioning DN and count decommissioning busy replicas in the live replica check? Is my understanding right?
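The 69-hour figure in the comment follows from simple arithmetic, reproduced here as a small sketch. The class and parameter names are hypothetical, and the inputs (1 million blocks, one second per block copy, 4 concurrent copies) are the commenter's own assumptions rather than measured values.

```java
/** Hypothetical back-of-the-envelope estimate; inputs come from the comment above. */
final class DecommissionEstimate {

  /** Hours needed to copy all blocks when parallelCopies run concurrently. */
  static double hours(long blocks, double secondsPerBlock, int parallelCopies) {
    return blocks * secondsPerBlock / parallelCopies / 3600.0;
  }
}
```

`DecommissionEstimate.hours(1_000_000L, 1.0, 4)` evaluates to 250000 seconds divided by 3600, which is roughly 69.4 hours and agrees with the rounded value in the comment.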
[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044085#comment-17044085 ]

Yao Guangdong commented on HDFS-15186:

[~ferhui], [~ayushtkn], [~gjhkael], [~marvelrock], I have fixed it on the NameNode side and added a new patch, HDFS-15186.002.patch. Could you find time to review it? Thanks.
[jira] [Updated] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yao Guangdong updated HDFS-15186:
---------------------------------
    Attachment: HDFS-15186.002.patch
[jira] [Updated] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Yun updated HDFS-15039:
----------------------------
    Attachment: HDFS-15039.patch
        Status: Patch Available  (was: Open)

> Cache meta file length of FinalizedReplica to reduce call File.length()
> -----------------------------------------------------------------------
>
> Key: HDFS-15039
> URL: https://issues.apache.org/jira/browse/HDFS-15039
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Attachments: HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch
>
> When ReplicaCachingGetSpaceUsed is used to get the space used on a volume, it calls File.length() for every replica's meta file. That adds disk I/O, and we found the slow log shown below. For a finalized replica the size of the meta file does not change, so I think we can cache the value.
> {code:java}
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: Refresh dfs used, bpid: BP-898717543-10.75.1.240-1519386995727 replicas size: 1166 dfsUsed: 72227113183 on volume: DS-3add8d62-d69a-4f5a-a29f-b7bbb400af2e duration: 17206ms{code}
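The caching idea in the description can be sketched as follows. This is only an illustration of the technique, not the attached patch: the class CachedMetaLength is hypothetical, and the LongSupplier stands in for the File.length() call on the meta file so that the sketch stays self-contained. It assumes, as the description states, that the length of a finalized replica's meta file never changes.

```java
import java.util.function.LongSupplier;

/** Hypothetical holder that asks for the meta file length at most once in the common case. */
final class CachedMetaLength {
  // Stand-in for the expensive File.length() call on the meta file (assumption).
  private final LongSupplier lengthLoader;
  // -1 means "not loaded yet"; a real length is never negative.
  private volatile long cachedLength = -1;

  CachedMetaLength(LongSupplier lengthLoader) {
    this.lengthLoader = lengthLoader;
  }

  /** Returns the cached length, loading it on first use. */
  long get() {
    long length = cachedLength;
    if (length < 0) {
      // Two racing threads may both load; that is harmless for an immutable file.
      length = lengthLoader.getAsLong();
      cachedLength = length;
    }
    return length;
  }
}
```

In this sketch a refresh that visits the same finalized replica repeatedly pays the disk I/O cost only on the first call to get().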
[jira] [Updated] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Yun updated HDFS-15039:
----------------------------
    Attachment: HDFS-15039.patch
[jira] [Updated] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Yun updated HDFS-15039:
----------------------------
    Status: Open  (was: Patch Available)
[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043905#comment-17043905 ]

Hadoop QA commented on HDFS-15154:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 28m 50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 17s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 56s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 3 new + 581 unchanged - 0 fixed = 584 total (was 581) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 1063 unchanged - 2 fixed = 1063 total (was 1065) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 28s{color} | {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}198m 56s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15154 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 6ceb9473a6ca 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9290040 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/28836/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit |
[jira] [Commented] (HDFS-15174) Optimize ReplicaCachingGetSpaceUsed by reducing unnecessary io operations
[ https://issues.apache.org/jira/browse/HDFS-15174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043895#comment-17043895 ]

Hudson commented on HDFS-15174:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17988 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17988/])
HDFS-15174. Optimize ReplicaCachingGetSpaceUsed by reducing unnecessary (weichiu: rev 1c5d2f1fdc40b77731bc13973876b567865888d1)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaCachingGetSpaceUsed.java

> Optimize ReplicaCachingGetSpaceUsed by reducing unnecessary io operations
> --------------------------------------------------------------------------
>
> Key: HDFS-15174
> URL: https://issues.apache.org/jira/browse/HDFS-15174
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
> Attachments: HDFS-15174-001.patch
>
> In ReplicaCachingGetSpaceUsed#refresh(), calculating the size of each block and the size of its meta file requires an I/O operation. This puts pressure on disk performance when there are many blocks. HDFS-14313 is intended to reduce I/O operations, so get the block size from ReplicaInfo and the meta size from DataChecksum#getChecksumSize().
> {code:java}
> @Override
> protected void refresh() {
>   if (CollectionUtils.isNotEmpty(replicaInfos)) {
>     for (ReplicaInfo replicaInfo : replicaInfos) {
>       if (Objects.equals(replicaInfo.getVolume().getStorageID(),
>           volume.getStorageID())) {
>         dfsUsed += replicaInfo.getBlockDataLength();
>         dfsUsed += replicaInfo.getMetadataLength();
>         count++;
>       }
>     }
>   }
> }
> {code}
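Deriving the meta size from checksum parameters instead of touching the disk comes down to simple arithmetic, sketched below. The class MetaLengthSketch and its method are hypothetical and are not the committed change. The header size, bytes per checksum and checksum size are passed in as parameters because their concrete values depend on the cluster configuration; the values 7, 512 and 4 used in the example are assumptions for illustration.

```java
/** Hypothetical arithmetic sketch: meta length derived without any file I/O. */
final class MetaLengthSketch {

  /**
   * @param blockLength      length of the block data in bytes
   * @param bytesPerChecksum number of data bytes covered by one checksum
   * @param checksumSize     size of one checksum in bytes
   * @param headerSize       size of the meta file header in bytes
   * @return header size plus one checksum per (possibly partial) chunk
   */
  static long estimateMetaLength(long blockLength, int bytesPerChecksum,
      int checksumSize, int headerSize) {
    if (blockLength < 0 || bytesPerChecksum <= 0 || checksumSize < 0 || headerSize < 0) {
      throw new IllegalArgumentException("invalid checksum parameters");
    }
    // Round up so that a trailing partial chunk still gets a checksum.
    long chunks = (blockLength + bytesPerChecksum - 1) / bytesPerChecksum;
    return headerSize + chunks * checksumSize;
  }
}
```

Under the assumed values, a 134217728-byte block gives 262144 chunks, so the estimate is 7 + 262144 * 4 = 1048583 bytes.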
[jira] [Commented] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043894#comment-17043894 ]

Wei-Chiu Chuang commented on HDFS-15039:
----------------------------------------

Patch has a conflict so submit a new patch and let it run through the precommit.
[jira] [Commented] (HDFS-15190) HttpFS : Add Support for Storage Policy Satisfier
[ https://issues.apache.org/jira/browse/HDFS-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043891#comment-17043891 ]

Hadoop QA commented on HDFS-15190:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 1s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-httpfs: The patch generated 1 new + 461 unchanged - 0 fixed = 462 total (was 461) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 28s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 22s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15190 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12994359/HDFS-15190.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 3c2550f6cc3e 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9290040 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28837/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28837/testReport/ |
| Max. process+thread count | 625 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-httpfs U: hadoop-hdfs-project/hadoop-hdfs-httpfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28837/console |
| Powered by | Apache
[jira] [Updated] (HDFS-15174) Optimize ReplicaCachingGetSpaceUsed by reducing unnecessary io operations
[ https://issues.apache.org/jira/browse/HDFS-15174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-15174:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch applies cleanly in trunk branch-3.2 and branch-3.1. Branch-2.10 will require an update.
[jira] [Updated] (HDFS-15174) Optimize ReplicaCachingGetSpaceUsed by reducing unnecessary io operations
[ https://issues.apache.org/jira/browse/HDFS-15174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-15174:
-----------------------------------
    Fix Version/s: 3.2.2
                   3.1.4
                   3.3.0
[jira] [Comment Edited] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043864#comment-17043864 ]

Chen Liang edited comment on HDFS-15191 at 2/24/20 9:01 PM:
------------------------------------------------------------

There could be a token compatibility issue, though, if you only have HDFS-13617 but not HDFS-14611. If both changes are there, this should be fine. But even if HDFS-14611 is missing, I would expect a different error, because it seems the error happened at the very first call of {{readVLong}} when parsing the token. Those two Jiras only change the behavior of the tail of the block token.

Also, even if we hit a compatibility issue, I expect it to only affect the selective SASL feature. Will be watching this issue.

was (Author: vagarychen):
There could be a token compatibility issue, though, if you only have HDFS-13617 but not HDFS-14611. If both changes are there, this should be fine. But even if HDFS-14611 is missing, I would expect a different error, because it seems the error happened at the very first call of {{readVLong}} when parsing the token. Those two Jiras only change the behavior of the tail of the block token.

> EOF when reading legacy buffer in BlockTokenIdentifier
> ------------------------------------------------------
>
> Key: HDFS-15191
> URL: https://issues.apache.org/jira/browse/HDFS-15191
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.2.1
> Reporter: Steven Rand
> Priority: Major
>
> We have an HDFS client application which recently upgraded from 3.2.0 to 3.2.1. After this upgrade (but not before), we sometimes see these errors when this application is used with clusters still running Hadoop 2.x (more specifically CDH 5.12.1):
> {code}
> WARN [2020-02-24T00:54:32.856Z] org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing remote block reader. (_sampled: true)
> java.io.EOFException:
>     at java.io.DataInputStream.readByte(DataInputStream.java:272)
>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
>     at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
>     at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
>     at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170)
>     at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730)
>     at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
>     at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
>     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575)
>     at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>     at java.io.DataInputStream.read(DataInputStream.java:100)
>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314)
>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
>     at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765)
> {code}
> We get this warning for all DataNodes with a copy of the block, so the read fails.
> I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to cause this, but HDFS-13617 and HDFS-14611 seem
[jira] [Commented] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043864#comment-17043864 ]

Chen Liang commented on HDFS-15191:
-----------------------------------

There could be a token compatibility issue, though, if you only have HDFS-13617 but not HDFS-14611. If both changes are there, this should be fine. But even if HDFS-14611 is missing, I would expect a different error, because it seems the error happened at the very first call of {{readVLong}} when parsing the token. Those two Jiras only change the behavior of the tail of the block token.

> EOF when reading legacy buffer in BlockTokenIdentifier
> ------------------------------------------------------
>
> Key: HDFS-15191
> URL: https://issues.apache.org/jira/browse/HDFS-15191
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.2.1
> Reporter: Steven Rand
> Priority: Major
>
> We have an HDFS client application which recently upgraded from 3.2.0 to 3.2.1. After this upgrade (but not before), we sometimes see these errors when this application is used with clusters still running Hadoop 2.x (more specifically CDH 5.12.1):
> {code}
> WARN [2020-02-24T00:54:32.856Z] org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing remote block reader. (_sampled: true)
> java.io.EOFException:
>     at java.io.DataInputStream.readByte(DataInputStream.java:272)
>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
>     at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
>     at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
>     at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170)
>     at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730)
>     at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
>     at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
>     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575)
>     at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>     at java.io.DataInputStream.read(DataInputStream.java:100)
>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314)
>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
>     at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765)
> {code}
> We get this warning for all DataNodes with a copy of the block, so the read fails.
> I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging [~vagarychen] in case you have any ideas.
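The observation that the failure occurs at the very first read can be illustrated with plain JDK classes. The sketch below is not a diagnosis of this issue and does not use any Hadoop code: the class EmptyIdentifierSketch is hypothetical, and the empty byte array is only an assumed example of an identifier buffer with nothing left to read. It shows that DataInputStream.readByte(), the topmost frame in the stack trace, throws EOFException as soon as no byte is available.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration using only JDK classes. */
final class EmptyIdentifierSketch {

  /** Returns true if reading the first byte of the buffer hits end of stream. */
  static boolean firstReadHitsEof(byte[] identifier) {
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(identifier))) {
      in.readByte(); // same JDK call as the top frame of the stack trace
      return false;
    } catch (EOFException e) {
      return true;
    } catch (IOException e) {
      // Not expected for an in-memory stream; rethrow unchecked.
      throw new UncheckedIOException(e);
    }
  }
}
```

An empty buffer makes firstReadHitsEof return true, while a buffer holding at least one byte returns false, which is consistent with an error raised before any field of the token has been parsed.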
[jira] [Created] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
Steven Rand created HDFS-15191: -- Summary: EOF when reading legacy buffer in BlockTokenIdentifier Key: HDFS-15191 URL: https://issues.apache.org/jira/browse/HDFS-15191 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.2.1 Reporter: Steven Rand We have an HDFS client application which recently upgraded from 3.2.0 to 3.2.1. After this upgrade (but not before), we sometimes see these errors when this application is used with clusters still running Hadoop 2.x (more specifically CDH 5.12.1): {code} WARN [2020-02-24T00:54:32.856Z] org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing remote block reader. (_sampled: true) java.io.EOFException: at java.io.DataInputStream.readByte(DataInputStream.java:272) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) at 
org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) {code} We get this warning for all DataNodes with a copy of the block, so the read fails. I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
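The innermost frames of the trace above are `DataInputStream.readByte` being called from a field reader. A minimal sketch of that failure mode, using only `java.io`: a reader that expects more fields than the serialized buffer actually holds gets an `EOFException` on the first missing byte. The class and method names here are hypothetical and this is not the Hadoop code; it only illustrates why a format mismatch between writer and reader would surface as EOF rather than as a more descriptive error. Whether that is what happens in this issue is not established by the report.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Hadoop.
final class LegacyBufferEofSketch {

  /**
   * Tries to read expectedFields one-byte fields from the identifier bytes.
   * Returns true if the buffer ran out before all fields were read.
   */
  static boolean hitsEof(byte[] identifier, int expectedFields) {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(identifier));
    try {
      for (int i = 0; i < expectedFields; i++) {
        in.readByte(); // throws EOFException once no bytes remain
      }
      return false;
    } catch (EOFException e) {
      return true;
    } catch (IOException e) {
      // Not expected for an in-memory stream; rethrow unchecked.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] twoFields = {1, 2};
    System.out.println(hitsEof(twoFields, 2)); // reader and writer agree
    System.out.println(hitsEof(twoFields, 3)); // reader expects one field too many
  }
}
```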
[jira] [Commented] (HDFS-15190) HttpFS : Add Support for Storage Policy Satisfier
[ https://issues.apache.org/jira/browse/HDFS-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043729#comment-17043729 ] Íñigo Goiri commented on HDFS-15190: Thanks [~hemanthboyina] for bringing this up. What is the use case to run the storage policy satisfier through HttpFS? > HttpFS : Add Support for Storage Policy Satisfier > -- > > Key: HDFS-15190 > URL: https://issues.apache.org/jira/browse/HDFS-15190 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-15190.001.patch > > > Add support for SPS in httpfs -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15190) HttpFS : Add Support for Storage Policy Satisfier
[ https://issues.apache.org/jira/browse/HDFS-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hemanthboyina updated HDFS-15190: - Attachment: HDFS-15190.001.patch Status: Patch Available (was: Open) > HttpFS : Add Support for Storage Policy Satisfier > -- > > Key: HDFS-15190 > URL: https://issues.apache.org/jira/browse/HDFS-15190 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-15190.001.patch > > > Add support for SPS in httpfs -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15190) HttpFS : Add Support for Storage Policy Satisfier
hemanthboyina created HDFS-15190: Summary: HttpFS : Add Support for Storage Policy Satisfier Key: HDFS-15190 URL: https://issues.apache.org/jira/browse/HDFS-15190 Project: Hadoop HDFS Issue Type: Improvement Reporter: hemanthboyina Assignee: hemanthboyina Add support for SPS in httpfs -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15187) CORRUPT replica mismatch between namenodes after failover
[ https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043614#comment-17043614 ] Hudson commented on HDFS-15187: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17983 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17983/]) HDFS-15187. CORRUPT replica mismatch between namenodes after failover. (ayushsaxena: rev 7f8685f4760f1358bb30927a7da9a5041e8c39e1) * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptionWithFailover.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > CORRUPT replica mismatch between namenodes after failover > - > > Key: HDFS-15187 > URL: https://issues.apache.org/jira/browse/HDFS-15187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-15187-01.patch, HDFS-15187-02.patch, > HDFS-15187-03.patch > > > The corrupt replica identified by the Active Namenode isn't identified by the > other Namenode when it fails over to Active, in case the replica was > marked corrupt due to updatePipeline. > Scenario to repro: > 1. Create a file; while writing, turn one datanode down to trigger update > pipeline. > 2. Write some more data. > 3. Close the file. > 4. Restart the datanode that was shut down. > 5. The replica in the datanode will be identified as CORRUPT and the corrupt > count will be 1. > 6. Fail over to the other Namenode. > 7. Wait for all pending IBR processing. > 8. The corrupt count will not be the same, and the FSCK won't show the corrupt > replica. > 9. Fail back to the first Namenode. > 10. Corrupt count and corrupt replica will be there. > The two Namenodes show different results. 
[jira] [Commented] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043605#comment-17043605 ] Hudson commented on HDFS-15166: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17982 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17982/]) HDFS-15166. Remove redundant field fStream in ByteStringLog. Contributed (ayushsaxena: rev 93b8f453b96470f1a6cc9ac098f4934ddd631657) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java > Remove redundant field fStream in ByteStringLog > --- > > Key: HDFS-15166 > URL: https://issues.apache.org/jira/browse/HDFS-15166 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15166.000.patch > > > {{ByteStringLog.fStream}} is only used in {{init()}} method and can be > replaced by a local variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
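The refactoring described in the issue (a field that is only touched inside `init()` becomes a local variable) can be sketched generically as follows. This is a hypothetical stand-in class, not the actual `ByteStringLog` from `EditLogFileInputStream.java`; the names `LocalStreamSketch`, `bytes`, and `firstInt` are assumptions made for illustration, and the real class may manage the stream's lifetime differently (here it is simply closed at the end of `init()`).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not the Hadoop ByteStringLog.
final class LocalStreamSketch {
  private final byte[] bytes;
  private int firstInt; // state that is still needed after init()

  LocalStreamSketch(byte[] bytes) {
    this.bytes = bytes;
  }

  void init() {
    // Previously this stream would have been stored in a field. Nothing
    // outside init() reads it, so a local variable is sufficient.
    try (DataInputStream stream =
        new DataInputStream(new ByteArrayInputStream(bytes))) {
      firstInt = stream.readInt(); // big-endian 4-byte read
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  int getFirstInt() {
    return firstInt;
  }
}
```

Keeping the stream local narrows its scope to the one method that uses it, so the object carries no reference that later code could misuse.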
[jira] [Commented] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043602#comment-17043602 ] Xieming Li commented on HDFS-15166: --- Thank you for the report and review! > Remove redundant field fStream in ByteStringLog > --- > > Key: HDFS-15166 > URL: https://issues.apache.org/jira/browse/HDFS-15166 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15166.000.patch > > > {{ByteStringLog.fStream}} is only used in {{init()}} method and can be > replaced by a local variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15111) stopStandbyServices() should log which service state it is transitioning from.
[ https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043596#comment-17043596 ] Ayush Saxena commented on HDFS-15111: - [~shv] thoughts on v003 ? > stopStandbyServices() should log which service state it is transitioning from. > -- > > Key: HDFS-15111 > URL: https://issues.apache.org/jira/browse/HDFS-15111 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, logging >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie++ > Attachments: HDFS-15111.001.patch, HDFS-15111.002.patch, > HDFS-15111.003.patch > > > Trying to transition Observer to Standby state. {{stopStandbyServices()}} > logs that it is "Stopping services started for standby state". It should be > "Stopping services started for observer state" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15187) CORRUPT replica mismatch between namenodes after failover
[ https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043594#comment-17043594 ] Ayush Saxena commented on HDFS-15187: - Committed to trunk. Thanx [~elgoiri] and [~vinayakumarb] for the reviews!!! > CORRUPT replica mismatch between namenodes after failover > - > > Key: HDFS-15187 > URL: https://issues.apache.org/jira/browse/HDFS-15187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-15187-01.patch, HDFS-15187-02.patch, > HDFS-15187-03.patch > > > The corrupt replica identified by the Active Namenode isn't identified by the > other Namenode when it fails over to Active, in case the replica was > marked corrupt due to updatePipeline. > Scenario to repro: > 1. Create a file; while writing, turn one datanode down to trigger update > pipeline. > 2. Write some more data. > 3. Close the file. > 4. Restart the datanode that was shut down. > 5. The replica in the datanode will be identified as CORRUPT and the corrupt > count will be 1. > 6. Fail over to the other Namenode. > 7. Wait for all pending IBR processing. > 8. The corrupt count will not be the same, and the FSCK won't show the corrupt > replica. > 9. Fail back to the first Namenode. > 10. Corrupt count and corrupt replica will be there. > The two Namenodes show different results. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15187) CORRUPT replica mismatch between namenodes after failover
[ https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15187: Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > CORRUPT replica mismatch between namenodes after failover > - > > Key: HDFS-15187 > URL: https://issues.apache.org/jira/browse/HDFS-15187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-15187-01.patch, HDFS-15187-02.patch, > HDFS-15187-03.patch > > > The corrupt replica identified by the Active Namenode isn't identified by the > other Namenode when it fails over to Active, in case the replica was > marked corrupt due to updatePipeline. > Scenario to repro: > 1. Create a file; while writing, turn one datanode down to trigger update > pipeline. > 2. Write some more data. > 3. Close the file. > 4. Restart the datanode that was shut down. > 5. The replica in the datanode will be identified as CORRUPT and the corrupt > count will be 1. > 6. Fail over to the other Namenode. > 7. Wait for all pending IBR processing. > 8. The corrupt count will not be the same, and the FSCK won't show the corrupt > replica. > 9. Fail back to the first Namenode. > 10. Corrupt count and corrupt replica will be there. > The two Namenodes show different results. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043586#comment-17043586 ] Ayush Saxena commented on HDFS-15166: - Committed to trunk, 3.2, 3.1 and 2.10 Thanx [~risyomei] for the contribution and [~shv] for the report!!! > Remove redundant field fStream in ByteStringLog > --- > > Key: HDFS-15166 > URL: https://issues.apache.org/jira/browse/HDFS-15166 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie, newbie++ > Attachments: HDFS-15166.000.patch > > > {{ByteStringLog.fStream}} is only used in {{init()}} method and can be > replaced by a local variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15166: Fix Version/s: 2.10.1 3.2.2 3.1.4 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > Remove redundant field fStream in ByteStringLog > --- > > Key: HDFS-15166 > URL: https://issues.apache.org/jira/browse/HDFS-15166 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15166.000.patch > > > {{ByteStringLog.fStream}} is only used in {{init()}} method and can be > replaced by a local variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-13393) Improve OOM logging
[ https://issues.apache.org/jira/browse/HDFS-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota reassigned HDFS-13393: - Assignee: (was: Gabor Bota) > Improve OOM logging > --- > > Key: HDFS-13393 > URL: https://issues.apache.org/jira/browse/HDFS-13393 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover, datanode >Reporter: Wei-Chiu Chuang >Priority: Major > > It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new > native thread" errors in an HDFS cluster. Most often this happens when a > DataNode is creating DataXceiver threads, or when the balancer creates threads for > moving blocks around. > In most cases, the "OOM" is a symptom of the number of threads reaching the system > limit, rather than actually running out of memory, and the current logging of > this message is usually misleading (suggesting this is due to insufficient > memory). > How about catching the OOM, and if it is due to "unable to create new native > thread", printing a more helpful message like "bump your ulimit" or "take a > jstack of the process"? > Even better, surface this error to make it more visible. It usually takes a > while before an in-depth investigation starts after users notice that some job failed; by > that time the evidence (like jstack output) may already be gone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
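The proposal above could look roughly like the following. This is a minimal sketch, not a patch for the DataNode or balancer: the class `OomHintSketch`, its method names, and the wording of the hint are all assumptions, and the match is on the message text quoted in the issue, which may differ between JVM versions.

```java
// Hypothetical illustration class; not part of Hadoop.
final class OomHintSketch {
  // Message text as quoted in the issue; other JVM versions may word it differently.
  static final String THREAD_OOM = "unable to create new native thread";

  /** Turns an OutOfMemoryError into a log line that names the likely cause. */
  static String describe(OutOfMemoryError e) {
    String msg = e.getMessage();
    if (msg != null && msg.contains(THREAD_OOM)) {
      return "Could not create a new thread (" + msg + "). This usually means a "
          + "thread limit was reached rather than memory being exhausted: "
          + "check ulimit -u and take a jstack of the process.";
    }
    return "Out of memory: " + msg;
  }

  /** Starts a thread and logs the more helpful message if creation fails. */
  static Thread startOrExplain(Runnable task) {
    Thread t = new Thread(task);
    try {
      t.start();
      return t;
    } catch (OutOfMemoryError e) {
      System.err.println(describe(e));
      throw e; // still propagate; only the logging changes
    }
  }
}
```

The error is rethrown unchanged so that callers behave as before; only the log output differs.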
[jira] [Commented] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043568#comment-17043568 ] Ayush Saxena commented on HDFS-15166: - +1 > Remove redundant field fStream in ByteStringLog > --- > > Key: HDFS-15166 > URL: https://issues.apache.org/jira/browse/HDFS-15166 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie, newbie++ > Attachments: HDFS-15166.000.patch > > > {{ByteStringLog.fStream}} is only used in {{init()}} method and can be > replaced by a local variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043459#comment-17043459 ] Hadoop QA commented on HDFS-15166:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 47s | trunk passed |
| +1 | compile | 1m 2s | trunk passed |
| +1 | checkstyle | 0m 43s | trunk passed |
| +1 | mvnsite | 1m 7s | trunk passed |
| +1 | shadedclient | 14m 50s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 44s | trunk passed |
| +1 | javadoc | 0m 43s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 0m 58s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| +1 | checkstyle | 0m 36s | the patch passed |
| +1 | mvnsite | 1m 6s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 2s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 49s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 49m 13s | hadoop-hdfs in the patch passed. |
| 0 | asflicense | 0m 34s | ASF License check generated no output? |
| | | 110m 41s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStorageStateRecovery |
| | hadoop.hdfs.TestFileCreationClient |
| | hadoop.hdfs.TestDataTransferKeepalive |
| | hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer |
| | hadoop.hdfs.TestDecommissionWithStriped |
| | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.TestMultipleNNPortQOP |
| | hadoop.hdfs.TestDFSStripedInputStream |
| | hadoop.hdfs.server.blockmanagement.TestNodeCount |
| | hadoop.hdfs.TestWriteBlockGetsBlockLengthHint |
| | hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData |
| | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
| | hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped |
| | hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15166 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12994305/HDFS-15166.000.patch |
| Optional Tests | dupname
[jira] [Comment Edited] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043209#comment-17043209 ] Andrea edited comment on HDFS-15098 at 2/24/20 11:52 AM: - Which Hadoop version and OpenSSL version can this patch be used with? was (Author: andrea_julianos_one): This patch can be used whice hadoop version and openssl version ? > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: liusheng >Priority: Major > Attachments: HDFS-15098.001.patch > > > SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure). > SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. Please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15025) Applying NVDIMM storage media to HDFS
[ https://issues.apache.org/jira/browse/HDFS-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043443#comment-17043443 ] Hadoop QA commented on HDFS-15025:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 52s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 16 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 7s | trunk passed |
| +1 | compile | 26m 59s | trunk passed |
| -0 | checkstyle | 0m 59s | The patch fails to run checkstyle in root |
| -1 | mvnsite | 0m 43s | hadoop-common in trunk failed. |
| -1 | mvnsite | 0m 41s | hadoop-hdfs-client in trunk failed. |
| +1 | shadedclient | 24m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 10m 41s | trunk passed |
| +1 | javadoc | 3m 25s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 34s | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 57s | the patch passed |
| -1 | compile | 23m 20s | root in the patch failed. |
| -1 | cc | 23m 20s | root in the patch failed. |
| -1 | javac | 23m 20s | root in the patch failed. |
| -0 | checkstyle | 0m 43s | The patch fails to run checkstyle in root |
| -1 | mvnsite | 0m 46s | hadoop-common in the patch failed. |
| -1 | mvnsite | 0m 48s | hadoop-hdfs-client in the patch failed. |
| -1 | mvnsite | 0m 41s | hadoop-hdfs in the patch failed. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 3s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 0m 42s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 38s | hadoop-common in the patch failed. |
| +1 | javadoc | 2m 48s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 9m 38s | hadoop-common in the patch passed. |
| -1 | unit | 0m 35s | hadoop-hdfs-client in the patch failed. |
| -1 | unit | 0m 27s | hadoop-hdfs in the patch failed. |
| 0 | asflicense | 0m 40s | ASF License check generated no output? |
| | | 150m 5s |
[jira] [Commented] (HDFS-15111) stopStandbyServices() should log which service state it is transitioning from.
[ https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043418#comment-17043418 ] Xieming Li commented on HDFS-15111: --- Sorry for not noticing the update for a long time. I am also OK with either way, but I prefer the implementation in v003, mainly because "it shall maintain the present behavior", as Ayush stated. Using an assertion poses a risk of stopping the program, though that possibility is nearly zero. > stopStandbyServices() should log which service state it is transitioning from. > -- > > Key: HDFS-15111 > URL: https://issues.apache.org/jira/browse/HDFS-15111 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, logging >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie++ > Attachments: HDFS-15111.001.patch, HDFS-15111.002.patch, > HDFS-15111.003.patch > > > Trying to transition Observer to Standby state. {{stopStandbyServices()}} > logs that it is "Stopping services started for standby state". It should be > "Stopping services started for observer state" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
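To make the trade-off in this comment concrete, here is a minimal sketch of building the log message from the state being left, with a fallback instead of an assertion. It is not taken from any of the attached patches, whose contents are not shown in this thread; the class `StopServicesLogSketch`, the `State` enum, and `stopMessage` are hypothetical names. An unknown (null) state falls back to the existing "standby" wording, so the present behavior is preserved and nothing can abort the program.

```java
import java.util.Locale;

// Hypothetical illustration class; not the NameNode code.
final class StopServicesLogSketch {
  enum State { STANDBY, OBSERVER }

  /** Builds the log line for stopping services, naming the state being left. */
  static String stopMessage(State from) {
    // Fall back to the old wording rather than asserting on an unexpected state.
    State effective = (from == null) ? State.STANDBY : from;
    return "Stopping services started for "
        + effective.name().toLowerCase(Locale.ROOT) + " state";
  }

  public static void main(String[] args) {
    System.out.println(stopMessage(State.OBSERVER));
    System.out.println(stopMessage(null));
  }
}
```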
[jira] [Updated] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-15166: -- Attachment: HDFS-15166.000.patch Status: Patch Available (was: Open) > Remove redundant field fStream in ByteStringLog > --- > > Key: HDFS-15166 > URL: https://issues.apache.org/jira/browse/HDFS-15166 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie, newbie++ > Attachments: HDFS-15166.000.patch > > > {{ByteStringLog.fStream}} is only used in {{init()}} method and can be > replaced by a local variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15189) update jackon-databind version
[ https://issues.apache.org/jira/browse/HDFS-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated HDFS-15189: - Description: according to [CVE-2020-8840|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840], maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?* (was: according to [CVE-2020-8840]([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840]), maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?*) > update jackon-databind version > -- > > Key: HDFS-15189 > URL: https://issues.apache.org/jira/browse/HDFS-15189 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: angerszhu >Priority: Major > > according to > [CVE-2020-8840|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840], > maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15025) Applying NVDIMM storage media to HDFS
[ https://issues.apache.org/jira/browse/HDFS-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043375#comment-17043375 ] Hadoop QA commented on HDFS-15025: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-15025 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15025 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12989371/NVDIMM_patch%28WIP%29.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28834/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Applying NVDIMM storage media to HDFS > - > > Key: HDFS-15025 > URL: https://issues.apache.org/jira/browse/HDFS-15025 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, hdfs >Reporter: hadoop_hdfs_hw >Priority: Major > Attachments: Applying NVDIMM to HDFS.pdf, NVDIMM_patch(WIP).patch > > > NVDIMM is non-volatile memory that is faster than SSD and can be used > alongside RAM, DISK, and SSD. Storing HDFS data directly on > NVDIMM can not only improve the response rate of HDFS but also ensure the > reliability of the data.
[jira] [Commented] (HDFS-15189) update jackon-databind version
[ https://issues.apache.org/jira/browse/HDFS-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043376#comment-17043376 ] angerszhu commented on HDFS-15189: -- cc [~hexiaoqiao], could you help ping someone who is familiar with this?
[jira] [Updated] (HDFS-15189) update jackon-databind version
[ https://issues.apache.org/jira/browse/HDFS-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated HDFS-15189: - Description: according to [CVE-2020-8840]([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840]), maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?* (was: according to [*CVE-2020-8840]([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840]), maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?*) > update jackon-databind version > -- > > Key: HDFS-15189 > URL: https://issues.apache.org/jira/browse/HDFS-15189 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: angerszhu >Priority: Major > > according to > [CVE-2020-8840]([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840]), > maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15189) update jackon-databind version
angerszhu created HDFS-15189: Summary: update jackon-databind version Key: HDFS-15189 URL: https://issues.apache.org/jira/browse/HDFS-15189 Project: Hadoop HDFS Issue Type: Improvement Reporter: angerszhu according to *CVE-2020-8840, maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15189) update jackon-databind version
[ https://issues.apache.org/jira/browse/HDFS-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated HDFS-15189: - Description: according to [*CVE-2020-8840]([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840]), maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?* (was: according to *CVE-2020-8840, maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?*) > update jackon-databind version > -- > > Key: HDFS-15189 > URL: https://issues.apache.org/jira/browse/HDFS-15189 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: angerszhu >Priority: Major > > according to > [*CVE-2020-8840]([https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8840]), > maybe we should update jackson-databind to 2.9.10.3 or 2.10.x?* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15025) Applying NVDIMM storage media to HDFS
[ https://issues.apache.org/jira/browse/HDFS-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hadoop_hdfs_hw updated HDFS-15025: -- Attachment: (was: HDFS-15025.001.patch)
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043367#comment-17043367 ] Stephen O'Donnell commented on HDFS-14854: -- Hi [~ayushtkn], there are quite a few factors to consider here. Firstly, decommission speed is mostly dictated by the speed at which the replication manager works, and nothing about that has changed with this patch. One thing we did attempt was to ensure that the blocks are shuffled, so that if the decommissioning node has many disks, it does not pick blocks only from the same disk, which is what the original monitor did. Depending on your settings for max-streams and work-multiplier, there may not be enough blocks moving at the same time to saturate a disk, and therefore you would not see any benefit from this. This change should also result in less load on the NN: it scans the blocks less often to check whether they have completed replication, and it does a lot of its work under the read lock rather than the write lock. For maintenance, it should also result in nodes going into maintenance faster, provided they don't need to do any replication. Even if replication happens at the same speed, the new monitor should use less CPU and cause less lock contention on the NN, which is a good thing, but very hard to measure.
> Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: 012_to_013_changes.diff, > Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, HDFS-14854.002.patch, > HDFS-14854.003.patch, HDFS-14854.004.patch, HDFS-14854.005.patch, > HDFS-14854.006.patch, HDFS-14854.007.patch, HDFS-14854.008.patch, > HDFS-14854.009.patch, HDFS-14854.010.patch, HDFS-14854.011.patch, > HDFS-14854.012.patch, HDFS-14854.013.patch, HDFS-14854.014.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes floods the replication queue and under-replicated > blocks from a future node or disk failure may wait for a long time before they > are replicated. > * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicated, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled giving the option of the existing implementation or the new one. > I will attach a pdf with some more details on the design and then a version 1 > patch shortly.
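The idea of not draining one disk at a time can be illustrated with a small sketch. Note the substitution: the comment above speaks of shuffling blocks, whereas this example uses a deterministic round-robin interleave across per-disk lists because it is easier to check by hand. The class `DiskInterleaveSketch` and the method `interleave` are hypothetical and are not taken from the HDFS-14854 patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class DiskInterleaveSketch {
  /**
   * Takes one list of blocks per disk and returns a single ordering that
   * cycles through the disks, so consecutive blocks come from different disks
   * for as long as more than one disk still has blocks left.
   */
  static <T> List<T> interleave(List<List<T>> blocksPerDisk) {
    List<Iterator<T>> iterators = new ArrayList<>();
    for (List<T> disk : blocksPerDisk) {
      iterators.add(disk.iterator());
    }
    List<T> ordered = new ArrayList<>();
    boolean progressed = true;
    while (progressed) {
      progressed = false;
      for (Iterator<T> it : iterators) {
        if (it.hasNext()) {
          ordered.add(it.next());
          progressed = true;
        }
      }
    }
    return ordered;
  }
}
```

With three disks holding `[a1, a2, a3]`, `[b1]`, and `[c1, c2]`, the resulting order is `[a1, b1, c1, a2, c2, a3]`. Whether this spreads load in practice still depends on settings such as max-streams and work-multiplier, as the comment points out.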
[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043360#comment-17043360 ] Yao Guangdong commented on HDFS-15186: -- [~ayushtkn], Thank you very much. I will fix it as soon as possible. :D > Erasure Coding: Decommission may generate the parity block's content with all > 0 in some case > > > Key: HDFS-15186 > URL: https://issues.apache.org/jira/browse/HDFS-15186 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Affects Versions: 3.0.3, 3.2.1, 3.1.3 >Reporter: Yao Guangdong >Assignee: Yao Guangdong >Priority: Critical > Attachments: HDFS-15186.001.patch > > > I found parity blocks whose content is all zeros after decommissioning several > DataNodes (more than one) from a cluster, and the probability is high (parts > per thousand). This is a serious problem: if we read data from an all-zero > parity block, or use it to recover another block, we end up using corrupt data > without knowing it. > Some example cases are listed below, where B marks a busy DataNode, D marks a > decommissioning DataNode, and all other nodes are normal: > 1. Group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)]. > 2. Group indices are [0(B,D), 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)]. > > In the first case, where the block group indices are [0, 1, 2, 3, 4, 5, 6(B,D), > 7, 8(D)], the DN may receive a block reconstruction command with > liveIndices=[0, 1, 2, 3, 4, 5, 7, 8] and a targets field (in the class > StripedReconstructionInfo) of length 2. > In the current code, a targets length of 2 means that the DataNode needs to > recover 2 internal blocks. But from liveIndices we can find only 1 missing > block, so the method StripedWriter#initTargetIndices uses 0 as the default > block to recover, without checking whether index 0 is already among the > source indices.
> The EC algorithm is then invoked with source indices [0, 1, 2, 3, 4, 5] to recover indices [6, 0], > so index 0 appears in both the source indices and the target indices. In that case the > target buffer returned by the EC algorithm for index 6 is always all zeros. > At first I thought this was a problem in the EC algorithm, which should be more fault > tolerant, and I tried to fix it there. That turned out to be too hard because there are > too many cases; the second example above is one of them (source indices > [1, 2, 3, 4, 5, 7] are used to recover indices [0, 6, 0]). > So I changed my approach: invoke the EC algorithm with correct parameters, which here > means removing the duplicate target index 0. That is how I finally fixed it. >
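The index arithmetic in the HDFS-15186 description can be reproduced with a small sketch. Everything here is an illustration under assumptions: `TargetIndicesSketch`, `paddedTargets`, and `missingTargets` are hypothetical names, and `paddedTargets` only mimics the reported symptom (spare target slots reading as index 0 because Java zero-fills arrays). It is not the real `StripedWriter#initTargetIndices`, and `missingTargets` is not the attached patch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class TargetIndicesSketch {
  private static BitSet toBitSet(int[] liveIndices, int totalBlocks) {
    BitSet live = new BitSet(totalBlocks);
    for (int index : liveIndices) {
      live.set(index);
    }
    return live;
  }

  /** Mimics the reported symptom: the array is sized by the target count. */
  static int[] paddedTargets(int[] liveIndices, int totalBlocks, int targetCount) {
    BitSet live = toBitSet(liveIndices, totalBlocks);
    int[] targets = new int[targetCount]; // zero-filled, so unused slots read as index 0
    int filled = 0;
    for (int i = 0; i < totalBlocks && filled < targetCount; i++) {
      if (!live.get(i)) {
        targets[filled++] = i;
      }
    }
    return targets;
  }

  /** Returns only indices that are really missing, so no index appears twice. */
  static List<Integer> missingTargets(int[] liveIndices, int totalBlocks, int maxTargets) {
    BitSet live = toBitSet(liveIndices, totalBlocks);
    List<Integer> targets = new ArrayList<>();
    for (int i = 0; i < totalBlocks && targets.size() < maxTargets; i++) {
      if (!live.get(i)) {
        targets.add(i);
      }
    }
    return targets;
  }
}
```

For the first case (9 internal blocks, liveIndices `[0, 1, 2, 3, 4, 5, 7, 8]`, 2 targets), `paddedTargets` yields `[6, 0]`, which matches the indices quoted in the description, while `missingTargets` yields `[6]`. For the second case, assuming liveIndices `[1, 2, 3, 4, 5, 7, 8]` and 3 targets, the results are `[0, 6, 0]` and `[0, 6]` respectively.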
[jira] [Updated] (HDFS-15188) Add option to set Write/Read timeout extension for different StorageType
[ https://issues.apache.org/jira/browse/HDFS-15188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yun updated HDFS-15188: Priority: Minor (was: Major) > Add option to set Write/Read timeout extension for different StorageType > > > Key: HDFS-15188 > URL: https://issues.apache.org/jira/browse/HDFS-15188 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, dfsclient >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15188.patch > > > Different storage types have different speeds. Errors are often reported > under the current timeout, especially for low-speed Archive volumes. Add a > unified way to set these options for each StorageType.
[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043353#comment-17043353 ] Ayush Saxena commented on HDFS-15186: - [~yaoguangdong], he hasn't fixed this problem; he fixed a similar one, where the live node was busy. In your problem, the decommissioning node is busy. You can fix this problem in a similar way to what HDFS-14768 did for live busy nodes.
[jira] [Comment Edited] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043286#comment-17043286 ] Yao Guangdong edited comment on HDFS-15186 at 2/24/20 10:06 AM: [~ayushtkn], OK. Thanks for your advice. [~gjhkael] has already fixed it on the NameNode side in HDFS-14768. I think this issue is a duplicate; you can close it. was (Author: yaoguangdong): [~ayushtkn], OK. Thanks for your reply. [~gjhkael] has already fixed it on the NameNode side in HDFS-14768. I think this issue is a duplicate; you can close it.
[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043286#comment-17043286 ] Yao Guangdong commented on HDFS-15186: -- [~ayushtkn], OK. Thanks for your reply. [~gjhkael] has already fixed it on the NameNode side in HDFS-14768. I think this issue is a duplicate; you can close it.
[jira] [Commented] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.
[ https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043268#comment-17043268 ] Ayush Saxena commented on HDFS-15120: - Thanks [~LiJinglun]. I had a quick look but couldn't check in full. Can we avoid taking the lock? This would be part of the normal write flow. I am not sure how much impact it has, but taking the read lock adds some cost, however trivial, and changing the BPP at runtime would be a very rare scenario. I don't think we should pass its cost on to any other basic process. Can we get rid of it? I think we can tolerate some latency in the BPP update when we reconfigure it at runtime. Secondly, can we just replace the {{placementPolicies}} variable in {{BlockManager}} with a new one, rather than changing it internally? > Refresh BlockPlacementPolicy at runtime. > > > Key: HDFS-15120 > URL: https://issues.apache.org/jira/browse/HDFS-15120 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch, > HDFS-15120.003.patch, HDFS-15120.004.patch, HDFS-15120.005.patch, > HDFS-15120.006.patch > > > Currently, if we want to switch BlockPlacementPolicies, we need to restart the > NameNode. It would be convenient if we could switch at runtime. For example, > we could switch between AvailableSpaceBlockPlacementPolicy and > BlockPlacementPolicyDefault as needed.
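The suggestion to replace the policy reference instead of locking on the write path could look roughly like the sketch below. It is a minimal illustration, assuming an atomic reference swap is acceptable: `PlacementPolicySketch` and `PolicyHolderSketch` are hypothetical names and do not reflect the real `BlockManager` or `BlockPlacementPolicy` APIs. Only `java.util.concurrent.atomic.AtomicReference` is a real library class.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-method stand-in for a block placement policy. */
interface PlacementPolicySketch {
  String chooseTarget(String block);
}

final class PolicyHolderSketch {
  private final AtomicReference<PlacementPolicySketch> current;

  PolicyHolderSketch(PlacementPolicySketch initial) {
    this.current = new AtomicReference<>(initial);
  }

  /** Hot path: a single volatile read, with no lock taken. */
  String place(String block) {
    return current.get().chooseTarget(block);
  }

  /** Rare path: swap in a fully constructed replacement policy. */
  void refresh(PlacementPolicySketch replacement) {
    current.set(replacement);
  }
}
```

A caller that read the old reference just before `refresh` finishes its call with the old policy, which is the kind of short update latency the comment says is tolerable.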
[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043266#comment-17043266 ] Ayush Saxena commented on HDFS-15186: - Decommissioning taking a long time is a separate aspect; we can't mix it in here. We can't directly compare EC with 3x replication, as both have their pros and cons. Maybe we can track that separately and discuss the decommissioning time issue there.
[jira] [Updated] (HDFS-15025) Applying NVDIMM storage media to HDFS
[ https://issues.apache.org/jira/browse/HDFS-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hadoop_hdfs_hw updated HDFS-15025: -- Attachment: HDFS-15025.001.patch Release Note: Adds the new NVDIMM storage media and the ALL_NVDIMM storage policy to HDFS, including test code for them. Tags: HDFS, NVDIMM Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043257#comment-17043257 ] Ayush Saxena commented on HDFS-14854: - I tried decommissioning with this patch, using 30 lakh (3 million) blocks and 3 datanodes. Sadly, it didn't save any time for me. To my surprise, the time taken with and without it was exactly the same.
[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043209#comment-17043209 ] Andrea commented on HDFS-15098: --- Which Hadoop version and OpenSSL version can this patch be used with? > Add SM4 encryption method for HDFS > -- > > Key: HDFS-15098 > URL: https://issues.apache.org/jira/browse/HDFS-15098 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: liusheng >Priority: Major > Attachments: HDFS-15098.001.patch > > > SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard > for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure). > SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far > been rejected by ISO. One of the reasons for the rejection has been > opposition to the WAPI fast-track proposal by the IEEE. Please see: > [https://en.wikipedia.org/wiki/SM4_(cipher)]