[jira] [Comment Edited] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-08 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925383#comment-16925383
 ] 

Surendra Singh Lilhore edited comment on HDFS-14699 at 9/9/19 5:55 AM:
---

[~zhaoyim], I just want to move {{liveBlockIndices.add(blockIndex);}} above 
the threshold check, not the source DN selection logic. 

Something like this:
{code:java}
if (isStriped) {
  // Record the live block index before the hard-limit check below,
  // so this storage still counts as a live replica.
  liveBlockIndices.add(blockIndex);
}

// A DN that has reached the replication streams hard limit is skipped
// as a reconstruction source.
if (node.getNumberOfBlocksToBeReplicated() >= replicationStreamsHardLimit) {
  continue;
}

if (isStriped || srcNodes.isEmpty()) {
  srcNodes.add(node);
}{code}


was (Author: surendrasingh):
[~zhaoyim], I just want to move {{liveBlockIndices.add(blockIndex);}} above 
the threshold check, not the source DN selection logic. 

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the testing internal blocks.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 internal 
> blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> environment. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-08 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925383#comment-16925383
 ] 

Surendra Singh Lilhore commented on HDFS-14699:
---

[~zhaoyim], I just want to move {{liveBlockIndices.add(blockIndex);}} above 
the threshold check, not the source DN selection logic. 

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the testing internal blocks.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 internal 
> blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> environment. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-09-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925380#comment-16925380
 ] 

Hadoop QA commented on HDFS-14303:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} HDFS-14303 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14303 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979602/HDFS-14303-addendnum-branch-2.01.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27821/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 3.2.0, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-addendnum-branch-2.01.patch, 
> HDFS-14303-addendum-01.patch, HDFS-14303-addendum-02.patch, 
> HDFS-14303-branch-2.005.patch, HDFS-14303-branch-2.009.patch, 
> HDFS-14303-branch-2.010.patch, HDFS-14303-branch-2.015.patch, 
> HDFS-14303-branch-2.017.patch, HDFS-14303-branch-2.7.001.patch, 
> HDFS-14303-branch-2.7.004.patch, HDFS-14303-branch-2.7.006.patch, 
> HDFS-14303-branch-2.9.011.patch, HDFS-14303-branch-2.9.012.patch, 
> HDFS-14303-branch-2.9.013.patch, HDFS-14303-trunk.014.patch, 
> HDFS-14303-trunk.015.patch, HDFS-14303-trunk.016.patch, 
> HDFS-14303-trunk.016.path, HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68
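
As an illustration of the intended behavior, here is a hedged sketch (not the actual DirectoryScanner code; the method and parameter names are hypothetical): skip the layout-upgrade warning when only the meta file exists, and warn only when a real block file sits in an unexpected directory.

{code:java}
import java.io.File;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LayoutCheckSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LayoutCheckSketch.class);

  /**
   * Warn about a wrong directory layout only when there is a block file to
   * compare; a meta-only entry (blockFile == null) is skipped silently.
   */
  static void checkBlockDirLayout(long blockId, File blockFile, File expectedDir) {
    if (blockFile == null) {
      // Only the .meta file exists; nothing to compare, so no warning.
      return;
    }
    if (!expectedDir.equals(blockFile.getParentFile())) {
      LOG.warn("Block: {} has to be upgraded to block ID-based layout. "
          + "Actual block file path: {}, expected block file path: {}",
          blockId, blockFile.getParent(), expectedDir);
    }
  }
}
{code}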



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-09-08 Thread He Xiaoqiao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925377#comment-16925377
 ] 

He Xiaoqiao commented on HDFS-14303:


[~iamgd67], I just tried to trigger Jenkins; please wait for a while.

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 3.2.0, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-addendnum-branch-2.01.patch, 
> HDFS-14303-addendum-01.patch, HDFS-14303-addendum-02.patch, 
> HDFS-14303-branch-2.005.patch, HDFS-14303-branch-2.009.patch, 
> HDFS-14303-branch-2.010.patch, HDFS-14303-branch-2.015.patch, 
> HDFS-14303-branch-2.017.patch, HDFS-14303-branch-2.7.001.patch, 
> HDFS-14303-branch-2.7.004.patch, HDFS-14303-branch-2.7.006.patch, 
> HDFS-14303-branch-2.9.011.patch, HDFS-14303-branch-2.9.012.patch, 
> HDFS-14303-branch-2.9.013.patch, HDFS-14303-trunk.014.patch, 
> HDFS-14303-trunk.015.patch, HDFS-14303-trunk.016.patch, 
> HDFS-14303-trunk.016.path, HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14373) EC : Decoding is failing when block group last incomplete cell fall in to AlignedStripe

2019-09-08 Thread Zhao Yi Ming (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925376#comment-16925376
 ] 

Zhao Yi Ming commented on HDFS-14373:
-

[~surendrasingh] Thanks for your steps! They helped us understand the issue. We are 
also waiting for your fix, and thanks for working on this! :)

> EC : Decoding is failing when block group last incomplete cell fall in to 
> AlignedStripe
> ---
>
> Key: HDFS-14373
> URL: https://issues.apache.org/jira/browse/HDFS-14373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-09-08 Thread He Xiaoqiao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-14303:
---
Target Version/s: 2.9.2, 3.2.0  (was: 3.2.0, 2.9.2)
  Status: Patch Available  (was: Reopened)

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.8.5, 2.9.2, 3.2.0, 2.7.3
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-addendnum-branch-2.01.patch, 
> HDFS-14303-addendum-01.patch, HDFS-14303-addendum-02.patch, 
> HDFS-14303-branch-2.005.patch, HDFS-14303-branch-2.009.patch, 
> HDFS-14303-branch-2.010.patch, HDFS-14303-branch-2.015.patch, 
> HDFS-14303-branch-2.017.patch, HDFS-14303-branch-2.7.001.patch, 
> HDFS-14303-branch-2.7.004.patch, HDFS-14303-branch-2.7.006.patch, 
> HDFS-14303-branch-2.9.011.patch, HDFS-14303-branch-2.9.012.patch, 
> HDFS-14303-branch-2.9.013.patch, HDFS-14303-trunk.014.patch, 
> HDFS-14303-trunk.015.patch, HDFS-14303-trunk.016.patch, 
> HDFS-14303-trunk.016.path, HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-08 Thread Zhao Yi Ming (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925373#comment-16925373
 ] 

Zhao Yi Ming commented on HDFS-14699:
-

[~ayushtkn] [~surendrasingh] Thanks for your review!

 

[~surendrasingh] 

1. I changed the issue title per your suggestion. Thanks!

2. The reason is that the EC reconstruction work would no longer be controlled by 
the replicationStreamsHardLimit configuration if we moved 
liveBlockIndices.add(blockIndex) before the following block. If a DN serves as the 
source node for many EC reconstruction tasks, all of those tasks need to read 
data from it and its resource usage will be high. The current fix keeps the DN 
under replicationStreamsHardLimit control: once the DN reaches the 
replicationStreamsHardLimit threshold, it will NOT be added to the source 
node list.

 

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the testing internal blocks.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 internal 
> blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> environment. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-09-08 Thread qiang Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925366#comment-16925366
 ] 

qiang Liu commented on HDFS-14303:
--

[~shv] [~ayushtkn] sorry for the delayed response. Thanks [~hexiaoqiao] for the patch 
[^HDFS-14303-addendnum-branch-2.01.patch]; it looks good and should fix 
the test failure. I just wonder why [~hadoopqa] did not trigger a build.

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 3.2.0, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-addendnum-branch-2.01.patch, 
> HDFS-14303-addendum-01.patch, HDFS-14303-addendum-02.patch, 
> HDFS-14303-branch-2.005.patch, HDFS-14303-branch-2.009.patch, 
> HDFS-14303-branch-2.010.patch, HDFS-14303-branch-2.015.patch, 
> HDFS-14303-branch-2.017.patch, HDFS-14303-branch-2.7.001.patch, 
> HDFS-14303-branch-2.7.004.patch, HDFS-14303-branch-2.7.006.patch, 
> HDFS-14303-branch-2.9.011.patch, HDFS-14303-branch-2.9.012.patch, 
> HDFS-14303-branch-2.9.013.patch, HDFS-14303-trunk.014.patch, 
> HDFS-14303-trunk.015.patch, HDFS-14303-trunk.016.patch, 
> HDFS-14303-trunk.016.path, HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> The check-block-directory logic is not correct when there is only a meta file; 
> it prints a meaningless warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-08 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925365#comment-16925365
 ] 

Siddharth Wagle commented on HDDS-1868:
---

Hi [~ljain], in my unit test I am able to get the roleInfoProto for the 
follower. Here is my test debug log.

{code}
2019-09-06 15:19:23,989 [Thread-0] INFO  ratis.XceiverServerRatis 
(XceiverServerRatis.java:getPipelineReport(582)) - - GroupInfoReply => self 
{
  id: "d5bdb10c-214e-4cae-819a-d560a085adbf"
  address: "0.0.0.0:51226"
}
role: FOLLOWER
roleElapsedTimeMs: 128
followerInfo {
  leaderInfo {
id {
  id: "a7a440c4-0b3b-45b2-9d5a-336a32af5742"
  address: "10.22.8.51:51227"
}
lastRpcElapsedTimeMs: 8
  }
  outstandingOp: 1
}
{code}

Working on finishing off the proper unit test for the change.
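
For context, a minimal sketch of the readiness rule this enables, assuming the role reported in the pipeline report is available as in the log above; the types and method here are illustrative stand-ins, not the actual SCM/Ratis API:

{code:java}
// Illustrative only: decide whether a pipeline can be moved to OPEN based on
// the roles reported by its datanodes, as in the GroupInfoReply above.
enum RaftRole { LEADER, FOLLOWER, CANDIDATE }

final class PipelineReadinessSketch {
  /**
   * A pipeline is considered ready only once some member reports itself as
   * LEADER, i.e. leader election has completed.
   */
  static boolean readyToOpen(java.util.List<RaftRole> reportedRoles) {
    return reportedRoles.contains(RaftRole.LEADER);
  }

  public static void main(String[] args) {
    System.out.println(readyToOpen(java.util.List.of(RaftRole.FOLLOWER, RaftRole.FOLLOWER))); // false
    System.out.println(readyToOpen(java.util.List.of(RaftRole.LEADER, RaftRole.FOLLOWER)));   // true
  }
}
{code}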

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch
>
>
> On restart, Ozone pipelines start in the allocated state and are moved into the 
> open state after all the pipeline members have reported. However, this can 
> potentially lead to an issue where the pipeline is still not ready to accept any 
> incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-08 Thread Zhao Yi Ming (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14699:

Summary: Erasure Coding: Storage not considered in live replica when 
replication streams hard limit reached to threshold  (was: Erasure Coding: Can 
NOT trigger the reconstruction when have the dup internal blocks and missing 
one internal block)

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the testing internal blocks.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 internal 
> blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> environment. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14834) in some particular situation, datanode will always in DECOMMISSION_INPROGRESS state

2019-09-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925344#comment-16925344
 ] 

Ayush Saxena commented on HDFS-14834:
-

Please check whether HDFS-14754 solves your problem.

> in some particular situation, datanode will always in DECOMMISSION_INPROGRESS 
> state
> ---
>
> Key: HDFS-14834
> URL: https://issues.apache.org/jira/browse/HDFS-14834
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.2
> Environment: Policy:RS-6-3-1024K
> Version:3.1.2
>Reporter: janick.wu
>Priority: Major
>
> The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
> increase the index 5 datanode's pendingReplicationWithoutTargets.
> After reconstruction of index 5, the block status is:
> ||index||isDecommissionInProgress||state||
> |0|               false|LIVE|
> |1|               false|LIVE|
> |2|               false|LIVE|
> |3|               true|DECOMMISSIONING|
> |4|               false|LIVE|
> |5|               false|LIVE|
> |6|               false|LIVE|
> |7|               false|LIVE|
> |8|               false|LIVE|
> |5|               false|LIVE|
>  
> In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) 
> calculates the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, 
> redundant: 1.
> It is a low-redundancy block, so it is put into the queue to wait for scheduling.
> In the BlockManager.RedundancyMonitor thread, the live bitset is \{0, 1, 2, 3, 4, 6, 
> 7, 8}, liveReplicas: 9, redundant: 0.
> The block waiting for replication is removed from the queue because the 
> liveReplicas satisfies the expected redundancy.
>  This situation leaves index [3]'s datanode always in the 
> DECOMMISSION_INPROGRESS state.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14528) Failover from Active to Standby Failed

2019-09-08 Thread Ravuri Sushma sree (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925119#comment-16925119
 ] 

Ravuri Sushma sree edited comment on HDFS-14528 at 9/9/19 4:28 AM:
---

Patch has been uploaded. Please review.

The above test failures aren't related.


was (Author: sushma_28):
The above test failures aren't related.

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.2.Patch, ZKFC_issue.patch
>
>
>  *In a cluster with more than one Standby NameNode, manual failover throws an 
> exception in some cases.*
> *When trying to execute the failover command from active to standby,* 
> *_./hdfs haadmin -failover nn1 nn2, the exception below is thrown:_*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
>  Scenario 1: 
> NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the 
> exception is thrown.
> Scenario 2:
>  NameNodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> ZKFCs -               ZKFC1,            ZKFC2,            ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, the exception is thrown.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14834) in some particular situation, datanode will always in DECOMMISSION_INPROGRESS state

2019-09-08 Thread janick.wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

janick.wu updated HDFS-14834:
-
Description: 
The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
increase the index 5 datanode's pendingReplicationWithoutTargets.

After reconstruction of index 5, the block status is:
||index||isDecommissionInProgress||state||
|0|               false|LIVE|
|1|               false|LIVE|
|2|               false|LIVE|
|3|               true|DECOMMISSIONING|
|4|               false|LIVE|
|5|               false|LIVE|
|6|               false|LIVE|
|7|               false|LIVE|
|8|               false|LIVE|
|5|               false|LIVE|

 

In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) calculates 
the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, redundant: 1.

It is a low-redundancy block, so it is put into the queue to wait for scheduling.

In the BlockManager.RedundancyMonitor thread, the live bitset is \{0, 1, 2, 3, 4, 6, 7, 
8}, liveReplicas: 9, redundant: 0.

The block waiting for replication is removed from the queue because the 
liveReplicas satisfies the expected redundancy.

 And this situation leaves index [3]'s datanode always in the 
DECOMMISSION_INPROGRESS state.

 

  was:
The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
increase the index 5 datanode's pendingReplicationWithoutTargets.

After reconstruction of index 5, the block status is:
||index||isDecommissionInProgress||state||
|0|               false|LIVE|
|1|               false|LIVE|
|2|               false|LIVE|
|3|               true|DECOMMISSIONING|
|4|               false|LIVE|
|5|               false|LIVE|
|6|               false|LIVE|
|7|               false|LIVE|
|8|               false|LIVE|
|5|               false|LIVE|

 

In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) calculates 
the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, redundant: 1.

It is a low-redundancy block, so it is put into the queue to wait for scheduling.

In the BlockManager.RedundancyMonitor thread, the live bitset is \{0, 1, 2, 3, 4, 6, 7, 
8}, liveReplicas: 9, redundant: 0.

The block waiting for replication is removed from the queue because the 
liveReplicas satisfies the expected redundancy.

 

 


> in some particular situation, datanode will always in DECOMMISSION_INPROGRESS 
> state
> ---
>
> Key: HDFS-14834
> URL: https://issues.apache.org/jira/browse/HDFS-14834
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.2
> Environment: Policy:RS-6-3-1024K
> Version:3.1.2
>Reporter: janick.wu
>Priority: Major
>
> The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
> increase the index 5 datanode's pendingReplicationWithoutTargets.
> After reconstruction of index 5, the block status is:
> ||index||isDecommissionInProgress||state||
> |0|               false|LIVE|
> |1|               false|LIVE|
> |2|               false|LIVE|
> |3|               true|DECOMMISSIONING|
> |4|               false|LIVE|
> |5|               false|LIVE|
> |6|               false|LIVE|
> |7|               false|LIVE|
> |8|               false|LIVE|
> |5|               false|LIVE|
>  
> In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) 
> calculates the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, 
> redundant: 1.
> It is a low-redundancy block, so it is put into the queue to wait for scheduling.
> In the BlockManager.RedundancyMonitor thread, the live bitset is \{0, 1, 2, 3, 4, 6, 
> 7, 8}, liveReplicas: 9, redundant: 0.
> The block waiting for replication is removed from the queue because the 
> liveReplicas satisfies the expected redundancy.
>  This situation leaves index [3]'s datanode always in the 
> DECOMMISSION_INPROGRESS state.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14834) in some particular situation, datanode will always in DECOMMISSION_INPROGRESS state

2019-09-08 Thread janick.wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

janick.wu updated HDFS-14834:
-
Description: 
The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
increase the index 5 datanode's pendingReplicationWithoutTargets.

After reconstruction of index 5, the block status is:
||index||isDecommissionInProgress||state||
|0|               false|LIVE|
|1|               false|LIVE|
|2|               false|LIVE|
|3|               true|DECOMMISSIONING|
|4|               false|LIVE|
|5|               false|LIVE|
|6|               false|LIVE|
|7|               false|LIVE|
|8|               false|LIVE|
|5|               false|LIVE|

 

In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) calculates 
the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, redundant: 1.

It is a low-redundancy block, so it is put into the queue to wait for scheduling.

In the BlockManager.RedundancyMonitor thread, the live bitset is \{0, 1, 2, 3, 4, 6, 7, 
8}, liveReplicas: 9, redundant: 0.

The block waiting for replication is removed from the queue because the 
liveReplicas satisfies the expected redundancy.

 

 

  was:
The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
increase the index 5 datanode's pendingReplicationWithoutTargets.

After reconstruction of index 5, the block status is:

 
||index||isDecommissionInProgress||state||
|0|               false|LIVE|
|1|               false|LIVE|
|2|               false|LIVE|
|3|               true|DECOMMISSIONING|
|4|               false|LIVE|
|5|               false|LIVE|
|6|               false|LIVE|
|7|               false|LIVE|
|8|               false|LIVE|
|5|               false|LIVE|

 

In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) calculates 
the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, redundant: 1.

It is a low-redundancy block, so it is put into the queue to wait for scheduling.

In the BlockManager.RedundancyMonitor thread, the 
live bitset is \{0, 1, 2, 3, 4, 6, 7, 8}, liveReplicas: 9, redundant: 0.

The block waiting for replication is removed from the queue because the 
liveReplicas satisfies the expected redundancy.

 

 


> in some particular situation, datanode will always in DECOMMISSION_INPROGRESS 
> state
> ---
>
> Key: HDFS-14834
> URL: https://issues.apache.org/jira/browse/HDFS-14834
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.2
> Environment: Policy:RS-6-3-1024K
> Version:3.1.2
>Reporter: janick.wu
>Priority: Major
>
> The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
> increase the index 5 datanode's pendingReplicationWithoutTargets.
> After reconstruction of index 5, the block status is:
> ||index||isDecommissionInProgress||state||
> |0|               false|LIVE|
> |1|               false|LIVE|
> |2|               false|LIVE|
> |3|               true|DECOMMISSIONING|
> |4|               false|LIVE|
> |5|               false|LIVE|
> |6|               false|LIVE|
> |7|               false|LIVE|
> |8|               false|LIVE|
> |5|               false|LIVE|
>  
> In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) 
> calculates the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, 
> redundant: 1.
> It is a low-redundancy block, so it is put into the queue to wait for scheduling.
> In the BlockManager.RedundancyMonitor thread, the live bitset is \{0, 1, 2, 3, 4, 6, 
> 7, 8}, liveReplicas: 9, redundant: 0.
> The block waiting for replication is removed from the queue because the 
> liveReplicas satisfies the expected redundancy.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14834) in some particular situation, datanode will always in DECOMMISSION_INPROGRESS state

2019-09-08 Thread janick.wu (Jira)
janick.wu created HDFS-14834:


 Summary: in some particular situation, datanode will always in 
DECOMMISSION_INPROGRESS state
 Key: HDFS-14834
 URL: https://issues.apache.org/jira/browse/HDFS-14834
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.2
 Environment: Policy:RS-6-3-1024K

Version:3.1.2
Reporter: janick.wu


The file's block index is [0,1,2,3,4,5,6,7,8]. I decommission index [3] and  
increase the index 5 datanode's pendingReplicationWithoutTargets.

After reconstruction of index 5, the block status is:

 
||index||isDecommissionInProgress||state||
|0|               false|LIVE|
|1|               false|LIVE|
|2|               false|LIVE|
|3|               true|DECOMMISSIONING|
|4|               false|LIVE|
|5|               false|LIVE|
|6|               false|LIVE|
|7|               false|LIVE|
|8|               false|LIVE|
|5|               false|LIVE|

 

In the DatanodeAdminManager.Monitor thread, blockManager.countNodes(block) calculates 
the live bitset as \{0, 1, 2, 4, 5, 6, 7, 8}, liveReplicas: 8, redundant: 1.

It is a low-redundancy block, so it is put into the queue to wait for scheduling.

In the BlockManager.RedundancyMonitor thread, the 
live bitset is \{0, 1, 2, 3, 4, 6, 7, 8}, liveReplicas: 9, redundant: 0.

The block waiting for replication is removed from the queue because the 
liveReplicas satisfies the expected redundancy.
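
To make the counting discrepancy concrete, here is a small self-contained sketch (plain Java, not the actual DatanodeAdminManager/BlockManager code) of the two views described above:

{code:java}
import java.util.BitSet;

// Illustrative only: an RS-6-3 block group with indices 0..8 where index 3 sits
// on a decommissioning DN. One monitor excludes that replica from the live
// count, the other includes it.
public class DecommissionCountSketch {
  public static void main(String[] args) {
    final int expectedRedundancy = 9;
    final int decommissioningIndex = 3;

    // DatanodeAdminManager.Monitor-style view: {0,1,2,4,5,6,7,8} -> 8 live.
    BitSet monitorLive = new BitSet();
    monitorLive.set(0, 9);
    monitorLive.clear(decommissioningIndex);

    // BlockManager.RedundancyMonitor-style view: the decommissioning replica is
    // also counted -> 9 live.
    int redundancyMonitorLive = monitorLive.cardinality() + 1;

    System.out.println("Monitor: live=" + monitorLive.cardinality()
        + " -> low redundancy, block is queued");                // 8 < 9
    System.out.println("RedundancyMonitor: live=" + redundancyMonitorLive
        + " -> expected redundancy met, block is dropped");      // 9 >= 9
    // Net effect: the block bounces out of the queue and index 3's DN never
    // leaves DECOMMISSION_INPROGRESS.
  }
}
{code}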

 

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2057) Incorrect Default OM Port in Ozone FS URI Error Message

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2057?focusedWorklogId=308602=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308602
 ]

ASF GitHub Bot logged work on HDDS-2057:


Author: ASF GitHub Bot
Created on: 09/Sep/19 04:14
Start Date: 09/Sep/19 04:14
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1377: HDDS-2057. 
Incorrect Default OM Port in Ozone FS URI Error Message. Contributed by 
Supratim Deka
URL: https://github.com/apache/hadoop/pull/1377#issuecomment-529292657
 
 
   Looks like acceptance test failures are related to this, can you once verify 
them, as I have not seen them failing in recent CI runs?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308602)
Time Spent: 1h 10m  (was: 1h)

> Incorrect Default OM Port in Ozone FS URI Error Message
> ---
>
> Key: HDDS-2057
> URL: https://issues.apache.org/jira/browse/HDDS-2057
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The error message displayed from BasicOzoneFilesystem.initialize specifies 
> 5678 as the OM port. This is not the default port.
> "Ozone file system URL " +
>  "should be one of the following formats: " +
>  "o3fs://bucket.volume/key OR " +
>  "o3fs://bucket.volume.om-host.example.com/key OR " +
>  "o3fs://bucket.volume.om-host.example.com:5678/key";
>  
> This should be fixed to pull the default value from the configuration 
> parameter, instead of a hard-coded value.
>  
>  
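
A minimal sketch of the suggested direction (illustrative only, not the actual BasicOzoneFilesystem code): format the message with a port resolved from configuration instead of the literal 5678.

{code:java}
// Illustrative sketch: build the usage message from a port resolved elsewhere
// (e.g. from the value backing "ozone.om.address"), never from a hard-coded
// literal. The 9862 in main() is just a sample argument.
public class OmPortMessageSketch {
  static String usageMessage(int resolvedOmPort) {
    return String.format(
        "Ozone file system URL should be one of the following formats: "
            + "o3fs://bucket.volume/key OR "
            + "o3fs://bucket.volume.om-host.example.com/key OR "
            + "o3fs://bucket.volume.om-host.example.com:%d/key", resolvedOmPort);
  }

  public static void main(String[] args) {
    System.out.println(usageMessage(9862));
  }
}
{code}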



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2057) Incorrect Default OM Port in Ozone FS URI Error Message

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2057?focusedWorklogId=308601=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308601
 ]

ASF GitHub Bot logged work on HDDS-2057:


Author: ASF GitHub Bot
Created on: 09/Sep/19 04:13
Start Date: 09/Sep/19 04:13
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1377: HDDS-2057. 
Incorrect Default OM Port in Ozone FS URI Error Message. Contributed by 
Supratim Deka
URL: https://github.com/apache/hadoop/pull/1377#issuecomment-529292657
 
 
   Looks like acceptance test failures are related to this, can you once verify 
them?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308601)
Time Spent: 1h  (was: 50m)

> Incorrect Default OM Port in Ozone FS URI Error Message
> ---
>
> Key: HDDS-2057
> URL: https://issues.apache.org/jira/browse/HDDS-2057
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The error message displayed from BasicOzoneFilesystem.initialize specifies 
> 5678 as the OM port. This is not the default port.
> "Ozone file system URL " +
>  "should be one of the following formats: " +
>  "o3fs://bucket.volume/key OR " +
>  "o3fs://bucket.volume.om-host.example.com/key OR " +
>  "o3fs://bucket.volume.om-host.example.com:5678/key";
>  
> This should be fixed to pull the default value from the configuration 
> parameter, instead of a hard-coded value.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2098) Ozone shell command prints out ERROR when the log4j file is not present.

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2098?focusedWorklogId=308599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308599
 ]

ASF GitHub Bot logged work on HDDS-2098:


Author: ASF GitHub Bot
Created on: 09/Sep/19 04:04
Start Date: 09/Sep/19 04:04
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1411: HDDS-2098 : 
Ozone shell command prints out ERROR when the log4j file …
URL: https://github.com/apache/hadoop/pull/1411#issuecomment-529291245
 
 
   I have a question: during the ozone tarball build, we copy 
ozone-shell-log4j.properties to etc/hadoop (like we copy log4j.properties), so why 
do we see this error? Or does something need to be fixed in this copying script?
   
   
https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/dev-support/bin/dist-layout-stitching#L95
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308599)
Time Spent: 1h 10m  (was: 1h)

> Ozone shell command prints out ERROR when the log4j file is not present.
> 
>
> Key: HDDS-2098
> URL: https://issues.apache.org/jira/browse/HDDS-2098
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Exception Trace*
> {code}
> log4j:ERROR Could not read configuration file from URL 
> [file:/etc/ozone/conf/ozone-shell-log4j.properties].
> java.io.FileNotFoundException: /etc/ozone/conf/ozone-shell-log4j.properties 
> (No such file or directory)
>   at java.io.FileInputStream.open0(Native Method)
>   at java.io.FileInputStream.open(FileInputStream.java:195)
>   at java.io.FileInputStream.(FileInputStream.java:138)
>   at java.io.FileInputStream.(FileInputStream.java:93)
>   at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>   at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>   at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
>   at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
>   at org.apache.log4j.LogManager.(LogManager.java:127)
>   at org.slf4j.impl.Log4jLoggerFactory.(Log4jLoggerFactory.java:66)
>   at org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:72)
>   at 
> org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:45)
>   at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
>   at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
>   at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:412)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:357)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
>   at org.apache.hadoop.ozone.web.ozShell.Shell.(Shell.java:35)
> log4j:ERROR Ignoring configuration file 
> [file:/etc/ozone/conf/ozone-shell-log4j.properties].
> log4j:WARN No appenders could be found for logger 
> (io.jaegertracing.thrift.internal.senders.ThriftSenderFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> {
>   "metadata" : { },
>   "name" : "vol-test-putfile-1567740142",
>   "admin" : "root",
>   "owner" : "root",
>   "creationTime" : 1567740146501,
>   "acls" : [ {
> "type" : "USER",
> "name" : "root",
> "aclScope" : "ACCESS",
> "aclList" : [ "ALL" ]
>   }, {
> "type" : "GROUP",
> "name" : "root",
> "aclScope" : "ACCESS",
> "aclList" : [ "ALL" ]
>   } ],
>   "quota" : 1152921504606846976
> }
> {code}
> *Fix*
> When a log4j file is not present, the default should be console.
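
A minimal sketch of that fallback, assuming log4j 1.x as in the stack trace above (illustrative, not the actual Ozone shell bootstrap code): configure from the properties file when it exists, otherwise fall back to a console appender.

{code:java}
import java.io.File;
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.PropertyConfigurator;

// Illustrative sketch: default to console logging when the shell log4j
// properties file is missing, instead of printing log4j:ERROR noise.
public class ShellLogInitSketch {
  static void initLogging(String log4jPath) {
    File props = new File(log4jPath);
    if (props.isFile()) {
      PropertyConfigurator.configure(props.getAbsolutePath());
    } else {
      // No config file: fall back to a simple console appender.
      BasicConfigurator.configure();
    }
  }

  public static void main(String[] args) {
    initLogging("/etc/ozone/conf/ozone-shell-log4j.properties");
  }
}
{code}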



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2098) Ozone shell command prints out ERROR when the log4j file is not present.

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2098?focusedWorklogId=308598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308598
 ]

ASF GitHub Bot logged work on HDDS-2098:


Author: ASF GitHub Bot
Created on: 09/Sep/19 04:03
Start Date: 09/Sep/19 04:03
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1411: HDDS-2098 : 
Ozone shell command prints out ERROR when the log4j file …
URL: https://github.com/apache/hadoop/pull/1411#issuecomment-529291245
 
 
   I have a question: during the ozone tarball build, we copy 
ozone-shell-log4j.properties to etc/hadoop (like we copy log4j.properties), so why 
do we see this error?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308598)
Time Spent: 1h  (was: 50m)

> Ozone shell command prints out ERROR when the log4j file is not present.
> 
>
> Key: HDDS-2098
> URL: https://issues.apache.org/jira/browse/HDDS-2098
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Exception Trace*
> {code}
> log4j:ERROR Could not read configuration file from URL 
> [file:/etc/ozone/conf/ozone-shell-log4j.properties].
> java.io.FileNotFoundException: /etc/ozone/conf/ozone-shell-log4j.properties 
> (No such file or directory)
>   at java.io.FileInputStream.open0(Native Method)
>   at java.io.FileInputStream.open(FileInputStream.java:195)
>   at java.io.FileInputStream.(FileInputStream.java:138)
>   at java.io.FileInputStream.(FileInputStream.java:93)
>   at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>   at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>   at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
>   at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
>   at org.apache.log4j.LogManager.(LogManager.java:127)
>   at org.slf4j.impl.Log4jLoggerFactory.(Log4jLoggerFactory.java:66)
>   at org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:72)
>   at 
> org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:45)
>   at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
>   at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
>   at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:412)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:357)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
>   at org.apache.hadoop.ozone.web.ozShell.Shell.(Shell.java:35)
> log4j:ERROR Ignoring configuration file 
> [file:/etc/ozone/conf/ozone-shell-log4j.properties].
> log4j:WARN No appenders could be found for logger 
> (io.jaegertracing.thrift.internal.senders.ThriftSenderFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> {
>   "metadata" : { },
>   "name" : "vol-test-putfile-1567740142",
>   "admin" : "root",
>   "owner" : "root",
>   "creationTime" : 1567740146501,
>   "acls" : [ {
> "type" : "USER",
> "name" : "root",
> "aclScope" : "ACCESS",
> "aclList" : [ "ALL" ]
>   }, {
> "type" : "GROUP",
> "name" : "root",
> "aclScope" : "ACCESS",
> "aclList" : [ "ALL" ]
>   } ],
>   "quota" : 1152921504606846976
> }
> {code}
> *Fix*
> When a log4j file is not present, the default should be console.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2087) Remove the hard coded config key in ChunkManager

2019-09-08 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925342#comment-16925342
 ] 

Hudson commented on HDDS-2087:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17257 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17257/])
HDDS-2087. Remove the hard coded config key in ChunkManager (#1409) (bharat: 
rev 3b9584d12b06a6b66abd737e768d9d684ff92c78)
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/impl/ChunkManagerFactory.java
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
* (edit) hadoop-hdds/common/src/main/resources/ozone-default.xml
* (edit) 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsConfigKeys.java


> Remove the hard coded config key in ChunkManager
> 
>
> Key: HDDS-2087
> URL: https://issues.apache.org/jira/browse/HDDS-2087
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Anu Engineer
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We have a hard-coded config key in the {{ChunkManagerFactory.java.}}
>  
> {code}
> boolean scrubber = config.getBoolean(
>  "hdds.containerscrub.enabled",
>  false);
> {code}
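
For reference, a minimal sketch of the intended cleanup; the constant names below are hypothetical stand-ins for whatever was actually added to HddsConfigKeys in the committed change:

{code:java}
// Define the key (and its default) once instead of scattering the literal.
// Constant names here are hypothetical.
public final class HddsConfigKeysSketch {
  public static final String HDDS_CONTAINER_SCRUB_ENABLED =
      "hdds.containerscrub.enabled";
  public static final boolean HDDS_CONTAINER_SCRUB_ENABLED_DEFAULT = false;

  private HddsConfigKeysSketch() { }
}

// ChunkManagerFactory would then read:
//   boolean scrubber = config.getBoolean(
//       HddsConfigKeysSketch.HDDS_CONTAINER_SCRUB_ENABLED,
//       HddsConfigKeysSketch.HDDS_CONTAINER_SCRUB_ENABLED_DEFAULT);
{code}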



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2087) Remove the hard coded config key in ChunkManager

2019-09-08 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2087.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Remove the hard coded config key in ChunkManager
> 
>
> Key: HDDS-2087
> URL: https://issues.apache.org/jira/browse/HDDS-2087
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Anu Engineer
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We have a hard-coded config key in the {{ChunkManagerFactory.java.}}
>  
> {code}
> boolean scrubber = config.getBoolean(
>  "hdds.containerscrub.enabled",
>  false);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2087) Remove the hard coded config key in ChunkManager

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2087?focusedWorklogId=308592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308592
 ]

ASF GitHub Bot logged work on HDDS-2087:


Author: ASF GitHub Bot
Created on: 09/Sep/19 03:44
Start Date: 09/Sep/19 03:44
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1409: 
HDDS-2087. Remove the hard coded config key in ChunkManager
URL: https://github.com/apache/hadoop/pull/1409
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308592)
Time Spent: 1h 40m  (was: 1.5h)

> Remove the hard coded config key in ChunkManager
> 
>
> Key: HDDS-2087
> URL: https://issues.apache.org/jira/browse/HDDS-2087
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Anu Engineer
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We have a hard-coded config key in the {{ChunkManagerFactory.java.}}
>  
> {code}
> boolean scrubber = config.getBoolean(
>  "hdds.containerscrub.enabled",
>  false);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2087) Remove the hard coded config key in ChunkManager

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2087?focusedWorklogId=308591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308591
 ]

ASF GitHub Bot logged work on HDDS-2087:


Author: ASF GitHub Bot
Created on: 09/Sep/19 03:44
Start Date: 09/Sep/19 03:44
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1409: HDDS-2087. 
Remove the hard coded config key in ChunkManager
URL: https://github.com/apache/hadoop/pull/1409#issuecomment-529288559
 
 
   Thank You @vivekratnavel for the contribution and @anuengineer for the 
review.
   I have committed this to the trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308591)
Time Spent: 1.5h  (was: 1h 20m)

> Remove the hard coded config key in ChunkManager
> 
>
> Key: HDDS-2087
> URL: https://issues.apache.org/jira/browse/HDDS-2087
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Anu Engineer
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We have a hard-coded config key in the {{ChunkManagerFactory.java.}}
>  
> {code}
> boolean scrubber = config.getBoolean(
>  "hdds.containerscrub.enabled",
>  false);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.

2019-09-08 Thread Xudong Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xudong Cao updated HDFS-14646:
--
Attachment: HDFS-14646.004.patch
Status: Patch Available  (was: Open)

> Standby NameNode should not upload fsimage to an inappropriate NameNode.
> 
>
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.2
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Major
> Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, 
> HDFS-14646.002.patch, HDFS-14646.003.patch, HDFS-14646.004.patch
>
>
> *Problem Description:*
>  In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put 
> the image to all other NNs (whether the peer NN is an ANN or not), and even 
> if the peer NN immediately replies an error (such as 
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult 
> .OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put 
> process immediately, but will put the FsImage completely to the peer NN, and 
> will not read the peer NN's reply until the put is completed.
> Depending on the version of Jetty, this behavior can lead to different 
> consequences, I tested it under 2.7.2 and trunk version. 
> *1.In Hadoop 2.7.2 (with Jetty 6.1.26)*
>  After peer NN called HttpServletResponse.sendError(), the underlying TCP 
> connection will still be established, and the data SNN sent will be read by 
> Jetty framework itself on the peer NN side, so the SNN will needlessly keep 
> sending the FsImage to the peer NN, causing a waste of time and 
> bandwidth. In a relatively large HDFS cluster, the size of FsImage can often 
> reach about 30GB, This is indeed a big waste.
> *2.In trunk version (with Jetty 9.3.27)*
>  After peer NN called HttpServletResponse.sendError(), the underlying TCP 
> connection will be auto closed, and then SNN will directly get an "Error 
> writing request body to server" exception, as below, note this test needs a 
> relatively big FSImage (e.g. 10MB level):
> {code:java}
> 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: 
> /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 
> 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 
> 4096 bytes.
>  java.io.IOException: Error writing request body to server
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: 
> /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 
> 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 
> 4096 bytes.
>  java.io.IOException: Error writing request body to server
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>   {code}
>                   
> *Solution:*
>  A standby NameNode should not upload its fsimage to an inappropriate NameNode: 
> when it plans to put a FsImage to the peer NN, it needs to check whether it 
> really needs to put it at this time.
> In detail, local SNN should establish an HTTP connection with the peer NN, 
> send the put request, and then immediately read the response (this is the key 
> point). If the peer NN 
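A minimal, self-contained sketch of the early-check idea described in the Solution above. It assumes the peer's verdict can be surfaced before the image body is streamed via the standard HTTP Expect: 100-continue handshake; that mechanism is an illustration only, not necessarily how the actual HDFS-14646 patch implements the check.
{code:java}
// Illustrative only -- not the HDFS-14646 patch. The Expect: 100-continue
// handshake lets the peer refuse the PUT before any image bytes are written.
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class EarlyRejectUploadSketch {
  /** Returns true only if the peer accepted the upload and replied HTTP_OK. */
  public static boolean tryUpload(URL putUrl, byte[] imageBytes) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) putUrl.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(imageBytes.length);
    // Ask the peer to confirm (or refuse) before the body is sent.
    conn.setRequestProperty("Expect", "100-continue");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(imageBytes);      // reached only if the peer did not reject up front
    } catch (IOException rejected) {
      conn.disconnect();          // peer replied with an error before the body went out
      return false;
    }
    return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
  }
}
{code}
With such a check the SNN would stop immediately on a NOT_ACTIVE_NAMENODE_FAILURE or OLD_TRANSACTION_ID_FAILURE reply instead of streaming a ~30GB image that the peer will discard.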

[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.

2019-09-08 Thread Xudong Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xudong Cao updated HDFS-14646:
--
Status: Open  (was: Patch Available)

> Standby NameNode should not upload fsimage to an inappropriate NameNode.
> 
>
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.2
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Major
> Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, 
> HDFS-14646.002.patch, HDFS-14646.003.patch
>
>
> *Problem Description:*
>  In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put 
> the image to all other NNs (whether the peer NN is an ANN or not), and even 
> if the peer NN immediately replies an error (such as 
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult 
> .OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put 
> process immediately, but will put the FsImage completely to the peer NN, and 
> will not read the peer NN's reply until the put is completed.
> Depending on the version of Jetty, this behavior can lead to different 
> consequences, I tested it under 2.7.2 and trunk version. 
> *1.In Hadoop 2.7.2 (with Jetty 6.1.26)*
>  After peer NN called HttpServletResponse.sendError(), the underlying TCP 
> connection will still be established, and the data SNN sent will be read by 
> Jetty framework itself on the peer NN side, so the SNN will needlessly keep 
> sending the FsImage to the peer NN, causing a waste of time and 
> bandwidth. In a relatively large HDFS cluster, the size of FsImage can often 
> reach about 30GB, This is indeed a big waste.
> *2.In trunk version (with Jetty 9.3.27)*
>  After peer NN called HttpServletResponse.sendError(), the underlying TCP 
> connection will be auto closed, and then SNN will directly get an "Error 
> writing request body to server" exception, as below, note this test needs a 
> relatively big FSImage (e.g. 10MB level):
> {code:java}
> 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: 
> /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 
> 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 
> 4096 bytes.
>  java.io.IOException: Error writing request body to server
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: 
> /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 
> 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 
> 4096 bytes.
>  java.io.IOException: Error writing request body to server
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>  at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>  at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>   {code}
>                   
> *Solution:*
>  A standby NameNode should not upload its fsimage to an inappropriate NameNode: 
> when it plans to put a FsImage to the peer NN, it needs to check whether it 
> really needs to put it at this time.
> In detail, local SNN should establish an HTTP connection with the peer NN, 
> send the put request, and then immediately read the response (this is the key 
> point). If the peer NN does not reply an HTTP_OK, it means the local SNN 
> should not 

[jira] [Commented] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-09-08 Thread He Xiaoqiao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925322#comment-16925322
 ] 

He Xiaoqiao commented on HDFS-14771:


Hi [~jojochuang],[~sodonnell] is this ready to backport to branch-2?

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>  Labels: release-blocker
> Attachments: HDFS-14771.branch-2.001.patch, 
> HDFS-14771.branch-2.002.patch, HDFS-14771.branch-2.003.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load time by 
> writing sub-sections to the fsimage index.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14810) review FSNameSystem editlog sync

2019-09-08 Thread He Xiaoqiao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925320#comment-16925320
 ] 

He Xiaoqiao commented on HDFS-14810:


Hi [~jojochuang],[~ayushtkn] any update for this improvement?

> review FSNameSystem editlog sync
> 
>
> Key: HDFS-14810
> URL: https://issues.apache.org/jira/browse/HDFS-14810
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-14810.001.patch, HDFS-14810.002.patch, 
> HDFS-14810.003.patch, HDFS-14810.004.patch
>
>
> Refactor and unify the type of edit log sync in FSNamesystem, as HDFS-11246 
> mentioned.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14340) Lower the log level when can't get postOpAttr

2019-09-08 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925315#comment-16925315
 ] 

Rohith Sharma K S commented on HDFS-14340:
--

Removed Flag which was set as *Important*!

> Lower the log level when can't get postOpAttr
> -
>
> Key: HDFS-14340
> URL: https://issues.apache.org/jira/browse/HDFS-14340
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Reporter: Anuhan Torgonshar
>Assignee: Anuhan Torgonshar
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14340.trunk.patch
>
>
> I think we should lower the log level when we can't get postOpAttr in 
> _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_.
>  
>  
> {code:java}
> **The first code snippet***
> //the problematic log level ERROR, at line 1044
> try {
>dirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpDirAttr),
>dfsClient, dirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.error("Can't get postOpDirAttr for dirFileId: "
>+ dirHandle.getFileId(), e1);
> }
> **The second code snippet***
> //other practice in similar code snippets, line number is 475, the log 
> assigned with INFO level
> try { 
>wccData = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpAttr), 
> dfsClient,   fileIdPath, iug); 
> } catch (IOException e1) { 
>LOG.info("Can't get postOpAttr for fileIdPath: " + fileIdPath, e1); 
> }
> **The third code snippet***
> //other practice in similar code snippets, line number is 1405, the log 
> assigned with INFO level
> try {
>fromDirWcc = Nfs3Utils.createWccData(
>Nfs3Utils.getWccAttr(fromPreOpAttr), dfsClient, fromDirFileIdPath,iug);
>toDirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(toPreOpAttr),
>dfsClient, toDirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.info("Can't get postOpDirAttr for " + fromDirFileIdPath + " or"
>+ toDirFileIdPath, e1);
> }
> {code}
> Therefore, I think the logging practices should be consistent in similar 
> contexts. When the code catches _*IOException*_ for *_getWccAttr()_* method, 
> it more likely prints a log message with _*INFO*_ level, a lower level.
>  
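For reference, the change the issue asks for amounts to lowering the severity of the first snippet so it matches the other two handlers; roughly like the sketch below (the attached patch is authoritative).
{code:java}
// First snippet with the level lowered from ERROR to INFO, for consistency
// with the other postOpAttr handlers in RpcProgramNfs3.
try {
  dirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpDirAttr),
      dfsClient, dirFileIdPath, iug);
} catch (IOException e1) {
  LOG.info("Can't get postOpDirAttr for dirFileId: "
      + dirHandle.getFileId(), e1);
}
{code}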



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14340) Lower the log level when can't get postOpAttr

2019-09-08 Thread Rohith Sharma K S (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated HDFS-14340:
-
Hadoop Flags: Reviewed

> Lower the log level when can't get postOpAttr
> -
>
> Key: HDFS-14340
> URL: https://issues.apache.org/jira/browse/HDFS-14340
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Reporter: Anuhan Torgonshar
>Assignee: Anuhan Torgonshar
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14340.trunk.patch
>
>
> I think we should lower the log level when we can't get postOpAttr in 
> _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_.
>  
>  
> {code:java}
> **The first code snippet***
> //the problematic log level ERROR, at line 1044
> try {
>dirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpDirAttr),
>dfsClient, dirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.error("Can't get postOpDirAttr for dirFileId: "
>+ dirHandle.getFileId(), e1);
> }
> **The second code snippet***
> //other practice in similar code snippets, line number is 475, the log 
> assigned with INFO level
> try { 
>wccData = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpAttr), 
> dfsClient,   fileIdPath, iug); 
> } catch (IOException e1) { 
>LOG.info("Can't get postOpAttr for fileIdPath: " + fileIdPath, e1); 
> }
> **The third code snippet***
> //other practice in similar code snippets, line number is 1405, the log 
> assigned with INFO level
> try {
>fromDirWcc = Nfs3Utils.createWccData(
>Nfs3Utils.getWccAttr(fromPreOpAttr), dfsClient, fromDirFileIdPath,iug);
>toDirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(toPreOpAttr),
>dfsClient, toDirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.info("Can't get postOpDirAttr for " + fromDirFileIdPath + " or"
>+ toDirFileIdPath, e1);
> }
> {code}
> Therefore, I think the logging practices should be consistent in similar 
> contexts. When the code catches _*IOException*_ for *_getWccAttr()_* method, 
> it more likely prints a log message with _*INFO*_ level, a lower level.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14340) Lower the log level when can't get postOpAttr

2019-09-08 Thread Rohith Sharma K S (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated HDFS-14340:
-
Flags:   (was: Important)

> Lower the log level when can't get postOpAttr
> -
>
> Key: HDFS-14340
> URL: https://issues.apache.org/jira/browse/HDFS-14340
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Reporter: Anuhan Torgonshar
>Assignee: Anuhan Torgonshar
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14340.trunk.patch
>
>
> I think we should lower the log level when we can't get postOpAttr in 
> _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_.
>  
>  
> {code:java}
> **The first code snippet***
> //the problematic log level ERROR, at line 1044
> try {
>dirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpDirAttr),
>dfsClient, dirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.error("Can't get postOpDirAttr for dirFileId: "
>+ dirHandle.getFileId(), e1);
> }
> **The second code snippet***
> //other practice in similar code snippets, line number is 475, the log 
> assigned with INFO level
> try { 
>wccData = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpAttr), 
> dfsClient,   fileIdPath, iug); 
> } catch (IOException e1) { 
>LOG.info("Can't get postOpAttr for fileIdPath: " + fileIdPath, e1); 
> }
> **The third code snippet***
> //other practice in similar code snippets, line number is 1405, the log 
> assigned with INFO level
> try {
>fromDirWcc = Nfs3Utils.createWccData(
>Nfs3Utils.getWccAttr(fromPreOpAttr), dfsClient, fromDirFileIdPath,iug);
>toDirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(toPreOpAttr),
>dfsClient, toDirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.info("Can't get postOpDirAttr for " + fromDirFileIdPath + " or"
>+ toDirFileIdPath, e1);
> }
> {code}
> Therefore, I think the logging practices should be consistent in similar 
> contexts. When the code catches _*IOException*_ for *_getWccAttr()_* method, 
> it more likely prints a log message with _*INFO*_ level, a lower level.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-08 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925298#comment-16925298
 ] 

guojh edited comment on HDFS-14768 at 9/9/19 2:16 AM:
--

[~surendrasingh] Thanks for your reply. You need to run the code below, and you 
need to check the block with index 6 that is reconstructed under a local path like 
'./hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/*/current/BP-xxx/current/finalized/*/*/'
{code:java}
// code placeholder
public void testFileDecommission() throws Exception { 
  LOG.info("Starting test testFileDecommission"); 
  final Path ecFile = new Path(ecDir, "testFileDecommission");
  int writeBytes = cellSize * dataBlocks;
  writeStripedFile(dfs, ecFile, writeBytes); Assert.assertEquals(0, 
bm.numOfUnderReplicatedBlocks());
  FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); 
  LocatedBlocks locatedBlocks = StripedFileTestUtil.getLocatedBlocks(ecFile, 
dfs);
  LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) 
.get(0);DatanodeInfo[] dnLocs = lb.getLocations();
  LocatedStripedBlock lastBlock = 
(LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
  DatanodeInfo[] storageInfos = lastBlock.getLocations(); 
  DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem() 
.getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
  for (int i = 0; i < 100; i++) {
datanodeDescriptor.incrementPendingReplicationWithoutTargets();
  } 
  assertEquals(dataBlocks + parityBlocks, dnLocs.length);
 int[] decommNodeIndex = {3, 4};
 final List decommisionNodes = new ArrayList();
{code}
 

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When the datanode gets the task, it will build targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get(i)) {    
>   if (reconstructor.getBlockLen(i) > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 block indices, i.e. 
> [0,1,2,3,4,5]
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the isa-l 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce this.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for 

[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-08 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925298#comment-16925298
 ] 

guojh edited comment on HDFS-14768 at 9/9/19 2:14 AM:
--

[~surendrasingh] Thanks for your reply. You need to run the code below, and you 
need to check the block with index 6 that is reconstructed under a local path like 
'./hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/*/current/BP-xxx/current/finalized/*/*/'
{code:java}
// code placeholder
public void testFileDecommission() throws Exception { LOG.info("Starting test 
testFileDecommission"); final Path ecFile = new Path(ecDir, 
"testFileDecommission"); int writeBytes = cellSize * dataBlocks; 
writeStripedFile(dfs, ecFile, writeBytes); Assert.assertEquals(0, 
bm.numOfUnderReplicatedBlocks()); FileChecksum fileChecksum1 = 
dfs.getFileChecksum(ecFile, writeBytes); LocatedBlocks locatedBlocks = 
StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); LocatedBlock lb = 
dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) .get(0); DatanodeInfo[] 
dnLocs = lb.getLocations(); LocatedStripedBlock lastBlock = 
(LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); DatanodeInfo[] 
storageInfos = lastBlock.getLocations();  DatanodeDescriptor datanodeDescriptor 
= cluster.getNameNode().getNamesystem() 
.getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
 for (int i = 0; i < 100; i++) { 
datanodeDescriptor.incrementPendingReplicationWithoutTargets(); } 
assertEquals(dataBlocks + parityBlocks, dnLocs.length); int[] decommNodeIndex = 
{3, 4}; final List decommisionNodes = new 
ArrayList();
{code}
 

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When the datanode gets the task, it will build targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get(i)) {    
>   if (reconstructor.getBlockLen(i) > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 block indices, i.e. 
> [0,1,2,3,4,5]
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the isa-l 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce this.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> 

[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-08 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925298#comment-16925298
 ] 

guojh edited comment on HDFS-14768 at 9/9/19 2:12 AM:
--

[~surendrasingh] Thanks for your reply. You need to run the code below, and you 
need to check the block with index 6 that is reconstructed under a local path like 
'./hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/*/current/BP-xxx/current/finalized/*/*/'
{code:java}
// code placeholder
public void testFileDecommission() throws Exception {
  LOG.info("Starting test testFileDecommission");
  final Path ecFile = new Path(ecDir, "testFileDecommission");
  int writeBytes = cellSize * dataBlocks; 
  writeStripedFile(dfs, ecFile, writeBytes);
  Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
  FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); 
  LocatedBlocks locatedBlocks = StripedFileTestUtil.getLocatedBlocks(ecFile, 
dfs);
  LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
      .get(0);
  DatanodeInfo[] dnLocs = lb.getLocations();
  LocatedStripedBlock lastBlock =
      (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
  DatanodeInfo[] storageInfos = lastBlock.getLocations();
  DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
      .getBlockManager().getDatanodeManager()
      .getDatanode(storageInfos[6].getDatanodeUuid());
  for (int i = 0; i < 100; i++) {
    datanodeDescriptor.incrementPendingReplicationWithoutTargets();
  }
  assertEquals(dataBlocks + parityBlocks, dnLocs.length);
  int[] decommNodeIndex = {3, 4};
  final List decommisionNodes = new ArrayList();
  // add the node which will be decommissioning
  decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
  decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
  decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
  assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
  //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
  // Ensure decommissioned datanode is not automatically shutdown
  DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
  assertEquals("All datanodes must be alive", numDNs,
      client.datanodeReport(DatanodeReportType.LIVE).length);
  FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
  Assert.assertTrue("Checksum mismatches!",
      fileChecksum1.equals(fileChecksum2));
  StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
      null, blockGroupSize);
}
{code}
 


was (Author: gjhkael):
[~surendrasingh] Thanks for your reply. You need to run the code below, and you 
need to check the block with index 6 that is reconstructed under a local path like 
'./hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/*/current/BP-xxx/current/finalized/*/*/'
{code:java}
// code placeholder
public void testFileDecommission() throws Exception { LOG.info("Starting test 
testFileDecommission"); final Path ecFile = new Path(ecDir, 
"testFileDecommission"); int writeBytes = cellSize * dataBlocks; 
writeStripedFile(dfs, ecFile, writeBytes); Assert.assertEquals(0, 
bm.numOfUnderReplicatedBlocks()); FileChecksum fileChecksum1 = 
dfs.getFileChecksum(ecFile, writeBytes); LocatedBlocks locatedBlocks = 
StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); LocatedBlock lb = 
dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) .get(0); DatanodeInfo[] 
dnLocs = lb.getLocations(); LocatedStripedBlock lastBlock = 
(LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); DatanodeInfo[] 
storageInfos = lastBlock.getLocations(); //  DatanodeDescriptor 
datanodeDescriptor = cluster.getNameNode().getNamesystem() 
.getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
 for (int i = 0; i < 100; i++) { 
datanodeDescriptor.incrementPendingReplicationWithoutTargets(); } 
assertEquals(dataBlocks + parityBlocks, dnLocs.length); int[] decommNodeIndex = 
{3, 4}; final List decommisionNodes = new 
ArrayList(); // add the node which will be decommissioning  
decommisionNodes.add(dnLocs[decommNodeIndex[0]]); 
decommisionNodes.add(dnLocs[decommNodeIndex[1]]); decommissionNode(0, 
decommisionNodes, AdminStates.DECOMMISSIONED); 
assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); 
//assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));  // Ensure 
decommissioned datanode is not automatically shutdown  DFSClient client = 
getDfsClient(cluster.getNameNode(0), conf); assertEquals("All datanodes must be 
alive", numDNs, client.datanodeReport(DatanodeReportType.LIVE).length); 
FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes); 
Assert.assertTrue("Checksum mismatches!", fileChecksum1.equals(fileChecksum2)); 
StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes, null, 
blockGroupSize); } 
{code}
 

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 

[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-08 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925298#comment-16925298
 ] 

guojh edited comment on HDFS-14768 at 9/9/19 2:12 AM:
--

[~surendrasingh] Thanks for your reply. You need to run the code below, and you 
need to check the block with index 6 that is reconstructed under a local path like 
'./hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/*/current/BP-xxx/current/finalized/*/*/'
{code:java}
// code placeholder

{code}
 


was (Author: gjhkael):
[~surendrasingh] Thanks for your reply. You need to run the code below, and you 
need to check the block with index 6 that is reconstructed under a local path like 
'./hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/*/current/BP-xxx/current/finalized/*/*/'
{code:java}
// code placeholder
public void testFileDecommission() throws Exception {
  LOG.info("Starting test testFileDecommission");
  final Path ecFile = new Path(ecDir, "testFileDecommission");
  int writeBytes = cellSize * dataBlocks; 
  writeStripedFile(dfs, ecFile, writeBytes);
  Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
  FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); 
  LocatedBlocks locatedBlocks = StripedFileTestUtil.getLocatedBlocks(ecFile, 
dfs);
  LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) 
.get(0); DatanodeInfo[] dnLocs = lb.getLocations();
  LocatedStripedBlock lastBlock = 
(LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); DatanodeInfo[] 
storageInfos = lastBlock.getLocations(); 
 DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem() 
.getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
 for (int i = 0; i < 100; i++) { 
datanodeDescriptor.incrementPendingReplicationWithoutTargets(); } 
assertEquals(dataBlocks + parityBlocks, dnLocs.length); int[] decommNodeIndex = 
{3, 4}; final List decommisionNodes = new 
ArrayList(); // add the node which will be decommissioning  
decommisionNodes.add(dnLocs[decommNodeIndex[0]]); 
decommisionNodes.add(dnLocs[decommNodeIndex[1]]); decommissionNode(0, 
decommisionNodes, AdminStates.DECOMMISSIONED); 
assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); 
//assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));  // Ensure 
decommissioned datanode is not automatically shutdown  DFSClient client = 
getDfsClient(cluster.getNameNode(0), conf); assertEquals("All datanodes must be 
alive", numDNs, client.datanodeReport(DatanodeReportType.LIVE).length); 
FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes); 
Assert.assertTrue("Checksum mismatches!", fileChecksum1.equals(fileChecksum2)); 
StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes, null, 
blockGroupSize); } 
{code}
 

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When the datanode gets the task, it will build targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get(i)) {    
>   if (reconstructor.getBlockLen(i) > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 block indices, i.e. 
> [0,1,2,3,4,5]
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the isa-l 

[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-08 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925298#comment-16925298
 ] 

guojh commented on HDFS-14768:
--

[~surendrasingh] Thanks for your reply. You need to run the code below, and you 
need to check the block with index 6 that is reconstructed under a local path like 
'./hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/*/current/BP-xxx/current/finalized/*/*/'
{code:java}
// code placeholder
public void testFileDecommission() throws Exception { LOG.info("Starting test 
testFileDecommission"); final Path ecFile = new Path(ecDir, 
"testFileDecommission"); int writeBytes = cellSize * dataBlocks; 
writeStripedFile(dfs, ecFile, writeBytes); Assert.assertEquals(0, 
bm.numOfUnderReplicatedBlocks()); FileChecksum fileChecksum1 = 
dfs.getFileChecksum(ecFile, writeBytes); LocatedBlocks locatedBlocks = 
StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); LocatedBlock lb = 
dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) .get(0); DatanodeInfo[] 
dnLocs = lb.getLocations(); LocatedStripedBlock lastBlock = 
(LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); DatanodeInfo[] 
storageInfos = lastBlock.getLocations(); //  DatanodeDescriptor 
datanodeDescriptor = cluster.getNameNode().getNamesystem() 
.getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
 for (int i = 0; i < 100; i++) { 
datanodeDescriptor.incrementPendingReplicationWithoutTargets(); } 
assertEquals(dataBlocks + parityBlocks, dnLocs.length); int[] decommNodeIndex = 
{3, 4}; final List decommisionNodes = new 
ArrayList(); // add the node which will be decommissioning  
decommisionNodes.add(dnLocs[decommNodeIndex[0]]); 
decommisionNodes.add(dnLocs[decommNodeIndex[1]]); decommissionNode(0, 
decommisionNodes, AdminStates.DECOMMISSIONED); 
assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); 
//assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));  // Ensure 
decommissioned datanode is not automatically shutdown  DFSClient client = 
getDfsClient(cluster.getNameNode(0), conf); assertEquals("All datanodes must be 
alive", numDNs, client.datanodeReport(DatanodeReportType.LIVE).length); 
FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes); 
Assert.assertTrue("Checksum mismatches!", fileChecksum1.equals(fileChecksum2)); 
StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes, null, 
blockGroupSize); } 
{code}
 

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When the datanode gets the task, it will build targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get(i)) {    
>   if (reconstructor.getBlockLen(i) > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 block indices, i.e. 
> [0,1,2,3,4,5]
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the isa-l 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce this.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, 

[jira] [Commented] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable

2019-09-08 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925291#comment-16925291
 ] 

Wei-Chiu Chuang commented on HDFS-14831:


The same fix is in branch-2.7 (but not in the last release, 2.7.7). Had 2.7 had 
another release (2.7.8), it would have been possible to resolve this problem by 
downgrading to 2.7.8. But 2.7 has been voted End of Life by the community, so I doubt 
that will happen.

 

Similarly, downgrade could fail from 3.2.0 to 2.8.0 ~ 2.8.4, or from 3.2.0 to 
2.9.0 ~ 2.9.1, in which case a user should downgrade to 2.8.5 or 2.9.2 
respectively (both releases have the fsimage corruption fix). I would suggest 
updating the Upgrade/Downgrade user doc to offer a recommended downgrade path, 
instead of fixing it.

> Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable 
> ---
>
> Key: HDFS-14831
> URL: https://issues.apache.org/jira/browse/HDFS-14831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.3.0, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
>
> Mentioned on HDFS-13596
> Incompatible StringTable changes cause downgrade from 3.2.0 to 2.7.2 to fail.
> The commit message is as follows, but the corresponding issue was not found:
> {quote}
> commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f
> Author: Vinayakumar B 
> Date:   Mon Oct 15 15:48:26 2018 +0530
> Fix potential FSImage corruption. Contributed by Daryn Sharp.
> {quote} 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14340) Lower the log level when can't get postOpAttr

2019-09-08 Thread Anuhan Torgonshar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuhan Torgonshar updated HDFS-14340:
-
Priority: Minor  (was: Major)

> Lower the log level when can't get postOpAttr
> -
>
> Key: HDFS-14340
> URL: https://issues.apache.org/jira/browse/HDFS-14340
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Reporter: Anuhan Torgonshar
>Assignee: Anuhan Torgonshar
>Priority: Minor
>  Labels: easyfix
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14340.trunk.patch
>
>
> I think we should lower the log level when we can't get postOpAttr in 
> _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_.
>  
>  
> {code:java}
> **The first code snippet***
> //the problematic log level ERROR, at line 1044
> try {
>dirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpDirAttr),
>dfsClient, dirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.error("Can't get postOpDirAttr for dirFileId: "
>+ dirHandle.getFileId(), e1);
> }
> **The second code snippet***
> //other practice in similar code snippets, line number is 475, the log 
> assigned with INFO level
> try { 
>wccData = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpAttr), 
> dfsClient,   fileIdPath, iug); 
> } catch (IOException e1) { 
>LOG.info("Can't get postOpAttr for fileIdPath: " + fileIdPath, e1); 
> }
> **The third code snippet***
> //other practice in similar code snippets, line number is 1405, the log 
> assigned with INFO level
> try {
>fromDirWcc = Nfs3Utils.createWccData(
>Nfs3Utils.getWccAttr(fromPreOpAttr), dfsClient, fromDirFileIdPath,iug);
>toDirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(toPreOpAttr),
>dfsClient, toDirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.info("Can't get postOpDirAttr for " + fromDirFileIdPath + " or"
>+ toDirFileIdPath, e1);
> }
> {code}
> Therefore, I think the logging practices should be consistent in similar 
> contexts. When the code catches _*IOException*_ for *_getWccAttr()_* method, 
> it more likely prints a log message with _*INFO*_ level, a lower level.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-08 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925231#comment-16925231
 ] 

Surendra Singh Lilhore commented on HDFS-14768:
---

[~gjhkael], Thanks for patch.

I feel HDFS-14699 solves your issue; HDFS-14699 is also trying to solve the busy DN 
issue. Once all the comments are addressed in HDFS-14699, you can try that patch. I 
ran your test case, and it does not reproduce this issue.

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When the datanode gets the task, it will build targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get(i)) {    
>   if (reconstructor.getBlockLen(i) > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 block indices, i.e. 
> [0,1,2,3,4,5]
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the isa-l 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce this.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  
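A tiny standalone illustration of the index arithmetic described above (the class below is made up for demonstration and deliberately omits the getBlockLen check): with live indices [0,1,2,3,4,5,7,8] and only two requested targets, a single missing index (6) is found and targetIndices[1] silently keeps its initial value 0.
{code:java}
import java.util.BitSet;

public class TargetIndicesSketch {
  public static void main(String[] args) {
    int dataBlkNum = 6, parityBlkNum = 3;
    int[] liveIndices = {0, 1, 2, 3, 4, 5, 7, 8};   // index 6 sits on the busy DN
    BitSet live = new BitSet(dataBlkNum + parityBlkNum);
    for (int idx : liveIndices) {
      live.set(idx);
    }
    short[] targetIndices = new short[2];           // NN asked for two reconstructions
    int m = 0;
    for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
      if (!live.get(i) && m < targetIndices.length) {
        targetIndices[m++] = (short) i;
      }
    }
    // Prints "targetIndices = [6, 0]"; the stale 0 is what corrupts the reconstruction.
    System.out.println("targetIndices = [" + targetIndices[0]
        + ", " + targetIndices[1] + "]");
  }
}
{code}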



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925230#comment-16925230
 ] 

Hadoop QA commented on HDFS-14795:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 662 unchanged - 0 fixed = 664 total (was 662) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 32s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}158m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979779/HDFS-14795.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 90d62149b79d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca32917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27818/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 

[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925223#comment-16925223
 ] 

Hadoop QA commented on HDFS-14609:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 17s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 
38s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14609 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979780/HDFS-14609.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3ae1683c9834 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca32917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27819/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27819/testReport/ |
| Max. process+thread count | 1606 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27819/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-08 Thread Chen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925206#comment-16925206
 ] 

Chen Zhang commented on HDFS-14609:
---

Update patch v5.

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch
>
>
> We worked on router-based federation security as part of HDFS-13532. We kept 
> it compatible with the way the namenode works. However, with HADOOP-16314 and 
> HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests 
> to fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-08 Thread Chen Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-14609:
--
Attachment: HDFS-14609.005.patch

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch
>
>
> We worked on router-based federation security as part of HDFS-13532. We kept 
> it compatible with the way the namenode works. However, with HADOOP-16314 and 
> HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests 
> to fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-08 Thread Chen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925202#comment-16925202
 ] 

Chen Zhang commented on HDFS-14811:
---

Hi [~ayushtkn], I've gone through the discussion in HDFS-12288. The latest 
conclusion there is to modify the getXceiverCount() method to return the real 
number of DataXceiver threads (currently it reports much more than the real 
number), but the load of each DN would still be unchanged (it would use the 
activeNumberOfThread instead), so when a DN starts writing a block its load 
would still be 3, which makes it look overloaded.

My initial idea is much the same as what [~lukmajercak] mentioned in 
HDFS-12288: do not count the packetResponder thread when calculating a DN's 
load. But that solution does not look like a good choice.
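
For reference, the check that produces the "too busy" message is roughly of 
this shape (an illustrative sketch only, not the exact 
BlockPlacementPolicyDefault code):
{code:java}
/**
 * Sketch: a DN is skipped when its xceiver count exceeds
 * considerLoadFactor times the cluster-wide average.
 */
static boolean isTooBusy(int nodeXceiverCount, double avgXceiverCount,
    double considerLoadFactor) {
  double maxLoad = considerLoadFactor * avgXceiverCount;
  // In the failing test log: 3 > 2.666..., so the DN is rejected.
  return maxLoad > 0 && nodeXceiverCount > maxLoad;
}
{code}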

> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The failure reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at 

[jira] [Updated] (HDFS-14795) Add Throttler for writing block

2019-09-08 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14795:
---
Attachment: HDFS-14795.004.patch

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes, and to add a throttler in 
> the PIPELINE_SETUP_APPEND_RECOVERY or PIPELINE_SETUP_STREAMING_RECOVERY stages.
> The default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-08 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925199#comment-16925199
 ] 

Lisheng Sun commented on HDFS-14795:


Fixed checkstyle and uploaded the v004 patch.
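
For context, the change under discussion is roughly of the following shape (a 
sketch only, not the attached patch; the configuration key name below is a 
placeholder):
{code:java}
// Sketch: build a DataTransferThrottler from a configured write bandwidth
// and pass it to receiveBlock() instead of null (key name is hypothetical).
long writeBandwidth = conf.getLongBytes(
    "dfs.datanode.data.write.bandwidthPerSec", 0);
DataTransferThrottler writeThrottler =
    writeBandwidth > 0 ? new DataTransferThrottler(writeBandwidth) : null;

blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
    mirrorAddr, writeThrottler, targets, false);
{code}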

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes, and to add a throttler in 
> the PIPELINE_SETUP_APPEND_RECOVERY or PIPELINE_SETUP_STREAMING_RECOVERY stages.
> The default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-08 Thread Chen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924765#comment-16924765
 ] 

Chen Zhang edited comment on HDFS-14811 at 9/8/19 2:32 PM:
---

Uploaded patch v2 to disable considerLoad option. I've run the whole class 
test(using {{mvn -Dtest=TestRouterRpc test}}) 50 times in local, all of them 
passed after patch.

I've filed another Jira HDFS-14830 to track the xceiverCount problem.
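
The test-side change amounts to something like the following (a sketch, 
assuming the 3.x configuration key; not the exact patch):
{code:java}
// Sketch: disable load-based placement for this test so a DN that is
// momentarily "busy" (load 3 > 2.66) is not excluded from the pipeline.
conf.setBoolean(
    DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_CONSIDERLOAD_KEY, false);
{code}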


was (Author: zhangchen):
Uploaded patch v2 to disable considerLoad option. I've run the whole class 
test(using {{mvn -Dtest=TestRouterRpc test}}) 50 times in local, all of them 
passed after patch.

I've filed another Jira HDFS-14803 to track the xceiverCount problem.

> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The failure reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at 

[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925184#comment-16925184
 ] 

Ayush Saxena commented on HDFS-14811:
-

Thanx [~zhangchen] for the patch. The DataXceiver problem seems to be known now, 
but it seems a little stuck.
[~elgoiri], do you have any opinion here? Should we rely on fixing that problem 
only, or go ahead, ignore the problem here and make the test healthy?


> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The failure reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>   at 

[jira] [Commented] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-09-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925178#comment-16925178
 ] 

Ayush Saxena commented on HDFS-14798:
-

If [~surendrasingh] is in favour, I would respect his opinion.
Go ahead [~surendrasingh], you may proceed with the commit.
+0 from my side.

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
> Attachments: HDFS-14798.001.patch
>
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
> this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14833) RBF: Router Update Doesn't Sync Quota

2019-09-08 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-14833:
---

 Summary: RBF: Router Update Doesn't Sync Quota
 Key: HDFS-14833
 URL: https://issues.apache.org/jira/browse/HDFS-14833
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


HDFS-14777 added a check to prevent an RPC call: it checks whether, in the 
present state, the quota is changing.
But it ignores whether the locations are changed. If the locations are changed, 
the new destinations should be synchronized with the mount entry quota.
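
The intended condition is roughly the following (illustrative names, not the 
actual Router/Quota code):
{code:java}
// Sketch: skip the quota-sync RPC only when neither the quota values nor
// the destinations of the mount entry have changed.
boolean quotaChanged =
    !oldEntry.getQuota().equals(updatedEntry.getQuota());
boolean locationsChanged =
    !oldEntry.getDestinations().equals(updatedEntry.getDestinations());
if (quotaChanged || locationsChanged) {
  synchronizeQuota(updatedEntry);  // hypothetical helper
}
{code}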



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=308505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308505
 ]

ASF GitHub Bot logged work on HDDS-2076:


Author: ASF GitHub Bot
Created on: 08/Sep/19 14:00
Start Date: 08/Sep/19 14:00
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1410: 
HDDS-2076. Read fails because the block cannot be located in the container
URL: https://github.com/apache/hadoop/pull/1410#discussion_r322011377
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/TarContainerPacker.java
 ##
 @@ -235,6 +238,16 @@ private void includePath(String containerPath, String 
subdir,
 }
   }
 
+  private void includeBCSID(ArchiveOutputStream archiveOutputStream, long 
bcsID)
 
 Review comment:
   `includeBCSID` is not used anywhere
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308505)
Time Spent: 40m  (was: 0.5h)

> Read fails because the block cannot be located in the container
> ---
>
> Key: HDDS-2076
> URL: https://issues.apache.org/jira/browse/HDDS-2076
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: log.zip
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Read fails as the client is not able to read the block from the container.
> {code}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the block with bcsID 2515 .Container 7 bcsId is 0.
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569)
> {code}
> The client eventually exits here
> {code}
> 2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR 
> ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) - 
> LOADGEN: Read key:pool-224-thread-6_330651 failed with ex
> ception
> ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) - 
> LOADGEN: Exiting due to exception
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=308506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308506
 ]

ASF GitHub Bot logged work on HDDS-2076:


Author: ASF GitHub Bot
Created on: 08/Sep/19 14:00
Start Date: 08/Sep/19 14:00
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1410: 
HDDS-2076. Read fails because the block cannot be located in the container
URL: https://github.com/apache/hadoop/pull/1410#discussion_r322011257
 
 

 ##
 File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerReplication.java
 ##
 @@ -0,0 +1,199 @@
+package org.apache.hadoop.ozone.client.rpc;
 
 Review comment:
   License is missing
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308506)
Time Spent: 40m  (was: 0.5h)

> Read fails because the block cannot be located in the container
> ---
>
> Key: HDDS-2076
> URL: https://issues.apache.org/jira/browse/HDDS-2076
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: log.zip
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Read fails as the client is not able to read the block from the container.
> {code}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the block with bcsID 2515 .Container 7 bcsId is 0.
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569)
> {code}
> The client eventually exits here
> {code}
> 2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR 
> ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) - 
> LOADGEN: Read key:pool-224-thread-6_330651 failed with ex
> ception
> ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) - 
> LOADGEN: Exiting due to exception
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container

2019-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=308504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308504
 ]

ASF GitHub Bot logged work on HDDS-2076:


Author: ASF GitHub Bot
Created on: 08/Sep/19 14:00
Start Date: 08/Sep/19 14:00
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1410: 
HDDS-2076. Read fails because the block cannot be located in the container
URL: https://github.com/apache/hadoop/pull/1410#discussion_r322011275
 
 

 ##
 File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerReplication.java
 ##
 @@ -0,0 +1,199 @@
+package org.apache.hadoop.ozone.client.rpc;
+
+import org.apache.hadoop.hdds.client.BlockID;
 
 Review comment:
   Unused import
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308504)
Time Spent: 0.5h  (was: 20m)

> Read fails because the block cannot be located in the container
> ---
>
> Key: HDDS-2076
> URL: https://issues.apache.org/jira/browse/HDDS-2076
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: log.zip
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Read fails as the client is not able to read the block from the container.
> {code}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the block with bcsID 2515 .Container 7 bcsId is 0.
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569)
> {code}
> The client eventually exits here
> {code}
> 2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR 
> ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) - 
> LOADGEN: Read key:pool-224-thread-6_330651 failed with ex
> ception
> ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) - 
> LOADGEN: Exiting due to exception
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13157) Do Not Remove Blocks Sequentially During Decommission

2019-09-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925176#comment-16925176
 ] 

David Mollitor commented on HDFS-13157:
---

One other thing here I don't quite understand:

How is it handled, iterating through each DataNode, that a block is scheduled 
to be replicated onto a DataNode that will be decommissioned further down in 
the list?  i.e. Node A replicates to Node B, then Node B is decommissioned, so 
that same block is now replicated onto Node C, and so on...

> Do Not Remove Blocks Sequentially During Decommission 
> --
>
> Key: HDFS-13157
> URL: https://issues.apache.org/jira/browse/HDFS-13157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-13157.1.patch
>
>
> From what I understand of [DataNode 
> decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java]
>  it appears that all the blocks are scheduled for removal _in order_. I'm 
> not 100% sure what the ordering is exactly, but I think it loops through each 
> data volume and schedules each block to be replicated elsewhere. The net 
> effect is that during a decommission, all of the DataNode transfer threads 
> slam on a single volume until it is cleaned out. At which point, they all 
> slam on the next volume, etc.
> Please randomize the block list so that there is a more even distribution 
> across all volumes when decommissioning a node.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-09-08 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925175#comment-16925175
 ] 

Surendra Singh Lilhore commented on HDFS-14798:
---

Agree with [~belugabehr]; we shouldn't wait for it to fail in production and 
then fix it.

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
> Attachments: HDFS-14798.001.patch
>
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
> this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13157) Do Not Remove Blocks Sequentially During Decommission

2019-09-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925165#comment-16925165
 ] 

David Mollitor edited comment on HDFS-13157 at 9/8/19 1:01 PM:
---

Thank you all for the great investigatory work.

One question I have is in regards to:

bq. dropped until all other blocks have been tried and the iterator cycles 
round.

On the second run through the Iterator, shouldn't there be many fewer blocks in 
the list since many were successfully replicated away?


Regardless, I would like to propose a few ideas for addressing these concerns.

# Update the Iterator to rotate over the volumes (it is very common that an 
Iterator does not guarantee any kind of order)
# With the lock held, process each DataNode as a task and process each task on 
its own thread.  In this way the duration of the lock held can be decreased and 
the replication requests will be interwoven across all the relevant DataNodes 
in the queue instead of loading the requests DN by DN in a serial fashion.
# Wrap the requests in the queue with a TTL value.  Requests that are rejected, 
for whatever reason, are placed back into the queue instead of dropped.  If 
they rotate through the queue many times without being serviced, then perform 
some sort of error logging and drop the request (a rough sketch of such a 
wrapper follows this list).
# Update documentation to include this information regarding how many blocks 
can be replicated per day, etc.
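
A minimal sketch of the TTL wrapper from idea 3 (illustrative names only, not 
existing HDFS classes):
{code:java}
/** Wraps a replication request with a bounded number of re-queue attempts. */
class TtlWrapped<T> {
  final T request;
  private int remainingAttempts;

  TtlWrapped(T request, int ttl) {
    this.request = request;
    this.remainingAttempts = ttl;
  }

  /** @return true if the request may be placed back into the queue,
   *  false if it should be logged and dropped. */
  boolean mayRetry() {
    return --remainingAttempts > 0;
  }
}
{code}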


was (Author: belugabehr):
Thank you all for the great investigatory work.

One question I have is in regards to:

bq. dropped until all other blocks have been tried and the iterator cycles 
round.

On the second run through the Iterator, shouldn't there be many fewer blocks in 
the list since many were successfully replicated away?


Regardless, I would like to propose a few ideas for addressing these concerns.

# Update the Iterator to rotate over the volumes (it is very common that an 
Iterator does not guarantee any kind of order)
# With the lock held, process each DataNode as a task and process each task on 
its own thread.  In this way the duration of the lock held can be decreased and 
the replication requests will be interwoven across all the relevant DataNodes 
in the queue instead of loading the requests DN by DN in a serial fashion.
# Update documentation to include this information regarding how many blocks 
can be replicated per day, etc.

> Do Not Remove Blocks Sequentially During Decommission 
> --
>
> Key: HDFS-13157
> URL: https://issues.apache.org/jira/browse/HDFS-13157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-13157.1.patch
>
>
> From what I understand of [DataNode 
> decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java]
>  it appears that all the blocks are scheduled for removal _in order_. I'm 
> not 100% sure what the ordering is exactly, but I think it loops through each 
> data volume and schedules each block to be replicated elsewhere. The net 
> effect is that during a decommission, all of the DataNode transfer threads 
> slam on a single volume until it is cleaned out. At which point, they all 
> slam on the next volume, etc.
> Please randomize the block list so that there is a more even distribution 
> across all volumes when decommissioning a node.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13157) Do Not Remove Blocks Sequentially During Decommission

2019-09-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925165#comment-16925165
 ] 

David Mollitor commented on HDFS-13157:
---

Thank you all for the great investigatory work.

One question I have is in regards to:

bq. dropped until all other blocks have been tried and the iterator cycles 
round.

On the second run through the Iterator, shouldn't there be many fewer blocks in 
the list since many were successfully replicated away?


Regardless, I would like to propose a few ideas for addressing these concerns.

# Update the Iterator to rotate over the volumes (it is very common that an 
Iterator does not guarantee any kind of order)
# With the lock held, process each DataNode as a task and process each task on 
its own thread.  In this way the duration of the lock held can be decreased and 
the replication requests will be interwoven across all the relevant DataNodes 
in the queue instead of loading the requests DN by DN in a serial fashion.
# Update documentation to include this information regarding how many blocks 
can be replicated per day, etc.

> Do Not Remove Blocks Sequentially During Decommission 
> --
>
> Key: HDFS-13157
> URL: https://issues.apache.org/jira/browse/HDFS-13157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-13157.1.patch
>
>
> From what I understand of [DataNode 
> decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java]
>  it appears that all the blocks are scheduled for removal _in order_. I'm 
> not 100% sure what the ordering is exactly, but I think it loops through each 
> data volume and schedules each block to be replicated elsewhere. The net 
> effect is that during a decommission, all of the DataNode transfer threads 
> slam on a single volume until it is cleaned out. At which point, they all 
> slam on the next volume, etc.
> Please randomize the block list so that there is a more even distribution 
> across all volumes when decommissioning a node.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-09-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925163#comment-16925163
 ] 

David Mollitor commented on HDFS-14798:
---

Hello [~ayushtkn],

Thank you for the feedback.

I do not think the law of "premature optimization" applies to this case.  This 
issue is not about optimization, in the general sense of performance.  This 
issue is about shoring up a potential landmine before it hits in a production 
system.  Better to catch this potential issue in a static code review than in a 
live system.
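
Concretely, the change being discussed is small; a sketch (not the attached 
patch):
{code:java}
public void resetBlocks() {
  // ... existing resets ...
  // Guard the clear() the same way clearBlockQueues() already does, so
  // invalidateBlocks is never mutated without holding its monitor.
  synchronized (invalidateBlocks) {
    this.invalidateBlocks.clear();
  }
  // ... existing resets ...
}
{code}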

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
> Attachments: HDFS-14798.001.patch
>
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
> this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer

2019-09-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925162#comment-16925162
 ] 

Hadoop QA commented on HDFS-14292:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} HDFS-14292 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12960370/HDFS-14292.8.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27817/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Introduce Java ExecutorService to DataXceiverServer
> ---
>
> Key: HDFS-14292
> URL: https://issues.apache.org/jira/browse/HDFS-14292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14292.1.patch, HDFS-14292.2.patch, 
> HDFS-14292.3.patch, HDFS-14292.4.patch, HDFS-14292.5.patch, 
> HDFS-14292.6.patch, HDFS-14292.7.patch, HDFS-14292.8.patch, HDFS-14292.8.patch
>
>
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from 
> {{hdfs-site.xml}}.  It is described as "Specifies the maximum number of 
> threads to use for transferring data in and out of the DN."   The default 
> value is 4096.  I found it interesting because 4096 threads sounds like a lot 
> to me.  I'm not sure how a system with 8-16 cores would react to this large a 
> thread count.  Intuitively, I would say that the overhead of context 
> switching would be immense.
> During my investigation, I discovered the 
> [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
>  setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The thread dies
> It would perhaps be better if we used a thread pool to better manage the 
> lifecycle of the service threads and to allow the DataNode to re-use existing 
> threads, saving on the need to create and spin-up threads on demand.
> In this JIRA, I have added a couple of things:
> # Added a thread pool to {{DataXceiverServer}} class that, on demand, will 
> create up to {{dfs.datanode.max.transfer.threads}}.  A thread that has 
> completed its prior duties will stay idle for up to 60 seconds 
> (configurable); it will be retired if no new work has arrived (a sketch of 
> this behaviour follows the list).
> # Added new methods to the {{Peer}} Interface to allow for better logging and 
> less code within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with 
> {{blockReceiver}} instance variable
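
The pool described in point 1 behaves roughly like the following standard JDK 
construction (an illustrative sketch, not the exact patch):
{code:java}
// Grows on demand up to the configured maximum, keeps idle workers alive
// for 60 seconds, then retires them; no tasks queue behind busy threads.
ExecutorService xceiverPool = new ThreadPoolExecutor(
    0, maxTransferThreads,            // dfs.datanode.max.transfer.threads
    60L, TimeUnit.SECONDS,            // configurable idle timeout
    new SynchronousQueue<Runnable>());
{code}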



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14452) Make Op#valueOf() Public

2019-09-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925161#comment-16925161
 ] 

David Mollitor commented on HDFS-14452:
---

https://github.com/belugabehr/springdn/blob/master/src/main/java/io/github/belugabehr/datanode/comms/netty/dt/OpControlInboundHandlerAdapter.java#L48-L54

> Make Op#valueOf() Public
> 
>
> Key: HDFS-14452
> URL: https://issues.apache.org/jira/browse/HDFS-14452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: noob
> Attachments: HDFS-14452.patch
>
>
> Change signature of {{private static Op valueOf(byte code)}} to be public.  
> Right now, the only easy way to look up an Op is to pass in a {{DataInput}} 
> object, which is not all that flexible and efficient for other custom 
> implementations that want to store the Op code a different way.
> https://github.com/apache/hadoop/blob/8c95cb9d6bef369fef6a8364f0c0764eba90e44a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Op.java#L53



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-09-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925157#comment-16925157
 ] 

Ayush Saxena edited comment on HDFS-14754 at 9/8/19 12:17 PM:
--

That seems a little overkill to me. There is no need to prove that it doesn't 
get corrected across multiple BRs either. IMHO, just verifying that your newly 
introduced logic now works should be enough.
Anyway, there is a single test; do we really need a separate test class for 
it?
It may be better if you can use existing test classes. Since the change is in 
BlockManager, you can try {{TestBlockManager}} or maybe some other.


was (Author: ayushtkn):
That seems a little overkill to me. No need to proof that in multiple BR also 
it doesn’t get corrected. IMHO just verifying that your new introduced logic 
now works should be enough.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, 
> HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, 
> HDFS-14754.006.patch
>
>
> Using EC RS-3-2, 6 DN.
> We came across a scenario where, in the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing block could not be 
> reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-09-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925157#comment-16925157
 ] 

Ayush Saxena commented on HDFS-14754:
-

That seems a little overkill to me. No need to proof that in multiple BR also 
it doesn’t get corrected. IMHO just verifying that your new introduced logic 
now works should be enough.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, 
> HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, 
> HDFS-14754.006.patch
>
>
> Using EC RS-3-2, 6 DN.
> We came across a scenario where, in the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing block could not be 
> reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925156#comment-16925156
 ] 

Hadoop QA commented on HDFS-14795:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 51s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 7 new + 662 unchanged - 0 fixed = 669 total (was 662) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979770/HDFS-14795.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 72fe3e4bb3bc 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca32917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-14340) Lower the log level when can't get postOpAttr

2019-09-08 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925129#comment-16925129
 ] 

Rohith Sharma K S commented on HDFS-14340:
--

[~jojochuang] [~OneisAll] [~starphin] Could anyone update the release note, 
since it is flagged as *Important* for release 3.2.1? If nothing breaks 
backward compatibility, it would be better to remove the 'Important' flag.

> Lower the log level when can't get postOpAttr
> -
>
> Key: HDFS-14340
> URL: https://issues.apache.org/jira/browse/HDFS-14340
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Reporter: Anuhan Torgonshar
>Assignee: Anuhan Torgonshar
>Priority: Major
>  Labels: easyfix
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14340.trunk.patch
>
>
> I think we should lower the log level when postOpAttr can't be retrieved in 
> _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_.
>  
>  
> {code:java}
> **The first code snippet***
> //the problematic log level ERROR, at line 1044
> try {
>dirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpDirAttr),
>dfsClient, dirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.error("Can't get postOpDirAttr for dirFileId: "
>+ dirHandle.getFileId(), e1);
> }
> **The second code snippet***
> // similar code elsewhere (at line 475) logs at the INFO level
> try { 
>wccData = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpAttr), 
> dfsClient,   fileIdPath, iug); 
> } catch (IOException e1) { 
>LOG.info("Can't get postOpAttr for fileIdPath: " + fileIdPath, e1); 
> }
> **The third code snippet***
> // similar code elsewhere (at line 1405) logs at the INFO level
> try {
>fromDirWcc = Nfs3Utils.createWccData(
>Nfs3Utils.getWccAttr(fromPreOpAttr), dfsClient, fromDirFileIdPath,iug);
>toDirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(toPreOpAttr),
>dfsClient, toDirFileIdPath, iug);
> } catch (IOException e1) {
>LOG.info("Can't get postOpDirAttr for " + fromDirFileIdPath + " or"
>+ toDirFileIdPath, e1);
> }
> {code}
> Therefore, I think the logging practices should be consistent in similar 
> contexts. When the code catches an _*IOException*_ around the *_getWccAttr()_* 
> call, it usually logs the message at the lower _*INFO*_ level.
>  
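If the intent is simply to align the first snippet with the other call sites, a 
minimal sketch of the change could look like the following (same method and 
variable names as in the quoted code; only the log level changes):
{code:java}
// Sketch: log at INFO so it matches the other getWccAttr() call sites.
try {
  dirWcc = Nfs3Utils.createWccData(Nfs3Utils.getWccAttr(preOpDirAttr),
      dfsClient, dirFileIdPath, iug);
} catch (IOException e1) {
  LOG.info("Can't get postOpDirAttr for dirFileId: "
      + dirHandle.getFileId(), e1);
}
{code}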



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.

2019-09-08 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925128#comment-16925128
 ] 

Rohith Sharma K S commented on HDFS-14074:
--

[~jojochuang] [~luguangyi] [~arp] Could anyone update the release note, since it 
is marked as an incompatible change for release 3.2.1?

> DataNode runs async disk checks  maybe  throws NullPointerException, and 
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0, 3.0.0
> Environment: hadoop-2.7.3, hadoop-2.8.0
>Reporter: guangyi lu
>Assignee: guangyi lu
>Priority: Major
>  Labels: HDFS, HDFS-4
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14074-latest.patch, HDFS-14074.patch, 
> WechatIMG83.jpeg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In the ThrottledAsyncChecker class, the completedChecks member is a 
> WeakHashMap; its definition is as follows:
>       this.completedChecks = new WeakHashMap<>();
> and one of its uses, in the schedule method, is as follows:
>      if (completedChecks.containsKey(target)) {
>        // garbage collection may happen here, so result may be null
>        final LastCheckResult result = completedChecks.get(target);
>        final long msSinceLastCheck = timer.monotonicNow() - result.completedAt;
>      }
> After "completedChecks.containsKey(target)" returns true, garbage collection 
> may reclaim the entry, so completedChecks.get(target) may return null.
> The solution is:
> this.completedChecks = new ReferenceMap(1, 1);
> or
>  this.completedChecks = new HashMap<>();
>  
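For illustration, a hedged sketch of a null-safe variant of the quoted lookup (a 
single get() plus a null check, shown as an alternative to changing the map type; 
names follow the snippet above, and this is not the committed fix):
{code:java}
// Single lookup: even if the weakly-referenced entry is reclaimed by the GC
// between calls, we only dereference a non-null result, so no NPE.
final LastCheckResult result = completedChecks.get(target);
if (result != null) {
  final long msSinceLastCheck = timer.monotonicNow() - result.completedAt;
  // ... proceed using msSinceLastCheck ...
}
{code}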



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-08 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925126#comment-16925126
 ] 

Lisheng Sun commented on HDFS-14795:


hi [~elgoiri], I updated this patch and uploaded the v003 patch.
{quote}Can we also add some test?
{quote}
The current UTs cover it:
 1. isTransfer: TestDiskError#testDataTransferWhenBytesPerChecksumIsZero
 2. isWrite: TestTransferRbw#testTransferRbw
So I think we do not need to add a new UT. Please correct me if I am wrong. 
Thanks a lot [~elgoiri].

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not use a throttler.
>  I think it is necessary to throttle block writes by adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.
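As a rough illustration of the idea (not the attached patch), the null in the 
quoted call could be replaced with a throttler when the pipeline is in a recovery 
stage; {{getWriteThrottler()}} below is a hypothetical accessor, not existing API:
{code:java}
// Hypothetical sketch: pick a DataTransferThrottler for recovery stages only,
// otherwise keep the current behaviour (no throttling).
DataTransferThrottler throttler =
    (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND_RECOVERY
        || stage == BlockConstructionStage.PIPELINE_SETUP_STREAMING_RECOVERY)
        ? datanode.getWriteThrottler()   // assumed accessor
        : null;
blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
    mirrorAddr, throttler, targets, false);
{code}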



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14795) Add Throttler for writing block

2019-09-08 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14795:
---
Attachment: HDFS-14795.003.patch

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not use a throttler.
>  I think it is necessary to throttle block writes by adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed

2019-09-08 Thread Ravuri Sushma sree (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925119#comment-16925119
 ] 

Ravuri Sushma sree commented on HDFS-14528:
---

The above test failures aren't related.

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.2.Patch, ZKFC_issue.patch
>
>
>  *In a cluster with more than one Standby namenode, manual failover throws an 
> exception in some cases.*
> *When trying to execute the failover command from active to standby,* 
> *._/hdfs haadmin -failover nn1 nn2, the Exception below is thrown:_*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases :
>  Scenario 1 : 
> Namenodes - NN1(Active) , NN2(Standby), NN3(Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the 
> exception is thrown.
> Scenario 2 :
>  Namenodes - NN1(Active) , NN2(Standby), NN3(Standby)
> ZKFC's -              ZKFC1,            ZKFC2,            ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, the exception is thrown.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2019-09-08 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925114#comment-16925114
 ] 

Lisheng Sun commented on HDFS-14820:


hi [~elgoiri], [~ayushtkn], [~xkrogen], could you find time to help review this 
patch? Thank you.

>  The default 8KB buffer of 
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14820.001.patch
>
>
> this issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
> ExtendedBlock block,
> Token<BlockTokenIdentifier> blockToken,
> long startOffset, long len,
> boolean verifyChecksum,
> String clientName,
> Peer peer, DatanodeID datanodeID,
> PeerCache peerCache,
> CachingStrategy cachingStrategy,
> int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>   peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>   verifyChecksum, cachingStrategy);
> }
> public BufferedOutputStream(OutputStream out) {
> this(out, 8192);
> }
> {code}
> The Sender#readBlock parameters (block, blockToken, clientName, startOffset, len, 
> verifyChecksum, cachingStrategy) do not need such a big buffer.
> So I think the BufferedOutputStream buffer size should be reduced.
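A hedged sketch of what the reduction might look like, following the same idea 
as HDFS-14535 (shrink the buffer for small control messages); it assumes a 
Configuration object (conf) and DFSUtilClient.getSmallBufferSize are reachable 
at this call site, and it is not the attached patch:
{code:java}
// Use the small control-message buffer size instead of the default 8 KB,
// since readBlock only writes a short request header.
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
    peer.getOutputStream(), DFSUtilClient.getSmallBufferSize(conf)));
{code}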



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14373) EC : Decoding is failing when block group last incomplete cell fall in to AlignedStripe

2019-09-08 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925095#comment-16925095
 ] 

Surendra Singh Lilhore commented on HDFS-14373:
---

[~zhaoyim], I will provide the patch for this soon.

> EC : Decoding is failing when block group last incomplete cell fall in to 
> AlignedStripe
> ---
>
> Key: HDFS-14373
> URL: https://issues.apache.org/jira/browse/HDFS-14373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-09-08 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925093#comment-16925093
 ] 

hemanthboyina commented on HDFS-14754:
--

   _cluster.triggerBlockReports();_
   _cluster.triggerHeartbeats();_
   _cluster.triggerHeartbeats();_
   _cluster.triggerBlockReports();_

We added this to send the block report multiple times, to show that block 
reconstruction does not happen even after multiple block reports.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, 
> HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, 
> HDFS-14754.006.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came across a scenario where, among the 5 EC internal blocks, the same 
> block was replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing block could not be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-09-08 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925090#comment-16925090
 ] 

Surendra Singh Lilhore edited comment on HDFS-14754 at 9/8/19 7:01 AM:
---

[~ayushtkn]
{quote}HDFS-14699 seems to be handling a quite similar scenario. Whether post 
that gets in, still this problem be there. Whether reconstruction will still 
not happen, without this?
{quote}
Both issues are different. As I mentioned in my comment, HDFS-14699 handles the 
busy DN scenario, not the duplicate block scenario. I already checked this while 
reviewing.


was (Author: surendrasingh):
[~ayushtkn]
{quote}HDFS-14699 seems to be handling a quite similar scenario. Whether post 
that gets in, still this problem be there. Whether reconstruction will still 
not happen, without this?
{quote}
Both issues are different. As I mentioned in my comment, HDFS-14699 handles the 
busy DN scenario, not the duplicate block scenario. I already checked this while 
reviewing.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, 
> HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, 
> HDFS-14754.006.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came across a scenario where, among the 5 EC internal blocks, the same 
> block was replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing block could not be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-09-08 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925090#comment-16925090
 ] 

Surendra Singh Lilhore commented on HDFS-14754:
---

[~ayushtkn]
{quote}HDFS-14699 seems to be handling a quite similar scenario. Whether post 
that gets in, still this problem be there. Whether reconstruction will still 
not happen, without this?
{quote}
Both issues are different. As I mentioned in my comment, HDFS-14699 handles the 
busy DN scenario, not the duplicate block scenario. I already checked this while 
reviewing.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, 
> HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, 
> HDFS-14754.006.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came across a scenario where, among the 5 EC internal blocks, the same 
> block was replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing block could not be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org