[jira] [Commented] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-24 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023350#comment-16023350
 ] 

Vinitha Reddy Gankidi commented on HDFS-11837:
--

You are right. I was looking at a different branch. I have uploaded a new patch 
removing the unused import.

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch, 
> HDFS-9710-branch-2.7.01.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-24 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11837:
-
Attachment: HDFS-9710-branch-2.7.01.patch

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch, 
> HDFS-9710-branch-2.7.01.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-23 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022199#comment-16022199
 ] 

Vinitha Reddy Gankidi commented on HDFS-11837:
--

[~shv] ReplaceDatanodeOnFailure is used here in TestBatchIbr:
conf.setBoolean(ReplaceDatanodeOnFailure.BEST_EFFORT_KEY, true);
Is there something I'm missing?
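
For context, here is a minimal sketch of a test setup that enables best-effort datanode replacement on a MiniDFSCluster (an illustration only, not the actual TestBatchIbr code; the imports assume the branch-2.7 class locations):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure;

public class BestEffortIbrExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Keep the write pipeline going on a best-effort basis when a DN fails,
    // instead of failing the client write outright.
    conf.setBoolean(ReplaceDatanodeOnFailure.BEST_EFFORT_KEY, true);
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(3).build();
    try {
      cluster.waitActive();
      // ... run writes that exercise incremental block reports here ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}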

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-23 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021617#comment-16021617
 ] 

Vinitha Reddy Gankidi commented on HDFS-11837:
--

[~shv] Please take a look. I've verified that all these tests pass locally.

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-22 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11837:
-
Status: Patch Available  (was: Open)

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-22 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11837:
-
Attachment: HDFS-9710-branch-2.7.00.patch

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11854) Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too long to complete

2017-05-19 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi resolved HDFS-11854.
--
Resolution: Duplicate

> Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too 
> long to complete
> --
>
> Key: HDFS-11854
> URL: https://issues.apache.org/jira/browse/HDFS-11854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9412 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11854) Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too long to complete

2017-05-19 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018042#comment-16018042
 ] 

Vinitha Reddy Gankidi commented on HDFS-11854:
--

[~arpiagariu] Yes, let me resolve it. Thanks for catching that. The first time I 
tried to create the issue I got an error, but it looks like it actually succeeded.

> Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too 
> long to complete
> --
>
> Key: HDFS-11854
> URL: https://issues.apache.org/jira/browse/HDFS-11854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9412 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11855) Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too long to complete

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016928#comment-16016928
 ] 

Vinitha Reddy Gankidi commented on HDFS-11855:
--

[~shv] Please take a look.

> Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too 
> long to complete
> --
>
> Key: HDFS-11855
> URL: https://issues.apache.org/jira/browse/HDFS-11855
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9412-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9412 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11855) Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too long to complete

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11855:
-
Status: Patch Available  (was: Open)

> Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too 
> long to complete
> --
>
> Key: HDFS-11855
> URL: https://issues.apache.org/jira/browse/HDFS-11855
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9412-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9412 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11855) Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too long to complete

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11855:
-
Attachment: HDFS-9412-branch-2.7.00.patch

> Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too 
> long to complete
> --
>
> Key: HDFS-11855
> URL: https://issues.apache.org/jira/browse/HDFS-11855
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9412-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9412 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11855) Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too long to complete

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)
Vinitha Reddy Gankidi created HDFS-11855:


 Summary: Backport HDFS-9412 to branch-2.7: getBlocks occupies 
FSLock and takes too long to complete
 Key: HDFS-11855
 URL: https://issues.apache.org/jira/browse/HDFS-11855
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Vinitha Reddy Gankidi
Assignee: Vinitha Reddy Gankidi


As per the discussion on the [mailing 
list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
 backport HDFS-9412 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11854) Backport HDFS-9412 to branch-2.7: getBlocks occupies FSLock and takes too long to complete

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)
Vinitha Reddy Gankidi created HDFS-11854:


 Summary: Backport HDFS-9412 to branch-2.7: getBlocks occupies 
FSLock and takes too long to complete
 Key: HDFS-11854
 URL: https://issues.apache.org/jira/browse/HDFS-11854
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Vinitha Reddy Gankidi
Assignee: Vinitha Reddy Gankidi


As per the discussion on the [mailing 
list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
 backport HDFS-9412 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9726) Refactor IBR code to a new class

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-9726:

Attachment: HDFS-9726-branch-2.7.01.patch

> Refactor IBR code to a new class
> 
>
> Key: HDFS-9726
> URL: https://issues.apache.org/jira/browse/HDFS-9726
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: h9726_20160131.patch, h9726_20160201.patch, 
> h9726_20160203.patch, h9726_20160204.patch, HDFS-9726-branch-2.7.01.patch
>
>
> The IBR code currently lives mainly in BPServiceActor. This JIRA is to refactor 
> it into a new class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11839) Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11839:
-
Attachment: HDFS-9726-branch-2.7.01.patch

> Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class
> --
>
> Key: HDFS-11839
> URL: https://issues.apache.org/jira/browse/HDFS-11839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Attachments: HDFS-9726.branch-2.7.00.patch, 
> HDFS-9726-branch-2.7.01.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9726 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11839) Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016678#comment-16016678
 ] 

Vinitha Reddy Gankidi commented on HDFS-11839:
--

[~shv] Can you please review the patch? Regarding the checkstyle warnings, other 
than the unused import, the remaining issues are present in the original patch as 
well.

> Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class
> --
>
> Key: HDFS-11839
> URL: https://issues.apache.org/jira/browse/HDFS-11839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Attachments: HDFS-9726.branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9726 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11839) Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11839:
-
Attachment: HDFS-9726.branch-2.7.00.patch

> Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class
> --
>
> Key: HDFS-11839
> URL: https://issues.apache.org/jira/browse/HDFS-11839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Attachments: HDFS-9726.branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9726 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11839) Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class

2017-05-18 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11839:
-
Status: Patch Available  (was: Open)

> Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class
> --
>
> Key: HDFS-11839
> URL: https://issues.apache.org/jira/browse/HDFS-11839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Attachments: HDFS-9726.branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9726 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11838) Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11838:
-
Attachment: HDFS-7990-branch-2.7.01.patch

> Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed
> --
>
> Key: HDFS-11838
> URL: https://issues.apache.org/jira/browse/HDFS-11838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-7990-branch-2.7.00.patch, 
> HDFS-7990-branch-2.7.01.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-7990 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11838) Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015059#comment-16015059
 ] 

Vinitha Reddy Gankidi commented on HDFS-11838:
--

Good catch. Thanks Konstantin. Attached a new patch removing {{startTime}}.

> Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed
> --
>
> Key: HDFS-11838
> URL: https://issues.apache.org/jira/browse/HDFS-11838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-7990-branch-2.7.00.patch, 
> HDFS-7990-branch-2.7.01.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-7990 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11838) Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11838:
-
Status: Patch Available  (was: Open)

> Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed
> --
>
> Key: HDFS-11838
> URL: https://issues.apache.org/jira/browse/HDFS-11838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-7990-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-7990 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11838) Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013623#comment-16013623
 ] 

Vinitha Reddy Gankidi commented on HDFS-11838:
--

[~shv] Please review the patch

> Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed
> --
>
> Key: HDFS-11838
> URL: https://issues.apache.org/jira/browse/HDFS-11838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-7990-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-7990 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11838) Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11838:
-
Attachment: HDFS-7990-branch-2.7.00.patch

> Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed
> --
>
> Key: HDFS-11838
> URL: https://issues.apache.org/jira/browse/HDFS-11838
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-7990-branch-2.7.00.patch
>
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-7990 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11839) Backport HDFS-9726 to branch-2.7: Refactor IBR code to a new class

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)
Vinitha Reddy Gankidi created HDFS-11839:


 Summary: Backport HDFS-9726 to branch-2.7: Refactor IBR code to a 
new class
 Key: HDFS-11839
 URL: https://issues.apache.org/jira/browse/HDFS-11839
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Vinitha Reddy Gankidi
Assignee: Vinitha Reddy Gankidi
Priority: Minor


As per the discussion on the [mailing 
list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
 backport HDFS-9726 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11838) Backport HDFS-7990 to branch-2.7: IBR delete ack should not be delayed

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)
Vinitha Reddy Gankidi created HDFS-11838:


 Summary: Backport HDFS-7990 to branch-2.7: IBR delete ack should 
not be delayed
 Key: HDFS-11838
 URL: https://issues.apache.org/jira/browse/HDFS-11838
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinitha Reddy Gankidi
Assignee: Vinitha Reddy Gankidi


As per the discussion on the [mailing 
list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
 backport HDFS-7990 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013604#comment-16013604
 ] 

Vinitha Reddy Gankidi commented on HDFS-11837:
--

This patch depends on two other patches that aren't in branch-2.7: HDFS-7990 
and HDFS-9726. Will create separate JIRAs to track these two backports.

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11837:
-
Description: As per the discussion on the [mailing 
list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
 backport HDFS-9710 to branch-2.7

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-9710 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-17 Thread Vinitha Reddy Gankidi (JIRA)
Vinitha Reddy Gankidi created HDFS-11837:


 Summary: Backport HDFS-9710 to branch-2.7: Change DN to send block 
receipt IBRs in batches
 Key: HDFS-11837
 URL: https://issues.apache.org/jira/browse/HDFS-11837
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Vinitha Reddy Gankidi
Assignee: Vinitha Reddy Gankidi






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-11808) Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in progress

2017-05-12 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi reassigned HDFS-11808:


Assignee: (was: Vinitha Reddy Gankidi)

> Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in 
> progress
> -
>
> Key: HDFS-11808
> URL: https://issues.apache.org/jira/browse/HDFS-11808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
>  backport HDFS-8549 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11808) Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in progress

2017-05-11 Thread Vinitha Reddy Gankidi (JIRA)
Vinitha Reddy Gankidi created HDFS-11808:


 Summary: Backport HDFS-8549 to branch-2.7: Abort the balancer if 
an upgrade is in progress
 Key: HDFS-11808
 URL: https://issues.apache.org/jira/browse/HDFS-11808
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Vinitha Reddy Gankidi
Assignee: Vinitha Reddy Gankidi


As per the discussion on the [mailing 
list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser]
 backport HDFS-8549 to branch-2.7



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11634) Optimize BlockIterator when iterating starts in the middle.

2017-04-12 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966967#comment-15966967
 ] 

Vinitha Reddy Gankidi commented on HDFS-11634:
--

It's a good improvement. One minor nit: {{index}} is initialized to zero twice.

[~zhz] raised a good point. It seems we don't need the iterators for the skipped 
storages.
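
To illustrate the storage-skipping idea, here is a rough sketch (not the actual HDFS-11634 patch; {{StorageInfo}} and {{locateStart}} are made-up names for illustration):
{code}
import java.util.List;

interface StorageInfo { int numBlocks(); }  // hypothetical, for illustration only

class BlockIteratorSketch {
  /** Return {storageIndex, blocksToSkipWithinThatStorage} for a given startBlock. */
  static int[] locateStart(List<? extends StorageInfo> storages, int startBlock) {
    int remaining = startBlock;
    int idx = 0;
    // Skip whole storages in O(1) each instead of advancing block by block.
    while (idx < storages.size() && remaining >= storages.get(idx).numBlocks()) {
      remaining -= storages.get(idx).numBlocks();
      idx++;
    }
    return new int[] { idx, remaining };
  }
}
{code}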

> Optimize BlockIterator when iterating starts in the middle.
> 
>
> Key: HDFS-11634
> URL: https://issues.apache.org/jira/browse/HDFS-11634
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.5
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-11634.001.patch, HDFS-11634.002.patch, 
> HDFS-11634.003.patch, HDFS-11634.004.patch
>
>
> {{BlockManager.getBlocksWithLocations()}} needs to iterate blocks from a 
> randomly selected {{startBlock}} index. It creates an iterator which points 
> to the first block and then skips all blocks until {{startBlock}}. It is 
> inefficient when DN has multiple storages. Instead of skipping blocks one by 
> one we can skip entire storages. Should be more efficient on average.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-04-12 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965511#comment-15965511
 ] 

Vinitha Reddy Gankidi commented on HDFS-11313:
--

Attached the design doc. Please take a look; I would appreciate any feedback on 
the design. Once we finalize it, I'll create subtasks for the implementation.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: SegmentedBlockReports.pdf
>
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11313) Segmented Block Reports

2017-04-12 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11313:
-
Attachment: SegmentedBlockReports.pdf

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: SegmentedBlockReports.pdf
>
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

2017-04-10 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963707#comment-15963707
 ] 

Vinitha Reddy Gankidi commented on HDFS-11384:
--

[~shv] The delay logic looks good to me. It would be great if we could make 
BALANCER_NUM_RPC_PER_SEC configurable with a default value of 20. The test does 
not ensure that there are indeed 20 getBlocks calls per second, and that is 
probably not straightforward to verify, so I would like the ability to configure 
BALANCER_NUM_RPC_PER_SEC.
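
For illustration, a rough sketch of what a configurable rate could look like (the config key and class name below are made up; the current patch hard-codes the constant):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Time;

/** Sketch only: space out getBlocks() RPCs at a configurable rate. */
class GetBlocksThrottle {
  // Hypothetical key; the HDFS-11384 patches use a constant BALANCER_NUM_RPC_PER_SEC = 20.
  static final String RPCS_PER_SEC_KEY = "dfs.balancer.getblocks.rpcs-per-sec";

  private final long minIntervalMs;
  private long lastCallMs = 0L;

  GetBlocksThrottle(Configuration conf) {
    int rpcsPerSec = conf.getInt(RPCS_PER_SEC_KEY, 20);
    this.minIntervalMs = 1000L / Math.max(1, rpcsPerSec);
  }

  /** Call right before each getBlocks() RPC. */
  synchronized void acquire() throws InterruptedException {
    long wait = minIntervalMs - (Time.monotonicNow() - lastCallMs);
    if (wait > 0) {
      Thread.sleep(wait);  // disperse the calls so the NN call queue doesn't spike
    }
    lastCallMs = Time.monotonicNow();
  }
}
{code}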

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -
>
> Key: HDFS-11384
> URL: https://issues.apache.org/jira/browse/HDFS-11384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch
>
>
> When running the balancer on a Hadoop cluster with more than 3000 DataNodes, 
> the NameNode's rpc.CallQueueLength spikes. We observed that this situation 
> could cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

2017-03-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949978#comment-15949978
 ] 

Vinitha Reddy Gankidi commented on HDFS-11384:
--

[~shv] I'm leaning towards reading from (4) instead of (3).
{{isGoodBlockCandidate}} needs a global view of the block replicas. Also there 
is some additional logic to deal with erasure coded(EC) blocks and this may be 
a blocker for reading from DNs. [~zhz] you probably have more context regarding 
the EC blocks.
{code}
 /**
   * Decide if the block/blockGroup is a good candidate to be moved from source
   * to target. A block is a good candidate if
   * 1. the block is not in the process of being moved/has not been moved;
   * 2. the block does not have a replica/internalBlock on the target;
   * 3. doing the move does not reduce the number of racks that the block has
   */
  private boolean isGoodBlockCandidate(StorageGroup source, StorageGroup target,
  StorageType targetStorageType, DBlock block) {
{code}

I agree that (2) and (4) are complimentary. 

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -
>
> Key: HDFS-11384
> URL: https://issues.apache.org/jira/browse/HDFS-11384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> When running the balancer on a Hadoop cluster with more than 3000 DataNodes, 
> the NameNode's rpc.CallQueueLength spikes. We observed that this situation 
> could cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

2017-03-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949978#comment-15949978
 ] 

Vinitha Reddy Gankidi edited comment on HDFS-11384 at 3/30/17 10:29 PM:


[~shv] I'm leaning towards (4) instead of (3).
{{isGoodBlockCandidate}} needs a global view of the block replicas. Also there 
is some additional logic to deal with erasure coded(EC) blocks and this may be 
a blocker for reading from DNs. [~zhz] you probably have more context regarding 
the EC blocks.
{code}
 /**
   * Decide if the block/blockGroup is a good candidate to be moved from source
   * to target. A block is a good candidate if
   * 1. the block is not in the process of being moved/has not been moved;
   * 2. the block does not have a replica/internalBlock on the target;
   * 3. doing the move does not reduce the number of racks that the block has
   */
  private boolean isGoodBlockCandidate(StorageGroup source, StorageGroup target,
  StorageType targetStorageType, DBlock block) {
{code}

I agree that (2) and (4) are complimentary. 


was (Author: redvine):
[~shv] I'm leaning towards reading from (4) instead of (3).
{{isGoodBlockCandidate}} needs a global view of the block replicas. Also there 
is some additional logic to deal with erasure coded(EC) blocks and this may be 
a blocker for reading from DNs. [~zhz] you probably have more context regarding 
the EC blocks.
{code}
 /**
   * Decide if the block/blockGroup is a good candidate to be moved from source
   * to target. A block is a good candidate if
   * 1. the block is not in the process of being moved/has not been moved;
   * 2. the block does not have a replica/internalBlock on the target;
   * 3. doing the move does not reduce the number of racks that the block has
   */
  private boolean isGoodBlockCandidate(StorageGroup source, StorageGroup target,
  StorageType targetStorageType, DBlock block) {
{code}

I agree that (2) and (4) are complimentary. 

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -
>
> Key: HDFS-11384
> URL: https://issues.apache.org/jira/browse/HDFS-11384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> When running the balancer on a Hadoop cluster with more than 3000 DataNodes, 
> the NameNode's rpc.CallQueueLength spikes. We observed that this situation 
> could cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

2017-03-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949757#comment-15949757
 ] 

Vinitha Reddy Gankidi edited comment on HDFS-11384 at 3/30/17 8:36 PM:
---

If we were to offload the calls to DN, dispersing calls wouldn't be a pressing 
issue. I would like to get some feedback  on the various approaches discussed. 
[~benoyantony], [~daryn], [~liuml07] and [~zhaoyunjiong] I would love to hear 
your opinions.


was (Author: redvine):
If we were to offload the calls to DN, dispersing calls wouldn't be a pressing 
issue. I would like to get some feedback  on the various approaches discussed. 
[~benoyantony] [~daryn] [~liuml07] [~zhaoyunjiong] I would love to hear your 
opinions.

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -
>
> Key: HDFS-11384
> URL: https://issues.apache.org/jira/browse/HDFS-11384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> When running the balancer on a Hadoop cluster with more than 3000 DataNodes, 
> the NameNode's rpc.CallQueueLength spikes. We observed that this situation 
> could cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

2017-03-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949757#comment-15949757
 ] 

Vinitha Reddy Gankidi edited comment on HDFS-11384 at 3/30/17 8:36 PM:
---

If we were to offload the calls to DN, dispersing calls wouldn't be a pressing 
issue. I would like to get some feedback  on the various approaches discussed. 
[~benoyantony] [~daryn] [~liuml07] [~zhaoyunjiong] I would love to hear your 
opinions.


was (Author: redvine):
If we were to offload the calls to DN, dispersing calls wouldn't be a pressing 
issue. I would like to get some feedback  on the various approaches discussed. 
[~benoyantony] [~daryn] [~liuml07] I would love to hear your opinions.

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -
>
> Key: HDFS-11384
> URL: https://issues.apache.org/jira/browse/HDFS-11384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> When running the balancer on a Hadoop cluster with more than 3000 DataNodes, 
> the NameNode's rpc.CallQueueLength spikes. We observed that this situation 
> could cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

2017-03-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949757#comment-15949757
 ] 

Vinitha Reddy Gankidi commented on HDFS-11384:
--

If we were to offload the calls to DN, dispersing calls wouldn't be a pressing 
issue. I would like to get some feedback  on the various approaches discussed. 
[~benoyantony] [~daryn] [~liuml07] I would love to hear your opinions.

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -
>
> Key: HDFS-11384
> URL: https://issues.apache.org/jira/browse/HDFS-11384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> When running the balancer on a Hadoop cluster with more than 3000 DataNodes, 
> the NameNode's rpc.CallQueueLength spikes. We observed that this situation 
> could cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

2017-03-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949593#comment-15949593
 ] 

Vinitha Reddy Gankidi commented on HDFS-11384:
--

Two other approaches to fix this:

1. In {{getBlockList()}}, the Dispatcher fetches the blocks belonging to a 
particular DN from the NN and then moves those blocks from the source DN to the 
target DN. The Dispatcher could instead get the blocks directly from that DN. 
This makes {{getBlocksList()}} a distributed operation and doesn't load any 
single node.

2. The Dispatcher could fetch the blocks from the Standby NN instead of the 
Active. The Balancer should be able to tolerate a reasonable degree of staleness.

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -
>
> Key: HDFS-11384
> URL: https://issues.apache.org/jira/browse/HDFS-11384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> When running the balancer on a Hadoop cluster with more than 3000 DataNodes, 
> the NameNode's rpc.CallQueueLength spikes. We observed that this situation 
> could cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-03-24 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941294#comment-15941294
 ] 

Vinitha Reddy Gankidi commented on HDFS-11313:
--

Assigning it to myself. Will attach a design doc soon.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-11313) Segmented Block Reports

2017-03-24 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi reassigned HDFS-11313:


Assignee: Vinitha Reddy Gankidi

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-01-10 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816933#comment-15816933
 ] 

Vinitha Reddy Gankidi commented on HDFS-11313:
--

[~shv] This idea seems promising. I would like to work on it. I wanted to note 
that HDFS-7923 is related in the sense that the block reports from the DN are 
sent only when the NN gives the signal. Even with that patch, the issue of 
processing large DN reports under a global namespace lock still remains.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.

2017-01-10 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10733:
-
Status: Patch Available  (was: Open)

> NameNode terminated after full GC thinking QJM is unresponsive.
> ---
>
> Key: HDFS-10733
> URL: https://issues.apache.org/jira/browse/HDFS-10733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.6.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10733.001.patch, HDFS-10733.002.patch
>
>
> NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. 
> After completing GC it checks if the timeout for quorum is reached. If the GC 
> was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will 
> throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the 
> exception and terminates NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.

2017-01-10 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10733:
-
Attachment: HDFS-10733.002.patch

> NameNode terminated after full GC thinking QJM is unresponsive.
> ---
>
> Key: HDFS-10733
> URL: https://issues.apache.org/jira/browse/HDFS-10733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.6.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10733.001.patch, HDFS-10733.002.patch
>
>
> NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. 
> After completing GC it checks if the timeout for quorum is reached. If the GC 
> was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will 
> throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the 
> exception and terminates NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.

2017-01-10 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816705#comment-15816705
 ] 

Vinitha Reddy Gankidi commented on HDFS-10733:
--

[~shv] I agree. Attached a new patch with this change.

> NameNode terminated after full GC thinking QJM is unresponsive.
> ---
>
> Key: HDFS-10733
> URL: https://issues.apache.org/jira/browse/HDFS-10733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.6.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10733.001.patch, HDFS-10733.002.patch
>
>
> NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. 
> After completing GC it checks if the timeout for quorum is reached. If the GC 
> was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will 
> throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the 
> exception and terminates NameNode.






[jira] [Commented] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.

2017-01-09 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812538#comment-15812538
 ] 

Vinitha Reddy Gankidi commented on HDFS-10733:
--

[~kihwal] Thanks for the great suggestion. 

I have attached a patch that increases the endtime/timeout if there is a long 
pause due to a full GC in the NN. The included unit test asserts that, when there 
genuinely are no responses from the journal nodes, a timeout exception is still 
thrown rather than the timeout being extended as it would be for a full GC pause. 
Please take a look.
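
As a rough illustration of the idea (a standalone sketch only, not the actual 
{{QuorumCall}} code; the class, method and constant names below are made up): 
measure how long each wait iteration actually took, and if the thread stalled far 
longer than requested, treat the difference as a GC pause and push the deadline 
out by that amount instead of timing out immediately.
{code}
// Standalone sketch of a pause-aware wait. Assumption: POLL_MS approximates how
// long one wait iteration should take when nothing goes wrong.
public class PauseAwareWaitSketch {
  static final long POLL_MS = 1000;

  /** Returns true if 'done' became true before the (pause-adjusted) timeout expired. */
  static boolean waitFor(java.util.function.BooleanSupplier done, long timeoutMs)
      throws InterruptedException {
    long endTime = System.currentTimeMillis() + timeoutMs;
    while (!done.getAsBoolean()) {
      long before = System.currentTimeMillis();
      if (before >= endTime) {
        return false;                       // genuine timeout: no responses arrived
      }
      Thread.sleep(Math.min(POLL_MS, endTime - before));
      long stalled = System.currentTimeMillis() - before - POLL_MS;
      if (stalled > POLL_MS) {
        // The thread was stopped far longer than the poll interval (e.g. a full GC):
        // extend the deadline by the pause rather than failing the quorum wait.
        endTime += stalled;
      }
    }
    return true;
  }
}
{code}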

> NameNode terminated after full GC thinking QJM is unresponsive.
> ---
>
> Key: HDFS-10733
> URL: https://issues.apache.org/jira/browse/HDFS-10733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.6.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10733.001.patch
>
>
> NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. 
> After completing GC it checks if the timeout for quorum is reached. If the GC 
> was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will 
> throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the 
> exception and terminates NameNode.






[jira] [Updated] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.

2017-01-09 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10733:
-
Attachment: HDFS-10733.001.patch

> NameNode terminated after full GC thinking QJM is unresponsive.
> ---
>
> Key: HDFS-10733
> URL: https://issues.apache.org/jira/browse/HDFS-10733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.6.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10733.001.patch
>
>
> NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. 
> After completing GC it checks if the timeout for quorum is reached. If the GC 
> was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will 
> throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the 
> exception and terminates NameNode.






[jira] [Assigned] (HDFS-10733) NameNode terminated after full GC thinking QJM is unresponsive.

2016-10-17 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi reassigned HDFS-10733:


Assignee: Vinitha Reddy Gankidi

> NameNode terminated after full GC thinking QJM is unresponsive.
> ---
>
> Key: HDFS-10733
> URL: https://issues.apache.org/jira/browse/HDFS-10733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.6.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> NameNode went into full GC while in {{AsyncLoggerSet.waitForWriteQuorum()}}. 
> After completing GC it checks if the timeout for quorum is reached. If the GC 
> was long enough the timeout can expire, and {{QuorumCall.waitFor()}} will 
> throw {{TimeoutException}}. Finally {{FSEditLog.logSync()}} catches the 
> exception and terminates NameNode.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-17 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.branch-2.7.015.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.015.patch, HDFS-10301.branch-2.015.patch, 
> HDFS-10301.branch-2.7.015.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-17 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583534#comment-15583534
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Attached the patch for branch-2.7.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.015.patch, HDFS-10301.branch-2.015.patch, 
> HDFS-10301.branch-2.7.015.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-10-14 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575943#comment-15575943
 ] 

Vinitha Reddy Gankidi commented on HDFS-10712:
--

[~shv] Somehow lost track of this one. Can you commit it?

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch, HDFS-10712.branch-2.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-13 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: (was: HDFS-10301.016.patch)

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.015.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-13 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.015.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.015.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-13 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: (was: HDFS-10301.015.patch)

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.016.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-13 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572936#comment-15572936
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Updated the patch. The conflict was due to a recent patch pushed upstream.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.016.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-13 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.016.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.015.patch, HDFS-10301.016.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-8028) TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed after patched HDFS-7704

2016-10-07 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556846#comment-15556846
 ] 

Vinitha Reddy Gankidi commented on HDFS-8028:
-

These tests fail on branch-2.7 after HDFS-7704, but they pass once HDFS-7430 is 
applied. This doesn't need to be fixed separately in branch-2.7. As a temporary 
fix, the initialization values for DN_RESCAN_INTERVAL and DN_RESCAN_EXTRA_WAIT 
would need to be modified as per HDFS-7430.

> TestNNHandlesBlockReportPerStorage/TestNNHandlesCombinedBlockReport Failed 
> after patched HDFS-7704
> --
>
> Key: HDFS-8028
> URL: https://issues.apache.org/jira/browse/HDFS-8028
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0
>Reporter: hongyu bi
>Assignee: hongyu bi
>Priority: Minor
> Attachments: HDFS-8028-v0.patch
>
>
> HDFS-7704 makes bad block reporting asynchronous; however, 
> BlockReportTestBase#blockreport_02 doesn't wait for a while after the block report.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-04 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546645#comment-15546645
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

The test failure seems unrelated. It passes locally.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.015.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-04 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.015.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.015.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-10-04 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546390#comment-15546390
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Patch 15 has the changes mentioned in 
https://issues.apache.org/jira/browse/HDFS-10301?focusedCommentId=15536676=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15536676.
 Kindly review.

??It does not solve the race between a timed out BR and the repeating BR in 
multi-RPC BR case.??
When there is a race, the per-storage BRs that arrive after the removal of the 
node lease would not be processed. I think that is okay. BR retransmissions are 
handled by the underlying RPC layer. The same RPC request is retried as per the 
specified Retry policy. Since these retransmitted BRs are identical, it is 
sufficient if we process all the per-storage BRs once. It seems okay to ignore 
the subsequent retransmitted BRs from the same node once {{curRpc + 1 == 
totalRpcs}} is satisfied. Does that sound reasonable?

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-23 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15518131#comment-15518131
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

i) When BRs are split into multiple RPCs: Say 2 BRs from the same DN are 
processed at the same time. If we process the last storage report of the second 
BR before processing all the storage reports in the first BR, then the 
remaining storage reports in the first BR will be ignored as checkLease would 
return false.
{code}
if (context != null) {
  if (context.getTotalRpcs() == context.getCurRpc() + 1) {
    long leaseId = this.getBlockReportLeaseManager().removeLease(node);
    BlockManagerFaultInjector.getInstance().
        removeBlockReportLease(node, leaseId);
  }
{code}
ii) For single RPC BRs: As all storage reports in the single RPC BR satisfy the 
condition that triggers removal of the lease, all storage reports after the 
first storage report will be ignored without the change.
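
To make the two cases concrete, here is a toy model of the behaviour without the 
{{checkLease}} change (simplified, made-up types; not the actual NameNode code): 
once the lease is dropped, whether by the last RPC of a racing retransmitted 
report in case i) or by the first storage report of a single-RPC BR in case ii), 
every remaining per-storage report fails the lease check and is silently ignored.
{code}
// Toy model only: why storage reports processed after the lease removal get dropped.
class InterleavedBrSketch {
  long nodeLeaseId = 42;                    // lease currently held for this DataNode

  /** Returns true if this per-storage report was applied, false if it was dropped. */
  boolean processStorage(long reportLeaseId, int curRpc, int totalRpcs) {
    if (nodeLeaseId == 0 || nodeLeaseId != reportLeaseId) {
      return false;                         // lease already removed or mismatched: ignored
    }
    // ... apply the storage report to the blocks map ...
    if (curRpc + 1 == totalRpcs) {
      nodeLeaseId = 0;                      // same condition as the snippet above: drop lease
    }
    return true;
  }
}
{code}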


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-23 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517992#comment-15517992
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Why do we need to detect the last-report? I don't see any potential problems 
with the checkLease change. Like Konstantin mentioned, what exactly do you mean 
by the last-report? It will be helpful if you can give a scenario where this 
particular change can cause problems.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-19 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504561#comment-15504561
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

[~arpiagariu]  I understand that we may bypass the leaseID check if the storage 
report processing happens out of order. Are there any issues with this 
workaround? What needs to be modified?
We do not need to detect the last storage report in this implementation as the 
pruning of storages happens in the heartbeat. 

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-15 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495211#comment-15495211
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

[~jingzhao] 
??Then can this cover DN hotswap case??
Yes, I will explain how it does below.

??For DN hotswap, I think the DN only sends FBR to notify NN about the change??
That is right.

During hotswap {{DataNode.reconfigurePropertyImpl()}} is invoked which 
identifies the newly added/removed volumes. For all the volumes to be removed, 
{{FsDatasetImpl.removeVolumes()}} is called. This also removes the block infos 
from the FsDataset. It does so by adding these blocks to the 
{{blkToInvalidate}} map. Then the {{FsDatasetImpl.invalidate()}} method is 
invoked for all the blocks in the map.
{code}
  /**
   * Invalidate a block but does not delete the actual on-disk block file.
   *
   * It should only be used when deactivating disks.
   *
   * @param bpid the block pool ID.
   * @param block The block to be invalidated.
   */
  public void invalidate(String bpid, ReplicaInfo block) {
    // If a DFSClient has the replica in its cache of short-circuit file
    // descriptors (and the client is using ShortCircuitShm), invalidate it.
    datanode.getShortCircuitRegistry().processBlockInvalidation(
        new ExtendedBlockId(block.getBlockId(), bpid));

    // If the block is cached, start uncaching it.
    cacheManager.uncacheBlock(bpid, block.getBlockId());

    datanode.notifyNamenodeDeletedBlock(new ExtendedBlock(bpid, block),
        block.getStorageUuid());
  }
{code}

As you can see, these blocks are reported to the NN as deleted. So, the NN 
eventually removes all the blocks associated with this volume. Once this is 
done, the volume is actually pruned by {{DatanodeDescriptor.pruneStorageMap()}} 
in the subsequent heartbeat.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-14 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491142#comment-15491142
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

[~arpiagariu]  Storage reports are anyway sent in heartbeats and these reports 
have the information required to prune zombie storages. These storages are only 
marked as FAILED in the heartbeat. The replicas are removed in the background by 
the HeartbeatManager. Why exactly do you think zombie removal in heartbeats is 
not safe? Why do we need to wait for all storage block reports from a FBR?
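
For reference, the flow being argued for here, in standalone sketch form (made-up, 
simplified types; not the actual {{DatanodeDescriptor}}/{{HeartbeatManager}} code): 
storages missing from the latest heartbeat's storage reports are marked FAILED, 
their replicas are drained in the background, and the empty storage is pruned 
afterwards.
{code}
// Standalone sketch of heartbeat-driven handling of zombie storages (illustrative names).
import java.util.Map;
import java.util.Set;

class HeartbeatPruneSketch {
  enum State { NORMAL, FAILED }

  static class Storage {
    final String id;
    State state = State.NORMAL;
    int numBlocks;
    Storage(String id, int numBlocks) { this.id = id; this.numBlocks = numBlocks; }
  }

  /** Mark every known storage that was not present in this heartbeat's reports as FAILED. */
  static void updateHeartbeatState(Map<String, Storage> known, Set<String> reportedIds) {
    for (Storage s : known.values()) {
      if (!reportedIds.contains(s.id)) {
        s.state = State.FAILED;
      }
    }
  }

  /** Prune FAILED storages once background removal has drained all of their replicas. */
  static void pruneStorageMap(Map<String, Storage> known) {
    known.values().removeIf(s -> s.state == State.FAILED && s.numBlocks == 0);
  }
}
{code}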

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-12 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485956#comment-15485956
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

[~arpiagariu] In the latest patch, BR lease is removed when 
{{context.getTotalRpcs() == context.getCurRpc() + 1}}. If BRs are processed out 
of order/interleaved, the BR lease for the DN will be removed before all the 
BRs from the DN are processed. So, I have modified the {{checkLease}} method in 
{{BlockReportLeaseManager}} to return true when {{node.leaseId == 0}}. Please 
let me know if you see any issues with this approach.
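
In sketch form, the relaxation described above amounts to something like this 
(simplified shape only, not the actual {{BlockReportLeaseManager}} code):
{code}
// Illustrative sketch: a node lease id of 0 means the lease was already removed after
// the last RPC of the report, so remaining storage reports of that report are still
// accepted instead of being dropped.
class LeaseCheckSketch {
  static boolean checkLease(long nodeLeaseId, long reportLeaseId) {
    if (nodeLeaseId == 0) {
      return true;          // lease already released: do not drop the remaining storages
    }
    return nodeLeaseId == reportLeaseId;
  }
}
{code}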

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-12 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.014.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch, 
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-09-12 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485884#comment-15485884
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Upon thorough investigation of heartbeat logic I have verified that unreported 
storages do get removed without any code change. Attached patch 014 eliminates 
the state and the zombie storage removal logic introduced in HDFS-7960. 
I have added a unit test that verifies that when a DN storage with blocks is 
removed, this storage is removed from the DatanodeDescriptor as well and does 
not linger forever. Unreported storages are marked as FAILED in the 
{{updateHeartbeatState}} method when {{checkFailedStorages}} is true. Thus, when 
a DN storage is removed, it will be marked as FAILED in the next heartbeat. 
The storage removal happens in 2 steps after that (Refer Step 2 & 3 in 
https://issues.apache.org/jira/browse/HDFS-10301?focusedCommentId=15427387=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15427387).
 
The test {{testRemovingStorageDoesNotProduceZombies}} introduced in HDFS-7960 
passes by reducing the heartbeat recheck interval so that the test doesn't 
time out. By default, the HeartbeatManager removes blocks associated with 
failed storages every 5 minutes.
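
For instance, a test can shrink that interval so the cleanup fires quickly (a 
hypothetical test-setup fragment; the configuration key is the standard HDFS 
setting, the value is arbitrary):
{code}
// Illustrative only: speed up failed-storage cleanup for a unit test.
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.hdfs.HdfsConfiguration();
conf.setInt("dfs.namenode.heartbeat.recheck-interval", 500); // milliseconds
{code}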
I have ignored {{testProcessOverReplicatedAndMissingStripedBlock}} in this 
patch. Please refer to HDFS-10854 for more details.


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while processing these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.






[jira] [Commented] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449841#comment-15449841
 ] 

Vinitha Reddy Gankidi commented on HDFS-10809:
--

Thanks [~zhz]. I could not reproduce the test failures locally either. 

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
> Fix For: 2.7.4
>
> Attachments: HDFS-10809-branch-2.7.001.patch
>
>
> This bug was caused by the fact that we did HDFS-10458 from trunk to 
> branch-2.7, but we did HDFS-8721 initially up to branch-2.8. So from 
> branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458. But in branch-2.7, 
> we have the reverse order. Hence the inconsistency.






[jira] [Commented] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-30 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449598#comment-15449598
 ] 

Vinitha Reddy Gankidi commented on HDFS-10814:
--

Thanks Zhe and Andrew!

> Add assertion for getNumEncryptionZones when no EZ is created
> -
>
> Key: HDFS-10814
> URL: https://issues.apache.org/jira/browse/HDFS-10814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10814.001.patch
>
>
> HDFS-10809 adds an additional assertion to TestEncryptionZones to validate 
> that getNumEncryptionZones returns 0 if there is no EZ. This is a useful 
> check to add to trunk as well. 






[jira] [Updated] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-30 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10809:
-
Attachment: HDFS-10809-branch-2.7.001.patch

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10809-branch-2.7.001.patch
>
>
> This bug was caused by the fact that we did HDFS-10458 from trunk to 
> branch-2.7, but we did HDFS-8721 initially up to branch-2.8. So from 
> branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458. But in branch-2.7, 
> we have the reverse order. Hence the inconsistency.






[jira] [Updated] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-30 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10809:
-
Attachment: (was: HDFS-10809.001.patch)

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
>
> This bug was caused by the fact that we did HDFS-10458 from trunk to 
> branch-2.7, but we did HDFS-8721 initially up to branch-2.8. So from 
> branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458. But in branch-2.7, 
> we have the reverse order. Hence the inconsistency.






[jira] [Updated] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-29 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10814:
-
Attachment: HDFS-10814.001.patch

> Add assertion for getNumEncryptionZones when no EZ is created
> -
>
> Key: HDFS-10814
> URL: https://issues.apache.org/jira/browse/HDFS-10814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
> Attachments: HDFS-10814.001.patch
>
>
> HDFS-10809 adds an additional assertion to TestEncryptionZones to validate 
> that getNumEncryptionZones returns 0 if there is no EZ. This is a useful 
> check to add to trunk as well. 






[jira] [Assigned] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-29 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi reassigned HDFS-10814:


Assignee: Vinitha Reddy Gankidi

> Add assertion for getNumEncryptionZones when no EZ is created
> -
>
> Key: HDFS-10814
> URL: https://issues.apache.org/jira/browse/HDFS-10814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
>Priority: Minor
>
> HDFS-10809 adds an additional assertion to TestEncryptionZones to validate 
> that getNumEncryptionZones returns 0 if there is no EZ. This is a useful 
> check to add to trunk as well. 






[jira] [Created] (HDFS-10814) Add assertion for getNumEncryptionZones when no EZ is created

2016-08-29 Thread Vinitha Reddy Gankidi (JIRA)
Vinitha Reddy Gankidi created HDFS-10814:


 Summary: Add assertion for getNumEncryptionZones when no EZ is 
created
 Key: HDFS-10814
 URL: https://issues.apache.org/jira/browse/HDFS-10814
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Reporter: Vinitha Reddy Gankidi
Priority: Minor


HDFS-10809 adds an additional assertion to TestEncryptionZones to validate that 
getNumEncryptionZones returns 0 if there is no EZ. This is a useful check to 
add to trunk as well. 






[jira] [Commented] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-29 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447449#comment-15447449
 ] 

Vinitha Reddy Gankidi commented on HDFS-10809:
--

[~zhz] I have uploaded a patch. Please take a look. 

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10809.001.patch
>
>
> This bug was caused by the fact that we did HDFS-10458 from trunk to 
> branch-2.7, but we did HDFS-8721 initially up to branch-2.8. So from 
> branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458. But in branch-2.7, 
> we have the reverse order. Hence the inconsistency.






[jira] [Updated] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-29 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10809:
-
Attachment: HDFS-10809.001.patch

> getNumEncryptionZones causes NPE in branch-2.7
> --
>
> Key: HDFS-10809
> URL: https://issues.apache.org/jira/browse/HDFS-10809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10809.001.patch
>
>
> This bug was caused by the fact that we did HDFS-10458 from trunk to 
> branch-2.7, but we did HDFS-8721 initially up to branch-2.8. So from 
> branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458. But in branch-2.7, 
> we have the reverse order. Hence the inconsistency.






[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-18 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427460#comment-15427460
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Thanks [~shv] for summarizing how zombies can be detected and appropriately 
handled using the existing heartbeat mechanism. I am working on a patch that 
implements this. 

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-10 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416357#comment-15416357
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

[~daryn] That is a good suggestion. Zombies should be handled by the 
heartbeat's pruning of excess storages.
Why do we need to wait until block reports for all the storages in the 
heartbeat are processed? 
Do you want to submit a patch for this?
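
As a rough illustration of what pruning excess storages from the heartbeat could 
look like, here is a self-contained toy model. The types and names below are my own 
simplified stand-ins, not the real DatanodeDescriptor/StorageReport code: any 
storage the NameNode still tracks but that the DataNode no longer reports in its 
heartbeat, and that holds no blocks, gets dropped, with no block-report 
bookkeeping involved.
{code}
import java.util.*;

public class HeartbeatPruneSketch {
  static final class Storage {
    final String id;
    final int numBlocks;
    Storage(String id, int numBlocks) { this.id = id; this.numBlocks = numBlocks; }
  }

  // Storages the NameNode currently tracks for one DataNode.
  static final Map<String, Storage> tracked = new HashMap<>();

  static void prune(Set<String> reportedInHeartbeat) {
    // Drop storages that are absent from the heartbeat and hold no blocks.
    tracked.values().removeIf(s ->
        !reportedInHeartbeat.contains(s.id) && s.numBlocks == 0);
  }

  public static void main(String[] args) {
    tracked.put("DS-1", new Storage("DS-1", 42));
    tracked.put("DS-2", new Storage("DS-2", 0));    // failed or removed volume
    prune(new HashSet<>(Arrays.asList("DS-1")));    // heartbeat reports DS-1 only
    System.out.println(tracked.keySet());           // prints [DS-1]
  }
}
{code}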

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-05 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.013.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-05 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410364#comment-15410364
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

The real problem is the state associated with the DataNode 
(curBlockReportRpcsSeen, curBlockReportId) that is used to figure out when to 
remove zombie storages. This state gets messed up when block reports are processed 
out of order. The current patch still allows out-of-order processing of block 
reports but gets rid of this per-DataNode state. 

In patch 012, although the isStorageReport method returns true for a STORAGE_REPORT 
BlockListAsLongs, this method gets overridden to return false in the 
BufferDecoder. I have attached a new patch (013) that fixes this issue. 
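
To make the BufferDecoder pitfall concrete, here is a minimal, self-contained 
sketch. The class names are hypothetical stand-ins, not the real 
BlockListAsLongs/BufferDecoder code: the base class infers a storage report from a 
sentinel block count, and a decoder subclass that overrides the accessor silently 
defeats that check, which is essentially the inconsistency patch 013 removes.
{code}
// Hypothetical names only; a sketch of the patch-012 pitfall.
abstract class BlockList {
  // Sentinel block count meaning "this is a storage report, not a block report".
  static final long STORAGE_REPORT_NUM_BLOCKS = -1;

  abstract long getNumberOfBlocks();

  // Base-class rule: the sentinel block count marks a storage report.
  boolean isStorageReport() {
    return getNumberOfBlocks() == STORAGE_REPORT_NUM_BLOCKS;
  }
}

class BufferDecoderSketch extends BlockList {
  private final long numBlocks;

  BufferDecoderSketch(long numBlocks) {
    this.numBlocks = numBlocks;
  }

  @Override
  long getNumberOfBlocks() {
    return numBlocks;
  }

  // The bug in miniature: the override hard-codes the answer and hides the
  // sentinel, so a storage report looks like an ordinary block report.
  @Override
  boolean isStorageReport() {
    return false;
  }
}

public class StorageReportSketch {
  public static void main(String[] args) {
    BlockList report = new BufferDecoderSketch(BlockList.STORAGE_REPORT_NUM_BLOCKS);
    System.out.println(report.isStorageReport());   // prints "false"
  }
}
{code}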


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, 
> HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402938#comment-15402938
 ] 

Vinitha Reddy Gankidi commented on HDFS-10712:
--

Done.

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch, HDFS-10712.branch-2.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.branch-2.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch, HDFS-10712.branch-2.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.branch-2.7.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: (was: HDFS-10712.001.patch)

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.001.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: (was: HDFS-10712.001.patch)

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402881#comment-15402881
 ] 

Vinitha Reddy Gankidi commented on HDFS-10712:
--

[~shv] I have attached a patch. Can you please take a look? Thanks. 

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi reassigned HDFS-10712:


Assignee: Vinitha Reddy Gankidi

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.001.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null 
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402734#comment-15402734
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

[~ebadger] Thanks for reporting this. TestDataNodeVolumeFailure does not call 
blockReport() with context=null on trunk. This was fixed as a part of 
HDFS-9260. We need to modify TestDataNodeVolumeFailure.testVolumeFailure() for 
branch-2.7 as well:
{code}
-    cluster.getNameNodeRpc().blockReport(dnR, bpid, reports, null);
+    cluster.getNameNodeRpc().blockReport(dnR, bpid, reports,
+        new BlockReportContext(1, 0, System.nanoTime()));
{code}

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-29 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400179#comment-15400179
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Added a patch for branch-2.7.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-29 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.branch-2.7.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-20 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.012.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-20 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386922#comment-15386922
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Thanks for the review [~liuml07]. I have attached a new patch (012) that 
addresses your comments.

> FSImage#isUpgradeFinalized is not volatile and 
> nn.getFSImage().isUpgradeFinalized() is not holding the read lock in 
> NameNodeRpcServer#blockReport(). Is this a problem? This is not very related 
> to this issue though.

My patch does not make any changes to the isUpgradeFinalized method. If this is 
a problem, we should open another JIRA to address it.

> If you’re going to process exceptions thrown by the task, I think we don’t need 
> to return it explicitly, as Callable.call() is permitted to throw checked 
> exceptions.

Thanks for the good suggestion! I have modified Callable.call() to return a 
DatanodeCommand and throw IOException. I don't explicitly catch the exception 
since JUnit will take care of it.

> I think we need to interpret the return value of the future.get()?

future.get() returns a DatanodeCommand, which we don’t care about and don’t 
need to interpret. 
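
To make the pattern concrete, here is a self-contained sketch (a stand-in Command 
class rather than the real DatanodeCommand, and no HDFS types): call() declares the 
checked IOException instead of wrapping it, and the caller just invokes 
future.get() without inspecting the returned value; any failure surfaces as an 
ExecutionException.
{code}
import java.io.IOException;
import java.util.concurrent.*;

public class CallableSketch {
  // Stand-in for DatanodeCommand; the real test returns the HDFS type.
  static class Command { }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    // call() may throw checked exceptions, so no try/catch is needed in the task.
    Callable<Command> task = new Callable<Command>() {
      @Override
      public Command call() throws IOException {
        return new Command();   // e.g. the reply from a block-report RPC
      }
    };
    Future<Command> future = executor.submit(task);
    future.get();               // return value is ignored, as in the test
    executor.shutdown();
  }
}
{code}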

> do you mean Assert.assertArrayEquals(storageInfos, 
> dnDescriptor.getStorageInfos());

Yes, thanks for that! I have made the change.

> We should add javadoc for STORAGE_REPORT, as it’s not that straightforward and 
> is defined in the BlockListAsLongs abstract class.

Added the doc

> assert (blockList.getNumberOfBlocks() == -1); I believe we don’t need to use an 
> assert statement along with Assert.assertEquals()?

I changed the assert to Assert.assertEquals. However, the existing test does 
use assert as well: {{assert(numBlocksReported >= expectedTotalBlockCount);}}

> Always use SLF4J placeholders in the code, as you are doing in the latest 
> patch. 

Thanks for the tip! I noticed that placeholders were not used consistently. I 
tried to maintain the logging style that was already used in that particular 
file. I have modified all the log messages in my patch to use placeholders 
wherever possible. SLF4J was not used in some places, for instance in 
TestNameNodePrunesMissingStorages.
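
For reference, the placeholder style being asked for; this is a generic SLF4J 
example, not a line taken from the patch.
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogStyleSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LogStyleSketch.class);

  void logReport(String storageId, int numBlocks) {
    // Preferred: placeholders defer string building until the level is enabled.
    LOG.info("Processed report for storage {} with {} blocks", storageId, numBlocks);
    // Avoid: eager concatenation builds the string even when INFO is disabled.
    // LOG.info("Processed report for storage " + storageId + " with " + numBlocks);
  }
}
{code}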

> I see unnecessary blank lines in the v11 patch. I see unaddressed long-line 
> checkstyle warnings in BlockManager.

I noticed two blank lines in TestNameNodePrunesMissingStorages in the v11 patch. 
I removed them. I do not see any checkstyle warnings.

> if (nn.getFSImage().isUpgradeFinalized() &&
>     context.getTotalRpcs() == context.getCurRpc() + 1) {
>   Set<String> storageIDsInBlockReport = new HashSet<>();

Combined as suggested.

> BPServiceActor.java Let’s make cmd final.

Since cmd was not final previously, I have left it unchanged. 



> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-19 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10301:
-
Attachment: HDFS-10301.011.patch

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-19 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384955#comment-15384955
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

Patch 011 fixes the two checkstyle issues and the log message.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-19 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384611#comment-15384611
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

> For example, an RPC could have gotten duplicated by something in the network. 
[~cmccabe] Doesn't TCP ignore duplicate packets? Can you explain how this can 
happen? If the RPC does get duplicated, then we shouldn't return true right 
when {{node.leaseId == 0}}?

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-07-18 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383224#comment-15383224
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

I have made STORAGE_REPORT {{static final}} in the 010 patch.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


