[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-12-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712703#comment-15712703
 ] 

Hudson commented on HDFS-8674:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10924 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10924/])
HDFS-8674. Improve performance of postponed block scans. Contributed by 
(kihwal: rev 96c574927a600d15fab919df1fdc9e07887af6c5)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-8674.2.patch, HDFS-8674.branch-2.patch, 
> HDFS-8674.patch, HDFS-8674.patch, HDFS-8674.trunk.2.patch, 
> HDFS-8674.trunk.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-12-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712633#comment-15712633
 ] 

Kihwal Lee commented on HDFS-8674:
--

+1 The patch looks good.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.2.patch, HDFS-8674.branch-2.patch, 
> HDFS-8674.patch, HDFS-8674.patch, HDFS-8674.trunk.2.patch, 
> HDFS-8674.trunk.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-12-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712495#comment-15712495
 ] 

Hadoop QA commented on HDFS-8674:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
41s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 119 unchanged - 2 fixed = 119 total (was 121) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 61m  
9s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-8674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12841300/HDFS-8674.trunk.2.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 06df29fbcd5a 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / e0fa492 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/17731/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/17731/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/17731/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> 

[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-12-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712103#comment-15712103
 ] 

Kihwal Lee commented on HDFS-8674:
--

{{LightWeightLinkedSet}} does not support {{remove()}}.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.2.patch, HDFS-8674.patch, HDFS-8674.patch, 
> HDFS-8674.trunk.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710112#comment-15710112
 ] 

Hadoop QA commented on HDFS-8674:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
49s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 119 unchanged - 2 fixed = 119 total (was 121) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 34s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestNodeCount |
|   | hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | 
hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness
 |
|   | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation |
|   | hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker |
|   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
|   | hadoop.hdfs.server.datanode.TestDataNodeMXBean |
|   | hadoop.hdfs.server.namenode.TestAddStripedBlockInFBR |
|   | hadoop.hdfs.TestLeaseRecoveryStriped |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencing |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-8674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12841132/HDFS-8674.trunk.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9ceb14463035 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 

[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-11-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709744#comment-15709744
 ] 

Kihwal Lee commented on HDFS-8674:
--

Looks like {{HDFS-8674.2.patch}} is for branch-2.  Can you post a trunk version?

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.2.patch, HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-11-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706649#comment-15706649
 ] 

Hadoop QA commented on HDFS-8674:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} HDFS-8674 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-8674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840925/HDFS-8674.2.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/17702/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.2.patch, HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-11-28 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702434#comment-15702434
 ] 

Daryn Sharp commented on HDFS-8674:
---

I'll update this patch.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583932#comment-15583932
 ] 

Hadoop QA commented on HDFS-8674:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-8674 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-8674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778098/HDFS-8674.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/17194/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-03-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219121#comment-15219121
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8674:
---

[~daryn] / [~mingma], is it possible to get this in 2.7.3 in a week? Tx?

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-03-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174323#comment-15174323
 ] 

Daryn Sharp commented on HDFS-8674:
---

I'll try to rebase again.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-12-30 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075578#comment-15075578
 ] 

Ming Ma commented on HDFS-8674:
---

Daryn, from failed tests's logs, the latest patch uses LightWeightLinkedSet 
whose iterator doesn't support remove.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-12-16 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060295#comment-15060295
 ] 

Ming Ma commented on HDFS-8674:
---

Thanks [~daryn]. Your explanation makes sense. It is good to know the perf 
difference between "the # of items < 5M" and "larger number of items". +1 on 
the patch.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061120#comment-15061120
 ] 

Hadoop QA commented on HDFS-8674:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 16s {color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66 with JDK v1.8.0_66 
generated 1 new issues (was 32, now 32). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 57s {color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91 with JDK v1.7.0_91 
generated 1 new issues (was 34, now 34). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs (total was 161, now 160). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 16s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 8s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDFSClientRetries |

[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058212#comment-15058212
 ] 

Daryn Sharp commented on HDFS-8674:
---

[~mingma], comments?

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-11-02 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986273#comment-14986273
 ] 

Daryn Sharp commented on HDFS-8674:
---

[~mingma] It's a LinkedHashSet to cover the cases you state.  

bq. The motivation of randomization is to make sure we don't end up scanning 
the same blocks if there isn't much update to postponedMisreplicatedBlocks. 
<...> What is the reason to change from HashSet to LinkedHashSet, to have some 
order guarantee?  HashSet is faster than LinkedHashSet. <...> During the scan, 
if a block is marked as POSTPONE, it will be removed from 
postponedMisreplicatedBlocks first and add it back later via 
rescannedMisreplicatedBlocks. Can it just remove those blocks from 
postponedMisreplicatedBlocks that aren't marked as POSTPONE without using 
rescannedMisreplicatedBlocks?

The LHS iterator is effectively functioning as a cheap circular array with 
acceptable insert/remove performance.  Yes, the order guarantee ensures all 
blocks are visited w/o "skipping" to a random location.  The block always has 
to be removed and re-inserted for the ordering to work.  Otherwise the same 
blocks will be scanned over and over again.

bq. Does the latency of couple seconds come from HashSet iteration? Some quick 
test indicates iterating through 5M entries of a HashSet take around 50ms.

Yes, the cycles wasted to skip the through the set were the killer.  A 
micro-benchmark doesn't capture the same chaotic runtime environment of over a 
100 threads competing for resources.  Kihwal says (ironically) the performance 
"improved" under 5M.

In multiple production incidents, the postponed queue backed up with 10-20M+ 
blocks.  Scans determined the blocks are all still postponed.  Pre-patch, the 
cycles took up to a few seconds and averaged in the many hundreds of ms.  
Performance was obviously atrocious.  Post-patch, same scans took no more than 
a few ms.



> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-10-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975390#comment-14975390
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8674:
---

[~daryn] / [~mingma], any update on this? Considering this for a 2.7.2 RC this 
weekend. Thanks.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975701#comment-14975701
 ] 

Hadoop QA commented on HDFS-8674:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12742164/HDFS-8674.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 96677be |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13211/console |


This message was automatically generated.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-06-26 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603873#comment-14603873
 ] 

Ming Ma commented on HDFS-8674:
---

Thanks [~daryn]!

* The motivation of randomization is to make sure we don't end up scanning the 
same blocks if there isn't much update to postponedMisreplicatedBlocks. But 
maybe that doesn't matter much as it will eventually be drained; also HashSet 
doesn't make order guarantee.
* Does the latency of couple seconds come from HashSet iteration? Some quick 
test indicates iterating through 5M entries of a HashSet take around 50ms.
* What is the reason to change from HashSet to LinkedHashSet, to have some 
order guarantee? HashSet is faster than LinkedHashSet.
* During the scan, if a block is marked as POSTPONE, it will be removed from 
postponedMisreplicatedBlocks first and add it back later via 
rescannedMisreplicatedBlocks. Can it just remove those blocks from 
postponedMisreplicatedBlocks that aren't marked as POSTPONE without using 
rescannedMisreplicatedBlocks?

 Improve performance of postponed block scans
 

 Key: HDFS-8674
 URL: https://issues.apache.org/jira/browse/HDFS-8674
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-8674.patch


 When a standby goes active, it marks all nodes as stale which will cause 
 block invalidations for over-replicated blocks to be queued until full block 
 reports are received from the nodes with the block.  The replication monitor 
 scans the queue with O(N) runtime.  It picks a random offset and iterates 
 through the set to randomize blocks scanned.
 The result is devastating when a cluster loses multiple nodes during a 
 rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
 invalidations are postponed. Rescanning just 2k blocks out of millions of 
 postponed blocks may take multiple seconds. During the scan, the write lock 
 is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Auto-Re: [jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-06-26 Thread wsb
您的邮件已收到!谢谢!

Auto-Re: [jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-06-26 Thread wsb
您的邮件已收到!谢谢!

[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-06-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603715#comment-14603715
 ] 

Hadoop QA commented on HDFS-8674:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 16s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 24s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 162m 31s | Tests passed in hadoop-hdfs. 
|
| | | 204m 57s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12742164/HDFS-8674.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 60b858b |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11502/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11502/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11502/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11502/console |


This message was automatically generated.

 Improve performance of postponed block scans
 

 Key: HDFS-8674
 URL: https://issues.apache.org/jira/browse/HDFS-8674
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-8674.patch


 When a standby goes active, it marks all nodes as stale which will cause 
 block invalidations for over-replicated blocks to be queued until full block 
 reports are received from the nodes with the block.  The replication monitor 
 scans the queue with O(N) runtime.  It picks a random offset and iterates 
 through the set to randomize blocks scanned.
 The result is devastating when a cluster loses multiple nodes during a 
 rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
 invalidations are postponed. Rescanning just 2k blocks out of millions of 
 postponed blocks may take multiple seconds. During the scan, the write lock 
 is held which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)