[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-08-24 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591771#comment-16591771
 ] 

Gabor Bota commented on HDFS-13672:
---

With some delay, I've created HDFS-13864 as a follow-up.

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HDFS-13672.001.patch, HDFS-13672.002.patch, 
> HDFS-13672.003.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode was started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed that FSNamesystem#clearCorruptLazyPersistFiles() held the 
> write lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName()
>       + " with no replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over the corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds the NameNode write lock for more than 45 seconds, 
> the default ZKFC connection timeout, it implies that an extreme case like this 
> (100 million corrupt blocks) could lead to NameNode failover.



[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-24 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554351#comment-16554351
 ] 

Gabor Bota commented on HDFS-13672:
---

Adding counters for lazy persist files would be a great idea. [~xiaochen], 
please create a follow-up jira for that if you want to.
Another good idea is to create a service where you can iterate through a list 
of elements while holding the writeLock, running each element through a 
lambda function. We may want to create a jira for that. (kudos to 
[~andrew.wang])
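
A minimal, self-contained sketch of what such a helper could look like (the 
class name, the batch size, and the Consumer-based callback are illustrative 
assumptions, not existing HDFS APIs, and a plain ReentrantReadWriteLock stands 
in for the FSNamesystem lock):
{code}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/**
 * Sketch of a "locked batch iteration" service: walk a snapshot list of
 * elements in fixed-size chunks, holding the write lock only for the
 * duration of each chunk so other lock users can interleave.
 */
public final class LockedBatchIterator {

  private LockedBatchIterator() {
  }

  public static <T> void forEachBatched(List<T> elements,
      ReentrantReadWriteLock lock, int batchSize, Consumer<T> action) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int i = 0;
    while (i < elements.size()) {
      int end = Math.min(i + batchSize, elements.size());
      lock.writeLock().lock();
      try {
        for (; i < end; i++) {
          action.accept(elements.get(i));
        }
      } finally {
        lock.writeLock().unlock();
      }
      // Lock dropped here, so readers and other writers can make progress
      // before the next batch is processed.
    }
  }
}
{code}
A caller could, for example, build the filesToDelete snapshot first and then 
pass something like {{bc -> deleteInternal(bc.getName(), false, false, false)}} 
as the action.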

So to summarize this issue:
* It's not worth making a behavior change for this, since this long blocking 
scan will probably only happen in debugging situations (and we have a 
workaround).
* The workaround is to disable the scrubber interval when debugging. In 
real-world/customer environments, there are no cases with this many 
corrupted lazy persist files.
* I will close this jira as Won't Fix, and if there are no follow-up jiras, 
I'll create one tomorrow (CET).


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-23 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553830#comment-16553830
 ] 

Xiao Chen commented on HDFS-13672:
--

Thanks all for the discussion here. I tend to agree with Andrew: since the 
scrubber has safemode checks too, this probably isn't worth the effort of a 
behavior change.

The idea of adding counters for lazy persist files sounds good; we can do that 
as an improvement jira if anyone is interested. :)


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553250#comment-16553250
 ] 

genericqa commented on HDFS-13672:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13672 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932727/HDFS-13672.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6d7229b49206 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbe2f62 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24636/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-23 Thread Andrew Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553150#comment-16553150
 ] 

Andrew Wang commented on HDFS-13672:


Hi Gabor, I took a quick look. It doesn't look like the iterator will make 
forward progress between iterations since it's starting from the beginning each 
time. What we need here is a tail iterator that starts at the last processed 
element.
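
A standalone sketch of that resume-from-last-processed idea, using a sorted set 
of block IDs as a stand-in for the real corrupt-replica structure (the class 
name, the batch size, and the isLazyPersist() check are illustrative only, not 
NameNode internals):
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Each batch re-creates the iterator under the write lock and starts strictly
 * after the last processed key, so the scan makes forward progress even
 * though the lock is released between batches.
 */
public class ResumableCorruptScan {
  private static final int BATCH_SIZE = 10_000;

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final NavigableSet<Long> corruptBlockIds = new TreeSet<>();
  private final Set<Long> lazyPersistBlockIds = new HashSet<>();

  public List<Long> scan() {
    List<Long> lazyPersistCandidates = new ArrayList<>();
    Long resumeAfter = null;
    boolean done = false;
    while (!done) {
      lock.writeLock().lock();
      try {
        NavigableSet<Long> tail = (resumeAfter == null)
            ? corruptBlockIds
            : corruptBlockIds.tailSet(resumeAfter, false /* exclusive */);
        Iterator<Long> it = tail.iterator();
        int processed = 0;
        while (it.hasNext() && processed < BATCH_SIZE) {
          long blockId = it.next();
          if (isLazyPersist(blockId)) {
            lazyPersistCandidates.add(blockId);
          }
          resumeAfter = blockId;
          processed++;
        }
        done = !it.hasNext();
      } finally {
        lock.writeLock().unlock();
      }
      // Other operations can take the lock here before the next batch runs.
    }
    return lazyPersistCandidates;
  }

  // Stand-in for the storage-policy check on the block's BlockCollection.
  private boolean isLazyPersist(long blockId) {
    return lazyPersistBlockIds.contains(blockId);
  }
}
{code}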

I recommend we close this as wontfix and then open a new JIRA to figure out if 
we want to disable this feature (which is incompatible) or do some kind of 
smarter detection as to whether it's necessary. As an example, we check whether 
encryption is being used by seeing if there are any encryption zones created.

It's also worth asking if it's worth making a behavior change at all, since 
this long blocking scan probably will only happen during debugging situations 
(and we have a workaround).


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-23 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553029#comment-16553029
 ] 

Gabor Bota commented on HDFS-13672:
---

We had an offline discussion with [~andrew.wang] on this topic, with the 
following outcome:
* I will upload a new patch for this issue with the iterator created inside the 
loop, inside the writeLock. If this patch is accepted, it will be the proposed 
solution for this issue and can be committed.
* Another solution would be to disable the scrubber interval when debugging. In 
real-world/customer environments, there are no cases with this many corrupted 
*lazy persist* files.
* The open question is whether we should disable clearCorruptLazyPersistFiles 
by default.


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-03 Thread Andrew Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531431#comment-16531431
 ] 

Andrew Wang commented on HDFS-13672:


Hi Gabor, to respond to your question, I think the very common case will be 
zero lazy persist files, the rare case being "some" (~thousands), and the very 
very rare case lots (~millions).

I agree that holding the lock for a long time is an anti-pattern. I actually 
had a patch I was working on a while ago for a different feature that added a 
safe way of iterating over the block map.

However, for this case I don't know if it's worth spending a lot of time 
optimizing, since the number of corrupt blocks in the system is normally not 
that large. It's a rare situation for a NameNode to be transitioned to active 
while missing a lot of DNs like this (which is why we have startup safemode 
checks). This probably only happens during debugging, in which case we could 
also solve the problem by setting the scrubber interval to 0 to disable it.

[~jojochuang] what do you think?


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-03 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531389#comment-16531389
 ] 

Gabor Bota commented on HDFS-13672:
---

Thanks for the review [~andrew.wang]!

Based on what you've said, is it that this can't happen, *or* that it's just 
unlikely to have that many corrupt lazy persist files? It's a really good idea 
to have a counter for this, but I still feel we need to handle the case where a 
long-running operation like this times out and crashes the NameNode.

In my next patch I will:
* check if there's a counter for the lazy persist files, and if there's not, 
create one
* modify clearCorruptLazyPersistFiles to re-create the iterator after every 
lock release
* fix the typos (sorry about that)
* add the config key documentation to hdfs-default.xml


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-07-03 Thread Andrew Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531371#comment-16531371
 ] 

Andrew Wang commented on HDFS-13672:


Hi Gabor, thanks for working on this,

I don't think it's thread safe to drop the lock while holding onto an iterator 
like this. This is a LinkedSetIterator and will throw a 
ConcurrentModificationException if the set is changed underneath it. We need a 
way to safely resume at a mid-point, and that seems a bit hard with 
LinkedSetIterator as it is.
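
For illustration, the same fail-fast behavior is easy to reproduce with a plain 
JDK collection (java.util.HashSet here, standing in for the internal linked 
set):
{code}
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class FailFastDemo {
  public static void main(String[] args) {
    Set<Long> blocks = new HashSet<>();
    blocks.add(1L);
    blocks.add(2L);

    Iterator<Long> it = blocks.iterator();
    blocks.add(3L); // structural modification after the iterator was created
    it.next();      // throws ConcurrentModificationException
  }
}
{code}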

Since I think the common case here is that there are zero lazy persist files, a 
better (though different) change would be to skip running this scrubber 
entirely if there aren't any lazy persist files. I'm hoping there's an easy way 
to add a counter for this (or some existing way to query if there are any lazy 
persist files).
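
A rough sketch of that guard (the counter, the hook methods, and the class name 
are hypothetical, not existing FSNamesystem fields):
{code}
import java.util.concurrent.atomic.AtomicLong;

/** Skip the expensive corrupt-replica scan when no lazyPersist files exist. */
public class LazyPersistScrubberGuard {
  private final AtomicLong lazyPersistFileCount = new AtomicLong();

  // Would be called when a lazyPersist file is created or deleted.
  public void onLazyPersistFileAdded() {
    lazyPersistFileCount.incrementAndGet();
  }

  public void onLazyPersistFileRemoved() {
    lazyPersistFileCount.decrementAndGet();
  }

  /** True when the scrubber run can return immediately. */
  public boolean canSkipScrub() {
    return lazyPersistFileCount.get() == 0;
  }
}
{code}
The scrubber could then check {{canSkipScrub()}} before taking the write lock 
at all.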

We also need unit tests for new changes like this. I think you also typo'd the 
config key name with "sec" instead of "millis" or "ms". Config keys also need 
to be added to hdfs-default.xml with a description for documentation purposes.


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-27 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525272#comment-16525272
 ] 

genericqa commented on HDFS-13672:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 24s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13672 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929414/HDFS-13672.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b78656d4af6a 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bedc4fe |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24507/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24507/testReport/ |
| Max. process+thread count | 3249 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-27 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525113#comment-16525113
 ] 

Gabor Bota commented on HDFS-13672:
---

I've uploaded my v002 patch with the time-based configuration.


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518158#comment-16518158
 ] 

genericqa commented on HDFS-13672:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 33s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}158m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.TestDFSClientRetries |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13672 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928461/HDFS-13672.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1a5051bc4e08 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2d87592 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24480/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-20 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518008#comment-16518008
 ] 

Gabor Bota commented on HDFS-13672:
---

In my v001 patch, I'm solving the issue by limiting the number of blocks per 
iteration.

[~jojochuang], by +break out from the loop after 1 second+, do you mean 
breaking out of the whole clearCorruptLazyPersistFiles function, or just 
releasing the lock and continuing the iteration? If the 
{{LazyPersistFileScrubber}} runs periodically, we don't even have to remove all 
the corrupt replica blocks in one run.


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-19 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517514#comment-16517514
 ] 

Wei-Chiu Chuang commented on HDFS-13672:


I've given it some thought.
Instead of breaking down the iteration by the number of blocks, would it make 
sense to have a time-based configuration? For example, breaking out of the loop 
after 1 second?


[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517378#comment-16517378
 ] 

Gabor Bota commented on HDFS-13672:
---

Hi [~jojochuang],
What would be a reasonable number of elements to iterate through in one pass?
Should this number (of elements processed per pass) be configurable from 
outside?
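
If it does become configurable, wiring it up could look roughly like the sketch 
below (the key name and default are placeholders; only Configuration.getInt() 
is an existing Hadoop API):
{code}
import org.apache.hadoop.conf.Configuration;

public class ScrubberBatchConfig {
  // Hypothetical key and default; the real names would be decided in the patch.
  static final String SCRUB_BATCH_SIZE_KEY =
      "dfs.namenode.lazypersist.scrubber.batch.size";
  static final int SCRUB_BATCH_SIZE_DEFAULT = 10_000;

  static int getBatchSize(Configuration conf) {
    return conf.getInt(SCRUB_BATCH_SIZE_KEY, SCRUB_BATCH_SIZE_DEFAULT);
  }
}
{code}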

Thanks,
Gabor
