subject:"\[jira\] \[Commented\] \(HDFS\-4015\) Safemode should count and report orphaned blocks"

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2017-03-16 Thread Daniel Pol (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929317#comment-15929317
 ] 

Daniel Pol commented on HDFS-4015:
--

[~anu] Managed to get my space back by triggering a full block report from the 
bad nodes. 

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2017-03-16 Thread Daniel Pol (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929265#comment-15929265
 ] 

Daniel Pol commented on HDFS-4015:
--

[~anu] I've seen it before in other cases, but here's my current one. I have an 
issue on my cluster where datanodes start failing when hit with heavy write 
activity (even to the point where I can't ssh to the system anymore and needs 
to be power cycled), mostly during the reduce phase from Terasort. So I start 
Teragen and during the run some datanodes crash, and the remaining nodes get 
more data to store compared to expected. At that point I stop the whole cluster 
and reboot all nodes (even some of the nodes that are not bad still take a long 
time to respond). Once the cluster is up I delete the Teragen folder (with 
skipTrash, and I don't use snapshots) because some nodes are now close to their 
space capacity. However not all space is freed up and upon investigation I see 
the bad nodes have orphaned blocks. A few runs like this quickly take up all my 
available space at which point I have to manually clean the orphaned blocks or 
reformat HDFS. Steps to reproduce in your enviroment;
1, Start Teragen (make sure its big enough to run for >5 min for example)
2, While Teragen is running (say half way in) kill the datanode process on some 
node (not shutdown)
3. Once Teragen finished, delete the Teragen folder
4. Restart the whole HDFS cluster, including the killed nodes
5. You should be able to find on the killed node orphaned blocks now that are 
not getting deleted. Fsck will say something like "Block blk_1075307258 does 
not exist" even in safe mode.



> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2017-03-16 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929244#comment-15929244
 ] 

Anu Engineer commented on HDFS-4015:


[~danielpol] Thanks for reporting this.  I will try to repro the case you have 
described. But just to make sure that we are on the same page, this patch 
addresses the issue of when NN is in safe mode. So is your case when you have 
Datanode down, you delete the directory and then you reboot the datanodes *and* 
namenode ? 

Can you please explain the steps to repro this issue ? Thanks in advance.


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2017-03-16 Thread Daniel Pol (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929237#comment-15929237
 ] 

Daniel Pol commented on HDFS-4015:
--

RE:"In this patch we track blocks with generation stamp greater than the 
current highest generation stamp that is known to NN. I have made the 
assumption that if DN comes back on-line and reports blocks for files that have 
been deleted, those Generation IDs for those blocks will be lesser than the 
current Generation Stamp of NN. Please let me know if you think this assumption 
is not valid or breaks down in special cases, Could this happen with V1 vs V2 
generation stamps ?"

I'm hitting the case with same Generation ID quite often during testing. Test 
scenario is run Teragen and for various reasons (mostly Hadoop settings) 
datanode service on some nodes dies abruptly (think power failures also). While 
the bad nodes are down, you delete the Teragen output folder (to free up space 
on the remaining good nodes that now are trying to maintain the replication 
factor with less nodes). Once all nodes are up again and running the bad nodes 
have orphaned blocks with the same Generation IDs. Right now its pretty painful 
to get rid of those manually. 

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2016-11-21 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684329#comment-15684329
 ] 

Anu Engineer commented on HDFS-4015:


bq. Should this be marked as incompatible change? 
I think we can argue both sides, so I am fine with either call you make after 
reading through both points of view.

bq. Earlier "dfsadmin -safemode leave" was leaving the safemode.
The behaviour in 99.9% of cases is exactly same. So it is a rare case of 
incompatibility, even if we end up defining this as an incompatibility.

bq. Now expects "-forceExit" also if there are any future bytes detected.
Future bytes -- is an error condition that should have been flagged by HDFS. 
This was a missing error check, if we call this incompatibility, it would mean 
that copying old fsImage or old NN metadata was a supported operation. I would 
argue that it never was a supported operation in the sense that NN metadata is 
sacrosanct and you are not supposed to roll it back.

So from that point of view, it just confirms what we always knew and avoids 
booting up with incorrect metadata, but both of us very well know that the 
reason why this JIRA is fixed is because people do this and lose data. 

With this change copy/restoring NN metadata it has become a supported operation 
(that is, HDFS is aware users are going to do this), and we explicitly warn the 
user of harm that this action can cause. If we were to argue that old behavior 
was a feature, then we are saying that changing NN metadata and losing data was 
a supported feature.

While it is still possible to copy an older version of NN metadata, now HDFS is 
going warn the end user about data loss. The question you are asking is should 
we classify that as a incompatibility or as enforcement of the core axioms of 
HDFS? 

My *personal* view is that is it not an incompatible change, since HDFS has 
never officially encouraged people to copy *older* versions of NN metadata. If 
you agree with that, then this change merely formalizes that assumption that NN 
metadata is sacrosanct and if you roll it back, we are in an error state that 
needs explicit user intervention.

But I also see that from an end users point of view (especially someone with 
lot of HDFS experience), this enforcement of a NN metadata integrity takes way 
some of the old dangerous behavior. Now we have added detection of an error 
condition which requires explicit action from user, you can syntactically argue 
that it is an incompatible change, though semantically I would suppose that it 
is obvious to any HDFS user that copying old versions of NN metadata is a bad 
idea.


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2016-11-21 Thread Vinayakumar B (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683168#comment-15683168
 ] 

Vinayakumar B commented on HDFS-4015:
-

Should this be marked as incompatible change?
Earlier "dfsadmin -safemode leave" was leaving the safemode.
Now expects "-forceExit" also if there are any future bytes detected.
Please correct If I am missing something here.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-12-01 Thread Kihwal Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034287#comment-15034287
 ] 

Kihwal Lee commented on HDFS-4015:
--

[~anu],I have seen intermittent test failures in precommit builds. Do you think 
it is a test issue?

{noformat}
java.lang.AssertionError: expected:<18> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:125)
{noformat}

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-12-01 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034292#comment-15034292
 ] 

Anu Engineer commented on HDFS-4015:


Could be, let me take a look. Thanks for letting me know.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972437#comment-14972437
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2468 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2468/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971441#comment-14971441
 ] 

Anu Engineer commented on HDFS-4015:


none of the test failures seem to be related to this patch.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972342#comment-14972342
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972385#comment-14972385
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #578 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/578/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972284#comment-14972284
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8699 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8699/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972348#comment-14972348
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1313 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1313/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972363#comment-14972363
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2523 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2523/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972370#comment-14972370
 ] 

Hudson commented on HDFS-4015:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #591 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/591/])
HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 
86c92227fc56b6e06d879d250728e8dc8cbe98fe)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972367#comment-14972367
 ] 

Arpit Agarwal commented on HDFS-4015:
-

Committed to trunk. Keeping Jira open for the branch-2 commit.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Jitendra Nath Pandey (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971746#comment-14971746
 ] 

Jitendra Nath Pandey commented on HDFS-4015:


+1 for the latest patch.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971670#comment-14971670
 ] 

Arpit Agarwal commented on HDFS-4015:
-

The TestHDFSCLI failure looks related, the following change fixes it. There is 
a missing space in DFSAdmin.

{code}
-String safemode = "-safemode :  Safe mode" 
+
+String safemode = "-safemode :  Safe mode 
" +
{code}

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971690#comment-14971690
 ] 

Anu Engineer commented on HDFS-4015:


[~arpitagarwal] Thanks for catching it and quickly fixing it.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972163#comment-14972163
 ] 

Hadoop QA commented on HDFS-4015:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  24m 37s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 43s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m 51s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 42s | The applied patch generated  3 
new checkstyle issues (total was 138, now 138). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m  5s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 42s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 38s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 56s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |  57m 32s | Tests passed in hadoop-hdfs. 
|
| {color:red}-1{color} | hdfs tests |   0m 36s | Tests failed in 
hadoop-hdfs-client. |
| | | 122m 52s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed build | hadoop-hdfs-client |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768388/HDFS-4015.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 15eb84b |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13168/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13168/console |


This message was automatically generated.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972192#comment-14972192
 ] 

Arpit Agarwal commented on HDFS-4015:
-

The build failure looks unrelated to the patch. Since the only Jenkins failure 
flagged by the v6 patch was fixed by a trivial whitespace change I will commit 
the v7 patch shortly.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971836#comment-14971836
 ] 

Mingliang Liu commented on HDFS-4015:
-

I revised the safe mode part and it looks good to me.

+1 for the latest patch.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970211#comment-14970211
 ] 

Hadoop QA commented on HDFS-4015:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  28m 33s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   9m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m 16s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 27s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   4m  2s | Site still builds. |
| {color:red}-1{color} | checkstyle |   3m 10s | The applied patch generated  3 
new checkstyle issues (total was 138, now 138). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m 13s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 11s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 32s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  53m 57s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 37s | Tests passed in 
hadoop-hdfs-client. |
| | | 124m 39s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768132/HDFS-4015.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 124a412 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13140/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13140/console |


This message was automatically generated.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode"

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-16 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961458#comment-14961458
 ] 

Arpit Agarwal commented on HDFS-4015:
-

bq. When the operator makes the name node leave safe mode manually, the -force 
option is not checked, even if there are orphaned blocks. Is this possible? If 
true, is it expected?
[~liuml07], you are right. It's admittedly odd for an administrator to enter 
safe mode manually during startup but we should guard against the sequence of 
steps you described.

I need to think about this some more but we should be able to remove the 
{{isInStartupSafeMode()}} from the clause below. i.e. never exit safe mode 
without the force flag if there bytes with future generation stamps. (The 
rollback exception is already handled elsewhere).

{code}
private synchronized void leave(boolean force) {
...
  if (!force && isInStartupSafeMode() && (blockManager.getBytesInFuture() >
  0)) {
{code}

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-15 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958423#comment-14958423
 ] 

Mingliang Liu commented on HDFS-4015:
-

The patch looks good to me overall. More discussion is welcome about safe mode 
question.

The latest release audit warning is unrelated. Findbugs warnings may be caused 
by existing one tracked by [HDFS-9242]. Checkstyle warnings are caused by 
existing code, and may be addressed separately. Failing unit tests seem 
unrelated but we may need double check.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-15 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959367#comment-14959367
 ] 

Anu Engineer commented on HDFS-4015:


[~liuml07] Thanks for looking that the Hadoop QA results.  I did look at test 
results just to double check. 
2 of them are failures related to globbing that has already been reverted. 
Other failures are mostly timing related and not related to this patch.


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-14 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957422#comment-14957422
 ] 

Arpit Agarwal commented on HDFS-4015:
-

Hi [~anu], the v004 patch looks great. We should handle the case of 
{{blocksInFuture}} not being present in 
{{PBHelperClient#convert(GetFsStatsResponseProto res)}} (newer client + older 
NameNode). e.g.
{code}
result[ClientProtocol.GET_STATS_BYTES_IN_FUTURE_BLOCKS_IDX] =
res.hasBlocksInFuture() ? res.getBlocksInFuture() : 0;
{code}

+1 otherwise.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-14 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958072#comment-14958072
 ] 

Arpit Agarwal commented on HDFS-4015:
-

Thanks [~anu]. +1 pending Jenkins. Will hold off committing until tomorrow in 
case [~liuml07] wants to take another look.

bq. The new patch fixes that and also updates how RollBack is detected based on 
off-line comments.
FTR we felt detecting rollback from the  startup option was safer than 
overloading the meaning of {{shouldPostponeBlocksFromFuture}}.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958271#comment-14958271
 ] 

Hadoop QA commented on HDFS-4015:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  26m 48s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   9m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m 26s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 21s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | site |   3m 33s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 38s | The applied patch generated  3 
new checkstyle issues (total was 138, now 138). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 55s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 20s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 43s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  65m 50s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 42s | Tests passed in 
hadoop-hdfs-client. |
| | | 133m 58s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.TestDataTransferKeepalive |
|   | hadoop.hdfs.TestRecoverStripedFile |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.server.blockmanagement.TestNodeCount |
|   | hadoop.fs.TestGlobPaths |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12766677/HDFS-4015.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / be7a0ad |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12996/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12996/console |


This message was automatically generated.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-14 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958157#comment-14958157
 ] 

Mingliang Liu commented on HDFS-4015:
-

One quick question:
Consider the name node is in extension period (startup safe mode), the operator 
sets the safe mode manually. When the operator makes the name node leave safe 
mode manually, the {{-force}} option is not checked, even if there are orphaned 
blocks. Is this possible? If true, is it expected?

Another minor comment is that the following code may be re-used:
{code}
+  LOG.error("Refusing to leave safe mode without a force flag. " +
+  "Exiting safe mode will cause a deletion of " + blockManager
+  .getBytesInFuture() + " byte(s). Please use " +
+  "-forceExit flag to exit safe mode forcefully and data loss is " 
+
+  "acceptable.");
{code}

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-14 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958228#comment-14958228
 ] 

Anu Engineer commented on HDFS-4015:


bq.When the operator makes the name node leave safe mode manually, the -force 
option is not checked, even if there are orphaned blocks. Is this possible? If 
true, is it expected?

If there are orphaned blocks that we discovered during Startup safe mode, 
operator cannot exit without -forceExit. 


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-14 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958239#comment-14958239
 ] 

Mingliang Liu commented on HDFS-4015:
-

Thanks for the confirmation (and for the patch). I think the {{leaveSafeMode}} 
depends on  {{isInStartupSafeMode()}} which is false in this case.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-14 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958238#comment-14958238
 ] 

Mingliang Liu commented on HDFS-4015:
-

Thanks for the confirmation (and for the patch). I think the {{leaveSafeMode}} 
depends on  {{isInStartupSafeMode()}} which is false in this case.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-12 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953475#comment-14953475
 ] 

Anu Engineer commented on HDFS-4015:


Release audit, checkstyle and test failures are not related to this patch

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951591#comment-14951591
 ] 

Hadoop QA commented on HDFS-4015:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  23m 22s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 23s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | site |   3m  8s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 31s | The applied patch generated  3 
new checkstyle issues (total was 138, now 138). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  8s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 192m 25s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 31s | Tests passed in 
hadoop-hdfs-client. |
| | | 250m 38s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.TestRenameWhileOpen |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765926/HDFS-4015.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / def374e |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12902/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12902/console |


This message was automatically generated.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948058#comment-14948058
 ] 

Hadoop QA commented on HDFS-4015:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  21m 46s | Pre-patch trunk has 748 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  9s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 30s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 46s | The applied patch generated  3 
new checkstyle issues (total was 138, now 138). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 11s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 154m 54s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 34s | Tests passed in 
hadoop-hdfs-client. |
| | | 209m  6s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestAddStripedBlocks |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.balancer.TestBalancer |
| Timed out tests | org.apache.hadoop.hdfs.TestWriteReadStripedFile |
|   | org.apache.hadoop.hdfs.server.namenode.TestFileContextXAttr |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765483/HDFS-4015.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fde729f |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12847/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12847/console |


This message was automatically generated.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-06 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945939#comment-14945939
 ] 

Arpit Agarwal commented on HDFS-4015:
-

Hi [~anu], thanks for addressing the earlier feedback. Feedback on the v2 patch.

# We will likely see blocks with future generation stamps during HDFS rollback. 
We should disable this check if NN has been restarted with a rollback option 
(either regular or rolling upgrade rollback).
# I apologize for not noticing this earlier. {{FsStatus}} is tagged as public 
and stable, so changing the constructor signature is incompatible. Instead we 
could add a new constructor that initializes . This will also avoid changes to 
FileSystem, ViewFS, RawLocalFileSystem
# fsck should also print this new counter. We can do it in a separate Jira.
# Don't consider this a binding but I would really like it if {{bytesInFuture}} 
can be renamed especially where it is exposed via public interfaces/metrics. It 
sounds confusing/ominous. {{bytesWithFutureGenerationStamps}} would be more 
precise.

Still reviewing the test cases.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945989#comment-14945989
 ] 

Hadoop QA commented on HDFS-4015:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  22m 47s | Pre-patch trunk has 8 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 14s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 44s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 35s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   7m 33s | Tests failed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   8m 54s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | hdfs tests | 216m  1s | Tests failed in hadoop-hdfs. |
| {color:red}-1{color} | hdfs tests |   0m 23s | Tests failed in 
hadoop-hdfs-client. |
| | | 287m 42s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.metrics2.impl.TestGangliaMetrics |
|   | hadoop.hdfs.tools.TestGetGroups |
|   | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.web.TestWebHdfsContentLength |
|   | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary |
|   | hadoop.hdfs.server.namenode.TestAddStripedBlocks |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| Timed out tests | org.apache.hadoop.hdfs.web.TestWebHDFSAcl |
| Failed build | hadoop-hdfs-client |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765220/HDFS-4015.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a8b4d0f |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html
 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12814/console |


This message was automatically generated.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
>

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-09-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936105#comment-14936105
 ] 

Hadoop QA commented on HDFS-4015:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 43s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   7m 40s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   7m 13s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |   8m 14s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | hdfs tests |   0m 29s | Tests failed in hadoop-hdfs. |
| {color:red}-1{color} | hdfs tests |   0m 23s | Tests failed in 
hadoop-hdfs-client. |
| | |  70m 43s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.net.TestClusterTopology |
|   | hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor |
| Timed out tests | 
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot |
|   | org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
| Failed build | hadoop-hdfs |
|   | hadoop-hdfs-client |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764329/HDFS-4015.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6f335e4 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12746/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12746/console |


This message was automatically generated.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, 
> dfsHealth.html.message.png
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-09-29 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936086#comment-14936086
 ] 

Arpit Agarwal commented on HDFS-4015:
-

Hi [~anu], thanks for this improvement. Few comments below, I haven't reviewed 
the test case yet.

# ClientProtocol.java:729: Perhaps we can describe it as "bytes that are at 
risk for deletion."?
# DFSAdmin.java:474: This can happen even without blocks with future generation 
stamps e.g. DN is restarted after a long downtime and reports blocks for 
deleted files. 
# FSNamesystem.java:4438: For turn-off tip, should we check 
{{getBytesInFuture}} after the threshold of reported blocks isreached? One 
potential issue is that the administrator may see this message and immediately 
run {{-forceExit}} even before block thresholds are reached.
# FSNamesystem.java:4445: "you are ok with data loss." might also be confusing. 
Perhaps we can say "if you are certain that the NameNode was started with the 
correct FsImage and edit logs."
# FSNamesystem.java:4631: Not sure how this works. leaveSafeMode will just 
return {{if (isInStartupSafeMode() && (blockManager.getBytesInFuture() > 0))}}

Comments also posted at 
https://github.com/arp7/hadoop/commit/f16f4525a9a814f0945e76af55ad06b5fc18ecb7

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, 
> dfsHealth.html.message.png
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-09-29 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936147#comment-14936147
 ] 

Anu Engineer commented on HDFS-4015:


Hi [~arpitagarwal], thanks for the review and comments. I will wait for the 
rest of the review comments and post a new patch.

bq.ClientProtocol.java:729: Perhaps we can describe it as "bytes that are at 
risk for deletion."?
Makes sense, I will modify this.

bq. DFSAdmin.java:474: This can happen even without blocks with future 
generation stamps e.g. DN is restarted after a long downtime and reports blocks 
for deleted files.
In this patch we track blocks with generation stamp greater than the current 
highest generation stamp that is known to NN. I have made the assumption that 
if DN comes back on-line and reports blocks for files that have been deleted, 
those Generation IDs for those blocks will be lesser than the current 
Generation Stamp of NN. Please let me know if you think this assumption is not 
valid or breaks down in special cases, Could this happen with V1 vs V2 
generation stamps ? 
bq. FSNamesystem.java:4438: For turn-off tip, should we check getBytesInFuture 
after the threshold of reported blocks isreached? One potential issue is that 
the administrator may see this message and immediately run -forceExit even 
before block thresholds are reached.

With this patch we are slightly changing the behavior of SafeMode. Even if we 
find the threshold blocks we will not exit if we find blocks with future 
generation stamps, under the assumption that NN meta-data has been modified. 

bq. FSNamesystem.java:4445: "you are ok with data loss." might also be 
confusing. Perhaps we can say "if you are certain that the NameNode was started 
with the correct FsImage and edit logs."
Agreed, I will modify this warning. But we also have have the case where 
someone is actually replacing the NN metadata, and is ok with data loss.

bq. FSNamesystem.java:4631: Not sure how this works. leaveSafeMode will just 
return if (isInStartupSafeMode() && (blockManager.getBytesInFuture() > 0))
As the error message says , we are refusing to leave the safe mode -- we want 
the users to send up -forceExit to restart NN with right Metadata files before 
we will move out of safe mode.



> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, 
> dfsHealth.html.message.png
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-09-29 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936001#comment-14936001
 ] 

Anu Engineer commented on HDFS-4015:


Re-attaching the patch since build failed in jenkins , the png files confused 
jenkins as it tried to compile it.


> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, 
> dfsHealth.html.message.png
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-09-29 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936250#comment-14936250
 ] 

Arpit Agarwal commented on HDFS-4015:
-

bq. Please let me know if you think this assumption is not valid or breaks down 
in special cases, Could this happen with V1 vs V2 generation stamps ?
Hi [~anu], your assumption is correct. I was just referring to this statement 
"This means blocks have been reported which do not correspond to any file in 
the namespace". It's a minor point.

bq. With this patch we are slightly changing the behavior of SafeMode. Even if 
we find the threshold blocks we will not exit if we find blocks with future 
generation stamps, under the assumption that NN meta-data has been modified.
Agreed it's the right behavior. I meant the timing of displaying the new safe 
mode tip. It would be better displayed after thresholds are checked, so we know 
that it is a safe time to run the {{-forceExit}} command, assuming the correct 
metadata is being used. I also like [~liuml07]'s suggestion of combining the 
two messages if it is feasible. So the message explains both problems if 
applicable (e.g. _there are X missing blocks and there are Y blocks with 
generation stamps in the future..._). 

bq. As the error message says , we are refusing to leave the safe mode – we 
want the users to send up -forceExit to restart NN with right Metadata files 
before we will move out of safe mode.
{code}
+  case SAFEMODE_FORCE_EXIT:
+if (isInStartupSafeMode() && (blockManager.getBytesInFuture() > 0)) {
+  LOG.warn("Leaving safe mode due to forceExit. This will cause a data 
" +
+  "loss of " + blockManager.getBytesInFuture() + " byte(s).");
+  // we should leave safe mode before clearing bytes, otherwise
+  // there is a race condition where bytes in future may not be zero.
+  leaveSafeMode();
+  blockManager.clearBytesInFuture();
{code}
So it looks like this call to {{leaveSafeMode}} is guaranteed to fail and we 
can remove it. The next iteration of {{SafeModeMonitor}} will bring us out of 
safe mode.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, 
> dfsHealth.html.message.png
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-09-29 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936226#comment-14936226
 ] 

Mingliang Liu commented on HDFS-4015:
-

# This patch looks good overall to me. The first assumption you made, aka the 
_Generation Stamp_ of those blocks reported by a rejoining DN will be less than 
the current highest generation stamp that is known to NN, makes sense to me.
# I agree with [~arpitagarwal] that this tip may not show up until the the 
thresholds are reached. As it surpasses its following threshold message, once 
the administrator sees this warning he/she may think that it is the right time 
to run {{forceExit}} even before block thresholds are reached. Or we may need 
to combine this warning with threshold message.
{code:title=FSNamesystem.java}
+  if(blockManager.getBytesInFuture() > 0) {
+String msg = "Name node detected blocks with generation stamps " ...
+return msg;
+  }
+
{code}
# I suppose the {{reached}} be 0 when we enter safemode, which stands for 
{{safe mode is on, and threshold is not reached yet}}.
{code:title=FSNamesystem.java}
+  @VisibleForTesting
+  synchronized void enableSafeModeForTesting(Configuration conf) {
+SafeModeInfo newSafemode = new SafeModeInfo(conf);
+newSafemode.reached = 1;
+this.safeMode =  newSafemode;
+  }
{code}

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, 
> dfsHealth.html.message.png
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

45 matches

Mail list logo