[jira] [Commented] (HDFS-5439) Fix TestPendingReplication

2013-11-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814688#comment-13814688
 ] 

Junping Du commented on HDFS-5439:
--

Thanks Arpit for the explanation. The patch looks good to me. I verified that it fixes 
TestPendingReplication and some other test failures (e.g. TestBlockReport), 
so +1. However, I think we still need to fix StorageReceivedDeletedBlocks to 
consistently use either the storageUuid or the DatanodeUuid to initialize it 
(right now the two get mixed up in different places). 

 Fix TestPendingReplication
 --

 Key: HDFS-5439
 URL: https://issues.apache.org/jira/browse/HDFS-5439
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: Heterogeneous Storage (HDFS-2832)

 Attachments: HDFS-5439-demo1.patch, h5439.04.patch


 {{TestPendingReplication}} fails with the following exception:
 {code}
 java.lang.AssertionError: expected:<4> but was:<3>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at org.junit.Assert.assertEquals(Assert.java:456)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814788#comment-13814788
 ] 

Hudson commented on HDFS-5458:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #384 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/384/])
HDFS-5458. Datanode failed volume threshold ignored if exception is thrown in 
getDataDirsFromURIs. Contributed by Mike Mellenthin. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539091)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java


 Datanode failed volume threshold ignored if exception is thrown in 
 getDataDirsFromURIs
 --

 Key: HDFS-5458
 URL: https://issues.apache.org/jira/browse/HDFS-5458
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Mike Mellenthin
 Fix For: 2.2.1

 Attachments: HDFS-5458-1.patch


 Saw a stacktrace of datanode startup with a bad volume, where even listing 
 directories would throw an IOException. The failed volume threshold was set 
 to 1, but it would fatally error out in {{File#getCanonicalPath}} in 
 {{getDataDirsFromURIs}}:
 {code}
   File dir = new File(dirURI.getPath());
   try {
     dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
     dirs.add(dir);
   } catch (IOException ioe) {
     LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " "
         + dir + " : ", ioe);
     invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" ");
   }
 {code}
 Since {{getCanonicalPath}} may need to do I/O and can therefore throw an IOException, 
 this catch clause doesn't properly protect startup from a failed volume.
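 One way to sidestep this, sketched here purely for illustration (not necessarily the committed fix), is to avoid the throwing call inside the catch clause, e.g. by using {{File#getAbsolutePath}}, which does not perform I/O:
 {code}
   } catch (IOException ioe) {
     LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " "
         + dir + " : ", ioe);
     // getAbsolutePath() cannot throw IOException, so a bad volume is only
     // recorded as invalid instead of aborting datanode startup
     invalidDirs.append("\"").append(dir.getAbsolutePath()).append("\" ");
   }
 {code}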



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814860#comment-13814860
 ] 

Hudson commented on HDFS-5458:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1601 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1601/])
HDFS-5458. Datanode failed volume threshold ignored if exception is thrown in 
getDataDirsFromURIs. Contributed by Mike Mellenthin. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539091)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java


 Datanode failed volume threshold ignored if exception is thrown in 
 getDataDirsFromURIs
 --

 Key: HDFS-5458
 URL: https://issues.apache.org/jira/browse/HDFS-5458
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Mike Mellenthin
 Fix For: 2.2.1

 Attachments: HDFS-5458-1.patch


 Saw a stacktrace of datanode startup with a bad volume, where even listing 
 directories would throw an IOException. The failed volume threshold was set 
 to 1, but it would fatally error out in {{File#getCanonicalPath}} in 
 {{getDataDirsFromURIs}}:
 {code}
   File dir = new File(dirURI.getPath());
   try {
     dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
     dirs.add(dir);
   } catch (IOException ioe) {
     LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " "
         + dir + " : ", ioe);
     invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" ");
   }
 {code}
 Since {{getCanonicalPath}} may need to do I/O and can therefore throw an IOException, 
 this catch clause doesn't properly protect startup from a failed volume.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814879#comment-13814879
 ] 

Hudson commented on HDFS-5458:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1575 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1575/])
HDFS-5458. Datanode failed volume threshold ignored if exception is thrown in 
getDataDirsFromURIs. Contributed by Mike Mellenthin. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539091)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java


 Datanode failed volume threshold ignored if exception is thrown in 
 getDataDirsFromURIs
 --

 Key: HDFS-5458
 URL: https://issues.apache.org/jira/browse/HDFS-5458
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Mike Mellenthin
 Fix For: 2.2.1

 Attachments: HDFS-5458-1.patch


 Saw a stacktrace of datanode startup with a bad volume, where even listing 
 directories would throw an IOException. The failed volume threshold was set 
 to 1, but it would fatally error out in {{File#getCanonicalPath}} in 
 {{getDataDirsFromURIs}}:
 {code}
   File dir = new File(dirURI.getPath());
   try {
     dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
     dirs.add(dir);
   } catch (IOException ioe) {
     LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " "
         + dir + " : ", ioe);
     invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" ");
   }
 {code}
 Since {{getCanonicalPath}} may need to do I/O and can therefore throw an IOException, 
 this catch clause doesn't properly protect startup from a failed volume.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5411) Update Bookkeeper dependency to 4.2.1

2013-11-06 Thread Robert Rati (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814951#comment-13814951
 ] 

Robert Rati commented on HDFS-5411:
---

4.2.2 causes compilation and test issues.  Porting to 4.2.2 will require 
additional work beyond this patch.  

 Update Bookkeeper dependency to 4.2.1
 -

 Key: HDFS-5411
 URL: https://issues.apache.org/jira/browse/HDFS-5411
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Robert Rati
Priority: Minor
 Attachments: HDFS-5411.patch


 Update the bookkeeper dependency to 4.2.1.  This eases compilation on Fedora 
 platforms



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5469) Add configuration property for the sub-directroy export path

2013-11-06 Thread Brandon Li (JIRA)
Brandon Li created HDFS-5469:


 Summary: Add configuration property for the sub-directroy export 
path
 Key: HDFS-5469
 URL: https://issues.apache.org/jira/browse/HDFS-5469
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Brandon Li
Assignee: Brandon Li


Currently only HDFS root is exported. Adding this property is the first step to 
support sub-directory mounting.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5469) Add configuration property for the sub-directroy export path

2013-11-06 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815052#comment-13815052
 ] 

Brandon Li commented on HDFS-5469:
--

Mounting a sub-directory is especially useful for some Windows NFS clients, which 
can't mount the root export, possibly due to a bug of their own.

 Add configuration property for the sub-directroy export path
 

 Key: HDFS-5469
 URL: https://issues.apache.org/jira/browse/HDFS-5469
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li

 Currently only HDFS root is exported. Adding this property is the first step 
 to support sub-directory mounting.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5461) fallback to non-ssr(local short circuit reads) while oom detected

2013-11-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815149#comment-13815149
 ] 

Todd Lipcon commented on HDFS-5461:
---

Adding some kind of limit makes sense. But I'm curious why we ended up with 
7GB worth of buffers in the referenced HBase ticket. Is it because each open 
stream holds a buffer, and we have hundreds of open streams? Without direct 
buffers, wouldn't we just end up with a similar amount of memory usage in 
byte[] buffers, and OOME on the non-native heap?

 fallback to non-ssr(local short circuit reads) while oom detected
 -

 Key: HDFS-5461
 URL: https://issues.apache.org/jira/browse/HDFS-5461
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.2.0
Reporter: Liang Xie

 Currently, the DirectBufferPool used by the SSR (local short-circuit read) feature 
 doesn't seem to have an upper-bound limit other than the direct-memory VM option, 
 so there's a risk of hitting a direct-memory OOM; see HBASE-8143 for an example.
 IMHO, maybe we could improve it a bit:
 1) detect OOM, or hitting a configured upper limit, in the caller, then fall back to 
 non-SSR reads (see the sketch below)
 2) add a new metric for the currently consumed raw direct memory size.
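 A minimal sketch of ideas 1) and 2), assuming a simple hypothetical wrapper rather 
 than the actual DirectBufferPool API:
 {code}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bounded pool: hands out direct buffers until a configured cap,
// then falls back to heap buffers (the non-SSR style path).
class BoundedDirectBufferPool {
  private final long maxDirectBytes;
  private final AtomicLong usedDirectBytes = new AtomicLong(); // idea 2): expose as a metric

  BoundedDirectBufferPool(long maxDirectBytes) {
    this.maxDirectBytes = maxDirectBytes;
  }

  ByteBuffer getBuffer(int size) {
    long used = usedDirectBytes.addAndGet(size);
    if (used > maxDirectBytes) {            // configured limit reached: fall back
      usedDirectBytes.addAndGet(-size);
      return ByteBuffer.allocate(size);     // heap buffer instead of direct
    }
    try {
      return ByteBuffer.allocateDirect(size);
    } catch (OutOfMemoryError oom) {        // direct memory exhausted: fall back
      usedDirectBytes.addAndGet(-size);
      return ByteBuffer.allocate(size);
    }
  }

  void returnBuffer(ByteBuffer buf) {
    if (buf.isDirect()) {
      usedDirectBytes.addAndGet(-buf.capacity());
    }
  }
}
 {code}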



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: HDFS-5326.004.patch

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815326#comment-13815326
 ] 

Colin Patrick McCabe commented on HDFS-5326:


* {{TestClientNamenodeProtocolServerSideTranslatorPB}}: we don't need this test 
any more since validation is done server-side.  Rationale: I would rather keep 
the validation in one place than have it spread out across client and server.  
It *must* be on the server, since we can't trust arbitrary clients, so let's 
just put it all there and unit test it well.

* {{TestOfflineEditsViewer}}: I think this test failure happened because 
jenkins didn't apply the git binary diff to the {{editsStored}} file.  I don't 
think the version of GNU patch used by jenkins supports git binary diffs.  
We've seen this in the past when updating this test.

* added modifyPBCD test to {{TestPathBasedCacheRequests}}.
 
* Fix bug where we were assuming that all modify requests came with a path.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815325#comment-13815325
 ] 

Andrew Wang commented on HDFS-5326:
---

Hey Colin, thanks for another mondo patch. I like the overall idea here, and I 
like the new PB code and user-facing API. All of the comments here are minor, 
I'll do another pass on an updated patch but expect to quickly be +1:

Nitty:
* IdNotFoundException: Can we say what this is used for, rather than just EINVAL?
* Update hdfs-default.xml too with changed config name
* ClientProtocol still refers to "cache descriptor" in 
removePathBasedCacheDescriptor
* Reusing PBCD for the {{filterInfo}} in {{DFS#listPBCD}} is a little too 
loosey goosey for me. Normally a PBCD always has a path and pool set, with just 
the repl and id being optional. This is a third form of usage, and we'll 
probably never want to filter on more than path and pool. How about keeping the 
old method signature? We can still use PBCD after DFS for simplicity if you 
like. It should probably also be named just {{filter}} too, since the type 
isn't named {{PBCDInfo}}.
* Maybe we should rename CachePoolInfo to CachePool so the public APIs and 
classes line up, e.g. you {{addPathBasedCacheDirective}} a {{PBCD}}, and 
{{addCachePool}} a {{CachePool}}. Or we could have a PBCDi class at the risk of 
shaming on Hacker News ;) If we went with PBCDi, listPBCD would still take a 
{{filterInfo}}.
* DFS#listPBCD, DFS#addPathBasedCacheDirective javadoc needs to be updated with 
new params/return values
* DFS#removePBCD: can just say "id of" instead of "id id of"
* PBCD#getId: javadoc says it gets the path, not the id
* PBCD.Builder#setId: javadoc param descriptions are off
* Organizationally, do you mind moving the new modify stuff in the FSEditLog, 
Loader, Op, etc, so it goes add/modify/remove for directives, add/modify/remove 
for pools? Compat isn't a concern yet.
* CacheManager has some lines beyond 80 chars due to the new indent

Other:
* FSN#modifyPBCD: need to move the FSPermissionChecker get and the 
checkOperation above the retry cache check. We can't throw any exceptions after 
the retry cache check that don't also set the retry cache state. Also, I think 
it's normally checkOperation first and then the pc get, for consistency.
* We should add the new modify directive to DFSTestUtil so it gets tested too
* Seems like a lot of these checks in CacheManager are now very similar. Since 
there's now a try/catch wrapping everything, we no longer need to have the 
method name in the exception text, and it should also be in the stack trace. 
So, can we consolidate some of these into shared validation methods that throw 
generic exceptions?

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815327#comment-13815327
 ] 

Colin Patrick McCabe commented on HDFS-5326:


oops, looks like we commented at the same time.  patch 4 doesn't address your 
comments, just the test failures

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815330#comment-13815330
 ] 

Andrew Wang commented on HDFS-5326:
---

yea np, I figured the test fixups weren't going to be a big deal. I'll let you 
commit this one too when it's ready so you can ensure that editsStored is 
updated correctly.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815366#comment-13815366
 ] 

Colin Patrick McCabe commented on HDFS-5326:


bq. IdNotFoundException: Can we say what this is used for, rather than just EINVAL?

ok.  I added a comment similar to the one in {{PathNotFoundException}}

bq. Update hdfs-default.xml too with changed config name

good call

bq. ClientProtocol still refers to "cache descriptor" in 
removePathBasedCacheDescriptor

fixed

bq. Reusing PBCD for the filterInfo in DFS#listPBCD is a little too loosey 
goosey for me. Normally a PBCD always has a path and pool set, with just the 
repl and id being optional. This is a third form of usage, and we'll probably 
never want to filter on more than path and pool. How about keeping the old 
method signature? We can still use PBCD after DFS for simplicity if you like. 
It should probably also be named just filter too, since the type isn't named 
PBCDInfo.

I don't know.  I'm kind of worried about the number of 
{{listPathBasedCacheDirectives}} overloads multiplying, the way the number of 
{{FileSystem#create}} overloads multiplied.  It seems cleaner to have one 
function that can handle any of these combinations.  Filter does seem like a 
better name than filterInfo, though...

bq. Maybe we should rename CachePoolInfo to CachePool so the public APIs and 
classes line up, e.g. you addPathBasedCacheDirective a PBCD, and addCachePool a 
CachePool. Or we could have a PBCDi class at the risk of shaming on Hacker News 
;) If we went with PBCDi, listPBCD would still take a filterInfo.

I think your instinct is right here.  PBCDi is just too long, whatever other 
merits it has.  But let's talk about possible renaming on another JIRA if we 
can think of something better, since this patch is already kinda big...

bq. DFS#listPBCD, DFS#addPathBasedCacheDirective javadoc needs to be updated 
with new params/return values

Done.  I also added "list all directives visible to us" to the Javadoc.  
Directives in pools that we don't have read permission on will never be listed.

bq. DFS#removePBCD: can just say "id of" instead of "id id of"

ok ok

bq. PBCD#getId / setId off

fixed

bq. CacheManager has some lines beyond 80 chars due to the new indent

fixed

bq. FSN#modifyPBCD: need to move the FSPermissionChecker get and the 
checkOperation above the retry cache check. We can't throw any exceptions after 
the retry cache check that don't also set the retry cache state. Also, I think 
it's normally checkOperation first and then the pc get, for consistency.

good catch

bq. We should add the new modify directive to DFSTestUtil so it gets tested too

ok

bq. Seems like a lot of these checks in CacheManager are now very similar. 
Since there's now a try/catch wrapping everything, we no longer need to have 
the method name in the exception text, and it should also be in the stack 
trace. So, can we consolidate some of these into shared validation methods that 
throw generic exceptions?

There sort of aren't as many commonalities as it seems.  The add operation 
checks that everything is set: nothing can be null.  In contrast, modify 
allows everything to be null except the ID.  I feel like trying to factor out 
methods might make it confusing.  The big things, like {{DFSUtil#isValidName}}, 
are already common code, so I don't feel too bad about it.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-06 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815376#comment-13815376
 ] 

Konstantin Shvachko commented on HDFS-2832:
---

Arpit, I think we just agreed that collisions among UUIDs are possible but have 
low probability. 
This is a concern for me. Even though unlikely, a collision, if it happens, 
creates a serious problem for the system's integrity. 
Does it concern you?

In my previous comment I tried to explain that in the distributed case the 
randomness is the main problem. Forget for a moment about PRNGs. Assume 
that the UUID is an incremental counter (such as the generation stamp, and now the 
block id), which is incremented by each node independently, but at startup each 
node chooses a random number to start from. On a single node, ++ can go on without 
collisions for long enough to guarantee I will never see one. A Y4K bug is 
fine with me.
But if you take a second node and randomly choose its starting number, it could 
be close to (say, 1000 apart from) the starting point of the first node. Then the second 
node can only generate 1000 storageIDs before colliding with those generated by 
the other node.
The same holds for a PRNG; you just replace ++ with next(). A long period doesn't 
matter if you choose your starting points randomly.
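
A toy illustration of the argument (not HDFS code; the numbers are made up): two 
nodes increment independently from random starting points, and if those starting 
points happen to land close together the collision comes quickly, no matter how 
long a single counter could run on its own.
{code}
import java.util.HashSet;
import java.util.Set;

public class CounterCollisionDemo {
  public static void main(String[] args) {
    long startA = 1000000L;       // pretend this was node A's random starting point
    long startB = startA + 1000;  // node B's random start happens to be only 1000 away
    Set<Long> seen = new HashSet<>();
    for (long i = 0; i < 2000; i++) {
      if (!seen.add(startA + i)) {
        System.out.println("collision: node A's ID " + (startA + i)
            + " was already generated by node B");
      }
      seen.add(startB + i);
    }
    // Collisions start at i = 1000: node A's 1001st ID equals node B's first ID.
  }
}
{code}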

 Enable support for heterogeneous storages in HDFS
 -

 Key: HDFS-2832
 URL: https://issues.apache.org/jira/browse/HDFS-2832
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: 20130813-HeterogeneousStorage.pdf, h2832_20131023.patch, 
 h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, 
 h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, 
 h2832_20131104.patch, h2832_20131105.patch


 HDFS currently supports a configuration where storages are a list of 
 directories. Typically each of these directories corresponds to a volume with 
 its own file system. All these directories are homogeneous and therefore 
 identified as a single storage at the namenode. I propose changing the 
 current model, where a Datanode *is a* storage, to one where a Datanode *is a 
 collection of* storages. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-5470:


 Summary: Add back trunk's reportDiff algorithm to HDFS-2832
 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. 
 It also has a faster running time.  We should add it back to the branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5464) Simplify block report diff calculation

2013-11-06 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815385#comment-13815385
 ] 

Konstantin Shvachko commented on HDFS-5464:
---

 you won't argue that the new code is simpler than the existing

Agreed on simpler. :-)

  see if I could come up with a better solution

Sure, it would be interesting to see. I doubt much can be done in this respect. We 
need to find blocks that did not appear in the report, in one pass and with 
constant memory overhead. Maybe an interview question.

 Simplify block report diff calculation
 --

 Key: HDFS-5464
 URL: https://issues.apache.org/jira/browse/HDFS-5464
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor
 Attachments: h5464_20131105.patch, h5464_20131105b.patch, 
 h5464_20131105c.patch


 The current calculation in BlockManager.reportDiff(..) is unnecessarily 
 complicated.  We could simplify the calculation.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5470:
-

Attachment: h5470_20131106.patch

h5470_20131106.patch: add back the trunk code with storage

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815426#comment-13815426
 ] 

Arpit Agarwal commented on HDFS-5470:
-

Nicholas, any benefit to making {{DatanodeStorageInfo#BlockIterator}} an inner 
class? Can it be a static nested class like 
{{DatanodeDescriptor#BlockIterator}}?

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815434#comment-13815434
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5470:
--

In trunk, it is a static class with a DatanodeDescriptor field.  It is better 
to make it non-static and use DatanodeDescriptor.this.
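
A simplified illustration of the two shapes being discussed (hypothetical names, 
not the real DatanodeDescriptor/DatanodeStorageInfo classes):
{code}
class Descriptor {
  private final String[] blocks = {"b1", "b2"};

  // trunk style: static nested class carrying an explicit reference to the descriptor
  static class StaticBlockIterator {
    private final Descriptor d;
    private int i;
    StaticBlockIterator(Descriptor d) { this.d = d; }
    boolean hasNext() { return i < d.blocks.length; }
    String next() { return d.blocks[i++]; }
  }

  // non-static inner class: implicitly bound to Descriptor.this, no extra field needed
  class BlockIterator {
    private int i;
    boolean hasNext() { return i < blocks.length; }
    String next() { return blocks[i++]; }
  }
}
{code}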

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815436#comment-13815436
 ] 

Arpit Agarwal commented on HDFS-5470:
-

The patch looks good to me, but just curious: what is the advantage? The other 
way, it would have a DatanodeDescriptor field initialized at construction. 
Thanks.

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: HDFS-5326.006.patch

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815452#comment-13815452
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5470:
--

The advantage of a (non-static) inner class is that it can access the enclosing 
class object through the this reference.  By analogy, we could make all methods 
static and pass the enclosing object as a parameter; I bet you won't think 
that is a good design.

BTW, Java's ArrayList.Itr is also non-static:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/ArrayList.java#ArrayList.Itr

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815456#comment-13815456
 ] 

Arpit Agarwal commented on HDFS-5470:
-

I was thinking of the extra inner object allocation when it may not be needed 
by the caller, but it makes sense for code simplicity.

+1 for the patch.

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5428:


Attachment: HDFS-5428.000.patch

Continuing the discussion from HDFS-5443 here.

So HDFS-5428.000.patch is a simple patch that implements an idea similar to the 
one mentioned in HDFS-5443: 
1) Record extra information in the fsimage to indicate INodeFileUCs that exist 
only in snapshots. To keep compatibility we keep the information in the 
under-construction-files section of the fsimage, and just use ".snapshot" as 
their paths.
2) Identify these snapshot files while loading the fsimage, and temporarily store 
them in a map in SnapshotManager.
3) When calculating the total block number while starting the NN, besides the 
files recorded in the lease map, also deduct the files recorded in 2) (see the 
sketch at the end of this comment).

In general the idea is very similar to Vinay's patch. The difference is that we 
do not keep and maintain records in the lease map and only handle these files 
when starting the NN. We can even clear the records in SnapshotManager after 
computing the total number of blocks.

One more thing we may need to handle: if we remove the 0-sized blocks 
(HDFS-5443), it is possible to have an under-construction file in a 
snapshot with no corresponding blockUC for the file. In that case we 
should not record extra information in the fsimage for this kind of INodeFileUC. 

The current patch is just for demonstration. It can pass the new unit tests in 
Vinay's patch. If folks think the general idea is ok, we can continue our work 
based on this patch.
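
A hedged, much-simplified sketch of step 3); the types and names here are 
illustrative only, not the actual FSNamesystem/SnapshotManager code:
{code}
import java.util.List;

class SafeModeBlockCount {
  static class FileUC {
    final long ucBlocks;                 // blocks still under construction for this file
    FileUC(long ucBlocks) { this.ucBlocks = ucBlocks; }
  }

  // Blocks the NN should expect datanodes to report before leaving safemode:
  // subtract UC blocks of leased files and of the snapshot-only UC files
  // collected while loading the fsimage.
  static long completeBlocksTotal(long totalBlocks,
                                  List<FileUC> leasedFiles,
                                  List<FileUC> snapshotOnlyUCFiles) {
    long uc = 0;
    for (FileUC f : leasedFiles) uc += f.ucBlocks;
    for (FileUC f : snapshotOnlyUCFiles) uc += f.ucBlocks;   // the new deduction from 3)
    return totalBlocks - uc;
  }
}
{code}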


 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815466#comment-13815466
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5470:
--

 I was thinking of the extra inner object allocation ...

In our case, the object won't be null.  So it won't have extra object 
allocation.

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815474#comment-13815474
 ] 

Hadoop QA commented on HDFS-5326:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612445/HDFS-5326.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5347//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5347//console

This message is automatically generated.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-5470.
--

   Resolution: Fixed
Fix Version/s: Heterogeneous Storage (HDFS-2832)
 Hadoop Flags: Reviewed

Thanks Arpit for reviewing the patch.

I have committed this.

 Add back trunk's reportDiff algorithm to HDFS-2832
 --

 Key: HDFS-5470
 URL: https://issues.apache.org/jira/browse/HDFS-5470
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: Heterogeneous Storage (HDFS-2832)

 Attachments: h5470_20131106.patch


 As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less 
 memory.  It also has a faster running time.  We should add it back to the 
 branch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5471) CacheAdmin -listPools fails when pools exist that user does not have permissions to

2013-11-06 Thread Stephen Chu (JIRA)
Stephen Chu created HDFS-5471:
-

 Summary: CacheAdmin -listPools fails when pools exist that user 
does not have permissions to
 Key: HDFS-5471
 URL: https://issues.apache.org/jira/browse/HDFS-5471
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0
Reporter: Stephen Chu


When a user does not have read permissions to a cache pool and executes "hdfs 
cacheadmin -listPools", the command will error out, complaining about missing 
required fields with something like:

{code}
[schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
Exception in thread "main" 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): 
Message missing required fields: ownerName, groupName, mode, weight
at 
com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ListCachePoolsResponseElementProto$Builder.build(ClientNamenodeProtocolProtos.java:51722)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCachePools(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2057)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1515)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2051)

at 
org.apache.hadoop.hdfs.tools.CacheAdmin$ListCachePoolsCommand.run(CacheAdmin.java:675)
at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:85)
at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:90)
[schu@hdfs-nfs ~]$ 
{code}

In this example, the pool "root" has 750 permissions, and the root superuser is 
able to -listPools successfully:

{code}
[root@hdfs-nfs ~]# hdfs cacheadmin -listPools
Found 4 results.
NAME  OWNER  GROUP  MODE   WEIGHT 
bar   root   root   rwxr-xr-x  100
foo   root   root   rwxr-xr-x  100
root  root   root   rwxr-x---  100
schu  root   root   rwxr-xr-x  100
[root@hdfs-nfs ~]# 
{code}


When we modify the root pool to mode 755, the schu user can now -listPools 
successfully without error.

{code}
[schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
Found 4 results.
NAME  OWNER  GROUP  MODE   WEIGHT 
bar   root   root   rwxr-xr-x  100
foo   root   root   rwxr-xr-x  100
root  root   root   rwxr-xr-x  100
schu  root   root   rwxr-xr-x  100
[schu@hdfs-nfs ~]$ 
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace

2013-11-06 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Attachment: HDFS-5252.002.patch

Uploaded a new patch to address Jing's comments. Also added a unit test.

 Stable write is not handled correctly in someplace
 --

 Key: HDFS-5252
 URL: https://issues.apache.org/jira/browse/HDFS-5252
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch


 When the client asks for a stable write but the prerequisite writes have not 
 been transferred to the NFS gateway, the stableness can't be honored. The NFS 
 gateway has to treat the write as an unstable write and set the flag to UNSTABLE 
 in the write response.
 One bug was found during testing with an Ubuntu client while copying one 1KB file. 
 For small files like a 1KB file, the Ubuntu client does one stable write (with the 
 FILE_SYNC flag). However, the NFS gateway missed one place 
 (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT updated 
 to UNSTABLE.
 With this bug, the client thinks the write is on disk and thus doesn't send a 
 COMMIT anymore. The following test tries to read the data back and of course 
 fails to do so since the data was not synced. 
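 A minimal sketch of the rule described above, using made-up types rather than the 
 actual OpenFileCtx/NFS gateway classes:
 {code}
enum StableHow { UNSTABLE, DATA_SYNC, FILE_SYNC }

class WriteReply {
  final StableHow stableHow;
  WriteReply(StableHow stableHow) { this.stableHow = stableHow; }
}

class StableWriteHandling {
  // If prerequisite bytes have not arrived, the reply must downgrade the
  // stable_how flag to UNSTABLE so the client knows it still has to COMMIT.
  static WriteReply reply(StableHow requested, boolean prerequisitesReceived) {
    if (requested != StableHow.UNSTABLE && !prerequisitesReceived) {
      return new WriteReply(StableHow.UNSTABLE);
    }
    return new WriteReply(requested);
  }
}
 {code}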



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815513#comment-13815513
 ] 

Hadoop QA commented on HDFS-5252:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612487/HDFS-5252.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5350//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5350//console

This message is automatically generated.

 Stable write is not handled correctly in someplace
 --

 Key: HDFS-5252
 URL: https://issues.apache.org/jira/browse/HDFS-5252
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch


 When the client asks for a stable write but the prerequisite writes have not 
 been transferred to the NFS gateway, the stableness can't be honored. The NFS 
 gateway has to treat the write as an unstable write and set the flag to UNSTABLE 
 in the write response.
 One bug was found during testing with an Ubuntu client while copying one 1KB file. 
 For small files like a 1KB file, the Ubuntu client does one stable write (with the 
 FILE_SYNC flag). However, the NFS gateway missed one place 
 (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT updated 
 to UNSTABLE.
 With this bug, the client thinks the write is on disk and thus doesn't send a 
 COMMIT anymore. The following test tries to read the data back and of course 
 fails to do so since the data was not synced. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5472:
-

Attachment: h5472_20131106.patch

h5472_20131106.patch: simple fixes

 Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark
 ---

 Key: HDFS-5472
 URL: https://issues.apache.org/jira/browse/HDFS-5472
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5472_20131106.patch


 - DatanodeDescriptor should be initialized with updateHeartbeat for updating 
 the timestamps.
 - NNThroughputBenchmark should create DatanodeRegistrations with real 
 datanode UUIDs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark

2013-11-06 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-5472:


 Summary: Fix TestDatanodeManager, TestSafeMode and 
TestNNThroughputBenchmark
 Key: HDFS-5472
 URL: https://issues.apache.org/jira/browse/HDFS-5472
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5472_20131106.patch

- DatanodeDescriptor should be initialized with updateHeartbeat for updating 
the timestamps.
- NNThroughputBenchmark should create DatanodeRegistrations with real datanode 
UUIDs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815533#comment-13815533
 ] 

Andrew Wang commented on HDFS-5326:
---

Thanks for the bump, looks almost 100%. Just a few comments:

bq. But let's talk about possible renaming on another JIRA if we can think of 
something better, since this patch is already kinda big...
Sure, I'll file it.

bq. Organizationally, do you mind moving the new modify stuff in the FSEditLog, 
Loader, Op, etc, so it goes add/modify/remove for directives, add/modify/remove 
for pools? Compat isn't a concern yet.
This wasn't addressed, would you mind shuffling this around? I guess redoing 
the opcodes is optional (though appreciated), but I'd like to see all the 
methods/cases organized.

bq. There sort of aren't as many commonalities as it seems.

I took a hack at this and it ended up being less code and IMO cleaner (a rough 
sketch follows the list below). I can do this in a follow-on if you like, but:
* Add and modify aren't that different besides the difference in required, 
optional, and default fields. I just first validate all present fields in the 
directive, then enforce required fields, then fill in default values.
* Modify and remove have the same checks for an existing entry
* Add and modify have the same checks for an existing cache pool
* All three do write checks to a cache pool, moving this into 
FSPermissionChecker or a method was an easy savings
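
A rough sketch of that consolidation, with hypothetical helper names (not the 
actual CacheManager code): validate whatever fields are present, then let add 
and modify differ only in which fields are required and which defaults get 
filled in.
{code}
class DirectiveValidator {
  // Validate only the fields that were supplied; nulls mean "not set".
  static void validatePresentFields(String path, Short replication) {
    if (path != null && !path.startsWith("/")) {
      throw new IllegalArgumentException("Invalid path " + path);
    }
    if (replication != null && replication <= 0) {
      throw new IllegalArgumentException("Invalid replication " + replication);
    }
  }

  // add: enforce that a required field is present.
  static <T> T requireField(T value, String name) {
    if (value == null) {
      throw new IllegalArgumentException("Missing required field " + name);
    }
    return value;
  }

  // add: fill in a default when the field was not supplied.
  static <T> T orDefault(T value, T defaultValue) {
    return value != null ? value : defaultValue;
  }
}
{code}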

I think we should still remove the method name from the exception text 
everywhere (and capitalize like a sentence). Also a few other things here:
* need to add a space
{code}
throw new IOException("addDirective: replication'" + replication +
throw new IOException("modifyDirective: replication'" + replication +
{code}
* success/fail logs are inconsistently formatted. I'd like something like e.g. 
"methodName: successfully <verb> directive <directive>" and "methodName: failed 
to <verb> <noun> <parameters>:", e
{code}
  LOG.warn("addDirective " + directive + ": failed", e);
LOG.info("addDirective " + directive + ": succeeded.");
...
  LOG.warn("modifyDirective " + idString + ": error", e);
LOG.info("modifyDirective " + idString + ": applied " + directive);
...
  LOG.warn("removeDirective " + id + " failed", e);
LOG.info("removeDirective " + id + ": removed");
{code}

* I feel like we could dedupe the various PC exception texts by throwing the 
AccessControlException in pc#checkPermission itself. I think it's a 
straightforward change.
* Unrelated, but I noticed that CacheManager#listPBCDs does a pc check without 
first checking if pc is null, want to fix that here?
* I also noticed we have some unused imports in FSEditLog and CacheManager.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5473) Consistent naming of user-visible caching classes and methods

2013-11-06 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-5473:
-

 Summary: Consistent naming of user-visible caching classes and 
methods
 Key: HDFS-5473
 URL: https://issues.apache.org/jira/browse/HDFS-5473
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Andrew Wang


It's kind of warty that (after HDFS-5326 goes in) DistributedFileSystem has 
{{*CachePool}} methods that take a {{CachePoolInfo}} and 
{{*PathBasedCacheDirective}} methods that take a {{PathBasedCacheDirective}}. 
We should consider renaming {{CachePoolInfo}} to {{CachePool}} for consistency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread sathish (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815536#comment-13815536
 ] 

sathish commented on HDFS-5428:
---

Continuing the discussion from HDFS-5443 here.
As we discussed yesterday, I verified this patch for HDFS-5443. With this patch 
the issue still reproduces, i.e. after restart the NN goes into safemode. I am 
not sure where the flow is missing.

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815551#comment-13815551
 ] 

Hadoop QA commented on HDFS-5428:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612477/HDFS-5428.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5348//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5348//console

This message is automatically generated.

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815558#comment-13815558
 ] 

Vinay commented on HDFS-5428:
-

Hi Jing, thanks for posting the simplified patch.

The patch looks quite good and makes all the unit tests in my patch pass.

Some small improvements are still needed to address the point below as well.
bq. (From issue Description) So when the Datanode reports RBW blocks those will 
not be updated in blocksmap. Some of the FINALIZED blocks will be marked as 
corrupt due to length mismatch.
This problem is still there: while loading the fsimage, snapshot inodes are not 
replaced with a UCInode and the last block stays COMPLETE. In that case, after 
reloading from the fsimage we will not be able to read the last block.
Replacing such inodes with a UCInode is required.


 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread sathish (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815559#comment-13815559
 ] 

sathish commented on HDFS-5428:
---

{quote}
But I am little uncomfortable for managing leases for snapshotted files as they 
are readonly files, no need of leases. If all others ok on that point, I will 
not object.
{quote}

After this point, Uma and I discussed the same points that Jing has addressed 
in the HDFS-5428.000.patch.
It is better to maintain the leases for snapshotted files in the snapshot 
manager rather than in the lease manager: the lease manager's responsibility is 
to maintain leases for files open for write, and with the current implementation 
snapshots are read-only, so there is no need for the lease manager to track 
leases for snapshotted files.

+1, the patch looks good.
I will verify this patch in my environment.

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815562#comment-13815562
 ] 

Vinay commented on HDFS-5428:
-

I think the current updated patch, HDFS-5428.000.patch, can solve HDFS-5443: 
the NN will exit safemode even without removing the 0-sized blocks.
Still, removing the 0-sized blocks would be an added advantage.

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

2013-11-06 Thread sathish (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815566#comment-13815566
 ] 

sathish commented on HDFS-5443:
---

Thanks Jing for the patch.
I verified this patch in my environment and it is working correctly. This patch 
wipes out the zero-sized blocks, so the NN comes out of safemode.
Along with this patch, if we also merge HDFS-5428-v2.patch, I feel it will 
clear all the problems for the under-construction files within the snapshot.

 Namenode can stuck in safemode on restart if it crashes just after addblock 
 logsync and after taking snapshot for such file.
 

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Attachments: 5443-test.patch, HDFS-5443.000.patch


 This issue is reported by Prakash and Sathish.
 On looking into the issue following things are happening.
 .
 1) Client added block at NN and just did logsync
So, NN has block ID persisted.
 2)Before returning addblock response to client take a snapshot for root or 
 parent directories for that file
 3) Delete parent directory for that file
 4) Now crash the NN with out responding success to client for that addBlock 
 call
 Now on restart of the Namenode, it will stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815572#comment-13815572
 ] 

Hadoop QA commented on HDFS-5326:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612476/HDFS-5326.006.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5349//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5349//console

This message is automatically generated.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815580#comment-13815580
 ] 

Vinay commented on HDFS-5428:
-

I tried to update the patch according to my previous comment.
But to replace the exact inode we need the full snapshot path, and since the 
full snapshot path is not tracked anywhere at the moment, we cannot replace the 
INode.
We need a way to track the full path of the snapshot INode and replace the 
INode with an INodeFileUC.

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815581#comment-13815581
 ] 

Vinay commented on HDFS-5428:
-

I just tried the following to read the file after restart. It failed with 
BlockMissingException
{code}
  @Test
  public void testWithCheckpoint() throws Exception {
    Path path = new Path("/test");
    doWriteAndAbort(fs, path);
    fs.delete(new Path("/test/test"), true);
    NameNode nameNode = cluster.getNameNode();
    NameNodeAdapter.enterSafeMode(nameNode, false);
    NameNodeAdapter.saveNamespace(nameNode);
    NameNodeAdapter.leaveSafeMode(nameNode);
    cluster.restartNameNode(true);
    // read snapshot file after restart
    String test2snapshotPath = Snapshot.getSnapshotPath(path.toString(),
        "s1/test/test2");
    DFSTestUtil.readFile(fs, new Path(test2snapshotPath));
    String test3snapshotPath = Snapshot.getSnapshotPath(path.toString(),
        "s1/test/test3");
    DFSTestUtil.readFile(fs, new Path(test3snapshotPath));
  }{code}

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: HDFS-5326.007.patch

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815585#comment-13815585
 ] 

Colin Patrick McCabe commented on HDFS-5326:


bq. This wasn't addressed, could you mind shuffling this around? I guess 
redoing the opcodes is optional (though appreciated), but I'd like to see all 
the methods/cases organized.

I reordered the opcodes.  I suppose it does make sense to do.

bq. I took a hack at this and it ended up being less code and IMO cleaner. I 
can do this in a follow-on if you like, but:

Let's do this as part of HDFS-5471 if it looks good... similarly with 
refactoring pc#checkPermission.

bq. need to add a space

fixed

bq. Unrelated, but I noticed that CacheManager#listPBCDs does a pc check 
without first checking if pc is null, want to fix that here?

fixed

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: HDFS-5326.007.patch

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: (was: HDFS-5326.007.patch)

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5461) fallback to non-ssr(local short circuit reads) while oom detected

2013-11-06 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5461:


Attachment: HDFS-5461.txt

bq.  It's because each open stream holds a buffer, and we have hundreds of open 
streams?
I am not 100% sure, but I agree with you; this OOM is easy to reproduce when 
there are lots of open storefiles to be read (e.g. when compaction can't catch 
up sometimes).

Oh, I see; it seems the fallback is only meaningful for a config like mine: a 
big Xmx and a small MaxDirectMemorySize :)

I attached a patch with more logging about the in-use/pooled direct buffer 
sizes. In my opinion, it could be useful to reset the log level to trace online 
while the OOM is occurring. The patch also adds a simple try/catch fallback for 
OOM without introducing any new config value; to me this seems more 
reasonable :)
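
A minimal, self-contained sketch of that try/catch fallback idea (the helper 
method names are hypothetical stand-ins, not the real HDFS client APIs): if 
allocating the direct buffer for a short-circuit read fails with an 
OutOfMemoryError, fall back to the normal (non-SSR) read path:
{code}
// Hypothetical sketch of the fallback; readViaShortCircuit()/readViaDataNode()
// are illustrative stand-ins, not the actual HDFS client methods.
import java.nio.ByteBuffer;

class ShortCircuitFallbackSketch {
  int read(byte[] buf, int off, int len) {
    try {
      ByteBuffer direct = ByteBuffer.allocateDirect(len);   // may throw OutOfMemoryError
      return readViaShortCircuit(direct, buf, off, len);
    } catch (OutOfMemoryError oom) {
      // Direct memory exhausted: log and fall back to the regular read path.
      System.err.println("Direct buffer OOM, falling back to non-SSR read: " + oom);
      return readViaDataNode(buf, off, len);
    }
  }

  int readViaShortCircuit(ByteBuffer direct, byte[] buf, int off, int len) { return len; }
  int readViaDataNode(byte[] buf, int off, int len) { return len; }
}
{code}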

 fallback to non-ssr(local short circuit reads) while oom detected
 -

 Key: HDFS-5461
 URL: https://issues.apache.org/jira/browse/HDFS-5461
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.2.0
Reporter: Liang Xie
 Attachments: HDFS-5461.txt


 Currently, the DirectBufferPool used by the ssr feature does not seem to have 
 an upper-bound limit other than the direct-memory VM option, so there is a risk 
 of hitting a direct-memory OOM. See HBASE-8143 for an example.
 IMHO, maybe we could improve it a bit:
 1) detect the OOM (or a configured upper limit) at the caller, then fall back 
 to non-ssr
 2) add a new metric for the currently consumed raw direct memory size.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread sathish (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815641#comment-13815641
 ] 

sathish commented on HDFS-5428:
---

Vinay, as I observed when debugging the scenario along with your patch,
there is a path mismatch when counting the blocks of a snapshot file under 
construction; because of this, those blocks are not removed from the block threshold.
{code}
String fileSnapshotPath = StringUtils.replaceOnce(
  file,
  snapshottableDir,
  Snapshot.getSnapshotPath(snapshottableDir,
  Snapshot.getSnapshotName(snapshot)));
{code}
StringUtils is not producing the correct replaced path.
Logs for this:
2013-11-07 01:05:15,103 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: 
Exception in namenode join
java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: 
/.snapshot/snap_6ran/_temporary/0/_temporary/attempt_local1866843415_0001_m_00_0/part-m-0
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:5068)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:853)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:540)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:482)
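
As a hedged illustration of the kind of mismatch described above (the file path 
and snapshot name below are made up, and the exact output of 
Snapshot.getSnapshotPath for the root is an assumption), a plain commons-lang 
replaceOnce on the snapshottable directory produces a malformed snapshot path 
when the snapshottable directory is the root:
{code}
// Illustrative only: assumes the snapshot path for root and snapshot "s1"
// looks like "/.snapshot/s1"; the file path below is made up.
import org.apache.commons.lang.StringUtils;

public class ReplaceOncePathSketch {
  public static void main(String[] args) {
    String file = "/foo/test/bar";
    String snapshottableDir = "/";            // snapshottable dir is the root
    String snapshotRoot = "/.snapshot/s1";    // assumed Snapshot.getSnapshotPath("/", "s1")

    // Replaces only the leading "/" and drops the separator, giving a wrong path:
    // "/.snapshot/s1foo/test/bar" instead of "/.snapshot/s1/foo/test/bar".
    System.out.println(StringUtils.replaceOnce(file, snapshottableDir, snapshotRoot));
  }
}
{code}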


 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

2013-11-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815653#comment-13815653
 ] 

Uma Maheswara Rao G commented on HDFS-5443:
---

+1, the patch looks good. Thanks Jing, Vinay and Sathish for your efforts.


 Namenode can stuck in safemode on restart if it crashes just after addblock 
 logsync and after taking snapshot for such file.
 

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Attachments: 5443-test.patch, HDFS-5443.000.patch


 This issue is reported by Prakash and Sathish.
 On looking into the issue following things are happening.
 .
 1) Client added block at NN and just did logsync
So, NN has block ID persisted.
 2)Before returning addblock response to client take a snapshot for root or 
 parent directories for that file
 3) Delete parent directory for that file
 4) Now crash the NN with out responding success to client for that addBlock 
 call
 Now on restart of the Namenode, it will stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815659#comment-13815659
 ] 

Hadoop QA commented on HDFS-5326:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612514/HDFS-5326.007.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5351//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5351//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5351//console

This message is automatically generated.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815669#comment-13815669
 ] 

Hadoop QA commented on HDFS-5326:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612516/HDFS-5326.007.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5352//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5352//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5352//console

This message is automatically generated.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace

2013-11-06 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815698#comment-13815698
 ] 

Jing Zhao commented on HDFS-5252:
-

The new patch looks great to me. +1.

 Stable write is not handled correctly in someplace
 --

 Key: HDFS-5252
 URL: https://issues.apache.org/jira/browse/HDFS-5252
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch


 When the client asks for a stable write but the prerequisite writes have not 
 been transferred to the NFS gateway, the stableness can't be honored. The NFS 
 gateway has to treat the write as an unstable write and set the flag to 
 UNSTABLE in the write response.
 One bug was found during a test with an Ubuntu client when copying a 1KB file. 
 For small files like a 1KB file, the Ubuntu client does one stable write (with 
 the FILE_SYNC flag). However, the NFS gateway missed one place 
 (OpenFileCtx#doSingleWrite) where it sends the response without updating the 
 flag to UNSTABLE.
 With this bug, the client thinks the write is on disk and thus doesn't send a 
 COMMIT anymore. The following test tries to read the data back and of course 
 fails to do so since the data was not synced.
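
 A rough sketch of the behavior described above (hypothetical types and field 
 names, not the actual NFS gateway code): the gateway should downgrade the 
 stability flag in the response whenever it has to defer the write:
{code}
// Hypothetical, simplified sketch of downgrading a stable write to UNSTABLE
// when prerequisite writes have not arrived yet; these enums and classes are
// illustrative stand-ins, not the real NFS gateway types.
enum StableHow { UNSTABLE, DATA_SYNC, FILE_SYNC }

class WriteResponseSketch {
  final StableHow committed;
  WriteResponseSketch(StableHow committed) { this.committed = committed; }
}

class OpenFileCtxSketch {
  long nextOffsetToWrite;

  WriteResponseSketch handleWrite(long offset, StableHow requested) {
    if (offset > nextOffsetToWrite) {
      // Prerequisite writes are missing: the data can only be cached, so the
      // response must report UNSTABLE even if FILE_SYNC was requested,
      // otherwise the client skips the COMMIT and may lose data.
      return new WriteResponseSketch(StableHow.UNSTABLE);
    }
    // In-order write: honor the requested stability (sync before replying).
    return new WriteResponseSketch(requested);
  }
}
{code}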



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.

2013-11-06 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-5474:
-

 Summary: Deletesnapshot can make Namenode in safemode on NN 
restarts.
 Key: HDFS-5474
 URL: https://issues.apache.org/jira/browse/HDFS-5474
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Uma Maheswara Rao G
Assignee: sathish


When we delete a snapshot, we first delete the blocks associated with that 
snapshot and only after that do we logsync the deleteSnapshot op to the editlog.
There is a chance that, after the blocks are removed from the blocks map but 
before the log sync, a block report arrives; the NN then finds that a block 
does not exist in the blocks map and may invalidate it, and the invalidation 
info can go out as part of a heartbeat. If the Namenode shuts down after these 
steps but before actually doing the logsync, then on restart it will still have 
the snapshot inodes and will expect the DNs to report those blocks.

The simple solution is to move the block removal down so that it happens only 
after the logsync, similar to the delete op.
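
A minimal sketch of the proposed reordering (the method names are hypothetical 
stand-ins, not the actual FSNamesystem code): make the edit durable first, then 
remove the blocks, mirroring how the delete op is handled:
{code}
// Hypothetical sketch of the ordering fix; logDeleteSnapshot()/logSync()/
// removeBlocks() are illustrative stand-ins for the real NameNode internals.
class DeleteSnapshotOrderingSketch {
  // Current (problematic) ordering: blocks are gone from the blocks map
  // before the edit is durable, so a crash leaves the namespace still
  // referencing them on restart.
  void deleteSnapshotCurrent(String snapshotRoot, String snapshotName) {
    removeBlocks(snapshotRoot, snapshotName);
    logDeleteSnapshot(snapshotRoot, snapshotName);
    logSync();
  }

  // Proposed ordering: sync the edit first, then remove the blocks,
  // similar to how a regular delete is handled.
  void deleteSnapshotProposed(String snapshotRoot, String snapshotName) {
    logDeleteSnapshot(snapshotRoot, snapshotName);
    logSync();
    removeBlocks(snapshotRoot, snapshotName);
  }

  void removeBlocks(String root, String name) {}
  void logDeleteSnapshot(String root, String name) {}
  void logSync() {}
}
{code}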



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815718#comment-13815718
 ] 

Vinay commented on HDFS-5428:
-

Hi [~sathish.gurram], you are right: the replacement is wrong if the snapshottable 
dir is /. I will update the patch if necessary. ;)

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

2013-11-06 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815721#comment-13815721
 ] 

Jing Zhao commented on HDFS-5443:
-

Thanks Uma, Sathish and Vinay! I will commit the patch tomorrow morning if 
there are no more comments.

 Namenode can stuck in safemode on restart if it crashes just after addblock 
 logsync and after taking snapshot for such file.
 

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Attachments: 5443-test.patch, HDFS-5443.000.patch


 This issue is reported by Prakash and Sathish.
 On looking into the issue following things are happening.
 .
 1) Client added block at NN and just did logsync
So, NN has block ID persisted.
 2)Before returning addblock response to client take a snapshot for root or 
 parent directories for that file
 3) Delete parent directory for that file
 4) Now crash the NN with out responding success to client for that addBlock 
 call
 Now on restart of the Namenode, it will stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-06 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815727#comment-13815727
 ] 

Jing Zhao commented on HDFS-5428:
-

bq. But to replace the exact inode we need to have the full snapshot path. in 
the current case since the full snapshot path is not tracked anywhere we cannot 
replace the INode.

Yeah, in our current implementation it's hard (sometimes impossible) to get the 
full path for a given snapshot inode, so it will be hard to replace the whole 
INodeFile. My question here is whether it's possible to just replace the last 
block of the snapshot INode with a BlockInfoUC (without replacing the INodeFile 
with an INodeFileUC)?
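
A minimal, self-contained sketch of that idea (hypothetical types, not the real 
BlockInfo/BlockInfoUnderConstruction classes): keep the INodeFile as-is but swap 
its last block for an under-construction variant, so later reports of a 
shorter/RBW replica can be accepted instead of being marked corrupt:
{code}
// Hypothetical, simplified types; illustrative only, not the actual HDFS
// block management classes.
class BlockSketch {
  final long blockId;
  long numBytes;
  boolean underConstruction;
  BlockSketch(long blockId, long numBytes, boolean uc) {
    this.blockId = blockId; this.numBytes = numBytes; this.underConstruction = uc;
  }
}

class INodeFileSketch {
  BlockSketch[] blocks;

  // Replace only the last block with an under-construction copy, leaving the
  // inode itself (and all other blocks) untouched.
  void convertLastBlockToUnderConstruction() {
    if (blocks == null || blocks.length == 0) {
      return;
    }
    BlockSketch last = blocks[blocks.length - 1];
    blocks[blocks.length - 1] =
        new BlockSketch(last.blockId, last.numBytes, true);
  }
}
{code}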


 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)