[jira] [Commented] (HDFS-5439) Fix TestPendingReplication
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814688#comment-13814688 ] Junping Du commented on HDFS-5439: -- Thanks Arpit for the explanation. The patch looks good to me. I verified that it fixes TestPendingReplication and some other test failures (e.g. TestBlockReport), so +1. However, I think we still need to fix StorageReceivedDeletedBlocks to consistently use either the storage UUID or the datanode UUID to initialize it (right now the two get mixed up in different places). Fix TestPendingReplication -- Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: Heterogeneous Storage (HDFS-2832) Attachments: HDFS-5439-demo1.patch, h5439.04.patch {{TestPendingReplication}} fails with the following exception:
{code}
java.lang.AssertionError: expected:<4> but was:<3>
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.failNotEquals(Assert.java:647)
  at org.junit.Assert.assertEquals(Assert.java:128)
  at org.junit.Assert.assertEquals(Assert.java:472)
  at org.junit.Assert.assertEquals(Assert.java:456)
  at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
[ https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814788#comment-13814788 ] Hudson commented on HDFS-5458: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #384 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/384/]) HDFS-5458. Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs. Contributed by Mike Mellenthin. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539091) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs -- Key: HDFS-5458 URL: https://issues.apache.org/jira/browse/HDFS-5458 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Mike Mellenthin Fix For: 2.2.1 Attachments: HDFS-5458-1.patch Saw a stacktrace of datanode startup with a bad volume, where even listing directories would throw an IOException. The failed volume threshold was set to 1, but it would fatally error out in {{File#getCanonicalPath}} in {{getDataDirsFromURIs}}:
{code}
File dir = new File(dirURI.getPath());
try {
  dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
  dirs.add(dir);
} catch (IOException ioe) {
  LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe);
  invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" ");
}
{code}
Since {{getCanonicalPath}} may need to do I/O and can thus throw an IOException, this catch clause doesn't properly protect startup from a failed volume. -- This message was sent by Atlassian JIRA (v6.1#6144)
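To make the failure mode concrete, here is a sketch of a hardened version of that catch clause (an illustration only, not necessarily the committed patch): the path appended to {{invalidDirs}} must be computed without I/O, e.g. via {{File#getAbsolutePath}}, so a dead disk cannot throw from inside the catch block.
{code}
File dir = new File(dirURI.getPath());
try {
  dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
  dirs.add(dir);
} catch (IOException ioe) {
  LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe);
  // getAbsolutePath() is pure string manipulation and never performs I/O,
  // unlike getCanonicalPath(), so this append cannot throw; the bad volume
  // is simply counted against the failed-volume threshold.
  invalidDirs.append("\"").append(dir.getAbsolutePath()).append("\" ");
}
{code}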
[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
[ https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814860#comment-13814860 ] Hudson commented on HDFS-5458: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1601 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1601/]) HDFS-5458. Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs. Contributed by Mike Mellenthin. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539091) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs -- Key: HDFS-5458 URL: https://issues.apache.org/jira/browse/HDFS-5458 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Mike Mellenthin Fix For: 2.2.1 Attachments: HDFS-5458-1.patch Saw a stacktrace of datanode startup with a bad volume, where even listing directories would throw an IOException. The failed volume threshold was set to 1, but it would fatally error out in {{File#getCanonicalPath}} in {{getDataDirsFromURIs}}:
{code}
File dir = new File(dirURI.getPath());
try {
  dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
  dirs.add(dir);
} catch (IOException ioe) {
  LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe);
  invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" ");
}
{code}
Since {{getCanonicalPath}} may need to do I/O and can thus throw an IOException, this catch clause doesn't properly protect startup from a failed volume. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5458) Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
[ https://issues.apache.org/jira/browse/HDFS-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814879#comment-13814879 ] Hudson commented on HDFS-5458: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1575 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1575/]) HDFS-5458. Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs. Contributed by Mike Mellenthin. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539091) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs -- Key: HDFS-5458 URL: https://issues.apache.org/jira/browse/HDFS-5458 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Mike Mellenthin Fix For: 2.2.1 Attachments: HDFS-5458-1.patch Saw a stacktrace of datanode startup with a bad volume, where even listing directories would throw an IOException. The failed volume threshold was set to 1, but it would fatally error out in {{File#getCanonicalPath}} in {{getDataDirsFromURIs}}:
{code}
File dir = new File(dirURI.getPath());
try {
  dataNodeDiskChecker.checkDir(localFS, new Path(dir.toURI()));
  dirs.add(dir);
} catch (IOException ioe) {
  LOG.warn("Invalid " + DFS_DATANODE_DATA_DIR_KEY + " " + dir + " : ", ioe);
  invalidDirs.append("\"").append(dir.getCanonicalPath()).append("\" ");
}
{code}
Since {{getCanonicalPath}} may need to do I/O and can thus throw an IOException, this catch clause doesn't properly protect startup from a failed volume. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5411) Update Bookkeeper dependency to 4.2.1
[ https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814951#comment-13814951 ] Robert Rati commented on HDFS-5411: --- 4.2.2 causes compilation and test issues; porting to 4.2.2 will require additional work beyond this patch. Update Bookkeeper dependency to 4.2.1 - Key: HDFS-5411 URL: https://issues.apache.org/jira/browse/HDFS-5411 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Robert Rati Priority: Minor Attachments: HDFS-5411.patch Update the bookkeeper dependency to 4.2.1. This eases compilation on Fedora platforms. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5469) Add configuration property for the sub-directory export path
Brandon Li created HDFS-5469: Summary: Add configuration property for the sub-directory export path Key: HDFS-5469 URL: https://issues.apache.org/jira/browse/HDFS-5469 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Brandon Li Assignee: Brandon Li Currently only the HDFS root is exported. Adding this property is the first step toward supporting sub-directory mounting. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5469) Add configuration property for the sub-directory export path
[ https://issues.apache.org/jira/browse/HDFS-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815052#comment-13815052 ] Brandon Li commented on HDFS-5469: -- Mounting a sub-directory is especially useful for some Windows NFS clients, which can't mount the root export, possibly due to a client-side bug. Add configuration property for the sub-directory export path Key: HDFS-5469 URL: https://issues.apache.org/jira/browse/HDFS-5469 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Currently only the HDFS root is exported. Adding this property is the first step toward supporting sub-directory mounting. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5461) fallback to non-SSR (local short-circuit reads) when OOM is detected
[ https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815149#comment-13815149 ] Todd Lipcon commented on HDFS-5461: --- Adding some kind of limit makes sense. But I'm curious why we ended up with 7GB worth of buffers in the referenced HBase ticket. Is it because each open stream holds a buffer, and we have hundreds of open streams? Without direct buffers, wouldn't we just end up with a similar amount of memory usage in byte[] buffers, and OOME on the non-native heap? fallback to non-SSR (local short-circuit reads) when OOM is detected - Key: HDFS-5461 URL: https://issues.apache.org/jira/browse/HDFS-5461 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0, 2.2.0 Reporter: Liang Xie Currently, the DirectBufferPool used by the SSR feature doesn't seem to have an upper-bound limit other than the direct-memory VM option, so there's a risk of hitting a direct-memory OOM; see HBASE-8143 for an example. IMHO, maybe we could improve it a bit: 1) detect OOM, or a configured upper limit being reached by the caller, and then fall back to non-SSR (a rough sketch follows this message); 2) add a new metric for the currently consumed raw direct memory size. -- This message was sent by Atlassian JIRA (v6.1#6144)
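As a sketch of the kind of limit being proposed (all names here are hypothetical, not the actual HDFS {{DirectBufferPool}} API): allocations are charged against a configured direct-memory budget, and anything over budget, or any direct-memory OOM, falls back to an ordinary on-heap buffer, mirroring the proposed fallback from SSR to non-SSR reads.
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: cap direct-memory usage and fall back to heap buffers.
class BoundedDirectAllocator {
  private final long maxDirectBytes;            // configured upper bound
  private final AtomicLong directBytes = new AtomicLong();

  BoundedDirectAllocator(long maxDirectBytes) {
    this.maxDirectBytes = maxDirectBytes;
  }

  ByteBuffer allocate(int size) {
    if (directBytes.addAndGet(size) <= maxDirectBytes) {
      try {
        return ByteBuffer.allocateDirect(size);
      } catch (OutOfMemoryError oom) {
        // direct memory exhausted despite our accounting; fall through
      }
    }
    directBytes.addAndGet(-size);      // not charged: we are going on-heap
    return ByteBuffer.allocate(size);  // heap fallback (analogous to non-SSR)
  }

  void release(ByteBuffer buf) {
    if (buf.isDirect()) {
      directBytes.addAndGet(-buf.capacity());
    }
  }
}
{code}
The {{directBytes}} counter would also be a natural source for the raw consumed direct memory metric suggested in 2).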
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: HDFS-5326.004.patch add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815326#comment-13815326 ] Colin Patrick McCabe commented on HDFS-5326: * {{TestClientNamenodeProtocolServerSideTranslatorPB}}: we don't need this test any more since validation is done server-side. Rationale: I would rather keep the validation in one place than have it spread out across client and server. It *must* be on the server, since we can't trust arbitrary clients, so let's just put it all there and unit test it well. * {{TestOfflineEditsViewer}}: I think this test failure happened because Jenkins didn't apply the git binary diff to the {{editsStored}} file. I don't think the version of GNU patch used by Jenkins supports git binary diffs. We've seen this in the past when updating this test. * Added a modifyPBCD test to {{TestPathBasedCacheRequests}}. * Fixed a bug where we were assuming that all modify requests came with a path. add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815325#comment-13815325 ] Andrew Wang commented on HDFS-5326: --- Hey Colin, thanks for another mondo patch. I like the overall idea here, and I like the new PB code and user-facing API. All of the comments here are minor; I'll do another pass on an updated patch but expect to quickly be +1. Nitty: * IdNotFoundException: Can we say what this is used for, rather than just EINVAL? * Update hdfs-default.xml too with the changed config name * ClientProtocol still refers to "cache descriptor" in removePathBasedCacheDescriptor * Reusing PBCD for the {{filterInfo}} in {{DFS#listPBCD}} is a little too loosey goosey for me. Normally a PBCD always has a path and pool set, with just the repl and id being optional. This is a third form of usage, and we'll probably never want to filter on more than path and pool. How about keeping the old method signature? We can still use PBCD after DFS for simplicity if you like. It should probably be named just {{filter}}, too, since the type isn't named {{PBCDInfo}}. * Maybe we should rename CachePoolInfo to CachePool so the public APIs and classes line up, e.g. you {{addPathBasedCacheDirective}} a {{PBCD}}, and {{addCachePool}} a {{CachePool}}. Or we could have a PBCDi class at the risk of shaming on Hacker News ;) If we went with PBCDi, listPBCD would still take a {{filterInfo}}. * DFS#listPBCD, DFS#addPathBasedCacheDirective javadoc needs to be updated with new params/return values * DFS#removePBCD: can just say "id of" instead of "id id of" * PBCD#getId: javadoc says it gets the path, not the id * PBCD.Builder#setId: javadoc param descriptions are off * Organizationally, do you mind moving the new modify stuff in the FSEditLog, Loader, Op, etc, so it goes add/modify/remove for directives, add/modify/remove for pools? Compat isn't a concern yet. * CacheManager has some lines beyond 80 chars due to the new indent Other: * FSN#modifyPBCD: need to move the FSPermissionChecker get and the checkOperation above the retry cache check. We can't throw any exceptions after the retry cache check that don't also set the retry cache state. It's also, I think, normally checkOperation first and then the pc get, for consistency. (A sketch of this ordering follows this message.) * We should add the new modify directive to DFSTestUtil so it gets tested too * Seems like a lot of these checks in CacheManager are now very similar. Since there's now a try/catch wrapping everything, we no longer need to have the method name in the exception text, and it should also be in the stack trace. So, can we consolidate some of these into shared validation methods that throw generic exceptions? add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
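For the retry cache point above, the requested ordering looks roughly like the following skeleton (illustrative only; the helper names follow the usual {{FSNamesystem}} pattern, and the real method body is elided). Anything that can throw must either run before {{RetryCache.waitForCompletion}} or record its outcome through {{RetryCache.setState}}:
{code}
void modifyPathBasedCacheDirective(PathBasedCacheDirective directive)
    throws IOException {
  checkOperation(OperationCategory.WRITE);               // may throw: fine, runs first
  final FSPermissionChecker pc = getPermissionChecker(); // may throw: also fine
  CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
  if (cacheEntry != null && cacheEntry.isSuccess()) {
    return;  // a retry of a request that already succeeded
  }
  boolean success = false;
  try {
    // ... perform the modification under the namesystem write lock ...
    success = true;
  } finally {
    // every exit path after the retry-cache check records its outcome
    RetryCache.setState(cacheEntry, success);
  }
}
{code}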
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815327#comment-13815327 ] Colin Patrick McCabe commented on HDFS-5326: Oops, looks like we commented at the same time. Patch 4 doesn't address your comments, just the test failures. add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815330#comment-13815330 ] Andrew Wang commented on HDFS-5326: --- yea np, I figured the test fixups weren't going to be a big deal. I'll let you commit this one too when it's ready so you can ensure that editsStored is updated correctly. add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815366#comment-13815366 ] Colin Patrick McCabe commented on HDFS-5326: bq. IdNotFoundException: Can we say what this is used for, rather than just EINVAL? ok. I added a comment similar to the one in {{PathNotFoundException}}. bq. Update hdfs-default.xml too with the changed config name good call bq. ClientProtocol still refers to "cache descriptor" in removePathBasedCacheDescriptor fixed bq. Reusing PBCD for the filterInfo in DFS#listPBCD is a little too loosey goosey for me. Normally a PBCD always has a path and pool set, with just the repl and id being optional. This is a third form of usage, and we'll probably never want to filter on more than path and pool. How about keeping the old method signature? We can still use PBCD after DFS for simplicity if you like. It should probably be named just filter, too, since the type isn't named PBCDInfo. I don't know. I'm kind of worried about the number of {{listPathBasedCacheDirectives}} overloads multiplying, the way the number of {{FileSystem#create}} overloads multiplied. It seems cleaner to have one function that can handle any of these combinations (a usage sketch follows this message). "filter" does seem like a better name than "filterInfo", though... bq. Maybe we should rename CachePoolInfo to CachePool so the public APIs and classes line up, e.g. you addPathBasedCacheDirective a PBCD, and addCachePool a CachePool. Or we could have a PBCDi class at the risk of shaming on Hacker News. If we went with PBCDi, listPBCD would still take a filterInfo. I think your instinct is right here. PBCDi is just too long, whatever other merits it has. But let's talk about possible renaming on another JIRA if we can think of something better, since this patch is already kinda big... bq. DFS#listPBCD, DFS#addPathBasedCacheDirective javadoc needs to be updated with new params/return values Done. I also added "list all directives visible to us" to the Javadoc. Directives in pools that we don't have read permission on will never be listed. bq. DFS#removePBCD: can just say "id of" instead of "id id of" ok bq. PBCD#getId / setId javadoc off fixed bq. CacheManager has some lines beyond 80 chars due to the new indent fixed bq. FSN#modifyPBCD: need to move the FSPermissionChecker get and the checkOperation above the retry cache check. We can't throw any exceptions after the retry cache check that don't also set the retry cache state. It's also, I think, normally checkOperation first and then the pc get, for consistency. good catch bq. We should add the new modify directive to DFSTestUtil so it gets tested too ok bq. Seems like a lot of these checks in CacheManager are now very similar. Since there's now a try/catch wrapping everything, we no longer need to have the method name in the exception text, and it should also be in the stack trace. So, can we consolidate some of these into shared validation methods that throw generic exceptions? There sort of aren't as many commonalities as it seems. The add operation checks that everything is set -- nothing can be null. In contrast, modify allows everything to be null, except the ID. I feel like trying to factor out methods might make it confusing. The big things, like {{DFSUtil#isValidName}}, are already common code, so I don't feel too bad about it.
add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
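For context on the filter discussion above, the single-method style being defended would be used roughly like this (a hypothetical sketch; the builder and method names approximate the API under review, and unset filter fields match everything):
{code}
PathBasedCacheDirective filter = new PathBasedCacheDirective.Builder()
    .setPool("research")          // only directives in this pool
    .setPath(new Path("/warm"))   // ... and with this path
    .build();
RemoteIterator<PathBasedCacheDirective> it =
    dfs.listPathBasedCacheDirectives(filter);
while (it.hasNext()) {
  System.out.println(it.next());  // one method covers every filter combination
}
{code}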
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815376#comment-13815376 ] Konstantin Shvachko commented on HDFS-2832: --- Arpit, I think we just agreed that collisions among UUIDs are possible but have low probability. This is a concern for me. Even though unlikely, a collision, if it happens, creates a serious problem for system integrity. Does it concern you? In my previous comment I tried to explain that in the distributed case the randomness is the main problem. Forget for a moment about PRNGs. Assume that the UUID is an incremental counter (such as the generation stamp, and now the block id), which is incremented by each node independently, but at startup each node chooses a random number to start from. On a single node ++ can go on without collisions for a long enough time to guarantee I will never see one; a Y4K bug is fine with me. But if you take a second node and randomly choose a starting number, it could be close to (say, 1000 apart from) the starting point of the first node. Then the second node can only generate 1000 storageIDs before colliding with those generated by the other node. The same holds with a PRNG: you just replace ++ with next(). A long period doesn't matter if you choose your starting points randomly. Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose a change from the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
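The argument can be made concrete with a toy model (illustrative Java, not HDFS code): each node picks a random starting point in the ID space and increments from there, so the number of collision-free IDs is bounded by the random gap between starting points, not by the generator's period.
{code}
import java.util.Random;

public class StartingPointGap {
  // Shrunken ID space so the numbers are easy to read; the argument is
  // identical for 64-bit or 128-bit spaces, only the expected gap grows.
  private static final long SPACE = 1L << 32;

  public static void main(String[] args) {
    Random rand = new Random();
    long startA = Math.floorMod(rand.nextLong(), SPACE);
    long startB = Math.floorMod(rand.nextLong(), SPACE);
    // Node A collides as soon as its counter reaches node B's starting
    // point, i.e. after the (mod SPACE) distance between the two starts.
    long safeIds = Math.floorMod(startB - startA, SPACE);
    System.out.println("node A can mint " + safeIds
        + " IDs before reusing one of node B's");
  }
}
{code}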
[jira] [Created] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
Tsz Wo (Nicholas), SZE created HDFS-5470: Summary: Add back trunk's reportDiff algorithm to HDFS-2832 Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815385#comment-13815385 ] Konstantin Shvachko commented on HDFS-5464: --- bq. you won't argue that the new code is simpler than the existing Agreed on simpler. :-) bq. see if I could come up with a better solution Sure, would be interesting to see. I doubt much can be done in this respect. We need to find blocks that did not appear in the report, in one pass and with constant memory overhead. Could be an interview question. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
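For reference, the one-pass, constant-memory approach (essentially what trunk's reportDiff does) leans on the intrusive per-storage block list: insert a delimiter block, relink every reported block in front of it, and whatever remains behind it was not reported. A rough sketch follows; the method names only approximate the real {{BlockManager}}/{{DatanodeStorageInfo}} helpers, and {{blocksAfter}} is invented for illustration.
{code}
// 1) Park a delimiter block at the head of this storage's block list.
BlockInfo delimiter = new BlockInfo(new Block(), 1);
storage.addBlock(delimiter);

// 2) One pass over the report: relink each known block ahead of the
//    delimiter ("seen" region); unknown blocks are additions.
for (Block reported : report) {
  BlockInfo stored = blocksMap.getStoredBlock(reported);
  if (stored != null) {
    storage.moveBlockToHead(stored);
  } else {
    toAdd.add(reported);
  }
}

// 3) Everything still behind the delimiter was absent from the report.
for (BlockInfo notReported : storage.blocksAfter(delimiter)) {
  toRemove.add(notReported);
}
storage.removeBlock(delimiter);  // O(1) extra memory throughout
{code}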
[jira] [Updated] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5470: - Attachment: h5470_20131106.patch h5470_20131106.patch: add back the trunk code with storage Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815426#comment-13815426 ] Arpit Agarwal commented on HDFS-5470: - Nicholas, any benefit to making {{DatanodeStorageInfo#BlockIterator}} an inner class? Can it be a static nested class like {{DatanodeDescriptor#BlockIterator}}? Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815434#comment-13815434 ] Tsz Wo (Nicholas), SZE commented on HDFS-5470: -- In trunk, it is a static class with a DatanodeDescriptor field. It is better to make it non-static and use DatanodeDescriptor.this. Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815436#comment-13815436 ] Arpit Agarwal commented on HDFS-5470: - The patch looks good to me, but I'm just curious: what is the advantage? The other way, it would have a DatanodeDescriptor field initialized at construction. Thanks. Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: HDFS-5326.006.patch add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815452#comment-13815452 ] Tsz Wo (Nicholas), SZE commented on HDFS-5470: -- The advantage of a (non-static) inner class is that it can access the enclosing class object via the DatanodeDescriptor.this reference. Just as with methods, we could make all methods static and pass the enclosing object as a parameter. I bet you won't think that is a good design. BTW, Java's ArrayList.Itr is also non-static: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/ArrayList.java#ArrayList.Itr Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
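The two shapes under discussion, in miniature (the class names echo the discussion, not the real HDFS sources):
{code}
// Static nested class: the enclosing object must be threaded through by hand.
class DatanodeDescriptorA {
  static class BlockIterator {
    private final DatanodeDescriptorA node;  // explicit field
    BlockIterator(DatanodeDescriptorA node) { this.node = node; }
  }
  BlockIterator iterator() { return new BlockIterator(this); }
}

// Non-static inner class: the enclosing instance is captured implicitly and
// is reachable as DatanodeDescriptorB.this; same footprint, less ceremony.
class DatanodeDescriptorB {
  class BlockIterator {
    DatanodeDescriptorB owner() { return DatanodeDescriptorB.this; }
  }
  BlockIterator iterator() { return new BlockIterator(); }
}
{code}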
[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815456#comment-13815456 ] Arpit Agarwal commented on HDFS-5470: - I was thinking of the extra inner-object allocation when it may not be needed by the caller, but it makes sense for code simplicity. +1 for the patch. Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5428) under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5428: Attachment: HDFS-5428.000.patch Continuing the discussion from HDFS-5443 here. HDFS-5428.000.patch is a simple patch that implements an idea similar to the one mentioned in HDFS-5443: 1) Record extra information in the fsimage to indicate INodeFileUC files that are only in snapshots. To keep compatibility we keep the information in the under-construction-files section of the fsimage, and just use ".snapshot" as their paths. 2) Identify these snapshot files while loading the fsimage, and temporarily store them in a map in SnapshotManager. 3) When calculating the total block number while starting the NN, besides deducting for the files recorded in the lease map, also deduct for the files recorded in 2) (see the sketch after this message). In general the idea is very similar to Vinay's patch. The difference is that we do not keep and maintain records in the lease map, and only handle these files when starting the NN. We can even clear the records in SnapshotManager after computing the total number of blocks. One more thing we may need to handle: if we remove the 0-sized blocks (HDFS-5443), it is possible to have an under-construction file in a snapshot with no corresponding blockUC for the file. In that case we should not record extra information in the fsimage for this kind of INodeFileUC. The current patch is just for demonstration. It passes the new unit tests in Vinay's patch. If folks think the general idea is OK, we can continue our work based on this patch. under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode Key: HDFS-5428 URL: https://issues.apache.org/jira/browse/HDFS-5428 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch 1. allow snapshots under dir /foo 2. create a file /foo/test/bar and start writing to it 3. create a snapshot s1 under /foo after the block is allocated and some data has been written to it 4. delete the directory /foo/test 5. wait till checkpoint or do saveNameSpace 6. restart the NN; the NN enters safemode. Analysis: Snapshot inodes loaded from the fsimage are always complete and all blocks will be in COMPLETE state. So when the Datanode reports RBW blocks, those will not be updated in the blocksmap. Some of the FINALIZED blocks will be marked as corrupt due to length mismatch. -- This message was sent by Atlassian JIRA (v6.1#6144)
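The block-count adjustment in step 3) would look roughly like the following (hypothetical accessor names; only the shape of the computation is the point). Blocks of under-construction files, whether tracked in the lease map or recorded as snapshot-only in step 2), are excluded from the safemode total because their last block may never be reported as COMPLETE.
{code}
// Inside NN startup, when setting the safemode block threshold:
long completeBlocks = blocksMap.size()
    - leaseManager.getNumUnderConstructionBlocks()   // files in the lease map
    - snapshotManager.getNumSnapshotOnlyUCBlocks();  // files from step 2)
setBlockTotal(completeBlocks);
// After this point the SnapshotManager's temporary map can be cleared.
{code}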
[jira] [Commented] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815466#comment-13815466 ] Tsz Wo (Nicholas), SZE commented on HDFS-5470: -- bq. I was thinking of the extra inner object allocation ... In our case, the object won't be null, so there is no extra object allocation. Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815474#comment-13815474 ] Hadoop QA commented on HDFS-5326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612445/HDFS-5326.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5347//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5347//console This message is automatically generated. add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HDFS-5470) Add back trunk's reportDiff algorithm to HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-5470. -- Resolution: Fixed Fix Version/s: Heterogeneous Storage (HDFS-2832) Hadoop Flags: Reviewed Thanks Arpit for reviewing the patch. I have committed this. Add back trunk's reportDiff algorithm to HDFS-2832 -- Key: HDFS-5470 URL: https://issues.apache.org/jira/browse/HDFS-5470 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: Heterogeneous Storage (HDFS-2832) Attachments: h5470_20131106.patch As reminded by [~shv] in HDFS-5464, the report diff algorithm uses less memory. It also has a faster running time. We should add it back to the branch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5471) CacheAdmin -listPools fails when pools exist that the user does not have permissions to
Stephen Chu created HDFS-5471: - Summary: CacheAdmin -listPools fails when pools exist that the user does not have permissions to Key: HDFS-5471 URL: https://issues.apache.org/jira/browse/HDFS-5471 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0 Reporter: Stephen Chu When a user does not have read permissions to a cache pool and executes {{hdfs cacheadmin -listPools}}, the command will error, complaining about missing required fields, with something like:
{code}
[schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): Message missing required fields: ownerName, groupName, mode, weight
  at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ListCachePoolsResponseElementProto$Builder.build(ClientNamenodeProtocolProtos.java:51722)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCachePools(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2057)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1515)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2051)
  at org.apache.hadoop.hdfs.tools.CacheAdmin$ListCachePoolsCommand.run(CacheAdmin.java:675)
  at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:85)
  at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:90)
[schu@hdfs-nfs ~]$
{code}
In this example, the pool "root" has 750 permissions, and the root superuser is able to successfully -listPools:
{code}
[root@hdfs-nfs ~]# hdfs cacheadmin -listPools
Found 4 results.
NAME  OWNER  GROUP  MODE       WEIGHT
bar   root   root   rwxr-xr-x  100
foo   root   root   rwxr-xr-x  100
root  root   root   rwxr-x---  100
schu  root   root   rwxr-xr-x  100
[root@hdfs-nfs ~]#
{code}
When we modify the "root" pool to mode 755, the schu user can now -listPools successfully without error:
{code}
[schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
Found 4 results.
NAME  OWNER  GROUP  MODE       WEIGHT
bar   root   root   rwxr-xr-x  100
foo   root   root   rwxr-xr-x  100
root  root   root   rwxr-xr-x  100
schu  root   root   rwxr-xr-x  100
[schu@hdfs-nfs ~]$
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in some places
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Attachment: HDFS-5252.002.patch Uploaded a new patch to address Jing's comments, and also added a unit test. Stable write is not handled correctly in some places -- Key: HDFS-5252 URL: https://issues.apache.org/jira/browse/HDFS-5252 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch When the client asks for a stable write but the prerequisite writes have not been transferred to the NFS gateway, the stableness can't be honored. The NFS gateway has to treat the write as an unstable write and set the flag to UNSTABLE in the write response. One bug was found during testing with an Ubuntu client when copying a 1KB file. For small files like a 1KB file, the Ubuntu client does one stable write (with the FILE_SYNC flag). However, the NFS gateway missed one place (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send a COMMIT anymore. The following test tries to read the data back and of course fails to do so, since the data was not synced. -- This message was sent by Atlassian JIRA (v6.1#6144)
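The rule being implemented can be sketched as follows (simplified, with hypothetical field names; this is not the actual {{OpenFileCtx}} code): whenever a stable write cannot be honored because earlier bytes have not arrived, the response must advertise UNSTABLE so the client knows a COMMIT is still required.
{code}
WriteStableHow how = request.getStableHow();
if (how != WriteStableHow.UNSTABLE && request.getOffset() > nextOffsetToWrite) {
  // Prerequisite writes are missing, so FILE_SYNC/DATA_SYNC cannot be
  // honored; downgrade and let the client COMMIT later.
  how = WriteStableHow.UNSTABLE;
}
WRITE3Response response = new WRITE3Response(Nfs3Status.NFS3_OK,
    fileWcc, count, how, Nfs3Constant.WRITE_COMMIT_VERF);
{code}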
[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in some places
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815513#comment-13815513 ] Hadoop QA commented on HDFS-5252: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612487/HDFS-5252.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5350//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5350//console This message is automatically generated. Stable write is not handled correctly in some places -- Key: HDFS-5252 URL: https://issues.apache.org/jira/browse/HDFS-5252 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch When the client asks for a stable write but the prerequisite writes have not been transferred to the NFS gateway, the stableness can't be honored. The NFS gateway has to treat the write as an unstable write and set the flag to UNSTABLE in the write response. One bug was found during testing with an Ubuntu client when copying a 1KB file. For small files like a 1KB file, the Ubuntu client does one stable write (with the FILE_SYNC flag). However, the NFS gateway missed one place (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send a COMMIT anymore. The following test tries to read the data back and of course fails to do so, since the data was not synced. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5472: - Attachment: h5472_20131106.patch h5472_20131106.patch: simple fixes Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark --- Key: HDFS-5472 URL: https://issues.apache.org/jira/browse/HDFS-5472 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5472_20131106.patch - DatanodeDescriptor should be initialized with updateHeartbeat for updating the timestamps. - NNThroughputBenchmark should create DatanodeRegistrations with real datanode UUIDs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark
Tsz Wo (Nicholas), SZE created HDFS-5472: Summary: Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark Key: HDFS-5472 URL: https://issues.apache.org/jira/browse/HDFS-5472 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5472_20131106.patch - DatanodeDescriptor should be initialized with updateHeartbeat for updating the timestamps. - NNThroughputBenchmark should create DatanodeRegistrations with real datanode UUIDs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815533#comment-13815533 ] Andrew Wang commented on HDFS-5326: --- Thanks for the bump, looks almost 100%. Just a few comments: bq. But let's talk about possible renaming on another JIRA if we can think of something better, since this patch is already kinda big... Sure, I'll file it. bq. Organizationally, do you mind moving the new modify stuff in the FSEditLog, Loader, Op, etc, so it goes add/modify/remove for directives, add/modify/remove for pools? Compat isn't a concern yet. This wasn't addressed; would you mind shuffling this around? I guess redoing the opcodes is optional (though appreciated), but I'd like to see all the methods/cases organized. bq. There sort of aren't as many commonalities as it seems. I took a hack at this and it ended up being less code and IMO cleaner. I can do this in a follow-on if you like, but: * Add and modify aren't that different besides the difference in required, optional, and default fields. I just first validate all present fields in the directive, then enforce required fields, then fill in default values. * Modify and remove have the same checks for an existing entry * Add and modify have the same checks for an existing cache pool * All three do write checks on a cache pool; moving this into FSPermissionChecker or a shared method was an easy savings I think we should still remove the method name from the exception text everywhere (and capitalize like a sentence). Also a few other things here: * need to add a space:
{code}
throw new IOException("addDirective: replication'" + replication +
throw new IOException("modifyDirective: replication'" + replication +
{code}
* success/fail logs are inconsistently formatted. I'd like something like e.g. {{"methodName: successfully <verb> directive " + directive}} and {{"methodName: failed to <verb> <noun> " + parameters, e}}:
{code}
LOG.warn("addDirective " + directive + ": failed", e);
LOG.info("addDirective " + directive + ": succeeded.");
...
LOG.warn("modifyDirective " + idString + ": error", e);
LOG.info("modifyDirective " + idString + ": applied " + directive);
...
LOG.warn("removeDirective " + id + " failed", e);
LOG.info("removeDirective " + id + ": removed");
{code}
* I feel like we could dedupe the various PC exception texts by throwing the AccessControlException in pc#checkPermission itself. I think it's a straightforward change. * Unrelated, but I noticed that CacheManager#listPBCDs does a pc check without first checking if pc is null; want to fix that here? * I also noticed we have some unused imports in FSEditLog and CacheManager. add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5473) Consistent naming of user-visible caching classes and methods
Andrew Wang created HDFS-5473: - Summary: Consistent naming of user-visible caching classes and methods Key: HDFS-5473 URL: https://issues.apache.org/jira/browse/HDFS-5473 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang It's kind of warty that (after HDFS-5326 goes in) DistributedFileSystem has {{*CachePool}} methods that take a {{CachePoolInfo}} and {{*PathBasedCacheDirective}} methods that take a {{PathBasedCacheDirective}}. We should consider renaming {{CachePoolInfo}} to {{CachePool}} for consistency. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815536#comment-13815536 ] sathish commented on HDFS-5428: --- Continuing the discussion from HDFS-5443 here. As we discussed yesterday, I verified this patch against HDFS-5443. With this patch the issue still reproduces, i.e. after restart the NN goes into safemode. I am not sure where the flow is missing. under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode Key: HDFS-5428 URL: https://issues.apache.org/jira/browse/HDFS-5428 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch 1. allow snapshots under dir /foo 2. create a file /foo/test/bar and start writing to it 3. create a snapshot s1 under /foo after the block is allocated and some data has been written to it 4. delete the directory /foo/test 5. wait till checkpoint or do saveNameSpace 6. restart the NN; the NN enters safemode. Analysis: Snapshot inodes loaded from the fsimage are always complete and all blocks will be in COMPLETE state. So when the Datanode reports RBW blocks, those will not be updated in the blocksmap. Some of the FINALIZED blocks will be marked as corrupt due to length mismatch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815551#comment-13815551 ] Hadoop QA commented on HDFS-5428: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612477/HDFS-5428.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5348//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5348//console This message is automatically generated. under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode Key: HDFS-5428 URL: https://issues.apache.org/jira/browse/HDFS-5428 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch 1. allow snapshots under dir /foo 2. create a file /foo/test/bar and start writing to it 3. create a snapshot s1 under /foo after the block is allocated and some data has been written to it 4. delete the directory /foo/test 5. wait till checkpoint or do saveNameSpace 6. restart the NN; the NN enters safemode. Analysis: Snapshot inodes loaded from the fsimage are always complete and all blocks will be in COMPLETE state. So when the Datanode reports RBW blocks, those will not be updated in the blocksmap. Some of the FINALIZED blocks will be marked as corrupt due to length mismatch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815558#comment-13815558 ] Vinay commented on HDFS-5428: - Hi Jing, thanks for posting the simplified patch. The patch looks quite good and makes all the unit tests in my patch pass. Small improvements are required to satisfy the points below as well. bq. (From the issue description) So when the Datanode reports RBW blocks, those will not be updated in the blocksmap. Some of the FINALIZED blocks will be marked as corrupt due to length mismatch. This problem is still there, because while loading the fsimage, snapshot inodes are not replaced with a UC inode and the last block is COMPLETE. In this case, after reloading from the fsimage we will not be able to read the last block. Replacing such inodes with a UC inode is required. under construction files deletion after snapshot+checkpoint+NN restart leads NN to safemode Key: HDFS-5428 URL: https://issues.apache.org/jira/browse/HDFS-5428 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch 1. allow snapshots under dir /foo 2. create a file /foo/test/bar and start writing to it 3. create a snapshot s1 under /foo after the block is allocated and some data has been written to it 4. delete the directory /foo/test 5. wait till checkpoint or do saveNameSpace 6. restart the NN; the NN enters safemode. Analysis: Snapshot inodes loaded from the fsimage are always complete and all blocks will be in COMPLETE state. So when the Datanode reports RBW blocks, those will not be updated in the blocksmap. Some of the FINALIZED blocks will be marked as corrupt due to length mismatch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815559#comment-13815559 ] sathish commented on HDFS-5428: ---
{quote} But I am a little uncomfortable managing leases for snapshotted files, as they are read-only files; no need of leases. If all others are ok on that point, I will not object. {quote}
After this point, Uma and I discussed the same points that Jing has addressed in the HDFS-5428.000.patch. It is better to maintain the leases for snapshot files in the SnapshotManager: the responsibility of the LeaseManager is to maintain leases for files open for write, and in the current implementation snapshots are read-only, so there is no need to keep leases for snapshotted files in the LeaseManager. So it is better to maintain the leases for snapshotted files in the SnapshotManager.
+1, the patch looks good. I will verify this patch in my env once.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815562#comment-13815562 ] Vinay commented on HDFS-5428: -
I think the current updated patch, HDFS-5428.000.patch, can solve HDFS-5443 as well: the NN will exit safemode even without removing the 0-sized blocks. But removing the 0-sized blocks would be an added advantage.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815566#comment-13815566 ] sathish commented on HDFS-5443: ---
Thanks Jing for the patch. I verified this patch in my env and it is working correctly. This patch wipes out the zero-sized blocks, so the NN comes out of safemode.
Along with this patch, if we merge HDFS-5428-v2.patch, I feel it will clear all the problems for under-construction files within a snapshot.
Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file. Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Uma Maheswara Rao G Assignee: sathish Attachments: 5443-test.patch, HDFS-5443.000.patch
This issue is reported by Prakash and Sathish. On looking into the issue, the following things are happening:
1) Client added a block at the NN and just did logsync, so the NN has the block ID persisted.
2) Before returning the addBlock response to the client, take a snapshot of the root or a parent directory of that file.
3) Delete the parent directory of that file.
4) Now crash the NN without responding success to the client for that addBlock call.
Now on restart, the Namenode will be stuck in safemode.
-- This message was sent by Atlassian JIRA (v6.1#6144)
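As a rough illustration of what "wiping out the zero-sized blocks" amounts to, here is a hedged sketch; it is not the actual HDFS-5443.000.patch, and the HDFS method names (getLastBlock, removeLastBlock, removeBlock) are assumptions for illustration:
{code}
// Hedged sketch: while loading the namespace, discard a trailing
// zero-length block that was persisted by addBlock's logSync but never
// written to any DataNode, so it no longer counts toward the safemode
// block threshold. Method names are illustrative assumptions.
void wipeZeroSizedLastBlock(INodeFile file, BlocksMap blocksMap) {
  BlockInfo last = file.getLastBlock();
  if (last != null && last.getNumBytes() == 0) {
    file.removeLastBlock(last);   // drop it from the inode's block list
    blocksMap.removeBlock(last);  // and from the NameNode's block map
  }
}
{code}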
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815572#comment-13815572 ] Hadoop QA commented on HDFS-5326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612476/HDFS-5326.006.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5349//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5349//console This message is automatically generated. add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815580#comment-13815580 ] Vinay commented on HDFS-5428: -
I tried to update the patch according to my previous comment. But to replace the exact inode we need the full snapshot path, and since the full snapshot path is not tracked anywhere in the current code, we cannot replace the INode. We need a way to track the full path of the snapshot INode and replace the INode with an INodeFileUC.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815581#comment-13815581 ] Vinay commented on HDFS-5428: -
I just tried the following to read the file after restart. It failed with BlockMissingException.
{code}
@Test
public void testWithCheckpoint() throws Exception {
  Path path = new Path("/test");
  doWriteAndAbort(fs, path);
  fs.delete(new Path("/test/test"), true);
  NameNode nameNode = cluster.getNameNode();
  NameNodeAdapter.enterSafeMode(nameNode, false);
  NameNodeAdapter.saveNamespace(nameNode);
  NameNodeAdapter.leaveSafeMode(nameNode);
  cluster.restartNameNode(true);
  // read snapshot file after restart
  String test2snapshotPath = Snapshot.getSnapshotPath(path.toString(),
      "s1/test/test2");
  DFSTestUtil.readFile(fs, new Path(test2snapshotPath));
  String test3snapshotPath = Snapshot.getSnapshotPath(path.toString(),
      "s1/test/test3");
  DFSTestUtil.readFile(fs, new Path(test3snapshotPath));
}
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: HDFS-5326.007.patch
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815585#comment-13815585 ] Colin Patrick McCabe commented on HDFS-5326:
bq. This wasn't addressed, would you mind shuffling this around? I guess redoing the opcodes is optional (though appreciated), but I'd like to see all the methods/cases organized.
I reordered the opcodes. I suppose it does make sense to do.
bq. I took a hack at this and it ended up being less code and IMO cleaner. I can do this in a follow-on if you like, but:
Let's do this as part of HDFS-5471 if it looks good... similarly with refactoring pc#checkPermission.
bq. need to add a space
Fixed.
bq. Unrelated, but I noticed that CacheManager#listPBCDs does a pc check without first checking if pc is null, want to fix that here?
Fixed.
-- This message was sent by Atlassian JIRA (v6.1#6144)
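For the {{listPBCDs}} point above, the fix presumably amounts to guarding the permission check; a hedged sketch, since the surrounding method is not quoted here and {{checkPermission(pool)}} is an illustrative name rather than the exact API:
{code}
// Hedged sketch of the null guard: pc is null when permissions are
// disabled, so the check must be skipped rather than dereferenced.
// checkPermission(pool) is an illustrative name, not the exact API.
if (pc != null) {
  pc.checkPermission(pool);
}
{code}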
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: HDFS-5326.007.patch
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: (was: HDFS-5326.007.patch)
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5461) fallback to non-ssr(local short circuit reads) while oom detected
[ https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-5461: Attachment: HDFS-5461.txt
bq. It's because each open stream holds a buffer, and we have hundreds of open streams?
I am not 100% sure, but I agree with you: this OOM is easy to reproduce when there are lots of open storefiles to read (e.g. when compaction can't catch up sometimes).
Oh, I see, it seems the fallback is only meaningful for a config like mine: a big Xmx and a small MaxDirectMemorySize :)
I attached a patch with more logging of the in-use/pooled direct buffer sizes. In my opinion, that could be useful when resetting the log level to trace online while the OOM occurs. The patch also adds a simple try/catch fallback to handle the OOM without introducing any new config value; to me, this way seems more reasonable :)
fallback to non-ssr(local short circuit reads) while oom detected - Key: HDFS-5461 URL: https://issues.apache.org/jira/browse/HDFS-5461 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0, 2.2.0 Reporter: Liang Xie Attachments: HDFS-5461.txt
Currently, the DirectBufferPool used by the SSR feature doesn't seem to have an upper bound other than the direct-memory VM option, so there's a risk of a direct memory OOM; see HBASE-8143 for an example. IMHO, maybe we could improve it a bit: 1) detect OOM, or hitting a configured upper limit, in the caller, then fall back to non-SSR; 2) add a new metric for the currently consumed raw direct memory size.
-- This message was sent by Atlassian JIRA (v6.1#6144)
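A minimal sketch of the try/catch fallback described above, assuming the read path can treat a null buffer as "use the non-short-circuit path"; this is not the attached HDFS-5461.txt itself:
{code}
import java.nio.ByteBuffer;

public class SsrFallback {
  /**
   * Try to allocate a direct buffer for a short-circuit read. Returns null
   * when direct memory (-XX:MaxDirectMemorySize) is exhausted, so the
   * caller can fall back to the ordinary (non-SSR) read path instead of
   * propagating an OutOfMemoryError.
   */
  static ByteBuffer tryAllocateDirect(int size) {
    try {
      return ByteBuffer.allocateDirect(size);
    } catch (OutOfMemoryError oom) {
      return null;  // caller falls back to non-short-circuit reads
    }
  }
}
{code}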
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815641#comment-13815641 ] sathish commented on HDFS-5428: ---
Vinay, as I observed while debugging the scenario with your patch, there is a path mismatch when counting the blocks of a snapshot file under construction; because of this, those blocks are not removed from the block threshold.
{code}
String fileSnapshotPath = StringUtils.replaceOnce(
    file, snapshottableDir,
    Snapshot.getSnapshotPath(snapshottableDir,
        Snapshot.getSnapshotName(snapshot)));
{code}
StringUtils is not producing the correct replaced path. Logs for this:
2013-11-07 01:05:15,103 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: /.snapshot/snap_6ran/_temporary/0/_temporary/attempt_local1866843415_0001_m_00_0/part-m-0
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:5068)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:853)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:540)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:482)
-- This message was sent by Atlassian JIRA (v6.1#6144)
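The mismatch is easy to reproduce with {{replaceOnce}} alone when the snapshottable directory is the root: replacing the first occurrence of "/" eats the path separator. A standalone demo; the snapshot-root string below is a hypothetical value, not necessarily what {{Snapshot.getSnapshotPath}} returned:
{code}
import org.apache.commons.lang.StringUtils;

public class SnapshotPathDemo {
  public static void main(String[] args) {
    String file = "/foo/bar";        // hypothetical file path
    String snapshottableDir = "/";   // the root itself is snapshottable
    String snapshotRoot = "/.snapshot/s1";
    // replaceOnce substitutes only the first "/", losing the separator:
    System.out.println(
        StringUtils.replaceOnce(file, snapshottableDir, snapshotRoot));
    // prints "/.snapshot/s1foo/bar" instead of "/.snapshot/s1/foo/bar"
  }
}
{code}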
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815653#comment-13815653 ] Uma Maheswara Rao G commented on HDFS-5443: ---
+1, the patch looks good. Thanks Jing, Vinay and Sathish for your efforts.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815659#comment-13815659 ] Hadoop QA commented on HDFS-5326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612514/HDFS-5326.007.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5351//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5351//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5351//console This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815669#comment-13815669 ] Hadoop QA commented on HDFS-5326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612516/HDFS-5326.007.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5352//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5352//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5352//console This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815698#comment-13815698 ] Jing Zhao commented on HDFS-5252: -
The new patch looks great to me. +1.
Stable write is not handled correctly in someplace -- Key: HDFS-5252 URL: https://issues.apache.org/jira/browse/HDFS-5252 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch
When the client asks for a stable write but the prerequisite writes have not been transferred to the NFS gateway, the stableness can't be honored. The NFS gateway has to treat the write as an unstable write and set the flag to UNSTABLE in the write response. One bug was found during a test with an Ubuntu client copying a 1KB file. For small files like a 1KB file, the Ubuntu client does one stable write (with the FILE_SYNC flag). However, the NFS gateway missed one place ({{OpenFileCtx#doSingleWrite}}) where it sends the response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send a COMMIT anymore. The following test tries to read the data back and of course fails to do so since the data was not synced.
-- This message was sent by Atlassian JIRA (v6.1#6144)
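A hedged illustration of the fix the description implies; the real change is in {{OpenFileCtx#doSingleWrite}}, which is not quoted here, and the types below are modeled on the NFSv3 protocol rather than the gateway's exact classes:
{code}
// Hedged sketch: when a FILE_SYNC/DATA_SYNC write cannot be honored
// because prerequisite writes have not arrived, the WRITE reply must
// advertise UNSTABLE so the client still sends a COMMIT later.
enum StableHow { UNSTABLE, DATA_SYNC, FILE_SYNC }

static StableHow grantedStability(StableHow requested,
                                  boolean prerequisitesOnDisk) {
  if (requested != StableHow.UNSTABLE && !prerequisitesOnDisk) {
    return StableHow.UNSTABLE;  // the bug was echoing the requested flag
  }
  return requested;
}
{code}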
[jira] [Created] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.
Uma Maheswara Rao G created HDFS-5474: - Summary: Deletesnapshot can make Namenode in safemode on NN restarts. Key: HDFS-5474 URL: https://issues.apache.org/jira/browse/HDFS-5474 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Reporter: Uma Maheswara Rao G Assignee: sathish
When we deleteSnapshot, we delete the blocks associated with that snapshot, and only after that do we logSync the deleteSnapshot op to the editlog. There is a chance that, after the blocks are removed from the blocks map but before the log sync, a block report arrives; the NN may then find that a block does not exist in the blocks map and invalidate it. The invalidation info can also go out as part of a heartbeat. After these steps, if the Namenode shuts down before actually doing the logsync, on restart it will still have the snapshot inodes and will expect the blocks to be reported from the DNs.
The simple solution is to move the block removal down to after the logsync, similar to the delete op.
-- This message was sent by Atlassian JIRA (v6.1#6144)
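A sketch of the reordering proposed above, mirroring how a plain delete defers block removal until the edit is durable; all names here are illustrative assumptions, not the eventual patch:
{code}
// Hedged sketch: log and sync the DeleteSnapshot op before touching the
// blocks map, so a crash before logSync cannot leave block reports racing
// against blocks removed for an op that was never persisted.
List<Block> collectedBlocks =
    snapshotManager.deleteSnapshot(snapshotRoot, snapshotName); // collect only
getEditLog().logDeleteSnapshot(snapshotRoot, snapshotName);
getEditLog().logSync();          // 1) make the op durable
removeBlocks(collectedBlocks);   // 2) only now update the blocks map
{code}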
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815718#comment-13815718 ] Vinay commented on HDFS-5428: -
Hi [~sathish.gurram], you are right: the replacement is wrong if the snapshottable dir is /. I will update the patch if necessary. ;)
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815721#comment-13815721 ] Jing Zhao commented on HDFS-5443: -
Thanks Uma, Sathish and Vinay! I will commit the patch tomorrow morning if there are no more comments.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815727#comment-13815727 ] Jing Zhao commented on HDFS-5428: -
bq. But to replace the exact inode we need to have the full snapshot path. in the current case since the full snapshot path is not tracked anywhere we cannot replace the INode.
Yeah, in our current implementation it is hard (sometimes impossible) to get the full path for a given snapshot inode, so it will be hard to replace the whole INodeFile. My question here is whether it's possible to just replace the last block of the snapshot INode with a BlockInfoUC (but without replacing the INodeFile with an INodeFileUC)?
-- This message was sent by Atlassian JIRA (v6.1#6144)
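A sketch of what that suggestion might look like: converting only the last block during fsimage loading, so RBW replica reports can attach to it without needing the full snapshot path required to swap the whole INodeFile. The constructor and setter names are assumptions modeled on the branch's blockmanagement classes, not a committed change:
{code}
// Hedged sketch: replace only the last (COMPLETE) block of the snapshot
// copy with an under-construction block while loading the fsimage.
BlockInfo last = file.getLastBlock();
if (last != null && last.isComplete() && fileWasUnderConstruction) {
  BlockInfoUnderConstruction uc = new BlockInfoUnderConstruction(
      last, file.getBlockReplication());  // keeps block ID and genstamp
  file.setBlock(file.numBlocks() - 1, uc);
  blocksMap.replaceBlock(uc);  // so RBW replica reports match up
}
{code}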