[jira] [Commented] (HDFS-5745) Unnecessary disk check triggered when socket operation has problem.
[ https://issues.apache.org/jira/browse/HDFS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866443#comment-13866443 ] Vinay commented on HDFS-5745: - HDFS-5503 also looks similar, but with ClosedChannelException. Unnecessary disk check triggered when socket operation has problem. --- Key: HDFS-5745 URL: https://issues.apache.org/jira/browse/HDFS-5745 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 1.2.1 Reporter: MaoYuan Xian When BlockReceiver fails to transfer data, it can be seen that SocketOutputStream translates the exception into an IOException with the message "The stream is closed":
2014-01-06 11:48:04,716 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
java.io.IOException: The stream is closed
at org.apache.hadoop.net.SocketOutputStream.write
at java.io.BufferedOutputStream.flushBuffer
at java.io.BufferedOutputStream.flush
at java.io.DataOutputStream.flush
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run
at java.lang.Thread.run
This causes the checkDiskError method of DataNode to be called, which triggers the disk scan. Can we make a modification like the one below in checkDiskError to avoid this unnecessary disk scan?
{code}
--- a/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
+++ b/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
@@ -938,7 +938,8 @@ public class DataNode extends Configured
         || e.getMessage().startsWith("An established connection was aborted")
         || e.getMessage().startsWith("Broken pipe")
         || e.getMessage().startsWith("Connection reset")
-        || e.getMessage().contains("java.nio.channels.SocketChannel")) {
+        || e.getMessage().contains("java.nio.channels.SocketChannel")
+        || e.getMessage().startsWith("The stream is closed")) {
       LOG.info("Not checking disk as checkDiskError was called on a network "
           + "related exception");
       return;
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
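The message-based classification in the proposed hunk can be sketched as a standalone helper (a hypothetical class for illustration only, not the actual DataNode code):

```java
import java.io.IOException;

// Hypothetical sketch of the filter proposed above: treat certain exception
// messages as network-related so checkDiskError can skip the disk scan.
public class NetworkExceptionCheck {
    // Message prefixes that indicate a network problem rather than a disk problem.
    private static final String[] NETWORK_PREFIXES = {
        "An established connection was aborted",
        "Broken pipe",
        "Connection reset",
        "The stream is closed"
    };

    static boolean isNetworkRelated(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        for (String prefix : NETWORK_PREFIXES) {
            if (msg.startsWith(prefix)) {
                return true;
            }
        }
        // SocketChannel-related exceptions also indicate network trouble.
        return msg.contains("java.nio.channels.SocketChannel");
    }
}
```

Matching on exception message text is fragile (messages vary by platform and JDK), which is one reason this kind of list keeps growing as new cases like "The stream is closed" turn up.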
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866487#comment-13866487 ] Akira AJISAKA commented on HDFS-4922: - HDFS-4710 is already resolved, so the patch is ready to merge. However, the patch command failed to apply. I'll renew the patch. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922.patch Explain the default value and add one configuration key which exists in the code but is not shown in the document. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-4922: Attachment: HDFS-4922-004.patch Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5651) Remove dfs.namenode.caching.enabled and improve CRM locking
[ https://issues.apache.org/jira/browse/HDFS-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5651: --- Resolution: Fixed Status: Resolved (was: Patch Available) this was committed to trunk Remove dfs.namenode.caching.enabled and improve CRM locking --- Key: HDFS-5651 URL: https://issues.apache.org/jira/browse/HDFS-5651 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5651.001.patch, HDFS-5651.002.patch, HDFS-5651.003.patch, HDFS-5651.004.patch, HDFS-5651.006.patch, HDFS-5651.006.patch, HDFS-5651.008.patch, HDFS-5651.009.patch We can remove dfs.namenode.caching.enabled and simply always enable caching, similar to how we do with snapshots and other features. The main overhead is the size of the cachedBlocks GSet. However, we can simply make the size of this GSet configurable, and people who don't want caching can set it to a very small value. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5746) add ShortCircuitSharedMemorySegment
Colin Patrick McCabe created HDFS-5746: -- Summary: add ShortCircuitSharedMemorySegment Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866545#comment-13866545 ] Colin Patrick McCabe commented on HDFS-5182: A few notes about the planned implementation here: The main idea here is to have a shared memory segment which the DFSClient and Datanode can both read and write. Before each read, the DFSClient will look at this shared memory segment to see if it can be anchored. A segment will be anchorable if the datanode has mlocked it. If the segment can be anchored, the dfsclient will increment the anchor count. Then, the client can read without validating the checksum. When the client is done reading it will decrement the anchor count. These are just memory operations, so they will be fast. Similarly, when the client tries to do a zero-copy read, it will check to see if the segment is anchorable, and increment the anchor count before performing the mmap. The anchor count will stay incremented until the mmap is closed. One exception is if the client passes the ReadOption.SKIP_CHECKSUMS flag. In that case, we do not need to consult the anchor flag because we are willing to tolerate bad data being returned or SIGBUS. Shared memory segments will have a fixed size and contain a series of fixed-size slots. The client will request a shared memory segment via the REQUEST_SHORT_CIRCUIT_FDS operation. Of course, not every REQUEST_SHORT_CIRCUIT_FDS operation needs to get a new shared memory segment, since each segment can hold multiple slots. The client caches these segments and only requests a new one when it needs it. Segments will be closed when no more slots in them are in use. One issue with the shared memory segments discussed here is that when a client terminates, the datanode receives no notification that the shared memory segment it created is no longer needed. For this reason, each shared memory segment will have a domain socket associated with it. 
The only function of this socket is to cause a close notification to be sent to the datanode when the client closes (or vice versa). (When a UNIX domain socket closes, the remote end gets a close notification). The socket which is used will be the same socket on which the REQUEST_SHORT_CIRCUIT_FDS that fetched the segment was performed. We simply don't put it back into the peer cache. BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
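The anchor protocol described above can be modeled with plain Java atomics. This is a hypothetical stand-in for illustration: the real slot lives in a raw shared-memory word, and the names SlotModel, tryAnchor, etc. are made up here, not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of one shared-memory slot: a single word holding an
// "anchorable" bit (set by the datanode after mlock) plus an anchor count
// (incremented by clients before checksum-free or zero-copy reads).
class SlotModel {
    private static final int ANCHORABLE = 1 << 30;
    private final AtomicInteger word = new AtomicInteger(0);

    // Datanode side: flip the anchorable bit after mlock / before munlock.
    void setAnchorable(boolean v) {
        while (true) {
            int cur = word.get();
            int next = v ? (cur | ANCHORABLE) : (cur & ~ANCHORABLE);
            if (word.compareAndSet(cur, next)) return;
        }
    }

    // Client side: increment the anchor count only if the replica is mlocked.
    boolean tryAnchor() {
        while (true) {
            int cur = word.get();
            if ((cur & ANCHORABLE) == 0) return false; // not mlocked: validate checksums instead
            if (word.compareAndSet(cur, cur + 1)) return true;
        }
    }

    // Client side: release the anchor when the read or mmap ends.
    void unanchor() {
        word.decrementAndGet();
    }

    int anchorCount() {
        return word.get() & ~ANCHORABLE;
    }
}
```

Both paths are single CAS loops on one word, which is what makes the "just memory operations, so they will be fast" claim hold.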
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866564#comment-13866564 ] Colin Patrick McCabe commented on HDFS-5746: See here for some notes about the strategy for 5182: https://issues.apache.org/jira/browse/HDFS-5182?focusedCommentId=13866545page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866545 * Add SharedFileDescriptorFactory. This is a class which can produce anonymous shared memory segments suitable for passing from the DataNode to the DFSClient via file descriptor passing. It would have been nice to do this without JNI, but unfortunately we don't have {{open(O_EXCL)}} support in JDK6, which we're still supporting. There is {{NativeIO#open}}, but it doesn't allow me to cleanly separate {{EEXIST}} errors from other errors (the error gets turned into a textual exception which I don't want to parse). Also there may be some symlink issues with the JDK6 java APIs for listing files in a directory, etc. Overall, the native implementation was just easier. This is something we should probably revisit with JDK7, of course. * Add {{NativeIO#mmap}} and {{NativeIO#munmap}}. Although it would be nicer to use {{FileChannel#map}}, there is no public interface to get access to the virtual memory address of a {{MappedByteBuffer}}, and I needed that. Luckily, the amount of code needed to just call mmap is really small. * I didn't want to duplicate the code used to stuff a reference count + closed bit into {{DomainSocket#refCount}}, so I factored it out into {{CloseableReferenceCount}}. This class is now used in both DomainSocket and {{ShortCircuitSharedMemorySegment}}. * {{DomainSocketWatcher}} is a thread which calls poll() in a loop. This will be used to detect when a DFSClient has closed and its shared memory segments can be closed, by detecting when their associated DomainSockets are closed. 
I used poll() here rather than select() since select() has some limitations with high-numbered file descriptors on some platforms. Also, poll's interface is a bit simpler. It would have been nice to use Java NIO for this, but {{DomainSocket}} is not integrated with NIO. poll() doesn't scale as well as epoll() and other platform-specific functions, but we don't need it to, since this is just for handling clients closing, which should be a relatively infrequent event. We're not using this for handling every packet sent through a webserver or something. * {{ShortCircuitSharedMemorySegment}} is entirely in Java, using {{sun.misc.Unsafe}} for the anchor / unanchor / etc. operations. This is preferable to using JNI for this, since {{Unsafe#compareAndSwap}} will be inlined by the JVM. (Thanks to [~tlipcon] for pointing out the existence of these functions). add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
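The reference-count-plus-closed-bit idea behind {{CloseableReferenceCount}} can be sketched as follows. This is a simplified illustration of the pattern, not the actual Hadoop class; it uses AtomicInteger where the real code may differ in detail.

```java
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch: one int packs a closed bit and a reference count,
// so "increment unless closed" needs no lock, only atomic operations.
class RefCountSketch {
    private static final int CLOSED = 1 << 30;
    private final AtomicInteger status = new AtomicInteger(0);

    // Take a reference; fails if the object was already closed.
    void reference() throws ClosedChannelException {
        int cur = status.incrementAndGet();
        if ((cur & CLOSED) != 0) {      // raced with close: undo and fail
            status.decrementAndGet();
            throw new ClosedChannelException();
        }
    }

    void unreference() {
        status.decrementAndGet();
    }

    // Set the closed bit; returns false if it was already set.
    boolean setClosed() {
        while (true) {
            int cur = status.get();
            if ((cur & CLOSED) != 0) return false;
            if (status.compareAndSet(cur, cur | CLOSED)) return true;
        }
    }

    int referenceCount() {
        return status.get() & ~CLOSED;
    }
}
```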
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Attachment: HDFS-5746.001.patch add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Status: Patch Available (was: Open) add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866568#comment-13866568 ] Hadoop QA commented on HDFS-5710: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621824/HDFS-5710.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5851//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5851//console This message is automatically generated. 
FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ : {code} 2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496 2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already. 2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception. 
java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871) at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482) at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112) at java.lang.Thread.run(Thread.java:724) {code} Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
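The missing guard would look roughly like this. The class name, signature, and String[] representation are hypothetical, for illustration only; the real FSDirectory#getFullPathName walks INode arrays under a lock.

```java
// Hypothetical sketch of the suggested null check: if the inode path cannot
// be resolved (e.g. the file was deleted concurrently by another thread),
// return a harmless value instead of dereferencing null and throwing NPE.
class PathSketch {
    static String getFullPathName(String[] pathComponents) {
        if (pathComponents == null) {
            // File may already be gone; callers log a warning and move on.
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (String c : pathComponents) {
            sb.append('/').append(c);
        }
        return sb.toString();
    }
}
```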
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866590#comment-13866590 ] Hadoop QA commented on HDFS-4922: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622145/HDFS-4922-004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5852//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5852//console This message is automatically generated. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866611#comment-13866611 ] Hadoop QA commented on HDFS-4922: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622145/HDFS-4922-004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5853//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5853//console This message is automatically generated. 
Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-5721: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) I have committed this patch. Thanks Ted and Uma! sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt, hdfs-5721-v3.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
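For JDK6-era code, the usual fix for this class of leak is a try/finally that closes the resource on every exit path, as in this illustrative sketch (a generic Closeable stands in for the real FSImage, and the helper name is made up):

```java
import java.io.Closeable;
import java.io.IOException;

class ResourceSketch {
    // Illustrative pattern only: guarantee close() runs whether the method
    // returns normally or throws, which is what the sharedEditsImage fix needs.
    static String useAndClose(Closeable resource, String result) throws IOException {
        try {
            // ... work with the resource ...
            return result;
        } finally {
            resource.close();
        }
    }
}
```

On JDK7 and later the same guarantee comes from try-with-resources, but Hadoop still supported JDK6 at this point.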
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866651#comment-13866651 ] Hudson commented on HDFS-5721: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4978 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4978/]) HDFS-5721. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns. (Ted Yu via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556803) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt, hdfs-5721-v3.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866659#comment-13866659 ] Hadoop QA commented on HDFS-5746: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622157/HDFS-5746.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1550 javac compiler warnings (more than the trunk's current 1545 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//console This message is automatically generated. 
add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
Tsz Wo (Nicholas), SZE created HDFS-5747: Summary: BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866749#comment-13866749 ] Tsz Wo (Nicholas), SZE commented on HDFS-5645: -- TestOfflineEditsViewer needs the binary edit log file. TestPersistBlocks is not related. Filed HDFS-5747. Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5748) Too much information shown in the dfs health page.
Kihwal Lee created HDFS-5748: Summary: Too much information shown in the dfs health page. Key: HDFS-5748 URL: https://issues.apache.org/jira/browse/HDFS-5748 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee I've noticed that the node lists are shown in the default name node web page. This may be fine for small clusters, but for clusters with 1000s of nodes, this is not ideal. The following should be shown on demand. (Some of them have been there even before the recent rework.) - Detailed data node information - Startup progress - Snapshot information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866829#comment-13866829 ] Daryn Sharp commented on HDFS-5449: --- +1 Looks good. The odd casting isn't a big deal. WebHdfs compatibility broken between 2.2 and 1.x / 23.x --- Key: HDFS-5449 URL: https://issues.apache.org/jira/browse/HDFS-5449 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-5449.patch, HDFS-5449.patch, HDFS-5449.trunk.patch, HDFS-5449.trunk.patch Similarly to HDFS-5403, getFileBlockLocations() fail between old (1.x, 0.23.x) and new (2.x), but this is worse since both directions won't work. This is caused by the removal of name field from the serialized json format of DatanodeInfo. 2.x namenode should include name (ip:port) in the response and 2.x webhdfs client should use name, if ipAddr and xferPort don't exist in the response. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
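The compatibility fallback described in the issue could be sketched like this (the helper class and method are hypothetical names for illustration; the real client deserializes the DatanodeInfo JSON map):

```java
import java.util.Map;

class DatanodeInfoCompat {
    // Hypothetical sketch of the rule above: prefer the 2.x "ipAddr"/"xferPort"
    // fields, and fall back to the 1.x / 0.23.x "name" field (ip:port) when
    // talking to an old server that does not send the newer fields.
    static String ipAndXferPort(Map<String, Object> json) {
        Object ip = json.get("ipAddr");
        Object port = json.get("xferPort");
        if (ip != null && port != null) {
            return ip + ":" + port;
        }
        // Old servers only serialize "name" as ip:port.
        return (String) json.get("name");
    }
}
```

Symmetrically, a 2.x server that also emits "name" stays readable by old clients, which is why the fix touches both sides.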
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866841#comment-13866841 ] Colin Patrick McCabe commented on HDFS-4922: {code} + Local block reader maintains a chunk buffer, This controls the maximum chunks + can be filled in the chunk buffer for each read. + It would be better to be integral multiple of dfs.bytes-per-checksum {code} You should mention that this is specified in terms of bytes. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Attachment: ha_config_warning.patch Patch to issue warning message for unresolved namenode hostname on startup. Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: ha_config_warning.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
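The check Vincent describes amounts to cross-referencing the namenode ids declared in *dfs.ha.namenodes.&lt;nameservice&gt;* against the per-node rpc-address keys. A minimal sketch of that validation (key names from the issue; the helper is illustrative and is not the attached patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative HA-config cross-check: for every namenode id declared in
// dfs.ha.namenodes.<nameservice>, verify that a matching rpc-address or
// servicerpc-address key exists; collect the ids left unresolved so the
// caller can log a WARN for each. Not the actual patch code.
class HaConfigCheck {
  static List<String> findUnresolved(String nameservice, Map<String, String> conf) {
    List<String> missing = new ArrayList<>();
    String ids = conf.get("dfs.ha.namenodes." + nameservice);
    if (ids == null) {
      return missing;
    }
    for (String id : ids.split(",")) {
      id = id.trim();
      boolean hasRpc =
          conf.containsKey("dfs.namenode.rpc-address." + nameservice + "." + id)
          || conf.containsKey("dfs.namenode.servicerpc-address." + nameservice + "." + id);
      if (!hasRpc) {
        missing.add(id);  // a typo'd or undeclared node id ends up here
      }
    }
    return missing;
  }
}
```

With the misconfiguration from the issue (an id in *dfs.ha.namenodes.myCluster* that has no address key), the helper returns that id instead of leaving the operator to puzzle over a "Problem connecting to server" message.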
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Release Note: Issue a warning message in the logs if a namenode does not resolve properly on startup. Status: Patch Available (was: In Progress) Added a simple check to determine if a namenode hostname has been successfully resolved at startup and, if not, append a WARNing message in the log. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5747 started by Arpit Agarwal. BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException - Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Arpit Agarwal Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Attachment: (was: ha_config_warning.patch) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866892#comment-13866892 ] Vincent Sheffer commented on HDFS-5677: --- Just a note: I deleted the submitted patch because I did not follow the steps outlined at http://wiki.apache.org/hadoop/HowToContribute. Following proper procedure now... -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5449: - Attachment: HDFS-5449.branch-2.patch The branch-2 version of the patch is attached. It is a straight port of the trunk version; the difference is due to the use of the new get methods in trunk. Locally tested. PreCommit won't succeed as the patch won't apply to trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Status: Open (was: Patch Available) Canceling until I re-submit following the patch submittal process. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5449: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866954#comment-13866954 ] Hudson commented on HDFS-5449: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4979 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4979/]) HDFS-5449. WebHdfs compatibility broken between 2.2 and 1.x / 23.x. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556927) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestJsonUtil.java -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5747: Status: Patch Available (was: In Progress) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5747: Attachment: HDFS-5747.01.patch Thanks for reporting this, Nicholas. The first NPE looks like a preexisting bug: shutting down {{namesystem}} before {{rpcServer}} is probably the root cause. {code:java} private void stopCommonServices() { if(namesystem != null) namesystem.close(); if(rpcServer != null) rpcServer.stop(); {code} The second NPE looks like a regression and is an easy fix. The attached patch addresses both. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
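The shutdown-ordering issue discussed above can be reduced to a small sketch: if the shared state ({{namesystem}}) is closed before the RPC server stops, an in-flight handler can still touch the closed state and hit an NPE; stopping the server first avoids the race. The class and method names here are illustrative, not the HDFS code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the shutdown-ordering fix: stop the RPC server first so no
// handler can arrive after the shared state is closed. Illustrative only.
class ShutdownOrder {
  interface Stoppable { void stop(); }

  // Returns the order in which components were stopped.
  static List<String> stopInOrder(Stoppable rpcServer, Stoppable namesystem) {
    List<String> order = new ArrayList<>();
    // Stop accepting and serving RPCs first ...
    rpcServer.stop();
    order.add("rpcServer");
    // ... then it is safe to close the state those RPC handlers use.
    namesystem.stop();
    order.add("namesystem");
    return order;
  }
}
```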
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866968#comment-13866968 ] Uma Maheswara Rao G commented on HDFS-5728: --- Does this case happen only if we restart a DN where the crc has less data? On restart we convert all RBW replica states to RWR, and the length will be calculated based on the crc chunks. If that is the case, how about also setting the file length to the same value after creating the RWR state? [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data is written slowly. 2. One of the DataNodes got diskfull (due to other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed and all processes were restarted. 5. Now HMaster tries to recover the file by calling recoverLease. At this point recovery fails with a file length mismatch. When checked: actual block file length: 62484480, calculated block length: 62455808. This was because the metafile had crc for only 62455808 bytes, so 62455808 was taken as the block size. No matter how many times it was retried, recovery kept failing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
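The length mismatch in the report is consistent with the meta file simply being short of CRCs: the length recoverable from the meta file is the number of stored checksums times dfs.bytes-per-checksum. A toy illustration of that arithmetic, assuming the common defaults of 512 bytes per checksum, 4-byte CRC32 checksums, and a 7-byte meta-file header (the 62455808/62484480 figures come from the report; the helper itself is a sketch, not DataNode code):

```java
// Illustrative arithmetic only: a meta file missing the CRCs for the last
// chunks of the block yields a "calculated" length shorter than the actual
// block file, which is the mismatch block recovery trips over.
class CrcLength {
  static long lengthCoveredByCrc(long metaFileLen, int bytesPerChecksum,
                                 int checksumSize, int headerLen) {
    long numChecksums = (metaFileLen - headerLen) / checksumSize;
    return numChecksums * bytesPerChecksum;
  }
}
```

Under these assumptions a meta file of 487,943 bytes covers 121,984 chunks of 512 bytes, i.e. 62,455,808 bytes — 28,672 bytes (56 chunks) short of the 62,484,480-byte block file.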
[jira] [Created] (HDFS-5749) Access time of HDFS directories stays at 1969-12-31
Yongjun Zhang created HDFS-5749: --- Summary: Access time of HDFS directories stays at 1969-12-31 Key: HDFS-5749 URL: https://issues.apache.org/jira/browse/HDFS-5749 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang If FsShell is modified so that fs -lsr shows access time in addition to modification time, the access time of HDFS directories stays at 1969-12-31. This means the access time is never set initially. Filing this jira to fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write
[ https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867013#comment-13867013 ] Dhanasekaran Anbalagan commented on HDFS-198: - Hi All, I am getting the same error on a Hive external table, using hive-common-0.10.0-cdh4.4.0. In my case we are using Sqoop to import data into the table, which stores its data in RCFile format. I am only seeing the issue with the external table. 14/01/08 12:21:40 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_00_0, Status : FAILED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/_DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_00_0/part-m-0: File is not open for writing. Holder DFSClient_NONMAPREDUCE_-794488327_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2452) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2262) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.pro attempt_201312121801_0049_m_00_0: SLF4J: Class path contains multiple SLF4J bindings. 
attempt_201312121801_0049_m_00_0: SLF4J: Found binding in [jar:file:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-simple-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_0: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_0: SLF4J: Found binding in [jar:file:/disk1/mapred/local/taskTracker/tech/distcache/-6782344428220505463_-433811577_1927241260/nameservice1/user/tech/.staging/job_201312121801_0049/libjars/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 14/01/08 12:21:55 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_00_1, Status : FAILED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/_DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_00_1/part-m-0: File is not open for writing. Holder DFSClient_NONMAPREDUCE_-390991563_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2452) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2262) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.pro attempt_201312121801_0049_m_00_1: SLF4J: Class path contains multiple SLF4J bindings. 
attempt_201312121801_0049_m_00_1: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_1: SLF4J: Found binding in [jar:file:/disk1/mapred/local/taskTracker/tech/distcache/7281954290425601736_-433811577_1927241260/nameservice1/user/tech/.staging/job_201312121801_0049/libjars/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_1: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 14/01/08 12:22:12 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_00_2, Status : FAILED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/_DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_00_2/part-m-0: File is not open for writing. Holder DFSClient_NONMAPREDUCE_1338126902_1 does not have any open files. at
[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867025#comment-13867025 ] Hadoop QA commented on HDFS-5677: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1267/ha_config_warning.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestPersistBlocks {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5855//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5855//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867031#comment-13867031 ] Eric Sirianni commented on HDFS-5483: - This {{BLOCK_RECEIVED}} code path appears to modify the {{BlockInfo}} list directly: {noformat} BlockInfo.listInsert(BlockInfo, DatanodeStorageInfo) line: 308 DatanodeStorageInfo.addBlock(BlockInfo) line: 208 DatanodeDescriptor.addBlock(String, BlockInfo) line: 168 BlockManager.addStoredBlock(BlockInfo, DatanodeDescriptor, String, DatanodeDescriptor, boolean) line: 2215 BlockManager.processAndHandleReportedBlock(DatanodeDescriptor, String, Block, HdfsServerConstants$ReplicaState, DatanodeDescriptor) line: 2720 BlockManager.addBlock(DatanodeDescriptor, String, Block, String) line: 2695 BlockManager.processIncrementalBlockReport(DatanodeID, String, StorageReceivedDeletedBlocks) line: 2769 FSNamesystem.processIncrementalBlockReport(DatanodeID, String, StorageReceivedDeletedBlocks) line: 5285 NameNodeRpcServer.blockReceivedAndDeleted(DatanodeRegistration, String, StorageReceivedDeletedBlocks[]) line: 993 {noformat} Couldn't this corrupt the {{BlockInfo}} list if a datanode sent two {{BLOCK_RECEIVED}}s for two different storages? NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. 
Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5645: Hadoop Flags: Reviewed Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867060#comment-13867060 ] Jing Zhao commented on HDFS-5645: - +1 Patch looks good to me. Only one question: in the patch the editlog loader currently just stops when it hits the upgrade marker. I guess we will have more sophisticated actions in later jiras? Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5750) JHLogAnalyzer#parseLogFile() should close stm upon return
Ted Yu created HDFS-5750: Summary: JHLogAnalyzer#parseLogFile() should close stm upon return Key: HDFS-5750 URL: https://issues.apache.org/jira/browse/HDFS-5750 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor {{stm}} is assigned to {{in}}, but {{in}} may later point to another InputStream: {code} if(compressionClass != null) { CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(compressionClass, new Configuration()); in = codec.createInputStream(stm); } {code} stm should be closed in the finally block. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
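The leak pattern and the suggested fix can be sketched as follows. This is illustrative, not the JHLogAnalyzer code itself: GZIPInputStream stands in for the CompressionCodec-created stream, and the variable names mirror the description above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class ParseLogFileSketch {
    // Read a file to the end, optionally through a decompressing wrapper.
    // The point of the fix: close stm explicitly in finally, so it cannot
    // leak even if the wrapper's constructor throws or "in" ends up
    // pointing at a different stream than stm.
    static long countBytes(File file, boolean compressed) throws IOException {
        InputStream stm = new FileInputStream(file);
        InputStream in = stm;
        try {
            if (compressed) {
                in = new GZIPInputStream(stm); // in no longer == stm
            }
            long n = 0;
            while (in.read() != -1) n++;
            return n;
        } finally {
            in.close();  // closes the wrapper (and, usually, the wrapped stream)
            stm.close(); // idempotent for FileInputStream; guarantees no leak
        }
    }
}
```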
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867073#comment-13867073 ] Todd Lipcon commented on HDFS-5182: --- bq. it will check to see if the segment is anchorable, and increment the anchor count before performing the mmap. The anchor count will stay incremented until the mmap is closed. That seems much longer than necessary -- don't we want clients to be able to keep mmaps around in their cache for very long periods of time? And then, when the user requests the read, we can anchor the mmap only for the duration of time for which the user holds onto the zero-copy buffer? Once the user returns the zero-copy buffer, we can decrement the count and allow the DN to evict the block from the cache. bq. One exception is if the client passes the ReadOption.SKIP_CHECKSUMS flag. In that case, we do not need to consult the anchor flag because we are willing to tolerate bad data being returned or SIGBUS. I disagree on this. Just because you want to skip checksumming doesn't mean you can tolerate SIGBUS. For example, many file formats have their own checksums, so we can safely skip HDFS checksumming, but we still want to ensure that we're only reading locked (i.e safe) memory via mmap. bq. The only function of this socket is to cause a close notification to be sent to the datanode when the client closes (or vice versa). (When a UNIX domain socket closes, the remote end gets a close notification). Maybe this can be put into a separate JIRA, and first implement just a very simple timeout-based mechanism? The DN could change the anchor flag to a magic value which invalidates the segment and then close it after some amount of time. Then if the client looks at it again it will know to invalidate. 
BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867075#comment-13867075 ] Arpit Agarwal commented on HDFS-5483: - {{BlockInfo#addStorage}} checks for it. {code} boolean addStorage(DatanodeStorageInfo storage) { int idx = findDatanode(storage.getDatanodeDescriptor()); ... // The block is on the DN but belongs to a different storage. // Update our state. {code} NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867078#comment-13867078 ] Eric Sirianni commented on HDFS-5318: - I will work on a patch that addresses your points above and update the JIRA. Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A)}} and {{(DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
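The proposed counting rule reduces to "number of distinct StorageIDs among a block's locations". A minimal sketch, with illustrative names (the class and method are not from the patch; each location is a {datanodeId, storageId} pair):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SharedStorageReplicaCount {
    // Effective replication of a block = number of distinct StorageIDs
    // among its (DataNode ID, StorageID) location pairs. Multiple datanodes
    // reporting the same StorageID (shared NAS) count as one physical copy.
    static int replicationCount(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] pair : locations) {
            storageIds.add(pair[1]); // pair = {datanodeId, storageId}
        }
        return storageIds.size();
    }
}
```

On the four location tuples from the example above, this yields 2 rather than the traditional count of 4.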
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867110#comment-13867110 ] Colin Patrick McCabe commented on HDFS-5182: bq. That seems much longer than necessary – don't we want clients to be able to keep mmaps around in their cache for very long periods of time? And then, when the user requests the read, we can anchor the mmap only for the duration of time for which the user holds onto the zero-copy buffer? Once the user returns the zero-copy buffer, we can decrement the count and allow the DN to evict the block from the cache. Sorry, I was unclear. When I said closed I mean that the user had returned the zero-copy buffer. So the same thing you suggested. bq. I disagree on this. Just because you want to skip checksumming doesn't mean you can tolerate SIGBUS. For example, many file formats have their own checksums, so we can safely skip HDFS checksumming, but we still want to ensure that we're only reading locked (i.e safe) memory via mmap. What I was referring to here is where a client has specifically requested an mmap region using the zero-copy API and the SKIP_CHECKSUMS option. In that case, the user is clearly going to be reading without any guarantees from us. If the user just uses the normal (non-zero-copy, non-mmap) read path, SIGBUS will not be an issue. (There have been some proposals to improve the SIGBUS situation for zero-copy reads without mlock, but they're certainly out of scope for this JIRA.) bq. Maybe this can be put into a separate JIRA, and first implement just a very simple timeout-based mechanism? The DN could change the anchor flag to a magic value which invalidates the segment and then close it after some amount of time. Then if the client looks at it again it will know to invalidate. Timeouts and two-way protocols get complex. I already have the code for closing the shared memory segment based on listening for the remote socket getting closed. 
As for where the socket comes from-- we just don't put the socket we used to get the FDs in the first place back into the peer cache. BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
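The anchor-count lifecycle the two comments converge on (raise the count only while a client holds a zero-copy buffer; the DN may evict the cached block only at zero) can be sketched like this. All names are illustrative, not from the HDFS-5182 patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AnchorCount {
    private final AtomicInteger anchors = new AtomicInteger();

    // Called when a zero-copy buffer over the mmap is handed to the user.
    void onZeroCopyBufferHandedOut() { anchors.incrementAndGet(); }

    // Called when the user returns the zero-copy buffer.
    void onZeroCopyBufferReturned()  { anchors.decrementAndGet(); }

    // The DN may evict (munlock) the block only when nothing is anchored,
    // even if the client still keeps the mmap cached for later reads.
    boolean dnMayEvict() { return anchors.get() == 0; }
}
```

Keeping the anchor scoped to the buffer, not the mmap, is what lets clients cache mmaps for long periods without pinning DN cache memory.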
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867102#comment-13867102 ] Eric Sirianni commented on HDFS-5483: - OK - thanks, missed that guard. {code} boolean addBlock(BlockInfo b) { if(!b.addStorage(this)) return false; {code} NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
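The guard the two snippets above describe can be reduced to a boolean-returning add: the block is linked into the storage's list only if it was not already present on that datanode via any storage. A simplified model (a Set stands in for the per-datanode replica list; types are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class AddBlockGuard {
    private final Set<Long> blocksOnThisDatanode = new HashSet<>();

    // Mirrors the addStorage/addBlock contract quoted above: a second
    // BLOCK_RECEIVED for the same block from a different storage returns
    // false and leaves the list untouched, so the list is not corrupted.
    boolean addBlock(long blockId) {
        return blocksOnThisDatanode.add(blockId);
    }
}
```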
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867122#comment-13867122 ] Todd Lipcon commented on HDFS-5722: --- Yea, I don't really see the point here. It seems the motivation is a possible optimization when the NN needs to skip a large section of the image which it doesn't understand. That's only going to happen in a downgrade scenario, which is rare and not on a hot path. Plus, do we have examples of _large_ new sections we plan on adding to the image? Sure, we've added things in the past like a list of snapshots, but they're typically pretty short. The example of skipping the entire inodes section seems pretty contrived to me. HDFS-1435 did show that adding compression slowed down the loading. But that's because the decompression is on the same thread and the loading is a single-threaded process. It would really be pretty easy to move the decompression work onto another core, at which point reading less data is definitely going to be faster. Another important factor is the network bandwidth used when one of the image dirs is on NFS. Many deployments use this for backup. Or, even if the NN isn't directly writing to NFS, some cron job is backing up the image on a regular basis using normal OS tools like rsync/scp over the network. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression: there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. 
For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefit of reducing the number of bytes to be transferred across the wire, since the compression happens on the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867125#comment-13867125 ] Todd Lipcon commented on HDFS-5722: --- BTW, if you really see a use case for random-accessing portions of the image, we could put an uncompressed trailer PB at the end of the file, which contains the section descriptors with their physical offsets, sizes, and type information. That would allow you to arbitrarily read a section without having to skip() through the others. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression: there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefit of reducing the number of bytes to be transferred across the wire, since the compression happens on the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
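The uncompressed trailer Todd describes can be sketched with plain length-prefixed records instead of protobuf. Everything here is illustrative (record layout, class names); it is not the actual FSImage format, only a demonstration of seek-by-section via a trailer of (name, offset, length) descriptors:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class TrailerIndexSketch {
    // Write sections back-to-back, then a trailer of (name, offset, length)
    // records, then the trailer's own offset as the final 8 bytes.
    static byte[] write(LinkedHashMap<String, byte[]> sections) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        Map<String, long[]> index = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : sections.entrySet()) {
            index.put(e.getKey(), new long[]{buf.size(), e.getValue().length});
            out.write(e.getValue());
        }
        long trailerOffset = buf.size();
        out.writeInt(index.size());
        for (Map.Entry<String, long[]> e : index.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue()[0]);
            out.writeLong(e.getValue()[1]);
        }
        out.writeLong(trailerOffset);
        return buf.toByteArray();
    }

    // Random-access read of one section: parse only the trailer, then copy
    // exactly the named section's bytes, without skipping through the rest.
    static byte[] readSection(byte[] image, String name) throws IOException {
        DataInputStream tail = new DataInputStream(
            new ByteArrayInputStream(image, image.length - 8, 8));
        int trailerOffset = (int) tail.readLong();
        DataInputStream trailer = new DataInputStream(
            new ByteArrayInputStream(image, trailerOffset, image.length - trailerOffset));
        int n = trailer.readInt();
        for (int i = 0; i < n; i++) {
            String sectionName = trailer.readUTF();
            long offset = trailer.readLong(), length = trailer.readLong();
            if (sectionName.equals(name)) {
                return Arrays.copyOfRange(image, (int) offset, (int) (offset + length));
            }
        }
        throw new FileNotFoundException(name);
    }
}
```

Because the trailer stays uncompressed, the scheme composes with per-section or transport-layer compression without breaking the offsets.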
[jira] [Commented] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867163#comment-13867163 ] Hadoop QA commented on HDFS-5747: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622242/HDFS-5747.01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestHttpsFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5856//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5856//console This message is automatically generated. BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) 
may throw NullPointerException - Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Arpit Agarwal Attachments: HDFS-5747.01.patch Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
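The first NPE above is a shutdown race: a block report arrives after the BlocksMap has been closed and its internal map nulled out. A guard for that race can be sketched as below; this is only an illustration of the failure mode, not the HDFS-5747 patch, and the types are simplified:

```java
import java.util.HashMap;
import java.util.Map;

public class BlocksMapGuard {
    private Map<Long, String> blocks = new HashMap<>();

    synchronized void addBlock(long id, String info) {
        if (blocks != null) blocks.put(id, info);
    }

    // close() nulls the map, mimicking BlocksMap.close().
    synchronized void close() { blocks = null; }

    // A lookup racing with shutdown returns "unknown block" (null) instead
    // of dereferencing the nulled-out map and throwing NPE.
    synchronized String getStoredBlock(long id) {
        return blocks == null ? null : blocks.get(id);
    }
}
```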
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867175#comment-13867175 ] Todd Lipcon commented on HDFS-5182: --- bq. What I was referring to here is where a client has specifically requested an mmap region using the zero-copy API and the SKIP_CHECKSUMS option. In that case, the user is clearly going to be reading without any guarantees from us. If the user just uses the normal (non-zero-copy, non-mmap) read path, SIGBUS will not be an issue. Why not two separate flags? One flag saying SKIP_CHECKSUMS (ie I will do my own checksumming) and another flag for NO_REQUIRE_MLOCK or UNSAFE_IO or something, which means you're OK with SIGBUS. ie there are really three levels of guarantee we can provide: 1) Normal HDFS semantics: a read will only return correct data, and if it fails, a nice error code will return. 2) Skip-checksums semantics: a read will return data which might be corrupt. If it fails, a nice error code will return. 3) Unsafe semantics: a read will return data which might be corrupt. If it fails, either a nice error code or a SIGBUS. There are a lot of applications that are OK with #2 but not #3. #3 is really hard to deal with since a bad disk in the cluster would SIGBUS everything running on the machine pretty fast, and we don't currently have any way of handling it. BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. 
We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
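The three guarantee levels Todd enumerates map naturally onto two independent flags. In this sketch, SKIP_CHECKSUMS mirrors the existing ReadOption, while UNSAFE_IO is hypothetical, named only to illustrate the proposal:

```java
import java.util.EnumSet;

public class ReadGuarantees {
    enum Flag { SKIP_CHECKSUMS, UNSAFE_IO }

    // Returns 1, 2, or 3, matching the levels in the comment above:
    // 1) correct data or a nice error; 2) possibly-corrupt data, but still
    // a nice error on failure; 3) possibly-corrupt data and possibly SIGBUS.
    static int guaranteeLevel(EnumSet<Flag> flags) {
        if (flags.contains(Flag.UNSAFE_IO)) return 3;
        if (flags.contains(Flag.SKIP_CHECKSUMS)) return 2;
        return 1;
    }
}
```

Separating the flags lets applications with their own checksums opt into level 2 without also accepting level 3's SIGBUS risk.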
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Attachment: HDFS-5677.patch Resubmitting the patch having now followed the relevant process prior to doing so. Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5677.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Release Note: (was: Issue a warning message in the logs if a namenode does not resolve properly on startup.) Status: Patch Available (was: Open) I am submitting again with the proper naming convention. *NOTE:* I have not added or modified any tests because the change results in no side effects other than (possibly) an additional logging message if the one or more namenodes do not resolve. Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5677.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5742: Status: Patch Available (was: Open) DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
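The bug class described here — a configuration lookup that dereferences a possibly-null configuration object — can be sketched with a stand-in for {{MiniDFSCluster#determineDfsBaseDir}} (the property name and default below are assumptions, not the actual Hadoop code):

```java
import java.util.Properties;

class BaseDirResolver {
    static final String DEFAULT_BASE_DIR = "/tmp/hdfs-minicluster";

    // Illustrative stand-in: guard the configuration object before the
    // lookup so a null configuration falls back to a default instead of
    // throwing a NullPointerException.
    static String determineDfsBaseDir(Properties conf) {
        if (conf == null) { // the missing null check
            return DEFAULT_BASE_DIR;
        }
        return conf.getProperty("hdfs.minidfs.basedir", DEFAULT_BASE_DIR);
    }
}
```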
[jira] [Commented] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867256#comment-13867256 ] Jing Zhao commented on HDFS-5747: - The analysis for the root cause of NPE makes sense to me. +1 for the patch. Besides we may also want to keep running the unit test overnight to make sure NPE is gone. BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException - Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Arpit Agarwal Attachments: HDFS-5747.01.patch Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
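Both stack traces share one shape: a lookup against state that may already have been torn down (the BlocksMap after close, a null expectedLocation). A minimal defensive pattern — purely illustrative, not the committed fix — is to snapshot the field once and bail out cleanly instead of dereferencing it:

```java
import java.util.HashMap;
import java.util.Map;

class UseAfterCloseGuard {
    private volatile Map<Long, String> blocks = new HashMap<>();

    void put(long id, String info) {
        Map<Long, String> map = blocks;
        if (map != null) {
            map.put(id, info);
        }
    }

    void close() {
        blocks = null; // after this, lookups must not NPE
    }

    // Snapshot the field once, then null-check: a racing close() can no
    // longer turn the lookup into a NullPointerException mid-call.
    String getStoredBlock(long id) {
        Map<Long, String> map = blocks;
        return (map == null) ? null : map.get(id);
    }
}
```

Callers still have to tolerate a null result, but a clean null from a closed map is far easier to handle in the RPC layer than an NPE escaping from deep inside the block manager.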
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.002.patch Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information are out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
Arpit Agarwal created HDFS-5751: --- Summary: Remove the FsDatasetSpi and FsVolumeImpl interfaces Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns blank data for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5751: Description: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. was: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. 
However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. Remove the FsDatasetSpi and FsVolumeImpl interfaces --- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5751: Description: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. was: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns blank data for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. 
However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. Remove the FsDatasetSpi and FsVolumeImpl interfaces --- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
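The proposed direction — replacing the SPI-plus-factory indirection with injected dependencies — can be sketched without pulling in Guice itself. Plain constructor injection below shows the shape a DI framework would wire automatically (all names are hypothetical, chosen for illustration):

```java
// The disk routines hide behind one small seam...
interface BlockStore {
    byte[] read(long blockId);
}

// ...with a real implementation for production...
class DiskBlockStore implements BlockStore {
    public byte[] read(long blockId) {
        // real disk I/O would live here; empty for the sketch
        return new byte[0];
    }
}

// ...and a simulated one for tests, returning zeroes like SimulatedFSDataset.
class SimulatedBlockStore implements BlockStore {
    public byte[] read(long blockId) {
        return new byte[512];
    }
}

class DataNodeCore {
    private final BlockStore store;

    // Injected at construction: no factory lookup in the common code path.
    DataNodeCore(BlockStore store) {
        this.store = store;
    }

    int readLength(long blockId) {
        return store.read(blockId).length;
    }
}
```

Tests construct {{DataNodeCore}} with the simulated store while production wiring supplies the disk-backed one; the common-case code path never consults a factory.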
[jira] [Updated] (HDFS-5711) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
[ https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohan Pasalkar updated HDFS-5711: - Priority: Minor (was: Major) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk. - Key: HDFS-5711 URL: https://issues.apache.org/jira/browse/HDFS-5711 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rohan Pasalkar Priority: Minor This jira is to track changes to be made to remove HDFS name-node memory limitation to hold block - block location mappings. It is a known fact that the single Name-node architecture of HDFS has scalability limits. The HDFS federation project alleviates this problem by using horizontal scaling. This helps increase the throughput of metadata operation and also the amount of data that can be stored in a Hadoop cluster. The Name-node stores all the filesystem metadata in memory (even in the federated architecture), the Name-node design can be enhanced by persisting part of the metadata onto secondary storage and retaining the popular or recently accessed metadata information in main memory. This design can benefit a HDFS deployment which doesn't use federation but needs to store a large number of files or large number of blocks. Lin Xiao from Hortonworks attempted a similar project [1] in the Summer of 2013. They used LevelDB to persist the Namespace information (i.e file and directory inode information). A patch with this change is yet to be submitted to code base. We also intend to use LevelDB to persist metadata, and plan to provide a complete solution, by not just persisting the Namespace information but also the Blocks Map onto secondary storage. We did implement the basic prototype which stores the block-block location mapping metadata to the persistent key-value store i.e. levelDB. Prototype also maintains the in-memory cache of the recently used block-block location mappings metadata. 
References: [1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, HDFS-5389, http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5711) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
[ https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohan Pasalkar updated HDFS-5711: - Issue Type: Sub-task (was: Improvement) Parent: HDFS-2362 Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk. - Key: HDFS-5711 URL: https://issues.apache.org/jira/browse/HDFS-5711 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rohan Pasalkar This jira is to track changes to be made to remove HDFS name-node memory limitation to hold block - block location mappings. It is a known fact that the single Name-node architecture of HDFS has scalability limits. The HDFS federation project alleviates this problem by using horizontal scaling. This helps increase the throughput of metadata operation and also the amount of data that can be stored in a Hadoop cluster. The Name-node stores all the filesystem metadata in memory (even in the federated architecture), the Name-node design can be enhanced by persisting part of the metadata onto secondary storage and retaining the popular or recently accessed metadata information in main memory. This design can benefit a HDFS deployment which doesn't use federation but needs to store a large number of files or large number of blocks. Lin Xiao from Hortonworks attempted a similar project [1] in the Summer of 2013. They used LevelDB to persist the Namespace information (i.e file and directory inode information). A patch with this change is yet to be submitted to code base. We also intend to use LevelDB to persist metadata, and plan to provide a complete solution, by not just persisting the Namespace information but also the Blocks Map onto secondary storage. We did implement the basic prototype which stores the block-block location mapping metadata to the persistent key-value store i.e. levelDB. Prototype also maintains the in-memory cache of the recently used block-block location mappings metadata. 
References: [1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, HDFS-5389, http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
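The prototype's "persistent store plus in-memory cache of recently used mappings" design can be sketched with a bounded LRU cache in front of a backing map (the backing map stands in for LevelDB; all names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an access-ordered LRU cache in front of a durable
// store, mirroring "recently accessed metadata stays in main memory".
class CachedBlockMap {
    private final Map<Long, String> persistent; // stands in for LevelDB
    private final LinkedHashMap<Long, String> cache;

    CachedBlockMap(Map<Long, String> backing, final int cacheSize) {
        this.persistent = backing;
        // accessOrder=true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    void put(long blockId, String locations) {
        persistent.put(blockId, locations); // always durable
        cache.put(blockId, locations);
    }

    String get(long blockId) {
        String v = cache.get(blockId);
        if (v == null) {
            v = persistent.get(blockId); // cache miss: fall back to disk
            if (v != null) {
                cache.put(blockId, v);
            }
        }
        return v;
    }
}
```

Every mapping survives in the persistent layer; only the hot subset occupies memory, which is the property that lifts the name-node's heap-size ceiling.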
[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867377#comment-13867377 ] Hadoop QA commented on HDFS-5677: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622277/HDFS-5677.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5857//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5857//console This message is automatically generated. 
Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5677.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5589: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Resolving as this was committed to trunk. Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 3.0.0 Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867407#comment-13867407 ] Hadoop QA commented on HDFS-5742: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622126/HDFS-5742.03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5858//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5858//console This message is automatically generated. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867409#comment-13867409 ] Andrew Wang commented on HDFS-5751: --- Hey Arpit, there have been a few people working on improving this FsDatasetSpi with the interest of backing the DN with their own alternate storage systems (e.g. PCI flash or filter). You can see HDFS-5194 for the details. Since this interface is intended to eventually be public and stable, I don't think it's okay to remove it. Remove the FsDatasetSpi and FsVolumeImpl interfaces --- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HDFS-4922: -- Attachment: HDFS-4922-005.patch Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867464#comment-13867464 ] Fengdong Yu commented on HDFS-4922: --- Thanks [~ajisakaa] for renewing the patch and [~cmccabe] for the review. I uploaded a new patch to address all the comments. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867487#comment-13867487 ] Vinay commented on HDFS-5728: - bq. Is this case happened only if we restart DN where crc has less data? Yes bq. as we convert all RBW replica states to RWR and here length will be calculated based on crc chunks. If that is the case, how about just setting the file length also to same after creating RWR state? I too thought of the same thing. That would be an implicit truncation without recovery being called, but I felt it is better to go through the recovery flow itself and truncate only on demand. [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. A client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data is written slowly. 2. One of the DataNodes became diskfull (due to other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed and all processes were restarted. 5. Now HMaster tries to recover the file by calling recoverLease. At this point recovery kept failing with a file length mismatch. When checked, actual block file length: 62484480, calculated block length: 62455808. This was because the metafile had crc for only 62455808 bytes, so 62455808 was taken as the block size. No matter how many times it was retried, recovery failed continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
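The length mismatch in the report follows directly from how a block's length is derived from its meta file. A self-contained sketch of that arithmetic (the 7-byte header, 4-byte CRC32 checksums, and 512-byte chunks are the common HDFS defaults, assumed here for illustration):

```java
class CrcLength {
    // Bytes of block data accounted for by a meta file: strip the header,
    // count whole checksums, multiply by the bytes each checksum covers.
    static long lengthCoveredByCrc(long metaFileLength, int headerSize,
                                   int checksumSize, int bytesPerChecksum) {
        long numChunks = (metaFileLength - headerSize) / checksumSize;
        return numChunks * bytesPerChecksum;
    }
}
```

With those defaults, a meta file of 487,943 bytes holds 121,984 checksums, i.e. it vouches for 62,455,808 bytes — the "calculated" length from the report. The extra 28,672 bytes (56 chunks' worth) in the 62,484,480-byte block file simply never had their checksums flushed before the disk filled, so every recovery attempt recomputed the same shorter length and failed.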
[jira] [Commented] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867530#comment-13867530 ] Tsz Wo (Nicholas), SZE commented on HDFS-5645: -- It is correct to just stop at the upgrade marker for supporting rollback. We should also add ignoring the upgrade marker for the standby NN and supporting downgrade. Thanks for reviewing the patch! Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867532#comment-13867532 ] Hadoop QA commented on HDFS-4922: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622336/HDFS-4922-005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5859//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5859//console This message is automatically generated. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-5645:

Resolution: Fixed
Fix Version/s: HDFS-5535 (Rolling upgrades)
Status: Resolved (was: Patch Available)

I have committed this.

Support upgrade marker in editlog streams

Key: HDFS-5645
URL: https://issues.apache.org/jira/browse/HDFS-5645
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Fix For: HDFS-5535 (Rolling upgrades)
Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch

During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction.
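The marker-based rollback described above can be sketched as a tiny model. This is only an illustration of the idea, not the actual HDFS-5645 implementation; `EditLogModel`, `OP_UPGRADE_MARKER`, and the method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an edit log supporting an upgrade-marker
// transaction (illustrative only, not the committed HDFS-5645 code).
class EditLogModel {
    static final String OP_UPGRADE_MARKER = "OP_UPGRADE_MARKER";
    private final List<String> ops = new ArrayList<>();

    void append(String op) {
        ops.add(op);
    }

    // At upgrade time, record a marker transaction in the stream and
    // return its position (a stand-in for the marker's txid).
    long markUpgrade() {
        ops.add(OP_UPGRADE_MARKER);
        return ops.size() - 1;
    }

    // Roll back: discard the marker and every transaction after it,
    // restoring the log to its pre-upgrade state.
    void rollbackToMarker() {
        int idx = ops.lastIndexOf(OP_UPGRADE_MARKER);
        if (idx >= 0) {
            ops.subList(idx, ops.size()).clear();
        }
    }

    int size() {
        return ops.size();
    }
}
```

With this model, any edits applied after `markUpgrade()` are dropped by `rollbackToMarker()`, while everything before the marker survives.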
[jira] [Created] (HDFS-5752) Add a new DFSAdminCommand for rolling upgrade
Tsz Wo (Nicholas), SZE created HDFS-5752:

Summary: Add a new DFSAdminCommand for rolling upgrade
Key: HDFS-5752
URL: https://issues.apache.org/jira/browse/HDFS-5752
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs-client, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
[jira] [Created] (HDFS-5753) Add new NN startup options for downgrade and rollback using upgrade marker
Tsz Wo (Nicholas), SZE created HDFS-5753:

Summary: Add new NN startup options for downgrade and rollback using upgrade marker
Key: HDFS-5753
URL: https://issues.apache.org/jira/browse/HDFS-5753
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE

The Namenode can be started with -upgrade or -rollback. The -rollback option restores the data using the "previous" directory. New options are needed for downgrade and rollback using the upgrade marker.
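A minimal sketch of how such startup flags might be parsed into distinct startup modes; `StartupOptions`, `Mode`, and the `-downgrade` flag are hypothetical, shown only to illustrate the kind of option this issue proposes alongside the existing -upgrade and -rollback:

```java
// Illustrative sketch, not the actual NameNode argument parsing:
// maps startup flags to distinct startup modes. The -downgrade flag
// is hypothetical, standing in for the new options proposed here.
class StartupOptions {
    enum Mode { REGULAR, UPGRADE, ROLLBACK, DOWNGRADE }

    static Mode parse(String[] args) {
        for (String a : args) {
            switch (a.toLowerCase()) {
                case "-upgrade":
                    return Mode.UPGRADE;       // existing option
                case "-rollback":
                    return Mode.ROLLBACK;      // existing option
                case "-downgrade":
                    return Mode.DOWNGRADE;     // hypothetical new option
                default:
                    break;
            }
        }
        return Mode.REGULAR;                   // no flag: normal startup
    }
}
```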
[jira] [Created] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
Tsz Wo (Nicholas), SZE created HDFS-5754:

Summary: Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
Key: HDFS-5754
URL: https://issues.apache.org/jira/browse/HDFS-5754
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE

Currently, LayoutVersion defines the on-disk data format and supported features of the entire cluster, including the NN and DNs. LayoutVersion is persisted in both the NN and DNs. When a NN/DN starts up, it checks its supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a different LayoutVersion than the NN cannot register with the NN.

We propose to split LayoutVersion into two independent values that are local to the nodes:
- NamenodeLayoutVersion - defines the on-disk data format in the NN, including the format of the FSImage, the editlog, and the directory structure.
- DatanodeLayoutVersion - defines the on-disk data format in the DN, including the format of the block data file, metadata file, block pool layout, and the directory structure.

The LayoutVersion check will be removed from DN registration. If NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling upgrade, then only rollback is supported, not downgrade.
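The proposed split can be sketched as two node-local version constants, with the registration check decoupled from the layout comparison. All names and version numbers below are illustrative, not the committed HDFS-5754 code:

```java
// Sketch of the proposed split (hypothetical names and numbers):
// NN and DN each track their own layout version, and DN registration
// no longer compares the two.
class LayoutVersions {
    // HDFS layout versions are negative and decrease as features are
    // added; these concrete values are placeholders for illustration.
    static final int NAMENODE_LAYOUT_VERSION = -51;
    static final int DATANODE_LAYOUT_VERSION = -51;

    // Node-local startup check: the software must be at least as new
    // as the on-disk layout (more negative = newer).
    static boolean canStart(int softwareVersion, int onDiskVersion) {
        return softwareVersion <= onDiskVersion;
    }

    // DN registration: the layout-version comparison is removed, so a
    // DN with a different DatanodeLayoutVersion can still register.
    static boolean canRegister(int nnLayoutVersion, int dnLayoutVersion) {
        return true;
    }
}
```

The point of the design is visible in the two methods: the version check becomes purely local to each node, while cross-node registration succeeds regardless of version skew.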
[jira] [Assigned] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE reassigned HDFS-5754:

Assignee: Brandon Li

Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion

Key: HDFS-5754
URL: https://issues.apache.org/jira/browse/HDFS-5754
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li

Currently, LayoutVersion defines the on-disk data format and supported features of the entire cluster, including the NN and DNs. LayoutVersion is persisted in both the NN and DNs. When a NN/DN starts up, it checks its supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a different LayoutVersion than the NN cannot register with the NN.

We propose to split LayoutVersion into two independent values that are local to the nodes:
- NamenodeLayoutVersion - defines the on-disk data format in the NN, including the format of the FSImage, the editlog, and the directory structure.
- DatanodeLayoutVersion - defines the on-disk data format in the DN, including the format of the block data file, metadata file, block pool layout, and the directory structure.

The LayoutVersion check will be removed from DN registration. If NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling upgrade, then only rollback is supported, not downgrade.
[jira] [Assigned] (HDFS-5753) Add new NN startup options for downgrade and rollback using upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE reassigned HDFS-5753:

Assignee: Tsz Wo (Nicholas), SZE

Add new NN startup options for downgrade and rollback using upgrade marker

Key: HDFS-5753
URL: https://issues.apache.org/jira/browse/HDFS-5753
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE

The Namenode can be started with -upgrade or -rollback. The -rollback option restores the data using the "previous" directory. New options are needed for downgrade and rollback using the upgrade marker.