[jira] [Commented] (HDFS-5745) Unnecessary disk check triggered when socket operation has problem.
[ https://issues.apache.org/jira/browse/HDFS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866443#comment-13866443 ] Vinay commented on HDFS-5745: - HDFS-5503 also looks similar, but with ClosedChannelException. Unnecessary disk check triggered when socket operation has problem. --- Key: HDFS-5745 URL: https://issues.apache.org/jira/browse/HDFS-5745 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 1.2.1 Reporter: MaoYuan Xian When BlockReceiver fails to transfer data, it can be seen that SocketOutputStream translates the exception into an IOException with the message "The stream is closed":
2014-01-06 11:48:04,716 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
java.io.IOException: The stream is closed
at org.apache.hadoop.net.SocketOutputStream.write
at java.io.BufferedOutputStream.flushBuffer
at java.io.BufferedOutputStream.flush
at java.io.DataOutputStream.flush
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run
at java.lang.Thread.run
This causes the checkDiskError method of DataNode to be called, which triggers the disk scan. Can we make a modification like the one below in checkDiskError to avoid this unnecessary disk scan?
{code}
--- a/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
+++ b/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
@@ -938,7 +938,8 @@ public class DataNode extends Configured
         || e.getMessage().startsWith("An established connection was aborted")
         || e.getMessage().startsWith("Broken pipe")
         || e.getMessage().startsWith("Connection reset")
-        || e.getMessage().contains("java.nio.channels.SocketChannel")) {
+        || e.getMessage().contains("java.nio.channels.SocketChannel")
+        || e.getMessage().startsWith("The stream is closed")) {
       LOG.info("Not checking disk as checkDiskError was called on a network "
           + "related exception");
       return;
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
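The message-based classification in the proposed hunk can be sketched as a standalone helper (a hypothetical class for illustration only, not the actual DataNode code):

```java
import java.io.IOException;

// Hypothetical sketch of the filter proposed above: treat certain exception
// messages as network-related so checkDiskError can skip the disk scan.
public class NetworkExceptionCheck {
    // Message prefixes that indicate a network problem rather than a disk problem.
    private static final String[] NETWORK_PREFIXES = {
        "An established connection was aborted",
        "Broken pipe",
        "Connection reset",
        "The stream is closed"
    };

    static boolean isNetworkRelated(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        for (String prefix : NETWORK_PREFIXES) {
            if (msg.startsWith(prefix)) {
                return true;
            }
        }
        // SocketChannel-related exceptions also indicate network trouble.
        return msg.contains("java.nio.channels.SocketChannel");
    }
}
```

Matching on exception message text is fragile (messages vary by platform and JDK), which is one reason this kind of list keeps growing as new cases like "The stream is closed" turn up.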
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866487#comment-13866487 ] Akira AJISAKA commented on HDFS-4922: - HDFS-4710 is already resolved, so the patch is ready to merge. However, the patch command failed to apply. I'll renew the patch. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922.patch Explain the default value and add one configuration key which exists in the code but is not shown in the document. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-4922: Attachment: HDFS-4922-004.patch Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5651) Remove dfs.namenode.caching.enabled and improve CRM locking
[ https://issues.apache.org/jira/browse/HDFS-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5651: --- Resolution: Fixed Status: Resolved (was: Patch Available) this was committed to trunk Remove dfs.namenode.caching.enabled and improve CRM locking --- Key: HDFS-5651 URL: https://issues.apache.org/jira/browse/HDFS-5651 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5651.001.patch, HDFS-5651.002.patch, HDFS-5651.003.patch, HDFS-5651.004.patch, HDFS-5651.006.patch, HDFS-5651.006.patch, HDFS-5651.008.patch, HDFS-5651.009.patch We can remove dfs.namenode.caching.enabled and simply always enable caching, similar to how we do with snapshots and other features. The main overhead is the size of the cachedBlocks GSet. However, we can simply make the size of this GSet configurable, and people who don't want caching can set it to a very small value. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5746) add ShortCircuitSharedMemorySegment
Colin Patrick McCabe created HDFS-5746: -- Summary: add ShortCircuitSharedMemorySegment Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866545#comment-13866545 ] Colin Patrick McCabe commented on HDFS-5182: A few notes about the planned implementation here: The main idea here is to have a shared memory segment which the DFSClient and Datanode can both read and write. Before each read, the DFSClient will look at this shared memory segment to see if it can be anchored. A segment will be anchorable if the datanode has mlocked it. If the segment can be anchored, the dfsclient will increment the anchor count. Then, the client can read without validating the checksum. When the client is done reading it will decrement the anchor count. These are just memory operations, so they will be fast. Similarly, when the client tries to do a zero-copy read, it will check to see if the segment is anchorable, and increment the anchor count before performing the mmap. The anchor count will stay incremented until the mmap is closed. One exception is if the client passes the ReadOption.SKIP_CHECKSUMS flag. In that case, we do not need to consult the anchor flag because we are willing to tolerate bad data being returned or SIGBUS. Shared memory segments will have a fixed size and contain a series of fixed-size slots. The client will request a shared memory segment via the REQUEST_SHORT_CIRCUIT_FDS operation. Of course, not every REQUEST_SHORT_CIRCUIT_FDS operation needs to get a new shared memory segment, since each segment can hold multiple slots. The client caches these segments and only requests a new one when it needs it. Segments will be closed when no more slots in them are in use. One issue with the shared memory segments discussed here is that when a client terminates, the datanode receives no notification that the shared memory segment it created is no longer needed. For this reason, each shared memory segment will have a domain socket associated with it. 
The only function of this socket is to cause a close notification to be sent to the datanode when the client closes (or vice versa). (When a UNIX domain socket closes, the remote end gets a close notification). The socket which is used will be the same socket on which the REQUEST_SHORT_CIRCUIT_FDS that fetched the segment was performed. We simply don't put it back into the peer cache. BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
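The anchor protocol described above can be modeled with plain Java atomics. This is a hypothetical stand-in for illustration: the real slot lives in a raw shared-memory word, and the names SlotModel, tryAnchor, etc. are made up here, not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of one shared-memory slot: a single word holding an
// "anchorable" bit (set by the datanode after mlock) plus an anchor count
// (incremented by clients before checksum-free or zero-copy reads).
class SlotModel {
    private static final int ANCHORABLE = 1 << 30;
    private final AtomicInteger word = new AtomicInteger(0);

    // Datanode side: flip the anchorable bit after mlock / before munlock.
    void setAnchorable(boolean v) {
        while (true) {
            int cur = word.get();
            int next = v ? (cur | ANCHORABLE) : (cur & ~ANCHORABLE);
            if (word.compareAndSet(cur, next)) return;
        }
    }

    // Client side: increment the anchor count only if the replica is mlocked.
    boolean tryAnchor() {
        while (true) {
            int cur = word.get();
            if ((cur & ANCHORABLE) == 0) return false; // not mlocked: validate checksums instead
            if (word.compareAndSet(cur, cur + 1)) return true;
        }
    }

    // Client side: release the anchor when the read or mmap ends.
    void unanchor() {
        word.decrementAndGet();
    }

    int anchorCount() {
        return word.get() & ~ANCHORABLE;
    }
}
```

Both paths are single CAS loops on one word, which is what makes the "just memory operations, so they will be fast" claim hold.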
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866564#comment-13866564 ] Colin Patrick McCabe commented on HDFS-5746: See here for some notes about the strategy for 5182: https://issues.apache.org/jira/browse/HDFS-5182?focusedCommentId=13866545page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866545 * Add SharedFileDescriptorFactory. This is a class which can produce anonymous shared memory segments suitable for passing from the DataNode to the DFSClient via file descriptor passing. It would have been nice to do this without JNI, but unfortunately we don't have {{open(O_EXCL)}} support in JDK6, which we're still supporting. There is {{NativeIO#open}}, but it doesn't allow me to cleanly separate {{EEXIST}} errors from other errors (the error gets turned into a textual exception which I don't want to parse). Also there may be some symlink issues with the JDK6 java APIs for listing files in a directory, etc. Overall, the native implementation was just easier. This is something we should probably revisit with JDK7, of course. * Add {{NativeIO#mmap}} and {{NativeIO#munmap}}. Although it would be nicer to use {{FileChannel#map}}, there is no public interface to get access to the virtual memory address of a {{MappedByteBuffer}}, and I needed that. Luckily, the amount of code needed to just call mmap is really small. * I didn't want to duplicate the code used to stuff a reference count + closed bit into {{DomainSocket#refCount}}, so I factored it out into {{CloseableReferenceCount}}. This class is now used in both DomainSocket and {{ShortCircuitSharedMemorySegment}}. * {{DomainSocketWatcher}} is a thread which calls poll() in a loop. This will be used to detect when a DFSClient has closed and its shared memory segments can be closed, by detecting when their associated DomainSockets are closed. 
I used poll() here rather than select() since select() has some limitations with high-numbered file descriptors on some platforms. Also, poll's interface is a bit simpler. It would have been nice to use Java NIO for this, but {{DomainSocket}} is not integrated with NIO. poll() doesn't scale as well as epoll() and other platform-specific functions, but we don't need it to, since this is just for handling clients closing, which should be a relatively infrequent event. We're not using this for handling every packet sent through a webserver or something. * {{ShortCircuitSharedMemorySegment}} is entirely in Java, using {{sun.misc.Unsafe}} for the anchor / unanchor / etc. operations. This is preferable to using JNI for this, since {{Unsafe#compareAndSwap}} will be inlined by the JVM. (Thanks to [~tlipcon] for pointing out the existence of these functions). add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
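The reference-count-plus-closed-bit idea behind {{CloseableReferenceCount}} can be sketched as follows. This is a simplified illustration of the pattern, not the actual Hadoop class; it uses AtomicInteger where the real code may differ in detail.

```java
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch: one int packs a closed bit and a reference count,
// so "increment unless closed" needs no lock, only atomic operations.
class RefCountSketch {
    private static final int CLOSED = 1 << 30;
    private final AtomicInteger status = new AtomicInteger(0);

    // Take a reference; fails if the object was already closed.
    void reference() throws ClosedChannelException {
        int cur = status.incrementAndGet();
        if ((cur & CLOSED) != 0) {      // raced with close: undo and fail
            status.decrementAndGet();
            throw new ClosedChannelException();
        }
    }

    void unreference() {
        status.decrementAndGet();
    }

    // Set the closed bit; returns false if it was already set.
    boolean setClosed() {
        while (true) {
            int cur = status.get();
            if ((cur & CLOSED) != 0) return false;
            if (status.compareAndSet(cur, cur | CLOSED)) return true;
        }
    }

    int referenceCount() {
        return status.get() & ~CLOSED;
    }
}
```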
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Attachment: HDFS-5746.001.patch add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Status: Patch Available (was: Open) add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866568#comment-13866568 ] Hadoop QA commented on HDFS-5710: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621824/HDFS-5710.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5851//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5851//console This message is automatically generated. 
FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ : {code} 2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496 2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already. 2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception. 
java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871) at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482) at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112) at java.lang.Thread.run(Thread.java:724) {code} Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
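The missing guard would look roughly like this. The class name, signature, and String[] representation are hypothetical, for illustration only; the real FSDirectory#getFullPathName walks INode arrays under a lock.

```java
// Hypothetical sketch of the suggested null check: if the inode path cannot
// be resolved (e.g. the file was deleted concurrently by another thread),
// return a harmless value instead of dereferencing null and throwing NPE.
class PathSketch {
    static String getFullPathName(String[] pathComponents) {
        if (pathComponents == null) {
            // File may already be gone; callers log a warning and move on.
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (String c : pathComponents) {
            sb.append('/').append(c);
        }
        return sb.toString();
    }
}
```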
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866590#comment-13866590 ] Hadoop QA commented on HDFS-4922: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622145/HDFS-4922-004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5852//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5852//console This message is automatically generated. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866611#comment-13866611 ] Hadoop QA commented on HDFS-4922: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622145/HDFS-4922-004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5853//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5853//console This message is automatically generated. 
Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-5721: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) I have committed this patch. Thanks Ted and Uma! sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt, hdfs-5721-v3.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
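For JDK6-era code, the usual fix for this class of leak is a try/finally that closes the resource on every exit path, as in this illustrative sketch (a generic Closeable stands in for the real FSImage, and the helper name is made up):

```java
import java.io.Closeable;
import java.io.IOException;

class ResourceSketch {
    // Illustrative pattern only: guarantee close() runs whether the method
    // returns normally or throws, which is what the sharedEditsImage fix needs.
    static String useAndClose(Closeable resource, String result) throws IOException {
        try {
            // ... work with the resource ...
            return result;
        } finally {
            resource.close();
        }
    }
}
```

On JDK7 and later the same guarantee comes from try-with-resources, but Hadoop still supported JDK6 at this point.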
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866651#comment-13866651 ] Hudson commented on HDFS-5721: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4978 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4978/]) HDFS-5721. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns. (Ted Yu via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556803) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt, hdfs-5721-v3.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866659#comment-13866659 ] Hadoop QA commented on HDFS-5746: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622157/HDFS-5746.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1550 javac compiler warnings (more than the trunk's current 1545 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5854//console This message is automatically generated. 
add ShortCircuitSharedMemorySegment --- Key: HDFS-5746 URL: https://issues.apache.org/jira/browse/HDFS-5746 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HDFS-5746.001.patch Add ShortCircuitSharedMemorySegment, which will be used to communicate information between the datanode and the client about whether a replica is mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
Tsz Wo (Nicholas), SZE created HDFS-5747: Summary: BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866749#comment-13866749 ] Tsz Wo (Nicholas), SZE commented on HDFS-5645: -- TestOfflineEditsViewer needs the binary edit log file. TestPersistBlocks is not related. Filed HDFS-5747. Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5748) Too much information shown in the dfs health page.
Kihwal Lee created HDFS-5748: Summary: Too much information shown in the dfs health page. Key: HDFS-5748 URL: https://issues.apache.org/jira/browse/HDFS-5748 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee I've noticed that the node lists are shown in the default name node web page. This may be fine for small clusters, but for clusters with 1000s of nodes, this is not ideal. The following should be shown on demand. (Some of them have been there even before the recent rework.) - Detailed data node information - Startup progress - Snapshot information -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866829#comment-13866829 ] Daryn Sharp commented on HDFS-5449: --- +1 Looks good. The odd casting isn't a big deal. WebHdfs compatibility broken between 2.2 and 1.x / 23.x --- Key: HDFS-5449 URL: https://issues.apache.org/jira/browse/HDFS-5449 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-5449.patch, HDFS-5449.patch, HDFS-5449.trunk.patch, HDFS-5449.trunk.patch Similarly to HDFS-5403, getFileBlockLocations() fail between old (1.x, 0.23.x) and new (2.x), but this is worse since both directions won't work. This is caused by the removal of name field from the serialized json format of DatanodeInfo. 2.x namenode should include name (ip:port) in the response and 2.x webhdfs client should use name, if ipAddr and xferPort don't exist in the response. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
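The compatibility fallback described in the issue could be sketched like this (the helper class and method are hypothetical names for illustration; the real client deserializes the DatanodeInfo JSON map):

```java
import java.util.Map;

class DatanodeInfoCompat {
    // Hypothetical sketch of the rule above: prefer the 2.x "ipAddr"/"xferPort"
    // fields, and fall back to the 1.x / 0.23.x "name" field (ip:port) when
    // talking to an old server that does not send the newer fields.
    static String ipAndXferPort(Map<String, Object> json) {
        Object ip = json.get("ipAddr");
        Object port = json.get("xferPort");
        if (ip != null && port != null) {
            return ip + ":" + port;
        }
        // Old servers only serialize "name" as ip:port.
        return (String) json.get("name");
    }
}
```

Symmetrically, a 2.x server that also emits "name" stays readable by old clients, which is why the fix touches both sides.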
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866841#comment-13866841 ] Colin Patrick McCabe commented on HDFS-4922: {code} + Local block reader maintains a chunk buffer, This controls the maximum chunks + can be filled in the chunk buffer for each read. + It would be better to be integral multiple of dfs.bytes-per-checksum {code} You should mention that this is specified in terms of bytes. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Attachment: ha_config_warning.patch Patch to issue warning message for unresolved namenode hostname on startup. Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: ha_config_warning.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
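The check Vincent describes amounts to cross-referencing the namenode ids declared in *dfs.ha.namenodes.&lt;nameservice&gt;* against the per-node rpc-address keys. A minimal sketch of that validation (key names from the issue; the helper is illustrative and is not the attached patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative HA-config cross-check: for every namenode id declared in
// dfs.ha.namenodes.<nameservice>, verify that a matching rpc-address or
// servicerpc-address key exists; collect the ids left unresolved so the
// caller can log a WARN for each. Not the actual patch code.
class HaConfigCheck {
  static List<String> findUnresolved(String nameservice, Map<String, String> conf) {
    List<String> missing = new ArrayList<>();
    String ids = conf.get("dfs.ha.namenodes." + nameservice);
    if (ids == null) {
      return missing;
    }
    for (String id : ids.split(",")) {
      id = id.trim();
      boolean hasRpc =
          conf.containsKey("dfs.namenode.rpc-address." + nameservice + "." + id)
          || conf.containsKey("dfs.namenode.servicerpc-address." + nameservice + "." + id);
      if (!hasRpc) {
        missing.add(id);  // a typo'd or undeclared node id ends up here
      }
    }
    return missing;
  }
}
```

With the misconfiguration from the issue (an id in *dfs.ha.namenodes.myCluster* that has no address key), the helper returns that id instead of leaving the operator to puzzle over a "Problem connecting to server" message.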
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Release Note: Issue a warning message in the logs if a namenode does not resolve properly on startup. Status: Patch Available (was: In Progress) Added a simple check to determine if a namenode hostname has been successfully resolved at startup and, if not, append a WARNing message in the log. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5747 started by Arpit Agarwal. BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException - Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Arpit Agarwal Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Attachment: (was: ha_config_warning.patch) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866892#comment-13866892 ] Vincent Sheffer commented on HDFS-5677: --- Just a note: I deleted the submitted patch because I did not follow the steps outlined at http://wiki.apache.org/hadoop/HowToContribute. Following proper procedure now... -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5449: - Attachment: HDFS-5449.branch-2.patch The branch-2 version of the patch is attached. It is a straight port of the trunk version; the difference is due to the use of the new get methods in trunk. Locally tested. PreCommit won't succeed as the patch won't apply to trunk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Status: Open (was: Patch Available) Canceling until I re-submit following the patch submittal process. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5449: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x
[ https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866954#comment-13866954 ] Hudson commented on HDFS-5449: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4979 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4979/]) HDFS-5449. WebHdfs compatibility broken between 2.2 and 1.x / 23.x. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556927) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestJsonUtil.java -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5747: Status: Patch Available (was: In Progress) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5747: Attachment: HDFS-5747.01.patch Thanks for reporting this, Nicholas. The first NPE looks like a preexisting bug: shutting down {{namesystem}} before {{rpcServer}} is probably the root cause. {code:java} private void stopCommonServices() { if(namesystem != null) namesystem.close(); if(rpcServer != null) rpcServer.stop(); {code} The second NPE looks like a regression and is an easy fix. The attached patch addresses both. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
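The shutdown-ordering issue discussed above can be reduced to a small sketch: if the shared state ({{namesystem}}) is closed before the RPC server stops, an in-flight handler can still touch the closed state and hit an NPE; stopping the server first avoids the race. The class and method names here are illustrative, not the HDFS code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the shutdown-ordering fix: stop the RPC server first so no
// handler can arrive after the shared state is closed. Illustrative only.
class ShutdownOrder {
  interface Stoppable { void stop(); }

  // Returns the order in which components were stopped.
  static List<String> stopInOrder(Stoppable rpcServer, Stoppable namesystem) {
    List<String> order = new ArrayList<>();
    // Stop accepting and serving RPCs first ...
    rpcServer.stop();
    order.add("rpcServer");
    // ... then it is safe to close the state those RPC handlers use.
    namesystem.stop();
    order.add("namesystem");
    return order;
  }
}
```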
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866968#comment-13866968 ] Uma Maheswara Rao G commented on HDFS-5728: --- Does this case happen only if we restart a DN where the crc has less data? On restart we convert all RBW replica states to RWR, and the length will be calculated based on the crc chunks. If that is the case, how about also setting the file length to the same value after creating the RWR state? [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data is written slowly. 2. One of the DataNodes got diskfull (due to other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed and all processes were restarted. 5. Now HMaster tries to recover the file by calling recoverLease. At this point recovery fails with a file length mismatch. When checked: actual block file length: 62484480, calculated block length: 62455808. This was because the metafile had crc for only 62455808 bytes, so 62455808 was taken as the block size. No matter how many times it was retried, recovery kept failing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
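The length mismatch in the report is consistent with the meta file simply being short of CRCs: the length recoverable from the meta file is the number of stored checksums times dfs.bytes-per-checksum. A toy illustration of that arithmetic, assuming the common defaults of 512 bytes per checksum, 4-byte CRC32 checksums, and a 7-byte meta-file header (the 62455808/62484480 figures come from the report; the helper itself is a sketch, not DataNode code):

```java
// Illustrative arithmetic only: a meta file missing the CRCs for the last
// chunks of the block yields a "calculated" length shorter than the actual
// block file, which is the mismatch block recovery trips over.
class CrcLength {
  static long lengthCoveredByCrc(long metaFileLen, int bytesPerChecksum,
                                 int checksumSize, int headerLen) {
    long numChecksums = (metaFileLen - headerLen) / checksumSize;
    return numChecksums * bytesPerChecksum;
  }
}
```

Under these assumptions a meta file of 487,943 bytes covers 121,984 chunks of 512 bytes, i.e. 62,455,808 bytes — 28,672 bytes (56 chunks) short of the 62,484,480-byte block file.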
[jira] [Created] (HDFS-5749) Access time of HDFS directories stays at 1969-12-31
Yongjun Zhang created HDFS-5749: --- Summary: Access time of HDFS directories stays at 1969-12-31 Key: HDFS-5749 URL: https://issues.apache.org/jira/browse/HDFS-5749 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang If FsShell is modified so that fs -lsr shows access time in addition to modification time, the access time of HDFS directories stays at 1969-12-31. This means the access time is never set initially. Filing this jira to fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write
[ https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867013#comment-13867013 ] Dhanasekaran Anbalagan commented on HDFS-198: - Hi All, I am getting the same error on a Hive external table, using hive-common-0.10.0-cdh4.4.0. In my case we are using Sqoop to import data into the table, which stores its data in RCFile format. I am only seeing the issue with the external table. 14/01/08 12:21:40 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_00_0, Status : FAILED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/_DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_00_0/part-m-0: File is not open for writing. Holder DFSClient_NONMAPREDUCE_-794488327_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2452) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2262) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.pro attempt_201312121801_0049_m_00_0: SLF4J: Class path contains multiple SLF4J bindings. 
attempt_201312121801_0049_m_00_0: SLF4J: Found binding in [jar:file:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-simple-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_0: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_0: SLF4J: Found binding in [jar:file:/disk1/mapred/local/taskTracker/tech/distcache/-6782344428220505463_-433811577_1927241260/nameservice1/user/tech/.staging/job_201312121801_0049/libjars/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 14/01/08 12:21:55 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_00_1, Status : FAILED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/_DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_00_1/part-m-0: File is not open for writing. Holder DFSClient_NONMAPREDUCE_-390991563_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2452) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2262) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.pro attempt_201312121801_0049_m_00_1: SLF4J: Class path contains multiple SLF4J bindings. 
attempt_201312121801_0049_m_00_1: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_1: SLF4J: Found binding in [jar:file:/disk1/mapred/local/taskTracker/tech/distcache/7281954290425601736_-433811577_1927241260/nameservice1/user/tech/.staging/job_201312121801_0049/libjars/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] attempt_201312121801_0049_m_00_1: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 14/01/08 12:22:12 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_00_2, Status : FAILED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/_DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_00_2/part-m-0: File is not open for writing. Holder DFSClient_NONMAPREDUCE_1338126902_1 does not have any open files. at
[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867025#comment-13867025 ] Hadoop QA commented on HDFS-5677: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1267/ha_config_warning.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestPersistBlocks {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5855//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5855//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867031#comment-13867031 ] Eric Sirianni commented on HDFS-5483: - This {{BLOCK_RECEIVED}} code path appears to modify the {{BlockInfo}} list directly: {noformat} BlockInfo.listInsert(BlockInfo, DatanodeStorageInfo) line: 308 DatanodeStorageInfo.addBlock(BlockInfo) line: 208 DatanodeDescriptor.addBlock(String, BlockInfo) line: 168 BlockManager.addStoredBlock(BlockInfo, DatanodeDescriptor, String, DatanodeDescriptor, boolean) line: 2215 BlockManager.processAndHandleReportedBlock(DatanodeDescriptor, String, Block, HdfsServerConstants$ReplicaState, DatanodeDescriptor) line: 2720 BlockManager.addBlock(DatanodeDescriptor, String, Block, String) line: 2695 BlockManager.processIncrementalBlockReport(DatanodeID, String, StorageReceivedDeletedBlocks) line: 2769 FSNamesystem.processIncrementalBlockReport(DatanodeID, String, StorageReceivedDeletedBlocks) line: 5285 NameNodeRpcServer.blockReceivedAndDeleted(DatanodeRegistration, String, StorageReceivedDeletedBlocks[]) line: 993 {noformat} Couldn't this corrupt the {{BlockInfo}} list if a datanode sent two {{BLOCK_RECEIVED}}s for two different storages? NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. 
Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5645: Hadoop Flags: Reviewed Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867060#comment-13867060 ] Jing Zhao commented on HDFS-5645: - +1 Patch looks good to me. Only one question: in the patch the editlog loader currently just stops when it hits the upgrade marker. I guess we will have more sophisticated actions in later jiras? Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5750) JHLogAnalyzer#parseLogFile() should close stm upon return
Ted Yu created HDFS-5750: Summary: JHLogAnalyzer#parseLogFile() should close stm upon return Key: HDFS-5750 URL: https://issues.apache.org/jira/browse/HDFS-5750 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor {{stm}} is assigned to {{in}}, but {{in}} may later point to another InputStream: {code} if(compressionClass != null) { CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(compressionClass, new Configuration()); in = codec.createInputStream(stm); } {code} stm should be closed in the finally block. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
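The leak pattern and the suggested fix can be sketched as follows. This is illustrative, not the JHLogAnalyzer code itself: GZIPInputStream stands in for the CompressionCodec-created stream, and the variable names mirror the description above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class ParseLogFileSketch {
    // Read a file to the end, optionally through a decompressing wrapper.
    // The point of the fix: close stm explicitly in finally, so it cannot
    // leak even if the wrapper's constructor throws or "in" ends up
    // pointing at a different stream than stm.
    static long countBytes(File file, boolean compressed) throws IOException {
        InputStream stm = new FileInputStream(file);
        InputStream in = stm;
        try {
            if (compressed) {
                in = new GZIPInputStream(stm); // in no longer == stm
            }
            long n = 0;
            while (in.read() != -1) n++;
            return n;
        } finally {
            in.close();  // closes the wrapper (and, usually, the wrapped stream)
            stm.close(); // idempotent for FileInputStream; guarantees no leak
        }
    }
}
```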
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867073#comment-13867073 ] Todd Lipcon commented on HDFS-5182: --- bq. it will check to see if the segment is anchorable, and increment the anchor count before performing the mmap. The anchor count will stay incremented until the mmap is closed. That seems much longer than necessary -- don't we want clients to be able to keep mmaps around in their cache for very long periods of time? And then, when the user requests the read, we can anchor the mmap only for the duration of time for which the user holds onto the zero-copy buffer? Once the user returns the zero-copy buffer, we can decrement the count and allow the DN to evict the block from the cache. bq. One exception is if the client passes the ReadOption.SKIP_CHECKSUMS flag. In that case, we do not need to consult the anchor flag because we are willing to tolerate bad data being returned or SIGBUS. I disagree on this. Just because you want to skip checksumming doesn't mean you can tolerate SIGBUS. For example, many file formats have their own checksums, so we can safely skip HDFS checksumming, but we still want to ensure that we're only reading locked (i.e safe) memory via mmap. bq. The only function of this socket is to cause a close notification to be sent to the datanode when the client closes (or vice versa). (When a UNIX domain socket closes, the remote end gets a close notification). Maybe this can be put into a separate JIRA, and first implement just a very simple timeout-based mechanism? The DN could change the anchor flag to a magic value which invalidates the segment and then close it after some amount of time. Then if the client looks at it again it will know to invalidate. 
BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867075#comment-13867075 ] Arpit Agarwal commented on HDFS-5483: - {{BlockInfo#addStorage}} checks for it. {code} boolean addStorage(DatanodeStorageInfo storage) { int idx = findDatanode(storage.getDatanodeDescriptor()); ... // The block is on the DN but belongs to a different storage. // Update our state. {code} NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867078#comment-13867078 ] Eric Sirianni commented on HDFS-5318: - I will work on a patch that addresses your points above and update the JIRA. Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. 
Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A)}} and {{(DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
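The proposed counting rule reduces to "number of distinct StorageIDs among a block's locations". A minimal sketch, with illustrative names (the class and method are not from the patch; each location is a {datanodeId, storageId} pair):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SharedStorageReplicaCount {
    // Effective replication of a block = number of distinct StorageIDs
    // among its (DataNode ID, StorageID) location pairs. Multiple datanodes
    // reporting the same StorageID (shared NAS) count as one physical copy.
    static int replicationCount(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] pair : locations) {
            storageIds.add(pair[1]); // pair = {datanodeId, storageId}
        }
        return storageIds.size();
    }
}
```

On the four location tuples from the example above, this yields 2 rather than the traditional count of 4.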
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867110#comment-13867110 ] Colin Patrick McCabe commented on HDFS-5182: bq. That seems much longer than necessary – don't we want clients to be able to keep mmaps around in their cache for very long periods of time? And then, when the user requests the read, we can anchor the mmap only for the duration of time for which the user holds onto the zero-copy buffer? Once the user returns the zero-copy buffer, we can decrement the count and allow the DN to evict the block from the cache. Sorry, I was unclear. When I said closed I mean that the user had returned the zero-copy buffer. So the same thing you suggested. bq. I disagree on this. Just because you want to skip checksumming doesn't mean you can tolerate SIGBUS. For example, many file formats have their own checksums, so we can safely skip HDFS checksumming, but we still want to ensure that we're only reading locked (i.e safe) memory via mmap. What I was referring to here is where a client has specifically requested an mmap region using the zero-copy API and the SKIP_CHECKSUMS option. In that case, the user is clearly going to be reading without any guarantees from us. If the user just uses the normal (non-zero-copy, non-mmap) read path, SIGBUS will not be an issue. (There have been some proposals to improve the SIGBUS situation for zero-copy reads without mlock, but they're certainly out of scope for this JIRA.) bq. Maybe this can be put into a separate JIRA, and first implement just a very simple timeout-based mechanism? The DN could change the anchor flag to a magic value which invalidates the segment and then close it after some amount of time. Then if the client looks at it again it will know to invalidate. Timeouts and two-way protocols get complex. I already have the code for closing the shared memory segment based on listening for the remote socket getting closed. 
As for where the socket comes from-- we just don't put the socket we used to get the FDs in the first place back into the peer cache. BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
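The anchor-count lifecycle the two comments converge on (raise the count only while a client holds a zero-copy buffer; the DN may evict the cached block only at zero) can be sketched like this. All names are illustrative, not from the HDFS-5182 patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AnchorCount {
    private final AtomicInteger anchors = new AtomicInteger();

    // Called when a zero-copy buffer over the mmap is handed to the user.
    void onZeroCopyBufferHandedOut() { anchors.incrementAndGet(); }

    // Called when the user returns the zero-copy buffer.
    void onZeroCopyBufferReturned()  { anchors.decrementAndGet(); }

    // The DN may evict (munlock) the block only when nothing is anchored,
    // even if the client still keeps the mmap cached for later reads.
    boolean dnMayEvict() { return anchors.get() == 0; }
}
```

Keeping the anchor scoped to the buffer, not the mmap, is what lets clients cache mmaps for long periods without pinning DN cache memory.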
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867102#comment-13867102 ] Eric Sirianni commented on HDFS-5483: - OK - thanks, missed that guard. {code} boolean addBlock(BlockInfo b) { if(!b.addStorage(this)) return false; {code} NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block. Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
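The guard the two snippets above describe can be reduced to a boolean-returning add: the block is linked into the storage's list only if it was not already present on that datanode via any storage. A simplified model (a Set stands in for the per-datanode replica list; types are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class AddBlockGuard {
    private final Set<Long> blocksOnThisDatanode = new HashSet<>();

    // Mirrors the addStorage/addBlock contract quoted above: a second
    // BLOCK_RECEIVED for the same block from a different storage returns
    // false and leaves the list untouched, so the list is not corrupted.
    boolean addBlock(long blockId) {
        return blocksOnThisDatanode.add(blockId);
    }
}
```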
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867122#comment-13867122 ] Todd Lipcon commented on HDFS-5722: --- Yea, I don't really see the point here. It seems the motivation is a possible optimization when the NN needs to skip a large section of the image which it doesn't understand. That's only going to happen in a downgrade scenario, which is rare and not on a hot path. Plus, do we have examples of _large_ new sections we plan on adding to the image? Sure, we've added things in the past like a list of snapshots, but they're typically pretty short. The example of skipping the entire inodes section seems pretty contrived to me. HDFS-1435 did show that adding compression slowed down the loading. But that's because the decompression is on the same thread and the loading is a single-threaded process. It would really be pretty easy to move the decompression work onto another core, at which point reading less data is definitely going to be faster. Another important factor is the network bandwidth used when one of the image dirs is on NFS. Many deployments use this for backup. Or, even if the NN isn't directly writing to NFS, some cron job is backing up the image on a regular basis using normal OS tools like rsync/scp over the network. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression: there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. 
For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefit of reducing the number of bytes to be transferred across the wire, since the compression happens on the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867125#comment-13867125 ] Todd Lipcon commented on HDFS-5722: --- BTW, if you really see a use case for random-accessing portions of the image, we could put an uncompressed trailer PB at the end of the file, which contains the section descriptors with their physical offsets, sizes, and type information. That would allow you to arbitrarily read a section without having to skip() through the others. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression: there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefit of reducing the number of bytes to be transferred across the wire, since the compression happens on the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
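The uncompressed trailer Todd describes can be sketched with plain length-prefixed records instead of protobuf. Everything here is illustrative (record layout, class names); it is not the actual FSImage format, only a demonstration of seek-by-section via a trailer of (name, offset, length) descriptors:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class TrailerIndexSketch {
    // Write sections back-to-back, then a trailer of (name, offset, length)
    // records, then the trailer's own offset as the final 8 bytes.
    static byte[] write(LinkedHashMap<String, byte[]> sections) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        Map<String, long[]> index = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : sections.entrySet()) {
            index.put(e.getKey(), new long[]{buf.size(), e.getValue().length});
            out.write(e.getValue());
        }
        long trailerOffset = buf.size();
        out.writeInt(index.size());
        for (Map.Entry<String, long[]> e : index.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue()[0]);
            out.writeLong(e.getValue()[1]);
        }
        out.writeLong(trailerOffset);
        return buf.toByteArray();
    }

    // Random-access read of one section: parse only the trailer, then copy
    // exactly the named section's bytes, without skipping through the rest.
    static byte[] readSection(byte[] image, String name) throws IOException {
        DataInputStream tail = new DataInputStream(
            new ByteArrayInputStream(image, image.length - 8, 8));
        int trailerOffset = (int) tail.readLong();
        DataInputStream trailer = new DataInputStream(
            new ByteArrayInputStream(image, trailerOffset, image.length - trailerOffset));
        int n = trailer.readInt();
        for (int i = 0; i < n; i++) {
            String sectionName = trailer.readUTF();
            long offset = trailer.readLong(), length = trailer.readLong();
            if (sectionName.equals(name)) {
                return Arrays.copyOfRange(image, (int) offset, (int) (offset + length));
            }
        }
        throw new FileNotFoundException(name);
    }
}
```

Because the trailer stays uncompressed, the scheme composes with per-section or transport-layer compression without breaking the offsets.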
[jira] [Commented] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867163#comment-13867163 ] Hadoop QA commented on HDFS-5747: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622242/HDFS-5747.01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestHttpsFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5856//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5856//console This message is automatically generated. BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) 
may throw NullPointerException - Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Arpit Agarwal Attachments: HDFS-5747.01.patch Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
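The first NPE above is a shutdown race: a block report arrives after the BlocksMap has been closed and its internal map nulled out. A guard for that race can be sketched as below; this is only an illustration of the failure mode, not the HDFS-5747 patch, and the types are simplified:

```java
import java.util.HashMap;
import java.util.Map;

public class BlocksMapGuard {
    private Map<Long, String> blocks = new HashMap<>();

    synchronized void addBlock(long id, String info) {
        if (blocks != null) blocks.put(id, info);
    }

    // close() nulls the map, mimicking BlocksMap.close().
    synchronized void close() { blocks = null; }

    // A lookup racing with shutdown returns "unknown block" (null) instead
    // of dereferencing the nulled-out map and throwing NPE.
    synchronized String getStoredBlock(long id) {
        return blocks == null ? null : blocks.get(id);
    }
}
```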
[jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867175#comment-13867175 ] Todd Lipcon commented on HDFS-5182: --- bq. What I was referring to here is where a client has specifically requested an mmap region using the zero-copy API and the SKIP_CHECKSUMS option. In that case, the user is clearly going to be reading without any guarantees from us. If the user just uses the normal (non-zero-copy, non-mmap) read path, SIGBUS will not be an issue. Why not two separate flags? One flag saying SKIP_CHECKSUMS (ie I will do my own checksumming) and another flag for NO_REQUIRE_MLOCK or UNSAFE_IO or something, which means you're OK with SIGBUS. ie there are really three levels of guarantee we can provide: 1) Normal HDFS semantics: a read will only return correct data, and if it fails, a nice error code will return. 2) Skip-checksums semantics: a read will return data which might be corrupt. If it fails, a nice error code will return. 3) Unsafe semantics: a read will return data which might be corrupt. If it fails, either a nice error code or a SIGBUS. There are a lot of applications that are OK with #2 but not #3. #3 is really hard to deal with since a bad disk in the cluster would SIGBUS everything running on the machine pretty fast, and we don't currently have any way of handling it. BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid - Key: HDFS-5182 URL: https://issues.apache.org/jira/browse/HDFS-5182 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. 
We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
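The three guarantee levels Todd enumerates map naturally onto two independent flags. In this sketch, SKIP_CHECKSUMS mirrors the existing ReadOption, while UNSAFE_IO is hypothetical, named only to illustrate the proposal:

```java
import java.util.EnumSet;

public class ReadGuarantees {
    enum Flag { SKIP_CHECKSUMS, UNSAFE_IO }

    // Returns 1, 2, or 3, matching the levels in the comment above:
    // 1) correct data or a nice error; 2) possibly-corrupt data, but still
    // a nice error on failure; 3) possibly-corrupt data and possibly SIGBUS.
    static int guaranteeLevel(EnumSet<Flag> flags) {
        if (flags.contains(Flag.UNSAFE_IO)) return 3;
        if (flags.contains(Flag.SKIP_CHECKSUMS)) return 2;
        return 1;
    }
}
```

Separating the flags lets applications with their own checksums opt into level 2 without also accepting level 3's SIGBUS risk.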
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Attachment: HDFS-5677.patch Resubmitting the patch having now followed the relevant process prior to doing so. Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5677.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Sheffer updated HDFS-5677: -- Release Note: (was: Issue a warning message in the logs if a namenode does not resolve properly on startup.) Status: Patch Available (was: Open) I am submitting again with the proper naming convention. *NOTE:* I have not added or modified any tests because the change results in no side effects other than (possibly) an additional logging message if the one or more namenodes do not resolve. Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5677.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5742: Status: Patch Available (was: Open) DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
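The bug class described here — a configuration lookup that dereferences a possibly-null configuration object — can be sketched with a stand-in for {{MiniDFSCluster#determineDfsBaseDir}} (the property name and default below are assumptions, not the actual Hadoop code):

```java
import java.util.Properties;

class BaseDirResolver {
    static final String DEFAULT_BASE_DIR = "/tmp/hdfs-minicluster";

    // Illustrative stand-in: guard the configuration object before the
    // lookup so a null configuration falls back to a default instead of
    // throwing a NullPointerException.
    static String determineDfsBaseDir(Properties conf) {
        if (conf == null) { // the missing null check
            return DEFAULT_BASE_DIR;
        }
        return conf.getProperty("hdfs.minidfs.basedir", DEFAULT_BASE_DIR);
    }
}
```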
[jira] [Commented] (HDFS-5747) BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867256#comment-13867256 ] Jing Zhao commented on HDFS-5747: - The analysis for the root cause of NPE makes sense to me. +1 for the patch. Besides we may also want to keep running the unit test overnight to make sure NPE is gone. BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException - Key: HDFS-5747 URL: https://issues.apache.org/jira/browse/HDFS-5747 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Arpit Agarwal Attachments: HDFS-5747.01.patch Found these NPEs in [build #5849|https://builds.apache.org/job/PreCommit-HDFS-Build/5849//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/]. - BlocksMap is accessed after close: {code} 2014-01-09 04:28:32,350 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 2 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 127.0.0.1:55572 Call#32 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getStoredBlock(BlocksMap.java:113) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:1915) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2698) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2685) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2759) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5321) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1018) ... {code} - expectedLocation can be null. 
{code} 2014-01-09 04:28:35,384 WARN ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 58333, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 127.0.0.1:55583 Call#47 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.addReplicaIfNotPresent(BlockInfoUnderConstruction.java:331) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:1801) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1645) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:987) ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
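Both stack traces share one shape: a lookup against state that may already have been torn down (the BlocksMap after close, a null expectedLocation). A minimal defensive pattern — purely illustrative, not the committed fix — is to snapshot the field once and bail out cleanly instead of dereferencing it:

```java
import java.util.HashMap;
import java.util.Map;

class UseAfterCloseGuard {
    private volatile Map<Long, String> blocks = new HashMap<>();

    void put(long id, String info) {
        Map<Long, String> map = blocks;
        if (map != null) {
            map.put(id, info);
        }
    }

    void close() {
        blocks = null; // after this, lookups must not NPE
    }

    // Snapshot the field once, then null-check: a racing close() can no
    // longer turn the lookup into a NullPointerException mid-call.
    String getStoredBlock(long id) {
        Map<Long, String> map = blocks;
        return (map == null) ? null : map.get(id);
    }
}
```

Callers still have to tolerate a null result, but a clean null from a closed map is far easier to handle in the RPC layer than an NPE escaping from deep inside the block manager.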
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.002.patch Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information are out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
Arpit Agarwal created HDFS-5751: --- Summary: Remove the FsDatasetSpi and FsVolumeImpl interfaces Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns blank data for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5751: Description: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. was: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. 
However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. Remove the FsDatasetSpi and FsVolumeImpl interfaces --- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5751: Description: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. was: The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns blank data for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. 
However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. Remove the FsDatasetSpi and FsVolumeImpl interfaces --- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can get eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
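The proposed direction — replacing the SPI-plus-factory indirection with injected dependencies — can be sketched without pulling in Guice itself. Plain constructor injection below shows the shape a DI framework would wire automatically (all names are hypothetical, chosen for illustration):

```java
// The disk routines hide behind one small seam...
interface BlockStore {
    byte[] read(long blockId);
}

// ...with a real implementation for production...
class DiskBlockStore implements BlockStore {
    public byte[] read(long blockId) {
        // real disk I/O would live here; empty for the sketch
        return new byte[0];
    }
}

// ...and a simulated one for tests, returning zeroes like SimulatedFSDataset.
class SimulatedBlockStore implements BlockStore {
    public byte[] read(long blockId) {
        return new byte[512];
    }
}

class DataNodeCore {
    private final BlockStore store;

    // Injected at construction: no factory lookup in the common code path.
    DataNodeCore(BlockStore store) {
        this.store = store;
    }

    int readLength(long blockId) {
        return store.read(blockId).length;
    }
}
```

Tests construct {{DataNodeCore}} with the simulated store while production wiring supplies the disk-backed one; the common-case code path never consults a factory.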
[jira] [Updated] (HDFS-5711) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
[ https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohan Pasalkar updated HDFS-5711: - Priority: Minor (was: Major) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk. - Key: HDFS-5711 URL: https://issues.apache.org/jira/browse/HDFS-5711 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rohan Pasalkar Priority: Minor This jira is to track changes to be made to remove HDFS name-node memory limitation to hold block - block location mappings. It is a known fact that the single Name-node architecture of HDFS has scalability limits. The HDFS federation project alleviates this problem by using horizontal scaling. This helps increase the throughput of metadata operation and also the amount of data that can be stored in a Hadoop cluster. The Name-node stores all the filesystem metadata in memory (even in the federated architecture), the Name-node design can be enhanced by persisting part of the metadata onto secondary storage and retaining the popular or recently accessed metadata information in main memory. This design can benefit a HDFS deployment which doesn't use federation but needs to store a large number of files or large number of blocks. Lin Xiao from Hortonworks attempted a similar project [1] in the Summer of 2013. They used LevelDB to persist the Namespace information (i.e file and directory inode information). A patch with this change is yet to be submitted to code base. We also intend to use LevelDB to persist metadata, and plan to provide a complete solution, by not just persisting the Namespace information but also the Blocks Map onto secondary storage. We did implement the basic prototype which stores the block-block location mapping metadata to the persistent key-value store i.e. levelDB. Prototype also maintains the in-memory cache of the recently used block-block location mappings metadata. 
References: [1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, HDFS-5389, http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5711) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
[ https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohan Pasalkar updated HDFS-5711: - Issue Type: Sub-task (was: Improvement) Parent: HDFS-2362 Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk. - Key: HDFS-5711 URL: https://issues.apache.org/jira/browse/HDFS-5711 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rohan Pasalkar This jira is to track changes to be made to remove HDFS name-node memory limitation to hold block - block location mappings. It is a known fact that the single Name-node architecture of HDFS has scalability limits. The HDFS federation project alleviates this problem by using horizontal scaling. This helps increase the throughput of metadata operation and also the amount of data that can be stored in a Hadoop cluster. The Name-node stores all the filesystem metadata in memory (even in the federated architecture), the Name-node design can be enhanced by persisting part of the metadata onto secondary storage and retaining the popular or recently accessed metadata information in main memory. This design can benefit a HDFS deployment which doesn't use federation but needs to store a large number of files or large number of blocks. Lin Xiao from Hortonworks attempted a similar project [1] in the Summer of 2013. They used LevelDB to persist the Namespace information (i.e file and directory inode information). A patch with this change is yet to be submitted to code base. We also intend to use LevelDB to persist metadata, and plan to provide a complete solution, by not just persisting the Namespace information but also the Blocks Map onto secondary storage. We did implement the basic prototype which stores the block-block location mapping metadata to the persistent key-value store i.e. levelDB. Prototype also maintains the in-memory cache of the recently used block-block location mappings metadata. 
References: [1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, HDFS-5389, http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
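The prototype's "persistent store plus in-memory cache of recently used mappings" design can be sketched with a bounded LRU cache in front of a backing map (the backing map stands in for LevelDB; all names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an access-ordered LRU cache in front of a durable
// store, mirroring "recently accessed metadata stays in main memory".
class CachedBlockMap {
    private final Map<Long, String> persistent; // stands in for LevelDB
    private final LinkedHashMap<Long, String> cache;

    CachedBlockMap(Map<Long, String> backing, final int cacheSize) {
        this.persistent = backing;
        // accessOrder=true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    void put(long blockId, String locations) {
        persistent.put(blockId, locations); // always durable
        cache.put(blockId, locations);
    }

    String get(long blockId) {
        String v = cache.get(blockId);
        if (v == null) {
            v = persistent.get(blockId); // cache miss: fall back to disk
            if (v != null) {
                cache.put(blockId, v);
            }
        }
        return v;
    }
}
```

Every mapping survives in the persistent layer; only the hot subset occupies memory, which is the property that lifts the name-node's heap-size ceiling.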
[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867377#comment-13867377 ] Hadoop QA commented on HDFS-5677: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622277/HDFS-5677.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5857//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5857//console This message is automatically generated. 
Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 Attachments: HDFS-5677.patch If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5589: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Resolving as this was committed to trunk. Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 3.0.0 Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867407#comment-13867407 ] Hadoop QA commented on HDFS-5742: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622126/HDFS-5742.03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5858//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5858//console This message is automatically generated. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch, HDFS-5742.03.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867409#comment-13867409 ] Andrew Wang commented on HDFS-5751: --- Hey Arpit, there have been a few people working on improving this FsDatasetSpi with the interest of backing the DN with their own alternate storage systems (e.g. PCI flash or filter). You can see HDFS-5194 for the details. Since this interface is intended to eventually be public and stable, I don't think it's okay to remove it. Remove the FsDatasetSpi and FsVolumeImpl interfaces --- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetpSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity. A 'real' DataNode uses {{FsDataSetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HDFS-4922: -- Attachment: HDFS-4922-005.patch Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867464#comment-13867464 ] Fengdong Yu commented on HDFS-4922: --- Thanks [~ajisakaa] for renewing the patch and [~cmccabe] for the review. I uploaded a new patch to address all the comments. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867487#comment-13867487 ] Vinay commented on HDFS-5728: - bq. Is this case happened only if we restart DN where crc has less data? Yes bq. as we convert all RBW replica states to RWR and here length will be calculated based on crc chunks. If that is the case, how about just setting the file length also to same after creating RWR state? I too thought of the same thing. That would be an implicit truncation without recovery being called, but I felt it is better to go through the recovery flow itself and truncate only on demand. [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. A client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data is written slowly. 2. One of the DataNodes became diskfull (due to other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed and all processes were restarted. 5. Now HMaster tries to recover the file by calling recoverLease. At this point recovery kept failing with a file length mismatch. When checked, actual block file length: 62484480, calculated block length: 62455808. This was because the metafile had crc for only 62455808 bytes, so 62455808 was taken as the block size. No matter how many times it was retried, recovery failed continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
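The length mismatch in the report follows directly from how a block's length is derived from its meta file. A self-contained sketch of that arithmetic (the 7-byte header, 4-byte CRC32 checksums, and 512-byte chunks are the common HDFS defaults, assumed here for illustration):

```java
class CrcLength {
    // Bytes of block data accounted for by a meta file: strip the header,
    // count whole checksums, multiply by the bytes each checksum covers.
    static long lengthCoveredByCrc(long metaFileLength, int headerSize,
                                   int checksumSize, int bytesPerChecksum) {
        long numChunks = (metaFileLength - headerSize) / checksumSize;
        return numChunks * bytesPerChecksum;
    }
}
```

With those defaults, a meta file of 487,943 bytes holds 121,984 checksums, i.e. it vouches for 62,455,808 bytes — the "calculated" length from the report. The extra 28,672 bytes (56 chunks' worth) in the 62,484,480-byte block file simply never had their checksums flushed before the disk filled, so every recovery attempt recomputed the same shorter length and failed.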
[jira] [Commented] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867530#comment-13867530 ] Tsz Wo (Nicholas), SZE commented on HDFS-5645: -- It is correct to just stop at the upgrade marker for supporting rollback. We should also add ignoring the upgrade marker for the standby NN and supporting downgrade. Thanks for reviewing the patch! Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867532#comment-13867532 ] Hadoop QA commented on HDFS-4922: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622336/HDFS-4922-005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5859//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5859//console This message is automatically generated. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-5645:

Resolution: Fixed
Fix Version/s: HDFS-5535 (Rolling upgrades)
Status: Resolved (was: Patch Available)

I have committed this.

Support upgrade marker in editlog streams

Key: HDFS-5645
URL: https://issues.apache.org/jira/browse/HDFS-5645
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Fix For: HDFS-5535 (Rolling upgrades)
Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch

During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction.
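The marker-based rollback described above can be sketched as a tiny model. This is only an illustration of the idea, not the actual HDFS-5645 implementation; `EditLogModel`, `OP_UPGRADE_MARKER`, and the method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an edit log supporting an upgrade-marker
// transaction (illustrative only, not the committed HDFS-5645 code).
class EditLogModel {
    static final String OP_UPGRADE_MARKER = "OP_UPGRADE_MARKER";
    private final List<String> ops = new ArrayList<>();

    void append(String op) {
        ops.add(op);
    }

    // At upgrade time, record a marker transaction in the stream and
    // return its position (a stand-in for the marker's txid).
    long markUpgrade() {
        ops.add(OP_UPGRADE_MARKER);
        return ops.size() - 1;
    }

    // Roll back: discard the marker and every transaction after it,
    // restoring the log to its pre-upgrade state.
    void rollbackToMarker() {
        int idx = ops.lastIndexOf(OP_UPGRADE_MARKER);
        if (idx >= 0) {
            ops.subList(idx, ops.size()).clear();
        }
    }

    int size() {
        return ops.size();
    }
}
```

With this model, any edits applied after `markUpgrade()` are dropped by `rollbackToMarker()`, while everything before the marker survives.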
[jira] [Created] (HDFS-5752) Add a new DFSAdminCommand for rolling upgrade
Tsz Wo (Nicholas), SZE created HDFS-5752:

Summary: Add a new DFSAdminCommand for rolling upgrade
Key: HDFS-5752
URL: https://issues.apache.org/jira/browse/HDFS-5752
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs-client, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
[jira] [Created] (HDFS-5753) Add new NN startup options for downgrade and rollback using upgrade marker
Tsz Wo (Nicholas), SZE created HDFS-5753:

Summary: Add new NN startup options for downgrade and rollback using upgrade marker
Key: HDFS-5753
URL: https://issues.apache.org/jira/browse/HDFS-5753
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE

The Namenode can be started with -upgrade or -rollback. The -rollback option restores the data using the "previous" directory. New options are needed for downgrade and rollback using the upgrade marker.
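A minimal sketch of how such startup flags might be parsed into distinct startup modes; `StartupOptions`, `Mode`, and the `-downgrade` flag are hypothetical, shown only to illustrate the kind of option this issue proposes alongside the existing -upgrade and -rollback:

```java
// Illustrative sketch, not the actual NameNode argument parsing:
// maps startup flags to distinct startup modes. The -downgrade flag
// is hypothetical, standing in for the new options proposed here.
class StartupOptions {
    enum Mode { REGULAR, UPGRADE, ROLLBACK, DOWNGRADE }

    static Mode parse(String[] args) {
        for (String a : args) {
            switch (a.toLowerCase()) {
                case "-upgrade":
                    return Mode.UPGRADE;       // existing option
                case "-rollback":
                    return Mode.ROLLBACK;      // existing option
                case "-downgrade":
                    return Mode.DOWNGRADE;     // hypothetical new option
                default:
                    break;
            }
        }
        return Mode.REGULAR;                   // no flag: normal startup
    }
}
```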
[jira] [Created] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
Tsz Wo (Nicholas), SZE created HDFS-5754:

Summary: Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
Key: HDFS-5754
URL: https://issues.apache.org/jira/browse/HDFS-5754
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE

Currently, LayoutVersion defines the on-disk data format and supported features of the entire cluster, including the NN and DNs. LayoutVersion is persisted in both the NN and DNs. When a NN/DN starts up, it checks its supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a different LayoutVersion than the NN cannot register with the NN.

We propose to split LayoutVersion into two independent values that are local to the nodes:
- NamenodeLayoutVersion - defines the on-disk data format in the NN, including the format of the FSImage, the editlog, and the directory structure.
- DatanodeLayoutVersion - defines the on-disk data format in the DN, including the format of the block data file, metadata file, block pool layout, and the directory structure.

The LayoutVersion check will be removed from DN registration. If NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling upgrade, then only rollback is supported, not downgrade.
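The proposed split can be sketched as two node-local version constants, with the registration check decoupled from the layout comparison. All names and version numbers below are illustrative, not the committed HDFS-5754 code:

```java
// Sketch of the proposed split (hypothetical names and numbers):
// NN and DN each track their own layout version, and DN registration
// no longer compares the two.
class LayoutVersions {
    // HDFS layout versions are negative and decrease as features are
    // added; these concrete values are placeholders for illustration.
    static final int NAMENODE_LAYOUT_VERSION = -51;
    static final int DATANODE_LAYOUT_VERSION = -51;

    // Node-local startup check: the software must be at least as new
    // as the on-disk layout (more negative = newer).
    static boolean canStart(int softwareVersion, int onDiskVersion) {
        return softwareVersion <= onDiskVersion;
    }

    // DN registration: the layout-version comparison is removed, so a
    // DN with a different DatanodeLayoutVersion can still register.
    static boolean canRegister(int nnLayoutVersion, int dnLayoutVersion) {
        return true;
    }
}
```

The point of the design is visible in the two methods: the version check becomes purely local to each node, while cross-node registration succeeds regardless of version skew.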
[jira] [Assigned] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion
[ https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE reassigned HDFS-5754:

Assignee: Brandon Li

Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion

Key: HDFS-5754
URL: https://issues.apache.org/jira/browse/HDFS-5754
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li

Currently, LayoutVersion defines the on-disk data format and supported features of the entire cluster, including the NN and DNs. LayoutVersion is persisted in both the NN and DNs. When a NN/DN starts up, it checks its supported LayoutVersion against the on-disk LayoutVersion. Also, a DN with a different LayoutVersion than the NN cannot register with the NN.

We propose to split LayoutVersion into two independent values that are local to the nodes:
- NamenodeLayoutVersion - defines the on-disk data format in the NN, including the format of the FSImage, the editlog, and the directory structure.
- DatanodeLayoutVersion - defines the on-disk data format in the DN, including the format of the block data file, metadata file, block pool layout, and the directory structure.

The LayoutVersion check will be removed from DN registration. If NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling upgrade, then only rollback is supported, not downgrade.
[jira] [Assigned] (HDFS-5753) Add new NN startup options for downgrade and rollback using upgrade marker
[ https://issues.apache.org/jira/browse/HDFS-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE reassigned HDFS-5753:

Assignee: Tsz Wo (Nicholas), SZE

Add new NN startup options for downgrade and rollback using upgrade marker

Key: HDFS-5753
URL: https://issues.apache.org/jira/browse/HDFS-5753
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE

The Namenode can be started with -upgrade or -rollback. The -rollback option restores the data using the "previous" directory. New options are needed for downgrade and rollback using the upgrade marker.