[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827418#comment-13827418 ] Hadoop QA commented on HDFS-2832: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614794/h2832_20131119b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 45 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5499//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5499//console This message is automatically generated. 
> Enable support for heterogeneous storages in HDFS > - > > Key: HDFS-2832 > URL: https://issues.apache.org/jira/browse/HDFS-2832 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.24.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, > editsStored, h2832_20131023.patch, h2832_20131023b.patch, > h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, > h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, > h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, > h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, > h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, > h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch > > > HDFS currently supports a configuration where storages are a list of > directories. Typically each of these directories corresponds to a volume with > its own file system. All these directories are homogeneous and are therefore > identified as a single storage at the namenode. I propose changing the > current model, where a Datanode *is a* storage, to one where a Datanode *is a > collection of* storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
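The model change proposed in HDFS-2832 above (a Datanode *is a collection of* storages rather than *is a* storage) can be sketched with simplified stand-in types. All class and field names below are illustrative, not the actual HDFS API.

```java
// Minimal sketch of the proposed data model, assuming hypothetical
// Storage and DataNodeDescriptor classes (not the real HDFS types).
import java.util.ArrayList;
import java.util.List;

public class StorageModelSketch {
    public enum StorageType { DISK, SSD }

    public static class Storage {
        final String storageID;
        final StorageType type;
        Storage(String storageID, StorageType type) {
            this.storageID = storageID;
            this.type = type;
        }
    }

    // Under the proposal, the NameNode-side descriptor for a DataNode
    // holds a collection of typed storages instead of being a single one.
    public static class DataNodeDescriptor {
        final List<Storage> storages = new ArrayList<>();
        void addStorage(Storage s) { storages.add(s); }
        int numStorages() { return storages.size(); }
    }

    public static int demo() {
        DataNodeDescriptor dn = new DataNodeDescriptor();
        dn.addStorage(new Storage("DS-1", StorageType.DISK));
        dn.addStorage(new Storage("DS-2", StorageType.SSD));
        return dn.numStorages(); // two heterogeneous storages on one node
    }
}
```

With this shape, the namenode can track state per volume rather than collapsing all directories into one storage.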
[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827387#comment-13827387 ] Junping Du commented on HDFS-5527: -- Thanks Arpit for the patch. +1. The patch looks good to me. I also verified a few iterations of the tests; all passed. > Fix TestUnderReplicatedBlocks on branch HDFS-2832 > - > > Key: HDFS-5527 > URL: https://issues.apache.org/jira/browse/HDFS-5527 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5527.patch, h5527.02.patch > > > The failure seems like a deadlock, which is shown in: > https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned HDFS-5527: Assignee: Arpit Agarwal (was: Junping Du) > Fix TestUnderReplicatedBlocks on branch HDFS-2832 > - > > Key: HDFS-5527 > URL: https://issues.apache.org/jira/browse/HDFS-5527 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Arpit Agarwal > Attachments: HDFS-5527.patch, h5527.02.patch > > > The failure seems like a deadlock, which is shown in: > https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-3215) Block size is logged as zero even though blockReceived command was received by DN
[ https://issues.apache.org/jira/browse/HDFS-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827383#comment-13827383 ] Hadoop QA commented on HDFS-3215: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614783/HDFS-3215.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5498//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5498//console This message is automatically generated. > Block size is logged as zero even though blockReceived command was received by DN > > > Key: HDFS-3215 > URL: https://issues.apache.org/jira/browse/HDFS-3215 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Brahma Reddy Battula >Assignee: Shinichi Yamashita >Priority: Minor > Attachments: HDFS-3215.patch, HDFS-3215.patch > > > Scenario 1 > == > Start NN and DN. 
> write file. > Block size is logged as zero even though blockReceived command was received by DN > *NN log* > 2012-03-14 20:23:40,541 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.allocateBlock: /hadoop-create-user.sh._COPYING_. > BP-1166515020-10.18.40.24-1331736264353 > blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]} > 2012-03-14 20:24:26,357 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > addStoredBlock: blockMap updated: XXX:50010 is added to > blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]} > size 0 > *DN log* > 2012-03-14 20:24:17,519 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving block > BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002 src: > /XXX:53141 dest: /XXX:50010 > 2012-03-14 20:24:26,517 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: > /XXX:53141, dest: /XXX:50010, bytes: 512, op: HDFS_WRITE, cliID: > DFSClient_NONMAPREDUCE_1612873957_1, offset: 0, srvID: > DS-1639667928-XXX-50010-1331736284942, blockid: > BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, duration: > 1286482503 > 2012-03-14 20:24:26,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > PacketResponder: > BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, > type=LAST_IN_PIPELINE, downstreams=0:[] terminating > 2012-03-14 20:24:31,533 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification > succeeded for BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5533: Attachment: (was: HDFS-5533.patch) > Symlink delete/create should be treated as DELETE/CREATE in snapshot diff > report > > > Key: HDFS-5533 > URL: https://issues.apache.org/jira/browse/HDFS-5533 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang >Priority: Minor > Attachments: HDFS-5533.patch > > > Currently the code treats symlink delete/create as MODIFY, but symlinks are > immutable, so these should be CREATE and DELETE -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5533: Attachment: HDFS-5533.patch > Symlink delete/create should be treated as DELETE/CREATE in snapshot diff > report > > > Key: HDFS-5533 > URL: https://issues.apache.org/jira/browse/HDFS-5533 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang >Priority: Minor > Attachments: HDFS-5533.patch > > > Currently the code treats symlink delete/create as MODIFY, but symlinks are > immutable, so these should be CREATE and DELETE -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5533: Description: Currently the code treats symlink delete/create as MODIFY, but symlinks are immutable, so these should be CREATE and DELETE > Symlink delete/create should be treated as DELETE/CREATE in snapshot diff > report > > > Key: HDFS-5533 > URL: https://issues.apache.org/jira/browse/HDFS-5533 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang >Priority: Minor > > Currently the code treats symlink delete/create as MODIFY, but symlinks are > immutable, so these should be CREATE and DELETE -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5533: Status: Patch Available (was: Open) > Symlink delete/create should be treated as DELETE/CREATE in snapshot diff > report > > > Key: HDFS-5533 > URL: https://issues.apache.org/jira/browse/HDFS-5533 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang >Priority: Minor > > Currently the code treats symlink delete/create as MODIFY, but symlinks are > immutable, so these should be CREATE and DELETE -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5533: Environment: (was: Currently the original code treat symlink delete/create as modify, but symlink is immutable, should be CREATE and DELETE) > Symlink delete/create should be treated as DELETE/CREATE in snapshot diff > report > > > Key: HDFS-5533 > URL: https://issues.apache.org/jira/browse/HDFS-5533 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report
Binglin Chang created HDFS-5533: --- Summary: Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report Key: HDFS-5533 URL: https://issues.apache.org/jira/browse/HDFS-5533 Project: Hadoop HDFS Issue Type: Bug Environment: Currently the code treats symlink delete/create as MODIFY, but symlinks are immutable, so these should be CREATE and DELETE Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor -- This message was sent by Atlassian JIRA (v6.1#6144)
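The intended behavior from HDFS-5533 above can be sketched as a small classification function. This is an illustrative sketch, not the actual snapshot-diff code; the method and enum names are hypothetical.

```java
// Sketch of the rule described above: because a symlink is immutable, a
// symlink that "changed" between two snapshots must have been deleted and
// re-created, so the diff report should show DELETE + CREATE, not MODIFY.
public class SymlinkDiffSketch {
    public enum DiffType { CREATE, DELETE, MODIFY }

    // isSymlink: whether the changed inode is a symlink;
    // existedBefore/existsAfter: presence in the two snapshots compared.
    public static DiffType[] classify(boolean isSymlink,
                                      boolean existedBefore,
                                      boolean existsAfter) {
        if (!existedBefore) return new DiffType[] { DiffType.CREATE };
        if (!existsAfter)   return new DiffType[] { DiffType.DELETE };
        // Present in both snapshots but changed: for an immutable symlink
        // this can only mean a delete followed by a create.
        return isSymlink
            ? new DiffType[] { DiffType.DELETE, DiffType.CREATE }
            : new DiffType[] { DiffType.MODIFY };
    }
}
```

A regular file that changed in place would still be reported as a single MODIFY entry.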
[jira] [Updated] (HDFS-5484) StorageType and State in DatanodeStorageInfo in NameNode is not accurate
[ https://issues.apache.org/jira/browse/HDFS-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5484: Assignee: (was: Arpit Agarwal) > StorageType and State in DatanodeStorageInfo in NameNode is not accurate > > > Key: HDFS-5484 > URL: https://issues.apache.org/jira/browse/HDFS-5484 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Eric Sirianni > > The fields in DatanodeStorageInfo are updated from two distinct paths: > # block reports > # storage reports (via heartbeats) > The {{state}} and {{storageType}} fields are updated via the block report. > However, as seen in the code below, these fields are populated from a "dummy" > {{DatanodeStorage}} object constructed in the DataNode:
> {code}
> BPServiceActor.blockReport() {
>   //...
>   // Dummy DatanodeStorage object just for sending the block report.
>   DatanodeStorage dnStorage = new DatanodeStorage(storageID);
>   //...
> }
> {code}
> The net effect is that the {{state}} and {{storageType}} fields are always > the defaults of {{NORMAL}} and {{DISK}} in the NameNode. > The recommended fix is to change {{FsDatasetSpi.getBlockReports()}} from:
> {code}
> public Map<String, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> to:
> {code}
> public Map<DatanodeStorage, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> thereby allowing {{BPServiceActor}} to send the "real" {{DatanodeStorage}} > object with the block report. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HDFS-5484) StorageType and State in DatanodeStorageInfo in NameNode is not accurate
[ https://issues.apache.org/jira/browse/HDFS-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDFS-5484: --- Assignee: Arpit Agarwal > StorageType and State in DatanodeStorageInfo in NameNode is not accurate > > > Key: HDFS-5484 > URL: https://issues.apache.org/jira/browse/HDFS-5484 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Eric Sirianni >Assignee: Arpit Agarwal > > The fields in DatanodeStorageInfo are updated from two distinct paths: > # block reports > # storage reports (via heartbeats) > The {{state}} and {{storageType}} fields are updated via the block report. > However, as seen in the code below, these fields are populated from a "dummy" > {{DatanodeStorage}} object constructed in the DataNode:
> {code}
> BPServiceActor.blockReport() {
>   //...
>   // Dummy DatanodeStorage object just for sending the block report.
>   DatanodeStorage dnStorage = new DatanodeStorage(storageID);
>   //...
> }
> {code}
> The net effect is that the {{state}} and {{storageType}} fields are always > the defaults of {{NORMAL}} and {{DISK}} in the NameNode. > The recommended fix is to change {{FsDatasetSpi.getBlockReports()}} from:
> {code}
> public Map<String, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> to:
> {code}
> public Map<DatanodeStorage, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> thereby allowing {{BPServiceActor}} to send the "real" {{DatanodeStorage}} > object with the block report. -- This message was sent by Atlassian JIRA (v6.1#6144)
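The fix direction recommended in HDFS-5484 above can be sketched with simplified stand-in types: key the block report by the full storage object (which carries state and type) instead of a "dummy" object built from just the storage ID. None of the classes below are the real HDFS classes; they are hypothetical stand-ins for illustration.

```java
// Sketch, assuming simplified DatanodeStorage/State/StorageType stand-ins:
// returning Map<DatanodeStorage, ...> lets the NameNode see the real
// state and storageType rather than the NORMAL/DISK defaults.
import java.util.HashMap;
import java.util.Map;

public class BlockReportSketch {
    public enum State { NORMAL, FAILED }
    public enum StorageType { DISK, SSD }

    public static class DatanodeStorage {
        final String storageID;
        final State state;
        final StorageType storageType;
        DatanodeStorage(String id, State state, StorageType type) {
            this.storageID = id; this.state = state; this.storageType = type;
        }
    }

    // Instead of a map keyed by the storage ID string, the report is keyed
    // by the full DatanodeStorage object.
    public static Map<DatanodeStorage, long[]> getBlockReports(String bpid) {
        Map<DatanodeStorage, long[]> reports = new HashMap<>();
        reports.put(new DatanodeStorage("DS-1", State.NORMAL, StorageType.SSD),
                    new long[0] /* encoded block list would go here */);
        return reports;
    }

    public static StorageType firstReportedType() {
        return getBlockReports("bp-demo").keySet().iterator().next().storageType;
    }
}
```

With this shape, an SSD volume is reported as SSD instead of silently defaulting to DISK at the NameNode.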
[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827364#comment-13827364 ] Hadoop QA commented on HDFS-5014: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614777/HDFS-5014-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5497//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5497//console This message is automatically generated. 
> BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch > > > In one of our clusters, the following happened and caused an HDFS write to fail. > 1. The Standby NN was unstable and continuously restarting due to some errors, but > the Active NN was stable. > 2. An MR job was writing files. > 3. At some point the SNN went down again while the datanodes were processing the > REGISTER command for the SNN. > 4. Datanodes started retrying to connect to the SNN to register at the following > code in BPServiceActor#retrieveNamespaceInfo(), which is called under > synchronization.
> {code}
> try {
>   nsInfo = bpNamenode.versionRequest();
>   LOG.debug(this + " received versionRequest response: " + nsInfo);
>   break;
> {code}
> Unfortunately this happened in all datanodes at the same point. > 5. For the next 7-8 min the standby was down, no blocks were reported to the Active > NN during this period, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely > synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
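The locking problem described in HDFS-5014 above can be illustrated in miniature. This is a hedged sketch with hypothetical names, not the actual BPOfferService code: the point is that a blocking RPC to an unstable standby must not be made while holding the lock that the active NN's command processing also needs.

```java
// Sketch: problematic shape holds a shared lock across a blocking RPC;
// fixed shape performs the RPC lock-free and only synchronizes the short
// shared-state update. All names here are illustrative stand-ins.
public class CommandProcessingSketch {
    public interface NamenodeRpc { String versionRequest(); }

    private final Object lock = new Object();
    private volatile String lastNamespaceInfo;

    // Problematic: the RPC (which may block and retry for minutes against
    // an unstable standby) runs under the shared lock.
    public void processCommandSynchronized(NamenodeRpc nn) {
        synchronized (lock) {
            lastNamespaceInfo = nn.versionRequest(); // blocking call under lock
        }
    }

    // Fixed direction: do the RPC outside the lock, then take the lock only
    // for the cheap update, so commands for the other NN are not delayed.
    public void processCommandNarrow(NamenodeRpc nn) {
        String nsInfo = nn.versionRequest(); // outside the lock
        synchronized (lock) {
            lastNamespaceInfo = nsInfo;
        }
    }

    public String last() { return lastNamespaceInfo; }

    public static String demo() {
        CommandProcessingSketch s = new CommandProcessingSketch();
        s.processCommandNarrow(() -> "ns-1"); // stand-in RPC response
        return s.last();
    }
}
```

In the real fix the scope of synchronization would be chosen per command type, but the shape of the change is the same.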
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-2832: Attachment: h2832_20131119b.patch > Enable support for heterogeneous storages in HDFS > - > > Key: HDFS-2832 > URL: https://issues.apache.org/jira/browse/HDFS-2832 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.24.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, > editsStored, h2832_20131023.patch, h2832_20131023b.patch, > h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, > h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, > h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, > h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, > h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, > h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch > > > HDFS currently supports a configuration where storages are a list of > directories. Typically each of these directories corresponds to a volume with > its own file system. All these directories are homogeneous and are therefore > identified as a single storage at the namenode. I propose changing the > current model, where a Datanode *is a* storage, to one where a Datanode *is a > collection of* storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5527: Attachment: h5527.02.patch This is a DN bug caused by a failure to remove an earlier incremental block report entry (on a different storage) when adding a new entry. Attaching a patch with the fix. Will also submit a merge patch with this fix to Jenkins to see what it thinks. > Fix TestUnderReplicatedBlocks on branch HDFS-2832 > - > > Key: HDFS-5527 > URL: https://issues.apache.org/jira/browse/HDFS-5527 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5527.patch, h5527.02.patch > > > The failure seems like a deadlock, which is shown in: > https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/ -- This message was sent by Atlassian JIRA (v6.1#6144)
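The bug Arpit describes above (a stale incremental block report entry surviving on a different storage) can be sketched with a simplified pending-IBR structure. The classes and method names below are hypothetical stand-ins, not the actual DataNode internals.

```java
// Sketch of the fix direction: when queuing a new incremental block report
// (IBR) entry for a block, drop any earlier pending entry for the same
// block under a *different* storage, so only one storage reports it.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PendingIbrSketch {
    // pending IBR entries: storageID -> block IDs queued for that storage
    private final Map<String, Set<Long>> pending = new HashMap<>();

    public void addPending(String storageID, long blockId) {
        // The fix: remove the earlier entry queued under any other storage.
        for (Map.Entry<String, Set<Long>> e : pending.entrySet()) {
            if (!e.getKey().equals(storageID)) {
                e.getValue().remove(blockId);
            }
        }
        pending.computeIfAbsent(storageID, k -> new HashSet<>()).add(blockId);
    }

    public int storagesReporting(long blockId) {
        int n = 0;
        for (Set<Long> blocks : pending.values()) {
            if (blocks.contains(blockId)) n++;
        }
        return n;
    }

    public static int demo() {
        PendingIbrSketch ibr = new PendingIbrSketch();
        ibr.addPending("DS-1", 1001L);
        ibr.addPending("DS-2", 1001L); // re-queued on another storage
        return ibr.storagesReporting(1001L); // stale DS-1 entry is gone
    }
}
```

Without the removal loop, both storages would report block 1001 and the NameNode's replica accounting would be wrong, which matches the test failure described in the issue.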
[jira] [Commented] (HDFS-5451) add more debugging for cache rescan
[ https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827339#comment-13827339 ] Andrew Wang commented on HDFS-5451: --- Haven't checked the new patch yet, just replying to comments: bq. as per our discussion earlier, I'd rather not create more subclasses. Users should be able to get back a PBCD from listDirectives, modify one thing, and then use it in modifyDirective. Is the subclass that gross here? You could still use PBCD.Builder(subclass) to seed the PBCD for modification, and thus hide the new setters/getters. Just thinking about how we might clean up the API. bq. I don't want to do so much copying. This list could have arbitrary length. This can't go in DFSUtil since it depends on the configuration value for maximum blocks to print. Yeah, you're right on the copying; I didn't realize that. You could still pass the config param in to the function to put this in DFSUtil; there are other functions like this one there, which is why I suggested it. bq. Let's do that later. It's not really related to the other changes here and we want the stats in soon. Could you file the follow-on JIRA then? Thanks. > add more debugging for cache rescan > --- > > Key: HDFS-5451 > URL: https://issues.apache.org/jira/browse/HDFS-5451 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, > HDFS-5451.003.patch > > > It would be nice to have a message at DEBUG level that describes all the > decisions we made for cache entries. That way we could turn on this > debugging to get more information. We should also store the number of bytes > each PBCE wanted, the number of bytes it got, and the number of inodes it > got, and output those in {{listDirectives}}. -- This message was sent by Atlassian JIRA (v6.1#6144)
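The API idea discussed in the comment above (get an immutable directive back from listDirectives, seed a Builder with it, modify one field, pass the result to modifyDirective) is a standard copy-builder pattern. The sketch below is entirely hypothetical; the real PathBasedCacheDirective API may differ.

```java
// Copy-builder sketch: the directive stays immutable and exposes no
// setters; a Builder seeded from an existing instance lets a caller
// "modify one thing" and rebuild. All names are illustrative.
public class DirectiveBuilderSketch {
    public static final class Directive {
        final String path;
        final short replication;
        private Directive(String path, short replication) {
            this.path = path; this.replication = replication;
        }
        public static class Builder {
            private String path;
            private short replication;
            public Builder() {}
            // Seed from an existing directive, keeping it immutable.
            public Builder(Directive d) {
                this.path = d.path; this.replication = d.replication;
            }
            public Builder setPath(String p) { this.path = p; return this; }
            public Builder setReplication(short r) { this.replication = r; return this; }
            public Directive build() { return new Directive(path, replication); }
        }
    }

    public static short demo() {
        Directive listed = new Directive.Builder()
                .setPath("/cache/data").setReplication((short) 1).build();
        // change one field, keep the rest, as in listDirectives -> modifyDirective
        Directive modified = new Directive.Builder(listed)
                .setReplication((short) 3).build();
        return modified.replication;
    }
}
```

The design question in the thread is only where the extra fields live; the seeding mechanism itself is the same either way.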
[jira] [Updated] (HDFS-3215) Block size is logged as zero even though blockReceived command was received by DN
[ https://issues.apache.org/jira/browse/HDFS-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-3215: - Attachment: HDFS-3215.patch I changed it to pass the block size as an argument, based on the previous test result. > Block size is logged as zero even though blockReceived command was received by DN > > > Key: HDFS-3215 > URL: https://issues.apache.org/jira/browse/HDFS-3215 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Brahma Reddy Battula >Assignee: Shinichi Yamashita >Priority: Minor > Attachments: HDFS-3215.patch, HDFS-3215.patch > > > Scenario 1 > == > Start NN and DN. > write file. > Block size is logged as zero even though blockReceived command was received by DN > *NN log* > 2012-03-14 20:23:40,541 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.allocateBlock: /hadoop-create-user.sh._COPYING_. > BP-1166515020-10.18.40.24-1331736264353 > blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]} > 2012-03-14 20:24:26,357 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > addStoredBlock: blockMap updated: XXX:50010 is added to > blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]} > size 0 > *DN log* > 2012-03-14 20:24:17,519 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving block > BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002 src: > /XXX:53141 dest: /XXX:50010 > 2012-03-14 20:24:26,517 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: > /XXX:53141, dest: /XXX:50010, bytes: 512, op: HDFS_WRITE, cliID: > DFSClient_NONMAPREDUCE_1612873957_1, offset: 0, srvID: > DS-1639667928-XXX-50010-1331736284942, blockid: > BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, duration: > 1286482503 > 2012-03-14 20:24:26,517 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: > PacketResponder: > BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, > type=LAST_IN_PIPELINE, downstreams=0:[] terminating > 2012-03-14 20:24:31,533 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification > succeeded for BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002 -- This message was sent by Atlassian JIRA (v6.1#6144)
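The symptom in the logs above (the DN wrote 512 bytes, yet the NN's addStoredBlock line shows "size 0") and the patch comment ("changed it to pass the block size as an argument") suggest the log line used a not-yet-updated stored size. The sketch below is a hypothetical illustration of that fix direction, not the actual NameNode code.

```java
// Sketch: log the size carried by the blockReceived report, passed in as
// an argument, instead of the stale stored size. Names are illustrative.
public class BlockSizeLogSketch {
    public static class StoredBlock {
        long numBytes = 0; // not yet updated when the log line is emitted
    }

    // Buggy shape: logs the stale stored size (prints "size 0").
    public static String logAddStoredBlockOld(StoredBlock b) {
        return "addStoredBlock: blockMap updated, size " + b.numBytes;
    }

    // Fixed shape: the size reported by the DN is passed in explicitly.
    public static String logAddStoredBlockNew(StoredBlock b, long reportedBytes) {
        return "addStoredBlock: blockMap updated, size " + reportedBytes;
    }

    public static String demo() {
        StoredBlock b = new StoredBlock();
        // 512 matches the "bytes: 512" seen in the DN clienttrace log above
        return logAddStoredBlockNew(b, 512L);
    }
}
```

The old shape would have returned "... size 0" for the same block, reproducing the mismatch between the NN and DN logs.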
[jira] [Commented] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode
[ https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827299#comment-13827299 ] Vinay commented on HDFS-5531: - The patch looks pretty good, Nicholas. One small nit which I hope will fix the test failures (I verified locally):
{code}
+  public boolean equals(Object obj) {
+    if (obj == this) {
+      return true;
+    } else if (obj == null || !(obj instanceof EnumCounters)) {
+      return false;
+    }
+    final EnumCounters that = (EnumCounters)obj;
+    return this.enumConstants == that.enumConstants
+        && Arrays.equals(this.counters, that.counters);
+  }
{code}
Here {{return this.enumConstants == that.enumConstants}} will always return false, since {{Quota.values() == Quota.values()}} is always false. It should be replaced with {{Arrays.equals(this.enumConstants, that.enumConstants)}}. This will make the test pass. > Combine the getNsQuota() and getDsQuota() methods in INode > -- > > Key: HDFS-5531 > URL: https://issues.apache.org/jira/browse/HDFS-5531 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Attachments: h5531_20131119.patch > > > I suggest combining these two methods into > {code} > public Quota.Counts getQuotaCounts() > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
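Vinay's nit above rests on a well-defined Java behavior: `Enum.values()` returns a freshly cloned array on every call, so comparing two `values()` results with `==` is always false, while `Arrays.equals` compares contents and is true. A small self-contained demonstration (the `Quota` enum here is a stand-in, not the HDFS one):

```java
// Demonstrates why "this.enumConstants == that.enumConstants" fails when
// each side stored a separate Quota.values() result, and why
// Arrays.equals is the correct comparison.
import java.util.Arrays;

public class EnumValuesSketch {
    public enum Quota { NAMESPACE, DISKSPACE }

    public static boolean referenceEqual() {
        // values() clones the internal array on every call,
        // so this reference comparison is always false.
        return Quota.values() == Quota.values();
    }

    public static boolean contentEqual() {
        // element-wise comparison of the two cloned arrays: true
        return Arrays.equals(Quota.values(), Quota.values());
    }
}
```

This is exactly the difference between the buggy `==` in the quoted `equals()` and the suggested `Arrays.equals(this.enumConstants, that.enumConstants)`.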
[jira] [Updated] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5014: Attachment: HDFS-5014-v2.patch Attached the patch with the findbugs fix. The test failure seems unrelated; the same test passed locally. > BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch > > > In one of our clusters, the following happened and caused an HDFS write to fail. > 1. The Standby NN was unstable and continuously restarting due to some errors, but > the Active NN was stable. > 2. An MR job was writing files. > 3. At some point the SNN went down again while the datanodes were processing the > REGISTER command for the SNN. > 4. Datanodes started retrying to connect to the SNN to register at the following > code in BPServiceActor#retrieveNamespaceInfo(), which is called under > synchronization.
> {code}
> try {
>   nsInfo = bpNamenode.versionRequest();
>   LOG.debug(this + " received versionRequest response: " + nsInfo);
>   break;
> {code}
> Unfortunately this happened in all datanodes at the same point. > 5. For the next 7-8 min the standby was down, no blocks were reported to the Active > NN during this period, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely > synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5451) add more debugging for cache rescan
[ https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827271#comment-13827271 ] Hadoop QA commented on HDFS-5451: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614774/HDFS-5451.003.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5496//console This message is automatically generated. > add more debugging for cache rescan > --- > > Key: HDFS-5451 > URL: https://issues.apache.org/jira/browse/HDFS-5451 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, > HDFS-5451.003.patch > > > It would be nice to have message at DEBUG level that described all the > decisions we made for cache entries. That way we could turn on this > debugging to get more information. We should also store the number of bytes > each PBCE wanted, and the number of bytes it got, plus the number of inodes > it got, and output those in {{listDirectives}}. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode
[ https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827266#comment-13827266 ] Hadoop QA commented on HDFS-5531: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614737/h5531_20131119.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5495//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5495//console This message is automatically generated. 
> Combine the getNsQuota() and getDsQuota() methods in INode > -- > > Key: HDFS-5531 > URL: https://issues.apache.org/jira/browse/HDFS-5531 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Attachments: h5531_20131119.patch > > > I suggest combining these two methods into > {code} > public Quota.Counts getQuotaCounts() > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
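The proposal above replaces two scalar getters with one value object. A minimal sketch of what such a combined counts object could look like; the field and accessor names are assumptions based on the comment, not the actual patch:

```java
// Hypothetical sketch of a combined quota-counts value object: one immutable
// pair carrying both the namespace quota and the diskspace quota, returned
// from a single getQuotaCounts() instead of getNsQuota()/getDsQuota().
final class QuotaCounts {
    private final long nameSpace; // namespace quota: max number of names
    private final long diskSpace; // diskspace quota: max bytes

    QuotaCounts(long nameSpace, long diskSpace) {
        this.nameSpace = nameSpace;
        this.diskSpace = diskSpace;
    }

    long getNameSpace() { return nameSpace; }
    long getDiskSpace() { return diskSpace; }
}
```

One call then hands callers both values atomically, which also avoids reading the two quotas under separate lock acquisitions.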
[jira] [Created] (HDFS-5532) Enable the webhdfs by default to support new HDFS web UI
Vinay created HDFS-5532: --- Summary: Enable the webhdfs by default to support new HDFS web UI Key: HDFS-5532 URL: https://issues.apache.org/jira/browse/HDFS-5532 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinay Assignee: Vinay Recently, in HDFS-5444, the new HDFS web UI was made the default, but it needs WebHDFS to be enabled, and WebHDFS is disabled by default. Let's enable it by default to support the new, really cool web UI. -- This message was sent by Atlassian JIRA (v6.1#6144)
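Until such a default change lands, WebHDFS can be turned on per cluster in hdfs-site.xml. The key below is the one used in the 2.x line; verify it against your release's hdfs-default.xml before relying on it:

```xml
<!-- hdfs-site.xml: enable WebHDFS so the new web UI can query the NameNode -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
```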
[jira] [Updated] (HDFS-5451) add more debugging for cache rescan
[ https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5451: --- Attachment: HDFS-5451.003.patch > add more debugging for cache rescan > --- > > Key: HDFS-5451 > URL: https://issues.apache.org/jira/browse/HDFS-5451 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, > HDFS-5451.003.patch > > > It would be nice to have message at DEBUG level that described all the > decisions we made for cache entries. That way we could turn on this > debugging to get more information. We should also store the number of bytes > each PBCE wanted, and the number of bytes it got, plus the number of inodes > it got, and output those in {{listDirectives}}. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5451) add more debugging for cache rescan
[ https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827261#comment-13827261 ] Colin Patrick McCabe commented on HDFS-5451: bq. Can we hide these new PBCD Builder methods from users of DFS? They're meaningless for create/modify, ideally we only see them when doing listing. Seems well suited as a listing subclass? As per our discussion earlier, I'd rather not create more subclasses. Users should be able to get back a PBCD from listDirectives, modify one thing, and then use it in modifyDirective. bq. Extra newline at the end of the Builder ok bq. Seems like having #clear and #increment(long) methods would better suit the new PBCE byte methods. ok bq. I feel like this should be a min then, so the repl 1 PBCE doesn't get charged double Yeah, this was supposed to be min. Good call. bq. BPOS#blockIdArrayToString, could this go in DFSUtil instead? Seems like a better place for it. Also, you can use Guava's Joiner for doing this kind of task, and Arrays.asList and List.subList could get the max behavior. I don't want to do so much copying. This list could have arbitrary length. This can't go in DFSUtil since it depends on the configuration value for maximum blocks to print. bq. Should this extra logging maybe be at DEBUG? It could be a rather large message, even with the 1000 limit. I think seeing what was cached is useful. This will be our only window into what happened in production in a lot of cases. bq. CacheAdmin, the row extension for printStatus is kinda ugly. Maybe use an ArrayList so we can cleanly append? It's kind of annoying, but {{ArrayList}} doesn't actually give you access to the backing array. Anyway, this array is tiny. Let's just use a List and then call toArray at the end. bq. Good catch on right justifying numeric columns. Mind doing the same for the ID field? ok bq. 
Tests need to be rebased on trunk, but there's an above comment on the JIRA about verifying uncache/cache races. Let's do that later. It's not really related to the other changes here and we want the stats in soon. bq. Can we also get a test for the code snippet above, where we have multiple things caching the same file with different repls? I added a new test for this area. bq. Test verifying stats for a cached directory as files are added to it? the new test covers this, I think > add more debugging for cache rescan > --- > > Key: HDFS-5451 > URL: https://issues.apache.org/jira/browse/HDFS-5451 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, > HDFS-5451.003.patch > > > It would be nice to have message at DEBUG level that described all the > decisions we made for cache entries. That way we could turn on this > debugging to get more information. We should also store the number of bytes > each PBCE wanted, and the number of bytes it got, plus the number of inodes > it got, and output those in {{listDirectives}}. -- This message was sent by Atlassian JIRA (v6.1#6144)
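The copy-avoidance point in the review exchange above (don't wrap an arbitrarily long block-ID array in Arrays.asList/subList just to join a bounded prefix) can be sketched with a direct array walk. The class and method names here are illustrative, not the actual BPOS helper:

```java
// Sketch: join at most `maxToPrint` block IDs into a log string without
// copying the array or allocating a wrapper list, even when it is huge.
final class BlockIdFormatter {
    static String join(long[] blockIds, int maxToPrint) {
        int limit = Math.min(blockIds.length, maxToPrint);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < limit; i++) {
            if (i > 0) sb.append(", ");
            sb.append(blockIds[i]);
        }
        if (blockIds.length > limit) {
            // Record how many IDs were elided instead of flooding the log.
            sb.append(", ... (").append(blockIds.length - limit).append(" more)");
        }
        return sb.toString();
    }
}
```

The `maxToPrint` parameter plays the role of the configuration value mentioned in the discussion (the reason the helper could not live in DFSUtil).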
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827256#comment-13827256 ] Vinay commented on HDFS-5526: - Yes, you are right, Nicholas. We don't have any clue whether we have upgraded or not. One thing: the VERSION file will be overwritten if the layoutVersion or ctime is newer, but that namenode also needs to be part of the same cluster, otherwise the upgrade will not happen. One more thing: ./current/VERSION will be overwritten every time the DN is restarted after an upgrade, because of the following check. {code}// do upgrade if (this.layoutVersion > HdfsConstants.LAYOUT_VERSION || this.cTime < nsInfo.getCTime()) { doUpgrade(sd, nsInfo); // upgrade return; }{code} This is because the ./current/VERSION file ctime is always the same, and the upgraded NN will have a higher ctime. > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5526.patch > > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
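The check quoted in the comment above boils down to a small predicate. A sketch of it in isolation, with illustrative parameter names; note that HDFS layout versions are negative and decrease over time, so a numerically *greater* stored value means an older on-disk layout:

```java
// Sketch of the upgrade-decision predicate from the quoted snippet:
// upgrade when the stored layout is older than the software's, or when the
// namespace ctime has moved past the stored ctime.
final class UpgradeCheck {
    static boolean needsUpgrade(int storedLayoutVersion, int softwareLayoutVersion,
                                long storedCTime, long namespaceCTime) {
        return storedLayoutVersion > softwareLayoutVersion
            || storedCTime < namespaceCTime;
    }
}
```

Plugging in the versions from this issue, a -47 storage under -48 software always satisfies the first clause, and after an upgrade the ctime clause keeps firing on every restart, which is the repeated-overwrite behavior Vinay describes.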
[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827255#comment-13827255 ] Junping Du commented on HDFS-5527: -- Yes, I also see this in my local test failure; I suspect it could be a race condition in the block queue of UnderReplicatedBlocks. Still investigating. > Fix TestUnderReplicatedBlocks on branch HDFS-2832 > - > > Key: HDFS-5527 > URL: https://issues.apache.org/jira/browse/HDFS-5527 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5527.patch > > > The failure seems like a deadlock, which is shown in: > https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827242#comment-13827242 ] Tsz Wo (Nicholas), SZE commented on HDFS-5526: -- > But during upgrade only clusterId and layoutVersion are overwritten, ctime is > never modified. clusterId and layoutVersion are never going to change > dynamically. right? You are right but we also need to consider some error cases such as connecting a DN to a wrong cluster, moving the storage to another DN, rolling back to a wrong version, upgrading again without rollback, etc. We need to make sure all the error cases will fail. I think it is the hard part. > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5526.patch > > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827239#comment-13827239 ] Hadoop QA commented on HDFS-5014: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614728/HDFS-5014-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5494//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5494//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5494//console This message is automatically generated. 
> BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch > > > In one of our clusters, the following happened, which failed an HDFS write. > 1. The Standby NN was unstable and continuously restarting due to some errors, but the Active NN was stable. > 2. An MR job was writing files. > 3. At some point the SNN went down again while the datanode was processing the REGISTER command for the SNN. > 4. Datanodes started retrying to connect to the SNN to register at the following code in BPServiceActor#retrieveNamespaceInfo(), which is called under synchronization. > {code} try { > nsInfo = bpNamenode.versionRequest(); > LOG.debug(this + " received versionRequest response: " + nsInfo); > break;{code} > Unfortunately, this happened in all datanodes at the same point. > 5. For the next 7-8 min the standby was down, no blocks were reported to the active NN, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5498) Improve datanode startup time
[ https://issues.apache.org/jira/browse/HDFS-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827236#comment-13827236 ] Hadoop QA commented on HDFS-5498: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614733/HDFS-5498.with_du_change.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.TestDFSUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5493//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5493//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5493//console This message is automatically generated. 
> Improve datanode startup time > - > > Key: HDFS-5498 > URL: https://issues.apache.org/jira/browse/HDFS-5498 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-5498.with_du_change.patch > > > Similarly to HDFS-5027, an improvement can be made for getVolumeMap(). This is > the phase in which the ReplicaMap is populated. But it will be even better if > the datanode scans only once and does both. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827227#comment-13827227 ] Vinay commented on HDFS-5526: - bq. For example, the ctime or some ids may have been changed in some unexpected way without being noticed Overwriting the version file in datanode current directory is only during format and upgrade. But during upgrade only clusterId and layoutVersion are overwritten, ctime is never modified. clusterId and layoutVersion are never going to change dynamically. right? {noformat}if (LayoutVersion.supports(Feature.FEDERATION, layoutVersion)) { clusterID = nsInfo.getClusterID(); layoutVersion = nsInfo.getLayoutVersion(); writeProperties(sd); return; }{noformat} Hi [~kihwal], Patch looks really simple. +1 do you think we need to update this comment now..? {code} * Do nothing, if previous directory does not exist.{code} > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5526.patch > > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827222#comment-13827222 ] Uma Maheswara Rao G commented on HDFS-5014: --- I think it can happen here (I guess). {code} DatanodeCommand cmd = blockReport(); processCommand(new DatanodeCommand[]{ cmd }); {code} We could check for null here itself, but that's OK, I think. Let me check the latest patch. > BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch > > > In one of our clusters, the following happened, which failed an HDFS write. > 1. The Standby NN was unstable and continuously restarting due to some errors, but the Active NN was stable. > 2. An MR job was writing files. > 3. At some point the SNN went down again while the datanode was processing the REGISTER command for the SNN. > 4. Datanodes started retrying to connect to the SNN to register at the following code in BPServiceActor#retrieveNamespaceInfo(), which is called under synchronization. > {code} try { > nsInfo = bpNamenode.versionRequest(); > LOG.debug(this + " received versionRequest response: " + nsInfo); > break;{code} > Unfortunately, this happened in all datanodes at the same point. > 5. For the next 7-8 min the standby was down, no blocks were reported to the active NN, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
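The null case Uma points at above (blockReport() returning null, which is then wrapped in a one-element array and dispatched) can be guarded with a cheap filter in the dispatch loop. A sketch with illustrative names, not the actual BPServiceActor code:

```java
// Sketch of the null-guard being discussed: drop null commands before
// dispatching instead of letting them reach the per-command handler.
final class CommandGuard {
    static int dispatch(Object[] cmds) {
        int handled = 0;
        if (cmds == null) {
            return handled;          // nothing to do at all
        }
        for (Object cmd : cmds) {
            if (cmd == null) {
                continue;            // e.g. new DatanodeCommand[]{ null }
            }
            handled++;               // stand-in for processCommand(cmd)
        }
        return handled;
    }
}
```

Checking inside the loop (rather than at each call site) covers both the null-array and null-element cases with one guard.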
[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS
[ https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827210#comment-13827210 ] Haohui Mai commented on HDFS-3987: -- I tested the patch by running distcp to write to swebhdfs. I ran the test in both secure and insecure clusters. Both setups worked. I cleaned up the warning in HttpServer in order to work around a Jenkins bug. The original patch only touches hadoop-auth and hadoop-hdfs, in which case Jenkins won't build hadoop-common. Therefore, the test TestHdfsNativeCodeLoader cannot find libhadoop.so built in hadoop-common, causing the test to fail. Touching a file in hadoop-common forces the build and works around the problem. > Support webhdfs over HTTPS > -- > > Key: HDFS-3987 > URL: https://issues.apache.org/jira/browse/HDFS-3987 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Haohui Mai > Fix For: 2.3.0 > > Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, > HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, > HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, > HDFS-3987.008.patch, HDFS-3987.009.patch > > > This is a follow-up of HDFS-3983. > We should have a new filesystem client impl/binding for encrypted WebHDFS, > i.e. *webhdfss://* > On the server side, webhdfs and httpfs should only need to start the > service on a secured (HTTPS) endpoint. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5511) improve CacheManipulator interface to allow better unit testing
[ https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827191#comment-13827191 ] Hudson commented on HDFS-5511: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4764 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4764/]) HDFS-5511. improve CacheManipulator interface to allow better unit testing (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543676) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/ReadaheadPool.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeConfig.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestPathBasedCacheRequests.java * 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java > improve CacheManipulator interface to allow better unit testing > --- > > Key: HDFS-5511 > URL: https://issues.apache.org/jira/browse/HDFS-5511 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch > > > The CacheManipulator interface has been helpful in allowing us to stub out > {{mlock}} in cases where we don't want to test it. We should move the > {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this > interface as well so that we don't have to skip these tests on machines where > these methods would ordinarily not work for us. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS
[ https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827188#comment-13827188 ] Jing Zhao commented on HDFS-3987: - The new patch looks good to me. Still, please mention how you did system tests to verify the patch, and also why you needed to modify HttpServer.java. +1 for the latest patch. [~tucu00], do you still have further comments? I will commit the patch tomorrow if there are no more comments. > Support webhdfs over HTTPS > -- > > Key: HDFS-3987 > URL: https://issues.apache.org/jira/browse/HDFS-3987 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Haohui Mai > Fix For: 2.3.0 > > Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, > HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, > HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, > HDFS-3987.008.patch, HDFS-3987.009.patch > > > This is a follow-up of HDFS-3983. > We should have a new filesystem client impl/binding for encrypted WebHDFS, > i.e. *webhdfss://* > On the server side, webhdfs and httpfs should only need to start the > service on a secured (HTTPS) endpoint. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5511) improve CacheManipulator interface to allow better unit testing
[ https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5511: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > improve CacheManipulator interface to allow better unit testing > --- > > Key: HDFS-5511 > URL: https://issues.apache.org/jira/browse/HDFS-5511 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch > > > The CacheManipulator interface has been helpful in allowing us to stub out > {{mlock}} in cases where we don't want to test it. We should move the > {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this > interface as well so that we don't have to skip these tests on machines where > these methods would ordinarily not work for us. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path
[ https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827182#comment-13827182 ] Hudson commented on HDFS-5513: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4763 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4763/]) HDFS-5513. CacheAdmin commands fail when using . as the path. Contributed by Andrew Wang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543670) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/PathBasedCacheDirective.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestPathBasedCacheRequests.java > CacheAdmin commands fail when using . as the path > - > > Key: HDFS-5513 > URL: https://issues.apache.org/jira/browse/HDFS-5513 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, tools >Affects Versions: 3.0.0 >Reporter: Stephen Chu >Assignee: Andrew Wang >Priority: Minor > Fix For: 3.0.0 > > Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch > > > The hdfs CLI commands generally accept "." as a path argument. > e.g. > {code} > hdfs dfs -rm . > hdfs dfsadmin -allowSnapshot . > {code} > I don't think it's very common to use the path "." but the CacheAdmin > commands will fail saying that it cannot create a Path from an empty string. > {code} > [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path . 
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create > a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639) > at > org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365) > at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82) > at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87) > [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu > Exception in thread "main" java.lang.IllegalArgumentException: Can not create > a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598) > at > org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180) > at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82) > at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5513) CacheAdmin commands fail when using . as the path
[ https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-5513:
------------------------------
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks again Colin for reviews.

> CacheAdmin commands fail when using . as the path
> -------------------------------------------------
>
>                 Key: HDFS-5513
>                 URL: https://issues.apache.org/jira/browse/HDFS-5513
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching, tools
>    Affects Versions: 3.0.0
>            Reporter: Stephen Chu
>            Assignee: Andrew Wang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch
>
>
> The hdfs CLI commands generally accept "." as a path argument, e.g.
> {code}
> hdfs dfs -rm .
> hdfs dfsadmin -allowSnapshot .
> {code}
> I don't think it's very common to use the path ".", but the CacheAdmin
> commands will fail, saying that they cannot create a Path from an empty string.
> {code}
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path .
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
>     at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>     at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>     at org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
>     at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>     at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>     at org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>     at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> {code}

-- This message was sent by Atlassian JIRA (v6.1#6144)
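The failure above boils down to the Path constructor's empty-string check firing on a relative path. As a rough illustration of the remedy — not the actual Hadoop patch; `qualify` is a hypothetical helper — resolving "." against the working directory before the check avoids the IllegalArgumentException:

```java
// Hypothetical sketch (not the committed Hadoop code): qualify "." against
// the working directory before it reaches the empty/null path check.
public class PathArgCheck {

    // Mimics Path.checkPathArg: reject null or empty path strings.
    public static String checkPathArg(String path) {
        if (path == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        if (path.length() == 0) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
        return path;
    }

    // Hypothetical helper: resolve a relative path against the working directory.
    public static String qualify(String workingDir, String path) {
        if (path.equals(".")) {
            return workingDir;
        }
        if (!path.startsWith("/")) {
            return workingDir.endsWith("/") ? workingDir + path : workingDir + "/" + path;
        }
        return path;
    }

    public static void main(String[] args) {
        // "." now resolves to the working directory instead of an empty string
        System.out.println(checkPathArg(qualify("/user/schu", ".")));   // /user/schu
        System.out.println(checkPathArg(qualify("/user/schu", "dir"))); // /user/schu/dir
    }
}
```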
[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path
[ https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827176#comment-13827176 ] Andrew Wang commented on HDFS-5513: --- With Jenkins clean, will commit this shortly based on Colin's earlier +1. Thanks again for the review. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path
[ https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827168#comment-13827168 ] Hadoop QA commented on HDFS-5513: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614695/hdfs-5513-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5492//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5492//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827163#comment-13827163 ] Arpit Agarwal commented on HDFS-5527: - Looks like an NN bug. From the [Jenkins logs|https://builds.apache.org/job/PreCommit-HDFS-Build/5488//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/]. {code} 2013-11-19 19:09:03,406 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1366)) - BLOCK* ask 127.0.0.1:59892 to replicate blk_1073741825_1001 to datanode(s) 127.0.0.1:38708 127.0.0.1:50461 {code} And then again: {code} 2013-11-19 19:09:06,407 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1366)) - BLOCK* ask 127.0.0.1:50461 to replicate blk_1073741825_1001 to datanode(s) 127.0.0.1:38708 {code} > Fix TestUnderReplicatedBlocks on branch HDFS-2832 > - > > Key: HDFS-5527 > URL: https://issues.apache.org/jira/browse/HDFS-5527 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5527.patch > > > The failure seems like a deadlock, which is shown in: > https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827156#comment-13827156 ] Tsz Wo (Nicholas), SZE commented on HDFS-5526: -- I like the simplicity of the patch -- it only changes the rollback code but not upgrade. However, the VERSION file is overwritten but not restored during rollback. I worry if it is possible that the new VERSION file is different from the original VERSION file. For example, the ctime or some ids may have been changed in some unexpected way without being noticed. How can we make sure the new and the original VERSION files are the same? > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5526.patch > > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
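Nicholas's concern above — that the rewritten VERSION file might silently diverge from the original — could be checked mechanically. A minimal sketch under stated assumptions (the helper and field list are hypothetical, not part of the attached patch) comparing the fields of two VERSION property sets:

```java
import java.util.Objects;
import java.util.Properties;

// Hypothetical check (not part of the HDFS-5526 patch): compare the fields of
// the original and rewritten VERSION files to catch silent divergence such as
// a changed cTime or storageID.
public class VersionCheck {

    static final String[] KEYS = {"storageID", "clusterID", "cTime", "layoutVersion"};

    public static boolean sameVersion(Properties before, Properties after) {
        for (String key : KEYS) {
            // Objects.equals treats two missing keys (null, null) as equal
            if (!Objects.equals(before.getProperty(key), after.getProperty(key))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Properties before = new Properties();
        Properties after = new Properties();
        before.setProperty("cTime", "0");
        after.setProperty("cTime", "0");
        before.setProperty("layoutVersion", "-47");
        after.setProperty("layoutVersion", "-48"); // changed by the upgrade
        System.out.println(sameVersion(before, after)); // false
    }
}
```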
[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827149#comment-13827149 ] Junping Du commented on HDFS-5527: -- You mean reproduce the failure? Yes. It fails intermittently, so my understanding is that the attached patch can fix it. I can put the failure log here if you think it is helpful. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode
[ https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5531: - Status: Patch Available (was: Open) > Combine the getNsQuota() and getDsQuota() methods in INode > -- > > Key: HDFS-5531 > URL: https://issues.apache.org/jira/browse/HDFS-5531 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Attachments: h5531_20131119.patch > > > I suggest to combine these two methods into > {code} > public Quota.Counts getQuotaCounts() > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode
[ https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5531: - Attachment: h5531_20131119.patch h5531_20131119.patch: 1st patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode
Tsz Wo (Nicholas), SZE created HDFS-5531: Summary: Combine the getNsQuota() and getDsQuota() methods in INode Key: HDFS-5531 URL: https://issues.apache.org/jira/browse/HDFS-5531 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5531_20131119.patch I suggest to combine these two methods into {code} public Quota.Counts getQuotaCounts() {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
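The proposal is to fold the two per-dimension getters into one value object. As a hedged sketch of what a combined accessor might look like — the class shape and names below are illustrative guesses, not the attached patch:

```java
// Illustrative sketch of combining getNsQuota()/getDsQuota() into a single
// Quota.Counts-style value object (names are hypothetical).
public class QuotaCounts {

    public enum Quota { NAMESPACE, DISKSPACE }

    private final long[] counts = new long[Quota.values().length];

    public QuotaCounts(long nsQuota, long dsQuota) {
        counts[Quota.NAMESPACE.ordinal()] = nsQuota;
        counts[Quota.DISKSPACE.ordinal()] = dsQuota;
    }

    // One accessor replaces the pair of getters.
    public long get(Quota which) {
        return counts[which.ordinal()];
    }

    public static void main(String[] args) {
        QuotaCounts q = new QuotaCounts(100, 1L << 30); // 100 names, 1 GB of space
        System.out.println(q.get(Quota.NAMESPACE)); // 100
        System.out.println(q.get(Quota.DISKSPACE)); // 1073741824
    }
}
```

One advantage of the enum-indexed form is that a new quota dimension (say, per-storage-type) only adds an enum constant rather than another getter pair.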
[jira] [Updated] (HDFS-5498) Improve datanode startup time
[ https://issues.apache.org/jira/browse/HDFS-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5498: - Attachment: HDFS-5498.with_du_change.patch Attaching a patch that includes HADOOP-10111. This is not a commit candidate, but for reviews and testing. > Improve datanode startup time > - > > Key: HDFS-5498 > URL: https://issues.apache.org/jira/browse/HDFS-5498 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kihwal Lee > Attachments: HDFS-5498.with_du_change.patch > > > Similarly to HDFS-5027, an improvement can be made for getVolumeMap(). This is > the phase in which ReplicaMap is populated. But it will be even better if > the datanode scans only once and does both. -- This message was sent by Atlassian JIRA (v6.1#6144)
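The "scan only once and do both" idea in the description can be pictured as a single pass over each volume's block files that fills both the block-report data and the replica map. A toy sketch with hypothetical types (the real DataNode code is considerably more involved):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: one directory walk fills both structures, instead of
// one scan for the block report and a second one for getVolumeMap().
public class SingleScanDemo {

    public static void scanOnce(List<String> blockFiles,
                                Set<String> blockReport,
                                Map<String, Long> replicaMap) {
        for (String file : blockFiles) {
            blockReport.add(file);     // entry for the initial block report
            replicaMap.put(file, 0L);  // replica-map entry (length elided here)
        }
    }

    public static void main(String[] args) {
        Set<String> report = new HashSet<>();
        Map<String, Long> replicas = new HashMap<>();
        scanOnce(Arrays.asList("blk_1073741825", "blk_1073741826"), report, replicas);
        System.out.println(report.size() + " " + replicas.size()); // 2 2
    }
}
```

The win is proportional to volume size: disk traversal dominates startup, so halving the number of full scans roughly halves that phase.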
[jira] [Updated] (HDFS-5498) Improve datanode startup time
[ https://issues.apache.org/jira/browse/HDFS-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5498: - Assignee: Kihwal Lee Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5526: - Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5473) Consistent naming of user-visible caching classes and methods
[ https://issues.apache.org/jira/browse/HDFS-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827088#comment-13827088 ] Andrew Wang commented on HDFS-5473: --- I'll also note that we discussed adding the cache commands to the public HdfsAdmin class for easier accessibility. Would be a nice change to get in here too, or in a follow-on. > Consistent naming of user-visible caching classes and methods > - > > Key: HDFS-5473 > URL: https://issues.apache.org/jira/browse/HDFS-5473 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Colin Patrick McCabe > > It's kind of warty that (after HDFS-5326 goes in) DistributedFileSystem has > {{*CachePool}} methods that take a {{CachePoolInfo}} and > {{*PathBasedCacheDirective}} methods that take a > {{PathBasedCacheDirective}}. We should consider renaming {{CachePoolInfo}} to > {{CachePool}} for consistency. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-1386) TestJMXGet fails in jdk7
[ https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827086#comment-13827086 ] Hudson commented on HDFS-1386: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4762 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4762/]) HDFS-1386. TestJMXGet fails in jdk7 (jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543612) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestJMXGet.java > TestJMXGet fails in jdk7 > > > Key: HDFS-1386 > URL: https://issues.apache.org/jira/browse/HDFS-1386 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode, test >Affects Versions: 0.22.0 >Reporter: Tanping Wang >Assignee: Jonathan Eagles >Priority: Blocker > Labels: java7 > Attachments: HDFS-1386.patch, HDFS-1386.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever
[ https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827089#comment-13827089 ] Vinay commented on HDFS-4516: - Thanks Uma and Nicholas > Client crash after block allocation and NN switch before lease recovery for > the same file can cause readers to fail forever > --- > > Key: HDFS-4516 > URL: https://issues.apache.org/jira/browse/HDFS-4516 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Uma Maheswara Rao G >Assignee: Vinay >Priority: Critical > Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, > HDFS-4516.patch, HDFS-4516.txt > > > If client crashes just after allocating block( blocks not yet created in DNs) > and NN also switched after this, then new Namenode will not know about locs. > Further details will be in comment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5014: Attachment: HDFS-5014-v2.patch Thanks Uma for finding out the failure reason. It's strange that cmd is null; need to check. Here is the updated patch, which checks cmd for null before using it. > BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch > > > In one of our clusters, the following happened, which failed an HDFS write. > 1. The Standby NN was unstable and continuously restarting due to some errors, but the Active NN was stable. > 2. An MR job was writing files. > 3. At some point the SNN went down again while the datanode was processing the REGISTER command for the SNN. > 4. Datanodes started retrying to connect to the SNN to register, at the following code in BPServiceActor#retrieveNamespaceInfo(), which is called under synchronization. > {code} try { > nsInfo = bpNamenode.versionRequest(); > LOG.debug(this + " received versionRequest response: " + nsInfo); > break;{code} > Unfortunately this happened in all datanodes at the same point. > 5. For the next 7-8 min the standby was down, no blocks were reported to the active NN during this time, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
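The two ideas in this thread — null-check the command before use, and keep slow per-namenode work outside the lock guarding shared state — can be sketched as follows. This is a hypothetical class, not the actual BPOfferService code:

```java
// Hedged sketch: one actor's slow RPC must not block the other actor's
// incremental block reports, so the shared lock covers only bookkeeping.
public class ActorDemo {

    private final Object bookkeepingLock = new Object();

    public boolean processCommand(String cmd) {
        if (cmd == null) {
            return true; // nothing to do; the v2 patch adds a null check like this
        }
        // Slow per-namenode work (e.g. an RPC such as versionRequest()) would
        // happen here, outside any lock shared with the other actor.
        synchronized (bookkeepingLock) {
            // Only updates to shared state hold the lock.
            return !cmd.isEmpty();
        }
    }

    public static void main(String[] args) {
        ActorDemo demo = new ActorDemo();
        System.out.println(demo.processCommand(null));       // true
        System.out.println(demo.processCommand("REGISTER")); // true
    }
}
```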
[jira] [Assigned] (HDFS-5473) Consistent naming of user-visible caching classes and methods
[ https://issues.apache.org/jira/browse/HDFS-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reassigned HDFS-5473: - Assignee: Colin Patrick McCabe (was: Andrew Wang) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5473) Consistent naming of user-visible caching classes and methods
[ https://issues.apache.org/jira/browse/HDFS-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827079#comment-13827079 ] Andrew Wang commented on HDFS-5473: --- +1 for the proposal from me, thanks Colin. I like the Eclipse refactoring tools a lot for this, but sed probably works too. We're going to have a bunch of javadoc/variable names to update too, but I figure we can just do that as best we can and fix as we see it in future JIRAs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS
[ https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827077#comment-13827077 ] Hadoop QA commented on HDFS-3987: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614672/HDFS-3987.009.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-auth hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5490//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5490//console This message is automatically generated. 
> Support webhdfs over HTTPS > -- > > Key: HDFS-3987 > URL: https://issues.apache.org/jira/browse/HDFS-3987 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Haohui Mai > Fix For: 2.3.0 > > Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, > HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, > HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, > HDFS-3987.008.patch, HDFS-3987.009.patch > > > This is a follow up of HDFS-3983. > We should have a new filesystem client impl/binding for encrypted WebHDFS, > i.e. *webhdfss://* > On the server side, webhdfs and httpfs we should only need to start the > service on a secured (HTTPS) endpoint. -- This message was sent by Atlassian JIRA (v6.1#6144)
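The client-side binding described above amounts to selecting the transport from the URI scheme. A minimal illustration, assuming the *webhdfss://* scheme name from the description (the mapping function itself is hypothetical, not the patch's API):

```java
// Hedged sketch: an encrypted WebHDFS binding differs from the plain one
// mainly in transport, so the filesystem scheme can select TLS.
public class SchemeDemo {

    public static boolean useTls(String scheme) {
        if ("webhdfs".equals(scheme)) {
            return false; // plain HTTP endpoint
        }
        if ("webhdfss".equals(scheme)) {
            return true;  // HTTPS endpoint, scheme name taken from the proposal
        }
        throw new IllegalArgumentException("unknown scheme: " + scheme);
    }

    public static void main(String[] args) {
        System.out.println(useTls("webhdfs"));  // false
        System.out.println(useTls("webhdfss")); // true
    }
}
```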
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827072#comment-13827072 ] Hadoop QA commented on HDFS-5526: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614686/HDFS-5526.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5491//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5491//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5451) add more debugging for cache rescan
[ https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827070#comment-13827070 ]

Andrew Wang commented on HDFS-5451:
-----------------------------------

Thanks Colin, this is going to be really useful for end users. Some review comments:

* Can we hide these new PBCD Builder methods from users of DFS? They're meaningless for create/modify; ideally we only see them when doing listing. Seems well suited as a listing subclass?
* Extra newline at the end of the Builder.
* Seems like having {{#clear}} and {{#increment(long)}} methods would better suit the new PBCE byte methods.
{code}
List cachedOn = ocblock.getDatanodes(Type.CACHED);
long cachedByBlock = Math.max(cachedOn.size(), pce.getReplication()) * blockInfo.getNumBytes();
cachedTotal += cachedByBlock;
{code}
I'm guessing the max here is for when we have e.g. PBCEs with repl 1 and 2 caching the same block? I feel like this should be a min then, so the repl 1 PBCE doesn't get charged double. Comment would be nice.
* {{BPOS#blockIdArrayToString}}: could this go in DFSUtil instead? Seems like a better place for it. Also, you can use Guava's Joiner for doing this kind of task, and Arrays.asList and List.subList could get the max behavior.
* Should this extra logging maybe be at DEBUG? It could be a rather large message, even with the 1000 limit.
* CacheAdmin: the row extension for printStatus is kinda ugly. Maybe use an ArrayList so we can cleanly append?
* Good catch on right-justifying numeric columns. Mind doing the same for the ID field?
* Tests need to be rebased on trunk, but there's an earlier comment on the JIRA about verifying uncache/cache races.
* Can we also get a test for the code snippet above, where we have multiple things caching the same file with different repls?
* Test verifying stats for a cached directory as files are added to it?
> add more debugging for cache rescan > --- > > Key: HDFS-5451 > URL: https://issues.apache.org/jira/browse/HDFS-5451 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch > > > It would be nice to have message at DEBUG level that described all the > decisions we made for cache entries. That way we could turn on this > debugging to get more information. We should also store the number of bytes > each PBCE wanted, and the number of bytes it got, plus the number of inodes > it got, and output those in {{listDirectives}}. -- This message was sent by Atlassian JIRA (v6.1#6144)
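The max-vs-min question in the review above can be made concrete. A small sketch (hypothetical method, not the patch itself) showing how `Math.max` overcharges a replication-1 directive when the block is already cached on two datanodes, while `Math.min` does not:

```java
// Sketch of the review point: with directives of different replications
// caching the same block, max() bills the repl-1 directive for both cached
// copies; min() caps the charge at what the directive asked for.
public class CacheCharge {

    public static long charge(int cachedOn, int replication, long blockBytes, boolean useMin) {
        int factor = useMin ? Math.min(cachedOn, replication)
                            : Math.max(cachedOn, replication);
        return (long) factor * blockBytes;
    }

    public static void main(String[] args) {
        long blockBytes = 128L * 1024 * 1024; // one 128 MB block
        // block cached on 2 datanodes, directive asks for replication 1
        System.out.println(charge(2, 1, blockBytes, false)); // max: 268435456
        System.out.println(charge(2, 1, blockBytes, true));  // min: 134217728
    }
}
```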
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827055#comment-13827055 ] Kihwal Lee commented on HDFS-5526: -- bq. Would storageID and cTime be preserved? I think so. The slight difficulty is in loading current/VERSION without blowing up. After reading it in, it needs to override a couple of fields and call writeProperties(). bq. BTW, do you know why cTime=0 in my test case above? DataStorage's cTime is set to 0 when the node is formatted, but that of BlockPoolSliceStorage is supposed to be set to the one from nsInfo. So my guess is that when NNStorage is formatted, cTime is 0. NNStorage.newNamespaceInfo() is setting it to 0, and this must be used for formatting. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-1386) TestJMXGet fails in jdk7
[ https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827053#comment-13827053 ] Jonathan Eagles commented on HDFS-1386: --- HDFS-5530 and YARN-1426 were filed. Thanks again, Kihwal. > TestJMXGet fails in jdk7 > > > Key: HDFS-1386 > URL: https://issues.apache.org/jira/browse/HDFS-1386 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode, test >Affects Versions: 0.22.0 >Reporter: Tanping Wang >Assignee: Jonathan Eagles >Priority: Blocker > Labels: java7 > Attachments: HDFS-1386.patch, HDFS-1386.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5530) HDFS Components are unable to unregister from DefaultMetricsSystem
Jonathan Eagles created HDFS-5530: - Summary: HDFS Components are unable to unregister from DefaultMetricsSystem Key: HDFS-5530 URL: https://issues.apache.org/jira/browse/HDFS-5530 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-1386) TestJMXGet fails in jdk7
[ https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated HDFS-1386: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks Kihwal for the review. I will file the two JIRAs and post them here. > TestJMXGet fails in jdk7 > > > Key: HDFS-1386 > URL: https://issues.apache.org/jira/browse/HDFS-1386 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode, test >Affects Versions: 0.22.0 >Reporter: Tanping Wang >Assignee: Jonathan Eagles >Priority: Blocker > Labels: java7 > Attachments: HDFS-1386.patch, HDFS-1386.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-1386) TestJMXGet fails in jdk7
[ https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated HDFS-1386: -- Target Version/s: 3.0.0, 2.3.0 (was: 3.0.0, 2.3.0, 0.23.10) > TestJMXGet fails in jdk7 > > > Key: HDFS-1386 > URL: https://issues.apache.org/jira/browse/HDFS-1386 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode, test >Affects Versions: 0.22.0 >Reporter: Tanping Wang >Assignee: Jonathan Eagles >Priority: Blocker > Labels: java7 > Attachments: HDFS-1386.patch, HDFS-1386.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5525) Inline dust templates
[ https://issues.apache.org/jira/browse/HDFS-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827007#comment-13827007 ] Hadoop QA commented on HDFS-5525: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614569/HDFS-5525.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5489//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5489//console This message is automatically generated. > Inline dust templates > - > > Key: HDFS-5525 > URL: https://issues.apache.org/jira/browse/HDFS-5525 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5525.000.patch, HDFS-5525.000.patch, screenshot.png > > > Currently the dust templates are stored as separate files on the server side. 
> The web UI has to make separate HTTP requests to load the templates, which > increases the network overhead and page load latency. > This jira proposes to inline all dust templates into the main HTML file, so > that the page can be loaded faster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5194) Robust support for alternate FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826996#comment-13826996 ] Eli Collins commented on HDFS-5194: --- Notes from the call: - Attendees: Dave Powell, Eric Sirianni, Andrew Wang, Eli Collins - Scope here is non-file storage for DataNode storage, specifically a subset of DataNode storage for storing HDFS blocks given that parts of the data directory (eg MD) are managed via DataStorage which is not covered here. We could make DataStorage pluggable in the future as well independent of this, would probably require shuffling functionality that plugins would want to share outside DataStorage. - FsDatasetSpi is currently private, we need to come up with an API (for the Spi and the classes it returns) that could be declared stable so that users would not have to maintain different plugins for subsequent 2.x releases. - Would help to have a dummy plugin to help articulate what interfaces are public and catch API and semantic breakages. Also a potential place for plugin authors to share code. Maintaining a functional dummy plugin is expensive so might make more sense to start with something that's compile only. - Currently there is functionality in the FsDataset implementations that could be shared across plugins that could be moved outside and would decrease the effort required to plug out FsDataset and make it easier to maintain semantic compatibility. - Pluggability is currently DataNode wide, it might make sense to have the ability to specify the plugin on a per-volume basis for example due to wanting different plugins for different types of storage (HDFS-2832). - Should look into replacing standard java IO classes with Hadoop specific classes in the relevant FsDataSet APIs since they have baked in assumptions around file-based storage and interface baggage - Next step is to break down the HDFS-5194 proposal into sub-tasks and hash out each patch individually. 
Perhaps create a feature branch if there are sufficiently many patches that need to stay out of trunk. > Robust support for alternate FsDatasetSpi implementations > - > > Key: HDFS-5194 > URL: https://issues.apache.org/jira/browse/HDFS-5194 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs-client >Reporter: David Powell >Priority: Minor > Attachments: HDFS-5194.design.09112013.pdf, HDFS-5194.patch.09112013 > > > The existing FsDatasetSpi interface is well-positioned to permit extending > Hadoop to run natively on non-traditional storage architectures. Before this > can be done, however, a number of gaps need to be addressed. This JIRA > documents those gaps, suggests some solutions, and puts forth a sample > implementation of some of the key changes needed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5513) CacheAdmin commands fail when using . as the path
[ https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5513: -- Attachment: hdfs-5513-3.patch Thanks Colin. I combined the test into one of the other ones in the file. Let's see what Jenkins thinks. > CacheAdmin commands fail when using . as the path > - > > Key: HDFS-5513 > URL: https://issues.apache.org/jira/browse/HDFS-5513 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, tools >Affects Versions: 3.0.0 >Reporter: Stephen Chu >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch > > > The hdfs CLI commands generally accept "." as a path argument. > e.g. > {code} > hdfs dfs -rm . > hdfs dfsadmin -allowSnapshot . > {code} > I don't think it's very common to use the path "." but the CacheAdmin > commands will fail saying that it cannot create a Path from an empty string. > {code} > [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path . > Exception in thread "main" java.lang.IllegalArgumentException: Can not create > a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639) > at > org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365) > at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82) > at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87) > [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . 
-pool schu > Exception in thread "main" java.lang.IllegalArgumentException: Can not create > a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598) > at > org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180) > at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82) > at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
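One plausible shape for the fix, sketched without the actual Hadoop Path or CacheAdmin classes: resolve "." (and other relative arguments) against the working directory before constructing a Path, since the Path constructor rejects an empty string. All names below are illustrative:

```java
// Hypothetical path-argument normalizer: "." and relative arguments are
// resolved against the working directory before they ever reach the
// Path constructor, avoiding the empty-string IllegalArgumentException.
public class PathArg {
    public static String resolve(String arg, String workingDir) {
        if (arg.equals(".")) {
            return workingDir;              // "." alone becomes the cwd
        }
        if (!arg.startsWith("/")) {
            return workingDir + "/" + arg;  // other relative paths
        }
        return arg;                         // already absolute: unchanged
    }
}
```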
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826977#comment-13826977 ] Tsz Wo (Nicholas), SZE commented on HDFS-5526: -- Would storageID and cTime be preserved? BTW, do you know why cTime=0 in my test case above? > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5526.patch > > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5526: - Assignee: Kihwal Lee Status: Patch Available (was: Open) > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-5526.patch > > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5526: - Attachment: HDFS-5526.patch Simply doing what I suggested breaks TestDFSRollback. The existence of previous directory needs to be checked first. I am attaching a candidate patch. > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: HDFS-5526.patch > > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
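The ordering the candidate patch implies can be sketched as follows; the method, parameters, and exception type are illustrative stand-ins, not the actual DataStorage code. Note that HDFS layout versions are negative and decrease as the layout evolves:

```java
// Hypothetical rollback guard: a storage directory may legitimately have
// no "previous" directory (e.g. a freshly formatted node in
// TestDFSRollback), so that check must come before any layout-version
// comparison, which would otherwise blow up.
public class RollbackCheck {
    public static boolean shouldRollback(boolean previousExists,
                                         int prevLayoutVersion,
                                         int softwareLayoutVersion) {
        if (!previousExists) {
            return false;  // nothing to roll back to; not an error
        }
        if (prevLayoutVersion < softwareLayoutVersion) {
            // A smaller (more negative) value means the previous layout is
            // newer than this software understands.
            throw new IllegalStateException(
                "Cannot roll back to layout version " + prevLayoutVersion);
        }
        return true;
    }
}
```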
[jira] [Commented] (HDFS-5511) improve CacheManipulator interface to allow better unit testing
[ https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826903#comment-13826903 ] Colin Patrick McCabe commented on HDFS-5511: Findbugs warnings are HADOOP-10116, not related to this change. Will commit shortly based on Andrew's +1. > improve CacheManipulator interface to allow better unit testing > --- > > Key: HDFS-5511 > URL: https://issues.apache.org/jira/browse/HDFS-5511 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch > > > The CacheManipulator interface has been helpful in allowing us to stub out > {{mlock}} in cases where we don't want to test it. We should move the > {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this > interface as well so that we don't have to skip these tests on machines where > these methods would ordinarily not work for us. -- This message was sent by Atlassian JIRA (v6.1#6144)
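The interface change described here might look roughly like this sketch; the interface shape and the no-op test double are assumptions based on the description, not the real CacheManipulator API:

```java
// Hypothetical seam: platform-dependent calls (mlock, memlock limit,
// page size) live behind one interface so tests can swap in a stub on
// machines where the real calls would fail.
public class CacheManipulatorSketch {
    public interface CacheManipulator {
        void mlock(String identifier, long length);
        long getMemlockLimit();
        long getOperatingSystemPageSize();
    }

    // No-op test double: never touches the OS, returns fixed values.
    public static class NoMlockCacheManipulator implements CacheManipulator {
        @Override public void mlock(String identifier, long length) { /* no-op */ }
        @Override public long getMemlockLimit() { return Long.MAX_VALUE; }
        @Override public long getOperatingSystemPageSize() { return 4096; }
    }
}
```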
[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever
[ https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826874#comment-13826874 ] Uma Maheswara Rao G commented on HDFS-4516: --- Thanks Nicholas for the review! I will commit it shortly. > Client crash after block allocation and NN switch before lease recovery for > the same file can cause readers to fail forever > --- > > Key: HDFS-4516 > URL: https://issues.apache.org/jira/browse/HDFS-4516 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Uma Maheswara Rao G >Assignee: Vinay >Priority: Critical > Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, > HDFS-4516.patch, HDFS-4516.txt > > > If client crashes just after allocating block( blocks not yet created in DNs) > and NN also switched after this, then new Namenode will not know about locs. > Further details will be in comment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826875#comment-13826875 ] Hadoop QA commented on HDFS-2832: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614660/h2832_20131119.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 45 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5488//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5488//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5488//console This message is automatically generated. 
> Enable support for heterogeneous storages in HDFS > - > > Key: HDFS-2832 > URL: https://issues.apache.org/jira/browse/HDFS-2832 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.24.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, > editsStored, h2832_20131023.patch, h2832_20131023b.patch, > h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, > h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, > h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, > h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, > h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, > h2832_20131118.patch, h2832_20131119.patch > > > HDFS currently supports configuration where storages are a list of > directories. Typically each of these directories correspond to a volume with > its own file system. All these directories are homogeneous and therefore > identified as a single storage at the namenode. I propose, change to the > current model where Datanode * is a * storage, to Datanode * is a collection > * of strorages. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever
[ https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-4516: -- Target Version/s: 3.0.0, 2.3.0, 2.2.1 (was: 3.0.0, 2.1.0-beta) > Client crash after block allocation and NN switch before lease recovery for > the same file can cause readers to fail forever > --- > > Key: HDFS-4516 > URL: https://issues.apache.org/jira/browse/HDFS-4516 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Uma Maheswara Rao G >Assignee: Vinay >Priority: Critical > Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, > HDFS-4516.patch, HDFS-4516.txt > > > If client crashes just after allocating block( blocks not yet created in DNs) > and NN also switched after this, then new Namenode will not know about locs. > Further details will be in comment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.
[ https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5474: Priority: Blocker (was: Major) Target Version/s: 2.2.1 > Deletesnapshot can make Namenode in safemode on NN restarts. > > > Key: HDFS-5474 > URL: https://issues.apache.org/jira/browse/HDFS-5474 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Uma Maheswara Rao G >Assignee: sathish >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5474-001.patch, HDFS-5474-002.patch > > > When we deleteSnapshot, we delete the blocks associated with that > snapshot, and after that we do a logsync to the editlog for the deleteSnapshot op. > There is a chance that, after blocks are removed from the blocks map but before the > log sync, a BR arrives; the NN may then find that a block does not exist in the blocks map and > may invalidate that block. As part of the HB, the invalidation info can also go out. After > these steps, if the Namenode shuts down before actually doing the logsync, on restart it > will still consider those snapshot inodes and expect the blocks to be reported from DNs. > The simple solution is to move the block removal to after the logsync, > similar to the delete op. -- This message was sent by Atlassian JIRA (v6.1#6144)
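The proposed reordering can be illustrated with a stub that just records the order of operations; the operation names are stand-ins for the NN internals, not actual method names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fixed ordering: the edit is written and
// durably synced BEFORE the snapshot's blocks leave the blocks map, so a
// crash between the two steps can no longer strand the restart in safemode.
public class DeleteSnapshotOrder {
    public static List<String> deleteSnapshot() {
        List<String> ops = new ArrayList<>();
        ops.add("collectSnapshotBlocks");      // compute what will go away
        ops.add("logDeleteSnapshot");          // write the edit
        ops.add("logSync");                    // sync the edit first...
        ops.add("removeBlocksFromBlocksMap");  // ...only then drop the blocks
        return ops;
    }
}
```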
[jira] [Updated] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5504: Priority: Blocker (was: Major) Target Version/s: 2.2.1 > In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, > leads to NN safemode. > > > Key: HDFS-5504 > URL: https://issues.apache.org/jira/browse/HDFS-5504 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0, 2.2.0 >Reporter: Vinay >Assignee: Vinay >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5504.patch, HDFS-5504.patch > > > 1. HA installation, standby NN is down. > 2. delete snapshot is called and it has deleted the blocks from blocksmap and > all datanodes. log sync also happened. > 3. before next log roll NN crashed > 4. When the namenode restarts, it loads the fsimage and finalized edits from > shared storage and sets the safemode threshold, which includes blocks from > the deleted snapshot also (because those edits are not yet read, as the namenode was > restarted before the last edits segment was finalized). > 5. When it becomes active, it finalizes the edits and reads the delete > snapshot edit op, but at this time it does not reduce the safemode count, > and it continues in safemode. > 6. On the next restart, as the edits are already finalized, on startup it > will read and set the safemode threshold correctly. > So one more restart will bring the NN out of safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5428: Priority: Blocker (was: Major) Target Version/s: 2.2.1 > under construction files deletion after snapshot+checkpoint+nn restart leads > nn safemode > > > Key: HDFS-5428 > URL: https://issues.apache.org/jira/browse/HDFS-5428 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0, 2.2.0 >Reporter: Vinay >Assignee: Jing Zhao >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, > HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, > HDFS-5428.004.patch, HDFS-5428.patch > > > 1. allow snapshots under dir /foo > 2. create a file /foo/test/bar and start writing to it > 3. create a snapshot s1 under /foo after block is allocated and some data has > been written to it > 4. Delete the directory /foo/test > 5. wait till checkpoint or do saveNameSpace > 6. restart NN. > NN enters to safemode. > Analysis: > Snapshot nodes loaded from fsimage are always complete and all blocks will be > in COMPLETE state. > So when the Datanode reports RBW blocks those will not be updated in > blocksmap. > Some of the FINALIZED blocks will be marked as corrupt due to length mismatch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5443: Priority: Blocker (was: Major) Target Version/s: 2.2.1 (was: 3.0.0, 2.3.0) > Delete 0-sized block when deleting an under-construction file that is > included in snapshot > -- > > Key: HDFS-5443 > URL: https://issues.apache.org/jira/browse/HDFS-5443 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0, 2.2.0 >Reporter: Uma Maheswara Rao G >Assignee: Jing Zhao >Priority: Blocker > Fix For: 2.3.0 > > Attachments: 5443-test.patch, HDFS-5443.000.patch > > > Namenode can get stuck in safemode on restart if it crashes just after addblock > logsync and after taking snapshot for such file. This issue is reported by > Prakash and Sathish. > On looking into the issue following things are happening. > . > 1) Client added block at NN and just did logsync >So, NN has block ID persisted. > 2)Before returning addblock response to client take a snapshot for root or > parent directories for that file > 3) Delete parent directory for that file > 4) Now crash the NN without responding success to client for that addBlock > call > Now on restart of the Namenode, it will be stuck in safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5476: Priority: Blocker (was: Major) Target Version/s: 2.2.1 > Snapshot: clean the blocks/files/directories under a renamed file/directory > while deletion > -- > > Key: HDFS-5476 > URL: https://issues.apache.org/jira/browse/HDFS-5476 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5476.001.patch > > > Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree > under the DstReference node for file/directory/snapshot deletion. > Use case 1: > # rename under-construction file with 0-sized blocks after snapshot. > # delete the renamed directory. > We need to make sure we delete the 0-sized block. > Use case 2: > # create snapshot s0 for / > # create a new file under /foo/bar/ > # rename foo --> foo2 > # create snapshot s1 > # delete bar and foo2 > # delete snapshot s1 > We need to make sure we delete the file under /foo/bar since it is not > included in snapshot s0. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart
[ https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5425: Priority: Blocker (was: Major) Target Version/s: 2.2.1 > Renaming underconstruction file with snapshots can make NN failure on restart > - > > Key: HDFS-5425 > URL: https://issues.apache.org/jira/browse/HDFS-5425 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, snapshots >Affects Versions: 3.0.0, 2.2.0 >Reporter: sathish >Assignee: Jing Zhao >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, > HDFS-5425.patch > > > I faced this When i am doing some snapshot operations like > createSnapshot,renameSnapshot,i restarted my NN,it is shutting down with > exception, > 2013-10-24 21:07:03,040 FATAL > org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join > java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:133) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:670) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:655) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311) > 2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > 2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: > SHUTDOWN_MSG: -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5427: Target Version/s: 2.2.1 > not able to read deleted files from snapshot directly under snapshottable dir > after checkpoint and NN restart > - > > Key: HDFS-5427 > URL: https://issues.apache.org/jira/browse/HDFS-5427 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0, 2.2.0 >Reporter: Vinay >Assignee: Vinay >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch > > > 1. allow snapshots under dir /foo > 2. create a file /foo/bar > 3. create a snapshot s1 under /foo > 4. delete the file /foo/bar > 5. wait till checkpoint or do saveNameSpace > 6. restart NN. > 7. Now try to read the file from snapshot /foo/.snapshot/s1/bar > client will get BlockMissingException > Reason is > While loading the deleted file list for a snapshottable dir from fsimage, > blocks were not updated in blocksmap -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE
[ https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5257: Target Version/s: 2.2.1 > addBlock() retry should return LocatedBlock with locations else client will > get AIOBE > - > > Key: HDFS-5257 > URL: https://issues.apache.org/jira/browse/HDFS-5257 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, namenode >Affects Versions: 2.1.1-beta >Reporter: Vinay >Assignee: Vinay >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, > HDFS-5257.patch > > > {{addBlock()}} call retry should return the LocatedBlock with locations if > the block was created in previous call and failover/restart of namenode > happened. > otherwise client will get {{ArrayIndexOutOfBoundsException}} while creating > the block and write will fail. > {noformat}java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
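The retry behavior the description asks for can be sketched as: if the previous attempt already allocated the block, return it with its existing locations instead of a fresh, location-less one. Everything below is an illustrative stand-in for the real addBlock() path, not the actual Namenode code:

```java
// Hypothetical retry-aware allocation: a retried addBlock() that finds
// the block already created by the prior (failed-over) call must hand
// back that block's datanode locations, otherwise the client indexes
// into an empty locations array and hits ArrayIndexOutOfBoundsException.
public class AddBlockRetry {
    public static String[] allocate(String[] existingLocs, boolean isRetry) {
        if (isRetry && existingLocs != null && existingLocs.length > 0) {
            return existingLocs;   // reuse the already-allocated block's locs
        }
        return chooseTargets();    // normal path: pick new datanodes
    }

    private static String[] chooseTargets() {
        // Stand-in for the real replica placement policy.
        return new String[] { "dn1", "dn2", "dn3" };
    }
}
```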
[jira] [Comment Edited] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826859#comment-13826859 ] Uma Maheswara Rao G edited comment on HDFS-5014 at 11/19/13 7:42 PM: - Seems like there is an issue with the path. We should move the cmd null check before DNA_REGISTER if condition. {noformat} 2013-11-19 14:08:33,394 ERROR datanode.DataNode (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool BP-1297942247-67.195.138.31-1384870112818 (storage id DS-234026112-67.195.138.31-43443-1384870113355) service to localhost/127.0.0.1:48821 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:507) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:745) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:597) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717) at java.lang.Thread.run(Thread.java:662) {noformat} This would be the reason for timeouts in Jenkins. was (Author: umamaheswararao): Seems like there is an issue with the path. We should move the cmd null check before DNA_REGISTER if condition. 
{noformat} 2013-11-19 14:08:33,394 ERROR datanode.DataNode (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool BP-1297942247-67.195.138.31-1384870112818 (storage id DS-234026112-67.195.138.31-43443-1384870113355) service to localhost/127.0.0.1:48821 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:507) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:745) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:597) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717) at java.lang.Thread.run(Thread.java:662) {noformat} > BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch > > > In one of our clusters, the following happened, which caused an HDFS write to fail. > 1. Standby NN was unstable and continuously restarting due to some errors. But > Active NN was stable. > 2. MR Job was writing files. > 3. At some point SNN went down again while a datanode was processing the REGISTER > command for the SNN. > 4. Datanodes started retrying to connect to SNN to register at the following > code in BPServiceActor#retrieveNamespaceInfo() which will be called under > synchronization. > {code} try { > nsInfo = bpNamenode.versionRequest(); > LOG.debug(this + " received versionRequest response: " + nsInfo); > break;{code} > Unfortunately this happened in all datanodes at the same point. > 5. 
For the next 7-8 minutes the standby was down; no blocks were reported to the active > NN during this time, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely > synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
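The reordering suggested in the comment above (the cmd null check before the DNA_REGISTER branch) can be illustrated with a minimal stand-alone sketch. `DatanodeCommand` and `DNA_REGISTER` below are simplified stand-ins for the real datanode types, not the actual HDFS code:

```java
public class CommandNullCheckSketch {
    static final int DNA_REGISTER = 1;

    // Simplified stand-in for the real DatanodeProtocol command type.
    static class DatanodeCommand {
        private final int action;
        DatanodeCommand(int action) { this.action = action; }
        int getAction() { return action; }
    }

    // Buggy order: dereferences cmd before the null check, so a heartbeat
    // response carrying no command triggers the NPE seen in the log.
    static boolean processBuggy(DatanodeCommand cmd) {
        if (cmd.getAction() == DNA_REGISTER) { // NPE when cmd == null
            return true;
        }
        if (cmd == null) {
            return false;
        }
        return false;
    }

    // Fixed order: null check first, then the DNA_REGISTER special case.
    static boolean processFixed(DatanodeCommand cmd) {
        if (cmd == null) {
            return false; // nothing to do for an empty response
        }
        if (cmd.getAction() == DNA_REGISTER) {
            return true;  // register handling proceeds as before
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            processBuggy(null);
            throw new AssertionError("expected NullPointerException");
        } catch (NullPointerException expected) {
            // the failure mode reported in the Jenkins timeouts
        }
        System.out.println(processFixed(null)); // false: null command ignored safely
    }
}
```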
[jira] [Updated] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE
[ https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5257: Priority: Blocker (was: Critical) > addBlock() retry should return LocatedBlock with locations else client will > get AIOBE > - > > Key: HDFS-5257 > URL: https://issues.apache.org/jira/browse/HDFS-5257 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, namenode >Affects Versions: 2.1.1-beta >Reporter: Vinay >Assignee: Vinay >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, > HDFS-5257.patch > > > {{addBlock()}} call retry should return the LocatedBlock with locations if > the block was created in previous call and failover/restart of namenode > happened. > otherwise client will get {{ArrayIndexOutOfBoundsException}} while creating > the block and write will fail. > {noformat}java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826859#comment-13826859 ] Uma Maheswara Rao G commented on HDFS-5014: --- Seems like there is an issue with the path. We should move the cmd null check before the DNA_REGISTER if condition. {noformat} 2013-11-19 14:08:33,394 ERROR datanode.DataNode (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool BP-1297942247-67.195.138.31-1384870112818 (storage id DS-234026112-67.195.138.31-43443-1384870113355) service to localhost/127.0.0.1:48821 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:507) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:745) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:597) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717) at java.lang.Thread.run(Thread.java:662) {noformat} > BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch > > > In one of our clusters, the following happened, which caused an HDFS write to fail. > 1. Standby NN was unstable and continuously restarting due to some errors. But > Active NN was stable. > 2. MR Job was writing files. > 3. At some point SNN went down again while a datanode was processing the REGISTER > command for the SNN. > 4. 
Datanodes started retrying to connect to SNN to register at the following > code in BPServiceActor#retrieveNamespaceInfo(), which is called under > synchronization. > {code} try { > nsInfo = bpNamenode.versionRequest(); > LOG.debug(this + " received versionRequest response: " + nsInfo); > break;{code} > Unfortunately this happened in all datanodes at the same point. > 5. For the next 7-8 minutes the standby was down; no blocks were reported to the active > NN during this time, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely > synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5511) improve CacheManipulator interface to allow better unit testing
[ https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826834#comment-13826834 ] Hadoop QA commented on HDFS-5511: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614645/HDFS-5511.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5487//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5487//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5487//console This message is automatically generated. 
> improve CacheManipulator interface to allow better unit testing > --- > > Key: HDFS-5511 > URL: https://issues.apache.org/jira/browse/HDFS-5511 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch > > > The CacheManipulator interface has been helpful in allowing us to stub out > {{mlock}} in cases where we don't want to test it. We should move the > {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this > interface as well so that we don't have to skip these tests on machines where > these methods would ordinarily not work for us. -- This message was sent by Atlassian JIRA (v6.1#6144)
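The testability idea in the description above can be sketched as follows. The interface shape and the `NoLimitCacheManipulator` stub here are illustrative assumptions modeled on the description, not the actual Hadoop classes: widening the seam lets unit tests substitute deterministic values where the real OS calls would fail or vary.

```java
public class CacheManipulatorSketch {
    // The seam: everything OS-dependent goes behind this interface.
    interface CacheManipulator {
        long getMemlockLimit();
        long getOperatingSystemPageSize();
    }

    // Production flavor would query the OS; placeholder values shown here.
    static class OsCacheManipulator implements CacheManipulator {
        public long getMemlockLimit() { return 64 * 1024; }       // e.g. ulimit -l
        public long getOperatingSystemPageSize() { return 4096; } // e.g. sysconf
    }

    // Test stub: deterministic values so tests need no mlock privileges
    // and never get skipped on machines where the real calls would fail.
    static class NoLimitCacheManipulator implements CacheManipulator {
        public long getMemlockLimit() { return Long.MAX_VALUE; }
        public long getOperatingSystemPageSize() { return 4096; }
    }

    public static void main(String[] args) {
        CacheManipulator stub = new NoLimitCacheManipulator();
        System.out.println(stub.getMemlockLimit() == Long.MAX_VALUE); // true
    }
}
```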
[jira] [Commented] (HDFS-1386) TestJMXGet fails in jdk7
[ https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826830#comment-13826830 ] Kihwal Lee commented on HDFS-1386: -- bq. If you are ok with this I can file separate JIRAs, one for YARN and one for Default Metrics System unregistration. Sounds reasonable. +1 for the patch. Please do check whether there is already a jira for the test failure. > TestJMXGet fails in jdk7 > > > Key: HDFS-1386 > URL: https://issues.apache.org/jira/browse/HDFS-1386 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode, test >Affects Versions: 0.22.0 >Reporter: Tanping Wang >Assignee: Jonathan Eagles >Priority: Blocker > Labels: java7 > Attachments: HDFS-1386.patch, HDFS-1386.patch > > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever
[ https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4516: - Target Version/s: 2.1.0-beta, 3.0.0 (was: 3.0.0, 2.1.0-beta) Hadoop Flags: Reviewed +1 patch looks good. > Client crash after block allocation and NN switch before lease recovery for > the same file can cause readers to fail forever > --- > > Key: HDFS-4516 > URL: https://issues.apache.org/jira/browse/HDFS-4516 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Uma Maheswara Rao G >Assignee: Vinay >Priority: Critical > Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, > HDFS-4516.patch, HDFS-4516.txt > > > If the client crashes just after allocating a block (blocks not yet created in DNs) > and the NN also switches after this, then the new Namenode will not know about the block locations. > Further details are in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826825#comment-13826825 ] Tsz Wo (Nicholas), SZE commented on HDFS-5526: -- Kihwal, you are right that there is no backup copy of ./current/VERSION and so it cannot be rolled back. > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Priority: Blocker > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826821#comment-13826821 ] Kihwal Lee commented on HDFS-5526: -- What about adding the following at the beginning of {{DataStorage.doRollback()}}? This is similar to what is in {{DataStorage.doUpgrade()}}. Since the VERSION file is not yet read in rollback (if it is read, it blows up as you reported), {{this.layoutVersion}} will be 0. So instead, it checks the software layout version to see whether it is using the same layout version as the namenode. {code} if (LayoutVersion.supports(Feature.FEDERATION, HdfsConstants.LAYOUT_VERSION) && HdfsConstants.LAYOUT_VERSION == nsInfo.getLayoutVersion()) { clusterID = nsInfo.getClusterID(); layoutVersion = nsInfo.getLayoutVersion(); writeProperties(sd); return; } {code} > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Priority: Blocker > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS
[ https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826814#comment-13826814 ] Haohui Mai commented on HDFS-3987: -- bq. SWebHdfsFileSystem.java, why do we need a new token type? What that adds? On the client side, each FileSystem has to have a unique token kind so that Token.cancel() / Token.renew() can be redirected to the appropriate code paths. Hftp / hsftp / webhdfs follow the same pattern, and so does swebhdfs. Without the new token kind, swebhdfs would try to cancel / renew the token via the code path of webhdfs (i.e., canceling / renewing the token via http, not https). The same issue happens in hsftp, and it is fixed in HDFS-5502. bq. WebHdfsFileSystem.java, it seems things are hardcoded to use a specific token kind, it should use the token kind send by the server. I didn't fully understand the question. WebHdfsFileSystem has the same logic (w.r.t. token handling) as the old code. The server specifies the token kind, and the client redirects to the appropriate code path for handling the token using reflection. bq. My concern here is HttpFS tokens. Have you verified HttpFS works with swebhdfs? I think that HttpFS is out of the scope of this jira; maybe we can address it in a separate jira if these issues arise. > Support webhdfs over HTTPS > -- > > Key: HDFS-3987 > URL: https://issues.apache.org/jira/browse/HDFS-3987 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Haohui Mai > Fix For: 2.3.0 > > Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, > HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, > HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, > HDFS-3987.008.patch, HDFS-3987.009.patch > > > This is a follow-up of HDFS-3983. > We should have a new filesystem client impl/binding for encrypted WebHDFS, > i.e. 
*webhdfss://* > On the server side, for webhdfs and httpfs we should only need to start the > service on a secured (HTTPS) endpoint. -- This message was sent by Atlassian JIRA (v6.1#6144)
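The token-kind dispatch described in the comment above can be sketched as a kind-to-renewer registry. The kind strings below ("WEBHDFS delegation", "SWEBHDFS delegation") and the registry itself are simplified assumptions for illustration; the real client resolves renewers through Hadoop's token framework rather than a plain map:

```java
import java.util.HashMap;
import java.util.Map;

public class TokenKindDispatchSketch {
    // Stand-in for a token renewer bound to one transport.
    interface Renewer { String renew(); }

    // Each FileSystem registers under its own token kind, so renew/cancel
    // for a token round-trips over the transport that issued it.
    static final Map<String, Renewer> RENEWERS = new HashMap<>();
    static {
        RENEWERS.put("WEBHDFS delegation", () -> "renew over http");
        RENEWERS.put("SWEBHDFS delegation", () -> "renew over https");
    }

    static String renew(String tokenKind) {
        Renewer r = RENEWERS.get(tokenKind);
        if (r == null) throw new IllegalArgumentException("no renewer for " + tokenKind);
        return r.renew();
    }

    public static void main(String[] args) {
        // Without a distinct kind, an swebhdfs token would fall through to the
        // http path; with one, it dispatches to https.
        System.out.println(renew("SWEBHDFS delegation")); // renew over https
    }
}
```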
[jira] [Commented] (HDFS-5523) Support subdirectory mount and multiple exports in HDFS-NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826794#comment-13826794 ] Brandon Li commented on HDFS-5523: -- [~aw]: Originally, NFS gateway had "/" as the only export for the NFS client to mount. This is obviously a big limitation especially when the user doesn't want to let NFS clients see the whole namespace structure for security or management concerns. HDFS-5469 added the property to make the single export configurable so the user can decide which subtree(export) of the namespace to share. However, users still have only one export, and don't have the flexibility to share multiple subtrees and give each export different access control (e.g., which hosts can mount which export with rw or ro access). This JIRA is to enable the NFS gateway to share multiple subtrees each with different access control like that in a traditional NFS server. > Support subdirectory mount and multiple exports in HDFS-NFS gateway > > > Key: HDFS-5523 > URL: https://issues.apache.org/jira/browse/HDFS-5523 > Project: Hadoop HDFS > Issue Type: New Feature > Components: nfs >Reporter: Brandon Li > > Supporting multiple exports and subdirectory mount usually can make data and > security management easier for the HDFS-NFS client. -- This message was sent by Atlassian JIRA (v6.1#6144)
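For context, the per-export access control described above is the kind a traditional NFS server expresses in its exports table. A hypothetical /etc/exports-style illustration, not the gateway's actual configuration syntax:

```text
# Illustrative only: each subtree is exported separately,
# each with its own host and access rules.
/data/public   *(ro)                       # any host, read-only
/data/etl      10.0.1.0/24(rw)             # ETL subnet, read-write
/user/alice    alice-host(rw,root_squash)  # single host, rw, root mapped away
```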
[jira] [Updated] (HDFS-3987) Support webhdfs over HTTPS
[ https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-3987: - Attachment: HDFS-3987.009.patch > Support webhdfs over HTTPS > -- > > Key: HDFS-3987 > URL: https://issues.apache.org/jira/browse/HDFS-3987 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Haohui Mai > Fix For: 2.3.0 > > Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, > HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, > HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, > HDFS-3987.008.patch, HDFS-3987.009.patch > > > This is a follow up of HDFS-3983. > We should have a new filesystem client impl/binding for encrypted WebHDFS, > i.e. *webhdfss://* > On the server side, webhdfs and httpfs we should only need to start the > service on a secured (HTTPS) endpoint. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832
[ https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826741#comment-13826741 ] Arpit Agarwal commented on HDFS-5527: - Junping, were you able to duplicate the failure? > Fix TestUnderReplicatedBlocks on branch HDFS-2832 > - > > Key: HDFS-5527 > URL: https://issues.apache.org/jira/browse/HDFS-5527 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5527.patch > > > The failure seems like a deadlock, which is show in: > https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-2832: Attachment: h2832_20131119.patch > Enable support for heterogeneous storages in HDFS > - > > Key: HDFS-2832 > URL: https://issues.apache.org/jira/browse/HDFS-2832 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.24.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, > editsStored, h2832_20131023.patch, h2832_20131023b.patch, > h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, > h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, > h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, > h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, > h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, > h2832_20131118.patch, h2832_20131119.patch > > > HDFS currently supports a configuration where storages are a list of > directories. Typically each of these directories corresponds to a volume with > its own file system. All these directories are homogeneous and therefore > identified as a single storage at the namenode. I propose changing the > current model, where a Datanode *is a* storage, to one where a Datanode *is a collection > of* storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
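The proposed model change, from "a Datanode is a storage" to "a Datanode is a collection of storages", can be sketched with simplified stand-in types; the real DatanodeDescriptor and storage-type handling in the branch are considerably more involved:

```java
import java.util.Arrays;
import java.util.List;

public class HeterogeneousStorageSketch {
    // Illustrative storage media types; not the branch's actual enum.
    enum StorageType { DISK, SSD }

    // Each volume is now a first-class storage with its own ID and type.
    static class Storage {
        final String storageId;
        final StorageType type;
        Storage(String storageId, StorageType type) {
            this.storageId = storageId;
            this.type = type;
        }
    }

    // Before: the namenode saw one aggregate storage per datanode.
    // After: it tracks a list of per-volume storages per datanode.
    static class DatanodeDescriptor {
        final List<Storage> storages;
        DatanodeDescriptor(Storage... storages) { this.storages = Arrays.asList(storages); }
        long count(StorageType t) {
            return storages.stream().filter(s -> s.type == t).count();
        }
    }

    public static void main(String[] args) {
        DatanodeDescriptor dn = new DatanodeDescriptor(
            new Storage("DS-1", StorageType.DISK),
            new Storage("DS-2", StorageType.SSD));
        // The namenode can now distinguish media on one node:
        System.out.println(dn.count(StorageType.SSD)); // 1
    }
}
```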
[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path
[ https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1382#comment-1382 ] Colin Patrick McCabe commented on HDFS-5513: bq. This was caused by the attempt to deep-copy in the PBCD Builder. Paths normalize the URI upon creation, so the single . simply gets thrown away. There doesn't seem to be a way to deep-copy a Path, but at the same time it doesn't look like you can mutate a Path either. I took another look and you are right. Although the Path does make its URI accessible to the outside world, the URI has no methods that could be used to mutate it. Can we merge {{testSingleDotPath}} into another junit test? It just seems kind of like overkill to set up a whole DFSCluster just to see if a PBCE with "." as the path can be added and then removed. It would be nice to keep test execution time down. +1 once that's addressed > CacheAdmin commands fail when using . as the path > - > > Key: HDFS-5513 > URL: https://issues.apache.org/jira/browse/HDFS-5513 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, tools >Affects Versions: 3.0.0 >Reporter: Stephen Chu >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch > > > The hdfs CLI commands generally accept "." as a path argument. > e.g. > {code} > hdfs dfs -rm . > hdfs dfsadmin -allowSnapshot . > {code} > I don't think it's very common to use the path "." but the CacheAdmin > commands will fail saying that it cannot create a Path from an empty string. > {code} > [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path . 
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create > a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639) > at > org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365) > at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82) > at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87) > [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu > Exception in thread "main" java.lang.IllegalArgumentException: Can not create > a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598) > at > org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180) > at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82) > at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
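The empty-string failure above traces back to dot-segment removal during URI normalization, which Hadoop's Path performs on construction. A small sketch using java.net.URI directly; Path wraps a URI, so this is an analogy to the behavior described, not the exact Hadoop code path:

```java
import java.net.URI;

public class DotPathSketch {
    public static void main(String[] args) throws Exception {
        // "." segments are removed during normalization:
        System.out.println(new URI("foo/./bar").normalize().getPath()); // foo/bar

        // A path consisting only of "." can normalize away entirely, leaving
        // the empty string that Path's constructor then rejects with
        // "Can not create a Path from an empty string".
        System.out.println(new URI(".").normalize().getPath().length());
    }
}
```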
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826661#comment-13826661 ] Kihwal Lee commented on HDFS-5526: -- Well, it is being called for {{DataStorage}} when {{recoverTransitionRead()}} is called for the first block pool that is being initialized. {{DataStorage.doUpgrade()}} will simply write the new version and return. So "previous" won't be created. In {{DataStorage.doRollback()}}, the comment says, "Do nothing, if previous directory does not exist" and that's what it does. > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Priority: Blocker > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5511) improve CacheManipulator interface to allow better unit testing
[ https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5511: --- Attachment: HDFS-5511.002.patch * add getters and setters for cacheManipulator * add javaDoc > improve CacheManipulator interface to allow better unit testing > --- > > Key: HDFS-5511 > URL: https://issues.apache.org/jira/browse/HDFS-5511 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch > > > The CacheManipulator interface has been helpful in allowing us to stub out > {{mlock}} in cases where we don't want to test it. We should move the > {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this > interface as well so that we don't have to skip these tests on machines where > these methods would ordinarily not work for us. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version
[ https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826631#comment-13826631 ] Kihwal Lee commented on HDFS-5526: -- It looks like {{recoverTransitionRead()}} is done for individual block pool storage from {{DataNode.initStorage()}}, but never done for {{DataStorage}} itself. That also explains why it's lacking "previous" after upgrade. > Datanode cannot roll back to previous layout version > > > Key: HDFS-5526 > URL: https://issues.apache.org/jira/browse/HDFS-5526 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo (Nicholas), SZE >Priority: Blocker > > Current trunk layout version is -48. > Hadoop v2.2.0 layout version is -47. > If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes > cannot start with -rollback. It will fail with IncorrectVersionException. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable
[ https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5014: Attachment: HDFS-5014-v2.patch Thanks, Uma, for the comments. I have updated the javadocs and removed the unnecessary DNA_REGISTER command. At this time Jenkins was still running; I will update again if Jenkins reports any issues. > BPOfferService#processCommandFromActor() synchronization on namenode RPC call > delays IBR to Active NN, if Standby NN is unstable > --- > > Key: HDFS-5014 > URL: https://issues.apache.org/jira/browse/HDFS-5014 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, > HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, > HDFS-5014.patch > > > In one of our clusters, the following happened, which caused an HDFS write to fail. > 1. Standby NN was unstable and continuously restarting due to some errors. But > Active NN was stable. > 2. MR Job was writing files. > 3. At some point SNN went down again while a datanode was processing the REGISTER > command for the SNN. > 4. Datanodes started retrying to connect to SNN to register at the following > code in BPServiceActor#retrieveNamespaceInfo() which will be called under > synchronization. > {code} try { > nsInfo = bpNamenode.versionRequest(); > LOG.debug(this + " received versionRequest response: " + nsInfo); > break;{code} > Unfortunately this happened in all datanodes at the same point. > 5. For the next 7-8 minutes the standby was down; no blocks were reported to the active > NN during this time, and writes failed. > So the culprit is that {{BPOfferService#processCommandFromActor()}} is completely > synchronized, which is not required. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever
[ https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826568#comment-13826568 ] Hadoop QA commented on HDFS-4516: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614572/HDFS-4516.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5484//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5484//console This message is automatically generated. 
> Client crash after block allocation and NN switch before lease recovery for > the same file can cause readers to fail forever > --- > > Key: HDFS-4516 > URL: https://issues.apache.org/jira/browse/HDFS-4516 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Uma Maheswara Rao G >Assignee: Vinay >Priority: Critical > Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, > HDFS-4516.patch, HDFS-4516.txt > > > If the client crashes just after allocating a block (blocks not yet created in DNs) > and the NN also switches after this, then the new Namenode will not know about the block locations. > Further details are in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)