[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827418#comment-13827418
 ] 

Hadoop QA commented on HDFS-2832:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614794/h2832_20131119b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 45 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5499//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5499//console

This message is automatically generated.

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
> editsStored, h2832_20131023.patch, h2832_20131023b.patch, 
> h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, 
> h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, 
> h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, 
> h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, 
> h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, 
> h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and are therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, where a Datanode *is a* storage, to one where a Datanode *is a 
> collection of* storages. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832

2013-11-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827387#comment-13827387
 ] 

Junping Du commented on HDFS-5527:
--

Thanks Arpit for the patch.
+1. Patch looks good to me. I also ran a few iterations of the tests and they 
all passed. 

> Fix TestUnderReplicatedBlocks on branch HDFS-2832
> -
>
> Key: HDFS-5527
> URL: https://issues.apache.org/jira/browse/HDFS-5527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5527.patch, h5527.02.patch
>
>
> The failure seems like a deadlock, which is shown in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/





[jira] [Assigned] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832

2013-11-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned HDFS-5527:


Assignee: Arpit Agarwal  (was: Junping Du)

> Fix TestUnderReplicatedBlocks on branch HDFS-2832
> -
>
> Key: HDFS-5527
> URL: https://issues.apache.org/jira/browse/HDFS-5527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Arpit Agarwal
> Attachments: HDFS-5527.patch, h5527.02.patch
>
>
> The failure seems like a deadlock, which is shown in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/





[jira] [Commented] (HDFS-3215) Block size is logging as zero Even blockrecevied command received by DN

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827383#comment-13827383
 ] 

Hadoop QA commented on HDFS-3215:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614783/HDFS-3215.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5498//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5498//console

This message is automatically generated.

> Block size is logging as zero Even blockrecevied command received by DN 
> 
>
> Key: HDFS-3215
> URL: https://issues.apache.org/jira/browse/HDFS-3215
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Brahma Reddy Battula
>Assignee: Shinichi Yamashita
>Priority: Minor
> Attachments: HDFS-3215.patch, HDFS-3215.patch
>
>
> Scenario 1
> ==
> Start NN and DN.
> Write a file.
> The block size is logged as zero even though the blockReceived command was 
> received by the DN.
>  *NN log*
> 2012-03-14 20:23:40,541 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.allocateBlock: /hadoop-create-user.sh._COPYING_. 
> BP-1166515020-10.18.40.24-1331736264353 
> blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]}
> 2012-03-14 20:24:26,357 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> addStoredBlock: blockMap updated: XXX:50010 is added to 
> blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]} 
> size 0
>  *DN log* 
> 2012-03-14 20:24:17,519 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block 
> BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002 src: 
> /XXX:53141 dest: /XXX:50010
> 2012-03-14 20:24:26,517 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /XXX:53141, dest: /XXX:50010, bytes: 512, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_1612873957_1, offset: 0, srvID: 
> DS-1639667928-XXX-50010-1331736284942, blockid: 
> BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, duration: 
> 1286482503
> 2012-03-14 20:24:26,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: 
> BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2012-03-14 20:24:31,533 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002





[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report

2013-11-19 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-5533:


Attachment: (was: HDFS-5533.patch)

> Symlink delete/create should be treated as DELETE/CREATE in snapshot diff 
> report
> 
>
> Key: HDFS-5533
> URL: https://issues.apache.org/jira/browse/HDFS-5533
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-5533.patch
>
>
> Currently the code treats a symlink delete/create as a modify, but symlinks 
> are immutable, so these should be CREATE and DELETE.





[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report

2013-11-19 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-5533:


Attachment: HDFS-5533.patch

> Symlink delete/create should be treated as DELETE/CREATE in snapshot diff 
> report
> 
>
> Key: HDFS-5533
> URL: https://issues.apache.org/jira/browse/HDFS-5533
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-5533.patch
>
>
> Currently the code treats a symlink delete/create as a modify, but symlinks 
> are immutable, so these should be CREATE and DELETE.





[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report

2013-11-19 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-5533:


Attachment: HDFS-5533.patch

> Symlink delete/create should be treated as DELETE/CREATE in snapshot diff 
> report
> 
>
> Key: HDFS-5533
> URL: https://issues.apache.org/jira/browse/HDFS-5533
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-5533.patch
>
>
> Currently the code treats a symlink delete/create as a modify, but symlinks 
> are immutable, so these should be CREATE and DELETE.





[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report

2013-11-19 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-5533:


Description: Currently the code treats a symlink delete/create as a modify, 
but symlinks are immutable, so these should be CREATE and DELETE

> Symlink delete/create should be treated as DELETE/CREATE in snapshot diff 
> report
> 
>
> Key: HDFS-5533
> URL: https://issues.apache.org/jira/browse/HDFS-5533
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
>
> Currently the code treats a symlink delete/create as a modify, but symlinks 
> are immutable, so these should be CREATE and DELETE.





[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report

2013-11-19 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-5533:


Status: Patch Available  (was: Open)

> Symlink delete/create should be treated as DELETE/CREATE in snapshot diff 
> report
> 
>
> Key: HDFS-5533
> URL: https://issues.apache.org/jira/browse/HDFS-5533
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
>
> Currently the code treats a symlink delete/create as a modify, but symlinks 
> are immutable, so these should be CREATE and DELETE.





[jira] [Updated] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report

2013-11-19 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-5533:


Environment: (was: Currently the original code treat symlink 
delete/create as modify, but symlink is immutable, should be CREATE and DELETE)

> Symlink delete/create should be treated as DELETE/CREATE in snapshot diff 
> report
> 
>
> Key: HDFS-5533
> URL: https://issues.apache.org/jira/browse/HDFS-5533
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
>






[jira] [Created] (HDFS-5533) Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report

2013-11-19 Thread Binglin Chang (JIRA)
Binglin Chang created HDFS-5533:
---

 Summary: Symlink delete/create should be treated as DELETE/CREATE 
in snapshot diff report
 Key: HDFS-5533
 URL: https://issues.apache.org/jira/browse/HDFS-5533
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Currently the original code treat symlink delete/create 
as modify, but symlink is immutable, should be CREATE and DELETE
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor








[jira] [Updated] (HDFS-5484) StorageType and State in DatanodeStorageInfo in NameNode is not accurate

2013-11-19 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5484:


Assignee: (was: Arpit Agarwal)

> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> 
>
> Key: HDFS-5484
> URL: https://issues.apache.org/jira/browse/HDFS-5484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Eric Sirianni
>
> The fields in DatanodeStorageInfo are updated from two distinct paths:
> # block reports
> # storage reports (via heartbeats)
> The {{state}} and {{storageType}} fields are updated via the Block Report.  
> However, as seen in the code below, these fields are populated from a "dummy" 
> {{DatanodeStorage}} object constructed in the DataNode:
> {code}
> BPServiceActor.blockReport() {
> //...
> // Dummy DatanodeStorage object just for sending the block report.
> DatanodeStorage dnStorage = new DatanodeStorage(storageID);
> //...
> }
> {code}
> The net effect is that the {{state}} and {{storageType}} fields are always 
> the default of {{NORMAL}} and {{DISK}} in the NameNode.
> The recommended fix is to change {{FsDatasetSpi.getBlockReports()}} from:
> {code}
> public Map<String, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> to:
> {code}
> public Map<DatanodeStorage, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> thereby allowing {{BPServiceActor}} to send the "real" {{DatanodeStorage}} 
> object with the block report.
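To illustrate the shape of the proposed change, here is a minimal sketch with 
simplified stand-in types. The real DatanodeStorage and BlockListAsLongs 
classes live in the Hadoop source tree; everything below is an illustrative 
reduction, not the actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the Hadoop types; the real classes live under
// org.apache.hadoop.hdfs.server.protocol and the fsdataset packages.
class DatanodeStorage {
    final String storageID;
    DatanodeStorage(String storageID) { this.storageID = storageID; }
}

class BlockListAsLongs { }

// Proposed shape: key the block reports by the real DatanodeStorage, so
// BPServiceActor can send that object instead of constructing a dummy
// DatanodeStorage from just the storageID.
interface FsDatasetSpiSketch {
    Map<DatanodeStorage, BlockListAsLongs> getBlockReports(String bpid);
}
```

With this shape, the caller receives the storage's real state and type along 
with each report, rather than the defaults baked into a dummy object.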





[jira] [Assigned] (HDFS-5484) StorageType and State in DatanodeStorageInfo in NameNode is not accurate

2013-11-19 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDFS-5484:
---

Assignee: Arpit Agarwal

> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> 
>
> Key: HDFS-5484
> URL: https://issues.apache.org/jira/browse/HDFS-5484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Eric Sirianni
>Assignee: Arpit Agarwal
>
> The fields in DatanodeStorageInfo are updated from two distinct paths:
> # block reports
> # storage reports (via heartbeats)
> The {{state}} and {{storageType}} fields are updated via the Block Report.  
> However, as seen in the code below, these fields are populated from a "dummy" 
> {{DatanodeStorage}} object constructed in the DataNode:
> {code}
> BPServiceActor.blockReport() {
> //...
> // Dummy DatanodeStorage object just for sending the block report.
> DatanodeStorage dnStorage = new DatanodeStorage(storageID);
> //...
> }
> {code}
> The net effect is that the {{state}} and {{storageType}} fields are always 
> the default of {{NORMAL}} and {{DISK}} in the NameNode.
> The recommended fix is to change {{FsDatasetSpi.getBlockReports()}} from:
> {code}
> public Map<String, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> to:
> {code}
> public Map<DatanodeStorage, BlockListAsLongs> getBlockReports(String bpid);
> {code}
> thereby allowing {{BPServiceActor}} to send the "real" {{DatanodeStorage}} 
> object with the block report.





[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Stanby NN is unstable

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827364#comment-13827364
 ] 

Hadoop QA commented on HDFS-5014:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614777/HDFS-5014-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5497//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5497//console

This message is automatically generated.

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Stanby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused an HDFS write to 
> fail.
> 1. The Standby NN was unstable and continuously restarting due to some 
> errors, but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while the datanodes were processing 
> the REGISTER command for the SNN. 
> 4. Datanodes started retrying the connection to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called 
> under synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened in all datanodes at the same point.
> 5. For the next 7-8 minutes the standby was down, no blocks were reported to 
> the active NN, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.
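A minimal sketch of the direction the description suggests (all names are 
illustrative stand-ins, not the actual Hadoop code): perform the blocking 
namenode RPC outside the shared lock, and synchronize only the short local 
state update, so a stalled RPC to one namenode cannot block command 
processing for the other.

```java
// Illustrative sketch only: the lock is held just for the state mutation,
// never across the remote call.
public class CommandProcessorSketch {
    private final Object stateLock = new Object();
    private String lastNsInfo;

    // Stand-in for the blocking versionRequest() RPC to an unstable NN.
    private String versionRequest() {
        try {
            Thread.sleep(10); // simulate a slow or retrying remote call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "nsInfo";
    }

    public void processCommand() {
        // Blocking RPC performed outside any shared lock...
        String nsInfo = versionRequest();
        // ...and only the quick local update is synchronized.
        synchronized (stateLock) {
            lastNsInfo = nsInfo;
        }
    }

    public String getLastNsInfo() {
        synchronized (stateLock) {
            return lastNsInfo;
        }
    }
}
```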





[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-19 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: h2832_20131119b.patch

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
> editsStored, h2832_20131023.patch, h2832_20131023b.patch, 
> h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, 
> h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, 
> h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, 
> h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, 
> h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, 
> h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch
>
>
> HDFS currently supports configuration where storages are a list of 
> directories. Typically each of these directories correspond to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose, change to the 
> current model where Datanode * is a * storage, to Datanode * is a collection 
> * of strorages. 





[jira] [Updated] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832

2013-11-19 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5527:


Attachment: h5527.02.patch

This is a DN bug caused by failure to remove an earlier incremental block 
report entry (on a different storage) when adding a new entry.

Attaching a patch with the fix. Will also submit a merge patch with this fix to 
Jenkins to see what it thinks.
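A hypothetical sketch of the idea behind the fix (the class, map layout, and 
method names below are illustrative, not the actual DN code): when a pending 
incremental block report entry is added for a block, any earlier entry for the 
same block under a different storage is removed first, so the NN never receives 
two conflicting entries.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of per-storage pending IBR entries on a DataNode.
public class PendingIbrSketch {
    // storageID -> (blockID -> reported state)
    private final Map<String, Map<Long, String>> pending = new HashMap<>();

    public void addEntry(String storageID, long blockID, String state) {
        // The fix's idea: drop any earlier entry for this block that was
        // queued under a different storage before adding the new one.
        for (Map.Entry<String, Map<Long, String>> e : pending.entrySet()) {
            if (!e.getKey().equals(storageID)) {
                e.getValue().remove(blockID);
            }
        }
        pending.computeIfAbsent(storageID, k -> new HashMap<>())
               .put(blockID, state);
    }

    // Counts how many storages currently hold an entry for the block.
    public int entriesFor(long blockID) {
        int n = 0;
        for (Map<Long, String> m : pending.values()) {
            if (m.containsKey(blockID)) n++;
        }
        return n;
    }
}
```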

> Fix TestUnderReplicatedBlocks on branch HDFS-2832
> -
>
> Key: HDFS-5527
> URL: https://issues.apache.org/jira/browse/HDFS-5527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5527.patch, h5527.02.patch
>
>
> The failure seems like a deadlock, which is shown in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/





[jira] [Commented] (HDFS-5451) add more debugging for cache rescan

2013-11-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827339#comment-13827339
 ] 

Andrew Wang commented on HDFS-5451:
---

Haven't checked the new patch yet, just replying to comments:

bq. as per our discussion earlier, I'd rather not create more subclasses. Users 
should be able to get back a PBCD from listDirectives, modify one thing, and 
then use it in modifyDirective.

Is the subclass really that gross here? You could still use 
PBCD.Builder(subclass) to seed the PBCD for modification, and thus hide the new 
setters/getters. Just thinking about how we might clean up the API.

bq. I don't want to do so much copying. This list could have arbitrary length. 
This can't go in DFSUtil since it depends on the configuration value for 
maximum blocks to print.

Yeah, you're right on the copying; I didn't realize that. You could still pass 
the config param into the function to put this in DFSUtil; there are other 
functions like this one there, which is why I suggested it.

bq. Let's do that later. It's not really related to the other changes here and 
we want the stats in soon.

Could you file the follow-on JIRA then? Thanks.

> add more debugging for cache rescan
> ---
>
> Key: HDFS-5451
> URL: https://issues.apache.org/jira/browse/HDFS-5451
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, 
> HDFS-5451.003.patch
>
>
> It would be nice to have a message at DEBUG level that describes all the 
> decisions we made for cache entries.  That way we could turn on this 
> debugging to get more information.  We should also store the number of bytes 
> each PBCE wanted, the number of bytes it got, and the number of inodes 
> it got, and output those in {{listDirectives}}.





[jira] [Updated] (HDFS-3215) Block size is logging as zero Even blockrecevied command received by DN

2013-11-19 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated HDFS-3215:
-

Attachment: HDFS-3215.patch

I changed it to pass the block size as an argument, based on the previous test 
result.

> Block size is logging as zero Even blockrecevied command received by DN 
> 
>
> Key: HDFS-3215
> URL: https://issues.apache.org/jira/browse/HDFS-3215
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Brahma Reddy Battula
>Assignee: Shinichi Yamashita
>Priority: Minor
> Attachments: HDFS-3215.patch, HDFS-3215.patch
>
>
> Scenario 1
> ==
> Start NN and DN.
> Write a file.
> The block size is logged as zero even though the blockReceived command was 
> received by the DN.
>  *NN log*
> 2012-03-14 20:23:40,541 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.allocateBlock: /hadoop-create-user.sh._COPYING_. 
> BP-1166515020-10.18.40.24-1331736264353 
> blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]}
> 2012-03-14 20:24:26,357 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> addStoredBlock: blockMap updated: XXX:50010 is added to 
> blk_1264419582929433995_1002{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[XXX:50010|RBW]]} 
> size 0
>  *DN log* 
> 2012-03-14 20:24:17,519 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block 
> BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002 src: 
> /XXX:53141 dest: /XXX:50010
> 2012-03-14 20:24:26,517 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /XXX:53141, dest: /XXX:50010, bytes: 512, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_1612873957_1, offset: 0, srvID: 
> DS-1639667928-XXX-50010-1331736284942, blockid: 
> BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, duration: 
> 1286482503
> 2012-03-14 20:24:26,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: 
> BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2012-03-14 20:24:31,533 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
> succeeded for BP-1166515020-XXX-1331736264353:blk_1264419582929433995_1002





[jira] [Commented] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode

2013-11-19 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827299#comment-13827299
 ] 

Vinay commented on HDFS-5531:
-

Patch looks pretty good, Nicholas.
One small nit which I hope will fix the test failures (I verified locally).

{code}+  public boolean equals(Object obj) {
+if (obj == this) {
+  return true;
+} else if (obj == null || !(obj instanceof EnumCounters)) {
+  return false;
+}
+final EnumCounters that = (EnumCounters)obj;
+return this.enumConstants == that.enumConstants
+&& Arrays.equals(this.counters, that.counters);
+  }{code}

Here, 
{{this.enumConstants == that.enumConstants}} always returns false, 
because {{Quota.values() == Quota.values()}} is always false (each call returns 
a fresh copy of the array).
It should be replaced with {{Arrays.equals(this.enumConstants, 
that.enumConstants)}}.
This will make the test pass.
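A tiny self-contained demo of the point (standalone code, not Hadoop's): every 
call to an enum's {{values()}} returns a defensive copy of the constants array, 
so reference comparison fails while element-wise comparison succeeds.

```java
import java.util.Arrays;

// Standalone demo: Enum.values() clones the internal array on every call,
// so `==` between two calls is always false, while Arrays.equals compares
// the elements and returns true.
public class EnumValuesDemo {
    enum Quota { NAMESPACE, DISKSPACE }

    public static void main(String[] args) {
        Quota[] a = Quota.values();
        Quota[] b = Quota.values();
        System.out.println(a == b);              // false: distinct array copies
        System.out.println(Arrays.equals(a, b)); // true: same elements
    }
}
```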

> Combine the getNsQuota() and getDsQuota() methods in INode
> --
>
> Key: HDFS-5531
> URL: https://issues.apache.org/jira/browse/HDFS-5531
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h5531_20131119.patch
>
>
> I suggest to combine these two methods into 
> {code}
> public Quota.Counts getQuotaCounts()
> {code}





[jira] [Updated] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Stanby NN is unstable

2013-11-19 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5014:


Attachment: HDFS-5014-v2.patch

Attached the patch with the findbugs fix.

The test failure seems unrelated; the same test passed locally.

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Stanby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused an HDFS write to fail:
> 1. The Standby NN was unstable and continuously restarting due to some errors, 
> but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while datanodes were processing the 
> REGISTER command for it.
> 4. Datanodes started retrying the connection to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called under 
> synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened in all datanodes at the same point.
> 5. The standby was down for the next 7-8 minutes; no blocks were reported to 
> the Active NN during that time, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.
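The fix direction the description points at, narrowing the critical section so a blocking namenode RPC is never held under the shared lock, can be sketched as follows. The class and method names here are illustrative only, not the actual BPOfferService code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the potentially blocking call (here simulated)
// runs outside the lock, and only the short shared-state update is
// synchronized, so one slow or retrying actor cannot stall the others.
public class CommandProcessorSketch {
    private final List<String> processedCommands = new ArrayList<>();

    // Stands in for a slow, possibly-retrying RPC such as versionRequest().
    private String fetchNamespaceInfo(String actor) {
        return "nsInfo-for-" + actor;
    }

    public String processCommandFromActor(String command, String actor) {
        // Blocking work happens without holding any shared lock.
        String nsInfo = fetchNamespaceInfo(actor);
        // Only the brief critical section touches shared state.
        synchronized (processedCommands) {
            processedCommands.add(command);
        }
        return nsInfo;
    }

    public int processedCount() {
        synchronized (processedCommands) {
            return processedCommands.size();
        }
    }

    public static void main(String[] args) {
        CommandProcessorSketch s = new CommandProcessorSketch();
        s.processCommandFromActor("REGISTER", "standby");
        s.processCommandFromActor("BLOCK_REPORT", "active");
        System.out.println(s.processedCount()); // 2
    }
}
```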



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5451) add more debugging for cache rescan

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827271#comment-13827271
 ] 

Hadoop QA commented on HDFS-5451:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614774/HDFS-5451.003.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5496//console

This message is automatically generated.

> add more debugging for cache rescan
> ---
>
> Key: HDFS-5451
> URL: https://issues.apache.org/jira/browse/HDFS-5451
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, 
> HDFS-5451.003.patch
>
>
> It would be nice to have a message at DEBUG level that describes all the 
> decisions we made for cache entries.  That way we could turn on this 
> debugging to get more information.  We should also store the number of bytes 
> each PBCE wanted and the number of bytes it got, plus the number of inodes 
> it got, and output those in {{listDirectives}}.
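Per-entry decision logging like this can be kept cheap when DEBUG is off by constructing messages lazily; a minimal sketch in plain Java (hypothetical, not the HDFS logging API):

```java
import java.util.function.Supplier;

// Minimal sketch of lazy DEBUG logging: the Supplier is only evaluated
// when debug logging is enabled, so an expensive per-cache-entry message
// costs nothing in production.
public class LazyDebugLog {
    private final boolean debugEnabled;

    public LazyDebugLog(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    public void debug(Supplier<String> message) {
        if (debugEnabled) {
            System.out.println("DEBUG " + message.get());
        }
    }

    public static void main(String[] args) {
        final int[] built = {0};  // counts how often the message was constructed
        Supplier<String> expensive = () -> {
            built[0]++;
            return "decision for cache entry: cached 3 of 3 replicas";
        };
        new LazyDebugLog(false).debug(expensive);
        System.out.println(built[0]); // 0: message never constructed
        new LazyDebugLog(true).debug(expensive);
        System.out.println(built[0]); // 1
    }
}
```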



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827266#comment-13827266
 ] 

Hadoop QA commented on HDFS-5531:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614737/h5531_20131119.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5495//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5495//console

This message is automatically generated.

> Combine the getNsQuota() and getDsQuota() methods in INode
> --
>
> Key: HDFS-5531
> URL: https://issues.apache.org/jira/browse/HDFS-5531
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h5531_20131119.patch
>
>
> I suggest combining these two methods into 
> {code}
> public Quota.Counts getQuotaCounts()
> {code}
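A minimal sketch of what the combined accessor could look like; the Counts value class below is modeled loosely on Quota.Counts, and its fields and defaults are assumptions, not the patch itself:

```java
// Hypothetical sketch: instead of separate getNsQuota()/getDsQuota()
// calls, a single method returns both counts in one immutable value object.
public class QuotaCountsSketch {
    public enum Quota { NAMESPACE, DISKSPACE }

    public static final class Counts {
        private final long namespace;
        private final long diskspace;

        Counts(long namespace, long diskspace) {
            this.namespace = namespace;
            this.diskspace = diskspace;
        }

        public long get(Quota which) {
            return which == Quota.NAMESPACE ? namespace : diskspace;
        }
    }

    // What an INode-like class would expose after the change; the
    // example quota values are arbitrary.
    private final long nsQuota = 100;
    private final long dsQuota = 8L << 30;  // 8 GB

    public Counts getQuotaCounts() {
        return new Counts(nsQuota, dsQuota);
    }

    public static void main(String[] args) {
        Counts c = new QuotaCountsSketch().getQuotaCounts();
        System.out.println(c.get(Quota.NAMESPACE)); // 100
    }
}
```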



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5532) Enable the webhdfs by default to support new HDFS web UI

2013-11-19 Thread Vinay (JIRA)
Vinay created HDFS-5532:
---

 Summary: Enable the webhdfs by default to support new HDFS web UI
 Key: HDFS-5532
 URL: https://issues.apache.org/jira/browse/HDFS-5532
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Vinay
Assignee: Vinay


Recently, in HDFS-5444, the new HDFS web UI was made the default, 
but it needs WebHDFS to be enabled.

WebHDFS is disabled by default. Let's enable it by default to support the 
really cool new web UI.
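Until the default changes, the new UI can be used by enabling WebHDFS explicitly in hdfs-site.xml (the standard property key in Hadoop 2.x):

```xml
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
```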



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5451) add more debugging for cache rescan

2013-11-19 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5451:
---

Attachment: HDFS-5451.003.patch

> add more debugging for cache rescan
> ---
>
> Key: HDFS-5451
> URL: https://issues.apache.org/jira/browse/HDFS-5451
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, 
> HDFS-5451.003.patch
>
>
> It would be nice to have a message at DEBUG level that describes all the 
> decisions we made for cache entries.  That way we could turn on this 
> debugging to get more information.  We should also store the number of bytes 
> each PBCE wanted and the number of bytes it got, plus the number of inodes 
> it got, and output those in {{listDirectives}}.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5451) add more debugging for cache rescan

2013-11-19 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827261#comment-13827261
 ] 

Colin Patrick McCabe commented on HDFS-5451:


bq. Can we hide these new PBCD Builder methods from users of DFS? They're 
meaningless for create/modify, ideally we only see them when doing listing. 
Seems well suited as a listing subclass?

As per our discussion earlier, I'd rather not create more subclasses.  Users 
should be able to get back a PBCD from listDirectives, modify one thing, and 
then use it in modifyDirective.

bq. Extra newline at the end of the Builder

ok

bq. Seems like having #clear and #increment(long) methods would better suit the 
new PBCE byte methods.

ok

bq. I feel like this should be a min then, so the repl 1 PBCE doesn't get 
charged double

Yeah, this was supposed to be min.  Good call.

bq. BPOS#blockIdArrayToString, could this go in DFSUtil instead? Seems like a 
better place for it. Also, you can use Guava's Joiner for doing this kind of 
task, and Arrays.asList and List.subList could get the max behavior.

I don't want to do so much copying.  This list could have arbitrary length.  
This can't go in DFSUtil since it depends on the configuration value for 
maximum blocks to print.
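The single-pass, bounded join described here can be sketched as follows (a hypothetical helper, not the actual BPOS#blockIdArrayToString):

```java
// Sketch: join at most `max` block IDs in one StringBuilder pass,
// avoiding the intermediate copies a Joiner/subList approach would need
// for an arbitrarily long primitive array.
public class BlockIdJoiner {
    public static String joinFirst(long[] ids, int max) {
        StringBuilder sb = new StringBuilder();
        int limit = Math.min(ids.length, max);
        for (int i = 0; i < limit; i++) {
            if (i > 0) sb.append(", ");
            sb.append(ids[i]);
        }
        if (ids.length > limit) {
            sb.append(", ...");  // signal truncation past the configured limit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinFirst(new long[]{1, 2, 3, 4, 5}, 3)); // 1, 2, 3, ...
    }
}
```

Note that Guava's Joiner cannot consume a primitive long[] directly anyway, which is another reason a hand-rolled loop fits here.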

bq. Should this extra logging maybe be at DEBUG? It could be a rather large 
message, even with the 1000 limit.

I think seeing what was cached is useful.  This will be our only window into 
what happened in production in a lot of cases.

bq. CacheAdmin, the row extension for printStatus is kinda ugly. Maybe use an 
ArrayList so we can cleanly append?

It's kind of annoying, but {{ArrayList}} doesn't actually give you acess to the 
backing array.  Anyway, this array is tiny.  Let's just use a List and then 
call toArray at the end.

bq. Good catch on right justifying numeric columns. Mind doing the same for the 
ID field?

ok

bq. Tests need to be rebased on trunk, but there's an above comment on the JIRA 
about verifying uncache/cache races.

Let's do that later.  It's not really related to the other changes here and we 
want the stats in soon.

bq. Can we also get a test for the code snippet above, where we have multiple 
things caching the same file with different repls?

I added a new test for this area.

bq. Test verifying stats for a cached directory as files are added to it?

The new test covers this, I think.

> add more debugging for cache rescan
> ---
>
> Key: HDFS-5451
> URL: https://issues.apache.org/jira/browse/HDFS-5451
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch, 
> HDFS-5451.003.patch
>
>
> It would be nice to have a message at DEBUG level that describes all the 
> decisions we made for cache entries.  That way we could turn on this 
> debugging to get more information.  We should also store the number of bytes 
> each PBCE wanted and the number of bytes it got, plus the number of inodes 
> it got, and output those in {{listDirectives}}.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827256#comment-13827256
 ] 

Vinay commented on HDFS-5526:
-

Yes, you are right Nicholas. We don't have any clue whether we have upgraded or 
not. One point: the VERSION file will be overwritten if the layoutVersion or the 
ctime is newer, but that namenode must also be part of the same cluster, 
otherwise the upgrade will not happen.

One more thing: ./current/VERSION will be overwritten every time the DN restarts 
after an upgrade, because of the following check.
{code}// do upgrade
if (this.layoutVersion > HdfsConstants.LAYOUT_VERSION
|| this.cTime < nsInfo.getCTime()) {
  doUpgrade(sd, nsInfo);  // upgrade
  return;
}{code}
This is because the ./current/VERSION file's ctime always stays the same, while 
the upgraded NN will have a higher ctime.
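The interaction described here can be seen in a sketch of the predicate from the quoted snippet. Layout versions are negative and decrease as formats get newer; the method name and values below are illustrative only:

```java
// Sketch of the doUpgrade() trigger condition quoted above:
// "storedLayout > softwareLayout" means the stored layout is older than the
// software's. The second clause is the point made in the comment: if the
// DN's stored cTime never catches up to the NN's, the upgrade branch is
// taken on every restart even though the layout is already current.
public class UpgradeCheck {
    public static boolean shouldUpgrade(int storedLayout, int softwareLayout,
                                        long storedCTime, long nnCTime) {
        return storedLayout > softwareLayout || storedCTime < nnCTime;
    }

    public static void main(String[] args) {
        // Already on the new layout (-48), but the stored cTime lags the NN's:
        System.out.println(shouldUpgrade(-48, -48, 100L, 200L)); // true
        // Genuinely old layout (-47) vs. software (-48):
        System.out.println(shouldUpgrade(-47, -48, 200L, 200L)); // true
    }
}
```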


> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.
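For context on why an exact-match version check fails here: layout versions are negative and decrease as formats get newer, so a rollback check needs an ordering comparison rather than equality. A hypothetical sketch of that ordering (not the committed fix):

```java
// Illustrative only: on rollback, the previous storage directory should be
// acceptable when its layout is the same age or older than what the running
// software uses, i.e. not newer. Layout versions are negative; smaller
// (more negative) means newer.
public class RollbackCheck {
    public static boolean canRollBack(int prevLayout, int softwareLayout) {
        // prevLayout >= softwareLayout: e.g. -47 (v2.2.0) vs. -48 (trunk),
        // so the software can still read the previous format.
        return prevLayout >= softwareLayout;
    }

    public static void main(String[] args) {
        System.out.println(canRollBack(-47, -48)); // true: the HDFS-5526 case
        System.out.println(canRollBack(-49, -48)); // false: previous is newer
    }
}
```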



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832

2013-11-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827255#comment-13827255
 ] 

Junping Du commented on HDFS-5527:
--

Yes, I also see this in my local test failure. I suspect it could be a race 
condition in the block queue of UnderReplicatedBlocks. Still investigating.

> Fix TestUnderReplicatedBlocks on branch HDFS-2832
> -
>
> Key: HDFS-5527
> URL: https://issues.apache.org/jira/browse/HDFS-5527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5527.patch
>
>
> The failure seems to be a deadlock, which is shown in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827242#comment-13827242
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5526:
--

> But during upgrade only clusterId and layoutVersion are overwritten, ctime is 
> never modified. clusterId and layoutVersion are never going to change 
> dynamically. right?

You are right, but we also need to consider error cases such as connecting a DN 
to the wrong cluster, moving the storage to another DN, rolling back to a wrong 
version, upgrading again without a rollback, etc.  We need to make sure all the 
error cases fail.  I think that is the hard part.

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Stanby NN is unstable

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827239#comment-13827239
 ] 

Hadoop QA commented on HDFS-5014:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614728/HDFS-5014-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5494//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5494//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5494//console

This message is automatically generated.

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Stanby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused an HDFS write to fail:
> 1. The Standby NN was unstable and continuously restarting due to some errors, 
> but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while datanodes were processing the 
> REGISTER command for it.
> 4. Datanodes started retrying the connection to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called under 
> synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened in all datanodes at the same point.
> 5. The standby was down for the next 7-8 minutes; no blocks were reported to 
> the Active NN during that time, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5498) Improve datanode startup time

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827236#comment-13827236
 ] 

Hadoop QA commented on HDFS-5498:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12614733/HDFS-5498.with_du_change.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  org.apache.hadoop.hdfs.TestDFSUpgrade

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5493//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5493//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5493//console

This message is automatically generated.

> Improve datanode startup time
> -
>
> Key: HDFS-5498
> URL: https://issues.apache.org/jira/browse/HDFS-5498
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5498.with_du_change.patch
>
>
> Similarly to HDFS-5027, an improvement can be made for getVolumeMap(). This is 
> the phase in which the ReplicaMap is populated.  But it would be even better if 
> the datanode scanned only once and did both.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827227#comment-13827227
 ] 

Vinay commented on HDFS-5526:
-

bq. For example, the ctime or some ids may have been changed in some unexpected 
way without being noticed
The version file in the datanode's current directory is overwritten only during 
format and upgrade.
But during the upgrade only clusterId and layoutVersion are overwritten; ctime 
is never modified. clusterId and layoutVersion are never going to change 
dynamically, right?
{noformat}if (LayoutVersion.supports(Feature.FEDERATION, layoutVersion)) {
  clusterID = nsInfo.getClusterID();
  layoutVersion = nsInfo.getLayoutVersion();
  writeProperties(sd);
  return;
}{noformat}


Hi [~kihwal], the patch looks really simple. +1
Do you think we need to update this comment now?
{code}   * Do nothing, if previous directory does not exist.{code}

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Stanby NN is unstable

2013-11-19 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827222#comment-13827222
 ] 

Uma Maheswara Rao G commented on HDFS-5014:
---

I think it can happen here (I guess):
{code}
DatanodeCommand cmd = blockReport();
 processCommand(new DatanodeCommand[]{ cmd });
{code}
We could check for null right there, but that's OK, I think.  Let me check the 
latest patch.

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Stanby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused an HDFS write to fail:
> 1. The Standby NN was unstable and continuously restarting due to some errors, 
> but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while datanodes were processing the 
> REGISTER command for it.
> 4. Datanodes started retrying the connection to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called under 
> synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened in all datanodes at the same point.
> 5. The standby was down for the next 7-8 minutes; no blocks were reported to 
> the Active NN during that time, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS

2013-11-19 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827210#comment-13827210
 ] 

Haohui Mai commented on HDFS-3987:
--

I tested the patch by running distcp to write to swebhdfs. I ran the test in 
both secure and insecure clusters; both setups worked.

I cleaned up the warning in HttpServer in order to work around a Jenkins bug: 
the original patch only touches hadoop-auth and hadoop-hdfs, in which case 
Jenkins won't build hadoop-common. The test TestHdfsNativeCodeLoader then 
cannot find libhadoop.so built in hadoop-common, causing it to fail. Touching a 
file in hadoop-common forces the build and works around the problem.

> Support webhdfs over HTTPS
> --
>
> Key: HDFS-3987
> URL: https://issues.apache.org/jira/browse/HDFS-3987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Haohui Mai
> Fix For: 2.3.0
>
> Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, 
> HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, 
> HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, 
> HDFS-3987.008.patch, HDFS-3987.009.patch
>
>
> This is a follow up of HDFS-3983.
> We should have a new filesystem client impl/binding for encrypted WebHDFS, 
> i.e. *webhdfss://*
> On the server side, webhdfs and httpfs we should only need to start the 
> service on a secured (HTTPS) endpoint.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5511) improve CacheManipulator interface to allow better unit testing

2013-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827191#comment-13827191
 ] 

Hudson commented on HDFS-5511:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4764 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4764/])
HDFS-5511. improve CacheManipulator interface to allow better unit testing 
(cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543676)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/ReadaheadPool.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeConfig.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestPathBasedCacheRequests.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java


> improve CacheManipulator interface to allow better unit testing
> ---
>
> Key: HDFS-5511
> URL: https://issues.apache.org/jira/browse/HDFS-5511
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0
>
> Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch
>
>
> The CacheManipulator interface has been helpful in allowing us to stub out 
> {{mlock}} in cases where we don't want to test it.  We should move  the 
> {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this 
> interface as well so that we don't have to skip these tests on machines where 
> these methods would ordinarily not work for us.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS

2013-11-19 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827188#comment-13827188
 ] 

Jing Zhao commented on HDFS-3987:
-

The new patch looks good to me. Still, please mention how you system-tested the 
patch, and also why you needed to modify HttpServer.java.

+1 for the latest patch. [~tucu00], do you still have further comments? I will 
commit the patch tomorrow if there are no more comments.

> Support webhdfs over HTTPS
> --
>
> Key: HDFS-3987
> URL: https://issues.apache.org/jira/browse/HDFS-3987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Haohui Mai
> Fix For: 2.3.0
>
> Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, 
> HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, 
> HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, 
> HDFS-3987.008.patch, HDFS-3987.009.patch
>
>
> This is a follow up of HDFS-3983.
> We should have a new filesystem client impl/binding for encrypted WebHDFS, 
> i.e. *webhdfss://*
> On the server side, webhdfs and httpfs we should only need to start the 
> service on a secured (HTTPS) endpoint.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5511) improve CacheManipulator interface to allow better unit testing

2013-11-19 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5511:
---

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

> improve CacheManipulator interface to allow better unit testing
> ---
>
> Key: HDFS-5511
> URL: https://issues.apache.org/jira/browse/HDFS-5511
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0
>
> Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch
>
>
> The CacheManipulator interface has been helpful in allowing us to stub out 
> {{mlock}} in cases where we don't want to test it.  We should move  the 
> {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this 
> interface as well so that we don't have to skip these tests on machines where 
> these methods would ordinarily not work for us.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path

2013-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827182#comment-13827182
 ] 

Hudson commented on HDFS-5513:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4763 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4763/])
HDFS-5513. CacheAdmin commands fail when using . as the path. Contributed by 
Andrew Wang. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543670)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/PathBasedCacheDirective.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestPathBasedCacheRequests.java


> CacheAdmin commands fail when using . as the path
> -
>
> Key: HDFS-5513
> URL: https://issues.apache.org/jira/browse/HDFS-5513
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, tools
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch
>
>
> The hdfs CLI commands generally accept "." as a path argument.
> e.g.
> {code}
> hdfs dfs -rm .
> hdfs dfsadmin -allowSnapshot .
> {code}
> I don't think it's very common to use the path "." but the CacheAdmin 
> commands will fail saying that it cannot create a Path from an empty string.
> {code}
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path .
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> {code}
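The IllegalArgumentException above is thrown because "." reduces to an empty string before the Path constructor runs. A minimal sketch of one possible approach, qualifying relative arguments against the working directory before constructing a Path; the class and method names here are hypothetical, not the actual patch:

```java
public class PathNormalizer {
    /** Resolve "." and other relative arguments against a working directory
     *  before handing them to a Path constructor that rejects "".
     *  (Hypothetical helper for illustration only.) */
    public static String normalize(String arg, String workingDir) {
        if (arg == null || arg.isEmpty()) {
            throw new IllegalArgumentException(
                "Can not create a Path from an empty string");
        }
        if (arg.startsWith("/")) {
            return arg;                // already absolute
        }
        if (arg.equals(".")) {
            return workingDir;         // "." means the working directory itself
        }
        if (arg.startsWith("./")) {
            arg = arg.substring(2);    // strip a leading "./"
        }
        return workingDir.endsWith("/") ? workingDir + arg
                                        : workingDir + "/" + arg;
    }

    public static void main(String[] args) {
        System.out.println(normalize(".", "/user/schu"));     // /user/schu
        System.out.println(normalize("data", "/user/schu"));  // /user/schu/data
    }
}
```

The key point is that "." never reaches the Path constructor as an empty string; it is resolved to a concrete directory first.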



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5513) CacheAdmin commands fail when using . as the path

2013-11-19 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5513:
--

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks again Colin for reviews.

> CacheAdmin commands fail when using . as the path
> -
>
> Key: HDFS-5513
> URL: https://issues.apache.org/jira/browse/HDFS-5513
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, tools
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch
>
>
> The hdfs CLI commands generally accept "." as a path argument.
> e.g.
> {code}
> hdfs dfs -rm .
> hdfs dfsadmin -allowSnapshot .
> {code}
> I don't think it's very common to use the path "." but the CacheAdmin 
> commands will fail saying that it cannot create a Path from an empty string.
> {code}
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path .
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path

2013-11-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827176#comment-13827176
 ] 

Andrew Wang commented on HDFS-5513:
---

With Jenkins clean, will commit this shortly based on Colin's earlier +1. 
Thanks again for the review.

> CacheAdmin commands fail when using . as the path
> -
>
> Key: HDFS-5513
> URL: https://issues.apache.org/jira/browse/HDFS-5513
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, tools
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch
>
>
> The hdfs CLI commands generally accept "." as a path argument.
> e.g.
> {code}
> hdfs dfs -rm .
> hdfs dfsadmin -allowSnapshot .
> {code}
> I don't think it's very common to use the path "." but the CacheAdmin 
> commands will fail saying that it cannot create a Path from an empty string.
> {code}
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path .
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827168#comment-13827168
 ] 

Hadoop QA commented on HDFS-5513:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614695/hdfs-5513-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5492//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5492//console

This message is automatically generated.

> CacheAdmin commands fail when using . as the path
> -
>
> Key: HDFS-5513
> URL: https://issues.apache.org/jira/browse/HDFS-5513
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, tools
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch
>
>
> The hdfs CLI commands generally accept "." as a path argument.
> e.g.
> {code}
> hdfs dfs -rm .
> hdfs dfsadmin -allowSnapshot .
> {code}
> I don't think it's very common to use the path "." but the CacheAdmin 
> commands will fail saying that it cannot create a Path from an empty string.
> {code}
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path .
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.<init>(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832

2013-11-19 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827163#comment-13827163
 ] 

Arpit Agarwal commented on HDFS-5527:
-

Looks like an NN bug. From the [Jenkins 
logs|https://builds.apache.org/job/PreCommit-HDFS-Build/5488//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/]:

{code}
2013-11-19 19:09:03,406 INFO  BlockStateChange 
(BlockManager.java:computeReplicationWorkForBlocks(1366)) - BLOCK* ask 
127.0.0.1:59892 to replicate blk_1073741825_1001 to datanode(s) 127.0.0.1:38708 
127.0.0.1:50461
{code}

And then again:
{code}
2013-11-19 19:09:06,407 INFO  BlockStateChange 
(BlockManager.java:computeReplicationWorkForBlocks(1366)) - BLOCK* ask 
127.0.0.1:50461 to replicate blk_1073741825_1001 to datanode(s) 127.0.0.1:38708
{code}

> Fix TestUnderReplicatedBlocks on branch HDFS-2832
> -
>
> Key: HDFS-5527
> URL: https://issues.apache.org/jira/browse/HDFS-5527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5527.patch
>
>
> The failure seems like a deadlock, as shown in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827156#comment-13827156
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5526:
--

I like the simplicity of the patch -- it only changes the rollback code, not the 
upgrade code.  However, the VERSION file is overwritten but not restored during 
rollback.  I worry that the new VERSION file could differ from the original one.  
For example, the ctime or some ids may have been changed in some unexpected way 
without being noticed.  How can we make sure the new and the original VERSION 
files are the same?
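One way to get that assurance is a field-by-field comparison of the two files' properties during rollback. A hedged sketch of such a check, not the actual DataNode code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class VersionCompare {
    /** Compare two VERSION-style property maps and return the keys whose
     *  values differ, so a rollback path could verify that the rewritten
     *  file matches the original. (Illustrative check only.) */
    public static List<String> diffKeys(Map<String, String> original,
                                        Map<String, String> rewritten) {
        // Union of keys from both files, sorted for stable output.
        Set<String> keys = new TreeSet<>(original.keySet());
        keys.addAll(rewritten.keySet());
        List<String> diffs = new ArrayList<>();
        for (String k : keys) {
            if (!Objects.equals(original.get(k), rewritten.get(k))) {
                diffs.add(k);   // value changed, added, or removed
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("layoutVersion", "-47", "ctime", "0");
        Map<String, String> after  = Map.of("layoutVersion", "-47", "ctime", "100");
        System.out.println(diffKeys(before, after)); // [ctime]
    }
}
```

A rollback that ran such a comparison could fail fast (or at least log loudly) when the rewritten VERSION file drifts from the original, instead of silently proceeding.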

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832

2013-11-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827149#comment-13827149
 ] 

Junping Du commented on HDFS-5527:
--

You mean reproduce the failure? Yes. It fails intermittently, so I am not sure 
whether the attached patch can fix it. I can post the failure log here if you 
think it would be helpful.

> Fix TestUnderReplicatedBlocks on branch HDFS-2832
> -
>
> Key: HDFS-5527
> URL: https://issues.apache.org/jira/browse/HDFS-5527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5527.patch
>
>
> The failure seems like a deadlock, as shown in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5531:
-

Status: Patch Available  (was: Open)

> Combine the getNsQuota() and getDsQuota() methods in INode
> --
>
> Key: HDFS-5531
> URL: https://issues.apache.org/jira/browse/HDFS-5531
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h5531_20131119.patch
>
>
> I suggest combining these two methods into 
> {code}
> public Quota.Counts getQuotaCounts()
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5531:
-

Attachment: h5531_20131119.patch

h5531_20131119.patch: 1st patch.

> Combine the getNsQuota() and getDsQuota() methods in INode
> --
>
> Key: HDFS-5531
> URL: https://issues.apache.org/jira/browse/HDFS-5531
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h5531_20131119.patch
>
>
> I suggest combining these two methods into 
> {code}
> public Quota.Counts getQuotaCounts()
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5531) Combine the getNsQuota() and getDsQuota() methods in INode

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-5531:


 Summary: Combine the getNsQuota() and getDsQuota() methods in INode
 Key: HDFS-5531
 URL: https://issues.apache.org/jira/browse/HDFS-5531
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor
 Attachments: h5531_20131119.patch

I suggest combining these two methods into 
{code}
public Quota.Counts getQuotaCounts()
{code}
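A rough sketch of what the combined accessor could look like; the value-class shape and field names below are assumptions for illustration, not the actual patch:

```java
public class INodeQuota {
    /** A small value class combining namespace and diskspace quotas,
     *  in the spirit of the proposed Quota.Counts. (Field names assumed.) */
    static final class QuotaCounts {
        final long namespace;
        final long diskspace;
        QuotaCounts(long namespace, long diskspace) {
            this.namespace = namespace;
            this.diskspace = diskspace;
        }
    }

    private final long nsQuota;
    private final long dsQuota;

    INodeQuota(long nsQuota, long dsQuota) {
        this.nsQuota = nsQuota;
        this.dsQuota = dsQuota;
    }

    /** Replaces separate getNsQuota()/getDsQuota() calls with one call
     *  that returns both counts together. */
    QuotaCounts getQuotaCounts() {
        return new QuotaCounts(nsQuota, dsQuota);
    }

    public static void main(String[] args) {
        QuotaCounts q = new INodeQuota(100, 1_000_000).getQuotaCounts();
        System.out.println(q.namespace + " " + q.diskspace); // 100 1000000
    }
}
```

Callers that previously made two calls would then read both quotas from a single returned object, which also keeps the pair consistent under concurrent updates.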



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5498) Improve datanode startup time

2013-11-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5498:
-

Attachment: HDFS-5498.with_du_change.patch

Attaching a patch that includes HADOOP-10111. This is not a commit candidate; 
it is posted for review and testing.

> Improve datanode startup time
> -
>
> Key: HDFS-5498
> URL: https://issues.apache.org/jira/browse/HDFS-5498
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
> Attachments: HDFS-5498.with_du_change.patch
>
>
> Similarly to HDFS-5027, an improvement can be made for getVolumeMap(), the 
> phase in which the ReplicaMap is populated.  But it would be even better if 
> the datanode scanned only once and did both.
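A sketch of the "scan once" idea: a single pass over the block files can classify each file name and feed both the replica map and the per-volume view from the same walk. The helper below is illustrative; only the blk_ naming convention is taken from HDFS, and the real code lives in the FsDataset implementation:

```java
public class SingleScan {
    /** Extract the block id from a DataNode block file name ("blk_<id>"),
     *  or return -1 for meta files and anything else. One directory walk
     *  can then populate both the replica map and the volume file list
     *  from the same classification, instead of scanning twice. */
    static long blockIdOf(String fileName) {
        if (!fileName.startsWith("blk_") || fileName.endsWith(".meta")) {
            return -1;
        }
        try {
            return Long.parseLong(fileName.substring("blk_".length()));
        } catch (NumberFormatException e) {
            return -1;   // e.g. "blk_<id>_<genstamp>" or malformed names
        }
    }

    public static void main(String[] args) {
        System.out.println(blockIdOf("blk_1073741825"));           // 1073741825
        System.out.println(blockIdOf("blk_1073741825_1001.meta")); // -1
    }
}
```

Whatever the real implementation looks like, the payoff is the same: the expensive part (listing every block file on disk) happens once, and both data structures are built from that single pass.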



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5498) Improve datanode startup time

2013-11-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5498:
-

Assignee: Kihwal Lee
  Status: Patch Available  (was: Open)

> Improve datanode startup time
> -
>
> Key: HDFS-5498
> URL: https://issues.apache.org/jira/browse/HDFS-5498
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5498.with_du_change.patch
>
>
> Similarly to HDFS-5027, an improvement can be made for getVolumeMap(), the 
> phase in which the ReplicaMap is populated.  But it would be even better if 
> the datanode scanned only once and did both.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5526:
-

Status: Open  (was: Patch Available)

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5473) Consistent naming of user-visible caching classes and methods

2013-11-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827088#comment-13827088
 ] 

Andrew Wang commented on HDFS-5473:
---

I'll also note that we discussed adding the cache commands to the public 
HdfsAdmin class for easier accessibility. Would be a nice change to get in here 
too, or in a follow-on.

> Consistent naming of user-visible caching classes and methods
> -
>
> Key: HDFS-5473
> URL: https://issues.apache.org/jira/browse/HDFS-5473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Colin Patrick McCabe
>
> It's kind of warty that (after HDFS-5326 goes in) DistributedFileSystem has 
> {{*CachePool}} methods that take a {{CachePoolInfo}} and 
> {{*PathBasedCacheDirective}} methods that take a 
> {{PathBasedCacheDirective}}. We should consider renaming {{CachePoolInfo}} to 
> {{CachePool}} for consistency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-1386) TestJMXGet fails in jdk7

2013-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827086#comment-13827086
 ] 

Hudson commented on HDFS-1386:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4762 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4762/])
HDFS-1386. TestJMXGet fails in jdk7 (jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543612)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestJMXGet.java


> TestJMXGet fails in jdk7
> 
>
> Key: HDFS-1386
> URL: https://issues.apache.org/jira/browse/HDFS-1386
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 0.22.0
>Reporter: Tanping Wang
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: java7
> Attachments: HDFS-1386.patch, HDFS-1386.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

2013-11-19 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827089#comment-13827089
 ] 

Vinay commented on HDFS-4516:
-

Thanks Uma and Nicholas

> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Vinay
>Priority: Critical
> Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, 
> HDFS-4516.patch, HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (blocks not yet created in 
> DNs) and the NN also switches after this, then the new NameNode will not know 
> about the block locations.
> Further details will be in a comment.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable

2013-11-19 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5014:


Attachment: HDFS-5014-v2.patch

Thanks Uma for finding out the failure reason.

It's strange that cmd is null; we need to check why.
Here is the updated patch, which checks cmd for null before using it.
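A hedged sketch of the null guard described here; the real BPOfferService method has a different signature and command types, so the interface and class below are illustrative only:

```java
public class CommandProcessor {
    /** Stand-in for a DataNode command received from a NameNode actor. */
    interface DatanodeCommand {
        void execute();
    }

    /** Skip null commands defensively instead of dereferencing them.
     *  A null command can appear when the actor's response was empty,
     *  so it is treated as "nothing to do" rather than a failure. */
    static boolean processCommand(DatanodeCommand cmd) {
        if (cmd == null) {
            return true;   // nothing to process; keep the actor loop going
        }
        cmd.execute();
        return true;
    }

    public static void main(String[] args) {
        System.out.println(processCommand(null));  // true, and no NPE
        System.out.println(processCommand(() -> System.out.println("ran")));
    }
}
```

The design question the guard leaves open (noted above) is why cmd is null in the first place; the check prevents the crash but does not explain the root cause.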

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Standby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch
>
>
> In one of our clusters, the following happened, which caused HDFS writes to fail.
> 1. The Standby NN was unstable and continuously restarting due to some errors, 
> but the Active NN was stable.
> 2. MR Job was writing files.
> 3. At some point the SNN went down again while the datanodes were processing 
> the REGISTER command for the SNN. 
> 4. The datanodes started retrying to connect to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called under 
> synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened in all datanodes at the same point.
> 5. For the next 7-8 minutes the standby was down, no blocks were reported to 
> the active NN, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HDFS-5473) Consistent naming of user-visible caching classes and methods

2013-11-19 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang reassigned HDFS-5473:
-

Assignee: Colin Patrick McCabe  (was: Andrew Wang)

> Consistent naming of user-visible caching classes and methods
> -
>
> Key: HDFS-5473
> URL: https://issues.apache.org/jira/browse/HDFS-5473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Colin Patrick McCabe
>
> It's kind of warty that (after HDFS-5326 goes in) DistributedFileSystem has 
> {{*CachePool}} methods that take a {{CachePoolInfo}} and 
> {{*PathBasedCacheDirective}} methods that take a 
> {{PathBasedCacheDirective}}. We should consider renaming {{CachePoolInfo}} to 
> {{CachePool}} for consistency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5473) Consistent naming of user-visible caching classes and methods

2013-11-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827079#comment-13827079
 ] 

Andrew Wang commented on HDFS-5473:
---

+1 for the proposal from me, thanks Colin. I like the Eclipse refactoring tools 
a lot for this, but sed probably works too.

We're going to have a bunch of javadoc/variable names to update too, but I 
figure we can just do that as best we can and fix as we see it in future JIRAs.

> Consistent naming of user-visible caching classes and methods
> -
>
> Key: HDFS-5473
> URL: https://issues.apache.org/jira/browse/HDFS-5473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>
> It's kind of warty that (after HDFS-5326 goes in) DistributedFileSystem has 
> {{*CachePool}} methods that take a {{CachePoolInfo}} and 
> {{*PathBasedCacheDirective}} methods that take a 
> {{PathBasedCacheDirective}}. We should consider renaming {{CachePoolInfo}} to 
> {{CachePool}} for consistency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827077#comment-13827077
 ] 

Hadoop QA commented on HDFS-3987:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614672/HDFS-3987.009.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-auth hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5490//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5490//console

This message is automatically generated.

> Support webhdfs over HTTPS
> --
>
> Key: HDFS-3987
> URL: https://issues.apache.org/jira/browse/HDFS-3987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Haohui Mai
> Fix For: 2.3.0
>
> Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, 
> HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, 
> HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, 
> HDFS-3987.008.patch, HDFS-3987.009.patch
>
>
> This is a follow up of HDFS-3983.
> We should have a new filesystem client impl/binding for encrypted WebHDFS, 
> i.e. *webhdfss://*
> On the server side, for webhdfs and httpfs, we should only need to start the 
> service on a secured (HTTPS) endpoint.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827072#comment-13827072
 ] 

Hadoop QA commented on HDFS-5526:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614686/HDFS-5526.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5491//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5491//console


> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.





[jira] [Commented] (HDFS-5451) add more debugging for cache rescan

2013-11-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827070#comment-13827070
 ] 

Andrew Wang commented on HDFS-5451:
---

Thanks Colin, this is going to be really useful for end users. Some review 
comments:

* Can we hide these new PBCD Builder methods from users of DFS? They're 
meaningless for create/modify; ideally we only see them when doing a listing. 
Seems well suited as a listing subclass?
* Extra newline at the end of the Builder
* Seems like having {{#clear}} and {{#increment(long)}} methods would better 
suit the new PBCE byte methods.

{code}
List cachedOn =
ocblock.getDatanodes(Type.CACHED);
long cachedByBlock = Math.max(cachedOn.size(), pce.getReplication()) *
blockInfo.getNumBytes();
cachedTotal += cachedByBlock;
{code}
I'm guessing the max here is for when we have e.g. PBCEs with repl 1 and 2 
caching the same block? I feel like this should be a min then, so the repl-1 
PBCE doesn't get charged double. A comment would be nice.

* {{BPOS#blockIdArrayToString}}, could this go in DFSUtil instead? Seems like a 
better place for it. Also, you can use Guava's Joiner for doing this kind of 
task, and Arrays.asList and List.subList could get the max behavior.
* Should this extra logging maybe be at DEBUG? It could be a rather large 
message, even with the 1000 limit.
* CacheAdmin, the row extension for printStatus is kinda ugly. Maybe use an 
ArrayList so we can cleanly append?
* Good catch on right justifying numeric columns. Mind doing the same for the 
ID field?
* Tests need to be rebased on trunk, but there's an above comment on the JIRA 
about verifying uncache/cache races.
* Can we also get a test for the code snippet above, where we have multiple 
things caching the same file with different repls?
* Test verifying stats for a cached directory as files are added to it?
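The min-vs-max accounting question above can be illustrated with a
self-contained sketch (plain Java; the class and method names are hypothetical,
not the actual PBCE/CacheManager code):

```java
// Sketch: two directives with replication 1 and 2 target the same 64MB block,
// and both replicas end up cached.
public class CachedBytesSketch {
    // Charge a directive for a block: min() caps the charge at the directive's
    // own replication, so the repl-1 directive is charged one block's worth
    // even when two replicas are cached on behalf of the repl-2 directive.
    static long charge(int replicasCached, int directiveReplication, long blockBytes) {
        return (long) Math.min(replicasCached, directiveReplication) * blockBytes;
    }

    public static void main(String[] args) {
        long blockBytes = 64L * 1024 * 1024;
        int replicasCached = 2;  // both replicas of the block are cached
        System.out.println(charge(replicasCached, 1, blockBytes));  // 67108864
        System.out.println(charge(replicasCached, 2, blockBytes));  // 134217728
        // With max() instead, the repl-1 directive would also be charged
        // 2 * blockBytes -- the "charged double" case mentioned above.
    }
}
```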

> add more debugging for cache rescan
> ---
>
> Key: HDFS-5451
> URL: https://issues.apache.org/jira/browse/HDFS-5451
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5451.001.patch, HDFS-5451.002.patch
>
>
> It would be nice to have message at DEBUG level that described all the 
> decisions we made for cache entries.  That way we could turn on this 
> debugging to get more information.  We should also store the number of bytes 
> each PBCE wanted, and the number of bytes it got, plus the number of inodes 
> it got, and output those in {{listDirectives}}.





[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827055#comment-13827055
 ] 

Kihwal Lee commented on HDFS-5526:
--

bq. Would storageID and cTime be preserved?
I think so. The slight difficulty is in loading current/VERSION without blowing 
up. After reading it in, it needs to override a couple of fields and call 
writeProperties().

bq. BTW, do you know why cTime=0 in my test case above?
DataStorage's cTime is set to 0 when the node is formatted, but that of 
BlockPoolSliceStorage is supposed to be set to the one from nsInfo. So my guess 
is that when NNStorage is formatted, cTime is 0: NNStorage.newNamespaceInfo() 
sets it to 0, and this must be what is used for formatting.
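The override-and-rewrite idea above can be sketched as follows (hedged: this
uses plain java.util.Properties to stand in for Storage's
readProperties/writeProperties; names and the VERSION contents are
illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Load the existing VERSION contents, override only layoutVersion and cTime,
// and preserve everything else, including storageID.
public class VersionRewriteSketch {
    // Pure helper: copy the current properties, overriding the two fields.
    static Properties overrideForRollback(Properties current, String prevLayout, String prevCTime) {
        Properties out = new Properties();
        out.putAll(current);
        out.setProperty("layoutVersion", prevLayout);
        out.setProperty("cTime", prevCTime);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path version = Files.createTempFile("VERSION", null);
        Files.write(version, "layoutVersion=-48\ncTime=0\nstorageID=DS-1\n".getBytes("UTF-8"));
        Properties current = new Properties();
        try (InputStream in = Files.newInputStream(version)) {
            current.load(in);                      // read current/VERSION as-is
        }
        Properties rolledBack = overrideForRollback(current, "-47", "0");
        try (OutputStream out = Files.newOutputStream(version)) {
            rolledBack.store(out, null);           // analogous to writeProperties()
        }
        // storageID survives untouched
        System.out.println(rolledBack.getProperty("layoutVersion")
                + " " + rolledBack.getProperty("storageID"));  // prints: -47 DS-1
    }
}
```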

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.





[jira] [Commented] (HDFS-1386) TestJMXGet fails in jdk7

2013-11-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827053#comment-13827053
 ] 

Jonathan Eagles commented on HDFS-1386:
---

 HDFS-5530 and YARN-1426 were filed. Thanks again, Kihwal.

> TestJMXGet fails in jdk7
> 
>
> Key: HDFS-1386
> URL: https://issues.apache.org/jira/browse/HDFS-1386
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 0.22.0
>Reporter: Tanping Wang
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: java7
> Attachments: HDFS-1386.patch, HDFS-1386.patch
>
>






[jira] [Created] (HDFS-5530) HDFS Components are unable to unregister from DefaultMetricsSystem

2013-11-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created HDFS-5530:
-

 Summary: HDFS Components are unable to unregister from 
DefaultMetricsSystem
 Key: HDFS-5530
 URL: https://issues.apache.org/jira/browse/HDFS-5530
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles








[jira] [Updated] (HDFS-1386) TestJMXGet fails in jdk7

2013-11-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-1386:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks Kihwal for the review. I will file the two JIRAs and post them here.

> TestJMXGet fails in jdk7
> 
>
> Key: HDFS-1386
> URL: https://issues.apache.org/jira/browse/HDFS-1386
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 0.22.0
>Reporter: Tanping Wang
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: java7
> Attachments: HDFS-1386.patch, HDFS-1386.patch
>
>






[jira] [Updated] (HDFS-1386) TestJMXGet fails in jdk7

2013-11-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-1386:
--

Target Version/s: 3.0.0, 2.3.0  (was: 3.0.0, 2.3.0, 0.23.10)

> TestJMXGet fails in jdk7
> 
>
> Key: HDFS-1386
> URL: https://issues.apache.org/jira/browse/HDFS-1386
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 0.22.0
>Reporter: Tanping Wang
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: java7
> Attachments: HDFS-1386.patch, HDFS-1386.patch
>
>






[jira] [Commented] (HDFS-5525) Inline dust templates

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827007#comment-13827007
 ] 

Hadoop QA commented on HDFS-5525:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614569/HDFS-5525.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5489//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5489//console


> Inline dust templates
> -
>
> Key: HDFS-5525
> URL: https://issues.apache.org/jira/browse/HDFS-5525
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5525.000.patch, HDFS-5525.000.patch, screenshot.png
>
>
> Currently the dust templates are stored as separate files on the server side. 
> The web UI has to make separate HTTP requests to load the templates, which 
> increases network overhead and page-load latency.
> This jira proposes to inline all dust templates with the main HTML file, so 
> that the page can be loaded faster.





[jira] [Commented] (HDFS-5194) Robust support for alternate FsDatasetSpi implementations

2013-11-19 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826996#comment-13826996
 ] 

Eli Collins commented on HDFS-5194:
---

Notes from the call:
- Attendees: Dave Powell, Eric Sirianni, Andrew Wang, Eli Collins
- Scope here is non-file storage for the DataNode, specifically a subset of 
DataNode storage for storing HDFS blocks, given that parts of the data directory 
(eg MD) are managed via DataStorage, which is not covered here. We could make 
DataStorage pluggable in the future as well, independent of this; that would 
probably require moving functionality that plugins would want to share out of 
DataStorage. 
- FsDatasetSpi is currently private; we need to come up with an API (for the 
Spi and the classes it returns) that could be declared stable, so that users 
would not have to maintain different plugins for subsequent 2.x releases.
- It would help to have a dummy plugin to articulate what interfaces are public 
and to catch API and semantic breakages; it is also a potential place for plugin 
authors to share code. Maintaining a functional dummy plugin is expensive, so it 
might make more sense to start with something that is compile-only.
- There is currently functionality in the FsDataset implementations that could 
be shared across plugins; moving it outside would decrease the effort required 
to plug out FsDataset and make it easier to maintain semantic compatibility.
- Pluggability is currently DataNode-wide; it might make sense to be able to 
specify the plugin on a per-volume basis, for example because different types of 
storage may want different plugins (HDFS-2832).
- We should look into replacing standard Java I/O classes with Hadoop-specific 
classes in the relevant FsDataset APIs, since they have baked-in assumptions 
around file-based storage and interface baggage.
- Next step is to break down the HDFS-5194 proposal into sub-tasks and hash out 
each patch individually. Perhaps create a feature branch if there are 
sufficiently many patches that need to stay out of trunk.
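The per-volume pluggability point could look roughly like this (purely
illustrative: the VolumeDataset interface and the comma-separated volume-spec
format are hypothetical, not the real FsDatasetSpi API or HDFS config syntax):

```java
import java.util.Arrays;

// Select a dataset plugin per volume rather than per DataNode, keyed by a
// storage-type suffix in the volume spec.
public class PerVolumePluginSketch {
    interface VolumeDataset { String describe(); }

    static VolumeDataset forVolume(String volumeSpec) {
        // e.g. "/data/1,DISK" vs "/mnt/obj,OBJECT" -- the type picks the plugin
        String type = volumeSpec.substring(volumeSpec.indexOf(',') + 1);
        switch (type) {
            case "OBJECT": return () -> "object-store dataset";
            default:       return () -> "file-based dataset";
        }
    }

    public static void main(String[] args) {
        for (String v : Arrays.asList("/data/1,DISK", "/mnt/obj,OBJECT")) {
            System.out.println(v + " -> " + forVolume(v).describe());
        }
    }
}
```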


> Robust support for alternate FsDatasetSpi implementations
> -
>
> Key: HDFS-5194
> URL: https://issues.apache.org/jira/browse/HDFS-5194
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client
>Reporter: David Powell
>Priority: Minor
> Attachments: HDFS-5194.design.09112013.pdf, HDFS-5194.patch.09112013
>
>
> The existing FsDatasetSpi interface is well-positioned to permit extending 
> Hadoop to run natively on non-traditional storage architectures.  Before this 
> can be done, however, a number of gaps need to be addressed.  This JIRA 
> documents those gaps, suggests some solutions, and puts forth a sample 
> implementation of some of the key changes needed.





[jira] [Updated] (HDFS-5513) CacheAdmin commands fail when using . as the path

2013-11-19 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5513:
--

Attachment: hdfs-5513-3.patch

Thanks Colin. I combined the test into one of the other ones in the file. Let's 
see what Jenkins thinks.

> CacheAdmin commands fail when using . as the path
> -
>
> Key: HDFS-5513
> URL: https://issues.apache.org/jira/browse/HDFS-5513
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, tools
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch, hdfs-5513-3.patch
>
>
> The hdfs CLI commands generally accept "." as a path argument.
> e.g.
> {code}
> hdfs dfs -rm .
> hdfs dfsadmin -allowSnapshot .
> {code}
> I don't think it's very common to use the path ".", but the CacheAdmin 
> commands will fail saying that they cannot create a Path from an empty string.
> {code}
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path .
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> {code}





[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826977#comment-13826977
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5526:
--

Would storageID and cTime be preserved?  BTW, do you know why cTime=0 in my 
test case above?

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.





[jira] [Updated] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5526:
-

Assignee: Kihwal Lee
  Status: Patch Available  (was: Open)

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.





[jira] [Updated] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5526:
-

Attachment: HDFS-5526.patch

Simply doing what I suggested breaks TestDFSRollback. The existence of the 
previous directory needs to be checked first. I am attaching a candidate patch.
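The check described above can be sketched as follows (hedged: hypothetical
names, not the actual DataStorage code):

```java
import java.io.File;

// Before enforcing the layout-version check during -rollback, first check
// that a previous/ directory actually exists, so a fresh storage dir does
// not fail with IncorrectVersionException.
public class RollbackCheckSketch {
    // Roll back only when there is a previous/ directory to roll back to AND
    // the stored layout version differs from the running software's version.
    static boolean shouldRollBack(File storageDir, int storedLayout, int softwareLayout) {
        File previous = new File(storageDir, "previous");
        if (!previous.exists()) {
            return false;   // nothing to roll back; skip the version check
        }
        return storedLayout != softwareLayout;
    }

    public static void main(String[] args) {
        // trunk layout is -48, v2.2.0 is -47 (from the issue description)
        File fresh = new File("/tmp/nonexistent-storage-dir-sketch");
        System.out.println(shouldRollBack(fresh, -48, -47)); // false: no previous/
    }
}
```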

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: HDFS-5526.patch
>
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.





[jira] [Commented] (HDFS-5511) improve CacheManipulator interface to allow better unit testing

2013-11-19 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826903#comment-13826903
 ] 

Colin Patrick McCabe commented on HDFS-5511:


Findbugs warnings are HADOOP-10116, not related to this change.  Will commit 
shortly based on Andrew's +1.

> improve CacheManipulator interface to allow better unit testing
> ---
>
> Key: HDFS-5511
> URL: https://issues.apache.org/jira/browse/HDFS-5511
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch
>
>
> The CacheManipulator interface has been helpful in allowing us to stub out 
> {{mlock}} in cases where we don't want to test it.  We should move  the 
> {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this 
> interface as well so that we don't have to skip these tests on machines where 
> these methods would ordinarily not work for us.





[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

2013-11-19 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826874#comment-13826874
 ] 

Uma Maheswara Rao G commented on HDFS-4516:
---

Thanks Nicholas for the review! I will commit it shortly.

> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Vinay
>Priority: Critical
> Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, 
> HDFS-4516.patch, HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (blocks not yet created on 
> DNs) and the NN also switches over after this, then the new Namenode will not 
> know about the block locations.
> Further details will be in a comment.





[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826875#comment-13826875
 ] 

Hadoop QA commented on HDFS-2832:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614660/h2832_20131119.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 45 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode
  
org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5488//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5488//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5488//console


> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
> editsStored, h2832_20131023.patch, h2832_20131023b.patch, 
> h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, 
> h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, 
> h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, 
> h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, 
> h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, 
> h2832_20131118.patch, h2832_20131119.patch
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, where a Datanode *is a* storage, to one where a Datanode *is a 
> collection* of storages.





[jira] [Updated] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

2013-11-19 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-4516:
--

Target Version/s: 3.0.0, 2.3.0, 2.2.1  (was: 3.0.0, 2.1.0-beta)

> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Vinay
>Priority: Critical
> Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, 
> HDFS-4516.patch, HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (blocks not yet created on 
> DNs) and the NN also switches over after this, then the new Namenode will not 
> know about the block locations.
> Further details will be in a comment.





[jira] [Updated] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5474:


Priority: Blocker  (was: Major)
Target Version/s: 2.2.1

> Deletesnapshot can make Namenode in safemode on NN restarts.
> 
>
> Key: HDFS-5474
> URL: https://issues.apache.org/jira/browse/HDFS-5474
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Uma Maheswara Rao G
>Assignee: sathish
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5474-001.patch, HDFS-5474-002.patch
>
>
> When we delete a snapshot, we delete the blocks associated with that 
> snapshot and after that do a logsync to the editlog for the deleteSnapshot op.
> There is a chance that, after blocks are removed from the blocks map but 
> before the log sync, a block report arrives; the NN may then find that a block 
> does not exist in the blocks map and invalidate it. As part of the heartbeat, 
> the invalidation info can also go out. After these steps, if the Namenode shuts 
> down before actually doing the logsync, on restart it will still consider those 
> snapshot inodes and expect the blocks to be reported from DNs.
> The simple solution is to move the block removal to after the logsync, similar 
> to the delete op.





[jira] [Updated] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5504:


Priority: Blocker  (was: Major)
Target Version/s: 2.2.1

> In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, 
> leads to NN safemode.
> 
>
> Key: HDFS-5504
> URL: https://issues.apache.org/jira/browse/HDFS-5504
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Vinay
>Assignee: Vinay
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5504.patch, HDFS-5504.patch
>
>
> 1. HA installation; the standby NN is down.
> 2. Delete snapshot is called; it deletes the blocks from the blocks map and 
> all datanodes. A log sync also happens.
> 3. Before the next log roll, the NN crashes.
> 4. When the namenode restarts, it loads the fsimage and finalized edits from 
> shared storage and sets the safemode threshold, which includes blocks from the 
> deleted snapshot as well (because those edits are not yet read, since the 
> namenode restarted before the last edit segment was finalized).
> 5. When it becomes active, it finalizes the edits and reads the delete-snapshot 
> edit op, but at that time it does not reduce the safemode count, so it 
> continues in safemode.
> 6. On the next restart, since the edits are already finalized, it reads them at 
> startup and sets the safemode threshold correctly.
> So one more restart brings the NN out of safemode.





[jira] [Updated] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5428:


Priority: Blocker  (was: Major)
Target Version/s: 2.2.1

> under construction files deletion after snapshot+checkpoint+nn restart leads 
> nn safemode
> 
>
> Key: HDFS-5428
> URL: https://issues.apache.org/jira/browse/HDFS-5428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Vinay
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
> HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, 
> HDFS-5428.004.patch, HDFS-5428.patch
>
>
> 1. allow snapshots under dir /foo
> 2. create a file /foo/test/bar and start writing to it
> 3. create a snapshot s1 under /foo after block is allocated and some data has 
> been written to it
> 4. Delete the directory /foo/test
> 5. wait till checkpoint or do saveNameSpace
> 6. restart NN.
> NN enters safemode.
> Analysis:
> Snapshot nodes loaded from the fsimage are always complete and all blocks will 
> be in COMPLETE state. 
> So when the Datanode reports RBW blocks, those will not be updated in the 
> blocks map.
> Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.





[jira] [Updated] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5443:


Priority: Blocker  (was: Major)
Target Version/s: 2.2.1  (was: 3.0.0, 2.3.0)

> Delete 0-sized block when deleting an under-construction file that is 
> included in snapshot
> --
>
> Key: HDFS-5443
> URL: https://issues.apache.org/jira/browse/HDFS-5443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: 5443-test.patch, HDFS-5443.000.patch
>
>
> The Namenode can get stuck in safemode on restart if it crashes just after the 
> addBlock log sync, and a snapshot was taken for such a file. This issue was 
> reported by Prakash and Sathish.
> On looking into the issue, the following sequence reproduces it:
> 1) The client added a block at the NN and the edit was just log-synced, so the 
> NN has the block ID persisted.
> 2) Before the addBlock response is returned to the client, take a snapshot of 
> the root or a parent directory of that file.
> 3) Delete the parent directory of that file.
> 4) Now crash the NN without responding success to the client for that addBlock 
> call.
> Now on restart, the Namenode gets stuck in safemode.
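Why the NN then stays in safemode can be sketched with the threshold rule behind dfs.namenode.safemode.threshold-pct (a real HDFS setting); the class below is a standalone illustration, not NameNode code:

```java
// Safemode bookkeeping sketch: the NN counts blocks loaded from the image and
// edits. A block that was logged by addBlock() but never written to any
// datanode can never be reported, so the reported fraction stays below the
// threshold and the NN stays in safemode.
class SafemodeSketch {
    static boolean canLeaveSafemode(long totalBlocks, long reportedBlocks,
                                    double thresholdPct) {
        if (totalBlocks == 0) {
            return true; // nothing to wait for
        }
        return (double) reportedBlocks / totalBlocks >= thresholdPct;
    }
}
```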



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5476:


Priority: Blocker  (was: Major)
Target Version/s: 2.2.1

> Snapshot: clean the blocks/files/directories under a renamed file/directory 
> while deletion
> --
>
> Key: HDFS-5476
> URL: https://issues.apache.org/jira/browse/HDFS-5476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5476.001.patch
>
>
> Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
> under the DstReference node for file/directory/snapshot deletion.
> Use case 1:
> # rename under-construction file with 0-sized blocks after snapshot.
> # delete the renamed directory.
> We need to make sure we delete the 0-sized block.
> Use case 2:
> # create snapshot s0 for /
> # create a new file under /foo/bar/
> # rename foo --> foo2
> # create snapshot s1
> # delete bar and foo2
> # delete snapshot s1
> We need to make sure we delete the file under /foo/bar since it is not 
> included in snapshot s0.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5425:


Priority: Blocker  (was: Major)
Target Version/s: 2.2.1

> Renaming underconstruction file with snapshots can make NN failure on restart
> -
>
> Key: HDFS-5425
> URL: https://issues.apache.org/jira/browse/HDFS-5425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Affects Versions: 3.0.0, 2.2.0
>Reporter: sathish
>Assignee: Jing Zhao
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, 
> HDFS-5425.patch
>
>
> I hit this while doing some snapshot operations like createSnapshot and 
> renameSnapshot: when I restarted my NN, it shut down with the following 
> exception:
> 2013-10-24 21:07:03,040 FATAL 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> java.lang.IllegalStateException
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:133)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:670)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:655)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311)
> 2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> 2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG: 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5427:


Target Version/s: 2.2.1

> not able to read deleted files from snapshot directly under snapshottable dir 
> after checkpoint and NN restart
> -
>
> Key: HDFS-5427
> URL: https://issues.apache.org/jira/browse/HDFS-5427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Vinay
>Assignee: Vinay
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch
>
>
> 1. allow snapshots under dir /foo
> 2. create a file /foo/bar
> 3. create a snapshot s1 under /foo
> 4. delete the file /foo/bar
> 5. wait till checkpoint or do saveNameSpace
> 6. restart NN.
> 7. Now try to read the file from snapshot /foo/.snapshot/s1/bar
> client will get BlockMissingException
> Reason:
> while loading the deleted file list for a snapshottable dir from the fsimage, 
> the blocks were not updated in the blocks map.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5257:


Target Version/s: 2.2.1

> addBlock() retry should return LocatedBlock with locations else client will 
> get AIOBE
> -
>
> Key: HDFS-5257
> URL: https://issues.apache.org/jira/browse/HDFS-5257
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Vinay
>Assignee: Vinay
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, 
> HDFS-5257.patch
>
>
> A retried {{addBlock()}} call should return the LocatedBlock with locations if 
> the block was created in the previous call and a failover/restart of the 
> namenode happened in between.
> Otherwise the client will get an {{ArrayIndexOutOfBoundsException}} while 
> creating the block and the write will fail.
> {noformat}java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat}
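As an illustration of the client-side symptom (a standalone sketch; the class below is hypothetical, not the real DFSOutputStream):

```java
// Sketch of the failure mode above: the client indexes the first target
// datanode out of the array returned by addBlock(). When a retried addBlock()
// comes back with no locations, a guard turns the opaque
// ArrayIndexOutOfBoundsException into a clear error.
class BlockTargets {
    static String firstTarget(String[] nodes) {
        if (nodes == null || nodes.length == 0) {
            throw new IllegalStateException(
                "addBlock returned a block with no target datanodes");
        }
        return nodes[0]; // the unguarded version of this access throws AIOBE
    }
}
```

The actual fix is on the namenode side (return the LocatedBlock with its locations on retry); a guard like this only makes the client-side failure easier to diagnose.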



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable

2013-11-19 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826859#comment-13826859
 ] 

Uma Maheswara Rao G edited comment on HDFS-5014 at 11/19/13 7:42 PM:
-

Seems like there is an issue with the patch: we should move the cmd null check 
before the DNA_REGISTER if condition.
{noformat}
2013-11-19 14:08:33,394 ERROR datanode.DataNode (BPServiceActor.java:run(719)) 
- Exception in BPOfferService for Block pool 
BP-1297942247-67.195.138.31-1384870112818 (storage id 
DS-234026112-67.195.138.31-43443-1384870113355) service to 
localhost/127.0.0.1:48821
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:507)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:745)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:597)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717)
at java.lang.Thread.run(Thread.java:662)
{noformat}

This would be the reason for timeouts in Jenkins.
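The suggested reordering can be sketched like this (a standalone illustration with hypothetical types, not the real BPOfferService):

```java
// Checking cmd for null before inspecting its action makes a null command a
// no-op; in the failing code the DNA_REGISTER branch dereferenced cmd first,
// producing the NullPointerException shown in the log above.
class CommandSketch {
    static final int DNA_REGISTER = 1;

    static class DatanodeCommand {
        private final int action;
        DatanodeCommand(int action) { this.action = action; }
        int getAction() { return action; }
    }

    /** Returns true when the command was handled (or there was nothing to do). */
    static boolean processCommandFromActor(DatanodeCommand cmd) {
        if (cmd == null) {
            return true; // moved up: handle "no command" before any dereference
        }
        if (cmd.getAction() == DNA_REGISTER) {
            // re-register with the namenode ...
            return true;
        }
        // dispatch other command types ...
        return true;
    }
}
```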


was (Author: umamaheswararao):
Seems like there is an issue with the patch: we should move the cmd null check 
before the DNA_REGISTER if condition.
{noformat}
2013-11-19 14:08:33,394 ERROR datanode.DataNode (BPServiceActor.java:run(719)) 
- Exception in BPOfferService for Block pool 
BP-1297942247-67.195.138.31-1384870112818 (storage id 
DS-234026112-67.195.138.31-43443-1384870113355) service to 
localhost/127.0.0.1:48821
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:507)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:745)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:597)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717)
at java.lang.Thread.run(Thread.java:662)
{noformat}

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Standby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused HDFS writes to fail.
> 1. The Standby NN was unstable and continuously restarting due to some errors, 
> but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while the datanodes were processing 
> the REGISTER command for the SNN.
> 4. The datanodes started retrying to connect to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called 
> under synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened on all the datanodes at the same point.
> 5. For the next 7-8 minutes the standby was down, no blocks were reported to 
> the active NN during this time, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.
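The remedy implied by the last point can be sketched as follows (hypothetical structure, not the actual patch): perform the blocking namenode RPC outside the lock and synchronize only the short state update.

```java
// Sketch: a long-running namenode RPC (e.g. re-registration against an
// unstable standby) runs without holding the lock, so it cannot stall
// command processing and block reports for the active NN's actor thread.
class ActorSketch {
    private final Object stateLock = new Object();
    private long lastProcessedTxId = -1;

    void processCommand(long txId, Runnable blockingRpc) {
        blockingRpc.run();          // may block for minutes; no lock held
        synchronized (stateLock) {  // short critical section for shared state
            lastProcessedTxId = txId;
        }
    }

    long lastTxId() {
        synchronized (stateLock) {
            return lastProcessedTxId;
        }
    }
}
```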



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE

2013-11-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5257:


Priority: Blocker  (was: Critical)

> addBlock() retry should return LocatedBlock with locations else client will 
> get AIOBE
> -
>
> Key: HDFS-5257
> URL: https://issues.apache.org/jira/browse/HDFS-5257
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Vinay
>Assignee: Vinay
>Priority: Blocker
> Fix For: 2.3.0
>
> Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, 
> HDFS-5257.patch
>
>
> A retried {{addBlock()}} call should return the LocatedBlock with locations if 
> the block was created in the previous call and a failover/restart of the 
> namenode happened in between.
> Otherwise the client will get an {{ArrayIndexOutOfBoundsException}} while 
> creating the block and the write will fail.
> {noformat}java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable

2013-11-19 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826859#comment-13826859
 ] 

Uma Maheswara Rao G commented on HDFS-5014:
---

Seems like there is an issue with the patch: we should move the cmd null check 
before the DNA_REGISTER if condition.
{noformat}
2013-11-19 14:08:33,394 ERROR datanode.DataNode (BPServiceActor.java:run(719)) 
- Exception in BPOfferService for Block pool 
BP-1297942247-67.195.138.31-1384870112818 (storage id 
DS-234026112-67.195.138.31-43443-1384870113355) service to 
localhost/127.0.0.1:48821
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:507)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:745)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:597)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717)
at java.lang.Thread.run(Thread.java:662)
{noformat}

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Standby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused HDFS writes to fail.
> 1. The Standby NN was unstable and continuously restarting due to some errors, 
> but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while the datanodes were processing 
> the REGISTER command for the SNN.
> 4. The datanodes started retrying to connect to the SNN to register, at the 
> following code in BPServiceActor#retrieveNamespaceInfo(), which is called 
> under synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened on all the datanodes at the same point.
> 5. For the next 7-8 minutes the standby was down, no blocks were reported to 
> the active NN during this time, and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5511) improve CacheManipulator interface to allow better unit testing

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826834#comment-13826834
 ] 

Hadoop QA commented on HDFS-5511:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614645/HDFS-5511.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5487//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5487//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5487//console

This message is automatically generated.

> improve CacheManipulator interface to allow better unit testing
> ---
>
> Key: HDFS-5511
> URL: https://issues.apache.org/jira/browse/HDFS-5511
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch
>
>
> The CacheManipulator interface has been helpful in allowing us to stub out 
> {{mlock}} in cases where we don't want to test it.  We should move  the 
> {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this 
> interface as well so that we don't have to skip these tests on machines where 
> these methods would ordinarily not work for us.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-1386) TestJMXGet fails in jdk7

2013-11-19 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826830#comment-13826830
 ] 

Kihwal Lee commented on HDFS-1386:
--

bq.  If you are ok with this I can file separate JIRAs, one for YARN and one 
for Default Metrics System unregistration.

Sounds reasonable. +1 for the patch.  Please do check whether there is already 
a jira for the test failure.

> TestJMXGet fails in jdk7
> 
>
> Key: HDFS-1386
> URL: https://issues.apache.org/jira/browse/HDFS-1386
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 0.22.0
>Reporter: Tanping Wang
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: java7
> Attachments: HDFS-1386.patch, HDFS-1386.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4516:
-

Target Version/s: 2.1.0-beta, 3.0.0  (was: 3.0.0, 2.1.0-beta)
Hadoop Flags: Reviewed

+1 patch looks good.

> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Vinay
>Priority: Critical
> Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, 
> HDFS-4516.patch, HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (blocks not yet created on 
> the DNs) and the NN also switches over after this, then the new Namenode will 
> not know about the block locations.
> Further details are in the comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826825#comment-13826825
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5526:
--

Kihwal, you are right that there is no backup copy of ./current/VERSION and so 
it cannot be rolled back.

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Blocker
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826821#comment-13826821
 ] 

Kihwal Lee commented on HDFS-5526:
--

What about adding the following at the beginning of 
{{DataStorage.doRollback()}}? This is similar to what is in 
{{DataStorage.doUpgrade()}}. Since the VERSION file has not been read yet during 
rollback (if it is read, it blows up as you reported), {{this.layoutVersion}} 
will be 0. So instead, this checks the software layout version to see whether it 
is the same layout version as the namenode's.

{code}
if (LayoutVersion.supports(Feature.FEDERATION, HdfsConstants.LAYOUT_VERSION)
&& HdfsConstants.LAYOUT_VERSION == nsInfo.getLayoutVersion()) {
  clusterID = nsInfo.getClusterID();
  layoutVersion = nsInfo.getLayoutVersion();
  writeProperties(sd);
  return;
}
{code}

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Blocker
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-3987) Support webhdfs over HTTPS

2013-11-19 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826814#comment-13826814
 ] 

Haohui Mai commented on HDFS-3987:
--

bq. SWebHdfsFileSystem.java, why do we need a new token type? What that adds?


On the client side, each FileSystem has to have a unique token kind so that 
Token.cancel() / Token.renew() can be redirected to the appropriate code paths. 
Hftp / hsftp / webhdfs follow the same pattern, and so does swebhdfs. Without 
the new token kind, swebhdfs would try to cancel / renew the token via the code 
path of webhdfs (i.e., canceling / renewing the token via http, not https). The 
same issue happens in hsftp, and it is fixed in HDFS-5502.
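The dispatch pattern described here can be illustrated with a minimal registry (assumed names; the real implementation dispatches Token.renew() / Token.cancel() by token kind via per-kind renewers):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustration of why swebhdfs needs its own token kind: renew/cancel
// is dispatched by kind, so without a distinct kind the token would be
// renewed over http instead of https.
class TokenKindRegistry {
    private final Map<String, String> transportByKind = new HashMap<>();

    void register(String kind, String transport) {
        transportByKind.put(kind, transport);
    }

    String transportFor(String kind) {
        String t = transportByKind.get(kind);
        if (t == null) {
            throw new IllegalArgumentException("no renewer for token kind " + kind);
        }
        return t;
    }
}
```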

bq. WebHdfsFileSystem.java, it seems things are hardcoded to use a specific 
token kind, it should use the token kind send by the server.

I didn't fully understand the question. WebHdfsFileSystem has the same logic 
(w.r.t. token handling) as the old code. The server specifies the token kind, 
and the client dispatches to the appropriate code path for handling the token 
using reflection.

bq.  My concern here is HttpFS tokens. Have you verified HttpFS works with 
swebhdfs?

I think that HttpFS is out of the scope of this jira; we can address any such 
issues in a separate jira if they arise.

> Support webhdfs over HTTPS
> --
>
> Key: HDFS-3987
> URL: https://issues.apache.org/jira/browse/HDFS-3987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Haohui Mai
> Fix For: 2.3.0
>
> Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, 
> HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, 
> HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, 
> HDFS-3987.008.patch, HDFS-3987.009.patch
>
>
> This is a follow up of HDFS-3983.
> We should have a new filesystem client impl/binding for encrypted WebHDFS, 
> i.e. *webhdfss://*
> On the server side, for webhdfs and httpfs we should only need to start the 
> service on a secured (HTTPS) endpoint.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5523) Support subdirectory mount and multiple exports in HDFS-NFS gateway

2013-11-19 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826794#comment-13826794
 ] 

Brandon Li commented on HDFS-5523:
--

[~aw]: Originally, the NFS gateway had "/" as the only export for the NFS 
client to mount. This is obviously a big limitation, especially when the user 
doesn't want to let NFS clients see the whole namespace structure, for security 
or management reasons.

HDFS-5469 added a property to make the single export configurable, so the user 
can decide which subtree (export) of the namespace to share. However, users 
still have only one export, and don't have the flexibility to share multiple 
subtrees and give each export different access control (e.g., which hosts can 
mount which export with rw or ro access).

This JIRA is to enable the NFS gateway to share multiple subtrees, each with 
different access control, like a traditional NFS server.

> Support subdirectory mount and multiple exports in HDFS-NFS gateway 
> 
>
> Key: HDFS-5523
> URL: https://issues.apache.org/jira/browse/HDFS-5523
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Reporter: Brandon Li
>
> Supporting multiple exports and subdirectory mount usually can make data and 
> security management easier for the HDFS-NFS client. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-3987) Support webhdfs over HTTPS

2013-11-19 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-3987:
-

Attachment: HDFS-3987.009.patch

> Support webhdfs over HTTPS
> --
>
> Key: HDFS-3987
> URL: https://issues.apache.org/jira/browse/HDFS-3987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Haohui Mai
> Fix For: 2.3.0
>
> Attachments: HDFS-3987.000.patch, HDFS-3987.001.patch, 
> HDFS-3987.002.patch, HDFS-3987.003.patch, HDFS-3987.004.patch, 
> HDFS-3987.005.patch, HDFS-3987.006.patch, HDFS-3987.007.patch, 
> HDFS-3987.008.patch, HDFS-3987.009.patch
>
>
> This is a follow up of HDFS-3983.
> We should have a new filesystem client impl/binding for encrypted WebHDFS, 
> i.e. *webhdfss://*
> On the server side, for webhdfs and httpfs we should only need to start the 
> service on a secured (HTTPS) endpoint.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5527) Fix TestUnderReplicatedBlocks on branch HDFS-2832

2013-11-19 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826741#comment-13826741
 ] 

Arpit Agarwal commented on HDFS-5527:
-

Junping, were you able to duplicate the failure?

> Fix TestUnderReplicatedBlocks on branch HDFS-2832
> -
>
> Key: HDFS-5527
> URL: https://issues.apache.org/jira/browse/HDFS-5527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5527.patch
>
>
> The failure seems to be a deadlock, which is shown in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/5440//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestUnderReplicatedBlocks/testSetrepIncWithUnderReplicatedBlocks/



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-19 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: h2832_20131119.patch

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
> editsStored, h2832_20131023.patch, h2832_20131023b.patch, 
> h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, 
> h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, 
> h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, 
> h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, 
> h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, 
> h2832_20131118.patch, h2832_20131119.patch
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the current 
> model, where a Datanode *is a* storage, to one where a Datanode *is a 
> collection of* storages.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5513) CacheAdmin commands fail when using . as the path

2013-11-19 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1382#comment-1382
 ] 

Colin Patrick McCabe commented on HDFS-5513:


bq. This was caused by the attempt to deep-copy in the PBCD Builder. Paths 
normalize the URI upon creation, so the single . simply gets thrown away. There 
doesn't seem to be a way to deep-copy a Path, but at the same time it doesn't 
look like you can mutate a Path either.

I took another look and you are right.  Although the Path does make its URI 
accessible to the outside world, the URI has no methods that could be used to 
mutate it.
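The behaviour described above can be seen outside Hadoop with plain {{java.net.URI}}, which {{org.apache.hadoop.fs.Path}} wraps: "." segments are removed on normalization, and URI exposes no mutators, so once the dot is gone there is no way to patch the instance. A minimal sketch (the {{/pool/./dir}} path is just an illustrative example, not from the bug report):

```java
import java.net.URI;

public class DotSegmentSketch {
    public static void main(String[] args) {
        // Normalization removes "." segments, per RFC 3986 remove_dot_segments.
        String normalized = URI.create("/pool/./dir").normalize().getPath();
        System.out.println(normalized); // the "." segment is dropped: "/pool/dir"

        // URI offers only getters (getPath(), getScheme(), ...); there is no
        // setPath() or similar, so the instance cannot be mutated afterwards.
    }
}
```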

Can we merge {{testSingleDotPath}} into another junit test?  It just seems kind 
of like overkill to set up a whole DFSCluster just to see if a PBCE with "." as 
the path can be added and then removed.  It would be nice to keep test 
execution time down.

+1 once that's addressed

> CacheAdmin commands fail when using . as the path
> -
>
> Key: HDFS-5513
> URL: https://issues.apache.org/jira/browse/HDFS-5513
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, tools
>Affects Versions: 3.0.0
>Reporter: Stephen Chu
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-5513-1.patch, hdfs-5513-2.patch
>
>
> The hdfs CLI commands generally accept "." as a path argument.
> e.g.
> {code}
> hdfs dfs -rm .
> hdfs dfsadmin -allowSnapshot .
> {code}
> I don't think it's very common to use the path "." but the CacheAdmin 
> commands will fail saying that it cannot create a Path from an empty string.
> {code}
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -removeDirectives -path .
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listPathBasedCacheDirectives(DistributedFileSystem.java:1639)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$RemovePathBasedCacheDirectivesCommand.run(CacheAdmin.java:365)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> [schu@hdfs-c5-nfs ~]$ hdfs cacheadmin -addDirective -path . -pool schu
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create 
> a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.(Path.java:184)
>   at 
> org.apache.hadoop.hdfs.protocol.PathBasedCacheDirective$Builder.(PathBasedCacheDirective.java:66)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addPathBasedCacheDirective(DistributedFileSystem.java:1598)
>   at 
> org.apache.hadoop.hdfs.tools.CacheAdmin$AddPathBasedCacheDirectiveCommand.run(CacheAdmin.java:180)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:82)
>   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:87)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826661#comment-13826661
 ] 

Kihwal Lee commented on HDFS-5526:
--

Well, it is being called for {{DataStorage}} when {{recoverTransitionRead()}} 
is called for the first block pool that is being initialized. 

{{DataStorage.doUpgrade()}} will simply write the new version and return. So 
"previous" won't be created. In {{DataStorage.doRollback()}}, the comment says,
"Do nothing, if previous directory does not exist" and that's what it does.  

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Blocker
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5511) improve CacheManipulator interface to allow better unit testing

2013-11-19 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5511:
---

Attachment: HDFS-5511.002.patch

* add getters and setters for cacheManipulator

* add javaDoc

> improve CacheManipulator interface to allow better unit testing
> ---
>
> Key: HDFS-5511
> URL: https://issues.apache.org/jira/browse/HDFS-5511
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5511.001.patch, HDFS-5511.002.patch
>
>
> The CacheManipulator interface has been helpful in allowing us to stub out 
> {{mlock}} in cases where we don't want to test it.  We should move  the 
> {{getMemlockLimit}} and {{getOperatingSystemPageSize}} functions into this 
> interface as well so that we don't have to skip these tests on machines where 
> these methods would ordinarily not work for us.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5526) Datanode cannot roll back to previous layout version

2013-11-19 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826631#comment-13826631
 ] 

Kihwal Lee commented on HDFS-5526:
--

It looks like {{recoverTransitionRead()}} is done for the individual block pool 
storages from {{DataNode.initStorage()}}, but never for {{DataStorage}} 
itself. That also explains why the "previous" directory is missing after upgrade.

> Datanode cannot roll back to previous layout version
> 
>
> Key: HDFS-5526
> URL: https://issues.apache.org/jira/browse/HDFS-5526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Blocker
>
> Current trunk layout version is -48.
> Hadoop v2.2.0 layout version is -47.
> If a cluster is upgraded from v2.2.0 (-47) to trunk (-48), the datanodes 
> cannot start with -rollback.  It will fail with IncorrectVersionException.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5014) BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable

2013-11-19 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5014:


Attachment: HDFS-5014-v2.patch

Thanks, Uma, for the comments.
I have updated the javadocs and removed the unnecessary DNA_REGISTER command.

Jenkins was still running at this point; I will update again if it reports any 
issues.

> BPOfferService#processCommandFromActor() synchronization on namenode RPC call 
> delays IBR to Active NN, if Standby NN is unstable
> ---
>
> Key: HDFS-5014
> URL: https://issues.apache.org/jira/browse/HDFS-5014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-5014-v2.patch, HDFS-5014-v2.patch, 
> HDFS-5014-v2.patch, HDFS-5014-v2.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, HDFS-5014.patch, 
> HDFS-5014.patch
>
>
> In one of our clusters, the following happened and caused HDFS writes to fail.
> 1. The Standby NN was unstable and continuously restarting due to some 
> errors, but the Active NN was stable.
> 2. An MR job was writing files.
> 3. At some point the SNN went down again while the datanodes were processing 
> the REGISTER command for the SNN. 
> 4. The datanodes started retrying to connect to the SNN to register, at the 
> following code in {{BPServiceActor#retrieveNamespaceInfo()}}, which is called 
> under synchronization.
> {code}  try {
> nsInfo = bpNamenode.versionRequest();
> LOG.debug(this + " received versionRequest response: " + nsInfo);
> break;{code}
> Unfortunately this happened in all datanodes at the same point.
> 5. For the next 7-8 minutes the standby was down; during this time no blocks 
> were reported to the active NN and writes failed.
> So the culprit is that {{BPOfferService#processCommandFromActor()}} is 
> completely synchronized, which is not required.
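The locking pitfall described above can be sketched in plain Java (hypothetical class and method names; a sleep stands in for the blocking RPC retry). Because the command-processing thread holds the object's intrinsic lock while blocked, the thread trying to send an incremental block report (IBR) cannot enter any other synchronized method until the retry returns:

```java
public class LockStallSketch {
    private long ibrStartMillis;

    // Stand-in for processCommandFromActor() retrying a register RPC
    // against a dead standby NN while holding the service-wide lock.
    synchronized void processCommand() throws InterruptedException {
        Thread.sleep(300); // simulates a blocking versionRequest() retry
    }

    // Stand-in for the actor thread sending an IBR to the active NN.
    synchronized void sendIncrementalBlockReport() {
        ibrStartMillis = System.currentTimeMillis();
    }

    public long measureStall() throws InterruptedException {
        long t0 = System.currentTimeMillis();
        Thread retry = new Thread(() -> {
            try { processCommand(); } catch (InterruptedException ignored) {}
        });
        retry.start();
        Thread.sleep(50); // let the retry thread grab the lock first
        sendIncrementalBlockReport(); // blocks until processCommand() exits
        retry.join();
        return ibrStartMillis - t0; // how long the IBR was held up
    }

    public static void main(String[] args) throws InterruptedException {
        long stall = new LockStallSketch().measureStall();
        System.out.println("IBR delayed by ~" + stall + " ms");
    }
}
```

This is why narrowing the synchronization (rather than locking the whole command-processing path) lets block reports to the active NN proceed even while registration with the standby is stuck.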



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

2013-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826568#comment-13826568
 ] 

Hadoop QA commented on HDFS-4516:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12614572/HDFS-4516.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5484//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5484//console

This message is automatically generated.

> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Vinay
>Priority: Critical
> Attachments: HDFS-4516-Test.patch, HDFS-4516.patch, HDFS-4516.patch, 
> HDFS-4516.patch, HDFS-4516.txt
>
>
> If the client crashes just after block allocation (blocks not yet created on 
> the DNs) and the NN also switches over after this, then the new Namenode will 
> not know about the block locations.
> Further details are in the comments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

