[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2015-07-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646287#comment-14646287
 ] 

Hudson commented on HDFS-6482:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2217 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2217/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei 
Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java


 Use block ID-based block layout on datanodes
 

 Key: HDFS-6482
 URL: https://issues.apache.org/jira/browse/HDFS-6482
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0
Reporter: James Thomas
Assignee: James Thomas
 Fix For: 2.6.0

 Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, 
 HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, 
 HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, 
 hadoop-24-datanode-dir.tgz


 Right now, blocks are placed into directories that are split into many 
 subdirectories once capacity is reached. Instead, we can use a block's ID to 
 determine the path it should be stored under. This eliminates the need for 
 the LDir data structure, which handles splitting directories when they reach 
 capacity, as well as the fields in ReplicaInfo that track a replica's 
 location.
 An extension of the work in HDFS-3290.
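
The ID-derived placement described above can be sketched as follows. This is an illustrative sketch only: the two-level fan-out, the shift/mask constants, and the "subdir" naming are assumptions for demonstration, not necessarily what the committed patch uses.

```java
// Sketch of ID-based block placement: the storage subdirectory is a pure
// function of the block ID, so no in-memory structure (LDir) or per-replica
// location field is needed. Constants below are illustrative assumptions.
public final class BlockIdLayout {
    private BlockIdLayout() {}

    /** Relative directory for a block, derived deterministically from its ID. */
    public static String idToBlockDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF); // first-level subdir index
        int d2 = (int) ((blockId >> 8) & 0xFF);  // second-level subdir index
        return "subdir" + d1 + "/subdir" + d2;
    }

    public static void main(String[] args) {
        // The same ID always maps to the same directory.
        System.out.println(idToBlockDir(0x12345678L)); // prints subdir52/subdir86
    }
}
```

Because the mapping is deterministic, any datanode component can recompute a replica's path from its ID alone, which is what makes the LDir bookkeeping removable.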



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2015-07-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646257#comment-14646257
 ] 

Hudson commented on HDFS-6482:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #268 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/268/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei 
Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2015-07-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645568#comment-14645568
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8236 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8236/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei 
Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2015-07-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645941#comment-14645941
 ] 

Hudson commented on HDFS-6482:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1001 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1001/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei 
Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2015-07-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645927#comment-14645927
 ] 

Hudson commented on HDFS-6482:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #271 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/271/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei 
Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2015-07-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646134#comment-14646134
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #260 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/260/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei 
Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2015-07-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646093#comment-14646093
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2198 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2198/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei 
Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156320#comment-14156320
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/698/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev 
be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156398#comment-14156398
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev 
be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156514#comment-14156514
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev 
be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-10-02 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157106#comment-14157106
 ] 

Harsh J commented on HDFS-6482:
---

Updated the FAQ that was covering DN block moves to reflect the new 
maintain-subdir requirement: 
https://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-10-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157521#comment-14157521
 ] 

Colin Patrick McCabe commented on HDFS-6482:


Thanks, [~qwertymaniac].



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155014#comment-14155014
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6163 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6163/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev 
be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128495#comment-14128495
 ] 

Hudson commented on HDFS-6482:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1867 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1867/])
HDFS-6482. Fix CHANGES.txt in trunk. (arp: rev 
0de563a18e9e09207e3ef5f1cad1d2e788af9503)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128916#comment-14128916
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1892 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1892/])
HDFS-6482. Fix CHANGES.txt in trunk. (arp: rev 
0de563a18e9e09207e3ef5f1cad1d2e788af9503)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-09-03 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120886#comment-14120886
 ] 

Arpit Agarwal commented on HDFS-6482:
-

I added a patch to HDFS-6981. Please review it.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-09-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118513#comment-14118513
 ] 

Colin Patrick McCabe commented on HDFS-6482:


Yeah, it would be great to have this in 2.6.  Is HDFS-6981 blocking merging 
this to 2.6?



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-09-02 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118554#comment-14118554
 ] 

Arpit Agarwal commented on HDFS-6482:
-

That is the known issue, yes.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-29 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115703#comment-14115703
 ] 

Arpit Agarwal commented on HDFS-6482:
-

Now that HDFS-6800 is in trunk to support DN layout changes with rolling 
upgrade I'd like to include this improvement in 2.6.

I plan to test it with rolling upgrades over the next couple of weeks.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088044#comment-14088044
 ] 

Arpit Agarwal commented on HDFS-6482:
-

Hi [~james.thomas], is the branch-2 merge blocked by HDFS-6800?



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-06 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088059#comment-14088059
 ] 

James Thomas commented on HDFS-6482:


[~arpitagarwal], yep, we're waiting on discussion there.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088108#comment-14088108
 ] 

Arpit Agarwal commented on HDFS-6482:
-

It is too late to mention this now but I am concerned about the delta between 
trunk and branch-2 while we wait on HDFS-6800 to get resolved. Will continue 
the discussion on HDFS-6800.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083524#comment-14083524
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #631 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/631/])
HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin 
Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DiskChecker.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LDir.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSFinalize.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStorageStateRecovery.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/UpgradeUtilities.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-datanode-dir.txt



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083569#comment-14083569
 ] 

Hudson commented on HDFS-6482:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1825 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1825/])
HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin 
Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223)



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083589#comment-14083589
 ] 

Hudson commented on HDFS-6482:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1850 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1850/])
HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin 
Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223)



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082901#comment-14082901
 ] 

Colin Patrick McCabe commented on HDFS-6482:


Thanks for your hard work on this, James.  Committed to trunk (but not 
branch-2).

Let's continue to discuss the rolling downgrade issues (need for additional 
rolling DN downgrade tests, general DN rolling downgrade strategy, etc.) on 
HDFS-6800.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082945#comment-14082945
 ] 

Hudson commented on HDFS-6482:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5999 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5999/])
HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin 
Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223)



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083032#comment-14083032
 ] 

Chris Nauroth commented on HDFS-6482:
-

Hi, [~james.thomas] and [~cmccabe].  This patch broke compilation on Windows.  
I filed HADOOP-10925 for it, and I expect to have a patch ready shortly.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-01 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083035#comment-14083035
 ] 

James Thomas commented on HDFS-6482:


[~cnauroth], apologies, thanks.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-08-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083113#comment-14083113
 ] 

Colin Patrick McCabe commented on HDFS-6482:


thanks, Chris



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081365#comment-14081365
 ] 

Suresh Srinivas commented on HDFS-6482:
---

[~cmccabe], does this patch change the directory structure such that the new 
directory structure is retained during rollback? If so, this is a concern for 
me and I am -0 on this. We should at least have rollback tests to verify that 
things work and that no undocumented hidden assumptions in the older software 
version (to which we are rolling back) are broken. Given that rollback could 
target any older release from which upgrade to this version is allowed, testing 
and ensuring nothing is broken becomes that much harder.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081427#comment-14081427
 ] 

James Thomas commented on HDFS-6482:


[~sureshms] No, during rollback the previous directory on the DN is restored, 
and it contains the old directory structure. I have run some tests on my 
computer that show that rollback works. It is not really possible to write a 
rollback test that checks this case, because we cannot run an older version of 
the DN code in the test.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081603#comment-14081603
 ] 

James Thomas commented on HDFS-6482:


[~sureshms] Is the documentation in the Rollback section at 
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
 correct? You are supposed to restart the DNs normally, without flags like 
-rollback or -rollingupgrade rollback? If you do restart the DNs with 
-rollback, everything should work normally and the previous directory should 
be restored with the old layout. [~arpitagarwal], any thoughts on this?



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081607#comment-14081607
 ] 

James Thomas commented on HDFS-6482:


Or [~kihwal]?



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081618#comment-14081618
 ] 

Colin Patrick McCabe commented on HDFS-6482:


Why don't we merge this to trunk and then open another JIRA to iron out any 
issues with rolling upgrades between different DN layout versions?  At a 
minimum, we should decide whether we support rolling DN upgrades between 
different layout versions, and if we don't support it, give a clear failure 
message to admins.  But this patch is big enough that I don't think cramming 
all that into here is a good idea.  There also seem to be some issues with 
rolling DN downgrade now (for example, HDFS-6005 removed {{datanode \-rollingupgrade 
\-rollback}} but not the usage text for it displayed in {{\-help}}).



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081739#comment-14081739
 ] 

Colin Patrick McCabe commented on HDFS-6482:


Hey guys, I filed HDFS-6800 to have the rolling upgrade discussion.  I'm going 
to commit this to trunk (but *not* to any other branches) in a bit if nobody 
has any objections.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080201#comment-14080201
 ] 

Colin Patrick McCabe commented on HDFS-6482:


+1.  Thanks for your work on this, James.

I'm going to commit this to trunk today if there are no further comments.  We can 
decide about branch-2 later.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-30 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080275#comment-14080275
 ] 

Suresh Srinivas commented on HDFS-6482:
---

[~cmccabe], I have been traveling and not kept up with this. I will try to get 
back by tomorrow. If not, please go ahead and commit by tomorrow evening.

Any comments I may have can be addressed before merging to branch-2.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080383#comment-14080383
 ] 

Colin Patrick McCabe commented on HDFS-6482:


ok, I will wait for tomorrow evening.  thanks for looking at this, suresh.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-21 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069541#comment-14069541
 ] 

James Thomas commented on HDFS-6482:


In response to [~cmccabe]'s comments:

One thread per storage directory doesn't make sense here since this is the 
number of threads to use for the hard link process for ONE storage directory. 
The hard link processes for the storage directories are currently not run in 
parallel.

We can create a separate JIRA to add the native code to the regular hard link 
path. I've created a separate code path in this change for the upgrade to the 
block ID-based layout, and I want to focus on optimizing that in this JIRA.

[~sureshms], any thoughts?



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069631#comment-14069631
 ] 

Colin Patrick McCabe commented on HDFS-6482:


bq. One thread per storage directory doesn't make sense here since this is the 
number of threads to use for the hard link process for ONE storage directory. 
The hard link processes for the storage directories are currently not run in 
parallel.

Understood.  It seems like we should be parallelizing the upgrade of different 
storage directories, since clearly we'd like to keep all those disks busy if we 
could.  Anyway, this JIRA is big enough as-is, so let's not worry about it 
right now.

James, given that you've gotten the upgrade times down to single seconds 
now, I am +1 on putting this change in 2.x.  [~sureshms], [~atm], what are your 
thoughts here?
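The per-storage-directory parallelism discussed above could look roughly like this sketch. It is not the patch's code; {{linkAll}}, the flat-directory assumption, and the one-thread-per-directory pool sizing are illustrative assumptions:

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelHardLink {
    // Hard-link the files in each source directory into the matching
    // target directory, one task per storage directory so every disk
    // does its links concurrently. Assumes flat directories of block files.
    public static void linkAll(List<Path> srcDirs, List<Path> dstDirs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(srcDirs.size());
        try {
            List<Future<?>> results = new ArrayList<>();
            for (int i = 0; i < srcDirs.size(); i++) {
                final Path src = srcDirs.get(i);
                final Path dst = dstDirs.get(i);
                results.add(pool.submit(() -> {
                    try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
                        for (Path f : files) {
                            if (Files.isRegularFile(f)) {
                                // link(2) only updates metadata; no data copy
                                Files.createLink(dst.resolve(f.getFileName()), f);
                            }
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> r : results) {
                r.get(); // rethrow any IOException from a worker
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Since each link is metadata-only, the per-directory tasks are mostly bounded by disk seeks, which is exactly why spreading them across drives helps.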



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066942#comment-14066942
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12656582/HDFS-6482.8.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7390//console

This message is automatically generated.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066976#comment-14066976
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12656582/HDFS-6482.8.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7391//console

This message is automatically generated.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066988#comment-14066988
 ] 

Colin Patrick McCabe commented on HDFS-6482:


1 second for 100k blocks is pretty good.

bq. Added a configuration parameter for users to specify the number of threads 
to be used in the hard link process.

Perhaps one thread per storage directory would make sense?  I'm not sure a 
configuration option is useful if this upgrade is a one-time event (and the 
NameNodes that would be upgraded have already been deployed).

bq. We use these optimizations for the hard link process only when upgrading to 
the block ID-based layout, because otherwise the directory structures of the 
old and new layouts should be the same and we can perform fast batch hard links 
over directories – see HDFS-1445.

Why not always use the native path, if it's faster?  It should be trivial to 
implement the batch symlink API via the native path.  You'd just write a 
for loop in java that made some calls down into the JNI function you already 
wrote.  There is a new symlink API coming up in Java7, so we'll want to stop 
using the shell thing eventually anyway.
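A hedged sketch of the "for loop in java" idea: the JDK 7 {{java.nio.file.Files.createLink}} call can stand in for the shell-based batch hardlink path. {{BatchLink}} and its flat file list are hypothetical names for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BatchLink {
    // Batch-hardlink a list of block files into dstDir with a plain Java
    // loop over the JDK 7 file API. Each call maps to a single link(2)
    // syscall, so there is no fork/exec of "ln" per batch, and failures
    // surface as typed IOExceptions instead of parsed shell output.
    public static void linkAll(List<Path> srcFiles, Path dstDir) throws IOException {
        for (Path src : srcFiles) {
            Files.createLink(dstDir.resolve(src.getFileName()), src);
        }
    }
}
```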



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066995#comment-14066995
 ] 

Colin Patrick McCabe commented on HDFS-6482:


So, when I try to apply your patch locally, I get "git binary diffs are not 
supported".  It looks like different versions of GNU patch have different 
behavior here (presumably binary-diff support is a newer feature?) and we're 
playing jenkins roulette.  I would say put the tar.gz file in a separate 
attachment for now, so we can get a jenkins run.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066997#comment-14066997
 ] 

Colin Patrick McCabe commented on HDFS-6482:


and also patch fails when you get "git binary diffs are not supported" -- 
hence the message you're seeing in the log output



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067068#comment-14067068
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12656597/hadoop-24-datanode-dir.tgz
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7394//console

This message is automatically generated.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067359#comment-14067359
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12656608/HDFS-6482.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.TestSymlinkLocalFSFileContext
  org.apache.hadoop.ipc.TestIPC
  org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem
  
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
  org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7396//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7396//console

This message is automatically generated.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-08 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055173#comment-14055173
 ] 

James Thomas commented on HDFS-6482:


[~sureshms] Thanks for the info. I don't understand your last comment -- could 
you explain further? Also, I don't think it makes sense to support both the 
LDir structure and this structure simultaneously. We would need to continue to 
maintain information in the ReplicaMap about where each block was located 
(since we wouldn't know whether it was stored with the old or new scheme), so 
there would be no memory usage savings. I'm not sure we would ever reach a 
point where all blocks stored with the old scheme would be gone and we could 
officially stop using the location field in ReplicaInfo.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055419#comment-14055419
 ] 

Colin Patrick McCabe commented on HDFS-6482:


bq. \[james wrote\]: Also, I don't think it makes sense to support both the 
LDir structure and this structure simultaneously

The main reason to do this change is to save memory and simplify things by not 
having to store the path to each replica.  If we support the old layout, then 
we no longer have this nice property.  We could still get some of the gains by 
setting the path to null in some of the various data structures... basically 
assume that null means this replica is located at a place determined by its 
block id.  And non-null would mean using the old system.  This might be a 
possible solution.  I would prefer not to go down this road due to the greater 
code complexity, though.

bq. \[suresh wrote\]: I think creating hard links with new schema is an issue. 
The main reason for hardlinks created as it is done today is to minimize the 
impact of any bug in new software. The simplest thing was done where we 
iterated over directories and created hardlinks. Rollback must ensure the 
system goes back to previous state of the system.

I don't see why a rollback wouldn't work here.  It's the same as going from the 
old (pre hadoop-2.0) layout to the new block pool-based layout.  We also used 
hardlinks there to provide downgrade capability, and it also worked there.  
We're not changing the contents of the old directory, just moving it out of the 
way and hardlinking to the block and meta files within it.

bq. James Thomas, we did a bunch of improvements to cut down the time from 10s 
of minutes to a couple of minutes. See HDFS-1445 for more details. Clearly 
anything significantly above 60s (the design goal of rolling upgrades) will 
result in issues for rolling upgrades.

Yes.  This is a very important consideration.  James and I discussed a few ways 
to optimize the hardlink process.  I think that it's very possible for this to 
be done in a second or two at most.  If you assume 500,000 replicas spread over 
10 drives, you have 50,000 hardlinks to make on each drive.  This just isn't 
going to take that long, since the operations you're doing are just altering 
memory (we don't fsync after calling {{link}}).  It's just a question of doing 
it in a smart way that minimizes the number of {{exec}} calls we make (and 
possibly obtains some parallelism).
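One way to picture the optimization is below: hard-link with an in-kernel {{link}} call per file instead of exec-ing an external command, and give each volume its own thread. This is a hedged sketch of the idea under discussion, not the patch's actual {{DataStorage.linkBlocks()}} code; the thread-per-volume split and directory names are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelHardLink {
    // Hard-link every regular file under previousDir into currentDir,
    // preserving the relative layout. link(2) only touches filesystem
    // metadata (no fsync here), so ~50,000 links per drive should be fast.
    static void linkVolume(Path previousDir, Path currentDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(previousDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path src : files) {
            Path dst = currentDir.resolve(previousDir.relativize(src).toString());
            Files.createDirectories(dst.getParent());
            Files.createLink(dst, src);  // in-kernel link(2); no exec of "ln"
        }
    }

    public static void main(String[] args) throws Exception {
        // One thread per volume: with 10 drives, each drive's ~50,000 links
        // proceed in parallel. Here a single toy volume is linked.
        Path prev = Files.createTempDirectory("previous");
        Files.write(prev.resolve("blk_1001"), "data".getBytes());
        Path cur = Files.createTempDirectory("current");
        Thread volumeThread = new Thread(() -> {
            try {
                linkVolume(prev, cur);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        volumeThread.start();
        volumeThread.join();
        System.out.println(Files.exists(cur.resolve("blk_1001")));  // prints true
    }
}
```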

 Use block ID-based block layout on datanodes
 

 Key: HDFS-6482
 URL: https://issues.apache.org/jira/browse/HDFS-6482
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.5.0
Reporter: James Thomas
Assignee: James Thomas
 Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, 
 HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, 
 HDFS-6482.7.patch, HDFS-6482.patch


 Right now blocks are placed into directories that are split into many 
 subdirectories when capacity is reached. Instead we can use a block's ID to 
 determine the path it should go in. This eliminates the need for the LDir 
 data structure that facilitates the splitting of directories when they reach 
 capacity as well as fields in ReplicaInfo that keep track of a replica's 
 location.
 An extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053895#comment-14053895
 ] 

Suresh Srinivas commented on HDFS-6482:
---

HDFS-5535 added support for rolling upgrade. At that time, given that the 
datanode layout rarely changes, rollback was not considered for datanodes. 
Since this jira changes the datanode layout version, the impact on rollback 
should be considered before this change can be committed.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053958#comment-14053958
 ] 

James Thomas commented on HDFS-6482:


I think DN rollback should work fine with this change. The previous directory 
will contain the blocks laid out with the old structure, and on rollback this 
structure will be restored. The relevant code is in DataStorage.java, 
particularly linkBlocks().


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054008#comment-14054008
 ] 

Colin Patrick McCabe commented on HDFS-6482:


The new current directory will contain hardlinks to the block and metadata 
files in the previous directory.  So it seems like rollback should work fine 
in this case.

It would be nice to add a unit test where we upgrade a DataNode from the 
non-blockid-based version to a blockid-based version, and then do a rollback.  
Can you add this, James?  Since you already added 
{{hadoop-24-datanode-dir.tgz}}, it shouldn't be too difficult to add a unit 
test that rolls back to this version from the new version.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054044#comment-14054044
 ] 

Suresh Srinivas commented on HDFS-6482:
---

bq. The new current directory will contain hardlinks to the block and 
metadata files in the previous directory. So it seems like rollback should 
work fine in this case.
Can a brief design doc be posted to this jira describing the new directory 
structure and what happens during an upgrade to a release with this change? 
I do not have time to review the patch to figure this out.

My read is that this will not work if the previous layout is changed to the new 
layout during upgrade. During rolling upgrades, hardlinks to all the blocks are 
*not* created, only to the ones deleted post rolling upgrade. This is done to 
keep the datanode upgrade time short to support quick restart. If rolling 
upgrades cannot be supported, this code can only go into a major release.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054115#comment-14054115
 ] 

Colin Patrick McCabe commented on HDFS-6482:


bq. My read is, this will not work if during upgrade the previous layout is 
changed to the new layout. During rolling upgrades hardlinks to all the blocks 
are not created, only for the ones deleted post rolling upgrade. This is done 
to keep the datanode upgrade time short to support quick restart. If rolling 
upgrades cannot be supported, this code can only go into a major release.

My understanding was that this was an optimization for the cases where the 
datanode layout hadn't changed significantly (which was most upgrades).  It 
should not be interpreted as a hard limitation that prevents us from making 
*any* changes to the datanode layout in the future.

James, it would be good to see some upgrade times for a DN with a few hundred 
thousand blocks.  It seems like this should be manageable, especially if we 
parallelize it a bit.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054141#comment-14054141
 ] 

Suresh Srinivas commented on HDFS-6482:
---

bq. My understanding of this was that it was an optimization for the cases 
where the datanode layout hadn't changed significantly (which was most 
upgrades).
One of the key requirements of rolling upgrades was to keep datanode upgrade 
time as short as possible. Second, current rolling upgrades do not create 
hardlinks, as I mentioned already. Hence if the assumption is that hardlinks 
will be made, that needs to be factored in.

bq. It should not be interpreted as a hard limitation that prevents us from 
making any changes for the datanode layout in the future.
Not all datanode layout changes need massive changes to underlying directory 
structure. One solution is to support both directory structures and as the 
blocks get deleted and re-added, they will naturally migrate to the new scheme.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054273#comment-14054273
 ] 

James Thomas commented on HDFS-6482:


[~sureshms] Do you know if there are any benchmarks that demonstrate that 
creating hundreds of thousands of hard links using the regular upgrade 
procedure is slow?


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054508#comment-14054508
 ] 

Suresh Srinivas commented on HDFS-6482:
---

[~james.thomas], we did a bunch of improvements to cut down the time from tens 
of minutes to a couple of minutes. See HDFS-1445 for more details. Clearly 
anything significantly above 60s (the design goal of rolling upgrades) will 
result in issues for rolling upgrades.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054509#comment-14054509
 ] 

Suresh Srinivas commented on HDFS-6482:
---

Quick comment on the design document part:
{noformat}
Upgrades: We handle DN upgrades by hard linking to the blocks in the previous 
directory as before. The only difference with this upgrade is that the new hard 
links would be placed into directories in the manner described here. This 
shouldn't affect anything, as no code appears to assume that blocks are laid 
out in the manner prescribed by LDir.
{noformat}
I think creating hard links with the new schema is an issue. The main reason 
hardlinks are created the way they are today is to minimize the impact of any 
bug in the new software. The simplest thing was done: we iterated over 
directories and created hardlinks. Rollback must ensure the system goes back 
to its previous state.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052139#comment-14052139
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654017/HDFS-6482.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7281//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7281//console

This message is automatically generated.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050676#comment-14050676
 ] 

Colin Patrick McCabe commented on HDFS-6482:


I agree with the reasoning about the two-level directory structure.

I see that your code adds a binary .tgz file:

{code}
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz 
b/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz
new file mode 100644
index 
..49c9b15875e9d4c51a6fd06448ef54c2ced5e863
GIT binary patch
literal 320695
zcmYgXc{tSV_kQ10Ldcfvm94C$h7c0jcQeK=lw~BLA$!?n3)%O577W?4XAjwDC~L?z
zvKu?|`-tA(_dl-dbe{9v_kHejKF=kN`bY7Vd=X6ko+Dc)z^g*N!RuTshB~@%H+8
zde$pSkDGL?E?s#h+$#I5l0zrouuGoc0o|a6|$P2cscI_BcbYLXU08Qc(?V2-WKI
{code}

Unfortunately, our patch apply script doesn't understand git binary diffs :(  
So the tgz is not getting picked up, leading to this spurious test failure.

{code}
org.apache.hadoop.util.Shell$ExitCodeException: gzip: 
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test-classes/hadoop-24-datanode-dir.tgz:
 No such file or directory
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
{code}

{code}
-  // nothing to do here
+  // nothing to do hereFile dir =
{code}
Looks like a typo.

+1 once this is addressed.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049421#comment-14049421
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653454/HDFS-6482.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7268//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7268//console

This message is automatically generated.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-19 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037061#comment-14037061
 ] 

Arpit Agarwal commented on HDFS-6482:
-

Good point about the dentry cache. I did not spend enough time to fully follow 
your probabilistic analysis. However, with a quick and dirty calculation I 
agree that blowup is unlikely.

Even assuming 64TB disks, an 8MB average block size (very conservative), and a 
uniform distribution of block files across subdirs, the expected number of 
files per subdir is 2 * (64TB / (8MB * 256 * 256)) = 256. The 2-level approach 
looks fine to me.
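The estimate above can be restated as code (constants taken from the comment; the factor of 2 counts the block file plus its .meta file):

```java
// Recompute the expected files-per-subdir estimate: 64TB disk, 8MB average
// block, block files spread uniformly over a 256x256 two-level layout.
public class SubdirEstimate {
    public static long filesPerSubdir(long diskBytes, long avgBlockBytes) {
        long blocks = diskBytes / avgBlockBytes;  // 64TB / 8MB = ~8.4M blocks
        long subdirs = 256L * 256L;               // two-level, 256 dirs per level
        return 2 * blocks / subdirs;              // x2: block file + .meta file
    }

    public static void main(String[] args) {
        long tb = 1L << 40, mb = 1L << 20;
        System.out.println(filesPerSubdir(64 * tb, 8 * mb));  // prints 256
    }
}
```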


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036465#comment-14036465
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651220/HDFS-6482.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7163//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7163//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7163//console

This message is automatically generated.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029887#comment-14029887
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650111/HDFS-6482.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7098//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7098//console

This message is automatically generated.


[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027226#comment-14027226
 ] 

Hadoop QA commented on HDFS-6482:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649627/HDFS-6482.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestBlockMissingException
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.TestMissingBlocksAlert
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
  org.apache.hadoop.hdfs.TestBlockReaderLocalLegacy
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages
  org.apache.hadoop.hdfs.protocol.TestLayoutVersion
  
org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestDatanodeRestart
  org.apache.hadoop.hdfs.server.namenode.TestXAttrConfigFlag
  org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
  org.apache.hadoop.hdfs.TestFileCorruption
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
  org.apache.hadoop.hdfs.TestBlockReaderLocal
  
org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
  org.apache.hadoop.hdfs.server.datanode.TestCachingStrategy
  org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
  org.apache.hadoop.hdfs.TestDFSClientRetries
  
org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool
org.apache.hadoop.hdfs.server.namenode.TestFsck
org.apache.hadoop.hdfs.TestReplication
org.apache.hadoop.hdfs.TestDatanodeBlockScanner

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7078//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7078//console

This message is automatically generated.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-09 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025559#comment-14025559
 ] 

Arpit Agarwal commented on HDFS-6482:
-

{{DFS_DATANODE_NUMBLOCKS_DEFAULT}} is currently 64. I am not sure why the 
default was set so low. It would be good to know the reason before we change 
the behavior. It was quite possibly an arbitrary choice.

After ~4 million blocks we would start putting more than 256 blocks in each 
leaf subdirectory. With every 4M blocks, we'd add 256 files to each leaf. I 
think this is fine since 4 million blocks itself is going to be very unlikely. 
I recall as late as Vista NTFS directory listings would get noticeably slow 
with thousands of files per directory. Is there any performance loss with 
always having three levels of subdirectories, restricting each to 256 children 
at the most?

- Who removes empty subdirectories when blocks are deleted?
- Let's avoid suffixing hex numerals to subdir for consistency with the 
existing naming convention.
- StringBuilder looks unnecessary in {{idToBlockDir}}.
- We should add a release note stating that {{DFS_DATANODE_NUMBLOCKS_DEFAULT}} 
is obsolete.

The approach looks good and a big +1 for removing LDir.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025693#comment-14025693
 ] 

Colin Patrick McCabe commented on HDFS-6482:


bq. DFS_DATANODE_NUMBLOCKS_DEFAULT is currently 64. I am not sure why the 
default was set so low. It would be good to know the reason before we change 
the behavior. It was quite possibly an arbitrary choice.

So, back in the really old days (think ext2), there were performance issues for 
directories with a large number of files (10,000+).  See wikipedia's page on 
ext2 here: http://en.wikipedia.org/wiki/Ext2.  The LDir subdirectory mechanism 
was intended to alleviate this.

More recent filesystems like ext4 (and recent revisions of ext3) have what's 
called directory indices.  This basically means that there is an index which 
allows you to look up a particular entry in a directory in less than O(N) time. 
 This makes having directories with a huge number of entries possible.

It's still nice to have multiple directories to avoid overloading {{readdir}} 
(when we have to do that-- for example, to find a metadata file without knowing 
its genstamp) and to make inspecting things easier.  Plus, it allows us to stay 
compatible with systems that don't handle giant directories well.

bq. After ~4 million blocks we would start putting more than 256 blocks in each 
leaf subdirectory. With every 4M blocks, we'd add 256 files to each leaf. I 
think this is fine since 4 million blocks itself is going to be very unlikely. 
I recall as late as Vista NTFS directory listings would get noticeably slow 
with thousands of files per directory. Is there any performance loss with 
always having three levels of subdirectories, restricting each to 256 children 
at the most?

It's an interesting idea, but after all, as you pointed out, even to get to 
1,024 blocks per subdirectory (which still isn't thousands but is a single 
thousand) under James' scheme would require 16 million blocks.  At that point, 
it seems like there will be other problems.  We can always evolve the directory 
and metadata naming structure again once 16 million blocks is on the horizon 
(and we probably will have to do other things too, like investigate off-heap 
memory storage)



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025761#comment-14025761
 ] 

Kihwal Lee commented on HDFS-6482:
--

BlockIDs are sequential nowadays. With the proposed block distribution method,  
leaf dirs can get severely unbalanced, especially in smaller clusters.  Besides 
the cost of looking up entries in a directory, directory lock contention can 
become high and hurt performance if many files are created and read from a 
small set of directories. I think limiting the number to 64 kind of imposed a 
cap on how contentious it can be.  We might do better by more evenly 
distributing blocks. 



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-09 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025786#comment-14025786
 ] 

James Thomas commented on HDFS-6482:


Thanks for the review, Arpit, and thanks for the follow-up, Colin. I want to 
clarify one thing -- the numbers 4 million and 16 million that both of you 
mention are, as far as I understand, actually numbers of blocks for the ENTIRE 
cluster, not just a single DN. Suppose we had a cluster of 16 million blocks 
(with sequential block IDs), we could in theory have a single DN with a 
directory as large as 1024 entries, if we got unlucky with the assignment of 
blocks to DNs. Assuming uniform distribution of blocks across the DNs available 
in the cluster and a maximum # of blocks per DN of 2^24, we have an expected # 
of blocks per directory of 256. I don't know how accurate this assumption is.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-09 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025802#comment-14025802
 ] 

James Thomas commented on HDFS-6482:


Kihwal, we were considering using some sort of deterministic probing (as in 
hash tables) to find less full directories if the initial directory for a block 
is full. Do you think the cost (and additional complexity) of this sort of 
scheme is justified given the relatively low probability (given the uniform 
block distribution assumption, at least) of directory blowup?

Additionally, I want to note that if the total number of blocks in the cluster 
is N, N/2^16 is a strict upper bound on the number of blocks in a single 
directory on any DN, assuming completely sequential block IDs. So for a small 
cluster we can't see any blowup.
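For illustration, the deterministic probing James describes might look like the sketch below. This is purely hypothetical: the class and method names and the linear probe step are invented here, and no such scheme was committed. It also makes Colin's later objection concrete: a reader must repeat the same probe sequence, checking each candidate directory, so lookups slow down as directories fill.

```java
// Hypothetical sketch of deterministic probing for a non-full directory.
// All names here are invented for illustration; this was never adopted.
public class ProbeSketch {
    // Pick a directory for blockId: start at its "home" slot and walk
    // forward deterministically until a directory with room is found.
    static int chooseDir(long blockId, int[] dirCounts, int limit) {
        int numDirs = dirCounts.length;
        int start = (int) Long.remainderUnsigned(blockId, numDirs);
        for (int probe = 0; probe < numDirs; probe++) {
            int d = (start + probe) % numDirs;  // same sequence on every lookup
            if (dirCounts[d] < limit) {
                return d;
            }
        }
        return start;  // every directory is full: fall back to the home slot
    }
}
```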



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025980#comment-14025980
 ] 

Colin Patrick McCabe commented on HDFS-6482:


bq. Suppose we had a cluster of 16 million blocks (with sequential block IDs), 
we could in theory have a single DN with a directory as large as 1024 entries, 
if we got unlucky with the assignment of blocks to DNs.

I don't think this calculation is right.

Even if all the blocks end up on a single DN (maximally unbalanced), in a 16 
million block cluster, you have  (16 * 1024 * 1024) / (256 * 256) = 256 entries 
per directory.

To confirm this calculation, I ran this test program:
{code}
#include <inttypes.h>
#include <stdio.h>

#define MAX_A 256
#define MAX_B 256

uint64_t dir_entries[MAX_A][MAX_B];

int main(void)
{
  uint64_t i, j, l, a, b, c;
  uint64_t max = (16LL * 1024LL * 1024LL);

  for (i = 0; i < max; i++) {
    l = (i & 0x00ffLL);                      /* low byte */
    a = (i & 0xff00LL) >> 8LL;               /* bits 8-15: first-level dir */
    b = (i & 0x00ff0000LL) >> 16LL;          /* bits 16-23: second-level dir */
    c = (i & 0xffffffffff000000LL) >> 16LL;  /* remaining high bits */
    c |= l;
    /* printf("%02" PRIx64 "/%02" PRIx64 "/%012" PRIx64 "\n", a, b, c); */
    dir_entries[a][b]++;
  }
  max = 0;
  for (i = 0; i < MAX_A; i++) {
    for (j = 0; j < MAX_B; j++) {
      if (max < dir_entries[i][j]) {
        max = dir_entries[i][j];
      }
    }
  }
  printf("max entries per directory = %" PRId64 "\n", max);
  return 0;
}
{code}

bq. we were considering using some sort of deterministic probing (as in hash 
tables) to find less full directories if the initial directory for a block is 
full...

I don't think probing is a good idea.  It's going to slow things down in the 
common case when we're reading a block.

Maybe we should add another layer in the hierarchy so that we know we won't get 
big directories even on huge clusters.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019111#comment-14019111
 ] 

Colin Patrick McCabe commented on HDFS-6482:


bq.   * @param genStamp generation stamp of the blockFile dir =

Looks like a bad search-and-replace.

bq.  public static File getDirectoryNoCreate(File root, long blockId) {

Maybe rather than having an {{IdBasedBlockDirectory}} class, we should just put 
a static method like {{idToBlockDir}} inside {{DatanodeUtil}}, that simply 
returns a {{File}}.  Code that wants to call {{mkdir}} can always do it on its 
own (with appropriate try/catch blocks.)  I don't see a lot of value in having 
a separate class, since the only interesting thing there is the {{File}} 
contained inside.

I also don't like the side effect in the constructor.  I guess we've done that 
kind of thing in the past.  But usually when we do that, there is a {{close}} 
method that undoes the side-effect.  In this case it just feels kind of 
arbitrary... why should creating this object do a mkdir somewhere?  You don't 
need the object to use the directory, and you don't need to keep the object 
around to keep the directory around.

bq. "The block ID of a block uniquely determines its position in the " +
"directory structure");

Should be "finalized block".

Looks good aside from that.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-04 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018261#comment-14018261
 ] 

James Thomas commented on HDFS-6482:


Thanks for the review, Colin. Addressed your feedback and implemented your 
layout suggestion. We should definitely think more about the implications of 
the layout, but I think your idea is strictly better than hashing.

The layout does not apply to the rbw directory. Currently, the subdir-split 
layout is only used for finalized replicas, and a basic flat layout is used for 
rbw. Doesn't seem like we should change that.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-06-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017246#comment-14017246
 ] 

Colin Patrick McCabe commented on HDFS-6482:


{code}
+  public static long getBlockIdFromBlockOrMetaFile(String blockOrMetaFile) {
+long metaTry = getBlockId(blockOrMetaFile);
{code}

Rather than have a new function, how about fixing {{getBlockId}} to work on 
either metadata or data files?  It would require a new regular expression.  
However, both meta and block files begin with the block ID and then follow with 
a non-numeric character (or the end of the file name) so it shouldn't be too 
bad to write the regex for that.
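A single pattern along these lines could look like the sketch below. This is a hedged illustration, not the regex from any committed patch; the only thing assumed from HDFS is the existing naming convention of {{blk_<id>}} for block files and {{blk_<id>_<genstamp>.meta}} for their metadata files.

```java
// Sketch of one regex handling both block files ("blk_<id>") and their
// metadata files ("blk_<id>_<genstamp>.meta"). The class and pattern are
// illustrative; only the file-naming convention is taken from HDFS.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockIdParse {
    // Block ID, optionally followed by the "_<genstamp>.meta" suffix.
    private static final Pattern BLOCK_OR_META =
        Pattern.compile("^blk_(-?\\d+)(_\\d+\\.meta)?$");

    public static long getBlockId(String fileName) {
        Matcher m = BLOCK_OR_META.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a block or meta file: " + fileName);
        }
        return Long.parseLong(m.group(1));
    }
}
```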

{code}
BLOCKID_BASED_LAYOUT(-55,
    "The block ID of a block uniquely determines its position in the " +
    "directory structure, obviating the need to keep per-block " +
    "directory information in memory.");
{code}

It might be better just to write "The block ID of a block uniquely determines 
its position in the directory structure."  The rest of the descriptions are 
pretty short.

{code}
+// If we are upgrading from a version older than the one where we 
introduced
+// block ID-based layout AND we're working with the finalized directory,
+// we'll need to upgrade from the old flat layout to the block ID-based one
+if (oldLV > LayoutVersion.Feature.BLOCKID_BASED_LAYOUT.getInfo().
+getLayoutVersion() && to.getName().equals(STORAGE_DIR_FINALIZED)) {
+  upgradeToIdBasedLayout = true;
{code}

This new layout applies to the rbw directory as well, right?

{code}
  File getDir() {
try {
  return new IdBasedBlockDirectory(baseDir).getDirectory(getBlockId());
} catch (IOException ioe) {
  return null; // won't happen since directory for this block already exists
}
  }
{code}

It seems like it would be better to just have a static method or something in 
{{IdBasedBlockDirectory}} that returned a File object.  This "can't happen" 
code is scary, especially when we're dealing with filesystem operations like 
mkdir.

Remember that a File object can exist, even though the corresponding file on 
disk does not.  The object just contains a path, basically.  So let's just 
return that File from somewhere and skip the mkdir.

{code}
import org.apache.hadoop.hdfs.server.datanode.*;
{code}

We generally don't use wildcard includes... I think maybe your editor did this 
automatically.  IntelliJ did that to me once :)  There's a setting on IntelliJ 
to turn that off.

{code}
  // directory store Finalized replica
  private final IdBasedBlockDirectory finalizedDir;
{code}
While we're moving the comment, let's make it grammatical

{code}
this.finalizedDir = new IdBasedBlockDirectory(finalizedDir);
{code}
It seems kind of tricky to have two variables with the same name.  I would say 
rename one or the other, or don't bother with a local variable for finalizedDir 
at all (nested new statements).

{code}
  static private long hashBlockId(long n) {
return (n + 378734493671000L) * 9223372036854775783L;
  }

  public File getDirectory(long blockId) throws IOException {
long h = hashBlockId(blockId);
int d1 = (int)((h >> 56) & 0xff);
int d2 = (int)((h >> 48) & 0xff);
{code}

So you're creating a path which looks like: a/b/c

How about taking bits 8-16 for a, bits 16-24 for b, and the rest for c?  
(Notice that the lowest bits are part of c.)

This has some nice effects.  In combination with our sequential block 
allocation strategy, it means that the first 256 files all go in the same 
directory, avoiding the need to make 2 directories per file.  The next 256 go 
in a different distinct directory, and so on.

The thing to keep in mind is that we don't really want each block in its own 
directory... we just want to avoid overloading directories.  We should eschew 
hashing so that we never need to worry about collisions.  With the scheme I 
outlined, we can go up to approximately a (power of two) billion blocks (2**30) 
without ever exceeding 16384 files per directory.  At a billion blocks per 
Datanode, we have bigger problems than directory structure, of course :)
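Put together, the mapping outlined above can be sketched as follows. This is a hedged illustration: the helper name {{idToBlockDir}} and the {{subdir}} prefix are assumptions for this sketch, and the committed helper may differ in detail.

```java
// Sketch of the bits-8..23 mapping outlined above: bits 8-15 select the
// first-level directory, bits 16-23 the second level, so 256 consecutive
// block IDs land in the same leaf. Names here are illustrative only.
import java.io.File;

public class BlockDirSketch {
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 8) & 0xff);   // bits 8-15
        int d2 = (int) ((blockId >> 16) & 0xff);  // bits 16-23
        return new File(root, "subdir" + d1 + File.separator + "subdir" + d2);
    }
}
```

With sequential IDs, blocks 0 through 255 all resolve to subdir0/subdir0, the next 256 to subdir1/subdir0, and so on, which matches the "first 256 files all go in the same directory" property described above.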
