[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646287#comment-14646287 ]

Hudson commented on HDFS-6482:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2217 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2217/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java

Use block ID-based block layout on datanodes

Key: HDFS-6482
URL: https://issues.apache.org/jira/browse/HDFS-6482
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 3.0.0
Reporter: James Thomas
Assignee: James Thomas
Fix For: 2.6.0
Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, hadoop-24-datanode-dir.tgz

Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location. An extension of the work in HDFS-3290.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
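The issue description says a block's ID can determine the path it goes in. A minimal sketch of that idea in Java follows. It assumes two directory levels, each indexed by 8 bits of the block ID, and directories named `subdirN`; the class name, the bit widths, and the example root path are illustrative assumptions, not details taken from this thread or from the attached patches.

```java
import java.io.File;

public class BlockIdLayoutSketch {
    // Assumption: 8 bits of the block ID select each directory level.
    static final int BITS_PER_LEVEL = 8;
    static final int MASK = (1 << BITS_PER_LEVEL) - 1;

    // Derive the directory for a block purely from its ID, so no per-replica
    // location field and no directory-splitting structure is required.
    static File idToBlockDir(File finalizedRoot, long blockId) {
        int d1 = (int) ((blockId >> (2 * BITS_PER_LEVEL)) & MASK);
        int d2 = (int) ((blockId >> BITS_PER_LEVEL) & MASK);
        return new File(finalizedRoot,
                "subdir" + d1 + File.separator + "subdir" + d2);
    }

    public static void main(String[] args) {
        // Hypothetical root directory and block ID, for illustration only.
        File root = new File("/data/dn/current/finalized");
        System.out.println(idToBlockDir(root, 0x12345678L));
    }
}
```

Because the mapping is a pure function of the ID, any component that knows the block ID can recompute the location instead of looking it up.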
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646257#comment-14646257 ]

Hudson commented on HDFS-6482:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #268 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/268/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645568#comment-14645568 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-trunk-Commit #8236 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8236/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645941#comment-14645941 ]

Hudson commented on HDFS-6482:

SUCCESS: Integrated in Hadoop-Yarn-trunk #1001 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1001/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645927#comment-14645927 ]

Hudson commented on HDFS-6482:

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #271 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/271/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646134#comment-14646134 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #260 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/260/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646093#comment-14646093 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2198 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2198/])
HDFS-8834. TestReplication is not valid after HDFS-6482. (Contributed by Lei Xu) (lei: rev f4f1b8b267703b8bebab06e17e69a4a4de611592)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156320#comment-14156320 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/698/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156398#comment-14156398 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156514#comment-14156514 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157106#comment-14157106 ]

Harsh J commented on HDFS-6482:

Updated the FAQ that was covering DN block moves to reflect the new maintain-subdir requirement: https://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
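One way to read the "maintain-subdir requirement" is that a block file moved to another disk must keep the same relative subdirectory path, since that path is now derived from the block ID. A minimal sketch under that assumption follows; the class name, volume roots, and block file name are hypothetical and are not quoted from the FAQ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlockMoveSketch {
    // Compute where a block file should land on the destination volume so
    // that its path relative to the volume root is unchanged.
    static Path targetFor(Path srcVolumeRoot, Path dstVolumeRoot, Path blockFile) {
        Path relative = srcVolumeRoot.relativize(blockFile);
        return dstVolumeRoot.resolve(relative);
    }

    public static void main(String[] args) {
        // Hypothetical volume roots and block file, for illustration only.
        Path src = Paths.get("/disk1/dn/finalized");
        Path dst = Paths.get("/disk2/dn/finalized");
        Path block = src.resolve("subdir3/subdir7/blk_42");
        System.out.println(targetFor(src, dst, block));
    }
}
```

The sketch only computes the destination path; actually moving the file (and its metadata file) while the datanode is stopped is left to the operator's procedure.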
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157521#comment-14157521 ]

Colin Patrick McCabe commented on HDFS-6482:

Thanks, [~qwertymaniac].
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155014#comment-14155014 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-trunk-Commit #6163 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6163/])
HDFS-6482. Fix CHANGES.txt in trunk (arp: rev be30c86cc9f71894dc649ed22983e5c42e9b6951)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128495#comment-14128495 ]

Hudson commented on HDFS-6482:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1867 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1867/])
HDFS-6482. Fix CHANGES.txt in trunk. (arp: rev 0de563a18e9e09207e3ef5f1cad1d2e788af9503)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128916#comment-14128916 ]

Hudson commented on HDFS-6482:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1892 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1892/])
HDFS-6482. Fix CHANGES.txt in trunk. (arp: rev 0de563a18e9e09207e3ef5f1cad1d2e788af9503)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120886#comment-14120886 ]

Arpit Agarwal commented on HDFS-6482:

I added a patch to HDFS-6981. Please review it.

Use block ID-based block layout on datanodes

Key: HDFS-6482
URL: https://issues.apache.org/jira/browse/HDFS-6482
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 3.0.0
Reporter: James Thomas
Assignee: James Thomas
Fix For: 3.0.0
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118513#comment-14118513 ]

Colin Patrick McCabe commented on HDFS-6482:

Yeah, it would be great to have this in 2.6. Is HDFS-6981 blocking merging this to 2.6?
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118554#comment-14118554 ]

Arpit Agarwal commented on HDFS-6482:

That is the known issue, yes.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115703#comment-14115703 ]

Arpit Agarwal commented on HDFS-6482:

Now that HDFS-6800 is in trunk to support DN layout changes with rolling upgrade, I'd like to include this improvement in 2.6. I plan to test it with rolling upgrades over the next couple of weeks.

Use block ID-based block layout on datanodes

Key: HDFS-6482
URL: https://issues.apache.org/jira/browse/HDFS-6482
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 2.5.0
Reporter: James Thomas
Assignee: James Thomas
Fix For: 3.0.0

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088044#comment-14088044 ]

Arpit Agarwal commented on HDFS-6482:

Hi [~james.thomas], is the branch-2 merge blocked by HDFS-6800?
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088059#comment-14088059 ]

James Thomas commented on HDFS-6482:

[~arpitagarwal], yep, we're waiting on discussion there.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088108#comment-14088108 ]

Arpit Agarwal commented on HDFS-6482:

It is too late to mention this now, but I am concerned about the delta between trunk and branch-2 while we wait on HDFS-6800 to get resolved. Will continue the discussion on HDFS-6800.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083524#comment-14083524 ] Hudson commented on HDFS-6482: FAILURE: Integrated in Hadoop-Yarn-trunk #631 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/631/]) HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DiskChecker.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java *
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSFinalize.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStorageStateRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/UpgradeUtilities.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-datanode-dir.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083569#comment-14083569 ] Hudson commented on HDFS-6482: SUCCESS: Integrated in Hadoop-Hdfs-trunk #1825 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1825/]) HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DiskChecker.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java *
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSFinalize.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStorageStateRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/UpgradeUtilities.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-datanode-dir.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083589#comment-14083589 ] Hudson commented on HDFS-6482: SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1850 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1850/]) HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DiskChecker.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java *
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSFinalize.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStorageStateRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/UpgradeUtilities.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-datanode-dir.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082901#comment-14082901 ] Colin Patrick McCabe commented on HDFS-6482: Thanks for your hard work on this, James. Committed to trunk (but not branch-2). Let's continue to discuss the rolling downgrade issues (need for additional rolling DN downgrade tests, general DN rolling downgrade strategy, etc.) on HDFS-6800.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082945#comment-14082945 ] Hudson commented on HDFS-6482: FAILURE: Integrated in Hadoop-trunk-Commit #5999 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5999/]) HDFS-6482. Use block ID-based block layout on datanodes (James Thomas via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615223) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DiskChecker.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java *
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSFinalize.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStorageStateRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/UpgradeUtilities.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-datanode-dir.txt
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083032#comment-14083032 ] Chris Nauroth commented on HDFS-6482: Hi, [~james.thomas] and [~cmccabe]. This patch broke compilation on Windows. I filed HADOOP-10925 for it, and I expect to have a patch ready shortly.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083035#comment-14083035 ] James Thomas commented on HDFS-6482: [~cnauroth], apologies, thanks.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083113#comment-14083113 ] Colin Patrick McCabe commented on HDFS-6482: Thanks, Chris.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081365#comment-14081365 ] Suresh Srinivas commented on HDFS-6482: [~cmccabe], does this patch change the directory structure, and is the new directory structure retained during rollback? If so, this is a concern for me and I am -0 on this. We should at least make sure we have rollback tests to check that things work and that no undocumented hidden assumption in the older software version (the one we are rolling back to) is broken. Given that rollback could be to any older release from which upgrade to this version is allowed, testing and ensuring nothing is broken becomes that much harder.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081427#comment-14081427 ] James Thomas commented on HDFS-6482: [~sureshms] No, during rollback the directory structure in the previous directory on the DN is restored, and this is the old directory structure. I have run some tests on my computer that show that rollback works. It is not really possible to write a rollback test that checks this case because we cannot run an older version of the DN code in the test.
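The rollback behaviour described in this comment (the old layout preserved in the previous directory replaces the upgraded one) could be sketched along these lines. This is a simplified illustration under assumptions, not the DataNode's actual storage code: the class, the helper names, and the "removed.tmp" staging name are hypothetical, and only the directory names "current" and "previous" come from the discussion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative sketch only: restore "previous" as "current" on rollback. */
final class RollbackSketch {
    static void rollback(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        Path previous = storageDir.resolve("previous");
        Path removed = storageDir.resolve("removed.tmp"); // hypothetical staging name
        if (!Files.isDirectory(previous)) {
            throw new IOException("Nothing to roll back to in " + storageDir);
        }
        // Move the upgraded (new-layout) tree aside, promote the old tree,
        // then discard the upgraded tree.
        if (Files.exists(current)) {
            Files.move(current, removed);
        }
        Files.move(previous, current);
        deleteRecursively(removed);
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Since the old tree is kept untouched until rollback or finalization, the older software finds exactly the layout it wrote, which is the property the comment relies on.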
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081603#comment-14081603 ] James Thomas commented on HDFS-6482: [~sureshms] Is the documentation in the Rollback section at http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html correct? Are you supposed to restart the DNs normally, without flags like -rollback or -rollingupgrade rollback? If you restart the DNs with -rollback, everything should work normally and the previous directory should be restored with the old layout. [~arpitagarwal], any thoughts on this?
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081607#comment-14081607 ] James Thomas commented on HDFS-6482: Or [~kihwal]?
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081618#comment-14081618 ] Colin Patrick McCabe commented on HDFS-6482: Why don't we merge this to trunk and then open another JIRA to iron out any issues with rolling upgrades between different DN layout versions? At minimum, we should decide whether we support rolling DN upgrades between different layout versions, and if we don't support it, give a clear failure message to admins. But this patch is big enough that I don't think cramming all that into here is a good idea. There also seem to be some issues with rolling DN downgrade now (for example, HDFS-6005 removed {{datanode -rollingupgrade -rollback}}, but not the usage text for it displayed in {{-help}}).
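If the decision were not to support rolling DN upgrades across layout versions, the "clear failure message" suggested here might look something like the sketch below. Everything in it is hypothetical: the class, the method, and the wording of the message are assumptions for illustration and do not correspond to an existing Hadoop API.

```java
/** Illustrative sketch only: fail fast with an explicit message. */
final class LayoutVersionCheckSketch {
    /**
     * Hypothetical guard for a rolling upgrade: refuses to proceed when the
     * layout version recorded on disk differs from the one this software uses.
     */
    static void checkRollingUpgrade(int onDiskLayoutVersion, int softwareLayoutVersion) {
        if (onDiskLayoutVersion != softwareLayoutVersion) {
            throw new IllegalStateException(String.format(
                "Rolling upgrade is not supported across DataNode layout versions: "
                    + "storage is at layout version %d, this software uses %d.",
                onDiskLayoutVersion, softwareLayoutVersion));
        }
    }
}
```

The point of the design is that an admin sees the two version numbers and the reason for the refusal immediately, rather than discovering a layout mismatch later through missing blocks.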
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081739#comment-14081739 ] Colin Patrick McCabe commented on HDFS-6482: Hey guys, I filed HDFS-6800 to have the rolling upgrade discussion. I'm going to commit this to trunk (but *not* to any other branches) in a bit if nobody has any objections.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080201#comment-14080201 ] Colin Patrick McCabe commented on HDFS-6482: +1. Thanks for your work on this, James. I'm going to commit this to trunk today if there are no further comments. We can decide about branch-2 later.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080275#comment-14080275 ] Suresh Srinivas commented on HDFS-6482: [~cmccabe], I have been traveling and not kept up with this. I will try to get back by tomorrow. If not, please go ahead and commit by tomorrow evening. Any comments I may have can be addressed before merging to branch-2.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080383#comment-14080383 ] Colin Patrick McCabe commented on HDFS-6482: OK, I will wait until tomorrow evening. Thanks for looking at this, Suresh.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069541#comment-14069541 ] James Thomas commented on HDFS-6482: In response to [~cmccabe]'s comments: One thread per storage directory doesn't make sense here since this is the number of threads to use for the hard link process for ONE storage directory. The hard link processes for the storage directories are currently not run in parallel. We can create a separate JIRA to add the native code to the regular hard link path. I've created a separate code path in this change for the upgrade to the block ID-based layout, and I want to focus on optimizing that in this JIRA. [~sureshms], any thoughts?
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069631#comment-14069631 ] Colin Patrick McCabe commented on HDFS-6482:
bq. One thread per storage directory doesn't make sense here since this is the number of threads to use for the hard link process for ONE storage directory. The hard link processes for the storage directories are currently not run in parallel.
Understood. It seems like we should be parallelizing the upgrade of different storage directories, since clearly we'd like to keep all those disks busy if we could. Anyway, this JIRA is big enough as-is, so let's not worry about it right now. James, given that you've gotten the upgrade times down to single-digit seconds now, I am +1 on putting this change in 2.x. [~sureshms], [~atm], what are your thoughts here?
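The suggestion above, upgrading different storage directories in parallel so that every disk stays busy, was deferred in this thread. As a rough sketch of what it could look like, the following uses one pool thread per storage directory. The class name, `upgradeAll`, and the placeholder `upgradeOne` are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpgradeSketch {

    /** Placeholder for the per-directory upgrade work (for example, hard linking). */
    static String upgradeOne(String storageDir) {
        return storageDir + ": upgraded";
    }

    /** Runs one upgrade task per storage directory; returns how many completed. */
    static int upgradeAll(List<String> storageDirs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, storageDirs.size()));
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String dir : storageDirs) {
                Callable<String> task = () -> upgradeOne(dir);
                pending.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<String> f : pending) {
                f.get(); // rethrows any failure from the worker thread
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(upgradeAll(Arrays.asList("/data1", "/data2", "/data3")));
    }
}
```

Sizing the pool to the number of storage directories assumes one directory per physical disk, which is a common deployment choice but not something the thread guarantees.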
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066942#comment-14066942 ] Hadoop QA commented on HDFS-6482: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656582/HDFS-6482.8.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7390//console This message is automatically generated.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066976#comment-14066976 ] Hadoop QA commented on HDFS-6482: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656582/HDFS-6482.8.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7391//console This message is automatically generated.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066988#comment-14066988 ] Colin Patrick McCabe commented on HDFS-6482: 1 second for 100k blocks is pretty good.
bq. Added a configuration parameter for users to specify the number of threads to be used in the hard link process.
Perhaps one thread per storage directory would make sense? I'm not sure if a configuration option is useful, if this upgrade is a one-time event (and the NameNodes that would be upgraded have already been deployed).
bq. We use these optimizations for the hard link process only when upgrading to the block ID-based layout, because otherwise the directory structures of the old and new layouts should be the same and we can perform fast batch hard links over directories, see HDFS-1445.
Why not always use the native path, if it's faster? It should be trivial to implement the batch symlink API via the native path. You'd just write a for loop in Java that made some calls down into the JNI function you already wrote. There is a new symlink API coming up in Java7, so we'll want to stop using the shell thing eventually anyway.
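The comment above proposes a plain Java for loop that creates links in-process instead of shelling out. A minimal sketch of that loop follows, assuming the Java 7 API being referred to is `java.nio.file.Files`, whose `createLink` method creates hard links. The class and method names are illustrative, and this is not the JNI function from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class HardLinkLoopSketch {

    /**
     * Creates one hard link per entry (key = new link, value = existing file)
     * without forking a shell process. Returns the number of links created.
     */
    static int linkAll(Map<Path, Path> linkToExisting) throws IOException {
        int created = 0;
        for (Map.Entry<Path, Path> e : linkToExisting.entrySet()) {
            Path parent = e.getKey().getParent();
            if (parent != null) {
                Files.createDirectories(parent); // make sure the target directory exists
            }
            Files.createLink(e.getKey(), e.getValue()); // hard link, not a symlink
            created++;
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hardlink-sketch");
        Path existing = Files.write(root.resolve("blk_1"), new byte[] {1, 2, 3});
        Map<Path, Path> links = new LinkedHashMap<>();
        links.put(root.resolve("new").resolve("blk_1"), existing);
        System.out.println("links created: " + linkAll(links));
    }
}
```

`Files.createLink` can throw `UnsupportedOperationException` on file systems without hard link support, so a real implementation would need a fallback.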
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066995#comment-14066995 ] Colin Patrick McCabe commented on HDFS-6482: So, when I try to apply your patch locally, I get "git binary diffs are not supported." It looks like different versions of GNU patch have different behavior here (presumably support is a new feature?) and we're playing Jenkins roulette. I would say put the tar.gz file in a separate attachment for now, so we can get a Jenkins run.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066997#comment-14066997 ] Colin Patrick McCabe commented on HDFS-6482: Also, patch fails when you get "git binary diffs are not supported", hence the message you're seeing in the log output.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067068#comment-14067068 ] Hadoop QA commented on HDFS-6482: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656597/hadoop-24-datanode-dir.tgz against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7394//console This message is automatically generated.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067359#comment-14067359 ] Hadoop QA commented on HDFS-6482:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656608/HDFS-6482.9.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.fs.TestSymlinkLocalFSFileContext
org.apache.hadoop.ipc.TestIPC
org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7396//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7396//console
This message is automatically generated.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055173#comment-14055173 ] James Thomas commented on HDFS-6482: [~sureshms] Thanks for the info. I don't understand your last comment -- could you explain further? Also, I don't think it makes sense to support both the LDir structure and this structure simultaneously. We would need to continue to maintain information in the ReplicaMap about where each block was located (since we wouldn't know whether it was stored with the old or new scheme), so there would be no memory usage savings. I'm not sure we would ever reach a point where all blocks stored with the old scheme would be gone and we could officially stop using the location field in ReplicaInfo.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055419#comment-14055419 ] Colin Patrick McCabe commented on HDFS-6482:
bq. [james wrote]: Also, I don't think it makes sense to support both the LDir structure and this structure simultaneously
The main reason to do this change is to save memory and simplify things by not having to store the path to each replica. If we support the old layout, then we no longer have this nice property. We could still get some of the gains by setting the path to null in some of the various data structures... basically assume that null means this replica is located at a place determined by its block ID, and non-null would mean using the old system. This might be a possible solution. I would prefer not to go down this road due to the greater code complexity, though.
bq. [suresh wrote]: I think creating hard links with new schema is an issue. The main reason for hardlinks created as it is done today is to minimize the impact of any bug in new software. The simplest thing was done where we iterated over directories and created hardlinks. Rollback must ensure the system goes back to previous state of the system.
I don't see why a rollback wouldn't work here. It's the same as going from the old (pre hadoop-2.0) layout to the new block pool-based layout. We also used hardlinks there to provide downgrade capability, and it also worked there. We're not changing the contents of the old directory, just moving it out of the way and hardlinking to the block and meta files within it.
bq. James Thomas, we did a bunch of improvement to cut down the time from 10s of minutes to a couple of minutes. See HDFS-1445 for more details. Clearly anything significantly above 60s (design goal of rolling upgrades) will result in issues for rolling upgrades.
Yes. This is a very important consideration. James and I discussed a few ways to optimize the hardlink process. I think that it's very possible for this to be done in a second or two at most. If you assume 500,000 replicas spread over 10 drives, you have 50,000 hardlinks to make on each drive. This just isn't going to take that long, since the operations you're doing are just altering memory (we don't fsync after calling {{link}}). It's just a question of doing it in a smart way that minimizes the number of {{exec}} calls we make (and possibly obtains some parallelism).
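The null-path compromise mentioned in the comment above (a null stored path means "derive the location from the block ID", a non-null path means "old scheme") could be sketched as follows. This illustrates an option the commenter preferred not to pursue, not what the patch does. The names and the 0xff fan-out are assumptions made for illustration.

```java
import java.io.File;

public class ReplicaDirSketch {

    /**
     * Resolves a replica's directory under the hypothetical dual-scheme design:
     * an explicitly stored directory wins, otherwise the block ID decides.
     */
    static File resolveDir(File storedDir, File finalizedRoot, long blockId) {
        if (storedDir != null) {
            return storedDir; // old scheme: location was recorded per replica
        }
        // New scheme: location is a pure function of the block ID (assumed bit layout).
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        return new File(new File(finalizedRoot, "subdir" + d1), "subdir" + d2);
    }

    public static void main(String[] args) {
        File root = new File("finalized");
        System.out.println(resolveDir(null, root, 0x010203L));
        System.out.println(resolveDir(new File("finalized/subdir7"), root, 0x010203L));
    }
}
```

Every reader of the location field has to handle both branches, which is the extra code complexity the comment warns about.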
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053895#comment-14053895 ] Suresh Srinivas commented on HDFS-6482: HDFS-5535 added support for rolling upgrade. During that time, given datanode layout rarely changes, rollback was not considered for datanodes. Given that this jira is changing datanode layout version, impact on rollback should be considered before this change can be committed.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053958#comment-14053958 ] James Thomas commented on HDFS-6482: I think DN rollback should work fine with this change. The previous directory will contain the blocks laid out with the old structure, and on rollback this structure will be restored. The relevant code is in DataStorage.java, particularly linkBlocks().
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054008#comment-14054008 ] Colin Patrick McCabe commented on HDFS-6482: The new current directory will contain hardlinks to the block and metadata files in the previous directory. So it seems like rollback should work fine in this case. It would be nice to add a unit test where we upgrade a DataNode from the non-blockid-based version to a blockid-based version, and then do a rollback. Can you add this, James? Since you already added {{hadoop-24-datanode-dir.tgz}}, it shouldn't be too difficult to add a unit test that rolls back to this version from the new version.
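The rollback argument above rests on a property of hard links: removing the new link leaves the original file and its contents untouched. The following toy walk-through shows only that property. The directory names `previous` and `current`, the block file name, and the simplified upgrade and rollback steps are assumptions made for illustration and do not reproduce the DataNode's actual procedure.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RollbackSketch {

    /** Simulates upgrade (hard link into a new layout) followed by rollback. */
    static String upgradeThenRollback(Path storageRoot) throws IOException {
        // Old layout: the block file lives under previous/.
        Path previous = Files.createDirectories(storageRoot.resolve("previous"));
        Path oldBlock = previous.resolve("blk_1001");
        Files.write(oldBlock, "block-data".getBytes(StandardCharsets.UTF_8));

        // "Upgrade": the new layout under current/ gets a hard link to the same file.
        Path newDir = Files.createDirectories(
                storageRoot.resolve("current").resolve("subdir0").resolve("subdir3"));
        Path newBlock = newDir.resolve("blk_1001");
        Files.createLink(newBlock, oldBlock);

        // "Rollback": drop the new link; previous/ was never modified.
        Files.delete(newBlock);
        return new String(Files.readAllBytes(oldBlock), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("rollback-sketch");
        System.out.println(upgradeThenRollback(root));
    }
}
```

A real unit test of the kind requested in the comment would start from the attached {{hadoop-24-datanode-dir.tgz}} rather than a synthetic file.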
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054044#comment-14054044 ] Suresh Srinivas commented on HDFS-6482:
bq. The new current directory will contain hardlinks to the block and metadata files in the previous directory. So it seems like rollback should work fine in this case.
Can a brief design doc be posted to this jira to describe what the new directory structure is, what happens during upgrade to release with this change? I do not have time to review jira to figure this out. My read is, this will not work if during upgrade the previous layout is changed to the new layout. During rolling upgrades hardlinks to all the blocks are *not* created, only for the ones deleted post rolling upgrade. This is done to keep the datanode upgrade time short to support quick restart. If rolling upgrades cannot be supported, this code can only go into a major release.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054115#comment-14054115 ] Colin Patrick McCabe commented on HDFS-6482:
bq. My read is, this will not work if during upgrade the previous layout is changed to the new layout. During rolling upgrades hardlinks to all the blocks are not created, only for the ones deleted post rolling upgrade. This is done to keep the datanode upgrade time short to support quick restart. If rolling upgrades cannot be supported, this code can only go into a major release.
My understanding of this was that it was an optimization for the cases where the datanode layout hadn't changed significantly (which was most upgrades). It should not be interpreted as a hard limitation that prevents us from making *any* changes for the datanode layout in the future.
James, it would be good to see some upgrade times for a DN with a few hundred thousand blocks. It seems like this should be manageable, especially if we parallelize it a bit.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054141#comment-14054141 ] Suresh Srinivas commented on HDFS-6482:
bq. My understanding of this was that it was an optimization for the cases where the datanode layout hadn't changed significantly (which was most upgrades).
One of the key requirements of rolling upgrades was to keep datanode upgrade time as short as possible. Second, as I mentioned already, the current rolling upgrade does not create hardlinks. Hence, if the assumption is that hardlinks will be made, that needs to be factored in.
bq. It should not be interpreted as a hard limitation that prevents us from making any changes for the datanode layout in the future.
Not all datanode layout changes need massive changes to the underlying directory structure. One solution is to support both directory structures; as the blocks get deleted and re-added, they will naturally migrate to the new scheme.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054273#comment-14054273 ] James Thomas commented on HDFS-6482: [~sureshms] Do you know if there are any benchmarks that demonstrate that creating hundreds of thousands of hard links using the regular upgrade procedure is slow?
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054508#comment-14054508 ] Suresh Srinivas commented on HDFS-6482:
[~james.thomas], we made a number of improvements to cut the time down from tens of minutes to a couple of minutes. See HDFS-1445 for more details. Clearly, anything significantly above 60 seconds (the design goal of rolling upgrades) will result in issues for rolling upgrades.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054509#comment-14054509 ] Suresh Srinivas commented on HDFS-6482:
Quick comment on the design document part:
{noformat}
Upgrades: We handle DN upgrades by hard linking to the blocks in the previous directory as before. The only difference with this upgrade is that the new hard links would be placed into directories in the manner described here. This shouldn't affect anything, as no code appears to assume that blocks are laid out in the manner prescribed by LDir.
{noformat}
I think creating hard links with the new schema is an issue. The main reason hardlinks are created the way they are today is to minimize the impact of any bug in the new software. The simplest thing was done: we iterate over the directories and create hardlinks. Rollback must ensure the system goes back to its previous state.
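To make the upgrade step quoted from the design document concrete, here is a minimal sketch of hard-linking one block file from a `previous` directory into an ID-derived directory under `current`. It is an illustration under stated assumptions, not the patch's code: the class and method names (`UpgradeLinkSketch`, `blockDir`, `linkIntoNewLayout`), the `blk_<id>` file name, and the choice of which block ID bytes select the directories are all hypothetical. The only real API relied on is `java.nio.file.Files.createLink`, which creates a hard link and leaves the original file untouched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UpgradeLinkSketch {
    /** Hypothetical mapping: two bytes of the block ID pick the two directory levels. */
    static Path blockDir(Path currentRoot, long blockId) {
        int d1 = (int) ((blockId >> 8) & 0xff); // assumed: second-lowest byte
        int d2 = (int) (blockId & 0xff);        // assumed: lowest byte
        return currentRoot.resolve("subdir" + d1).resolve("subdir" + d2);
    }

    /** Hard-links one file from previous/ into its ID-derived directory under current/. */
    static Path linkIntoNewLayout(Path previousFile, Path currentRoot, long blockId)
            throws IOException {
        Path dir = blockDir(currentRoot, blockId);
        Files.createDirectories(dir);
        Path link = dir.resolve(previousFile.getFileName());
        // createLink(link, existing): the new name refers to the same file on disk.
        return Files.createLink(link, previousFile);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dn-sketch");
        Path previous = Files.createDirectories(root.resolve("previous"));
        Path current = Files.createDirectories(root.resolve("current"));
        Path block = Files.write(previous.resolve("blk_1073741825"), new byte[] {1, 2, 3});
        Path link = linkIntoNewLayout(block, current, 1073741825L);
        System.out.println(link + " sameFile=" + Files.isSameFile(link, block));
    }
}
```

Because the files under `previous` are never moved or rewritten in this sketch, discarding the `current` tree would restore the old state, which is the property the rollback discussion is about. Whether linking every block this way fits the rolling-upgrade time budget is exactly the open question in the thread.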
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052139#comment-14052139 ] Hadoop QA commented on HDFS-6482: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654017/HDFS-6482.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7281//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7281//console This message is automatically generated. 
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050676#comment-14050676 ] Colin Patrick McCabe commented on HDFS-6482:
I agree with the reasoning about the two-level directory structure. I see that your code adds a binary .tgz file:
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz b/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-24-datanode-dir.tgz
new file mode 100644
index ..49c9b15875e9d4c51a6fd06448ef54c2ced5e863
GIT binary patch
literal 320695
zcmYgXc{tSV_kQ10Ldcfvm94C$h7c0jcQeK=lw~BLA$!?n3)%O577W?4XAjwDC~L?z
zvKu?|`-tA(_dl-dbe{9v_kHejKF=kN`bY7Vd=X6ko+Dc)z^g*N!RuTshB~@%H+8
zde$pSkDGL?E?s#h+$#I5l0zrouuGoc0o|a6|$P2cscI_BcbYLXU08Qc(?V2-WKI
{code}
Unfortunately, our patch apply script doesn't understand git binary diffs :( So the tgz is not getting picked up, which leads to this spurious test failure.
{code}
org.apache.hadoop.util.Shell$ExitCodeException: gzip: /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test-classes/hadoop-24-datanode-dir.tgz: No such file or directory
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
{code}
{code}
- // nothing to do here
+ // nothing to do hereFile dir =
{code}
Looks like a typo. +1 once this is addressed.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049421#comment-14049421 ] Hadoop QA commented on HDFS-6482: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653454/HDFS-6482.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7268//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7268//console This message is automatically generated. 
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037061#comment-14037061 ] Arpit Agarwal commented on HDFS-6482: - Good point about the dentry cache. I did not spend enough time to understand your probabilistic analysis. However with a quick and dirty calculation I agree that blowup is unlikely. Even assuming 64TB disks, 8MB average block size (very conservative) and uniform distribution of block files in subdir, the expected number of files per subdir is 2 * (64TB / (8MB * 256 * 256)) = 256. The 2-level approach looks fine to me.
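The back-of-the-envelope figure above can be reproduced in a few lines. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and reading the factor of 2 as "one block file plus one metadata file per block" is an assumption about what the comment intends.

```java
public class FilesPerSubdirEstimate {
    /** Expected files per leaf directory, assuming blocks spread uniformly over the leaves. */
    static long expectedFilesPerLeaf(long diskBytes, long avgBlockBytes, long leafDirs) {
        long blocks = diskBytes / avgBlockBytes;
        // Assumed: each block contributes two files (block file + metadata file).
        return 2 * blocks / leafDirs;
    }

    public static void main(String[] args) {
        long tb = 1L << 40;
        long mb = 1L << 20;
        // 64 TB disk, 8 MB average block size, 256 * 256 leaf directories.
        System.out.println(expectedFilesPerLeaf(64 * tb, 8 * mb, 256L * 256L));
    }
}
```

With these inputs there are 2^23 blocks spread over 2^16 leaves, i.e. 128 blocks or 256 files per leaf, which matches the number quoted in the comment.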
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036465#comment-14036465 ] Hadoop QA commented on HDFS-6482: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651220/HDFS-6482.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7163//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7163//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7163//console This message is automatically generated. 
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029887#comment-14029887 ] Hadoop QA commented on HDFS-6482: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650111/HDFS-6482.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7098//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7098//console This message is automatically generated.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027226#comment-14027226 ] Hadoop QA commented on HDFS-6482: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649627/HDFS-6482.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestBlockMissingException org.apache.hadoop.hdfs.TestCrcCorruption org.apache.hadoop.hdfs.TestMissingBlocksAlert org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure org.apache.hadoop.hdfs.TestBlockReaderLocalLegacy org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages org.apache.hadoop.hdfs.protocol.TestLayoutVersion org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestDatanodeRestart org.apache.hadoop.hdfs.server.namenode.TestXAttrConfigFlag org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader org.apache.hadoop.hdfs.TestFileCorruption org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.TestBlockReaderLocal org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks org.apache.hadoop.hdfs.server.datanode.TestCachingStrategy org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.hdfs.TestDatanodeBlockScanner {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7078//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7078//console This message is automatically generated. 
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025559#comment-14025559 ] Arpit Agarwal commented on HDFS-6482:
{{DFS_DATANODE_NUMBLOCKS_DEFAULT}} is currently 64. I am not sure why the default was set so low. It would be good to know the reason before we change the behavior. It was quite possibly an arbitrary choice.
After ~4 million blocks we would start putting more than 256 blocks in each leaf subdirectory. With every 4M blocks, we'd add 256 files to each leaf. I think this is fine since 4 million blocks itself is going to be very unlikely. I recall as late as Vista NTFS directory listings would get noticeably slow with thousands of files per directory. Is there any performance loss with always having three levels of subdirectories, restricting each to 256 children at the most?
- Who removes empty subdirectories when blocks are deleted?
- Let's avoid suffixing hex numerals to subdir for consistency with the existing naming convention.
- StringBuilder looks unnecessary in {{idToBlockDir}}.
- We should add a release note stating that {{DFS_DATANODE_NUMBLOCKS_DEFAULT}} is obsolete.
The approach looks good and a big +1 for removing LDir.
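As a rough illustration of two of the review points above (decimal `subdirN` names and no `StringBuilder`), a block-ID-to-directory mapping might look like the sketch below. This is not the code from the patch under review: the method body, the 256 x 256 fan-out, and in particular the choice of which two bytes of the block ID select the directories are assumptions made for the example.

```java
import java.io.File;

public class BlockDirSketch {
    /** Maps a block ID to one of 256 x 256 leaf directories under root (illustrative only). */
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 8) & 0xff); // assumed: second-lowest byte
        int d2 = (int) (blockId & 0xff);        // assumed: lowest byte
        // Plain string concatenation is enough here; decimal suffixes keep
        // the existing "subdirN" naming convention.
        String path = "subdir" + d1 + File.separator + "subdir" + d2;
        return new File(root, path);
    }

    public static void main(String[] args) {
        // 0x1234 -> d1 = 0x12 = 18, d2 = 0x34 = 52
        System.out.println(idToBlockDir(new File("finalized"), 0x1234L));
    }
}
```

Since the directory is a pure function of the block ID, nothing like LDir has to track where a replica lives; the location can be recomputed whenever it is needed.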
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025693#comment-14025693 ] Colin Patrick McCabe commented on HDFS-6482:
bq. DFS_DATANODE_NUMBLOCKS_DEFAULT is currently 64. I am not sure why the default was set so low. It would be good to know the reason before we change the behavior. It was quite possibly an arbitrary choice.
So, back in the really old days (think ext2), there were performance issues for directories with a large number of files (10,000+). See Wikipedia's page on ext2 here: http://en.wikipedia.org/wiki/Ext2. The LDir subdirectory mechanism was intended to alleviate this.
More recent filesystems like ext4 (and recent revisions of ext3) have what are called directory indices. This basically means that there is an index which allows you to look up a particular entry in a directory in less than O(N) time. This makes having directories with a huge number of entries possible. It's still nice to have multiple directories to avoid overloading {{readdir}} (when we have to do that, for example to find a metadata file without knowing its genstamp) and to make inspecting things easier. Plus, it allows us to stay compatible with systems that don't handle giant directories well.
bq. After ~4 million blocks we would start putting more than 256 blocks in each leaf subdirectory. With every 4M blocks, we'd add 256 files to each leaf. I think this is fine since 4 million blocks itself is going to be very unlikely. I recall as late as Vista NTFS directory listings would get noticeably slow with thousands of files per directory. Is there any performance loss with always having three levels of subdirectories, restricting each to 256 children at the most?
It's an interesting idea, but after all, as you pointed out, even getting to 1,024 blocks per subdirectory (which still isn't thousands, just a single thousand) under James' scheme would require 16 million blocks. At that point, it seems like there will be other problems. We can always evolve the directory and metadata naming structure again once 16 million blocks is on the horizon (and we will probably have to do other things too, like investigate off-heap memory storage).
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025761#comment-14025761 ] Kihwal Lee commented on HDFS-6482: -- BlockIDs are sequential nowadays. With the proposed block distribution method, leaf dirs can get severely unbalanced, especially in smaller clusters. Besides the cost of looking up entries in a directory, directory lock contention can become high and hurt performance if many files are created and read from a small set of directories. I think limiting the number to 64 kind of imposed a cap on how contentious it can be. We might do better by more evenly distributing blocks.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025786#comment-14025786 ] James Thomas commented on HDFS-6482: Thanks for the review, Arpit, and thanks for the follow-up, Colin. I want to clarify one thing -- the numbers 4 million and 16 million that both of you mention are, as far as I understand, actually numbers of blocks for the ENTIRE cluster, not just a single DN. Suppose we had a cluster of 16 million blocks (with sequential block IDs), we could in theory have a single DN with a directory as large as 1024 entries, if we got unlucky with the assignment of blocks to DNs. Assuming uniform distribution of blocks across the DNs available in the cluster and a maximum # of blocks per DN of 2^24, we have an expected # of blocks per directory of 256. I don't know how accurate this assumption is.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025802#comment-14025802 ] James Thomas commented on HDFS-6482: Kihwal, we were considering using some sort of deterministic probing (as in hash tables) to find less full directories if the initial directory for a block is full. Do you think the cost (and additional complexity) of this sort of scheme is justified given the relatively low probability (given the uniform block distribution assumption, at least) of directory blowup? Additionally, I want to note that if the total number of blocks in the cluster is N, N/2^16 is a strict upper bound on the number of blocks in a single directory on any DN, assuming completely sequential block IDs. So for a small cluster we can't see any blowup.
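A quick way to sanity-check the directory-size bound is to count entries per leaf directory for N sequential block IDs. The sketch below is hypothetical: the class {{LeafDirBoundSketch}} and the method {{largestLeaf}} are illustrative names, not part of any patch. It assumes the two-level layout discussed in this thread, where bits 8-23 of the block ID select one of 2^16 leaf directories, and it places every block on a single DN (the worst case). Under that assumption each leaf fills with 256 consecutive IDs at a time, so the result is 256 for any N from 256 up to 2^24, and it equals N/2^16 when N is a multiple of 2^24.

```java
/** Hypothetical sketch; names and layout are assumptions, not the patch's code. */
final class LeafDirBoundSketch {
    /** Two 8-bit directory levels give 2^16 leaf directories. */
    static final int LEAF_DIRS = 1 << 16;

    /**
     * Places sequential block IDs 0..totalBlocks-1 on one datanode and
     * returns the entry count of the fullest leaf directory.
     */
    static long largestLeaf(long totalBlocks) {
        long[] counts = new long[LEAF_DIRS];
        long max = 0;
        for (long id = 0; id < totalBlocks; id++) {
            // Assumed layout: bits 8-23 of the block ID pick the leaf directory.
            int leaf = (int) ((id >>> 8) & 0xffffL);
            max = Math.max(max, ++counts[leaf]);
        }
        return max;
    }

    public static void main(String[] args) {
        long n = 1L << 24; // 16 million blocks, as in the discussion above
        System.out.println("largest leaf = " + largestLeaf(n)
            + ", N/2^16 = " + (n / LEAF_DIRS));
    }
}
```

Trying a small N (for example 1000) shows that the count never exceeds 256 until the cluster grows past 2^24 blocks, which is consistent with the observation that small clusters cannot see a blowup.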
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025980#comment-14025980 ] Colin Patrick McCabe commented on HDFS-6482:

bq. Suppose we had a cluster of 16 million blocks (with sequential block IDs), we could in theory have a single DN with a directory as large as 1024 entries, if we got unlucky with the assignment of blocks to DNs.

I don't think this calculation is right. Even if all the blocks end up on a single DN (maximally unbalanced), in a 16 million block cluster, you have (16 * 1024 * 1024) / (256 * 256) = 256 entries per directory. To confirm this calculation, I ran this test program:

{code}
#include <inttypes.h>
#include <stdio.h>

#define MAX_A 256
#define MAX_B 256

uint64_t dir_entries[MAX_A][MAX_B];

int main(void)
{
  uint64_t i, j, l, a, b, c;
  uint64_t max = (16LL * 1024LL * 1024LL);

  for (i = 0; i < max; i++) {
    /* Path is a/b/c: a = bits 8-15, b = bits 16-23, c = everything else. */
    l = (i & 0x00000000000000ffLL);
    a = (i & 0x000000000000ff00LL) >> 8LL;
    b = (i & 0x0000000000ff0000LL) >> 16LL;
    c = (i & 0xffffffffff000000LL) >> 16LL;
    c |= l;
    //printf("%02" PRIx64 "/%02" PRIx64 "/%012" PRIx64 "\n", a, b, c);
    dir_entries[a][b]++;
  }
  /* Reuse max to hold the largest per-directory count. */
  max = 0;
  for (i = 0; i < MAX_A; i++) {
    for (j = 0; j < MAX_B; j++) {
      if (max < dir_entries[i][j]) {
        max = dir_entries[i][j];
      }
    }
  }
  printf("max entries per directory = %" PRIu64 "\n", max);
  return 0;
}
{code}

bq. we were considering using some sort of deterministic probing (as in hash tables) to find less full directories if the initial directory for a block is full...

I don't think probing is a good idea. It's going to slow things down in the common case when we're reading a block. Maybe we should add another layer in the hierarchy so that we know we won't get big directories even on huge clusters.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019111#comment-14019111 ] Colin Patrick McCabe commented on HDFS-6482:

bq. * @param genStamp generation stamp of the blockFile dir =

Looks like a bad search-and-replace.

bq. public static File getDirectoryNoCreate(File root, long blockId) {

Maybe rather than having an {{IdBasedBlockDirectory}} class, we should just put a static method like {{idToBlockDir}} inside {{DatanodeUtil}}, that simply returns a {{File}}. Code that wants to call {{mkdir}} can always do it on its own (with appropriate try/catch blocks.) I don't see a lot of value in having a separate class, since the only interesting thing there is the {{File}} contained inside.

I also don't like the side effect in the constructor. I guess we've done that kind of thing in the past. But usually when we do that, there is a {{close}} method that undoes the side-effect. In this case it just feels kind of arbitrary... why should creating this object do a mkdir somewhere? You don't need the object to use the directory, and you don't need to keep the object around to keep the directory around.

bq. "The block ID of a block uniquely determines its position in the " + "directory structure");

Should be "finalized block".

Looks good aside from that.
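The static-method suggestion above can be sketched roughly as follows. This is a minimal illustration, not the patch's code: the class name {{DatanodeUtilSketch}} and the {{subdir}} naming prefix are assumptions, and the bit split (bits 8-15 for the first level, bits 16-23 for the second) follows the a/b/c scheme proposed elsewhere in this thread. The method only builds a path and never touches the disk.

```java
import java.io.File;

/** Hypothetical sketch of a side-effect-free path helper. */
final class DatanodeUtilSketch {
    private DatanodeUtilSketch() {}

    /**
     * Maps a block ID to its leaf directory under root.
     * Pure computation: no mkdir and no I/O, so it cannot throw IOException.
     */
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 8) & 0xff);   // assumed: bits 8-15
        int d2 = (int) ((blockId >> 16) & 0xff);  // assumed: bits 16-23
        // "subdir" is an assumed naming prefix for illustration only.
        return new File(root, "subdir" + d1 + File.separator + "subdir" + d2);
    }

    public static void main(String[] args) {
        File dir = idToBlockDir(new File("finalized"), 0x123456L);
        System.out.println(dir.getPath());
        // A caller that needs the directory on disk creates it explicitly:
        // if (!dir.mkdirs() && !dir.isDirectory()) { /* handle the failure */ }
    }
}
```

Keeping directory creation at the call site means the error handling sits next to the filesystem operation, instead of being hidden in a constructor.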
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018261#comment-14018261 ] James Thomas commented on HDFS-6482: Thanks for the review, Colin. Addressed your feedback and implemented your layout suggestion. We should definitely think more about the implications of the layout, but I think your idea is strictly better than hashing. The layout does not apply to the rbw directory. Currently, the subdir-split layout is only used for finalized replicas, and a basic flat layout is used for rbw. Doesn't seem like we should change that.
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017246#comment-14017246 ] Colin Patrick McCabe commented on HDFS-6482:

{code}
+  public static long getBlockIdFromBlockOrMetaFile(String blockOrMetaFile) {
+    long metaTry = getBlockId(blockOrMetaFile);
{code}

Rather than have a new function, how about fixing {{getBlockId}} to work on either metadata or data files? It would require a new regular expression. However, both meta and block files begin with the block ID and then follow with a non-numeric character (or the end of the file name) so it shouldn't be too bad to write the regex for that.

{code}
BLOCKID_BASED_LAYOUT(-55,
    "The block ID of a block uniquely determines its position in the " +
    "directory structure, obviating the need to keep per-block " +
    "directory information in memory.");
{code}

It might be better just to write "The block ID of a block uniquely determines its position in the directory structure." The rest of the descriptions are pretty short.

{code}
+// If we are upgrading from a version older than the one where we introduced
+// block ID-based layout AND we're working with the finalized directory,
+// we'll need to upgrade from the old flat layout to the block ID-based one
+if (oldLV > LayoutVersion.Feature.BLOCKID_BASED_LAYOUT.getInfo().
+    getLayoutVersion() && to.getName().equals(STORAGE_DIR_FINALIZED)) {
+  upgradeToIdBasedLayout = true;
{code}

This new layout applies to the rbw directory as well, right?

{code}
File getDir() {
  try {
    return new IdBasedBlockDirectory(baseDir).getDirectory(getBlockId());
  } catch (IOException ioe) {
    return null; // won't happen since directory for this block already exists
  }
}
{code}

It seems like it would be better to just have a static method or something in {{IdBasedBlockDirectory}} that returned a File object. This "can't happen" code is scary, especially when we're dealing with filesystem operations like mkdir. Remember that a File object can exist, even though the corresponding file on disk does not. The object just contains a path, basically. So let's just return that File from somewhere and skip the mkdir.

{code}
import org.apache.hadoop.hdfs.server.datanode.*;
{code}

We generally don't use wildcard includes... I think maybe your editor did this automatically. IntelliJ did that to me once :) There's a setting on IntelliJ to turn that off.

{code}
// directory store Finalized replica
private final IdBasedBlockDirectory finalizedDir;
{code}

While we're moving the comment, let's make it grammatical.

{code}
this.finalizedDir = new IdBasedBlockDirectory(finalizedDir);
{code}

It seems kind of tricky to have two variables with the same name. I would say rename one or the other, or don't bother with a local variable for finalizedDir at all (nested new statements).

{code}
static private long hashBlockId(long n) {
  return (n + 378734493671000L) * 9223372036854775783L;
}

public File getDirectory(long blockId) throws IOException {
  long h = hashBlockId(blockId);
  int d1 = (int)((h >> 56) & 0xff);
  int d2 = (int)((h >> 48) & 0xff);
{code}

So you're creating a path which looks like: a/b/c

How about taking bits 8-16 for a, bits 16-24 for b, and the rest for c? (Notice that the lowest bits are part of c.) This has some nice effects. In combination with our sequential block allocation strategy, it means that the first 256 files all go in the same directory, avoiding the need to make 2 directories per file. The next 256 go in a different distinct directory, and so on. The thing to keep in mind is that we don't really want each block in its own directory... we just want to avoid overloading directories.

We should eschew hashing so that we never need to worry about collisions. With the scheme I outlined, we can go up to approximately a (power of two) billion blocks (2**30) without ever exceeding 16384 files per directory. At a billion blocks per Datanode, we have bigger problems than directory structure, of course :)
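The suggestion earlier in this comment, making {{getBlockId}} accept either a block file or a meta file name, could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name {{BlockIdParseSketch}} is hypothetical, block files are assumed to be named {{blk_<id>}}, and meta files are assumed to be named {{blk_<id>_<genstamp>.meta}}. The regular expression actually used in the codebase may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one parser for both block and meta file names. */
final class BlockIdParseSketch {
    // Assumed file name shapes: "blk_<id>" and "blk_<id>_<genstamp>.meta".
    // The ID may be negative, since older block IDs were randomly generated.
    private static final Pattern BLOCK_OR_META =
        Pattern.compile("^blk_(-?\\d+)(?:_\\d+\\.meta)?$");

    private BlockIdParseSketch() {}

    /** Returns the block ID encoded in a block or meta file name. */
    static long getBlockId(String fileName) {
        Matcher m = BLOCK_OR_META.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Not a block or meta file name: " + fileName);
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(getBlockId("blk_1073741825"));
        System.out.println(getBlockId("blk_1073741825_1001.meta"));
    }
}
```

Because both file types start with the same prefix and ID, a single optional group for the generation stamp suffix is enough, and no separate meta-file code path is needed.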