[jira] [Comment Edited] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572328#comment-16572328 ]

Virajith Jalaparti edited comment on HDFS-13088 at 8/7/18 9:00 PM:
---
Thanks for the feedback [~elgoiri] and [~ehiggs]. [^HDFS-13088.002.patch] is an alternate approach to implementing this. It adds a new parameter, {{dfs.provided.overreplication.factor}}, which specifies how many extra replicas are allowed for blocks that are PROVIDED. This is a single value for all blocks/files in the system, and it is ephemeral (not necessarily retained across Namenode restarts unless the config value remains the same). However, this approach requires no changes to {{FileSystem}} or {{INodeFile}}, and is much less intrusive. The main change to existing code is where excess replicas are checked in {{BlockManager#shouldProcessExtraRedundancy}}: for PROVIDED blocks, the allowed number of replicas is the block's replication factor plus the value of {{dfs.provided.overreplication.factor}}. For blocks that are not PROVIDED, and for EC blocks, the earlier semantics are retained. I still need to add tests for this, but I am posting the patch to get early feedback.

> Allow HDFS files/blocks to be over-replicated.
> --
>
> Key: HDFS-13088
> URL: https://issues.apache.org/jira/browse/HDFS-13088
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Virajith Jalaparti
> Assignee: Virajith Jalaparti
> Priority: Major
> Attachments: HDFS-13088.001.patch, HDFS-13088.002.patch
>
> This JIRA is to add a per-file "over-replication" factor to HDFS. As mentioned in HDFS-13069, the over-replication factor is the number of excess replicas that are allowed to exist for a file or block. This is beneficial if the application deems additional replicas of a file necessary. In the case of HDFS-13069, it would allow copies of data in PROVIDED storage to be cached locally in HDFS in a read-through manner.
> The Namenode will not proactively meet the over-replication, i.e., it does not schedule replications if the number of replicas for a block is less than (replication factor + over-replication factor), as long as it is at least the replication factor of the file.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
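The excess-redundancy check described in the comment can be sketched as follows. This is a minimal illustration, not the actual HDFS-13088 patch: the class name, the default value of 0, and the boolean flags are assumptions made for the sake of a self-contained example; only the config key and the "replication + over-replication for PROVIDED blocks" rule come from the comment.

```java
// Hypothetical sketch of the over-replication check described in the
// comment. The class and default value are assumptions; the config key and
// the PROVIDED-block rule are taken from the patch description.
public class ProvidedOverReplicationCheck {

    // Config key from the comment; the default of 0 is an assumption.
    static final String OVERREPLICATION_FACTOR_KEY =
            "dfs.provided.overreplication.factor";
    static final int OVERREPLICATION_FACTOR_DEFAULT = 0;

    private final int overReplicationFactor;

    public ProvidedOverReplicationCheck(int configuredFactor) {
        this.overReplicationFactor = configuredFactor;
    }

    /**
     * Number of replicas beyond which a block counts as over-redundant.
     * For PROVIDED (non-EC) blocks, the configured over-replication factor
     * is added on top of the file's replication factor; all other blocks
     * keep the earlier semantics.
     */
    int allowedReplicas(short replicationFactor, boolean isProvided,
                        boolean isErasureCoded) {
        if (isProvided && !isErasureCoded) {
            return replicationFactor + overReplicationFactor;
        }
        return replicationFactor;
    }

    /** Mirrors the decision made in BlockManager#shouldProcessExtraRedundancy. */
    boolean shouldProcessExtraRedundancy(int numReplicas,
            short replicationFactor, boolean isProvided, boolean isEc) {
        return numReplicas > allowedReplicas(replicationFactor, isProvided, isEc);
    }
}
```

With a factor of 2, a PROVIDED block with replication 3 may hold up to 5 replicas before the Namenode treats it as over-redundant, while a non-PROVIDED block is still capped at its replication factor.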
[jira] [Comment Edited] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399706#comment-16399706 ]

Virajith Jalaparti edited comment on HDFS-13088 at 3/15/18 1:11 AM:
---
Posting an initial patch to get feedback on the approach. The key changes are:
1) Add new {{setReplication(String, short, short)}} and {{getOverReplication}} calls to {{FileSystem}}.
2) Change {{InodeFile#HeaderFormat}} so that, of the 11 bits that were reserved for the replication factor, 3 bits are used for over-replication. This implies that the maximum allowed replication becomes 2^8 - 1 instead of 2^11 - 1.
3) Change the {{setrep}} command (and ClientNamenodeProtocol) to allow setting the over-replication factor on a file.
The idea behind these changes is that over-replication is a "new kind" of replication factor, so I modified the existing ways of setting the replication factor on a file to include over-replication.
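The header-layout change in step 2 can be sketched as a simple bit-packing scheme. The 8/3 field widths and the resulting 2^8 - 1 replication cap come from the comment; the packing helper itself, its class name, and its accessors are assumptions for illustration, not the actual {{InodeFile#HeaderFormat}} code.

```java
// Hypothetical sketch of the header change described in the comment: of the
// 11 bits previously reserved for the replication factor, 3 are carved out
// for over-replication, capping replication at 2^8 - 1 = 255. Field widths
// are from the comment; this packing helper is an assumption.
public class ReplicationHeaderLayout {
    static final int REPLICATION_BITS = 8;       // was 11
    static final int OVER_REPLICATION_BITS = 3;  // newly carved out

    static final int MAX_REPLICATION = (1 << REPLICATION_BITS) - 1;           // 255
    static final int MAX_OVER_REPLICATION = (1 << OVER_REPLICATION_BITS) - 1; // 7

    /** Packs both factors into the 11-bit region of the inode header. */
    static int pack(short replication, short overReplication) {
        if (replication < 0 || replication > MAX_REPLICATION
                || overReplication < 0 || overReplication > MAX_OVER_REPLICATION) {
            throw new IllegalArgumentException("replication factor out of range");
        }
        return (overReplication << REPLICATION_BITS) | replication;
    }

    static short getReplication(int packed) {
        return (short) (packed & MAX_REPLICATION);
    }

    static short getOverReplication(int packed) {
        return (short) ((packed >>> REPLICATION_BITS) & MAX_OVER_REPLICATION);
    }
}
```

Reusing the existing 11-bit region keeps the inode header the same overall size, which is why the maximum replication drops to 255 rather than the header growing.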