[jira] [Comment Edited] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-08-07 Thread Virajith Jalaparti (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572328#comment-16572328
 ] 

Virajith Jalaparti edited comment on HDFS-13088 at 8/7/18 9:00 PM:
---

Thanks for the feedback [~elgoiri] and [~ehiggs].

[^HDFS-13088.002.patch] is an alternate approach to implementing this: it adds 
a new parameter, {{dfs.provided.overreplication.factor}}, which specifies how 
many extra replicas are allowed for blocks that are PROVIDED. This is a single 
value for all blocks/files in the system and is ephemeral (not necessarily 
retained across Namenode restarts unless the config value remains the same). 
However, it requires no changes to {{FileSystem}} or {{INodeFile}} and is much 
less intrusive.
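
For reference, here is a minimal hdfs-site.xml snippet showing how the new 
setting could be supplied; the value of 2 is purely illustrative:

{code:xml}
<!-- Allow up to 2 extra replicas for PROVIDED blocks (illustrative value). -->
<property>
  <name>dfs.provided.overreplication.factor</name>
  <value>2</value>
</property>
{code}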

The main change to existing code is in how excess replicas are checked in 
{{BlockManager#shouldProcessExtraRedundancy}}: for PROVIDED blocks, the number 
of replicas allowed is the block's replication factor plus the value of 
{{dfs.provided.overreplication.factor}}. For blocks that are not PROVIDED, and 
for EC blocks, the earlier semantics are retained.

I still need to add tests for this, but I am posting the patch now to get it out early.
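
To make the intended check concrete, here is a minimal, self-contained sketch of 
the logic described above; the class and method names are illustrative and do 
not match the actual {{BlockManager}} code in the patch:

{code:java}
/** Illustrative sketch only; not the actual BlockManager#shouldProcessExtraRedundancy code. */
public class ProvidedOverReplicationCheck {

  // Stand-in for the value of dfs.provided.overreplication.factor.
  private final int providedOverReplication;

  public ProvidedOverReplicationCheck(int providedOverReplication) {
    this.providedOverReplication = providedOverReplication;
  }

  /** Maximum replicas tolerated before a block is treated as having extra redundancy. */
  int allowedReplicas(int replicationFactor, boolean isProvided, boolean isStriped) {
    if (isStriped || !isProvided) {
      // EC blocks and non-PROVIDED blocks keep the existing semantics.
      return replicationFactor;
    }
    // PROVIDED blocks may keep extra replicas up to the configured factor.
    return replicationFactor + providedOverReplication;
  }

  /** True if the NameNode should process (i.e., remove) excess replicas for this block. */
  boolean shouldProcessExtraRedundancy(int liveReplicas, int replicationFactor,
      boolean isProvided, boolean isStriped) {
    return liveReplicas > allowedReplicas(replicationFactor, isProvided, isStriped);
  }
}
{code}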



> Allow HDFS files/blocks to be over-replicated.
> --
>
> Key: HDFS-13088
> URL: https://issues.apache.org/jira/browse/HDFS-13088
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
> Attachments: HDFS-13088.001.patch, HDFS-13088.002.patch
>
>
> This JIRA is to add a per-file "over-replication" factor to HDFS. As 
> mentioned in HDFS-13069, the over-replication factor is the number of excess 
> replicas that will be allowed to exist for a file or block. This is 
> beneficial if the application deems that additional replicas of a file are 
> needed. In the case of HDFS-13069, it would allow copies of data in PROVIDED 
> storage to be cached locally in HDFS in a read-through manner.
> The Namenode will not proactively meet the over-replication, i.e., it does not 
> schedule replication work when the number of replicas of a block is less than 
> (replication factor + over-replication factor), as long as it is at least 
> the replication factor of the file.





[jira] [Comment Edited] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-03-14 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399706#comment-16399706
 ] 

Virajith Jalaparti edited comment on HDFS-13088 at 3/15/18 1:11 AM:


Posting an initial patch to get feedback on the approach. The key changes are:
1) Add new {{setReplication(string, short, short)}} and {{getOverReplication}} 
calls to {{FileSystem}}.
2) Change {{INodeFile#HeaderFormat}} so that, of the 11 bits that were reserved 
for the replication factor, 3 bits are used for over-replication. This implies 
that the maximum allowed replication becomes 2^8 - 1 instead of 2^11 - 1 (see 
the sketch after this comment).
3) Change the {{setrep}} command (and {{ClientNamenodeProtocol}}) to allow 
setting the over-replication factor on a file.

The idea behind the changes is that over-replication is a "new kind" of 
replication factor, so I modified the existing ways of setting the replication 
factor on a file to also cover over-replication.
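
As a rough illustration of the bit split in (2), here is a minimal, 
self-contained sketch; the mask/shift layout is illustrative and not the actual 
{{INodeFile.HeaderFormat}} encoding:

{code:java}
/** Illustrative sketch only; not the actual INodeFile.HeaderFormat layout. */
public class ReplicationBits {
  // The 11 bits formerly used for replication alone are split 8 + 3.
  static final int REPLICATION_BITS = 8;
  static final int OVER_REPLICATION_BITS = 3;
  static final int MAX_REPLICATION = (1 << REPLICATION_BITS) - 1;           // 2^8 - 1 = 255
  static final int MAX_OVER_REPLICATION = (1 << OVER_REPLICATION_BITS) - 1; // 2^3 - 1 = 7

  /** Pack replication and over-replication into the 11 bits previously used for replication. */
  static int pack(int replication, int overReplication) {
    if (replication > MAX_REPLICATION || overReplication > MAX_OVER_REPLICATION) {
      throw new IllegalArgumentException("value does not fit in the reserved bits");
    }
    return (overReplication << REPLICATION_BITS) | replication;
  }

  static int getReplication(int packed) {
    return packed & MAX_REPLICATION;
  }

  static int getOverReplication(int packed) {
    return (packed >>> REPLICATION_BITS) & MAX_OVER_REPLICATION;
  }
}
{code}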


