[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-04-12 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966458#comment-15966458
 ] 

Daryn Sharp commented on HDFS-11313:


Very clever design, but as I hinted before, I have strong concerns/objections 
to a sorted order requirement upon which this design appears predicated.  
Restricting the block data structures, by design, to some form of a tree does 
not scale well compared to other data structures. Ex. hashed or indexed.  In 
fact it completely eliminates them.

The answer may be much simpler.

Back when the ipc handlers processed the IBR and FBRs, yielding the fsn lock 
was not possible, but awhile back I offloaded the BRs into a queue for 
processing by a dedicated thread.  This reduced fsn lock contention (1 vs 
n-many waiters), and increased throughput via batching multiple BRs under the 
same write lock subject to a time limit.  I think this may be extended to yield 
the lock during FBR processing.  The serialized nature of BR processing removes 
the IBR races.

There's probably just a few races to consider with systems like the decom 
manager, repl monitor, etc.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: SegmentedBlockReports.pdf
>
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-04-12 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966230#comment-15966230
 ] 

Zhe Zhang commented on HDFS-11313:
--

Thanks [~redvine] for posting the design doc!

Linking to HDFS-9011 where [~jingzhao] has posted a related design and patch.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: SegmentedBlockReports.pdf
>
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-04-12 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965511#comment-15965511
 ] 

Vinitha Reddy Gankidi commented on HDFS-11313:
--

Attached the design doc. Please take a look. I would appreciate any feedback on 
the design. Once we finalize on it, I'll create subtasks for the 
implementation. 

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: SegmentedBlockReports.pdf
>
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-03-24 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941294#comment-15941294
 ] 

Vinitha Reddy Gankidi commented on HDFS-11313:
--

Assigning it to myself. Will attach a design doc soon.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-01-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832740#comment-15832740
 ] 

Konstantin Shvachko commented on HDFS-11313:


Hey [~redvine], we can leverage BR-leases introduced by HDFS-7923 as indicators 
to send a repeat request for block reports to a DN, which didn't send the BR 
until lease expired. We should also track BRs by lease IDs and reject outdated 
BRs from DNs. It would be good to write a document with more details. And cover 
potential race conditions as [~daryn] suggested.
[~daryn] these are all good questions, which we should answer in the design 
doc. I do not expect new race conditions, because segmented reports are also 
snapshots but of a range of blocks rather than a full set of DN blocks. But we 
should definitely look deep into it. Good point about legacy blockIDs, which 
used to be random and therefore could be anywhere. So NN should partition the 
entire space of ids from minID to maxID.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-01-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821924#comment-15821924
 ] 

Daryn Sharp commented on HDFS-11313:


Interesting idea.  Need to think through race conditions because the current 
naive design (snapshot in time), is easy to reconcile state in the NN.  Not 
saying I like it, just that we need to think hard about new races esp. with 
IBRs.

It must include provisions for negative block ids so it's not just the last 
segment that is open ended.  Not a contrived use case, we have many 2.x 
clusters with legacy negative block ids esp. archival clusters.

What would be the basic design?  Is it predicated on the NN sorting block ids?  
If yes, I have strong concerns I'll outline.  How are the segment ranges 
computed?  Fixed size?  How will very sparse block ranges be handled, esp. in 
the case of negative block ids?

What I've long wanted to do is invert the block report processing.  The NN 
sends BRs to the DN, and DN reconciles inconsistencies with IBRs.  Haven't 
thought through it beyond the concept, but I digress.



> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-01-10 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816933#comment-15816933
 ] 

Vinitha Reddy Gankidi commented on HDFS-11313:
--

[~shv] This idea seems promising. I would like to work on it. I wanted to note 
that HDFS-7923 is related in the sense that the blocks reports by the DN are 
sent only when the NN gives the signal. Even with this patch, the issue of 
processing large DN reports under a global namespace lock still remains.  

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11313) Segmented Block Reports

2017-01-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816841#comment-15816841
 ] 

Konstantin Shvachko commented on HDFS-11313:


Full block report (FBR) processing on the NameNode does four things:
# Update replica information reported by the DataNode for known blocks
# Add new replicas reported by the DataNode
# Instruct the DataNode to delete replicas, which belong to non-existing blocks
# Remove replicas, which NameNode assumed to be present on the DataNode, but 
which did not appear in the report

The main problem with current FBRs is that they are processed under global 
namesystem lock, and since the reports are big, other operation cannot proceed 
until the lock is released. On large clusters the current trend is to decrease 
FBR frequency, sending FBRs once in 6, 10, or even 12 hours. It would be 
beneficial to split FBRs into smaller even though more frequent RPC calls.

If a DataNode were to split its FBR into multiple RPCs arbitrarily, then 
NameNode wouldn't be able to distinguish between replicas which do not exist on 
the DataNode from those that have not been yet reported (see 4 above).
Therefore, the proposal is to introduce segmented block reports (SBR), where 
each report includes a segment of IDs. So the DataNode reports all its replicas 
in the given range of blockIDs, and if some block is not present in the report, 
the respective replica must be removed from the NameNode.
More details:
* NameNode allocates blockIDs sequentially. It should partition the set of 
allocated so far block IDs into reasonably sized segments. The last segment is 
open ended.
* BlockReportCommand is a new DatanodeCommand, which NameNode should send to a 
DataNode (in reply to a heartbeat) to order a block report within a specified 
segment.
* When DN receives a BlockReportCommand it forms SBR for the requested segment 
and sends it to NN. The report also includes the segment boundaries. This could 
be done per storage.
* NN processing of SBR is similar to FBR, but bounded to the reported segment.
* NN can eventually start optimizing to request SBRs when it is less busy.
* Periodic FBRs, can eventually be removed, but for now should remain for 
backward compatibility. That is if a DN does not receive any 
BlockReportCommands from NN, it should send FBR.

P.S. There is a lot of jiras discussing partial block reports since prehistoric 
times. I scanned through many, but found only one mentioning of a similar 
proposal. In HDFS-395 [~cutting] in [his 
comment|https://issues.apache.org/jira/browse/HDFS-395?focusedCommentId=12593583=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12593583]
 posted a link to an old discussion on the topic. Unfortunately the link is now 
stale.

> Segmented Block Reports
> ---
>
> Key: HDFS-11313
> URL: https://issues.apache.org/jira/browse/HDFS-11313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.2
>Reporter: Konstantin Shvachko
>
> Block reports from a single DataNode can be currently split into multiple 
> RPCs each reporting a single DataNode storage (disk). The reports are still 
> large since disks are getting bigger. Splitting blockReport RPCs into 
> multiple smaller calls would improve NameNode performance and overall HDFS 
> stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode 
> divide blockID space into segments and then ask DataNodes to report replicas 
> in a particular range of IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org