[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966458#comment-15966458 ] Daryn Sharp commented on HDFS-11313: Very clever design, but as I hinted before, I have strong concerns/objections to a sorted order requirement upon which this design appears predicated. Restricting the block data structures, by design, to some form of a tree does not scale well compared to other data structures. Ex. hashed or indexed. In fact it completely eliminates them. The answer may be much simpler. Back when the ipc handlers processed the IBR and FBRs, yielding the fsn lock was not possible, but awhile back I offloaded the BRs into a queue for processing by a dedicated thread. This reduced fsn lock contention (1 vs n-many waiters), and increased throughput via batching multiple BRs under the same write lock subject to a time limit. I think this may be extended to yield the lock during FBR processing. The serialized nature of BR processing removes the IBR races. There's probably just a few races to consider with systems like the decom manager, repl monitor, etc. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi > Attachments: SegmentedBlockReports.pdf > > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966230#comment-15966230 ] Zhe Zhang commented on HDFS-11313: -- Thanks [~redvine] for posting the design doc! Linking to HDFS-9011 where [~jingzhao] has posted a related design and patch. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi > Attachments: SegmentedBlockReports.pdf > > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965511#comment-15965511 ] Vinitha Reddy Gankidi commented on HDFS-11313: -- Attached the design doc. Please take a look. I would appreciate any feedback on the design. Once we finalize on it, I'll create subtasks for the implementation. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi > Attachments: SegmentedBlockReports.pdf > > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941294#comment-15941294 ] Vinitha Reddy Gankidi commented on HDFS-11313: -- Assigning it to myself. Will attach a design doc soon. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832740#comment-15832740 ] Konstantin Shvachko commented on HDFS-11313: Hey [~redvine], we can leverage BR-leases introduced by HDFS-7923 as indicators to send a repeat request for block reports to a DN, which didn't send the BR until lease expired. We should also track BRs by lease IDs and reject outdated BRs from DNs. It would be good to write a document with more details. And cover potential race conditions as [~daryn] suggested. [~daryn] these are all good questions, which we should answer in the design doc. I do not expect new race conditions, because segmented reports are also snapshots but of a range of blocks rather than a full set of DN blocks. But we should definitely look deep into it. Good point about legacy blockIDs, which used to be random and therefore could be anywhere. So NN should partition the entire space of ids from minID to maxID. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821924#comment-15821924 ] Daryn Sharp commented on HDFS-11313: Interesting idea. Need to think through race conditions because the current naive design (snapshot in time), is easy to reconcile state in the NN. Not saying I like it, just that we need to think hard about new races esp. with IBRs. It must include provisions for negative block ids so it's not just the last segment that is open ended. Not a contrived use case, we have many 2.x clusters with legacy negative block ids esp. archival clusters. What would be the basic design? Is it predicated on the NN sorting block ids? If yes, I have strong concerns I'll outline. How are the segment ranges computed? Fixed size? How will very sparse block ranges be handled, esp. in the case of negative block ids? What I've long wanted to do is invert the block report processing. The NN sends BRs to the DN, and DN reconciles inconsistencies with IBRs. Haven't thought through it beyond the concept, but I digress. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816933#comment-15816933 ] Vinitha Reddy Gankidi commented on HDFS-11313: -- [~shv] This idea seems promising. I would like to work on it. I wanted to note that HDFS-7923 is related in the sense that the blocks reports by the DN are sent only when the NN gives the signal. Even with this patch, the issue of processing large DN reports under a global namespace lock still remains. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11313) Segmented Block Reports
[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816841#comment-15816841 ] Konstantin Shvachko commented on HDFS-11313: Full block report (FBR) processing on the NameNode does four things: # Update replica information reported by the DataNode for known blocks # Add new replicas reported by the DataNode # Instruct the DataNode to delete replicas, which belong to non-existing blocks # Remove replicas, which NameNode assumed to be present on the DataNode, but which did not appear in the report The main problem with current FBRs is that they are processed under global namesystem lock, and since the reports are big, other operation cannot proceed until the lock is released. On large clusters the current trend is to decrease FBR frequency, sending FBRs once in 6, 10, or even 12 hours. It would be beneficial to split FBRs into smaller even though more frequent RPC calls. If a DataNode were to split its FBR into multiple RPCs arbitrarily, then NameNode wouldn't be able to distinguish between replicas which do not exist on the DataNode from those that have not been yet reported (see 4 above). Therefore, the proposal is to introduce segmented block reports (SBR), where each report includes a segment of IDs. So the DataNode reports all its replicas in the given range of blockIDs, and if some block is not present in the report, the respective replica must be removed from the NameNode. More details: * NameNode allocates blockIDs sequentially. It should partition the set of allocated so far block IDs into reasonably sized segments. The last segment is open ended. * BlockReportCommand is a new DatanodeCommand, which NameNode should send to a DataNode (in reply to a heartbeat) to order a block report within a specified segment. * When DN receives a BlockReportCommand it forms SBR for the requested segment and sends it to NN. The report also includes the segment boundaries. This could be done per storage. * NN processing of SBR is similar to FBR, but bounded to the reported segment. * NN can eventually start optimizing to request SBRs when it is less busy. * Periodic FBRs, can eventually be removed, but for now should remain for backward compatibility. That is if a DN does not receive any BlockReportCommands from NN, it should send FBR. P.S. There is a lot of jiras discussing partial block reports since prehistoric times. I scanned through many, but found only one mentioning of a similar proposal. In HDFS-395 [~cutting] in [his comment|https://issues.apache.org/jira/browse/HDFS-395?focusedCommentId=12593583=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12593583] posted a link to an old discussion on the topic. Unfortunately the link is now stale. > Segmented Block Reports > --- > > Key: HDFS-11313 > URL: https://issues.apache.org/jira/browse/HDFS-11313 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.6.2 >Reporter: Konstantin Shvachko > > Block reports from a single DataNode can be currently split into multiple > RPCs each reporting a single DataNode storage (disk). The reports are still > large since disks are getting bigger. Splitting blockReport RPCs into > multiple smaller calls would improve NameNode performance and overall HDFS > stability. > This was discussed in multiple jiras. Here the approach is to let NameNode > divide blockID space into segments and then ask DataNodes to report replicas > in a particular range of IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org