[ https://issues.apache.org/jira/browse/KUDU-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367479#comment-16367479 ]

Grant Henke commented on KUDU-616:
----------------------------------

[~andrew.wong] was this work handled in some other jiras?

> Mitigate tablet damage when disks are lost
> ------------------------------------------
>
>                 Key: KUDU-616
>                 URL: https://issues.apache.org/jira/browse/KUDU-616
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: M5
>            Reporter: Adar Dembo
>            Assignee: Andrew Wong
>            Priority: Major
>
> Disk loss is an unfortunate fact of life, and Kudu should provide mechanisms
> for mitigating disk loss.
> # Make it possible to isolate specific tablets to some subset of the
> machine's disks, so that if one disk dies it doesn't take out all the tablets
> with it. This is more complicated than it looks:
> ** We need a concrete way of describing disk groups. It can be per-node, or
> abstract enough that it makes sense across the entire cluster, or perhaps we
> aggregate information (e.g. ten machines have 5 disks and the other forty
> machines have 6 disks).
> ** This mechanism needs to be used for both data blocks and other bits of
> metadata (master blocks, superblocks, and other random files).
> ** Presumably it needs to be provided when a table is created (or a tablet is
> split), and it needs to be persisted as part of tablet metadata. It might be
> sufficient to express it in Kudu configuration (i.e. complex gflags) but
> since it can be associated to tablet metadata, it's hard to see how this
> would work.
> # When a disk fails, the server needs to handle it appropriately (mark it as
> failed, put affected tablets in a failed state, etc.).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
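
For illustration only, the two items in the quoted description can be sketched roughly as follows. This is not Kudu's implementation: the function names, the per-node list of disk paths, and the group size of 2 are all hypothetical, and the hash-based placement is just one simple way to pick a group. Each tablet is pinned to a small group of disks, and when a disk fails, only the tablets whose group contains that disk are reported as affected.

```python
import zlib


def assign_disk_group(tablet_id, disks, group_size):
    """Pick `group_size` consecutive disks for a tablet.

    The starting position comes from a stable hash of the tablet id, so the
    same tablet always maps to the same group (hypothetical placement rule).
    """
    if not 0 < group_size <= len(disks):
        raise ValueError("group_size must be between 1 and len(disks)")
    start = zlib.crc32(tablet_id.encode("utf-8")) % len(disks)
    return [disks[(start + i) % len(disks)] for i in range(group_size)]


def tablets_hit_by_failure(groups, failed_disk):
    """Return the ids of tablets whose disk group contains the failed disk."""
    return sorted(t for t, g in groups.items() if failed_disk in g)


# Hypothetical node with six data disks and twelve tablets.
disks = ["/data/1", "/data/2", "/data/3", "/data/4", "/data/5", "/data/6"]
tablets = ["tablet-%d" % i for i in range(12)]

# In a real system this mapping would be persisted with the tablet metadata.
groups = {t: assign_disk_group(t, disks, group_size=2) for t in tablets}

# Simulate losing one disk: only tablets using it need to be marked failed.
failed = tablets_hit_by_failure(groups, "/data/3")
print("%d of %d tablets affected" % (len(failed), len(tablets)))
```

With a group size equal to the number of disks, every tablet touches every disk and a single failure affects all of them. Smaller groups trade some I/O parallelism per tablet for a smaller blast radius.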