* Eric Blake (ebl...@redhat.com) wrote:
> On 09/02/2015 02:51 AM, Wen Congyang wrote:
> > If the child is not ready, read/write/getlength/flush will
> > return -errno. It is not critical error, and can be ignored:
> > 1. read/write:
> >    Just not report the error event.
>
> What happens if all the children report an error?  Or is the threshold
> at play here?

I think it's interesting because in the COLO case the intention isn't really
about a threshold (in the way you might use for RAID or mirroring), it's that
one of the stores is local (and not expected to error) and one is somewhere
over a network, so if it fails you don't want to stop the local VM working.

However, if it fails we do need to know about it; if any write to the
secondary stops then the fault-tolerance has failed (at least for that
drive); so we should do *something* - I'm not sure what though.

Dave

> For example, if you have a threshold of 3/5, then I'm assuming that if
> up to two children return an errno, then it is okay to ignore; but if
> three or more return an errno, you haven't met threshold, so the I/O
> must fail.
>
> Are you ignoring ALL errors (including things like EACCES), or just EIO
> errors?
>
> > 2. getlength:
> >    just ignore it. If all children's getlength return -errno,
> >    and be ignored, return -EIO.
> > 3. flush:
> >    Just ignore it. If all children's getlength return -errno,
>
> s/getlength/flush/
>
> >    and be ignored, return 0.
>
> Yuck - claiming success when all of the children fail feels dangerous.
>
> >
> > Usage: children.x.ignore-errors=true
> >
> > Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
> > Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
> > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> > Cc: Alberto Garcia <be...@igalia.com>
> > ---
> >  block/quorum.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++----
>
> Interface review only:
>
> > +++ b/qapi/block-core.json
> > @@ -1411,6 +1411,8 @@
> >  # @allow-write-backing-file: #optional whether the backing file is opened in
> >  #                            read-write mode. It is only for backing file
> >  #                            (Since 2.5 default: false)
> > +# @ignore-errors: #options whether the child's I/O error should be ignored.
>
> s/options/optional/
> s/error/errors/
>
> > +#                 it is only for quorum's child.(Since 2.5 default: false)
>
> Space after '.' in English sentences.
>
> The fact that you are documenting that this option can only be specified
> for quorum children makes me wonder if it belongs better as an option in
> BlockdevOptionsQuorum rather than BlockdevOptionsBase.
>
> Semantically, it sounds like you are trying to allow for a per-child
> decision of whether this particular child's errors matter to the overall
> quorum.  So, if we have a 3/5 quorum, we can decide that for children A,
> B, C, and D, errors cannot be ignored, but for child E, errors are not a
> problem.
>
> As written, you are tying the semantics to each child BDS, and requiring
> special code to double-check that the property is only ever set if the
> BDS is used as the child of a quorum.  Furthermore, if the property is
> set, you are then changing what the child does in response to various
> operations.
>
> What if you instead create a list property in the quorum parent?  Maybe
> along the lines of:
>
> # @child-errors-okay: #optional an array of named-node children where
> errors will be ignored (Since 2.5, default empty)
>
> { 'struct': 'BlockdevOptionsQuorum',
>   'data': { '*blkverify': 'bool',
>             'children': [ 'BlockdevRef' ],
>             'vote-threshold': 'int',
>             '*rewrite-corrupted': 'bool',
>             '*read-pattern': 'QuorumReadPattern',
>             '*child-errors-okay': ['str'] } }
>
> The above example of a 3/5 quorum, where only child E can ignore errors,
> would then be represented as:
>
> { "children": [ "A", "B", "C", "D", "E" ], 'vote-threshold':3,
>   'child-errors-okay': [ "E" ] }
>
> The code to ignore the errors is then done in the quorum itself (the BDS
> for E does not have to care about a special ignore-errors property, but
> just always returns the error as usual; and then the quorum is deciding
> how to handle the error), and you are not polluting the BDS state for
> something that is quorum-specific, because it is now the quorum itself
> that tracks the special casing.
>
> Finally, why can't hot-plug/unplug of quorum members work?  If you are
> going to always ignore errors from a particular child, then why is that
> child even part of the quorum?  Isn't a better design to just not add
> the child to the quorum until it is ready and won't be reporting errors?
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
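
The quorum-side handling discussed in this thread can be modelled in a few
lines of Python. This is only an illustrative sketch, not QEMU code: the
function name complete_write, the encoding of per-child results (0 or a
negative errno), and the choice that a child listed as "errors okay" merely
has its error event suppressed while still not counting as a success towards
the vote threshold are all assumptions made for the example.

```python
import errno


def complete_write(results, vote_threshold, errors_okay=frozenset()):
    """Model how a quorum parent might complete one write.

    results        -- dict: child node name -> 0 on success, negative errno on failure
    vote_threshold -- number of successful children required for the I/O to succeed
    errors_okay    -- hypothetical 'child-errors-okay' set kept by the quorum itself

    Returns (status, reported): status is 0 or -EIO, and reported holds the
    child errors that would still raise an error event.
    """
    successes = [name for name, ret in results.items() if ret == 0]

    # Errors from children in errors_okay are swallowed by the quorum;
    # every other failing child is still reported.
    reported = {
        name: ret
        for name, ret in results.items()
        if ret < 0 and name not in errors_okay
    }

    # Assumption: the threshold is checked against real successes only, so
    # ignoring a child's errors never turns a failed quorum into a success.
    status = 0 if len(successes) >= vote_threshold else -errno.EIO
    return status, reported
```

With the 3/5 example above, a failure of E alone leaves the write successful
and reports nothing, while failures of C, D and E together miss the threshold
and the write fails with -EIO; only the errors from C and D are reported.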