Springfield is a collection of projects unifying multiple levels of
the storage stack and providing a general API for automation, health
and status monitoring, as well as sane and easy configuration across
multiple levels of the storage stack. It is a scalable solution,
working from a single node
In order to catch I/O errors of the backing device, a separate bi_end_io
callback is required. Then a per-backing-device counter can record the
number of I/O errors and retire the backing device if the counter reaches a
per-backing-device I/O error limit.
This patch adds backing_request_endio() to bcache
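A minimal sketch of what such a separate endio callback could look like; the
context struct, field names and retire helper below are assumptions for
illustration, not the actual bcache implementation:

#include <linux/bio.h>
#include <linux/slab.h>
#include <linux/atomic.h>

/* Hypothetical per-backing-device error accounting. */
struct backing_dev_stats {
	atomic_t	io_errors;	/* I/O errors seen so far */
	unsigned int	io_error_limit;	/* per backing device retire threshold */
};

/* Hypothetical per-request context for a bio cloned to the backing device. */
struct backing_io {
	struct bio			*orig_bio;	/* upper layer bio to complete */
	struct backing_dev_stats	*stats;
};

static void retire_backing_device(struct backing_dev_stats *stats)
{
	/* stop the bcache device, mark the backing device as failed ... */
}

/* Separate bi_end_io for requests submitted to the backing device, so
 * backing device errors are counted independently of cache device errors. */
static void backing_request_endio(struct bio *bio)
{
	struct backing_io *io = bio->bi_private;

	if (bio->bi_status) {
		if (atomic_inc_return(&io->stats->io_errors) >=
		    io->stats->io_error_limit)
			retire_backing_device(io->stats);

		/* propagate the error to the original request */
		io->orig_bio->bi_status = bio->bi_status;
	}

	bio_endio(io->orig_bio);	/* complete the upper layer bio */
	bio_put(bio);			/* drop the clone sent to the backing device */
	kfree(io);
}

A cloned bio submitted to the backing device would then set bi_end_io to
backing_request_endio and bi_private to the backing_io context before
submit_bio().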
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
cached_dev_get() is called when creating dc->writeback_thread, and
cached_dev_put() is called when exiting dc->writeback_thread. This
modification works well unless people detach the bcache device manually by
'echo 1 >
Kernel thread routine bch_writeback_thread() has the following code block,
447 down_write(&dc->writeback_lock);
448~450 if (check conditions) {
451 up_write(&dc->writeback_lock);
452 set_current_state(TASK_INTERRUPTIBLE);
453
454 if
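For context, lines 447~454 follow the usual kernel thread sleep/stop idiom; a
generic sketch of that idiom (not the actual bcache code) is:

#include <linux/kthread.h>
#include <linux/sched.h>

/* Generic kernel thread loop: set the task state before checking the
 * stop condition, so a wake-up arriving between the check and schedule()
 * is not lost. */
static int example_thread(void *arg)
{
	while (true) {
		set_current_state(TASK_INTERRUPTIBLE);

		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			return 0;
		}

		schedule();		/* sleep until woken up or stopped */

		/* ... do the actual work ... */
	}
}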
From: Tang Junhui
When we run I/O on a detached device and run iostat to show the I/O status,
it will normally show like below (some fields omitted):
Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd ... 15.89 0.53 1.82 0.20 2.23
struct delayed_work writeback_rate_update in struct cached_dev is a delayed
worker which calls update_writeback_rate() periodically (the interval is
defined by dc->writeback_rate_update_seconds).
When a metadata I/O error happens on the cache device, the bcache error handling
routine
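A rough sketch of how such a periodic delayed worker is typically set up and
rearmed; the struct below is a simplified stand-in, and only the member and
function names mentioned above are taken from the description:

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

/* Simplified stand-in for the relevant members of struct cached_dev. */
struct cached_dev_sketch {
	struct delayed_work	writeback_rate_update;
	unsigned int		writeback_rate_update_seconds;
};

/* Periodic worker: recompute the writeback rate, then rearm itself so it
 * runs again after writeback_rate_update_seconds. */
static void update_writeback_rate(struct work_struct *work)
{
	struct cached_dev_sketch *dc =
		container_of(to_delayed_work(work),
			     struct cached_dev_sketch, writeback_rate_update);

	/* ... recompute the writeback rate here ... */

	schedule_delayed_work(&dc->writeback_rate_update,
			      dc->writeback_rate_update_seconds * HZ);
}

/* Setup at registration time (illustrative). */
static void start_writeback_rate_update(struct cached_dev_sketch *dc)
{
	INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate);
	schedule_delayed_work(&dc->writeback_rate_update,
			      dc->writeback_rate_update_seconds * HZ);
}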
On Sat, 2018-01-27 at 09:37 +0100, Jan Tulak wrote:
> Springfield is a collection of projects unifying multiple levels of
> the storage stack and providing a general API for automation, health
> and status monitoring, as well as sane and easy configuration across
> multiple levels of the storage
Currently bcache does not handle backing device failure: if the backing
device is offline and disconnected from the system, its bcache device can still
be accessible. If the bcache device is in writeback mode, I/O requests can
even succeed if the requests hit the cache device. That is to say, when and
how
On Tue, Jan 23 2018 at 10:31pm -0500,
Ming Lei wrote:
> On Tue, Jan 23, 2018 at 04:57:34PM +, Bart Van Assche wrote:
> > On Wed, 2018-01-24 at 00:37 +0800, Ming Lei wrote:
> > > On Tue, Jan 23, 2018 at 04:24:20PM +, Bart Van Assche wrote:
> > > > My opinion about
On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote:
> Ming let me know that he successfully tested this V3 patch using both
> your test (fio to both mpath and underlying path) and Bart's (02-mq with
> can_queue in guest).
>
> Would be great if you'd review and verify this fix works for you
dc->writeback_rate_update_seconds can be set via sysfs and its value can
be set to [1, ULONG_MAX]. It does not make sense to set such a large
value; 60 seconds is a long enough upper limit, considering the default 5
seconds has worked well for a long time.
Because dc->writeback_rate_update is a special delayed
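One way to enforce such an upper bound in the sysfs store path is sketched
below; the 60 second cap comes from the description above, while the macro and
helper names are illustrative assumptions:

#include <linux/kernel.h>

/* Assumed upper bound from the description above. */
#define WRITEBACK_RATE_UPDATE_SECS_MAX	60U

/* Illustrative sysfs store helper: parse the user supplied value and
 * clamp it to [1, WRITEBACK_RATE_UPDATE_SECS_MAX] instead of accepting
 * anything up to ULONG_MAX. */
static int store_writeback_rate_update_seconds(const char *buf,
						unsigned int *seconds)
{
	unsigned int v;
	int ret;

	ret = kstrtouint(buf, 10, &v);
	if (ret)
		return ret;

	*seconds = clamp_t(unsigned int, v, 1, WRITEBACK_RATE_UPDATE_SECS_MAX);
	return 0;
}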
Hi maintainers and folks,
This patch set tries to improve bcache device failure handling, including
cache device and backing device failures.
The basic idea to handle a failed cache device is,
- Unregister cache set
- Detach all backing devices which are attached to this cache set
- Stop all the
When bcache metadata I/O fails, bcache will call bch_cache_set_error()
to retire the whole cache set. The expected behavior when retiring a cache
set is to unregister the cache set, unregister all backing devices
attached to this cache set, then remove sysfs entries of the cache set
and all
Current bcache failure handling code will stop all attached bcache devices
when the cache set is broken or disconnected. This is the desired behavior for
most enterprise or cloud use cases, but maybe not for low-end
configurations. Nix points out that users may still want to
access
On Sat, Jan 27, 2018 at 10:12:43PM +, Bart Van Assche wrote:
> On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote:
> > Ming let me know that he successfully tested this V3 patch using both
> > your test (fio to both mpath and underlying path) and Bart's (02-mq with
> > can_queue in guest).
On Sat, Jan 27 2018 at 5:12pm -0500,
Bart Van Assche wrote:
> On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote:
> > Ming let me know that he successfully tested this V3 patch using both
> > your test (fio to both mpath and underlying path) and Bart's (02-mq with
> >
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
cached_dev_get() is called when creating dc->writeback_thread, and
cached_dev_put() is called when exiting dc->writeback_thread. This
modification works well unless people detach the bcache device manually by
'echo 1 >
Struct cache uses io_errors for two purposes,
- Error decay: when the cache set's error_decay is set, io_errors is used to
generate a small delay when an I/O error happens.
- I/O error counter: in order to generate a big enough value for error
decay, the I/O error counter value is stored by left
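The left shifted counter idea can be sketched roughly as follows; the shift
width and names are assumptions for illustration, not necessarily the exact
bcache constants:

#include <linux/atomic.h>
#include <linux/printk.h>

/* Store the error count left shifted, so the same counter can both feed
 * error decay (which needs a large value to decay gradually) and act as
 * a plain error counter. The shift width is an assumption. */
#define IO_ERROR_SHIFT_SKETCH	20

static void count_io_error_sketch(atomic_t *io_errors,
				  unsigned int error_limit)
{
	unsigned int errs = atomic_add_return(1 << IO_ERROR_SHIFT_SKETCH,
					      io_errors);

	/* Compare in units of whole errors: shift back down before
	 * checking against the per device error limit. */
	if ((errs >> IO_ERROR_SHIFT_SKETCH) >= error_limit)
		pr_err("too many I/O errors, retiring device\n");
}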
Hello Coly,
Saturday, January 27, 2018, 5:24:06 PM, you wrote:
> Current bcache failure handling code will stop all attached bcache devices
> when the cache set is broken or disconnected. This is desired behavior for
> most of enterprise or cloud use cases, but maybe not for low end
>
On Sat, Jan 27 2018 at 7:54pm -0500,
Bart Van Assche wrote:
> On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote:
> > Your contributions do _not_ make up for your inability to work well with
> > others. Tiresome doesn't begin to describe these interactions.
> >
> >
On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote:
> Your contributions do _not_ make up for your inability to work well with
> others. Tiresome doesn't begin to describe these interactions.
>
> Life is too short to continue enduring your bullshit.
>
> But do let us know when you have
On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote:
> You cannot even be forthcoming about the technical merit of a change you
> authored (commit 6077c2d70) that I'm left to clean up in the face of
> performance bottlenecks it unwittingly introduced? If you were being
> honest: you'd grant
When there are too many I/O errors on the cache device, the current bcache
code will retire the whole cache set and detach all bcache devices. But the
detached bcache devices are not stopped, which is problematic when bcache
is in writeback mode.
If the retired cache set has dirty data of backing
If a bcache device is configured in writeback mode, the current code does not
handle write I/O errors on backing devices properly.
In writeback mode, a write request is written to the cache device and
later flushed to the backing device. If I/O fails when writing from the
cache device to the backing
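A sketch of how the flush path could account for such failures, complementing
the backing_request_endio() sketch earlier; the structure and names here are
assumptions for illustration:

#include <linux/kernel.h>
#include <linux/bio.h>
#include <linux/atomic.h>

/* Illustrative context for one dirty block being flushed from the cache
 * device to the backing device. */
struct dirty_io_sketch {
	struct bio	bio;		 /* bio submitted to the backing device */
	atomic_t	*backing_errors; /* per backing device error counter */
};

/* Endio for the writeback flush path: if writing dirty data out to the
 * backing device fails, account the failure against the backing device
 * instead of silently treating the block as clean. */
static void writeback_flush_endio(struct bio *bio)
{
	struct dirty_io_sketch *io =
		container_of(bio, struct dirty_io_sketch, bio);

	if (bio->bi_status)
		atomic_inc(io->backing_errors);

	/* ... normal completion: update dirty accounting, release keys,
	 * free io ... */
}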
On 28/01/2018 11:33 AM, Pavel Goran wrote:
> Hello Coly,
>
> Saturday, January 27, 2018, 5:24:06 PM, you wrote:
>
>> Current bcache failure handling code will stop all attached bcache devices
>> when the cache set is broken or disconnected. This is desired behavior for
>> most of enterprise or
On Sat, Jan 27 2018 at 10:00pm -0500,
Bart Van Assche wrote:
> On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote:
> > You cannot even be forthcoming about the technical merit of a change you
> > authored (commit 6077c2d70) that I'm left to clean up in the face of
> >
Hello Coly,
Sunday, January 28, 2018, 7:32:09 AM, you wrote:
>>> Current bcache failure handling code will stop all attached bcache devices
>>> when the cache set is broken or disconnected. This is desired behavior for
>>> most of enterprise or cloud use cases, but maybe not for low end
>>>