Hi Tejun,
On 18/2/28 02:33, Tejun Heo wrote:
> Hello, Joseph.
>
> On Sat, Feb 24, 2018 at 09:45:49AM +0800, Joseph Qi wrote:
>>> IIRC, as long as the blkcg and the device are there, the blkgs aren't
>>> gonna be destroyed. So, if you have a ref to the blkcg through
>>> tryget, the blkg
On 2018/2/27 22:57, Bart Van Assche wrote:
On Tue, 2018-02-27 at 15:09 +0800, chenxiang (M) wrote:
On 2018/2/26 23:25, Bart Van Assche wrote:
On Mon, 2018-02-26 at 17:37 +0800, chenxiang (M) wrote:
When I ran a test on kernel 4.16-rc1, I found an issue: running IO on a SATA disk,
then disable the disk
On 28/02/2018 3:27 AM, Michael Lyle wrote:
> On 02/27/2018 10:33 AM, Michael Lyle wrote:
>> On 02/27/2018 10:29 AM, Michael Lyle wrote:
>>> Hi Coly Li--
>>>
>>> On 02/27/2018 08:55 AM, Coly Li wrote:
Hi maintainers and folks,
This patch set tries to improve bcache device failure
On 28/02/2018 2:20 AM, Michael Lyle wrote:
> Hi Coly Li--
>
> Just a couple of questions.
>
> On 02/27/2018 08:55 AM, Coly Li wrote:
>> +#define BACKING_DEV_OFFLINE_TIMEOUT 5
>
Hi Mike,
> I think you wanted this to be 30 (per commit message)-- was this turned
> down for testing or deliberate?
Hi Damien,
On Wed, Feb 28, 2018 at 02:21:49AM +, Damien Le Moal wrote:
> Ming,
>
> On 2018/02/27 17:35, Ming Lei wrote:
> > On Tue, Feb 27, 2018 at 04:28:30PM -0800, Bart Van Assche wrote:
> >> If a request times out the .completed_request() method is not called
> >
> > If BLK_EH_HANDLED is
Ming,
On 2018/02/27 17:35, Ming Lei wrote:
> On Tue, Feb 27, 2018 at 04:28:30PM -0800, Bart Van Assche wrote:
>> If a request times out the .completed_request() method is not called
>
> If BLK_EH_HANDLED is returned from .timeout(), __blk_mq_complete_request()
> should have called
From: Tang Junhui
The kernel crashed when registering a duplicate cache device; the call
trace is below:
[ 417.643790] CPU: 1 PID: 16886 Comm: bcache-register Tainted: G
W OE 4.15.5-amd64-preempt-sysrq-20171018 #2
[
Bart,
On 2018/02/27 16:32, Bart Van Assche wrote:
> When debugging the ZBC code in the mq-deadline scheduler it is very
> important to know which zones are locked and which zones are not
> locked. Hence this patch that exports the zone locking information
> through debugfs.
>
> Signed-off-by:
On Tue, Feb 27, 2018 at 04:28:30PM -0800, Bart Van Assche wrote:
> If a request times out the .completed_request() method is not called
If BLK_EH_HANDLED is returned from .timeout(), __blk_mq_complete_request()
should have called .completed_request(). Otherwise, something may be
wrong about
From: Omar Sandoval
sbitmap_queue_get()/sbitmap_queue_clear() are used for
allocating/freeing a resource, so they should provide acquire/release
barrier semantics, respectively. sbitmap_get() currently contains a full
barrier, which is unnecessary, so use test_and_set_bit_lock()
From: Omar Sandoval
Two fixlets inspired by Tejun's patch
(https://patchwork.kernel.org/patch/10226749/). Patch 2 is what we
discussed on that patch, patch 1 is a small preparation.
Omar Sandoval (2):
block: clear ctx pending bit under ctx lock
sbitmap: use
From: Omar Sandoval
When we insert a request, we set the software queue pending bit while
holding the software queue lock. However, we clear it outside of the
lock, so it's possible that a concurrent insert could reset the bit
after we clear it but before we empty the request
Make sure that the queue show and store methods are contiguous and
also that these appear in alphabetical order.
Signed-off-by: Bart Van Assche
Cc: Omar Sandoval
Cc: Damien Le Moal
Cc: Ming Lei
Cc: Hannes
When debugging the ZBC code in the mq-deadline scheduler it is very
important to know which zones are locked and which zones are not
locked. Hence this patch that exports the zone locking information
through debugfs.
Signed-off-by: Bart Van Assche
Cc: Omar Sandoval
Hello Jens,
While analyzing the mq-deadline behavior for ZBC drives together with Damien
we noticed the following:
- That the request queue attribute methods are not contiguous in
blk-mq-debugfs.c.
- That the information about which zones are locked is not yet available in
debugfs.
Hence
If a request times out the .completed_request() method is not called
and the .finish_request() method is only called if RQF_ELVPRIV has
been set. Hence this patch that sets RQF_ELVPRIV and that adds a
.finish_request() method. Without this patch, if a request times out
the zone that request
On Sat, 2018-02-24 at 20:44 +0800, Ming Lei wrote:
> On Thu, Feb 22, 2018 at 05:08:02PM -0800, Bart Van Assche wrote:
> > Hello Jens,
> >
> > Recently Joseph Qi identified races between the block cgroup code and
> > request
> > queue initialization and cleanup. This patch series addresses these
On Thu, 2018-02-22 at 17:08 -0800, Bart Van Assche wrote:
> Recently Joseph Qi identified races between the block cgroup code and request
> queue initialization and cleanup. This patch series addresses these races.
> Please
> consider these patches for kernel v4.17.
Hello Jens,
Can you have a
On Tue, 2018-02-27 at 15:34 -0700, Jens Axboe wrote:
> Similarly to the support we have for testing/faking timeouts for
> null_blk, this adds support for triggering a requeue condition.
> Considering the issues around restart we've been seeing, this should be
> a useful addition to the testing
On 2/12/18 10:14 AM, Gustavo A. R. Silva wrote:
> It seems that the proper value to return in this particular case is the
> one contained in variable new_index instead of ret.
Thanks, applied.
--
Jens Axboe
On 02/26/2018 05:01 PM, Omar Sandoval wrote:
On Mon, Feb 12, 2018 at 11:14:55AM -0600, Gustavo A. R. Silva wrote:
It seems that the proper value to return in this particular case is the
one contained in variable new_index instead of ret.
Addresses-Coverity-ID: 1465148 ("Copy-paste error")
Similarly to the support we have for testing/faking timeouts for
null_blk, this adds support for triggering a requeue condition.
Considering the issues around restart we've been seeing, this should be
a useful addition to the testing arsenal to ensure that we are handling
requeue conditions
On Tue, Feb 27, 2018 at 10:14:04AM -0800, Tejun Heo wrote:
> Hello, Omar.
>
> On Mon, Feb 26, 2018 at 02:14:44PM -0800, Omar Sandoval wrote:
> > > wake_index = atomic_read(&sbq->wake_index);
> > > for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
> > > struct sbq_wait_state *ws = &sbq->ws[wake_index];
> On 27 Feb 2018, at 19.46, Matias Bjørling wrote:
>
> On 02/27/2018 03:40 PM, Javier González wrote:
>>> On 26 Feb 2018, at 20.04, Matias Bjørling wrote:
>
>>> Can you help me understand why you want to use the
>>> NVM_CHK_ST_HOST_USE? Why would I care if
> On 27 Feb 2018, at 19.23, Matias Bjørling wrote:
>
> On 02/27/2018 04:57 PM, Javier González wrote:
>> Currently, the device geometry is stored redundantly in the nvm_id and
>> nvm_geo structures at a device level. Moreover, when instantiating
>> targets on a specific number
On 02/27/2018 10:33 AM, Michael Lyle wrote:
> On 02/27/2018 10:29 AM, Michael Lyle wrote:
>> Hi Coly Li--
>>
>> On 02/27/2018 08:55 AM, Coly Li wrote:
>>> Hi maintainers and folks,
>>>
>>> This patch set tries to improve bcache device failure handling, including
>>> cache device and backing device
OK, I have convinced myself this is safe.
Reviewed-by: Michael Lyle
On 02/27/2018 08:55 AM, Coly Li wrote:
> struct delayed_work writeback_rate_update in struct cache_dev is a delayed
> worker that calls update_writeback_rate() periodically (the interval is
> defined by
On 02/27/2018 03:40 PM, Javier González wrote:
On 26 Feb 2018, at 20.04, Matias Bjørling wrote:
Can you help me understand why you want to use the
NVM_CHK_ST_HOST_USE? Why would I care if the chunk state is HOST_USE?
A target instance should not be able to see states
Hello, Joseph.
On Sat, Feb 24, 2018 at 09:45:49AM +0800, Joseph Qi wrote:
> > IIRC, as long as the blkcg and the device are there, the blkgs aren't
> > gonna be destroyed. So, if you have a ref to the blkcg through
> > tryget, the blkg shouldn't go away.
> >
>
> Maybe we have misunderstanding
On 02/27/2018 10:29 AM, Michael Lyle wrote:
> Hi Coly Li--
>
> On 02/27/2018 08:55 AM, Coly Li wrote:
>> Hi maintainers and folks,
>>
>> This patch set tries to improve bcache device failure handling, including
>> cache device and backing device failures.
>
> I have applied 1, 2, 4 & 6 from this
Hi Coly Li--
On 02/27/2018 08:55 AM, Coly Li wrote:
> Hi maintainers and folks,
>
> This patch set tries to improve bcache device failure handling, including
> cache device and backing device failures.
I have applied 1, 2, 4 & 6 from this series to my 4.17 bcache-for-next
for testing.
Mike
On 02/27/2018 04:57 PM, Javier González wrote:
Currently, the device geometry is stored redundantly in the nvm_id and
nvm_geo structures at a device level. Moreover, when instantiating
targets on a specific number of LUNs, these structures are replicated
and manually modified to fit the instance
Hi Coly Li--
Just a couple of questions.
On 02/27/2018 08:55 AM, Coly Li wrote:
> +#define BACKING_DEV_OFFLINE_TIMEOUT 5
I think you wanted this to be 30 (per commit message)-- was this turned
down for testing or deliberate?
> +static int cached_dev_status_update(void *arg)
> +{
> + struct
Hi Coly Li--
On 02/27/2018 08:55 AM, Coly Li wrote:
>> When too many I/Os fail on the cache device, bch_cache_set_error() is called
> in the error handling code path to retire whole problematic cache set. If
> new I/O requests continue to come and take refcount dc->count, the cache
> set won't be
Hi Coly Li---
Thanks for this. I've been uncomfortable with the interaction between
the dirty status and the refcount (even aside from this issue), and I
believe you've resolved it. I'm sorry for the slow review-- it's taken
me some time to convince myself that this is safe.
I'm getting closer
On 2/27/18 10:49 AM, Michael Lyle wrote:
> Hi Jens,
>
> Please pick up these two critical fixes to bcache by Tang Junhui.
> They're both one-liners and have been reviewed and tested.
>
> The first corrects a regression when flash-only volumes are present
> that was introduced in 4.16-RC1. The
From: Coly Li
Commit 2831231d4c3f ("bcache: reduce cache_set devices iteration by
devices_max_used") adds c->devices_max_used to reduce iteration of
c->uuids elements; this value is updated in bcache_device_attach().
But for a flash-only volume, when calling flash_devs_run(), the
Hi Jens,
Please pick up these two critical fixes to bcache by Tang Junhui.
They're both one-liners and have been reviewed and tested.
The first corrects a regression when flash-only volumes are present
that was introduced in 4.16-RC1. The second adjusts bio refcount
and completion behavior to
From: Tang Junhui
The kernel crashed when running fio on a RAID5-backed bcache device; the
call trace is below:
[ 440.012034] kernel BUG at block/blk-ioc.c:146!
[ 440.012696] invalid opcode: [#1] SMP NOPTI
[ 440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted
In order to catch I/O errors of the backing device, a separate bi_end_io
callback is required. Then a per-backing-device counter can record the
number of I/O errors and retire the backing device if the counter reaches
a per-backing-device I/O error limit.
This patch adds backing_request_endio() to bcache
Currently bcache does not handle backing device failure: if the backing
device is offline and disconnected from the system, its bcache device can
still be accessible. If the bcache device is in writeback mode, I/O requests
can even succeed if they hit the cache device. That is to say, when and
how
When there are too many I/O errors on the cache device, the current bcache
code will retire the whole cache set and detach all bcache devices. But the
detached bcache devices are not stopped, which is problematic when bcache
is in writeback mode.
If the retired cache set has dirty data of backing
From: Tang Junhui
When we run IO on a detached device and run iostat to show the IO status,
it will normally show something like below (some fields omitted):
Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd ... 15.89 0.53 1.82 0.20 2.23
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
cached_dev_get() is called when creating dc->writeback_thread, and
cached_dev_put() is called when exiting dc->writeback_thread. This
modification works well unless people detach the bcache device manually by
'echo 1 >
When bcache metadata I/O fails, bcache will call bch_cache_set_error()
to retire the whole cache set. The expected behavior when retiring a cache
set is to unregister the cache set, unregister all backing devices
attached to this cache set, then remove the sysfs entries of the cache set
and all
struct delayed_work writeback_rate_update in struct cache_dev is a delayed
worker that calls update_writeback_rate() periodically (the interval is
defined by dc->writeback_rate_update_seconds).
When a metadata I/O error happens on the cache device, the bcache error
handling routine
When too many I/Os fail on the cache device, bch_cache_set_error() is called
in the error handling code path to retire the whole problematic cache set. If
new I/O requests continue to come and take the refcount dc->count, the cache
set won't be retired immediately, which is a problem.
Furthermore, there
Hi maintainers and folks,
This patch set tries to improve bcache device failure handling, including
cache device and backing device failures.
The basic idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices which are attached to this cache set
- Stop all the
On 27.02.2018 18:05, Nikolay Borisov wrote:
> Hello Tejun,
>
> So while running some fs tests I hit the following GPF. Btw the
> warning taint flag was due to a debugging WARN_ON in btrfs 100 or so
> tests ago, so it is unrelated to this GPF:
>
> [ 4255.628110] general protection fault:
> On 27 Feb 2018, at 16.57, Javier González wrote:
>
> Currently, the device geometry is stored redundantly in the nvm_id and
> nvm_geo structures at a device level. Moreover, when instantiating
> targets on a specific number of LUNs, these structures are replicated
> and
Hello Tejun,
So while running some fs tests I hit the following GPF. Btw the
warning taint flag was due to a debugging WARN_ON in btrfs 100 or so
tests ago, so it is unrelated to this GPF:
[ 4255.628110] general protection fault: [#1] SMP PTI
[ 4255.628303] Modules linked in:
[ 4255.628446]
Currently, the device geometry is stored redundantly in the nvm_id and
nvm_geo structures at a device level. Moreover, when instantiating
targets on a specific number of LUNs, these structures are replicated
and manually modified to fit the instance channel and LUN partitioning.
Instead, create a
Sending this separately as it seems to be the controversial one.
# Changes since V3
From Matias:
- Remove nvm_common_geo
- Do appropriate renames when having a single geometry for device and
targets
Javier
Javier González (1):
lightnvm: simplify geometry structure.
On Tue, 2018-02-27 at 15:09 +0800, chenxiang (M) wrote:
> On 2018/2/26 23:25, Bart Van Assche wrote:
> > On Mon, 2018-02-26 at 17:37 +0800, chenxiang (M) wrote:
> > > When I ran a test on kernel 4.16-rc1, I found an issue: running IO on a SATA
> > > disk, then disabling the disk through
> > > sysfs
> On 26 Feb 2018, at 20.04, Matias Bjørling wrote:
>
> On 02/26/2018 02:17 PM, Javier González wrote:
>> From: Javier González
>> In preparation for pblk supporting 2.0, implement the get log report
>> chunk in pblk.
>> This patch only replicates the bad
bio_check_eod() should check the partition size, not the whole
disk, if bio->bi_partno is not zero.
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and
partitions index")
Signed-off-by: Jiufei Xue
---
block/blk-core.c | 79
bio_devname() uses __bdevname() to display the device name, and can
only show the major and minor of part0.
Fix this by using disk_name() to display the correct name.
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and
partitions index")
Signed-off-by: Jiufei Xue
Fix a typo in pkt_start_recovery.
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and
partitions index")
Signed-off-by: Jiufei Xue
---
drivers/block/pktcdvd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
The VM counters are counted in sectors, so we should do the conversion
in submit_bio().
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and
partitions index")
Cc: sta...@vger.kernel.org
Reviewed-by: Omar Sandoval
Reviewed-by: Christoph Hellwig
I have found a few problems while reviewing patch 74d46992e0d9
("block: replace bi_bdev with a gendisk pointer and partitions index"),
so fix them.
Changes since v1:
- add a Fixes tag in individual patch.
- check end-of-device of bio in blk_partition_remap when the bi_partno
is not zero to
On 27.02.2018 11:57, Linus Walleij wrote:
> On Mon, Feb 26, 2018 at 10:48 PM, Dmitry Osipenko wrote:
>> On 22.02.2018 20:54, Dmitry Osipenko wrote:
>>> On 22.02.2018 10:42, Adrian Hunter wrote:
>
SDIO (unless it is a combo card) should be unaffected by changes to the
On Mon, Feb 26, 2018 at 01:44:55PM +0100, Jan Kara wrote:
> On Mon 26-02-18 11:38:19, Mark Rutland wrote:
> > That seems to be it!
> >
> > With the below patch applied, I can't trigger the bug after ~10 minutes,
> > whereas prior to the patch I can trigger it in ~10 seconds. I'll leave
> > that
It is observed on null_blk that IOPS can be improved considerably by simply
making one hw queue per NUMA node, so this patch applies the introduced
.host_tagset to improve performance.
In reality, .can_queue is quite big and the number of NUMA nodes is often
small, so each hw queue's depth should be high enough
It is observed that IOPS can be improved considerably by simply making
one hw queue per NUMA node on null_blk, so this patch applies the
introduced .host_tagset to improve performance.
In reality, .can_queue is quite big and the number of NUMA nodes is
often small, so each hw queue's depth should be high enough
From: Hannes Reinecke
Add a host template flag 'host_tagset' to enable the use of a global
tagset for block-mq.
Cc: Hannes Reinecke
Cc: Arun Easi
Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc:
This patch introduces the parameter 'g_host_tags' so that we can
test this feature with null_blk easily.
With host_tags, when the whole hw queue depth is kept the same, it is
observed that IOPS can be improved by ~50% on a dual socket (16 CPU
cores in total) system:
1) no 'host_tags', each hw queue depth is
This patch supports partitioning host-wide tags into multiple hw queues, so
the data structures related to each hw queue (tags, hctx) can be accessed
with NUMA locality; for example, the hw queue can be per NUMA node.
It is observed that IOPS can be improved considerably this way in null_blk
tests.
Cc: Hannes
This patch introduces a 'start_tag' field to 'struct blk_mq_tags' so that a
host-wide tagset can be supported easily in the following patches by
partitioning host-wide tags into multiple hw queues.
No function change.
Cc: Hannes Reinecke
Cc: Arun Easi
Cc: Omar
From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
one msix vector can be created without any online CPU mapped; then a
command may be queued and its completion won't be notified.
This patch sets up the mapping between CPU and reply queue according to irq
affinity info
From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
one msix vector can be created without any online CPU mapped; then one
command's completion may not be notified.
This patch sets up the mapping between CPU and reply queue according to irq
affinity info retrieved by
Hi All,
The first two patches fix reply queue selection; this issue has been
reported and can cause an IO hang during booting, so please consider
the two for v4.16.
The following 6 patches try to improve the hostwide tagset on hpsa and
megaraid_sas by making one hw queue per NUMA node.
I don't have
On 26/02/18 23:48, Dmitry Osipenko wrote:
> On 22.02.2018 20:54, Dmitry Osipenko wrote:
>> On 22.02.2018 10:42, Adrian Hunter wrote:
>>> On 21/02/18 22:50, Dmitry Osipenko wrote:
On 29.11.2017 16:41, Adrian Hunter wrote:
> Define and use a blk-mq queue. Discards and flushes are processed
On Mon, Feb 26, 2018 at 10:48 PM, Dmitry Osipenko wrote:
> On 22.02.2018 20:54, Dmitry Osipenko wrote:
>> On 22.02.2018 10:42, Adrian Hunter wrote:
>>> SDIO (unless it is a combo card) should be unaffected by changes to the
>>> block driver.
>
> I don't know whether it's a