On Fri, Feb 2, 2018 at 5:40 PM, Doug Ledford wrote:
> On Fri, 2018-02-02 at 16:07 +, Bart Van Assche wrote:
>> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> > Since the first version the following was changed:
>> >
>> >- Load-balancing and IO fail-over using multipath features wer
On Thu 01-02-18 19:58:45, Eric Biggers wrote:
> On Thu, Jan 11, 2018 at 06:00:08PM +0100, Jan Kara wrote:
> > On Thu 11-01-18 19:22:39, Hou Tao wrote:
> > > Hi,
> > >
> > > On 2018/1/11 16:24, Dan Carpenter wrote:
> > > > Thanks for your report and the patch. I am sending it to the
> > > > linux-
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote:
> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> o Simple configuration of IBNBD:
>>- Server side is completely passive: volumes do not need to be
>> explicitly exported.
>
> That sounds like a s
Hi Kashyap,
On Mon, Feb 05, 2018 at 12:35:13PM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Hannes Reinecke [mailto:h...@suse.de]
> > Sent: Monday, February 5, 2018 12:28 PM
> > To: Ming Lei; Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig;
> > Mike Snitzer
> > C
On Mon, Feb 05, 2018 at 07:58:29AM +0100, Hannes Reinecke wrote:
> On 02/03/2018 05:21 AM, Ming Lei wrote:
> > Hi All,
> >
> > This patchset supports global tags which was started by Hannes originally:
> >
> > https://marc.info/?l=linux-block&m=149132580511346&w=2
> >
> > Also introduce 'forc
On Mon, Feb 05, 2018 at 07:54:29AM +0100, Hannes Reinecke wrote:
> On 02/03/2018 05:21 AM, Ming Lei wrote:
> > Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
> > reply queues, but tags are often HBA-wide.
> >
> > These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_
Hi Roman,
Here are some comments below.
+int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+	struct ib_recv_wr wr, *bad_wr;
+
+	wr.next    = NULL;
+	wr.wr_cqe  = cqe;
+	wr.sg_list = NULL;
+	wr.num_sge = 0;
+
+	return ib_post_recv(con->qp, &wr, &bad_wr);
+}
Hi Roman,
+struct ibtrs_clt_io_req {
+	struct list_head	list;
+	struct ibtrs_iu		*iu;
+	struct scatterlist	*sglist; /* list holding user data */
+	unsigned int		sg_cnt;
+	unsigned int		sg_size;
+	unsigned int
Hi Roman,
+static inline void ibtrs_clt_state_lock(void)
+{
+ rcu_read_lock();
+}
+
+static inline void ibtrs_clt_state_unlock(void)
+{
+ rcu_read_unlock();
+}
This looks rather pointless...
+
+#define cmpxchg_min(var, new) ({ \
+ typeo
Hi Roman,
This is the sysfs interface to IBTRS sessions on client side:
/sys/kernel/ibtrs_client/<session-name>/
*** IBTRS session created by ibtrs_clt_open() API call
|
|- max_reconnect_attempts
| *** number of reconnect attempts for session
|
|- add_path
| *** adds a
Hi Roman,
Some comments below.
On 02/02/2018 04:08 PM, Roman Pen wrote:
This is main functionality of ibtrs-server module, which accepts
set of RDMA connections (so called IBTRS session), creates/destroys
sysfs entries associated with IBTRS session and notifies upper layer
(user of IBTRS API) a
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote:
On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
o Simple configuration of IBNBD:
- Server side is completely passive: volumes do not need to be
explicitly exported.
That sounds like a securit
From: Tang Junhui
The back-end device sdm has already been attached to a cache set with ID
f67ebe1f-f8bc-4d73-bfe5-9dc88607f119; then we try to attach it to
another cache set, and it returns an error:
[root]# cd /sys/block/sdm/bcache
[root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach
-bash: echo: wr
There are no groups in the 2.0 specification, so make sure that the
nvm_id structure is flattened before the 2.0 data structures are added.
Signed-off-by: Matias Bjørling
---
drivers/lightnvm/core.c | 25 ++-
drivers/nvme/host/lightnvm.c | 100 +--
Implement the geometry data structures for 2.0 and enable a drive
to be identified as one, including exposing the appropriate 2.0
sysfs entries.
Signed-off-by: Matias Bjørling
---
drivers/lightnvm/core.c | 2 +-
drivers/nvme/host/lightnvm.c | 334 +-
Hi,
A couple of patches for 2.0 support for the lightnvm subsystem. They
form the basis for integrating 2.0 support.
For the rest of the support, Javier has code that implements report
chunk and sets up the LBA format data structure. He also has a bunch
of patches that bring pblk up to speed.
T
Hi Roman and the team,
On 02/02/2018 04:08 PM, Roman Pen wrote:
This series introduces IBNBD/IBTRS modules.
IBTRS (InfiniBand Transport) is a reliable high speed transport library
which allows for establishing connection between client and server
machines via RDMA.
So it's not strictly InfiniBand
The nvme driver sets up the size of the nvme namespace in two steps.
First it initializes the device with standard logical block and
metadata sizes, and then sets the correct logical block and metadata
size. Because the OCSSD 2.0 specification relies on the namespace to
expose these sizes for correc
Make the 1.2 data structures explicit, so it will be easy to identify
the 2.0 data structures. Also fix the order in which the nvme_nvm_*
structures are declared, such that they follow the nvme_nvm_command order.
Signed-off-by: Matias Bjørling
---
drivers/nvme/host/lightnvm.c | 82 ++---
Hi Roman and the team (again), replying to my own email :)
I forgot to mention: first of all, thank you for upstreaming
your work! I fully support your goal of having your production driver
upstream to minimize your maintenance efforts. I hope that my
feedback didn't come across with a different
On Fri, Feb 2, 2018 at 4:11 PM, Jens Axboe wrote:
> On 2/2/18 7:08 AM, Roman Pen wrote:
>> This is main functionality of ibnbd-client module, which provides
>> interface to map remote device as local block device /dev/ibnbd
>> and feeds IBTRS with IO requests.
>
> Kill the legacy IO path for this,
On Fri, Feb 2, 2018 at 4:55 PM, Bart Van Assche wrote:
> On Fri, 2018-02-02 at 15:09 +0100, Roman Pen wrote:
>> +Entries under /sys/kernel/ibnbd_client/
>> +===
>> [ ... ]
>
> You will need Greg KH's permission to add new entries directly under
> /sys/kernel.
>
On Fri, Feb 2, 2018 at 5:54 PM, Bart Van Assche wrote:
> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> +static inline struct ibtrs_tag *
>> +__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type)
>> +{
>> + size_t max_depth = clt->queue_depth;
>> + struct ibtrs_t
>
>> Hi Bart,
>>
>> Another 2 cents from me :)
>> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
>> wrote:
>>>
>>> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
o Simple configuration of IBNBD:
- Server side is completely passive: volumes do not need to be
explicitly
Indeed, it seems sbitmap can be reused.
But tags are a part of IBTRS, and are not related to the block device at all. One
IBTRS connection (session) handles many block devices
we use host shared tag sets for the case of multiple block devices.
(or any IO producers).
Let's wait until we actually ha
/sys/kernel was chosen ages ago and I completely forgot to move it to configfs.
IBTRS is not a block device, so for some read-only entries (statistics
or states)
something else should probably be used, not configfs. Or is it fine
to read the state
of the connection from configfs? To me it sounds a b
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
wrote:
On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
o Simple configuration of IBNBD:
- Server side is completely passive: volumes do not need to be
explicitly exported.
That sounds like a se
Hi Sagi,
On Mon, Feb 5, 2018 at 12:19 PM, Sagi Grimberg wrote:
> Hi Roman,
>
>> +static inline void ibtrs_clt_state_lock(void)
>> +{
>> + rcu_read_lock();
>> +}
>> +
>> +static inline void ibtrs_clt_state_unlock(void)
>> +{
>> + rcu_read_unlock();
>> +}
>
>
> This looks rather pointle
Hi All,
This patchset supports global tags which was started by Hannes originally:
https://marc.info/?l=linux-block&m=149132580511346&w=2
Also introduce 'force_blk_mq' and 'host_tagset' in 'struct scsi_host_template',
so that drivers can avoid supporting two IO paths (legacy and blk-mq), es
This patch changes tags->breserved_tags, tags->bitmap_tags and
tags->active_queues into pointers, preparing for global tags support.
No functional change.
Tested-by: Laurence Oberman
Reviewed-by: Hannes Reinecke
Cc: Mike Snitzer
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
---
block/bfq
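A rough, hypothetical sketch of the idea (not the actual patch): the "before"
layout follows block/blk-mq-tag.h around v4.15, while the "after" layout is an
assumption based on the commit description, showing why pointer fields let all
hw queues of a host share one tag space:

#include <linux/atomic.h>
#include <linux/sbitmap.h>

/* Before: every hw queue embeds (and therefore owns) its own tag bitmaps. */
struct blk_mq_tags_embedded {
	unsigned int		nr_tags;
	unsigned int		nr_reserved_tags;
	atomic_t		active_queues;
	struct sbitmap_queue	bitmap_tags;
	struct sbitmap_queue	breserved_tags;
};

/*
 * After: the same fields become pointers, so every hw queue of a host can
 * point at one shared instance when a global/host-wide tagset is used.
 */
struct blk_mq_tags_shared {
	unsigned int		nr_tags;
	unsigned int		nr_reserved_tags;
	atomic_t		*active_queues;
	struct sbitmap_queue	*bitmap_tags;
	struct sbitmap_queue	*breserved_tags;
};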
Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
reply queues, but tags are often HBA-wide.
These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
for automatic affinity assignment.
Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
has
From: Hannes Reinecke
Add a host template flag 'host_tagset' to enable the use of a global
tagset for block-mq.
Cc: Hannes Reinecke
Cc: Arun Easi
Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace
Cc: Kashyap Desai
Cc: Peter Rivera
Cc:
This patch introduces the 'g_global_tags' parameter so that we can
test this feature with null_blk easily.
No obvious performance drop is seen with global_tags when the whole hw
depth is kept the same:
1) no 'global_tags', each hw queue depth is 1, and 4 hw queues
modprobe null_blk queue_mode=2 nr_de
From the SCSI driver's view, it is a bit troublesome to support both blk-mq
and non-blk-mq at the same time, especially when drivers need to support
multiple hw queues.
This patch introduces 'force_blk_mq' to scsi_host_template so that drivers
can provide blk-mq-only support, so driver code can avoid the t
Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
has been merged to V4.16-rc, and it is easy to allocate all offline CPUs
for some irq vectors, this can't be avoided even though the allocation
is improved.
For example, on a 8cores VM, 4~7 are not-present/offline, 4 queues
This allows us to decide the default reply queue by the map created
while adding the host.
Cc: Hannes Reinecke
Cc: Arun Easi
Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace
Cc: Kashyap Desai
Cc: Peter Rivera
Cc: Paolo Bonzini
Cc: Mike Snit
This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime maps
each reply queue to blk_mq's hw queue, then .queuecommand can always
choose the hw queue as the reply queue. And if no online CPU is
mapped to one hw queue, requests can't be submitted to this hw queue
at all, finally the irq
On 05/02/2018 16:20, Ming Lei wrote:
> Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
> has been merged to V4.16-rc, and it is easy to allocate all offline CPUs
> for some irq vectors, this can't be avoided even though the allocation
> is improved.
>
> For example, on a
On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime
> maps
> each reply queue to blk_mq's hw queue, then .queuecommand can always
> choose the hw queue as the reply queue. And if no online CPU is
> mapped to one hw queue, reques
Hi All,
We've got some "strange" issue on a Xen hypervisor with CentOS 6 and
4.9.63-29.el6.x86_6 kernel.
The system has a local raid + is connected with 2 iscsi sessions to 3
disks with multipath (6 blockdevs in total).
We've noticed that vgdisplay was hanging, and the kernel was printing
th
> -Original Message-
> This is a critical issue on the HPSA because Linus already has the
> original commit that causes the system to fail to boot.
>
> All my testing was on DL380 G7 servers with:
>
> Hewlett-Packard Company Smart Array G6 controllers
> Vendor: HP Model: P410i
On Mon, 2018-02-05 at 09:56 +0100, Jinpu Wang wrote:
> Hi Bart,
>
> Another 2 cents from me :)
> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
> wrote:
> > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
> > > o Simple configuration of IBNBD:
> > >- Server side is completely passive: volumes
On Mon, 2018-02-05 at 15:19 +0100, Roman Penyaev wrote:
> On Mon, Feb 5, 2018 at 12:19 PM, Sagi Grimberg wrote:
> > Do you actually ever have remote write access in your protocol?
>
> We do not have reads, instead client writes on write and server writes
> on read. (write only storage solution :)
On Mon, Feb 5, 2018 at 5:16 PM, Bart Van Assche wrote:
> On Mon, 2018-02-05 at 09:56 +0100, Jinpu Wang wrote:
>> Hi Bart,
>>
>> Another 2 cents from me :)
>> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
>> wrote:
>> > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> > > o Simple configuration
On Mon, Feb 5, 2018 at 3:17 PM, Sagi Grimberg wrote:
>
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
wrote:
>
>
> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>>
>>
>> o Simple configuration of IBNBD:
>>
On Mon, 2018-02-05 at 14:16 +0200, Sagi Grimberg wrote:
> - Your latency measurements are surprisingly high for a null target
>device (even for low end nvme device actually) regardless of the
>transport implementation.
>
> For example:
> - QD=1 read latency is 648.95 for ibnbd (I assume us
On Mon, Feb 5, 2018 at 3:14 PM, Sagi Grimberg wrote:
>
>> Indeed, seems sbitmap can be reused.
>>
>> But tags is a part of IBTRS, and is not related to block device at all.
>> One
>> IBTRS connection (session) handles many block devices
>
>
> we use host shared tag sets for the case of multiple bl
Hi Bart,
On Mon, Feb 5, 2018 at 5:58 PM, Bart Van Assche wrote:
> On Mon, 2018-02-05 at 14:16 +0200, Sagi Grimberg wrote:
>> - Your latency measurements are surprisingly high for a null target
>>device (even for low end nvme device actually) regardless of the
>>transport implementation.
>
On Mon, 2018-02-05 at 18:16 +0100, Roman Penyaev wrote:
> Everything (fio jobs, setup, etc) is given in the same link:
>
> https://www.spinics.net/lists/linux-rdma/msg48799.html
>
> at the bottom you will find links on google docs with many pages
> and archived fio jobs and scripts. (I do not rem
On Sat, 2018-02-03 at 10:51 +0800, Joseph Qi wrote:
> Hi Bart,
>
> On 18/2/3 00:21, Bart Van Assche wrote:
> > On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
> > > We triggered this race when using single queue. I'm not sure if it
> > > exists in multi-queue.
> >
> > Regarding the races betw
On 02/05/2018 04:15 AM, Matias Bjørling wrote:
> Implement the geometry data structures for 2.0 and enable a drive
> to be identified as one, including exposing the appropriate 2.0
> sysfs entries.
>
> Signed-off-by: Matias Bjørling
> ---
> drivers/lightnvm/core.c | 2 +-
> drivers/nvme/h
On 02/05/18 08:40, Danil Kipnis wrote:
It just occurred to me, that we could easily extend the interface in
such a way that each client (i.e. each session) would have on server
side her own directory with the devices it can access. I.e. instead of
just "dev_search_path" per server, any client wou
> -Original Message-
> From: Laurence Oberman [mailto:lober...@redhat.com]
> Sent: Monday, February 05, 2018 9:58 AM
> To: Ming Lei ; Jens Axboe ; linux-
> bl...@vger.kernel.org; Christoph Hellwig ; Mike Snitzer
> ; Don Brace
> Cc: linux-s...@vger.kernel.org; Hannes Reinecke ; Arun Easi
>
> -Original Message-
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Monday, February 05, 2018 9:21 AM
> To: Jens Axboe ; linux-block@vger.kernel.org; Christoph
> Hellwig ; Mike Snitzer
> Cc: linux-s...@vger.kernel.org; Hannes Reinecke ; Arun Easi
> ; Omar Sandoval ; Martin K .
> P
Hi,
just a note: the most difficult part in the implementation of this
patch has been how to handle the fact that the requeue and finish
hooks of the active elevator get invoked even for requests that are
no longer referenced in that elevator. You can find details in the
comments introduced by t
Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
be re-inserted into the active I/O scheduler for that device. As a
consequence, I/O schedulers may get the same request inserted again,
even several times, w
> Il giorno 30 gen 2018, alle ore 16:40, Paolo Valente
> ha scritto:
>
>
>
>> Il giorno 30 gen 2018, alle ore 15:40, Ming Lei ha
>> scritto:
>>
>> On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote:
>>> Hi.
>>>
>> ...
>>> systemd-udevd-271 [000] 4.311033: bfq
I have a workload where one process sends many asynchronous write bios
(without waiting for them) and another process sends synchronous flush
bios. During this workload, writeback throttling throttles down to one
outstanding bio, and this incorrect throttling causes performance
degradation (all wri
On 2/5/18 12:11 PM, Mikulas Patocka wrote:
> I have a workload where one process sends many asynchronous write bios
> (without waiting for them) and another process sends synchronous flush
> bios. During this workload, writeback throttling throttles down to one
> outstanding bio, and this incorrect
On 2/5/18 12:22 PM, Jens Axboe wrote:
> On 2/5/18 12:11 PM, Mikulas Patocka wrote:
>> I have a workload where one process sends many asynchronous write bios
>> (without waiting for them) and another process sends synchronous flush
>> bios. During this workload, writeback throttling throttles down t
On Mon, 5 Feb 2018, Jens Axboe wrote:
> On 2/5/18 12:22 PM, Jens Axboe wrote:
> > On 2/5/18 12:11 PM, Mikulas Patocka wrote:
> >> I have a workload where one process sends many asynchronous write bios
> >> (without waiting for them) and another process sends synchronous flush
> >> bios. During t
On 2/5/18 1:00 PM, Mikulas Patocka wrote:
>
>
> On Mon, 5 Feb 2018, Jens Axboe wrote:
>
>> On 2/5/18 12:22 PM, Jens Axboe wrote:
>>> On 2/5/18 12:11 PM, Mikulas Patocka wrote:
I have a workload where one process sends many asynchronous write bios
(without waiting for them) and another
> This patch introduces the 'g_global_tags' parameter so that we can
> test this feature with null_blk easily.
>
> No obvious performance drop is seen with global_tags when the whole hw
> depth is kept the same:
>
> 1) no 'global_tags', each hw queue depth is 1, and 4 hw queues
> modprobe null_blk qu
Hello, Bart.
Thanks a lot for testing and fixing the issues but I'm a bit confused
by the patch. Maybe we can split the patch a bit more? There seem to be
three things going on,
1. Changing preemption protection to irq protection in issue path.
2. Merge of aborted_gstate_sync and gstate_seq.
3. U
On Mon, 2018-02-05 at 13:06 -0800, Tejun Heo wrote:
> Thanks a lot for testing and fixing the issues but I'm a bit confused
> by the patch. Maybe we can split the patch a bit more? There seem to be
> three things going on,
>
> 1. Changing preemption protection to irq protection in issue path.
>
> 2
Hello, Bart.
On Mon, Feb 05, 2018 at 09:33:03PM +, Bart Van Assche wrote:
> My goal with this patch is to fix the race between resetting the timer and
> the completion path. Hence change (3). Changes (1) and (2) are needed to
> make the changes in blk_mq_rq_timed_out() work.
Ah, I see. That
Update online stats for fired events and completions on
each poll cycle.
Also expose an initialization interface. The irq-poll consumer will
initialize the irq-am context of the irq-poll context.
Signed-off-by: Sagi Grimberg
---
include/linux/irq_poll.h | 9 +
lib/Kconfig |
Currently activated via modparam, obviously we will want to
find a more generic way to control this.
Signed-off-by: Sagi Grimberg
---
drivers/infiniband/core/cq.c | 48
1 file changed, 48 insertions(+)
diff --git a/drivers/infiniband/core/cq.c b/driv
Adaptive IRQ moderation (also called adaptive IRQ coalescing) has been widely
used in the networking stack for over 20 years and has become a standard
default setting.
Adaptive moderation is a feature supported by the device to delay an interrupt
for either a period of time or a number of completions.
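To illustrate the concept only (a hypothetical, self-contained sketch of the
general idea, not the proposed irq-am API): based on how many completions a
queue observed since the last interrupt, move between moderation levels that
trade interrupt delay against latency:

#include <stdio.h>

struct am_level {
	unsigned int usecs;	/* delay the interrupt this long ...          */
	unsigned int comps;	/* ... or until this many completions arrived */
};

static const struct am_level levels[] = {
	{ .usecs = 0,  .comps = 1  },	/* latency-sensitive: no moderation */
	{ .usecs = 8,  .comps = 8  },
	{ .usecs = 16, .comps = 32 },
	{ .usecs = 32, .comps = 64 },	/* throughput-oriented */
};

#define NR_LEVELS (sizeof(levels) / sizeof(levels[0]))

/* Move one level up or down depending on the observed completion rate. */
static unsigned int am_next_level(unsigned int cur, unsigned int comps_per_poll)
{
	if (comps_per_poll > levels[cur].comps && cur < NR_LEVELS - 1)
		return cur + 1;		/* busy: moderate more aggressively */
	if (comps_per_poll < levels[cur].comps / 2 && cur > 0)
		return cur - 1;		/* quiet: back off toward low latency */
	return cur;
}

int main(void)
{
	unsigned int level = 0;
	unsigned int samples[] = { 2, 20, 70, 90, 10, 1 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		level = am_next_level(level, samples[i]);
		printf("comps=%u -> level %u (%u us / %u comps)\n",
		       samples[i], level, levels[level].usecs,
		       levels[level].comps);
	}
	return 0;
}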
Signed-off-by: Sagi Grimberg
---
drivers/infiniband/core/cq.c | 25 -
include/rdma/ib_verbs.h | 9 +++--
2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index 270801d28f9d..1d984fd77449 1
Useful for local debugging
Signed-off-by: Sagi Grimberg
---
include/linux/irq-am.h | 2 +
lib/irq-am.c | 109 +
2 files changed, 111 insertions(+)
diff --git a/include/linux/irq-am.h b/include/linux/irq-am.h
index 5ddd5ca268aa..18df315
The irq-am library helps I/O devices implement interrupt moderation in
an adaptive fashion, based on online stats.
The consumer can initialize an irq-am context with a callback that
performs the device-specific moderation programming, and also the number
of am (adaptive moderation) levels, which are als
On 2018/2/5 23:20, Ming Lei wrote:
This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime maps
each reply queue to blk_mq's hw queue, then .queuecommand can always
choose the hw queue as the reply queue. And if no online CPU is
mapped to one hw queue, requests can't be submitted to this
From: Srivatsa S. Bhat
register_blkdev() and __register_chrdev_region() treat the major
number as an unsigned int. So print it the same way to avoid
absurd error statements such as:
"... major requested (-1) is greater than the maximum (511) ..."
(and also fix off-by-one bugs in the error prints)
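A minimal user-space illustration of the point (hypothetical values, not the
kernel code being patched): the comparison treats the major as unsigned, so the
message should print it the same way:

#include <stdio.h>

int main(void)
{
	unsigned int major = (unsigned int)-1;	/* e.g. a bogus/huge request */
	unsigned int max_major = 511;

	if (major > max_major) {
		/* Printing the unsigned value as signed contradicts the check: */
		printf("major requested (%d) is greater than the maximum (%u)\n",
		       (int)major, max_major);
		/* Printing it as unsigned keeps the message consistent: */
		printf("major requested (%u) is greater than the maximum (%u)\n",
		       major, max_major);
	}
	return 0;
}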
From: Srivatsa S. Bhat
CHRDEV_MAJOR_DYN_END and CHRDEV_MAJOR_DYN_EXT_END are valid major
numbers. So fix the loop iteration to include them in the search for
free major numbers.
While at it, also remove a redundant if condition ("cd->major != i"),
as it will never be true.
Signed-off-by: Srivat
Hi maintainers and folks,
This patch set tries to improve bcache device failure handling, including
cache device and backing device failures.
The basic idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices which are attached to this cache set
- Stop all the det
struct delayed_work writeback_rate_update in struct cache_dev is a delayed
worker that calls update_writeback_rate() periodically (the interval is
defined by dc->writeback_rate_update_seconds).
When a metadata I/O error happens on the cache device, the bcache error
handling routine bch_cache_set_error(
dc->writeback_rate_update_seconds can be set via sysfs and its value can
be set to [1, ULONG_MAX]. It does not make sense to set such a large
value; 60 seconds is a long enough value considering that the default 5 seconds
has worked well for a long time.
Because dc->writeback_rate_update is a special delayed w
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
cached_dev_get() is called when creating dc->writeback_thread, and
cached_dev_put() is called when exiting dc->writeback_thread. This
modification works well unless people detach the bcache device manually by
'echo 1 > /s
When bcache metadata I/O fails, bcache will call bch_cache_set_error()
to retire the whole cache set. The expected behavior to retire a cache
set is to unregister the cache set, unregister all backing devices
attached to this cache set, then remove the sysfs entries of the cache set
and all attached
From: Tang Junhui
When we run IO on a detached device and run iostat to show the IO status,
normally it will show like below (some fields omitted):
Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd     ...    15.89     0.53  1.82    0.20    2.23  1.81  52.30
bcache0..
When there are too many I/O errors on cache device, current bcache code
will retire the whole cache set, and detach all bcache devices. But the
detached bcache devices are not stopped, which is problematic when bcache
is in writeback mode.
If the retired cache set has dirty data of backing devices
In order to catch I/O errors of the backing device, a separate bi_end_io
callback is required. Then a per-backing-device counter can record the
number of I/O errors and retire the backing device if the counter reaches a
per-backing-device I/O error limit.
This patch adds backing_request_endio() to bcache bac
When too many I/Os fail on the cache device, bch_cache_set_error() is called
in the error handling code path to retire the whole problematic cache set. If
new I/O requests continue to come and take the refcount dc->count, the cache
set won't be retired immediately; this is a problem.
Furthermore, there are
Currently bcache does not handle backing device failure: if the backing
device is offline and disconnected from the system, its bcache device is still
accessible. If the bcache device is in writeback mode, I/O requests can even
succeed if they hit the cache device. That is to say, when and
how b
Hi Paolo,
I applied this to master today, flipped udev back to bfq and took it
for a spin. Unfortunately, the box fairly quickly went boom under load.
[ 454.739975] [ cut here ]
[ 454.739979] list_add corruption. prev->next should be next
(5f99a42a), but was
If a bcache device is configured in writeback mode, the current code does not
handle write I/O errors on backing devices properly.
In writeback mode, a write request is written to the cache device and
later flushed to the backing device. If I/O fails when writing from the
cache device to the backing device
> > We still have more than one reply queue ending up completing on one CPU.
>
> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) has to be used, that means
> smp_affinity_enable has to be set to 1, but it seems that is the default
> setting.
>
> Please see kernel/irq/affinity.c, especially irq_calc_affinity_vectors(
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote:
[...]
> In the networking stack, each device driver implements adaptive IRQ moderation
> on its own. The approach here is a bit different: it tries to take the common
> denominator,
> which is per-queue statistics gathering and workload change
We've triggered a WARNING in blk_throtl_bio when throttling writeback
IO, which complains that blkg->refcnt is already 0 when calling blkg_get, and
then the kernel crashes with an invalid page request.
After investigating this issue, we've found there is a race between
blkcg_bio_issue_check and cgroup_rmdir. T
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote:
> irq-am library helps I/O devices implement interrupt moderation in
> an adaptive fashion, based on online stats.
>
> The consumer can initialize an irq-am context with a callback that
> performs the device specific moderation programming and
Hi, Paolo.
I can confirm that this patch fixes cfdisk hang for me. I've also tried
to trigger the issue Mike has encountered, but with no luck (maybe I
wasn't insistent enough; I was just doing dd on a usb-storage device in the
VM).
So, with regard to cfdisk hang on usb-storage:
Tested-by: Ole
On Tue, 2018-02-06 at 08:44 +0100, Oleksandr Natalenko wrote:
> Hi, Paolo.
>
> I can confirm that this patch fixes cfdisk hang for me. I've also tried
> to trigger the issue Mike has encountered, but with no luck (maybe, I
> wasn't insistent enough, just was doing dd on usb-storage device in the