Re: [PATCH 04/24] ibtrs: client: private header with client structs and functions

2018-02-05 Thread Sagi Grimberg
Hi Roman, +struct ibtrs_clt_io_req { + struct list_head list; + struct ibtrs_iu *iu; + struct scatterlist *sglist; /* list holding user data */ + unsigned int sg_cnt; + unsigned int sg_size; + unsigned int

[PATCH 2/4] lightnvm: flatten nvm_id_group into nvm_id

2018-02-05 Thread Matias Bjørling
There are no groups in the 2.0 specification, make sure that the nvm_id structure is flattened before 2.0 data structures are added. Signed-off-by: Matias Bjørling --- drivers/lightnvm/core.c | 25 ++- drivers/nvme/host/lightnvm.c | 100

[PATCH 3/4] lightnvm: add 2.0 geometry identification

2018-02-05 Thread Matias Bjørling
Implement the geometry data structures for 2.0 and enable a drive to be identified as one, including exposing the appropriate 2.0 sysfs entries. Signed-off-by: Matias Bjørling --- drivers/lightnvm/core.c | 2 +- drivers/nvme/host/lightnvm.c | 334

[PATCH 0/4] lightnvm: base 2.0 implementation

2018-02-05 Thread Matias Bjørling
Hi, A couple of patches for 2.0 support for the lightnvm subsystem. They form the basis for integrating 2.0 support. For the rest of the support, Javier has code that implements report chunk and sets up the LBA format data structure. He also has a bunch of patches that bring pblk up to speed.

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Sagi Grimberg
Hi Roman and the team, On 02/02/2018 04:08 PM, Roman Pen wrote: This series introduces IBNBD/IBTRS modules. IBTRS (InfiniBand Transport) is a reliable high-speed transport library which allows for establishing a connection between client and server machines via RDMA. So it's not strictly

Re: [PATCH 05/24] ibtrs: client: main functionality

2018-02-05 Thread Sagi Grimberg
Hi Roman, +static inline void ibtrs_clt_state_lock(void) +{ + rcu_read_lock(); +} + +static inline void ibtrs_clt_state_unlock(void) +{ + rcu_read_unlock(); +} This looks rather pointless... + +#define cmpxchg_min(var, new) ({ \ +

Re: [PATCH 07/24] ibtrs: client: sysfs interface functions

2018-02-05 Thread Sagi Grimberg
Hi Roman, This is the sysfs interface to IBTRS sessions on client side: /sys/kernel/ibtrs_client// *** IBTRS session created by ibtrs_clt_open() API call | |- max_reconnect_attempts | *** number of reconnect attempts for session | |- add_path | *** adds

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Sagi Grimberg
Hi Roman and the team (again), replying to my own email :) I forgot to mention that first of all thank you for upstreaming your work! I fully support your goal to have your production driver upstream to minimize your maintenance efforts. I hope that my feedback didn't come across with a

Re: [PATCH 2/5] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS

2018-02-05 Thread Ming Lei
On Mon, Feb 05, 2018 at 07:54:29AM +0100, Hannes Reinecke wrote: > On 02/03/2018 05:21 AM, Ming Lei wrote: > > Quite a few HBAs(such as HPSA, megaraid, mpt3sas, ..) support multiple > > reply queues, but tags is often HBA wide. > > > > These HBAs have switched to use

Re: [PATCH 03/24] ibtrs: core: lib functions shared between client and server modules

2018-02-05 Thread Sagi Grimberg
Hi Roman, Here are some comments below. +int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe) +{ + struct ib_recv_wr wr, *bad_wr; + + wr.next = NULL; + wr.wr_cqe = cqe; + wr.sg_list = NULL; + wr.num_sge = 0; + + return

Re: [PATCH 09/24] ibtrs: server: main functionality

2018-02-05 Thread Sagi Grimberg
Hi Roman, Some comments below. On 02/02/2018 04:08 PM, Roman Pen wrote: This is main functionality of ibtrs-server module, which accepts set of RDMA connections (so called IBTRS session), creates/destroys sysfs entries associated with IBTRS session and notifies upper layer (user of IBTRS API)

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Sagi Grimberg
Hi Bart, My another 2 cents:) On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote: On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote: o Simple configuration of IBNBD: - Server side is completely passive: volumes do not need to be explicitly exported.

Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-05 Thread Ming Lei
Hi Kashyap, On Mon, Feb 05, 2018 at 12:35:13PM +0530, Kashyap Desai wrote: > > -Original Message- > > From: Hannes Reinecke [mailto:h...@suse.de] > > Sent: Monday, February 5, 2018 12:28 PM > > To: Ming Lei; Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; > > Mike Snitzer > >

Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-05 Thread Ming Lei
On Mon, Feb 05, 2018 at 07:58:29AM +0100, Hannes Reinecke wrote: > On 02/03/2018 05:21 AM, Ming Lei wrote: > > Hi All, > > > > This patchset supports global tags which was started by Hannes originally: > > > > https://marc.info/?l=linux-block=149132580511346=2 > > > > Also introduce

[PATCH 2/2] bcache: fix for data collapse after re-attaching an attached device

2018-02-05 Thread tang . junhui
From: Tang Junhui Back-end device sdm has already been attached to a cache set with ID f67ebe1f-f8bc-4d73-bfe5-9dc88607f119; then we try to attach it to another cache set, and it returns an error: [root]# cd /sys/block/sdm/bcache [root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f

[PATCH 4/4] nvme: lightnvm: add late setup of block size and metadata

2018-02-05 Thread Matias Bjørling
The nvme driver sets up the size of the nvme namespace in two steps. First it initializes the device with standard logical block and metadata sizes, and then sets the correct logical block and metadata size. Because the OCSSD 2.0 specification relies on the namespace to expose these sizes for

[PATCH 1/4] lightnvm: make 1.2 data structures explicit

2018-02-05 Thread Matias Bjørling
Make the 1.2 data structures explicit, so it will be easy to identify the 2.0 data structures. Also fix the order in which the nvme_nvm_* structures are declared, such that they follow the nvme_nvm_command order. Signed-off-by: Matias Bjørling --- drivers/nvme/host/lightnvm.c | 82

Re: [PATCH 16/24] ibnbd: client: main functionality

2018-02-05 Thread Roman Penyaev
On Fri, Feb 2, 2018 at 4:11 PM, Jens Axboe wrote: > On 2/2/18 7:08 AM, Roman Pen wrote: >> This is main functionality of ibnbd-client module, which provides >> interface to map remote device as local block device /dev/ibnbd >> and feeds IBTRS with IO requests. > > Kill the legacy

Re: [PATCH 23/24] ibnbd: a bit of documentation

2018-02-05 Thread Sagi Grimberg
/sys/kernel was chosen ages ago and I completely forgot to move it to configfs. IBTRS is not a block device, so for some read-only entries (statistics or states) something else should probably be used, not configfs. Or is it fine to read the state of a connection from configfs? For me sounds a

Re: [PATCH 05/24] ibtrs: client: main functionality

2018-02-05 Thread Roman Penyaev
On Fri, Feb 2, 2018 at 5:54 PM, Bart Van Assche wrote: > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote: >> +static inline struct ibtrs_tag * >> +__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type) >> +{ >> + size_t max_depth =

Re: [PATCH 05/24] ibtrs: client: main functionality

2018-02-05 Thread Sagi Grimberg
> Indeed, seems sbitmap can be reused. But tags is a part of IBTRS, and is not related to block device at all. One IBTRS connection (session) handles many block devices (or any IO producers). We use host shared tag sets for the case of multiple block devices. Lets wait until we actually

Re: [PATCH 05/24] ibtrs: client: main functionality

2018-02-05 Thread Roman Penyaev
Hi Sagi, On Mon, Feb 5, 2018 at 12:19 PM, Sagi Grimberg wrote: > Hi Roman, > >> +static inline void ibtrs_clt_state_lock(void) >> +{ >> + rcu_read_lock(); >> +} >> + >> +static inline void ibtrs_clt_state_unlock(void) >> +{ >> + rcu_read_unlock(); >> +} > > > This

Re: [PATCH 23/24] ibnbd: a bit of documentation

2018-02-05 Thread Roman Penyaev
On Fri, Feb 2, 2018 at 4:55 PM, Bart Van Assche wrote: > On Fri, 2018-02-02 at 15:09 +0100, Roman Pen wrote: >> +Entries under /sys/kernel/ibnbd_client/ >> +=== >> [ ... ] > > You will need Greg KH's permission to add new entries

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Danil Kipnis
> >> Hi Bart, >> >> My another 2 cents:) >> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche >> wrote: >>> >>> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote: o Simple configuration of IBNBD: - Server side is completely passive: volumes do not need to

[PATCH V2 5/8] scsi: introduce force_blk_mq

2018-02-05 Thread Ming Lei
From the SCSI driver's view, it is a bit troublesome to support both blk-mq and non-blk-mq at the same time, especially when drivers need to support multiple hw queues. This patch introduces 'force_blk_mq' to scsi_host_template so that drivers can provide blk-mq only support, so driver code can avoid the

[PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-05 Thread Ming Lei
Hi All, This patchset supports global tags which was started by Hannes originally: https://marc.info/?l=linux-block=149132580511346=2 Also introduce 'force_blk_mq' and 'host_tagset' to 'struct scsi_host_template', so that drivers can avoid supporting two IO paths (legacy and blk-mq),

[PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags'

2018-02-05 Thread Ming Lei
This patch introduces the parameter 'g_global_tags' so that we can test this feature with null_blk easily. No obvious performance drop is seen with global_tags when the whole hw depth is kept the same: 1) no 'global_tags', each hw queue depth is 1, and 4 hw queues modprobe null_blk queue_mode=2

[PATCH V2 3/8] scsi: Add template flag 'host_tagset'

2018-02-05 Thread Ming Lei
From: Hannes Reinecke Add a host template flag 'host_tagset' to enable the use of a global tagset for block-mq. Cc: Hannes Reinecke Cc: Arun Easi Cc: Omar Sandoval , Cc: "Martin K. Petersen" , Cc:

[PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS

2018-02-05 Thread Ming Lei
Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple reply queues, but the tag space is often HBA-wide. These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) for automatic affinity assignment. Now 84676c1f21e8ff5 (genirq/affinity: assign vectors to all possible CPUs)

[PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer

2018-02-05 Thread Ming Lei
This patch changes tags->breserved_tags, tags->bitmap_tags and tags->active_queues to pointers, and prepares for supporting global tags. No functional change. Tested-by: Laurence Oberman Reviewed-by: Hannes Reinecke Cc: Mike Snitzer Cc:

[PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue

2018-02-05 Thread Ming Lei
This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meanwhile mapping each reply queue to a blk_mq hw queue, so .queuecommand can always choose the hw queue as the reply queue. And if no online CPU is mapped to a hw queue, requests can't be submitted to that hw queue at all, so finally the

[PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host

2018-02-05 Thread Ming Lei
So that we can decide the default reply queue by the map created while adding the host. Cc: Hannes Reinecke Cc: Arun Easi Cc: Omar Sandoval , Cc: "Martin K. Petersen" , Cc: James Bottomley

[PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity

2018-02-05 Thread Ming Lei
Now 84676c1f21e8ff5 (genirq/affinity: assign vectors to all possible CPUs) has been merged to v4.16-rc, and it is easy for some irq vectors to be assigned only offline CPUs; this can't be avoided even though the allocation is improved. For example, on an 8-core VM, CPUs 4~7 are not-present/offline, 4 queues

RE: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue

2018-02-05 Thread Don Brace
> -Original Message- > This is a critical issue on the HPSA because Linus already has the > original commit that causes the system to fail to boot. > > All my testing was on DL380 G7 servers with: > > Hewlett-Packard Company Smart Array G6 controllers > Vendor: HP Model: P410i

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Bart Van Assche
On Mon, 2018-02-05 at 09:56 +0100, Jinpu Wang wrote: > Hi Bart, > > My another 2 cents:) > On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche > wrote: > > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote: > > > o Simple configuration of IBNBD: > > >- Server side is

Re: [PATCH 05/24] ibtrs: client: main functionality

2018-02-05 Thread Bart Van Assche
On Mon, 2018-02-05 at 15:19 +0100, Roman Penyaev wrote: > On Mon, Feb 5, 2018 at 12:19 PM, Sagi Grimberg wrote: > > Do you actually ever have remote write access in your protocol? > > We do not have reads, instead client writes on write and server writes > on read. (write only

Re: [PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity

2018-02-05 Thread Paolo Bonzini
On 05/02/2018 16:20, Ming Lei wrote: > Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs) > has been merged to V4.16-rc, and it is easy to allocate all offline CPUs > for some irq vectors, this can't be avoided even though the allocation > is improved. > > For example, on a

Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue

2018-02-05 Thread Laurence Oberman
On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote: > This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime > maps > each reply queue to blk_mq's hw queue, then .queuecommand can always > choose the hw queue as the reply queue. And if no any online CPU is > mapped to one hw queue,

vgdisplay hang on iSCSI session

2018-02-05 Thread Jean-Louis Dupond
Hi All, We've got a "strange" issue on a Xen hypervisor with CentOS 6 and a 4.9.63-29.el6.x86_6 kernel. The system has a local raid and is connected with 2 iSCSI sessions to 3 disks with multipath (6 blockdevs in total). We've noticed that vgdisplay was hanging, and the kernel was printing

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Jinpu Wang
On Mon, Feb 5, 2018 at 5:16 PM, Bart Van Assche wrote: > On Mon, 2018-02-05 at 09:56 +0100, Jinpu Wang wrote: >> Hi Bart, >> >> My another 2 cents:) >> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche >> wrote: >> > On Fri, 2018-02-02 at 15:08

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Danil Kipnis
On Mon, Feb 5, 2018 at 3:17 PM, Sagi Grimberg wrote: > Hi Bart, My another 2 cents:) On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote: > > > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote: >> >>

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Bart Van Assche
On Mon, 2018-02-05 at 14:16 +0200, Sagi Grimberg wrote: > - Your latency measurements are surprisingly high for a null target >device (even for low end nvme device actually) regardless of the >transport implementation. > > For example: > - QD=1 read latency is 648.95 for ibnbd (I assume

Re: [PATCH 05/24] ibtrs: client: main functionality

2018-02-05 Thread Roman Penyaev
On Mon, Feb 5, 2018 at 3:14 PM, Sagi Grimberg wrote: > >> Indeed, seems sbitmap can be reused. >> >> But tags is a part of IBTRS, and is not related to block device at all. >> One >> IBTRS connection (session) handles many block devices > > > we use host shared tag sets for the

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Roman Penyaev
Hi Bart, On Mon, Feb 5, 2018 at 5:58 PM, Bart Van Assche wrote: > On Mon, 2018-02-05 at 14:16 +0200, Sagi Grimberg wrote: >> - Your latency measurements are surprisingly high for a null target >>device (even for low end nvme device actually) regardless of the >>

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Bart Van Assche
On Mon, 2018-02-05 at 18:16 +0100, Roman Penyaev wrote: > Everything (fio jobs, setup, etc) is given in the same link: > > https://www.spinics.net/lists/linux-rdma/msg48799.html > > at the bottom you will find links on google docs with many pages > and archived fio jobs and scripts. (I do not

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Jinpu Wang
On Fri, Feb 2, 2018 at 5:40 PM, Doug Ledford wrote: > On Fri, 2018-02-02 at 16:07 +, Bart Van Assche wrote: >> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote: >> > Since the first version the following was changed: >> > >> >- Load-balancing and IO fail-over using

Re: blkdev loop UAF

2018-02-05 Thread Jan Kara
On Thu 01-02-18 19:58:45, Eric Biggers wrote: > On Thu, Jan 11, 2018 at 06:00:08PM +0100, Jan Kara wrote: > > On Thu 11-01-18 19:22:39, Hou Tao wrote: > > > Hi, > > > > > > On 2018/1/11 16:24, Dan Carpenter wrote: > > > > Thanks for your report and the patch. I am sending it to the > > > >

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Jinpu Wang
Hi Bart, My another 2 cents:) On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote: > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote: >> o Simple configuration of IBNBD: >>- Server side is completely passive: volumes do not need to be >> explicitly exported.

Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-05 Thread Bart Van Assche
On Sat, 2018-02-03 at 10:51 +0800, Joseph Qi wrote: > Hi Bart, > > On 18/2/3 00:21, Bart Van Assche wrote: > > On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote: > > > We triggered this race when using single queue. I'm not sure if it > > > exists in multi-queue. > > > > Regarding the races

Re: [PATCH 3/4] lightnvm: add 2.0 geometry identification

2018-02-05 Thread Randy Dunlap
On 02/05/2018 04:15 AM, Matias Bjørling wrote: > Implement the geometry data structures for 2.0 and enable a drive > to be identified as one, including exposing the appropriate 2.0 > sysfs entries. > > Signed-off-by: Matias Bjørling > --- > drivers/lightnvm/core.c | 2

Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-05 Thread Bart Van Assche
On 02/05/18 08:40, Danil Kipnis wrote: It just occurred to me that we could easily extend the interface in such a way that each client (i.e. each session) would have her own directory on the server side with the devices it can access. I.e. instead of just "dev_search_path" per server, any client

RE: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue

2018-02-05 Thread Don Brace
> -Original Message- > From: Laurence Oberman [mailto:lober...@redhat.com] > Sent: Monday, February 05, 2018 9:58 AM > To: Ming Lei ; Jens Axboe ; linux- > bl...@vger.kernel.org; Christoph Hellwig ; Mike Snitzer >

Re: [PATCH] wbt: fix incorrect throttling due to flush latency

2018-02-05 Thread Jens Axboe
On 2/5/18 1:00 PM, Mikulas Patocka wrote: > > > On Mon, 5 Feb 2018, Jens Axboe wrote: > >> On 2/5/18 12:22 PM, Jens Axboe wrote: >>> On 2/5/18 12:11 PM, Mikulas Patocka wrote: I have a workload where one process sends many asynchronous write bios (without waiting for them) and another

RE: [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags'

2018-02-05 Thread Don Brace
> This patch introduces the parameter of 'g_global_tags' so that we can > test this feature by null_blk easiy. > > Not see obvious performance drop with global_tags when the whole hw > depth is kept as same: > > 1) no 'global_tags', each hw queue depth is 1, and 4 hw queues > modprobe null_blk

Re: [PATCH] wbt: fix incorrect throttling due to flush latency

2018-02-05 Thread Jens Axboe
On 2/5/18 12:11 PM, Mikulas Patocka wrote: > I have a workload where one process sends many asynchronous write bios > (without waiting for them) and another process sends synchronous flush > bios. During this workload, writeback throttling throttles down to one > outstanding bio, and this

RE: [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host

2018-02-05 Thread Don Brace
> -Original Message- > From: Ming Lei [mailto:ming@redhat.com] > Sent: Monday, February 05, 2018 9:21 AM > To: Jens Axboe ; linux-block@vger.kernel.org; Christoph > Hellwig ; Mike Snitzer > Cc: linux-s...@vger.kernel.org;

[PATCH] wbt: fix incorrect throttling due to flush latency

2018-02-05 Thread Mikulas Patocka
I have a workload where one process sends many asynchronous write bios (without waiting for them) and another process sends synchronous flush bios. During this workload, writeback throttling throttles down to one outstanding bio, and this incorrect throttling causes performance degradation (all

Re: [PATCH] wbt: fix incorrect throttling due to flush latency

2018-02-05 Thread Jens Axboe
On 2/5/18 12:22 PM, Jens Axboe wrote: > On 2/5/18 12:11 PM, Mikulas Patocka wrote: >> I have a workload where one process sends many asynchronous write bios >> (without waiting for them) and another process sends synchronous flush >> bios. During this workload, writeback throttling throttles down

[PATCH v5 05/10] bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags

2018-02-05 Thread Coly Li
When too many I/Os fail on the cache device, bch_cache_set_error() is called in the error handling code path to retire the whole problematic cache set. If new I/O requests continue to come in and take the refcount dc->count, the cache set won't be retired immediately; this is a problem. Furthermore, there

[PATCH v5 07/10] bcache: fix inaccurate io state for detached bcache devices

2018-02-05 Thread Coly Li
From: Tang Junhui When we run IO on a detached device and run iostat to show IO status, normally it will show like below (some fields omitted): Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util sdd ... 15.89 0.53 1.82 0.20 2.23

[PATCH v5 08/10] bcache: add backing_request_endio() for bi_end_io of attached backing device I/O

2018-02-05 Thread Coly Li
In order to catch I/O errors of the backing device, a separate bi_end_io callback is required. Then a per-backing-device counter can record the number of I/O errors and retire the backing device if the counter reaches a per-backing-device I/O error limit. This patch adds backing_request_endio() to bcache

[PATCH v5 03/10] bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set

2018-02-05 Thread Coly Li
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()", cached_dev_get() is called when creating dc->writeback_thread, and cached_dev_put() is called when exiting dc->writeback_thread. This modification works well unless people detach the bcache device manually by 'echo 1 >

[PATCH v5 04/10] bcache: stop dc->writeback_rate_update properly

2018-02-05 Thread Coly Li
struct delayed_work writeback_rate_update in struct cache_dev is a delayed worker to call function update_writeback_rate() periodically (the interval is defined by dc->writeback_rate_update_seconds). When a metadata I/O error happens on the cache device, the bcache error handling routine

[PATCH v5 01/10] bcache: set writeback_rate_update_seconds in range [1, 60] seconds

2018-02-05 Thread Coly Li
dc->writeback_rate_update_seconds can be set via sysfs and its value can be set to [1, ULONG_MAX]. It does not make sense to set such a large value; 60 seconds is long enough, considering that the default 5 seconds has worked well for a long time. Because dc->writeback_rate_update is a special delayed

[PATCH v5 02/10] bcache: fix cached_dev->count usage for bch_cache_set_error()

2018-02-05 Thread Coly Li
When bcache metadata I/O fails, bcache will call bch_cache_set_error() to retire the whole cache set. The expected behavior to retire a cache set is to unregister the cache set, unregister all backing devices attached to it, and then remove the sysfs entries of the cache set and all

[PATCH v5 00/10] bcache: device failure handling improvement

2018-02-05 Thread Coly Li
Hi maintainers and folks, This patch set tries to improve bcache device failure handling, including cache device and backing device failures. The basic idea to handle a failed cache device is: - Unregister the cache set - Detach all backing devices which are attached to this cache set - Stop all the

[PATCH v5 09/10] bcache: add io_disable to struct cached_dev

2018-02-05 Thread Coly Li
If a bcache device is configured in writeback mode, current code does not handle write I/O errors on backing devices properly. In writeback mode, a write request is written to the cache device, and later flushed to the backing device. If I/O failed when writing from the cache device to the backing

[PATCH v5 10/10] bcache: stop bcache device when backing device is offline

2018-02-05 Thread Coly Li
Currently bcache does not handle backing device failure: if the backing device is offline and disconnected from the system, its bcache device can still be accessible. If the bcache device is in writeback mode, I/O requests can even succeed if they hit the cache device. That is to say, when and how

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Mike Galbraith
Hi Paolo, I applied this to master today, flipped udev back to bfq and took it for a spin. Unfortunately, the box fairly quickly went boom under load. [ 454.739975] [ cut here ] [ 454.739979] list_add corruption. prev->next should be next (5f99a42a), but was

Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue

2018-02-05 Thread chenxiang (M)
On 2018/2/5 23:20, Ming Lei wrote: This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime maps each reply queue to blk_mq's hw queue, then .queuecommand can always choose the hw queue as the reply queue. And if no any online CPU is mapped to one hw queue, request can't be submitted to

[PATCH 2/2] block, char_dev: Use correct format specifier for unsigned ints

2018-02-05 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat register_blkdev() and __register_chrdev_region() treat the major number as an unsigned int. So print it the same way to avoid absurd error statements such as: "... major requested (-1) is greater than the maximum (511) ..." (and also fix off-by-one

[PATCH 1/2] char_dev: Fix off-by-one bugs in find_dynamic_major()

2018-02-05 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat CHRDEV_MAJOR_DYN_END and CHRDEV_MAJOR_DYN_EXT_END are valid major numbers. So fix the loop iteration to include them in the search for free major numbers. While at it, also remove a redundant if condition ("cd->major != i"), as it will never be

[PATCH rfc 1/5] irq-am: Introduce library implementing generic adaptive moderation

2018-02-05 Thread Sagi Grimberg
irq-am library helps I/O devices implement interrupt moderation in an adaptive fashion, based on online stats. The consumer can initialize an irq-am context with a callback that performs the device specific moderation programming and also the number of am (adaptive moderation) levels which are

[PATCH rfc 3/5] irq_poll: wire up irq_am

2018-02-05 Thread Sagi Grimberg
Update online stats for fired event and completions on each poll cycle. Also expose am initialization interface. The irqpoll consumer will initialize the irq-am context of the irq-poll context. Signed-off-by: Sagi Grimberg --- include/linux/irq_poll.h | 9 +

[PATCH rfc 4/5] IB/cq: add adaptive moderation support

2018-02-05 Thread Sagi Grimberg
Currently activated via modparam, obviously we will want to find a more generic way to control this. Signed-off-by: Sagi Grimberg --- drivers/infiniband/core/cq.c | 48 1 file changed, 48 insertions(+) diff --git

[PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-05 Thread Sagi Grimberg
Adaptive IRQ moderation (also called adaptive IRQ coalescing) has been widely used in the networking stack for over 20 years and has become a standard default setting. Adaptive moderation is a feature supported by the device to delay an interrupt for either a period of time or a number of

[PATCH rfc 5/5] IB/cq: wire up adaptive moderation to workqueue based completion queues

2018-02-05 Thread Sagi Grimberg
Signed-off-by: Sagi Grimberg --- drivers/infiniband/core/cq.c | 25 - include/rdma/ib_verbs.h | 9 +++-- 2 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c index

[PATCH rfc 2/5] irq-am: add some debugfs exposure on tuning state

2018-02-05 Thread Sagi Grimberg
Useful for local debugging Signed-off-by: Sagi Grimberg --- include/linux/irq-am.h | 2 + lib/irq-am.c | 109 + 2 files changed, 111 insertions(+) diff --git a/include/linux/irq-am.h b/include/linux/irq-am.h index

Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-05 Thread Or Gerlitz
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote: [...] > In the networking stack, each device driver implements adaptive IRQ moderation > on its own. The approach here is a bit different, it tries to take the common > denominator, > which is per-queue statistics gathering

[PATCH] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir

2018-02-05 Thread Joseph Qi
We've triggered a WARNING in blk_throtl_bio when throttling writeback io, which complains that blkg->refcnt is already 0 when calling blkg_get, and then the kernel crashes with an invalid page request. After investigating this issue, we've found there is a race between blkcg_bio_issue_check and cgroup_rmdir.

Re: [PATCH rfc 1/5] irq-am: Introduce library implementing generic adaptive moderation

2018-02-05 Thread Or Gerlitz
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote: > irq-am library helps I/O devices implement interrupt moderation in > an adaptive fashion, based on online stats. > > The consumer can initialize an irq-am context with a callback that > performs the device specific

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Oleksandr Natalenko
Hi, Paolo. I can confirm that this patch fixes the cfdisk hang for me. I've also tried to trigger the issue Mike has encountered, but with no luck (maybe I wasn't insistent enough; I was just doing dd on a usb-storage device in the VM). So, with regard to the cfdisk hang on usb-storage: Tested-by:

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Mike Galbraith
On Tue, 2018-02-06 at 08:44 +0100, Oleksandr Natalenko wrote: > Hi, Paolo. > > I can confirm that this patch fixes cfdisk hang for me. I've also tried > to trigger the issue Mike has encountered, but with no luck (maybe, I > wasn't insistent enough, just was doing dd on usb-storage device in

Re: [PATCH] blk-mq: Fix a race between resetting the timer and completion handling

2018-02-05 Thread Tejun Heo
Hello, Bart. Thanks a lot for testing and fixing the issues, but I'm a bit confused by the patch. Maybe we can split the patch a bit more? There seem to be three things going on: 1. Changing preemption protection to irq protection in the issue path. 2. Merge of aborted_gstate_sync and gstate_seq. 3.

Re: [PATCH] blk-mq: Fix a race between resetting the timer and completion handling

2018-02-05 Thread Bart Van Assche
On Mon, 2018-02-05 at 13:06 -0800, Tejun Heo wrote: > Thanks a lot for testing and fixing the issues but I'm a bit confused > by the patch. Maybe we can split patch a bit more? There seem to be > three things going on, > > 1. Changing preemption protection to irq protection in issue path. > >

Re: v4.15 and I/O hang with BFQ

2018-02-05 Thread Paolo Valente
> On 30 Jan 2018, at 16:40, Paolo Valente > wrote: > > > >> On 30 Jan 2018, at 15:40, Ming Lei wrote: >> >> On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote: >>> Hi. >>> >> ... >>>

Re: [PATCH] wbt: fix incorrect throttling due to flush latency

2018-02-05 Thread Mikulas Patocka
On Mon, 5 Feb 2018, Jens Axboe wrote: > On 2/5/18 12:22 PM, Jens Axboe wrote: > > On 2/5/18 12:11 PM, Mikulas Patocka wrote: > >> I have a workload where one process sends many asynchronous write bios > >> (without waiting for them) and another process sends synchronous flush > >> bios. During

Re: [PATCH] blk-mq: Fix a race between resetting the timer and completion handling

2018-02-05 Thread t...@kernel.org
Hello, Bart. On Mon, Feb 05, 2018 at 09:33:03PM +, Bart Van Assche wrote: > My goal with this patch is to fix the race between resetting the timer and > the completion path. Hence change (3). Changes (1) and (2) are needed to > make the changes in blk_mq_rq_timed_out() work. Ah, I see. That

[PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Paolo Valente
Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device be re-inserted into the active I/O scheduler for that device. As a consequence, I/O schedulers may get the same request inserted again, even several times,

[PATCH BUGFIX 0/1] block, bfq: handle requeues of I/O requests

2018-02-05 Thread Paolo Valente
Hi, just a note: the most difficult part of implementing this patch has been handling the fact that the requeue and finish hooks of the active elevator get invoked even for requests that are no longer referenced in that elevator. You can find details in the comments introduced by