On Fri, Feb 2, 2018 at 5:40 PM, Doug Ledford wrote:
> On Fri, 2018-02-02 at 16:07 +, Bart Van Assche wrote:
>> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> > Since the first version the following was changed:
>> >
>> >- Load-balancing and IO fail-over using multipath features wer
On Thu 01-02-18 19:58:45, Eric Biggers wrote:
> On Thu, Jan 11, 2018 at 06:00:08PM +0100, Jan Kara wrote:
> > On Thu 11-01-18 19:22:39, Hou Tao wrote:
> > > Hi,
> > >
> > > On 2018/1/11 16:24, Dan Carpenter wrote:
> > > > Thanks for your report and the patch. I am sending it to the
> > > > linux-
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote:
> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> o Simple configuration of IBNBD:
>>- Server side is completely passive: volumes do not need to be
>> explicitly exported.
>
> That sounds like a s
Hi Kashyap,
On Mon, Feb 05, 2018 at 12:35:13PM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Hannes Reinecke [mailto:h...@suse.de]
> > Sent: Monday, February 5, 2018 12:28 PM
> > To: Ming Lei; Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig;
> > Mike Snitzer
> > C
On Mon, Feb 05, 2018 at 07:58:29AM +0100, Hannes Reinecke wrote:
> On 02/03/2018 05:21 AM, Ming Lei wrote:
> > Hi All,
> >
> > This patchset supports global tags which was started by Hannes originally:
> >
> > https://marc.info/?l=linux-block&m=149132580511346&w=2
> >
> > Also introduce 'forc
On Mon, Feb 05, 2018 at 07:54:29AM +0100, Hannes Reinecke wrote:
> On 02/03/2018 05:21 AM, Ming Lei wrote:
> > Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
> > reply queues, but tags are often HBA-wide.
> >
> > These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_
Hi Roman,
Here are some comments below.
+int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+	struct ib_recv_wr wr, *bad_wr;
+
+	wr.next    = NULL;
+	wr.wr_cqe  = cqe;
+	wr.sg_list = NULL;
+	wr.num_sge = 0;
+
+	return ib_post_recv(con->qp, &wr, &bad_wr);
+}
Hi Roman,
+struct ibtrs_clt_io_req {
+	struct list_head	list;
+	struct ibtrs_iu		*iu;
+	struct scatterlist	*sglist; /* list holding user data */
+	unsigned int		sg_cnt;
+	unsigned int		sg_size;
+	unsigned int
Hi Roman,
+static inline void ibtrs_clt_state_lock(void)
+{
+ rcu_read_lock();
+}
+
+static inline void ibtrs_clt_state_unlock(void)
+{
+ rcu_read_unlock();
+}
This looks rather pointless...
+
+#define cmpxchg_min(var, new) ({ \
+ typeo
Hi Roman,
This is the sysfs interface to IBTRS sessions on client side:
/sys/kernel/ibtrs_client/<session-name>/
*** IBTRS session created by ibtrs_clt_open() API call
|
|- max_reconnect_attempts
| *** number of reconnect attempts for session
|
|- add_path
| *** adds a
Hi Roman,
Some comments below.
On 02/02/2018 04:08 PM, Roman Pen wrote:
This is main functionality of ibtrs-server module, which accepts
set of RDMA connections (so called IBTRS session), creates/destroys
sysfs entries associated with IBTRS session and notifies upper layer
(user of IBTRS API) a
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote:
On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
o Simple configuration of IBNBD:
- Server side is completely passive: volumes do not need to be
explicitly exported.
That sounds like a securit
From: Tang Junhui
The back-end device sdm has already been attached to a cache set with ID
f67ebe1f-f8bc-4d73-bfe5-9dc88607f119; then we try to attach it to
another cache set, and it returns an error:
[root]# cd /sys/block/sdm/bcache
[root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach
-bash: echo: wr
There are no groups in the 2.0 specification, so make sure that the
nvm_id structure is flattened before the 2.0 data structures are added.
Signed-off-by: Matias Bjørling
---
drivers/lightnvm/core.c | 25 ++-
drivers/nvme/host/lightnvm.c | 100 +--
Implement the geometry data structures for 2.0 and enable a drive
to be identified as one, including exposing the appropriate 2.0
sysfs entries.
Signed-off-by: Matias Bjørling
---
drivers/lightnvm/core.c | 2 +-
drivers/nvme/host/lightnvm.c | 334 +-
Hi,
A couple of patches for 2.0 support for the lightnvm subsystem. They
form the basis for integrating 2.0 support.
For the rest of the support, Javier has code that implements report
chunk and sets up the LBA format data structure. He also has a bunch
of patches that bring pblk up to speed.
T
Hi Roman and the team,
On 02/02/2018 04:08 PM, Roman Pen wrote:
This series introduces IBNBD/IBTRS modules.
IBTRS (InfiniBand Transport) is a reliable high speed transport library
which allows for establishing connection between client and server
machines via RDMA.
So it's not strictly InfiniBand
The nvme driver sets up the size of the nvme namespace in two steps.
First it initializes the device with standard logical block and
metadata sizes, and then sets the correct logical block and metadata
size. Because the OCSSD 2.0 specification relies on the namespace to
expose these sizes for correc
Make the 1.2 data structures explicit, so it will be easy to identify
the 2.0 data structures. Also fix the order in which the nvme_nvm_*
structures are declared, such that they follow the nvme_nvm_command order.
Signed-off-by: Matias Bjørling
---
drivers/nvme/host/lightnvm.c | 82 ++---
Hi Roman and the team (again), replying to my own email :)
I forgot to mention: first of all, thank you for upstreaming
your work! I fully support your goal of having your production driver
upstream to minimize your maintenance efforts. I hope that my
feedback didn't come across with a different
On Fri, Feb 2, 2018 at 4:11 PM, Jens Axboe wrote:
> On 2/2/18 7:08 AM, Roman Pen wrote:
>> This is main functionality of ibnbd-client module, which provides
>> interface to map remote device as local block device /dev/ibnbd
>> and feeds IBTRS with IO requests.
>
> Kill the legacy IO path for this,
On Fri, Feb 2, 2018 at 4:55 PM, Bart Van Assche wrote:
> On Fri, 2018-02-02 at 15:09 +0100, Roman Pen wrote:
>> +Entries under /sys/kernel/ibnbd_client/
>> +===
>> [ ... ]
>
> You will need Greg KH's permission to add new entries directly under
> /sys/kernel.
>
On Fri, Feb 2, 2018 at 5:54 PM, Bart Van Assche wrote:
> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> +static inline struct ibtrs_tag *
>> +__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type)
>> +{
>> + size_t max_depth = clt->queue_depth;
>> + struct ibtrs_t
>
>> Hi Bart,
>>
>> Another 2 cents from me :)
>> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
>> wrote:
>>>
>>> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
o Simple configuration of IBNBD:
- Server side is completely passive: volumes do not need to be
explicitly
Indeed, it seems sbitmap can be reused.
But tags are a part of IBTRS, and are not related to the block device at all. One
IBTRS connection (session) handles many block devices
we use host shared tag sets for the case of multiple block devices.
(or any IO producers).
Let's wait until we actually ha
/sys/kernel was chosen ages ago and I completely forgot to move it to configfs.
IBTRS is not a block device, so for some read-only entries (statistics
or states)
something else should probably be used, not configfs. Or is it fine
to read the state
of the connection from configfs? To me it sounds a b
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
wrote:
On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
o Simple configuration of IBNBD:
- Server side is completely passive: volumes do not need to be
explicitly exported.
That sounds like a se
Hi Sagi,
On Mon, Feb 5, 2018 at 12:19 PM, Sagi Grimberg wrote:
> Hi Roman,
>
>> +static inline void ibtrs_clt_state_lock(void)
>> +{
>> + rcu_read_lock();
>> +}
>> +
>> +static inline void ibtrs_clt_state_unlock(void)
>> +{
>> + rcu_read_unlock();
>> +}
>
>
> This looks rather pointle
Hi All,
This patchset supports global tags which was started by Hannes originally:
https://marc.info/?l=linux-block&m=149132580511346&w=2
Also introduce 'force_blk_mq' and 'host_tagset' in 'struct scsi_host_template',
so that drivers can avoid supporting two IO paths (legacy and blk-mq), es
This patch changes tags->breserved_tags, tags->bitmap_tags and
tags->active_queues into pointers, preparing for global tags support.
No functional change.
Tested-by: Laurence Oberman
Reviewed-by: Hannes Reinecke
Cc: Mike Snitzer
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
---
block/bfq
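A rough, hypothetical sketch of the idea (not the actual patch): the "before"
layout follows block/blk-mq-tag.h around v4.15, while the "after" layout is an
assumption based on the commit description, showing why pointer fields let all
hw queues of a host share one tag space:

#include <linux/atomic.h>
#include <linux/sbitmap.h>

/* Before: every hw queue embeds (and therefore owns) its own tag bitmaps. */
struct blk_mq_tags_embedded {
	unsigned int		nr_tags;
	unsigned int		nr_reserved_tags;
	atomic_t		active_queues;
	struct sbitmap_queue	bitmap_tags;
	struct sbitmap_queue	breserved_tags;
};

/*
 * After: the same fields become pointers, so every hw queue of a host can
 * point at one shared instance when a global/host-wide tagset is used.
 */
struct blk_mq_tags_shared {
	unsigned int		nr_tags;
	unsigned int		nr_reserved_tags;
	atomic_t		*active_queues;
	struct sbitmap_queue	*bitmap_tags;
	struct sbitmap_queue	*breserved_tags;
};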
Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
reply queues, but tags are often HBA-wide.
These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
for automatic affinity assignment.
Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
has
From: Hannes Reinecke
Add a host template flag 'host_tagset' to enable the use of a global
tagset for block-mq.
Cc: Hannes Reinecke
Cc: Arun Easi
Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace
Cc: Kashyap Desai
Cc: Peter Rivera
Cc:
This patch introduces the 'g_global_tags' parameter so that we can
test this feature with null_blk easily.
No obvious performance drop is seen with global_tags when the whole hw
depth is kept the same:
1) no 'global_tags', each hw queue depth is 1, and 4 hw queues
modprobe null_blk queue_mode=2 nr_de
From the SCSI driver's view, it is a bit troublesome to support both blk-mq
and non-blk-mq at the same time, especially when drivers need to support
multiple hw queues.
This patch introduces 'force_blk_mq' to scsi_host_template so that drivers
can provide blk-mq-only support, so driver code can avoid the t
Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
has been merged to V4.16-rc, and it is easy to allocate all offline CPUs
for some irq vectors, this can't be avoided even though the allocation
is improved.
For example, on a 8cores VM, 4~7 are not-present/offline, 4 queues
This allows us to decide the default reply queue by the map created
while adding the host.
Cc: Hannes Reinecke
Cc: Arun Easi
Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace
Cc: Kashyap Desai
Cc: Peter Rivera
Cc: Paolo Bonzini
Cc: Mike Snit
This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime maps
each reply queue to blk_mq's hw queue, then .queuecommand can always
choose the hw queue as the reply queue. And if no online CPU is
mapped to one hw queue, requests can't be submitted to this hw queue
at all, finally the irq
On 05/02/2018 16:20, Ming Lei wrote:
> Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
> has been merged to V4.16-rc, and it is easy to allocate all offline CPUs
> for some irq vectors, this can't be avoided even though the allocation
> is improved.
>
> For example, on a
On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime
> maps
> each reply queue to blk_mq's hw queue, then .queuecommand can always
> choose the hw queue as the reply queue. And if no online CPU is
> mapped to one hw queue, reques
Hi All,
We've got some "strange" issue on a Xen hypervisor with CentOS 6 and
4.9.63-29.el6.x86_6 kernel.
The system has a local raid + is connected with 2 iscsi sessions to 3
disks with multipath (6 blockdevs in total).
We've noticed that vgdisplay was hanging, and the kernel was printing
th
> -Original Message-
> This is a critical issue on the HPSA because Linus already has the
> original commit that causes the system to fail to boot.
>
> All my testing was on DL380 G7 servers with:
>
> Hewlett-Packard Company Smart Array G6 controllers
> Vendor: HP Model: P410i
On Mon, 2018-02-05 at 09:56 +0100, Jinpu Wang wrote:
> Hi Bart,
>
> Another 2 cents from me :)
> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
> wrote:
> > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
> > > o Simple configuration of IBNBD:
> > >- Server side is completely passive: volumes
On Mon, 2018-02-05 at 15:19 +0100, Roman Penyaev wrote:
> On Mon, Feb 5, 2018 at 12:19 PM, Sagi Grimberg wrote:
> > Do you actually ever have remote write access in your protocol?
>
> We do not have reads, instead client writes on write and server writes
> on read. (write only storage solution :)
On Mon, Feb 5, 2018 at 5:16 PM, Bart Van Assche wrote:
> On Mon, 2018-02-05 at 09:56 +0100, Jinpu Wang wrote:
>> Hi Bart,
>>
>> Another 2 cents from me :)
>> On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
>> wrote:
>> > On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>> > > o Simple configuration
On Mon, Feb 5, 2018 at 3:17 PM, Sagi Grimberg wrote:
>
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche
wrote:
>
>
> On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
>>
>>
>> o Simple configuration of IBNBD:
>>
On Mon, 2018-02-05 at 14:16 +0200, Sagi Grimberg wrote:
> - Your latency measurements are surprisingly high for a null target
>device (even for low end nvme device actually) regardless of the
>transport implementation.
>
> For example:
> - QD=1 read latency is 648.95 for ibnbd (I assume us
On Mon, Feb 5, 2018 at 3:14 PM, Sagi Grimberg wrote:
>
>> Indeed, seems sbitmap can be reused.
>>
>> But tags is a part of IBTRS, and is not related to block device at all.
>> One
>> IBTRS connection (session) handles many block devices
>
>
> we use host shared tag sets for the case of multiple bl
Hi Bart,
On Mon, Feb 5, 2018 at 5:58 PM, Bart Van Assche wrote:
> On Mon, 2018-02-05 at 14:16 +0200, Sagi Grimberg wrote:
>> - Your latency measurements are surprisingly high for a null target
>>device (even for low end nvme device actually) regardless of the
>>transport implementation.
>
On Mon, 2018-02-05 at 18:16 +0100, Roman Penyaev wrote:
> Everything (fio jobs, setup, etc) is given in the same link:
>
> https://www.spinics.net/lists/linux-rdma/msg48799.html
>
> at the bottom you will find links on google docs with many pages
> and archived fio jobs and scripts. (I do not rem
On Sat, 2018-02-03 at 10:51 +0800, Joseph Qi wrote:
> Hi Bart,
>
> On 18/2/3 00:21, Bart Van Assche wrote:
> > On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
> > > We triggered this race when using single queue. I'm not sure if it
> > > exists in multi-queue.
> >
> > Regarding the races betw
On 02/05/2018 04:15 AM, Matias Bjørling wrote:
> Implement the geometry data structures for 2.0 and enable a drive
> to be identified as one, including exposing the appropriate 2.0
> sysfs entries.
>
> Signed-off-by: Matias Bjørling
> ---
> drivers/lightnvm/core.c | 2 +-
> drivers/nvme/h
On 02/05/18 08:40, Danil Kipnis wrote:
It just occurred to me, that we could easily extend the interface in
such a way that each client (i.e. each session) would have on server
side her own directory with the devices it can access. I.e. instead of
just "dev_search_path" per server, any client wou
> -Original Message-
> From: Laurence Oberman [mailto:lober...@redhat.com]
> Sent: Monday, February 05, 2018 9:58 AM
> To: Ming Lei ; Jens Axboe ; linux-
> bl...@vger.kernel.org; Christoph Hellwig ; Mike Snitzer
> ; Don Brace
> Cc: linux-s...@vger.kernel.org; Hannes Reinecke ; Arun Easi
>
> -Original Message-
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Monday, February 05, 2018 9:21 AM
> To: Jens Axboe ; linux-block@vger.kernel.org; Christoph
> Hellwig ; Mike Snitzer
> Cc: linux-s...@vger.kernel.org; Hannes Reinecke ; Arun Easi
> ; Omar Sandoval ; Martin K .
> P
Hi,
just a note: the most difficult part in the implementation of this
patch has been how to handle the fact that the requeue and finish
hooks of the active elevator get invoked even for requests that are
no longer referenced in that elevator. You can find details in the
comments introduced by t
Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
be re-inserted into the active I/O scheduler for that device. As a
consequence, I/O schedulers may get the same request inserted again,
even several times, w
> Il giorno 30 gen 2018, alle ore 16:40, Paolo Valente
> ha scritto:
>
>
>
>> Il giorno 30 gen 2018, alle ore 15:40, Ming Lei ha
>> scritto:
>>
>> On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote:
>>> Hi.
>>>
>> ...
>>> systemd-udevd-271 [000] 4.311033: bfq
I have a workload where one process sends many asynchronous write bios
(without waiting for them) and another process sends synchronous flush
bios. During this workload, writeback throttling throttles down to one
outstanding bio, and this incorrect throttling causes performance
degradation (all wri
On 2/5/18 12:11 PM, Mikulas Patocka wrote:
> I have a workload where one process sends many asynchronous write bios
> (without waiting for them) and another process sends synchronous flush
> bios. During this workload, writeback throttling throttles down to one
> outstanding bio, and this incorrect
On 2/5/18 12:22 PM, Jens Axboe wrote:
> On 2/5/18 12:11 PM, Mikulas Patocka wrote:
>> I have a workload where one process sends many asynchronous write bios
>> (without waiting for them) and another process sends synchronous flush
>> bios. During this workload, writeback throttling throttles down t
On Mon, 5 Feb 2018, Jens Axboe wrote:
> On 2/5/18 12:22 PM, Jens Axboe wrote:
> > On 2/5/18 12:11 PM, Mikulas Patocka wrote:
> >> I have a workload where one process sends many asynchronous write bios
> >> (without waiting for them) and another process sends synchronous flush
> >> bios. During t
On 2/5/18 1:00 PM, Mikulas Patocka wrote:
>
>
> On Mon, 5 Feb 2018, Jens Axboe wrote:
>
>> On 2/5/18 12:22 PM, Jens Axboe wrote:
>>> On 2/5/18 12:11 PM, Mikulas Patocka wrote:
I have a workload where one process sends many asynchronous write bios
(without waiting for them) and another
> This patch introduces the 'g_global_tags' parameter so that we can
> test this feature with null_blk easily.
>
> No obvious performance drop is seen with global_tags when the whole hw
> depth is kept the same:
>
> 1) no 'global_tags', each hw queue depth is 1, and 4 hw queues
> modprobe null_blk qu
Hello, Bart.
Thanks a lot for testing and fixing the issues but I'm a bit confused
by the patch. Maybe we can split the patch a bit more? There seem to be
three things going on,
1. Changing preemption protection to irq protection in issue path.
2. Merge of aborted_gstate_sync and gstate_seq.
3. U
On Mon, 2018-02-05 at 13:06 -0800, Tejun Heo wrote:
> Thanks a lot for testing and fixing the issues but I'm a bit confused
> by the patch. Maybe we can split the patch a bit more? There seem to be
> three things going on,
>
> 1. Changing preemption protection to irq protection in issue path.
>
> 2
Hello, Bart.
On Mon, Feb 05, 2018 at 09:33:03PM +, Bart Van Assche wrote:
> My goal with this patch is to fix the race between resetting the timer and
> the completion path. Hence change (3). Changes (1) and (2) are needed to
> make the changes in blk_mq_rq_timed_out() work.
Ah, I see. That
Update online stats for fired events and completions on
each poll cycle.
Also expose an initialization interface. The irq-poll consumer will
initialize the irq-am context of the irq-poll context.
Signed-off-by: Sagi Grimberg
---
include/linux/irq_poll.h | 9 +
lib/Kconfig |
Currently activated via modparam, obviously we will want to
find a more generic way to control this.
Signed-off-by: Sagi Grimberg
---
drivers/infiniband/core/cq.c | 48
1 file changed, 48 insertions(+)
diff --git a/drivers/infiniband/core/cq.c b/driv
Adaptive IRQ moderation (also called adaptive IRQ coalescing) has been widely
used in the networking stack for over 20 years and has become a standard
default setting.
Adaptive moderation is a feature supported by the device to delay an interrupt
for either a period of time or a number of completions.
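To illustrate the concept only (a hypothetical, self-contained sketch of the
general idea, not the proposed irq-am API): based on how many completions a
queue observed since the last interrupt, move between moderation levels that
trade interrupt delay against latency:

#include <stdio.h>

struct am_level {
	unsigned int usecs;	/* delay the interrupt this long ...          */
	unsigned int comps;	/* ... or until this many completions arrived */
};

static const struct am_level levels[] = {
	{ .usecs = 0,  .comps = 1  },	/* latency-sensitive: no moderation */
	{ .usecs = 8,  .comps = 8  },
	{ .usecs = 16, .comps = 32 },
	{ .usecs = 32, .comps = 64 },	/* throughput-oriented */
};

#define NR_LEVELS (sizeof(levels) / sizeof(levels[0]))

/* Move one level up or down depending on the observed completion rate. */
static unsigned int am_next_level(unsigned int cur, unsigned int comps_per_poll)
{
	if (comps_per_poll > levels[cur].comps && cur < NR_LEVELS - 1)
		return cur + 1;		/* busy: moderate more aggressively */
	if (comps_per_poll < levels[cur].comps / 2 && cur > 0)
		return cur - 1;		/* quiet: back off toward low latency */
	return cur;
}

int main(void)
{
	unsigned int level = 0;
	unsigned int samples[] = { 2, 20, 70, 90, 10, 1 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		level = am_next_level(level, samples[i]);
		printf("comps=%u -> level %u (%u us / %u comps)\n",
		       samples[i], level, levels[level].usecs,
		       levels[level].comps);
	}
	return 0;
}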
Signed-off-by: Sagi Grimberg
---
drivers/infiniband/core/cq.c | 25 -
include/rdma/ib_verbs.h | 9 +++--
2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index 270801d28f9d..1d984fd77449 1
Useful for local debugging
Signed-off-by: Sagi Grimberg
---
include/linux/irq-am.h | 2 +
lib/irq-am.c | 109 +
2 files changed, 111 insertions(+)
diff --git a/include/linux/irq-am.h b/include/linux/irq-am.h
index 5ddd5ca268aa..18df315
The irq-am library helps I/O devices implement interrupt moderation in
an adaptive fashion, based on online stats.
The consumer can initialize an irq-am context with a callback that
performs the device-specific moderation programming, and also the number
of am (adaptive moderation) levels, which are als
On 2018/2/5 23:20, Ming Lei wrote:
This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime maps
each reply queue to blk_mq's hw queue, then .queuecommand can always
choose the hw queue as the reply queue. And if no online CPU is
mapped to one hw queue, requests can't be submitted to this
From: Srivatsa S. Bhat
register_blkdev() and __register_chrdev_region() treat the major
number as an unsigned int. So print it the same way to avoid
absurd error statements such as:
"... major requested (-1) is greater than the maximum (511) ..."
(and also fix off-by-one bugs in the error prints)
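A minimal user-space illustration of the point (hypothetical values, not the
kernel code being patched): the comparison treats the major as unsigned, so the
message should print it the same way:

#include <stdio.h>

int main(void)
{
	unsigned int major = (unsigned int)-1;	/* e.g. a bogus/huge request */
	unsigned int max_major = 511;

	if (major > max_major) {
		/* Printing the unsigned value as signed contradicts the check: */
		printf("major requested (%d) is greater than the maximum (%u)\n",
		       (int)major, max_major);
		/* Printing it as unsigned keeps the message consistent: */
		printf("major requested (%u) is greater than the maximum (%u)\n",
		       major, max_major);
	}
	return 0;
}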
From: Srivatsa S. Bhat
CHRDEV_MAJOR_DYN_END and CHRDEV_MAJOR_DYN_EXT_END are valid major
numbers. So fix the loop iteration to include them in the search for
free major numbers.
While at it, also remove a redundant if condition ("cd->major != i"),
as it will never be true.
Signed-off-by: Srivat
Hi maintainers and folks,
This patch set tries to improve bcache device failure handling, including
cache device and backing device failures.
The basic idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices which are attached to this cache set
- Stop all the det
struct delayed_work writeback_rate_update in struct cache_dev is a delayed
worker that calls update_writeback_rate() periodically (the interval is
defined by dc->writeback_rate_update_seconds).
When a metadata I/O error happens on the cache device, the bcache error
handling routine bch_cache_set_error(
dc->writeback_rate_update_seconds can be set via sysfs and its value can
be set to [1, ULONG_MAX]. It does not make sense to set such a large
value; 60 seconds is a long enough value considering that the default 5 seconds
has worked well for a long time.
Because dc->writeback_rate_update is a special delayed w
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
cached_dev_get() is called when creating dc->writeback_thread, and
cached_dev_put() is called when exiting dc->writeback_thread. This
modification works well unless people detach the bcache device manually by
'echo 1 > /s
When bcache metadata I/O fails, bcache will call bch_cache_set_error()
to retire the whole cache set. The expected behavior to retire a cache
set is to unregister the cache set, unregister all backing devices
attached to this cache set, then remove the sysfs entries of the cache set
and all attached
From: Tang Junhui
When we run IO on a detached device and run iostat to show the IO status,
normally it will show like below (some fields omitted):
Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd     ...    15.89     0.53  1.82    0.20    2.23  1.81  52.30
bcache0..
When there are too many I/O errors on cache device, current bcache code
will retire the whole cache set, and detach all bcache devices. But the
detached bcache devices are not stopped, which is problematic when bcache
is in writeback mode.
If the retired cache set has dirty data of backing devices
In order to catch I/O errors of the backing device, a separate bi_end_io
callback is required. Then a per-backing-device counter can record the
number of I/O errors and retire the backing device if the counter reaches a
per-backing-device I/O error limit.
This patch adds backing_request_endio() to bcache bac
When too many I/Os fail on the cache device, bch_cache_set_error() is called
in the error handling code path to retire the whole problematic cache set. If
new I/O requests continue to come and take the refcount dc->count, the cache
set won't be retired immediately; this is a problem.
Furthermore, there are
Currently bcache does not handle backing device failure: if the backing
device is offline and disconnected from the system, its bcache device is still
accessible. If the bcache device is in writeback mode, I/O requests can even
succeed if they hit the cache device. That is to say, when and
how b
Hi Paolo,
I applied this to master today, flipped udev back to bfq and took it
for a spin. Unfortunately, the box fairly quickly went boom under load.
[ 454.739975] [ cut here ]
[ 454.739979] list_add corruption. prev->next should be next
(5f99a42a), but was
If a bcache device is configured in writeback mode, the current code does not
handle write I/O errors on backing devices properly.
In writeback mode, a write request is written to the cache device and
later flushed to the backing device. If I/O fails when writing from the
cache device to the backing device
> > We still have more than one reply queue ending up completing on one CPU.
>
> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) has to be used, that means
> smp_affinity_enable has to be set to 1, but it seems that is the default
> setting.
>
> Please see kernel/irq/affinity.c, especially irq_calc_affinity_vectors(
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote:
[...]
> In the networking stack, each device driver implements adaptive IRQ moderation
> on its own. The approach here is a bit different: it tries to take the common
> denominator,
> which is per-queue statistics gathering and workload change
We've triggered a WARNING in blk_throtl_bio when throttling writeback
IO, which complains that blkg->refcnt is already 0 when calling blkg_get, and
then the kernel crashes with an invalid page request.
After investigating this issue, we've found there is a race between
blkcg_bio_issue_check and cgroup_rmdir. T
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote:
> irq-am library helps I/O devices implement interrupt moderation in
> an adaptive fashion, based on online stats.
>
> The consumer can initialize an irq-am context with a callback that
> performs the device specific moderation programming and
Hi, Paolo.
I can confirm that this patch fixes cfdisk hang for me. I've also tried
to trigger the issue Mike has encountered, but with no luck (maybe I
wasn't insistent enough; I was just doing dd on a usb-storage device in the
VM).
So, with regard to cfdisk hang on usb-storage:
Tested-by: Ole
On Tue, 2018-02-06 at 08:44 +0100, Oleksandr Natalenko wrote:
> Hi, Paolo.
>
> I can confirm that this patch fixes cfdisk hang for me. I've also tried
> to trigger the issue Mike has encountered, but with no luck (maybe, I
> wasn't insistent enough, just was doing dd on usb-storage device in the