Re: [PATCH v5] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-11 Thread Bart Van Assche
On Wed, 2018-04-11 at 10:11 +0800, Ming Lei wrote: > On Tue, Apr 10, 2018 at 03:01:57PM -0600, Bart Van Assche wrote: > > The blk-mq timeout handling code ignores completions that occur after > > blk_mq_check_expired() has been called and before blk_mq_rq_timed_out()

Re: [PATCH v5] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-11 Thread Bart Van Assche
On Wed, 2018-04-11 at 16:19 +0300, Sagi Grimberg wrote: > > static void __blk_mq_requeue_request(struct request *rq) > > { > > struct request_queue *q = rq->q; > > + enum mq_rq_state old_state = blk_mq_rq_state(rq); > > > > blk_mq_put_driver_tag(rq); > > > > trace_block_rq_r

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-11 Thread Bart Van Assche
On Tue, 2018-04-10 at 14:54 -0700, t...@kernel.org wrote: > Ah, yeah, I was moving it out of add_timer but forgot to actully add > it to the issue path. Fixed patch below. > > BTW, no matter what we do w/ the request handover between normal and > timeout paths, we'd need something similar. Other

Re: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 20:10 -0600, Jens Axboe wrote: > On 4/10/18 8:05 PM, Ming Lei wrote: > > On Tue, Apr 10, 2018 at 02:30:35PM -0600, Bart Van Assche wrote: > > > Because blkcg_exit_queue() is now called from inside blk_cleanup_queue() > > > it is no longer safe t

Re: [PATCH v2] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

2018-04-10 Thread Bart Van Assche
On Wed, 2018-04-11 at 10:12 +0800, Ming Lei wrote: > On Tue, Apr 10, 2018 at 02:45:54PM -0600, Bart Van Assche wrote: > > + struct request_queue *q = bio->bi_disk->queue; > > blk_qc_t ret = BLK_QC_T_NONE; > > > > + if (blk_queue_enter(q, flags) < 0)

[PATCH v3] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

2018-04-10 Thread Bart Van Assche
: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") Signed-off-by: Bart Van Assche Cc: Ming Lei Cc: Joseph Qi --- Changes compared to v2: converted two ternary expressions into if-statements. Changes compared to v1: guarded the blk_queue_exi

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 14:33 -0700, t...@kernel.org wrote: > + else > + rq->missed_completion = true; In this patch I found code that sets rq->missed_completion but no code that clears it. Did I perhaps miss something? Thanks, Bart.

[PATCH v5] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-10 Thread Bart Van Assche
() and also the synchronize_rcu() call in the timeout handler. Signed-off-by: Bart Van Assche Cc: Tejun Heo Cc: Christoph Hellwig Cc: Ming Lei Cc: Sagi Grimberg Cc: Israel Rukshin , Cc: Max Gurtovoy Cc: # v4.16 --- Changes compared to v4: - Addressed multiple review comments from Christoph. The m

[PATCH v2] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

2018-04-10 Thread Bart Van Assche
: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") Signed-off-by: Bart Van Assche Cc: Ming Lei Cc: Joseph Qi --- Changes compared to v1: changed the blk_queue_exit() inside the loop with "if (q)". b

[PATCH] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

2018-04-10 Thread Bart Van Assche
: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") Signed-off-by: Bart Van Assche Cc: Ming Lei Cc: Joseph Qi --- block/blk-core.c | 32 ++-- 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/block/bl

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 22:30 +0800, Ming Lei wrote: > On Tue, Apr 10, 2018 at 02:09:33PM +0000, Bart Van Assche wrote: > > Please keep in mind that all synchronize_rcu() does is to wait for pre- > > existing RCU readers to finish. synchronize_rcu() does not prevent that new

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 07:20 -0700, Tejun Heo wrote: > On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > > Since the request state can be updated from two different contexts, > > namely regular completion and request timeout, this race cannot be > > fixed wit

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 21:55 +0800, Ming Lei wrote: > Then I have same question with Jianchao, what is the actual double > complete in linus tree between BLK_EH_RESET_TIMER and normal completion? > > Follows my understanding: > > 1) when timeout is detected on one request, its aborted_gstate is >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 11:55 +0200, Christoph Hellwig wrote: > I don't think we need the atomic_long_cmpxchg, and can do with a plain > cmpxhg. Also unterminated cmpxchg loops are a bad idea, but I think > both callers are protected from other changes so we can turn that > into a WARN_ON(). Hello
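The single-shot compare-and-exchange suggested in the quoted review can be illustrated with a minimal userspace sketch. It uses C11 atomics rather than the kernel's cmpxchg(), and the state names, struct request_sketch, and rq_change_state() are hypothetical, not taken from the patch. The point is only that a state transition is attempted once and the caller learns whether it won, instead of looping until the exchange succeeds.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states; illustrative names, not the kernel's. */
enum rq_state_sketch { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

/* Hypothetical stand-in for a request with an atomically updated state. */
struct request_sketch {
	_Atomic int state;
};

/*
 * Attempt exactly one old_state -> new_state transition.
 * Returns true if this caller performed the transition and false if
 * the state was no longer old_state (another context got there first).
 */
static bool rq_change_state(struct request_sketch *rq,
			    int old_state, int new_state)
{
	int expected = old_state;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      new_state);
}
```

With this shape, a completion path and a timeout path that both try to move a request out of the in-flight state cannot both succeed; the loser sees false and can warn or back off.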

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 15:59 +0800, jianchao.wang wrote: > If yes, how does the timeout handler get the freed request when the tag has > been freed ? Hello Jianchao, Have you noticed that the timeout handler does not check whether or not the request tag is freed? Additionally, I don't think it w

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Bart Van Assche
On Tue, 2018-04-10 at 16:41 +0800, Ming Lei wrote: > On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > Given rq-

[PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-09 Thread Bart Van Assche
") Signed-off-by: Bart Van Assche Cc: Tejun Heo Cc: Christoph Hellwig Cc: Sagi Grimberg Cc: Israel Rukshin , Cc: Max Gurtovoy Cc: # v4.16 --- Changes compared to v3 (see also https://www.mail-archive.com/linux-block@vger.kernel.org/msg20073.html): - Removed the spinlock again that was

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche
On Tue, 2018-04-10 at 09:30 +0800, Ming Lei wrote: > Also is it possible to see queue freed here? I think the caller should keep a reference on the request queue. Otherwise we have a much bigger problem than a race between submitting a bio and removing a request queue from the cgroup controller in

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 16:58 -0600, Jens Axboe wrote: > This ends up being nutty in the generic_make_request() case, where we > do the exact same enter/exit logic right after. That needs to get unified. > Maybe move the queue enter into generic_make_request_checks(), and exit > in the caller? Hello

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote: > The oops happens during generic_make_request_checks(), in > blk_throtl_bio() exactly. > So if we want to bypass dying queue, we have to check this before > generic_make_request_checks(), I think. How about something like the patch below? Thank

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:56 -0700, t...@kernel.org wrote: > On Mon, Apr 09, 2018 at 05:03:05PM +0000, Bart Van Assche wrote: > > exist today in the blk-mq timeout handling code cannot be fixed completely > > using RCU only. > > I really don't think that is that complic

Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Bart Van Assche
On Sun, 2018-04-08 at 12:02 -0400, Wakko Warner wrote: > I finished with git bisect. Here's the output: > 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit > commit 84c8590646d5b35804bac60eb58b145839b5893e > Author: Ming Lei > Date: Fri Nov 11 20:05:32 2016 +0800 > > target:

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 09:47 -0700, Tejun Heo wrote: > On Sun, Apr 08, 2018 at 10:20:38PM -0700, Bart Van Assche wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expire

Re: Block layer use of __GFP flags

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 01:26 -0700, Christoph Hellwig wrote: > On Mon, Apr 09, 2018 at 08:53:49AM +0200, Hannes Reinecke wrote: > > Why don't you fold the 'flags' argument into the 'gfp_flags', and drop > > the 'flags' argument completely? > > Looks a bit pointless to me, having two arguments denoti

Re: Block layer use of __GFP flags

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:00 +0200, Michal Hocko wrote: > On Mon 09-04-18 04:46:22, Bart Van Assche wrote: > [...] > [...] > > diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c > > index ad8a125defdd..3ddb464b72e6 100644 > > --- a/drivers/ide/ide-pm.c >

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:37 +0200, Christoph Hellwig wrote: > This looks sensible, but I'm worried about taking a whole spinlock > for every request completion, including irq disabling. However it seems > like your new updated pattern would fit use of cmpxchg() very nicely. Hello Christoph, Than

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:37 +0300, Sagi Grimberg wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expires then a request will be completed twice: a first time by the > > timeout handler and a secon

[PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-08 Thread Bart Van Assche
_iter+0xe9/0x200 blk_mq_timeout_work+0x181/0x2e0 process_one_work+0x21c/0x6d0 worker_thread+0x35/0x380 kthread+0x117/0x130 ret_from_fork+0x24/0x30 Fixes: 1d9bd5161ba3 ("blk-mq: replace timeout synchronization with a RCU and generation based scheme") Signed-off-by: Bart Van Assche Cc: Tejun Heo Cc: C

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Bart Van Assche
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote: > The following kernel oops is triggered by 'removing scsi device' during > heavy IO. Is the below patch sufficient to fix this? Thanks, Bart. Subject: blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash Bec

Re: Block layer use of __GFP flags

2018-04-08 Thread Bart Van Assche
On Sun, 2018-04-08 at 12:08 -0700, Matthew Wilcox wrote: > On Sun, Apr 08, 2018 at 04:40:59PM +0000, Bart Van Assche wrote: > > Do you perhaps want me to prepare a patch that makes blk_get_request() again > > respect the full gfp mask passed as third argument to blk_get_request(

Re: Block layer use of __GFP flags

2018-04-08 Thread Bart Van Assche
On Sat, 2018-04-07 at 23:54 -0700, Matthew Wilcox wrote: > Please explain: > > commit 6a15674d1e90917f1723a814e2e8c949000440f7 > Author: Bart Van Assche > Date: Thu Nov 9 10:49:54 2017 -0800 > > block: Introduce blk_get_request_flags() > > A side effec

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Bart Van Assche
On Sun, 2018-04-08 at 16:11 +0800, Joseph Qi wrote: > This is because scsi_remove_device() will call blk_cleanup_queue(), and > then all blkgs have been destroyed and root_blkg is NULL. > Thus tg is NULL and trigger NULL pointer dereference when get td from > tg (tg->td). > It seems that we cannot

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Bart Van Assche
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote: > The following kernel oops is triggered by 'removing scsi device' during > heavy IO. How did you trigger this oops? Bart.

Re: 4.15.14 crash with iscsi target and dvd

2018-04-07 Thread Bart Van Assche
On Sat, 2018-04-07 at 12:53 -0400, Wakko Warner wrote: > Bart Van Assche wrote: > > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote: > > > I know now why scsi_print_command isn't doing anything. cmd->cmnd is > > > null. > > > I added a dev

Re: 4.15.14 crash with iscsi target and dvd

2018-04-06 Thread Bart Van Assche
On Fri, 2018-04-06 at 21:03 -0400, Wakko Warner wrote: > Bart Van Assche wrote: > > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote: > > > I know now why scsi_print_command isn't doing anything. cmd->cmnd is > > > null. > > > I added a dev

Re: 4.15.14 crash with iscsi target and dvd

2018-04-05 Thread Bart Van Assche
On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote: > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null. > I added a dev_printk in scsi_print_command where the 2 if statements return. > Logs: > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL That's something that should nev

Re: BUG: KASAN: use-after-free in bt_for_each+0x1ea/0x29f

2018-04-05 Thread Bart Van Assche
On Wed, 2018-04-04 at 19:26 -0600, Jens Axboe wrote: > Leaving the whole trace here, but I'm having a hard time making sense of it. > It complains about a user-after-free in the inflight iteration, which is only > working on the queue, request, and on-stack mi data. None of these would be > freed.

Re: 4.15.14 crash with iscsi target and dvd

2018-04-03 Thread Bart Van Assche
On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote: > Wakko Warner wrote: > > Wakko Warner wrote: > > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine. > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount > > > /dev/sr1 and then do find -type f | x

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread Bart Van Assche
On Mon, 2018-04-02 at 15:16 -0700, t...@kernel.org wrote: > AFAIK, > there's one non-critical race condition which has always been there. > We have a larger race window for that case but don't yet know whether > that's problematic or not. If that actually is problematic, we can > figure out a way

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread Bart Van Assche
On Mon, 2018-04-02 at 15:01 -0700, t...@kernel.org wrote: > On Mon, Apr 02, 2018 at 09:56:41PM +0000, Bart Van Assche wrote: > > This patch increases the time during which .aborted_gstate == .gstate if the > > timeout is reset. Does that increase the chance that a completion will

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread Bart Van Assche
On Mon, 2018-04-02 at 14:10 -0700, Tejun Heo wrote: > On Mon, Apr 02, 2018 at 02:08:37PM -0700, Bart Van Assche wrote: > > On 04/02/18 12:01, Tejun Heo wrote: > > > + * As nothing prevents from completion happening while > > > + * ->aborted_gstate is set, this m

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread Bart Van Assche
On Mon, 2018-04-02 at 14:10 -0700, Tejun Heo wrote: > On Mon, Apr 02, 2018 at 02:08:37PM -0700, Bart Van Assche wrote: > > On 04/02/18 12:01, Tejun Heo wrote: > > > + * As nothing prevents from completion happening while > > > + * ->aborted_gstate is set, this m

Re: [PATCH 2/2] blk-mq: Fix request handover from timeout path to normal execution

2018-04-02 Thread Bart Van Assche
On 04/02/18 12:01, Tejun Heo wrote: +* As nothing prevents from completion happening while +* ->aborted_gstate is set, this may lead to ignored completions +* and further spurious timeouts. +*/ + if (rq->rq_flags & RQF_MQ_TIMEOUT_RESET) + blk_mq

Re: [PATCH 1/2] blk-mq: Factor out [s]rcu synchronization

2018-04-02 Thread Bart Van Assche
On 04/02/18 12:00, Tejun Heo wrote: Factor out [s]rcu synchronization in blk_mq_timeout_work() into blk_mq_timeout_sync_rcu(). This is to add another user in the future and doesn't cause any functional changes. Reviewed-by: Bart Van Assche

Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Bart Van Assche
On Sun, 2018-04-01 at 12:24 -0400, Wakko Warner wrote: > What do you enable in the kernel to get those locking messages? Hello Wakko, I have attached the script to this e-mail that I use to enable a bunch of kernel debugging options. Please note that enabling these options, especially lockdep and

Re: 4.15.14 crash with iscsi target and dvd

2018-04-01 Thread Bart Van Assche
On Sun, 2018-04-01 at 07:37 -0400, Wakko Warner wrote: > Bart Van Assche wrote: > > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote: > > > Richard Weinberger wrote: > > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner > > > > wrote: > >

Re: 4.15.14 crash with iscsi target and dvd

2018-03-31 Thread Bart Van Assche
On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote: > Richard Weinberger wrote: > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner wrote: > > > I reported this before but noone responded. > > > > Because you're sending only to LKML. > > CC'ing storage folks. > > Thank you. I wasn't sure who I

Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Bart Van Assche
On Fri, 2018-03-30 at 12:36 -0600, Tim Walker wrote: > Yes I will be there to discuss the multi-LUN approach. I wanted to get > these interface details out so we could have some background and > perhaps folks would come with ideas. I don't have much more to put > out, but I will certainly answer qu

Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Bart Van Assche
On Fri, 2018-03-30 at 12:21 -0600, Tim Walker wrote: > Yes, the header LUN field. Sorry! > > We hadn't intended to broadcast - we expect to see a LUN specified. > For a device specific command both LUNs will be affected regardless of > which LUN is specified in the transport field. e.g. if we comm

Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Bart Van Assche
On Fri, 2018-03-30 at 12:07 -0600, Tim Walker wrote: > Concerning how we are currently allocating commands to LUNs or the > device as a whole, here is a list based on the current two LUN model. > This model has LUN0 & LUN1, both reporting 1/2 the total storage. Our > definition of "device based" is

Re: v4.16-rc1 + dm-mpath + BFQ

2018-03-30 Thread Bart Van Assche
On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote: > Still 4.16-rc1, being that the version for which you reported this > issue in the first place. A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software since RDMA/CM support for the SRP target driver is missing from that ke

Re: v4.16-rc1 + dm-mpath + BFQ

2018-03-29 Thread Bart Van Assche
On Thu, 2018-03-29 at 11:02 +0200, Paolo Valente wrote: > > Il giorno 01 mar 2018, alle ore 02:35, Bart Van Assche > > ha scritto: > > Thank you for having shared your kernel config off-list. After having > > made the following changes to your kernel config I was able

Re: disk-io lockup in 4.14.13 kernel

2018-03-27 Thread Bart Van Assche
On 03/27/18 01:59, Jaco Kroon wrote: I triggered it hoping to get a stack trace of the process which is deadlocking finding where the lock is being taken that ends up blocking, but I then realized that you mentioned sleeping, which may end up not having a stack trace because there is no process a

Re: disk-io lockup in 4.14.13 kernel

2018-03-26 Thread Bart Van Assche
On Sat, 2018-03-24 at 23:38 +0200, Jaco Kroon wrote: > Does the following go with your theory: > > [452545.945561] sysrq: SysRq : Show backtrace of all active CPUs > [452545.946182] NMI backtrace for cpu 5 > [452545.946185] CPU: 5 PID: 31921 Comm: bash Tainted: G I > 4.14.13-uls #2 >

Re: [PATCH] mmc: block: Delete gendisk before cleaning up the request queue

2018-03-22 Thread Bart Van Assche
new. Anyway, thanks for this patch. Reviewed-by: Bart Van Assche

[PATCH v2] block: Change a rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}()

2018-03-19 Thread Bart Van Assche
sufficient. Note: scsi_device_quiesce() does not have to be modified since it already uses synchronize_rcu(). Reported-by: Tejun Heo Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably") Signed-off-by: Bart Van Assche Acked-by: Tejun Heo Cc: Tejun Heo

[PATCH] block: Change a rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}()

2018-03-19 Thread Bart Van Assche
scsi: Make SCSI quiesce and resume work reliably") Signed-off-by: Bart Van Assche Cc: Hannes Reinecke Cc: Ming Lei Cc: Christoph Hellwig Cc: Johannes Thumshirn Cc: Tejun Heo Cc: Oleksandr Natalenko Cc: Martin Steigerwald Cc: sta...@vger.kernel.org # v4.15 --- block/blk-core.c | 4 ++

Re: [PATCH 00/16] bcache: Compiler, sparse and smatch fixes

2018-03-16 Thread Bart Van Assche
On Fri, 2018-03-16 at 11:16 -0700, Michael Lyle wrote: > I'll dig through these. In the future can you please send things off > the linux-bcache list? I'm a bit off my normal workflow this way. I will do that. Thanks for the reviews! Bart.

[PATCH v2] blk-mq-debugfs: Show more request state information

2018-03-16 Thread Bart Van Assche
8 ("blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq") Signed-off-by: Bart Van Assche Cc: Tejun Heo --- Changes compared to v1: - Added an out-of-bounds check to the access of the request state name array as requested by Jens. block/blk-mq-debugfs.c | 16 +++- 1 file ch

Re: [PATCH 15/16] bcache: Fix an endianness bug

2018-03-16 Thread Bart Van Assche
date to support the endianness issue. > > How about leave the endianness problem to me? I will pick code piece > from your patch and set 'From: Bart Van Assche ' > on the patch(s), and continue my fix for bcache on s390x. Hello Coly, That sounds fine to me. Please let me kn

Re: [PATCH 13/16] bcache: Make bch_dump_read() fail if copying to user space fails

2018-03-15 Thread Bart Van Assche
On Fri, 2018-03-16 at 01:00 +0800, Coly Li wrote: > On 15/03/2018 11:08 PM, Bart Van Assche wrote: > > copy_to_user() returns the number of remaining bytes. Avoid that > > a larger value is returned than the number of bytes that have > > been copied by returning -EFAULT if no

Re: [PATCH 12/16] bcache: Make it easier for static analyzers to analyze bch_allocator_thread()

2018-03-15 Thread Bart Van Assche
On Fri, 2018-03-16 at 00:29 +0800, Coly Li wrote: > On 15/03/2018 11:08 PM, Bart Van Assche wrote: > > This patch does not change any functionality but avoids that smatch > > reports the following: > > > > drivers/md/bcache/alloc.c:334: bch_allocator_thread() error

Re: [PATCH 06/16] bcache: Suppress more warnings about set-but-not-used variables

2018-03-15 Thread Bart Van Assche
On Fri, 2018-03-16 at 00:20 +0800, Coly Li wrote: > On 15/03/2018 11:08 PM, Bart Van Assche wrote: > > This patch does not change any functionality. > > > > Signed-off-by: Bart Van Assche > > Hi Bart, > > This patch looks good to me. A question is, does G

[PATCH 12/16] bcache: Make it easier for static analyzers to analyze bch_allocator_thread()

2018-03-15 Thread Bart Van Assche
This patch does not change any functionality but avoids that smatch reports the following: drivers/md/bcache/alloc.c:334: bch_allocator_thread() error: uninitialized symbol 'bucket'. Signed-off-by: Bart Van Assche --- drivers/md/bcache/alloc.c | 7 ++- 1 file changed, 2 insert

[PATCH 07/16] bcache: Reduce the number of sparse complaints about lock imbalances

2018-03-15 Thread Bart Van Assche
Add more annotations for sparse to inform it about which functions do not have the same number of spin_lock() and spin_unlock() calls. Signed-off-by: Bart Van Assche --- drivers/md/bcache/journal.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/md/bcache/journal.c b/drivers/md

[PATCH 16/16] bcache: Fix endianness annotations

2018-03-15 Thread Bart Van Assche
except csum are declared as little endian. Signed-off-by: Bart Van Assche --- drivers/md/bcache/super.c | 10 ++-- include/uapi/linux/bcache.h | 118 +++- 2 files changed, 68 insertions(+), 60 deletions(-) diff --git a/drivers/md/bcache/super.c b

[PATCH 15/16] bcache: Fix an endianness bug

2018-03-15 Thread Bart Van Assche
Ensure that byte swapping occurs on big endian architectures when reading or writing the superblock. Signed-off-by: Bart Van Assche --- drivers/md/bcache/bcache.h | 12 drivers/md/bcache/super.c | 4 ++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/md
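As a rough illustration of why byte swapping matters for an on-disk little-endian field, here is a small userspace sketch. The helper names le64_decode() and le64_encode() are hypothetical and are not from the patch; in-kernel code would normally rely on helpers such as le64_to_cpu() and cpu_to_le64(), and whether the patch uses exactly those is an assumption. The sketch reads and writes the field byte by byte, so the result is the same on little endian and big endian hosts.

```c
#include <stdint.h>

/*
 * Decode a 64-bit little-endian value from a byte buffer.
 * Works regardless of host byte order because it never reinterprets
 * the buffer as a native integer.
 */
static uint64_t le64_decode(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Encode a 64-bit value into a byte buffer in little-endian order. */
static void le64_encode(unsigned char *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}
```

Reading the same bytes through a plain uint64_t pointer would give a different value on a big endian host, which is the class of bug the patch description refers to.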

[PATCH 13/16] bcache: Make bch_dump_read() fail if copying to user space fails

2018-03-15 Thread Bart Van Assche
copy_to_user() returns the number of remaining bytes. Avoid that a larger value is returned than the number of bytes that have been copied by returning -EFAULT if not all bytes have been copied. Signed-off-by: Bart Van Assche --- drivers/md/bcache/debug.c | 5 ++--- 1 file changed, 2 insertions

[PATCH 10/16] bcache: Suppress a compiler warning in bch_##name##_h()

2018-03-15 Thread Bart Van Assche
INT_MAX(type) / 1024 > i)) \ ^ The only functional change in this patch is that (type) ~0 is changed into ~(type)0. That makes a difference for unsigned types for which sizeof(type) > sizeof(int). I think this change is a bug fix. Signed-off-by: Bart Van Assche --- drivers/md/bcache/util.c |

[PATCH 14/16] bcache: Make csum_set() implementation easier to read

2018-03-15 Thread Bart Van Assche
Introduce two temporary variables to avoid having to repeat expressions. This patch does not change any functionality. Signed-off-by: Bart Van Assche --- drivers/md/bcache/bcache.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/md/bcache/bcache.h b

[PATCH 09/16] bcache: Remove a redundant assignment

2018-03-15 Thread Bart Van Assche
A bio_set_op_attrs() call a little further down overwrites bio->bi_opf. That means that the bio->bi_opf assignment is redundant. Hence remove it. See also commit ad0d9e76a412 ("bcache: use bio op accessors"). Signed-off-by: Bart Van Assche Cc: Mike Christie Cc: Hannes Reinecke

[PATCH 08/16] bcache: Fix a compiler warning in bcache_device_init()

2018-03-15 Thread Bart Van Assche
Avoid that building with W=1 triggers the following compiler warning: drivers/md/bcache/super.c:776:20: warning: comparison is always false due to limited range of data type [-Wtype-limits] d->nr_stripes > SIZE_MAX / sizeof(atomic_t)) { ^ Signed-off-by: Bart Van

[PATCH 04/16] bcache: Fix kernel-doc warnings

2018-03-15 Thread Bart Van Assche
Avoid that building with W=1 triggers warnings about the kernel-doc headers. Signed-off-by: Bart Van Assche --- drivers/md/bcache/btree.c | 2 +- drivers/md/bcache/closure.c | 8 drivers/md/bcache/request.c | 1 + drivers/md/bcache/util.c| 18 -- 4 files

[PATCH 11/16] bcache: Check the d->disk pointer before using it

2018-03-15 Thread Bart Van Assche
Since bcache_device_free() checks the d->disk pointer I think that means that that pointer can be NULL. Hence test that pointer before using it. This was detected by smatch. Signed-off-by: Bart Van Assche --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) d

[PATCH 06/16] bcache: Suppress more warnings about set-but-not-used variables

2018-03-15 Thread Bart Van Assche
This patch does not change any functionality. Signed-off-by: Bart Van Assche --- drivers/md/bcache/bset.c| 4 ++-- drivers/md/bcache/journal.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c index e56d3ecdbfcb

[PATCH 03/16] bcache: Annotate switch fall-through

2018-03-15 Thread Bart Van Assche
This patch avoids that building with W=1 triggers complaints about switch fall-throughs. Signed-off-by: Bart Van Assche --- drivers/md/bcache/util.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c index a23cd6a14b74..6198041f0ee2

[PATCH 02/16] bcache: Add __printf annotation to __bch_check_keys()

2018-03-15 Thread Bart Van Assche
Make it possible for the compiler to verify the consistency of the format string passed to __bch_check_keys() and the arguments that should be formatted according to that format string. Signed-off-by: Bart Van Assche --- drivers/md/bcache/bset.h | 5 +++-- 1 file changed, 3 insertions(+), 2

[PATCH 05/16] bcache: Remove an unused variable

2018-03-15 Thread Bart Van Assche
Signed-off-by: Bart Van Assche --- drivers/md/bcache/extents.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c index f9d391711595..c334e461 100644 --- a/drivers/md/bcache/extents.c +++ b/drivers/md/bcache/extents.c @@ -534,7

[PATCH 01/16] bcache: Fix indentation

2018-03-15 Thread Bart Van Assche
This patch avoids that smatch complains about inconsistent indentation. Signed-off-by: Bart Van Assche --- drivers/md/bcache/btree.c | 2 +- drivers/md/bcache/writeback.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache

[PATCH 00/16] bcache: Compiler, sparse and smatch fixes

2018-03-15 Thread Bart Van Assche
. Thanks, Bart. Bart Van Assche (16): bcache: Fix indentation bcache: Add __printf annotation to __bch_check_keys() bcache: Annotate switch fall-through bcache: Fix kernel-doc warnings bcache: Remove an unused variable bcache: Suppress more warnings about set-but-not-used variables

[PATCH v3, resend] block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into

2018-03-14 Thread Bart Van Assche
ECTOR_SIZE redefinition. Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have not been removed from uapi header files nor from NAND drivers in which these constants are used for another purpose than converting block layer offsets and sizes into a number of sectors. Signed-off-by: Bart

Re: [PATCH v2] block: bio_check_eod() needs to consider partition

2018-03-14 Thread Bart Van Assche
On Wed, 2018-03-14 at 14:03 +0100, h...@lst.de wrote: > can you test the version below? Hello Christoph, The same VM that failed to boot with v2 of this patch boots fine with this patch. Thanks, Bart.

Re: disk-io lockup in 4.14.13 kernel

2018-03-13 Thread Bart Van Assche
On Tue, 2018-03-13 at 19:16 +0200, Jaco Kroon wrote: > The server in question is the destination of numerous rsync/ssh cases > (used primarily for backups) and is not intended as a real-time system. > I'm happy to enable the options below that you would indicate would be > helpful in pinpointing t

Re: [PATCH v2] block: bio_check_eod() needs to consider partition

2018-03-13 Thread Bart Van Assche
On Tue, 2018-03-13 at 07:49 -0700, Jens Axboe wrote: > On 3/13/18 2:18 AM, Christoph Hellwig wrote: > > bio_check_eod() should check partiton size not the whole disk if > > bio->bi_partno is non-zero. Does this by taking the call to bio_check_eod > > into blk_partition_remap. > > Applied, thanks.

Re: disk-io lockup in 4.14.13 kernel

2018-03-13 Thread Bart Van Assche
On Tue, 2018-03-13 at 16:59 +0200, Jaco Kroon wrote: > I quickly checked my dmesg logs and I'm not seeing that particular > message, could be that newer kernels only started warning about it? Hello Jaco, That message only appears if CONFIG_DEBUG_ATOMIC_SLEEP (sleep inside atomic) is enabled in t

Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts

2018-03-13 Thread Bart Van Assche
On Tue, 2018-03-13 at 22:32 +0800, Ming Lei wrote: > On Tue, Mar 13, 2018 at 02:08:23PM +0100, Martin Steigerwald wrote: > > Ming and Bart, I added you to cc, cause I had to do with you about another > > blk-mq report, please feel free to adapt. > > Looks RIP points to scsi_times_out+0x17/0x1d0,

Re: disk-io lockup in 4.14.13 kernel

2018-03-13 Thread Bart Van Assche
On Tue, 2018-03-13 at 11:30 +0200, Jaco Kroon wrote: > On 11/03/2018 07:00, Bart Van Assche wrote: > > Did I see correctly that /dev/sdm is behind a MPT SAS controller? You may > > want to contact the authors of this driver and Cc the linux-scsi mailing > > list. Sorry but

Re: [PATCH] device_handler: remove VLAs

2018-03-12 Thread Bart Van Assche
On Sat, 2018-03-10 at 14:14 +0100, Stephen Kitt wrote: > The two patches I sent were supposed to be alternative solutions; see > https://marc.info/?l=linux-scsi&m=152063671005295&w=2 for the introduction (I > seem to have messed up the headers, so the mails didn’t end up threaded > properly). The

Re: disk-io lockup in 4.14.13 kernel

2018-03-10 Thread Bart Van Assche
On Sun, 2018-03-11 at 06:33 +0200, Jaco Kroon wrote: > crowsnest ~ # find /sys -name sdm > /sys/kernel/debug/block/sdm > /sys/devices/pci:00/:00:01.0/:01:00.0/host0/port-0:0/expander-0:0/port-0:0:0/expander-0:1/port-0:1:0/end_device-0:1:0/target0:0:13/0:0:13:0/block/sdm > /sys/class/blo

Re: disk-io lockup in 4.14.13 kernel

2018-03-10 Thread Bart Van Assche
On Sat, 2018-03-10 at 22:56 +0200, Jaco Kroon wrote: > On 22/02/2018 18:46, Bart Van Assche wrote: > > On 02/22/18 02:58, Jaco Kroon wrote: > > > We've been seeing sporadic IO lockups on recent kernels. > > > > Are you using the legacy I/O stack or blk-mq? If yo

Re: [PATCH] device_handler: remove VLAs

2018-03-09 Thread Bart Van Assche
On Fri, 2018-03-09 at 23:32 +0100, Stephen Kitt wrote: > In preparation to enabling -Wvla, remove VLAs and replace them with > fixed-length arrays instead. > > scsi_dh_{alua,emc,rdac} use variable-length array declarations to > store command blocks, with the appropriate size as determined by > COM

Re: [PATCH] scsi: resolve COMMAND_SIZE at compile time

2018-03-09 Thread Bart Van Assche
On Fri, 2018-03-09 at 23:33 +0100, Stephen Kitt wrote: > +/* > + * SCSI command sizes are as follows, in bytes, for fixed size commands, per > + * group: 6, 10, 10, 12, 16, 12, 10, 10. The top three bits of an opcode > + * determine its group. > + * The size table is encoded into a 32-bit value by
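As a rough illustration of the idea quoted above (not the code from the patch, whose description is cut off here), the eight per-group sizes can be packed into one 32-bit constant with one 4-bit field per group. Since 16 does not fit in four bits, this sketch stores 16 minus the size; that choice, and the names ENC, CMD_SIZE_TABLE and cmd_size, are assumptions made for the example only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical encoding: group g occupies bits 4*g .. 4*g+3 and holds
 * (16 - size), so that a size of 16 is stored as 0.
 */
#define ENC(group, size) ((uint32_t)(16u - (size)) << (4 * (group)))

/* Sizes per group, as listed in the quoted comment: 6, 10, 10, 12, 16, 12, 10, 10. */
#define CMD_SIZE_TABLE                                   \
	(ENC(0, 6)  | ENC(1, 10) | ENC(2, 10) | ENC(3, 12) | \
	 ENC(4, 16) | ENC(5, 12) | ENC(6, 10) | ENC(7, 10))

/* Compile-time check that the packed value is what this encoding predicts. */
static_assert(CMD_SIZE_TABLE == 0x6640466Au, "unexpected packed size table");

/* The top three bits of the opcode select the group; undo the (16 - size) bias. */
static inline unsigned int cmd_size(uint8_t opcode)
{
	unsigned int group = opcode >> 5;

	return 16u - ((CMD_SIZE_TABLE >> (4 * group)) & 0xFu);
}
```

Because both the table and the shift amount are plain integer expressions, a compiler can fold cmd_size() to a constant whenever the opcode is itself a compile-time constant, which is what makes a fixed-length array declaration possible in place of a VLA.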

Re: [PATCH v2 08/10] nvme-pci: Add support for P2P memory in requests

2018-03-08 Thread Bart Van Assche
On Thu, 2018-03-01 at 15:58 +, Stephen Bates wrote: > > Any plans adding the capability to nvme-rdma? Should be > > straight-forward... In theory, the use-case would be rdma backend > > fabric behind. Shouldn't be hard to test either... > > Nice idea Sagi. Yes we have been starting to look at

[PATCH] block: Suppress kernel-doc warnings triggered by blk-zoned.c

2018-03-08 Thread Bart Van Assche
Avoid that building with W=1 causes the kernel-doc tool to complain about undocumented function arguments for the blk-zoned.c source file. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Damien Le Moal --- block/blk-zoned.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions

[PATCH] blk-mq-debugfs: Show more request state information

2018-03-08 Thread Bart Van Assche
8 ("blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq") Signed-off-by: Bart Van Assche Cc: Tejun Heo --- block/blk-mq-debugfs.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index bd21d5b9f65f..162e09c02ae7 10064

Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue

2018-03-08 Thread Bart Van Assche
On Thu, 2018-03-08 at 09:41 +0100, Hannes Reinecke wrote: > IE the _entire_ request set is allocated as _one_ array, making it quite > hard to handle from the lower-level CPU caches. > Also the 'node' indicator doesn't really help us here, as the requests > have to be access by all CPUs in the shar

[PATCH v2 00/11] Make all concurrent queue flag manipulations safe

2018-03-07 Thread Bart Van Assche
ot path. Bart Van Assche (11): block: Reorder the queue flag manipulation function definitions block: Use the queue_flag_*() functions instead of open-coding these block: Introduce blk_queue_flag_{set,clear,test_and_{set,clear}}() block: Protect queue flag changes with the queue lock m

[PATCH v2 07/11] iscsi: Use blk_queue_flag_set()

2018-03-07 Thread Bart Van Assche
Use blk_queue_flag_set() instead of open-coding this function. Signed-off-by: Bart Van Assche Acked-by: Martin K. Petersen Reviewed-by: Johannes Thumshirn Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Ming Lei --- drivers/scsi/iscsi_tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion

[PATCH v2 01/11] block: Reorder the queue flag manipulation function definitions

2018-03-07 Thread Bart Van Assche
Move the definition of queue_flag_clear_unlocked() up and move the definition of queue_in_flight() down such that all queue flag manipulation function definitions become contiguous. This patch does not change any functionality. Signed-off-by: Bart Van Assche Reviewed-by: Johannes Thumshirn

[PATCH v2 09/11] block: Use blk_queue_flag_*() in drivers instead of queue_flag_*()

2018-03-07 Thread Bart Van Assche
this patch does not change any functionality. Signed-off-by: Bart Van Assche Reviewed-by: Martin K. Petersen Acked-by: Martin K. Petersen Cc: Martin K. Petersen Cc: Mike Snitzer Cc: Shaohua Li Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn Cc: Ming Lei --- block/bsg
