Re: [PATCH v4 02/10] block, scsi: Give RQF_PREEMPT back its original meaning

2018-09-14 Thread h...@lst.de
On Mon, Aug 06, 2018 at 01:54:40PM +, Bart Van Assche wrote:
> On Mon, 2018-08-06 at 08:27 +0200, Hannes Reinecke wrote:
> > I am not sure this is going to work for SCSI parallel; we're using the
> > QUIESCE state there to do domain validation, and all commands there are
> > most definitely not PM requests.
> > Can you please validate your patches with eg aic7xxx and SCSI parallel
> > disks?
> 
> Hello Hannes,
> 
> How about using the RQF_PM flag for SCSI parallel requests instead of
> RQF_PREEMPT? That change would also avoid giving RQF_PREEMPT a double
> meaning.
> Anyway, I will see whether I can drop that change from this series and submit
> that change separately.

One thing I'd like to do is split BLK_MQ_REQ_PREEMPT and RQF_PREEMPT,
that is don't set RQF_PREEMPT automatically when BLK_MQ_REQ_PREEMPT is
passed to blk_get_request, but let the caller set it manually.  After
that RQF_PREEMPT isn't used by the block layer itself at all and can
be moved into ide/scsi specific flags that we can use as we want.


Re: [RFC PATCH 3/3] blk-mq: Remove generation sequence

2018-07-19 Thread h...@lst.de
On Thu, Jul 19, 2018 at 10:22:40AM -0600, Keith Busch wrote:
> On Thu, Jul 19, 2018 at 04:04:46PM +, Bart Van Assche wrote:
> > On Thu, 2018-07-19 at 09:56 -0600, Keith Busch wrote:
> > > Even some scsi drivers are susceptible to losing their requests with the
> > > reverted behavior: take virtio-scsi for example, which returns RESET_TIMER
> from its timeout handler. With the behavior everyone seems to want,
> > > a natural completion at or around the same time is lost forever because
> > > it was blocked from completion with no way to recover.
> > 
> > The patch I had posted handles a completion that occurs while a timeout is
> > being handled properly. From 
> > https://www.mail-archive.com/linux-block@vger.kernel.org/msg22196.html:
> 
> Sounds like a win-win to me.

How do we get a fix into 4.18 at this point in the cycle?  I think that
is the most important priority right now.


Re: [RFC PATCH 3/3] blk-mq: Remove generation sequence

2018-07-19 Thread h...@lst.de
On Wed, Jul 18, 2018 at 03:17:11PM -0600, Keith Busch wrote:
> And there may be other drivers that don't want their completions
> ignored, so breaking them again is also a step in the wrong direction.
> 
> There are not that many blk-mq drivers, so we can go through them all.

I think the point is that SCSI is the biggest user by both the number
of low-level drivers sitting under the midlayer, and also by usage.

We need to be very careful not to break it.  Note that this doesn't
mean that I don't want to eventually move away from just ignoring
completions in timeout state for SCSI.  I'd just rather revert 4.18
to a clean known state instead of doctoring around late in the rc
phase.

> Most don't even implement .timeout, so they never know that condition
> ever happened. Others always return BLK_EH_RESET_TIMER without doing
> anything else, so the 'new' behavior would have to be better for those,
> too.

And we should never even hit the timeout handler for those as it
is rather pointless (although it looks we currently do..).


Re: [RFC PATCH 3/3] blk-mq: Remove generation sequence

2018-07-18 Thread h...@lst.de
On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote:
> On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote:
> > Of the two you mentioned, yours is preferable IMO. While I appreciate
> > Jianchao's detailed analysis, it's hard to take a proposal seriously
> > that so colourfully calls everyone else "dangerous" while advocating
> > for silently losing requests on purpose.
> > 
> > But where's the option that fixes scsi to handle hardware completions
> > concurrently with arbitrary timeout software? Propping up that house of
> > cards can't be the only recourse.
> 
> The important bit is that we need to fix this issue quickly.  We are
> past -rc5 so I'm rather concerned about anything too complicated.
> 
> I'm not even sure SCSI has a problem with multiple completions happening
> at the same time, but it certainly has a problem with bypassing
> blk_mq_complete_request from the EH path.
> 
> I think we can solve this properly, but I also think we are way too late
> in the 4.18 cycle to fix it properly.  For now I fear we'll just have
> to revert the changes and try again for 4.19 or even 4.20 if we don't
> act quickly enough.

So here is a quick attempt at the revert while also trying to keep
nvme working.  Keith, Bart, Jianchao - does this look reasonable
as a 4.18 band-aid?

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-eh-revert


Re: [RFC PATCH 3/3] blk-mq: Remove generation sequence

2018-07-18 Thread h...@lst.de
On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote:
> Of the two you mentioned, yours is preferable IMO. While I appreciate
> Jianchao's detailed analysis, it's hard to take a proposal seriously
> that so colourfully calls everyone else "dangerous" while advocating
> for silently losing requests on purpose.
> 
> But where's the option that fixes scsi to handle hardware completions
> concurrently with arbitrary timeout software? Propping up that house of
> cards can't be the only recourse.

The important bit is that we need to fix this issue quickly.  We are
past -rc5 so I'm rather concerned about anything too complicated.

I'm not even sure SCSI has a problem with multiple completions happening
at the same time, but it certainly has a problem with bypassing
blk_mq_complete_request from the EH path.

I think we can solve this properly, but I also think we are way too late
in the 4.18 cycle to fix it properly.  For now I fear we'll just have
to revert the changes and try again for 4.19 or even 4.20 if we don't
act quickly enough.


Re: [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again

2018-05-16 Thread h...@lst.de
On Wed, May 16, 2018 at 04:47:54PM +, Bart Van Assche wrote:
> I think your patch changes the order of changing the request state and
> calling mod_timer(). In my patch the request state and the deadline are
> updated first and mod_timer() is called afterwards. I think your patch
> changes the order of these operations into the following:
> (1) __blk_mq_start_request() sets the request deadline.
> (2) __blk_mq_start_request() calls __blk_add_timer() which in turn calls
> mod_timer().
> (3) __blk_mq_start_request() changes the request state into MQ_RQ_IN_FLIGHT.
> 
> In the unlikely event of a significant delay between (2) and (3) it can
> happen that the timer fires and examines and ignores the request because
> its state differs from MQ_RQ_IN_FLIGHT. If the request for which this
> happened times out its timeout will only be handled the next time
> blk_mq_timeout_work() is called. Is this the behavior you intended?

I think we can easily move the timer manipulation to after the state
change.  It would also make sense to add comments explaining the ordering.


Re: [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again

2018-05-16 Thread h...@lst.de
On Wed, May 16, 2018 at 04:17:42PM +, Bart Van Assche wrote:
> There is another reason the deadline is included in the atomic operation,
> namely to handle races between the BLK_EH_RESET_TIMER case in
> blk_mq_rq_timed_out() and blk_mq_complete_request(). I don't think that
> race is addressed properly by your patch. I will see what I can do to
> address that race without using 64-bit atomic operations.

I might be missing something here, so please help me understand
what is missing.

If we restart the timer in blk_mq_rq_timed_out we also bump the
generation at the same time as we reset the deadline in your old
patch.  With this patch we only bump the generation, but that should
be enough to address the rest, or not?


Re: [PATCH] block: Avoid executing a report or reset zones while a queue is frozen

2018-04-19 Thread h...@lst.de
On Tue, Apr 17, 2018 at 05:35:07PM +, Bart Van Assche wrote:
> > Hmm.  I think we need to avoid clearing that data and update it using
> > RCU instead.  Calling blk_queue_enter before submitting bios is
> > something that would make zone reporting very different from any
> > other block layer user.
> 
> Hello Christoph,
> 
> Please have a look at generic_make_request(). I think that function shows
> that blk_queue_enter() is called anyway before .make_request_fn() is called.

Yes, that is the point.  We already call blk_queue_enter in
generic_make_request, which the zone report and reset ops pass through.


Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread h...@lst.de
On Thu, Apr 12, 2018 at 06:48:12AM -0700, t...@kernel.org wrote:
> > Which sounds like a very good reason not to use a driver-controlled
> > lock for internals like blkcg.
> > 
> > In fact splitting the lock used for synchronizing access to queue
> > fields from the driver-controlled lock used to synchronize I/O
> > in the legacy path is long overdue.
> 
> It'd be probably a lot easier to make sure the shared lock doesn't go
> away till all the request_queues using it go away.  The choice is
> between refcnting something which carries the lock and double locking
> in hot paths.  Can't think of many downsides of the former approach.

We've stopped sharing request_queues between different devices a while
ago.  The problem is just that for legacy devices the driver still controls
the lock lifetime, and it might be shorter than the queue lifetime.
Note that for blk-mq we basically don't use the queue_lock at all,
and certainly not in the hot path.


Re: [PATCH] block: Ensure that a request queue is dissociated from the cgroup controller

2018-04-12 Thread h...@lst.de
On Thu, Apr 12, 2018 at 11:52:11AM +, Bart Van Assche wrote:
> On Thu, 2018-04-12 at 07:34 +0200, Christoph Hellwig wrote:
> > On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote:
> > > Several block drivers call alloc_disk() followed by put_disk() if
> > > something fails before device_add_disk() is called without calling
> > > blk_cleanup_queue(). Make sure that also for this scenario a request
> > > queue is dissociated from the cgroup controller. This patch avoids
> > > that loading the parport_pc, paride and pf drivers triggers the
> > > following kernel crash:
> > 
> > Can we move the cleanup to put_disk in general and not just for
> > this case?  Having alloc/free routines pair up generally avoids
> > a lot of confusion.
> 
> Hello Christoph,
> 
> At least the SCSI ULP drivers drop the last reference to a disk after
> the blk_cleanup_queue() call. As explained in the description of commit
> a063057d7c73, removing a request queue from blkcg must happen before
> blk_cleanup_queue() finishes because a block driver may free the
> request queue spinlock immediately after blk_cleanup_queue() returns.
> So I don't think that we can move the code that removes a request
> queue from blkcg into put_disk(). Another challenge is that some block
> drivers (e.g. skd) clear the disk->queue pointer if device_add_disk()
> has not been called to avoid that put_disk() causes a request queue
> reference count imbalance.

Which sounds like a very good reason not to use a driver-controlled
lock for internals like blkcg.

In fact splitting the lock used for synchronizing access to queue
fields from the driver-controlled lock used to synchronize I/O
in the legacy path is long overdue.


Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread h...@lst.de
On Tue, Apr 10, 2018 at 01:26:39PM +, Bart Van Assche wrote:
> Can you explain why you think that using cmpxchg() is safe in this context?
> The regular completion path and the timeout code can both execute e.g. the
> following code concurrently:
> 
>   blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE);
> 
> That's why I think that we need an atomic compare and exchange instead of
> cmpxchg(). What I found in the Intel Software Developer Manual seems to
> confirm that:

The Linux cmpxchg() helper is supposed to always be atomic, you only
need atomic_cmpxchg and friends if you want to operate on an atomic_t.
(same for the _long variant).

In fact if you look at the x86 implementation of the cmpxchg() macro
you'll see that it will use the lock prefix if the kernel has been
built with CONFIG_SMP enabled.


Re: [PATCH v2] block: bio_check_eod() needs to consider partition

2018-03-14 Thread h...@lst.de
Hi Bart,

can you test the version below?

---
From a68a8518158e31d66a0dc4f4e795ca3ceb83752c Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <h...@lst.de>
Date: Tue, 13 Mar 2018 09:27:30 +0100
Subject: block: bio_check_eod() needs to consider partition

bio_check_eod() should check the partition size, not the whole disk, if
bio->bi_partno is non-zero.  Do this by moving the call to bio_check_eod()
into blk_partition_remap().

Based on an earlier patch from Jiufei Xue.

Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reported-by: Jiufei Xue <jiufei@linux.alibaba.com>
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 block/blk-core.c | 93 ++++++++++++++++++++++++++---------------------------
 1 file changed, 40 insertions(+), 53 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 6d82c4f7fadd..47ee24611126 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2023,7 +2023,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
return BLK_QC_T_NONE;
 }
 
-static void handle_bad_sector(struct bio *bio)
+static void handle_bad_sector(struct bio *bio, sector_t maxsector)
 {
char b[BDEVNAME_SIZE];
 
@@ -2031,7 +2031,7 @@ static void handle_bad_sector(struct bio *bio)
printk(KERN_INFO "%s: rw=%d, want=%Lu, limit=%Lu\n",
bio_devname(bio, b), bio->bi_opf,
(unsigned long long)bio_end_sector(bio),
-   (long long)get_capacity(bio->bi_disk));
+   (long long)maxsector);
 }
 
 #ifdef CONFIG_FAIL_MAKE_REQUEST
@@ -2092,68 +2092,59 @@ static noinline int should_fail_bio(struct bio *bio)
 }
 ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO);
 
+/*
+ * Check whether this bio extends beyond the end of the device or partition.
+ * This may well happen - the kernel calls bread() without checking the size of
+ * the device, e.g., when mounting a file system.
+ */
+static inline int bio_check_eod(struct bio *bio, sector_t maxsector)
+{
+   unsigned int nr_sectors = bio_sectors(bio);
+
+   if (nr_sectors && maxsector &&
+   (nr_sectors > maxsector ||
+bio->bi_iter.bi_sector > maxsector - nr_sectors)) {
+   handle_bad_sector(bio, maxsector);
+   return -EIO;
+   }
+   return 0;
+}
+
 /*
  * Remap block n of partition p to block n+start(p) of the disk.
  */
 static inline int blk_partition_remap(struct bio *bio)
 {
struct hd_struct *p;
-   int ret = 0;
+   int ret = -EIO;
 
rcu_read_lock();
p = __disk_get_part(bio->bi_disk, bio->bi_partno);
-   if (unlikely(!p || should_fail_request(p, bio->bi_iter.bi_size) ||
-bio_check_ro(bio, p))) {
-   ret = -EIO;
+   if (unlikely(!p))
+   goto out;
+   if (unlikely(should_fail_request(p, bio->bi_iter.bi_size)))
+   goto out;
+   if (unlikely(bio_check_ro(bio, p)))
goto out;
-   }
 
/*
 * Zone reset does not include bi_size so bio_sectors() is always 0.
 * Include a test for the reset op code and perform the remap if needed.
 */
-   if (!bio_sectors(bio) && bio_op(bio) != REQ_OP_ZONE_RESET)
-   goto out;
-
-   bio->bi_iter.bi_sector += p->start_sect;
-   bio->bi_partno = 0;
-   trace_block_bio_remap(bio->bi_disk->queue, bio, part_devt(p),
- bio->bi_iter.bi_sector - p->start_sect);
-
+   if (bio_sectors(bio) || bio_op(bio) == REQ_OP_ZONE_RESET) {
+   if (bio_check_eod(bio, part_nr_sects_read(p)))
+   goto out;
+   bio->bi_iter.bi_sector += p->start_sect;
+   bio->bi_partno = 0;
+   trace_block_bio_remap(bio->bi_disk->queue, bio, part_devt(p),
+ bio->bi_iter.bi_sector - p->start_sect);
+   }
+   ret = 0;
 out:
rcu_read_unlock();
return ret;
 }
 
-/*
- * Check whether this bio extends beyond the end of the device.
- */
-static inline int bio_check_eod(struct bio *bio, unsigned int nr_sectors)
-{
-   sector_t maxsector;
-
-   if (!nr_sectors)
-   return 0;
-
-   /* Test device or partition size, when known. */
-   maxsector = get_capacity(bio->bi_disk);
-   if (maxsector) {
-   sector_t sector = bio->bi_iter.bi_sector;
-
-   if (maxsector < nr_sectors || maxsector - nr_sectors < sector) {
-   /*
-* This may well happen - the kernel calls bread()
-* without checking the size of the device, e.g., when
-* mounting a device.
-*/
-   handle_bad_sector(bio);
- 

Re: [PATCH v2] block: bio_check_eod() needs to consider partition

2018-03-13 Thread h...@lst.de
On Tue, Mar 13, 2018 at 04:25:40PM +, Bart Van Assche wrote:
> On Tue, 2018-03-13 at 07:49 -0700, Jens Axboe wrote:
> > On 3/13/18 2:18 AM, Christoph Hellwig wrote:
> > > bio_check_eod() should check the partition size, not the whole disk, if
> > > bio->bi_partno is non-zero.  Do this by moving the call to bio_check_eod()
> > > into blk_partition_remap().
> > 
> > Applied, thanks.
> 
> Hello Christoph and Jens,
> 
> I think that this patch introduces a severe regression: with this patch
> applied a VM that I use for kernel testing doesn't boot anymore. If I
> revert this patch that VM boots fine again. This is what I see on the
> serial console with this patch applied:

Ok, please revert this for now.  I'm off for today and will look into
it tomorrow.


Re: [PATCH 2/2] block: fix peeking requests during PM

2017-10-03 Thread h...@lst.de
On Tue, Oct 03, 2017 at 03:13:43PM +, Bart Van Assche wrote:
> > +   switch (rq->q->rpm_status) {
> > +   case RPM_SUSPENDED:
> > +   return false;
> > +   case RPM_ACTIVE:
> > +   return rq->rq_flags & RQF_PM;
> > +   default:
> > +   return true;
> > +   }
> >  }
> 
> Hello Christoph,
> 
> The old code does not process non-PM requests in the RPM_SUSPENDING mode nor
> in the RPM_RESUMING mode. I think the new code processes all requests in
> these two modes. Was that behavior change intended?

Yeah, looks like this version has the RPM_ACTIVE and default cases
switched.  Back to the drawing board.


Re: [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag

2017-10-02 Thread h...@lst.de
On Mon, Oct 02, 2017 at 03:56:08PM +, Bart Van Assche wrote:
> Since queue_flag_set() and queue_flag_clear() use non-atomic primitives to
> set and clear flags it is essential to protect calls of these functions with
> the queue lock. Otherwise flag updates could get lost due to concurrent
> queue_flag_set() or queue_flag_clear() calls.
> 
> One of the reasons why I introduced this helper function is to keep the
> wake_up_all(&q->mq_freeze_wq) call that is added to this function by a later
> patch in the block layer.

Oh, despite the _unlocked versions existing it doesn't do any atomic
ops.  Yeah, we'll need the queue lock then.

But please split it into one helper for setting the queue blocked state
and one for clearing it, especially if the clear needs to do an
additional wake_up.


Re: [PATCH v3 0/6] Make SCSI device suspend and resume work reliably

2017-09-25 Thread h...@lst.de
On Mon, Sep 25, 2017 at 04:17:27PM +, Bart Van Assche wrote:
> This patch series definitely is an alternative for blk-mq/scsi-mq. And as you
> know my approach can be extended easily to the legacy SCSI core by adding
> blk_queue_enter() / blk_queue_exit() calls where necessary in the legacy block
> layer. I have not done this because the bug report was against scsi-mq and not
> against the legacy SCSI core. Additionally, since the legacy block layer and
> SCSI core are on their way out I did not want to spend time on modifying 
> these.

For now we'll have to fix both paths to have equal functionality.  Even
if scsi might be on its way out there are still many drivers that
use the legacy request path, and with mmc we'll probably grow another
dual block layer driver soon, although hopefully the transition
period will be much shorter.


Re: [PATCH V3 06/12] scsi: sd_zbc: Rearrange code

2017-09-15 Thread h...@lst.de
On Fri, Sep 15, 2017 at 02:51:03PM +, Bart Van Assche wrote:
> On Fri, 2017-09-15 at 19:06 +0900, Damien Le Moal wrote:
> > Rearrange sd_zbc_setup() to include use_16_for_rw and use_10_for_rw
> > assignments and move the calculation of sdkp->zone_shift together
> > with the assignment of the verified zone_blocks value in
> > sd_zbc_check_zone_size().
> 
> Both functions are called from inside block_device_operations.revalidate_disk.
> Isn't that too late to set use_1[06]_for_rw? Shouldn't both be set just after
> the slave_alloc callback has been called?

Hmm.  sd_read_capacity is already called from the same path and seems
to work.

> 
> Thanks,
> 
> Bart.


Re: [PATCH] block: Call .initialize_rq_fn() also for filesystem requests

2017-08-29 Thread h...@lst.de
Hi Bart,

this isn't just about performance - it's about understandability of
the I/O path.

The legacy request path already has the prep_fn, which is intended
for exactly this sort of prep work, but even that led to a lot
of confusion.  For blk-mq we decided not to add it but to leave the
driver in control.  I really don't want to move away from that.

The passthrough requests using blk_get_request are a special case
as the caller allocates the requests, stuffs data into them and only
then hands control to the driver, and thus we need some way to
initialize the request before handing control to the driver in
this particular special case.  If needed we could actually
do that with explicit calls instead of the callback, although you
changed it to save a bit of code.


Re: [PATCH 10/10] nvme: implement multipath access to nvme subsystems

2017-08-24 Thread h...@lst.de
On Wed, Aug 23, 2017 at 06:21:55PM +, Bart Van Assche wrote:
> On Wed, 2017-08-23 at 19:58 +0200, Christoph Hellwig wrote:
> > +static blk_qc_t nvme_make_request(struct request_queue *q, struct bio *bio)
> > +{
> > +   struct nvme_ns_head *head = q->queuedata;
> > +   struct nvme_ns *ns;
> > +   blk_qc_t ret = BLK_QC_T_NONE;
> > +   int srcu_idx;
> > +
> > +   srcu_idx = srcu_read_lock(&head->srcu);
> > +   ns = srcu_dereference(head->current_path, &head->srcu);
> > +   if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
> > +   ns = nvme_find_path(head);
> > +   if (likely(ns)) {
> > +   bio->bi_disk = ns->disk;
> > +   bio->bi_opf |= REQ_FAILFAST_TRANSPORT;
> > +   ret = generic_make_request_fast(bio);
> > +   } else if (!list_empty_careful(&head->list)) {
> > +   printk_ratelimited("no path available - requeuing I/O\n");
> > +
> > +   spin_lock_irq(&head->requeue_lock);
> > +   bio_list_add(&head->requeue_list, bio);
> > +   spin_unlock_irq(&head->requeue_lock);
> > +   } else {
> > +   printk_ratelimited("no path - failing I/O\n");
> > +
> > +   bio->bi_status = BLK_STS_IOERR;
> > +   bio_endio(bio);
> > +   }
> > +
> > +   srcu_read_unlock(&head->srcu, srcu_idx);
> > +   return ret;
> > +}
> 
> Hello Christoph,
> 
> Since generic_make_request_fast() returns BLK_STS_AGAIN for a dying path:
> can the same kind of soft lockups occur with the NVMe multipathing code as
> with the current upstream device mapper multipathing code? See e.g.
> "[PATCH 3/7] dm-mpath: Do not lock up a CPU with requeuing activity"
> (https://www.redhat.com/archives/dm-devel/2017-August/msg00124.html).

I suspect the code is not going to hit it because we check the controller
state before trying to queue I/O on the lower queue.  But if you point
me to a good reproducer test case I'd like to check.

Also does the "single queue" case in your mail refer to the old
request code?  nvme only uses blk-mq so it would not hit that.
But either way I think get_request should be fixed to return
BLK_STS_IOERR if the queue is dying instead of BLK_STS_AGAIN.

> Another question about this code is what will happen if
> generic_make_request_fast() returns BLK_STS_AGAIN and the submit_bio() or
> generic_make_request() caller ignores the return value of the called
> function? A quick grep revealed that there is plenty of code that ignores
> the return value of these last two functions.

generic_make_request and generic_make_request_fast only return
the polling cookie (blk_qc_t), not a block status.  Note that we do
not use blk_get_request / blk_mq_alloc_request for the request allocation
of the request on the lower device, so unless the caller passed REQ_NOWAIT
and is able to handle BLK_STS_AGAIN we won't ever return it.


Re: [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic

2017-08-09 Thread h...@lst.de
Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0
"block: Fix __blkdev_issue_zeroout loop" fix the issue for you?


Re: [PATCH 08/19] block: Introduce request_queue.initialize_rq_fn()

2017-05-28 Thread h...@lst.de
On Fri, May 26, 2017 at 11:56:30PM +, Bart Van Assche wrote:
> I have tried to move that call into blk_mq_alloc_request() but that
> resulted in a kernel oops during boot due to scsi_add_cmd_to_list()
> dereferencing scsi_cmnd.device and due to that pointer being invalid.
> I think that pointer was invalid because moving the initialize_rq_fn()
> call into blk_mq_alloc_request() caused request initialization to be
> skipped for the following code path:
> submit_bio()
>   -> generic_make_request()
>     -> .make_request_fn == blk_mq_make_request()
>       -> blk_mq_sched_get_request()
>         -> __blk_mq_alloc_request()
>           -> blk_mq_rq_ctx_init()
> 
> This is why I would like to keep the .initialize_rq_fn() call in
> blk_mq_rq_ctx_init().

But we don't call scsi_req_init for this path either with the current
code.  So not having the call should be fine as long as you ensure
we still manually initialize everything for the non-passthrough path
in the later patches.  I'll keep an eye on that issue while reviewing
the remaining patches.


Re: [PATCH 05/19] cdrom: Check private request size before attaching to a queue

2017-05-28 Thread h...@lst.de
On Fri, May 26, 2017 at 03:50:42PM +, Bart Van Assche wrote:
> On Fri, 2017-05-26 at 08:08 +0200, Christoph Hellwig wrote:
> > On Thu, May 25, 2017 at 11:43:13AM -0700, Bart Van Assche wrote:
> > > Since the cdrom driver only supports request queues for which
> > > struct scsi_request is the first member of their private request
> > > data, refuse to register block layer queues for which this is
> > > not the case.
> > 
> > Hmm.  I think we have a deeper issue there.  The cdrom layer usually
> > sends packets commands through the cdrom ops ->generic_packet
> > method, but one function (cdrom_read_cdda_bpc) seems to directly
> > send a scsi passthrough request.  It is only used if the cdda_method
> > is not CDDA_OLD, which is set if the cdrom_device_info structure
> > does not have a gendisk pointer hanging off it.  It seems like
> > the legacy cdrom drivers fall into that category and would be
> > broken by this change.
> 
> Hello Christoph,
> 
> How about moving the check into cdrom_read_cdda_bpc()? As far as I can see
> that function is only called if a CDROMREADAUDIO ioctl is submitted. Although
> Documentation/cdrom/cdrom-standard.tex mentions that that ioctl is
> unsupported, applications like cdparanoia use it.

I guess that's fine for now.  In the long run we can probably replace
the current users of generic_packet method with direct scsi passthrough
requests, but that can be left for later.


Re: [PATCH 06/19] nfsd: Check private request size before submitting a SCSI request

2017-05-26 Thread h...@lst.de
On Thu, May 25, 2017 at 08:19:47PM +, Bart Van Assche wrote:
> On Thu, 2017-05-25 at 14:48 -0400, J . Bruce Fields wrote:
> > On Thu, May 25, 2017 at 11:43:14AM -0700, Bart Van Assche wrote:
> > > Since using scsi_req() is only allowed against request queues for
> > > which struct scsi_request is the first member of their private
> > > request data, refuse to submit SCSI commands against a queue for
> > > which this is not the case.
> > 
> > Is it possible we could catch this earlier and avoid giving out the
> > layout in the first place?
> 
> Hello Christoph,
> 
> According to what I see in commit 8650b8a05850 you are the author of this
> code? Can the blk_queue_scsi_pdu(q) test fail in nfsd4_scsi_identify_device()?

If the user explicitly asked for a scsi layout export of a non-scsi
device it can.

> If so, can nfsd4_layout_verify() be modified in such a way that it prevents
> that nfsd4_scsi_proc_getdeviceinfo() is ever called for a non-SCSI queue?
> Can you recommend an approach?

Not easily.  The only thing we could do is an export time check, that
would refuse the scsi layout export if the device is not capable.

I can look into that, but it will take some time so for now I think we
should go ahead with your series.


Re: [PATCH 02/15] scsi/osd: don't save block errors into req_results

2017-05-25 Thread h...@lst.de
On Wed, May 24, 2017 at 04:04:40PM +, Bart Van Assche wrote:
> Are you sure that that code is not necessary? From osd_initiator.c:
> 
> static void _put_request(struct request *rq)
> {
>   /*
>* If osd_finalize_request() was called but the request was not
>* executed through the block layer, then we must release BIOs.
>* TODO: Keep error code in or->async_error. Need to audit all
>*   code paths.
>*/
>   if (unlikely(rq->bio))
>   blk_end_request(rq, -ENOMEM, blk_rq_bytes(rq));
>   else
>   blk_put_request(rq);
> }

Which isn't using it at all.  It has a ten-year-old comment about passing
on some error, but even then ORing two different error types together
would not be very helpful.

> 
> Bart.


Re: [PATCH 02/18] bsg: Check private request size before attaching to a queue

2017-05-22 Thread h...@lst.de
On Sun, May 21, 2017 at 02:33:05PM +, Bart Van Assche wrote:
> Thanks for the review comments. The cover letter should have made it to at
> least the linux-scsi mailing list since it shows up in at least one archive of
> that mailing list: https://www.spinics.net/lists/linux-scsi/msg108940.html.

Yes, I see it on the list now.  But it's missing various Cc that the
actual patches have, including that to me, which seems a bit broken.


Re: [PATCH 12/15] block: merge blk_types.h into bio.h

2017-05-19 Thread h...@lst.de
On Thu, May 18, 2017 at 02:25:52PM +, Bart Van Assche wrote:
> On Thu, 2017-05-18 at 15:18 +0200, Christoph Hellwig wrote:
> > We've cleaned up our headers sufficiently that we don't need this split
> > anymore.
> 
> Hello Christoph,
> 
> Request-based drivers need the structure definitions from  and
> the type definitions from  but do not need the definition 
> of
> struct bio. Have you considered to remove #include  from file
> include/linux/blkdev.h? Do you think that would help to reduce the kernel 
> build
> time?

Most block drivers end up needing bio.h anyway.  But I can skip this
part of the series for now.


Re: small dm mpath cleanups

2017-04-27 Thread h...@lst.de
On Wed, Apr 26, 2017 at 06:41:27PM +, Bart Van Assche wrote:
> On Wed, 2017-04-26 at 09:40 +0200, Christoph Hellwig wrote:
> > this series has some prep patches for my work to have proper, type
> > checked block errors codes.  One fallout of that is that we need to
> > get rid of how dm overloads a few return values with either internal
> > positive error codes or negative errno values.  This patches does
> > that, which happens to clean things up a bit, and also allows us
> > dm to propagate the actual error code in one case where it currently
> > is dropped on the floor.
> 
> Hello Christoph,
> 
> Some patches in this series conflict with patches I would like to end up in
> the stable kernel series. If I would rebase my patch series on top of your
> series then that would make it harder to apply my patches on the stable
> kernel trees. Mike and Christoph, please advise how to proceed.

Bugfixes always go before cleanups.  I'd be happy to delay and/or rebase
any of my patches as needed.


Re: [PATCH 02/23] block: remove the blk_execute_rq return value

2017-04-19 Thread h...@lst.de
On Wed, Apr 19, 2017 at 09:07:45PM +, Bart Van Assche wrote:
> > +   blk_execute_rq(or->request->q, NULL, or->request, 0);
> > +   error = or->request ? -EIO : 0;
> 
> Hello Christoph,
> 
> Did you perhaps intend or->request->errors instead of or->request?

Yes, thanks.


Re: blk-mq: remove the error argument to blk_mq_complete_request

2017-04-19 Thread h...@lst.de
On Tue, Apr 18, 2017 at 10:50:18PM +, Bart Van Assche wrote:
> On Tue, 2017-04-18 at 08:52 -0700, Christoph Hellwig wrote:
> > Now that we always have a ->complete callback we can remove the direct
> > call to blk_mq_end_request, as well as the error argument to
> > blk_mq_complete_request.
> 
> Hello Christoph,
> 
> Please add a runtime check that issues a warning early if a .complete
> callback function is missing.
> 
> Are you sure that all blk-mq drivers have a .complete callback? In the
> request-errors branch of your block git tree I found the following:

I think the problem is that my above changelog was wrong - we only
require the complete_rq method if the driver calls the
blk_mq_complete_request function.  Both rbd and ubiblock don't call it in
the tree.  They probably should, but that's work for a separate
series.


Re: block: add a error_count field to struct request

2017-04-19 Thread h...@lst.de
On Tue, Apr 18, 2017 at 10:57:11PM +, Bart Van Assche wrote:
> Both blk-mq and the traditional block layer support a .cmd_size field to
> make the block layer core allocate driver-specific data at the end of struct
> request. Could that mechanism have been used for the error_count field?

It could, and that's what I did for the modern drivers.  It would have
been a bit of a pain for these old floppy drivers, though.
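The ->cmd_size mechanism Bart mentions makes the block core allocate driver-private bytes directly behind struct request, which the driver then reaches via blk_mq_rq_to_pdu().  A minimal userspace sketch of that layout (mocked types and a hypothetical pdu struct, not the kernel structures):

```c
#include <assert.h>
#include <stdlib.h>

struct request { int tag; };            /* stand-in for the kernel struct */
struct floppy_pdu { int error_count; }; /* hypothetical driver-private data */

/* kernel equivalent: blk_mq_rq_to_pdu() - the pdu lives right after rq */
static void *rq_to_pdu(struct request *rq)
{
	return rq + 1;
}

/* the core allocates request + cmd_size bytes in a single allocation */
static struct request *alloc_request(size_t cmd_size)
{
	return calloc(1, sizeof(struct request) + cmd_size);
}
```

With this layout, a field like the error_count discussed above would simply live in the pdu instead of in struct request itself.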


Re: scsi: introduce a new result field in struct scsi_request

2017-04-19 Thread h...@lst.de
On Tue, Apr 18, 2017 at 10:34:20PM +, Bart Van Assche wrote:
> Did you perhaps intend "req->result" instead of "rq->result"?

Yes.

> Did you intend "war" or is that perhaps a typo?

I'll fix the comment.

> > trace_scsi_dispatch_cmd_done(cmd);
> > -   blk_mq_complete_request(cmd->request, cmd->request->errors);
> > +   blk_mq_complete_request(cmd->request, 0);
> >  }
> 
> Why has cmd->request->errors been changed into 0?

Because the argument is only used to set req->errors, which we won't
look at any more.


Re: block: remove the blk_execute_rq return value

2017-04-19 Thread h...@lst.de
On Tue, Apr 18, 2017 at 10:24:15PM +, Bart Van Assche wrote:
> On Tue, 2017-04-18 at 08:52 -0700, Christoph Hellwig wrote:
> > diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c
> > index b05e151c9b38..82c6d02193ae 100644
> > --- a/drivers/block/paride/pd.c
> > +++ b/drivers/block/paride/pd.c
> > @@ -747,7 +747,8 @@ static int pd_special_command(struct pd_unit *disk,
> >  
> > rq->special = func;
> >  
> > -   err = blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
> > +   blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
> > +   err = req->errors ? -EIO : 0;
> >  
> > blk_put_request(rq);
> > return err;
> 
> Hello Christoph,
> 
> Sorry that I hadn't noticed this before but shouldn't "req" be changed into
> "rq"? Otherwise this patch looks fine to me. If this comment gets addressed
> you can add:

It should.  But it seems this no one caught this because the check gets
removed later in the series by "pd: remove bogus check for req->errors",
so I should just move that patch earlier in the series.
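The pattern under discussion - blk_execute_rq() no longer returning a value, with callers deriving an errno from rq->errors themselves - can be sketched standalone (mocked struct, not the kernel API):

```c
#include <assert.h>
#include <errno.h>

struct request { int errors; };	/* mock; only the field we care about */

static int submit_and_check(struct request *rq)
{
	/* blk_execute_rq(q, gd, rq, 0) would run the request here */
	return rq->errors ? -EIO : 0;
}
```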


Re: [PATCH 02/25] block: remove the blk_execute_rq return value

2017-04-18 Thread h...@lst.de
On Mon, Apr 17, 2017 at 10:01:09AM -0600, Jens Axboe wrote:
> Are you respinning this series for 4.12?

Yes, I think I got enough feedback by now to resend it.  I'll try to
get it out today.


Re: [PATCH 02/25] block: remove the blk_execute_rq return value

2017-04-14 Thread h...@lst.de
On Thu, Apr 13, 2017 at 08:03:22PM +, Bart Van Assche wrote:
> That blk_execute_rq() call can only be reached if a few lines above 0 was
> assigned to the "error" variable. Since nfsd4_scsi_identify_device() returns
> the value of the "error" variable I think -EIO should be assigned to that
> variable before the "goto out_put_request" statement is reached.

You're right!  I'll fix it up.


Re: [PATCH 12/23] sd: handle REQ_UNMAP

2017-03-31 Thread h...@lst.de
On Thu, Mar 30, 2017 at 10:19:55PM -0400, Martin K. Petersen wrote:
> > If you manually change the provisioning mode to WS10 on a device that
> > must use WRITE SAME (16) to be able to address all blocks you're already
> > screwed right now, and with this patch you can screw yourself through
> > the WRITE_ZEROES path in addition to the DISCARD path.
> 
> Oh, I see. We only had the LBA sanity check in place for write same, not
> for discard.

And btw, I'd be happy to add such a check, I'd just rather keep it out
of this patch.  It might be a good idea if you give it a turn after
this series given that you have all the devices that have weird
provisioning modes while I don't..


Re: [PATCH 23/23] block: remove the discard_zeroes_data flag

2017-03-30 Thread h...@lst.de
On Thu, Mar 30, 2017 at 11:29:29AM -0400, Martin K. Petersen wrote:
> "h...@lst.de" <h...@lst.de> writes:
> 
> > Jens, any opinion?  I'd like to remove it too, but I fear it might
> > break things.  We could deprecate it first with a warning when read
> > and then remove it a few releases down the road.
> 
> I know of several apps that check this variable (as opposed to the
> ioctl).

The above was in reference to both methods..


Re: [PATCH 23/23] block: remove the discard_zeroes_data flag

2017-03-30 Thread h...@lst.de
On Tue, Mar 28, 2017 at 05:00:48PM +, Bart Van Assche wrote:
> 
> It seems to me like the documentation in Documentation/ABI/testing/sysfs-block
> and the above code are not in sync. I think the above code will cause reading
> from the discard_zeroes_data attribute to return an empty string ("") instead
> of "0\n".

Thanks, fine with me.

> 
> BTW, my personal preference is to remove this attribute entirely because 
> keeping
> it will cause confusion, no matter how well we document the behavior of this
> attribute.

Jens, any opinion?  I'd like to remove it too, but I fear it might
break things.  We could deprecate it first with a warning when read
and then remove it a few releases down the road.


Re: [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches

2017-03-30 Thread h...@lst.de
On Tue, Mar 28, 2017 at 04:50:47PM +, Bart Van Assche wrote:
> On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> > This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
> > and preparse for removing the discard_zeroes_data flag.
> 
> Hello Christoph,
> 
> "preparse" probably should have been "prepare"?

Yes, fixed.


Re: [PATCH 12/23] sd: handle REQ_UNMAP

2017-03-30 Thread h...@lst.de
On Tue, Mar 28, 2017 at 04:48:55PM +, Bart Van Assche wrote:
> > if (sdp->no_write_same)
> > return BLKPREP_INVALID;
> > if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
> 
> Users can change the provisioning mode from user space from SD_LBP_WS16 into
> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

They can, and if the device has too many sectors that will already cause
discard to fail, and in this case it will cause write zeroes to fail as
well.  The intent behind this patch is to keep the behavior the same
as the old path that uses discards for zeroing.  The logic looks a bit
clumsy, but I'd rather keep it as-is.
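For context on the check being discussed: WRITE SAME(10) carries a 32-bit LBA and a 16-bit block count in its CDB, so any request beyond either field has to use WRITE SAME(16).  A hedged sketch of that limit test (hypothetical helper, not the sd.c code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* WRITE SAME(10): 4-byte LBA field, 2-byte number-of-blocks field */
static bool needs_write_same_16(uint64_t lba, uint32_t nr_blocks)
{
	return lba > 0xffffffffULL || nr_blocks > 0xffff;
}
```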


Re: [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES

2017-03-30 Thread h...@lst.de
On Tue, Mar 28, 2017 at 04:12:46PM +, Bart Van Assche wrote:
> Since REQ_OP_WRITE_ZEROES was introduced in kernel v4.10, do we need
> "Cc: stable" and "Fixes: a6f0788ec2881" tags for this patch?

No.  This just works around the way scsi_setup_cmnd sets up the data
direction.  Before this series it's not an issue because no one used
the req_op data direction for setting up the dma direction.


Re: [PATCH 1/7] sd: split sd_setup_discard_cmnd

2017-03-30 Thread h...@lst.de
On Tue, Mar 28, 2017 at 08:05:09AM -0600, ax...@kernel.dk wrote:
> > Although I know this is an issue in the existing code and not something
> > introduced by you: please consider using logical_to_sectors() instead of
> > open-coding this function. Otherwise this patch looks fine to me.
> 
> The downside of doing that is that we still call ilog2() twice, which
> sucks. Would be faster to cache ilog2(sector_size) and use that in the
> shift calculation.

I suspect that gcc is smart enough to optimize it away.  That being said,
while this looks like a nice cleanup, this patch is just supposed to move
code, so I'd rather not add the change here and leave it for a separate
submission.
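The conversion at issue turns a count of logical blocks into 512-byte sectors by shifting with ilog2(sector_size) - 9; caching the ilog2 result per device is the optimization Jens suggests.  A standalone sketch (not the kernel helper):

```c
#include <assert.h>
#include <stdint.h>

static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* logical blocks -> 512-byte sectors; the shift can be computed once */
static uint64_t logical_to_sectors(uint32_t sector_size, uint64_t blocks)
{
	return blocks << (ilog2_u32(sector_size) - 9);
}
```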


Re: unify and streamline the blk-mq make_request implementations V2

2017-03-22 Thread h...@lst.de
On Tue, Mar 21, 2017 at 12:40:17PM +, Bart Van Assche wrote:
> On Mon, 2017-03-20 at 16:39 -0400, Christoph Hellwig wrote:
> > Changes since V1:
> >  - rebase on top of the recent blk_mq_try_issue_directly changes
> >  - incorporate comments from Bart
> 
> Hi Christoph,
> 
> It seems to me like none of the three comments I had posted on patch 4/4
> have been addressed. Please have another look at these comments.

I had a look but resent the same old series again due to a rebase bug.
I'll resend it ASAP with what should address your comments to the previous
series as well as this one.


Re: [PATCH 2/4] blk-mq: merge mq and sq make_request instances

2017-03-13 Thread h...@lst.de
On Mon, Mar 13, 2017 at 09:01:08PM +, Bart Van Assche wrote:
> > +   /*
> > +* @request_count may become stale because of schedule
> > +* out, so check the list again.
> > +*/
> 
> The above comment was relevant as long as there was a request_count assignment
> above blk_mq_sched_get_request(). This patch moves that assignment inside if
> (plug && q->nr_hw_queues == 1). Does that mean that the above comment should 
> be
> removed entirely?

I don't think so - for the !blk_queue_nomerges cases we still rely
on blk_attempt_plug_merge calculating request_count, so the comment
still applies.


Re: [PATCH 1/4] blk-mq: remove BLK_MQ_F_DEFER_ISSUE

2017-03-13 Thread h...@lst.de
On Mon, Mar 13, 2017 at 08:52:54PM +, Bart Van Assche wrote:
> > -   if (((plug && !blk_queue_nomerges(q)) || is_sync) &&
> > -   !(data.hctx->flags & BLK_MQ_F_DEFER_ISSUE)) {
> > +   if (((plug && !blk_queue_nomerges(q)) || is_sync)) {
> 
> A minor comment: due to this change the outer pair of parentheses
> became superfluous. Please consider removing these.

The last patch in the series removes the statement in this form. But
if I have to respin the series for some reason I'll make sure it
gets removed here already.


Re: [GIT PULL] Block pull request for- 4.11-rc1

2017-02-25 Thread h...@lst.de
On Fri, Feb 24, 2017 at 01:22:42PM -0700, Jens Axboe wrote:
> Bart, I pushed a fix here:
> 
http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=61febef40bfe8ab68259d8545257686e8a0d91d1

Yeah, this looks fine to me.  It was broken on blk-mq before, but
basically impossible to hit.  I wonder if we should have a debug mode
where we set requests to a known pattern after they are freed to catch
these sorts of bugs.
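The debug mode floated here - filling freed requests with a known pattern so a stale completion is caught immediately - might look like this in miniature (the poison value and struct are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RQ_POISON 0x6b	/* in the spirit of slab's use-after-free poison */

struct mock_request { unsigned long state; char payload[56]; };

/* would run when the request is freed back to the tag set */
static void poison_request(struct mock_request *rq)
{
	memset(rq, RQ_POISON, sizeof(*rq));
}

/* a completion path could check this and warn on a poisoned request */
static int request_is_poisoned(const struct mock_request *rq)
{
	const unsigned char *p = (const unsigned char *)rq;
	size_t i;

	for (i = 0; i < sizeof(*rq); i++)
		if (p[i] != RQ_POISON)
			return 0;
	return 1;
}
```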


Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
> I tested today's linux-next (next-20170214) + the 2 patches just now and got
> a weird result: 
> sometimes the VM still hung with a new calltrace (BUG: spinlock bad
> magic), but sometimes the VM did boot up despite the new calltrace!
> 
> Attached is the log of a "good" boot.
> 
> It looks we have a memory corruption issue somewhere...

Yes.

> Actually previously I saw the "BUG: spinlock bad magic" message once, but I
> couldn't repro it later, so I didn't mention it to you.

Interesting.

> 
> The good news is that now I can repro the "spinlock bad magic" message
> every time. 
> I tried to dig into this by enabling Kernel hacking -> Memory debugging,
> but didn't find anything abnormal. 
> Is it possible that the SCSI layer passes a wrong memory address?

It's possible, but this looks like it might be a different issue.

A few questions on the dmesg:

[6.208794] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current] 
[6.209447] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation code
[6.210043] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current] 
[6.210618] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operation code
[6.212272] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current] 
[6.212897] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation code
[6.213474] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current] 
[6.214051] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operation code

I didn't see anything like this in the other logs.  Are these messages
something usual on HyperV VMs?

[6.358405] XFS (sdb1): Mounting V5 Filesystem
[6.404478] XFS (sdb1): Ending clean mount
[7.535174] BUG: spinlock bad magic on CPU#0, swapper/0/0
[7.536807]  lock: host_ts+0x30/0xe1a0 [hv_utils], .magic: 
, .owner: /-1, .owner_cpu: 0
[7.538436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.10.0-rc8-next-20170214+ #1
[7.539142] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS 090006  04/28/2016
[7.539142] Call Trace:
[7.539142]  
[7.539142]  dump_stack+0x63/0x82
[7.539142]  spin_dump+0x78/0xc0
[7.539142]  do_raw_spin_lock+0xfd/0x160
[7.539142]  _raw_spin_lock_irqsave+0x4c/0x60
[7.539142]  ? timesync_onchannelcallback+0x153/0x220 [hv_utils]
[7.539142]  timesync_onchannelcallback+0x153/0x220 [hv_utils]

Can you resolve this address using gdb to a line of code?  Once inside
gdb do:

l *(timesync_onchannelcallback+0x153)


Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
On Tue, Feb 14, 2017 at 02:46:41PM +, Dexuan Cui wrote:
> > From: h...@lst.de [mailto:h...@lst.de]
> > Sent: Tuesday, February 14, 2017 22:29
> > To: Dexuan Cui <de...@microsoft.com>
> > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock
> > when scheduling workqueue elements")
> > 
> > Ok, thanks for testing.  Can you try the patch below?  It fixes a
> > clear problem which was partially papered over before the commit
> > you bisected to, although it can't explain why blk-mq still works.
> 
> Still bad luck. :-(
> 
> BTW, I'm using the first "bad" commit (scsi: allocate scsi_cmnd structures as
> part of struct request) + the 2 patches you provided today.
> 
> I suppose I don't need to test the 2 patches on the latest linux-next repo.

I'd love a test on that repo actually.  We had a few other fixes for sense
handling since then I think.


Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
Ok, thanks for testing.  Can you try the patch below?  It fixes a
clear problem which was partially papered over before the commit
you bisected to, although it can't explain why blk-mq still works.

From e4a66856fa2d92c0298000de658365f31bea60cd Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <h...@lst.de>
Date: Tue, 14 Feb 2017 15:08:39 +0100
Subject: scsi: always zero sshdr in scsi_normalize_sense

This gives us a clear state even if a command didn't return sense data.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/scsi/scsi_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_common.c b/drivers/scsi/scsi_common.c
index b1383a71400e..a75673bb82b3 100644
--- a/drivers/scsi/scsi_common.c
+++ b/drivers/scsi/scsi_common.c
@@ -137,11 +137,11 @@ EXPORT_SYMBOL(int_to_scsilun);
 bool scsi_normalize_sense(const u8 *sense_buffer, int sb_len,
 			  struct scsi_sense_hdr *sshdr)
 {
+	memset(sshdr, 0, sizeof(struct scsi_sense_hdr));
+
 	if (!sense_buffer || !sb_len)
 		return false;
 
-	memset(sshdr, 0, sizeof(struct scsi_sense_hdr));
-
 	sshdr->response_code = (sense_buffer[0] & 0x7f);
 
 	if (!scsi_sense_valid(sshdr))
-- 
2.11.0
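The point of moving the memset above the early return: callers may read sshdr fields even when the function reports no valid sense data, so the header must be zeroed on every path.  A standalone mock of the pattern (simplified, not the real scsi_common.c logic):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct sense_hdr { uint8_t response_code, sense_key, asc, ascq; };

static bool normalize_sense(const uint8_t *buf, int len,
			    struct sense_hdr *sshdr)
{
	memset(sshdr, 0, sizeof(*sshdr));	/* clear even on early exit */

	if (!buf || !len)
		return false;

	sshdr->response_code = buf[0] & 0x7f;
	return (sshdr->response_code & 0x70) == 0x70;
}
```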



Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-08 Thread h...@lst.de
On Wed, Feb 08, 2017 at 10:43:59AM -0700, Jens Axboe wrote:
> I've changed the subject line, this issue has nothing to do with the
> issue that Hannes was attempting to fix.

Nothing really useful in the thread.  Dexuan, can you throw in some
prints to see which command times out?


Re: [PATCH 15/18] scsi: allocate scsi_cmnd structures as part of struct request

2017-01-28 Thread h...@lst.de
On Fri, Jan 27, 2017 at 06:39:46PM +, Bart Van Assche wrote:
> Why have the scsi_release_buffers() and scsi_put_command(cmd) calls been
> moved up? I haven't found an explanation for this change in the patch
> description.

Because they reference the scsi_cmnd, which are now part of the request
and thus freed by blk_finish_request.  And yes, I should have mentioned
it in the changelog, sorry.

> Please also consider to remove the cmd->request->special = NULL assignments
> via this patch. Since this patch makes the lifetime of struct scsi_cmnd and
> struct request identical these assignments are no longer needed.

True.  If I had to resend again I would have fixed it up, but it's probably
not worth the churn now.

> This patch introduces the function scsi_exit_rq(). Having two functions
> for the single-queue path that release resources (scsi_release_buffers()
> and scsi_exit_rq()) is confusing. Since every scsi_release_buffers() call
> is followed by a blk_unprep_request() call, have you considered to move
> the scsi_release_buffers() call into scsi_unprep_fn() via an additional
> patch?

We could have done that.  But it's just more change for a code path
that I hope won't survive this calendar year.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: split scsi passthrough fields out of struct request V2

2017-01-28 Thread h...@lst.de
On Fri, Jan 27, 2017 at 09:27:53PM +, Bart Van Assche wrote:
> Have you considered to convert all block drivers to the new
> approach and to get rid of request.special? If so, do you already
> have plans to start working on this? I'm namely wondering whether I
> should start working on this myself.

Hi Bart,

I'd love to have all drivers move off using .special (and thus reducing
request size further).  I think the general way to do that is to convert
them to blk-mq and not using the legacy cmd_size field.


Re: split scsi passthrough fields out of struct request V3

2017-01-28 Thread h...@lst.de
On Fri, Jan 27, 2017 at 06:58:53PM +, Bart Van Assche wrote:
> Version 3 of the patch with title "block: split scsi_request out of
> struct request" (commit 3c30af6ebe12) differs significantly from v2
> of that patch that has been posted on several mailing lists. E.g. v2
> moves __cmd[], cmd and cmd_len from struct request into struct
> scsi_request but v3 not. Which version do you want us to review?

Hi Bart,

I tried to resend the whole updated v3 series, but the mail server
stopped accepting mails due to overload.  Otherwise it would have
included all the patches.  Jens instead took the updated version
straight from this git branch:


http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/block-pc-refactor


Re: [PATCH 14/18] scsi: remove __scsi_alloc_queue

2017-01-28 Thread h...@lst.de
On Fri, Jan 27, 2017 at 05:58:02PM +, Bart Van Assche wrote:
> Since __scsi_init_queue() modifies data in the Scsi_Host structure, have you
> considered to add the declaration for this function to ?
> If you want to keep this declaration in  please add a
> direct include of that header file to drivers/scsi/scsi_lib.c such that the
> declaration remains visible to the compiler if someone would minimize the
> number of #include directives in SCSI header files.

Feel free to send an incremental patch either way.  In the long run
I'd really like to kill off __scsi_init_queue and remove the transport
BSG queue abuse of SCSI internals, though.


Re: split scsi passthrough fields out of struct request V2

2017-01-26 Thread h...@lst.de
On Thu, Jan 26, 2017 at 11:57:36AM -0700, Jens Axboe wrote:
> It's against my for-4.11/block, which you were running under Christoph's
> patches. Maybe he's using an older version? In any case, should be
> pretty trivial for you to hand apply. Just ensure that .flags is set to
> 0 for the common cases, and inherit 'flags' when it is passed in.

No, the flush op cleanups you asked for last round create a conflict
with your patch.  They should be trivial to fix, though.