Re: blktests block/019 lead system hang

2018-06-12 Thread Austin.Bolen
On 6/5/2018 11:16 AM, Keith Busch wrote:
> On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote:
>> Hi Keith
>> I found that blktests block/019 can also hang my NVMe server with 4.17.0-rc7;
>> let me know if you need more info, thanks.
>>
>> Server: Dell R730xd
>> NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd 
>> NVMe SSD Controller 172X (rev 01)
>>
>> Console log:
>> Kernel 4.17.0-rc7 on an x86_64
>>
>> storageqe-62 login: [ 6043.121834] run blktests block/019 at 2018-05-30 
>> 03:16:34
>> [ 6049.108476] {1}[Hardware Error]: Hardware error from APEI Generic 
>> Hardware Error Source: 3
>> [ 6049.108478] {1}[Hardware Error]: event severity: fatal
>> [ 6049.108479] {1}[Hardware Error]:  Error 0, type: fatal
>> [ 6049.108481] {1}[Hardware Error]:   section_type: PCIe error
>> [ 6049.108482] {1}[Hardware Error]:   port_type: 6, downstream switch port
>> [ 6049.108483] {1}[Hardware Error]:   version: 1.16
>> [ 6049.108484] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
>> [ 6049.108485] {1}[Hardware Error]:   device_id: 0000:83:05.0
>> [ 6049.108486] {1}[Hardware Error]:   slot: 0
>> [ 6049.108487] {1}[Hardware Error]:   secondary_bus: 0x85
>> [ 6049.108488] {1}[Hardware Error]:   vendor_id: 0x10b5, device_id: 0x8734
>> [ 6049.108489] {1}[Hardware Error]:   class_code: 000406
>> [ 6049.108489] {1}[Hardware Error]:   bridge: secondary_status: 0x, 
>> control: 0x0003
>> [ 6049.108491] Kernel panic - not syncing: Fatal hardware error!
>> [ 6049.108514] Kernel Offset: 0x2580 from 0x8100 (relocation 
>> range: 0x8000-0xbfff)
> Sounds like your platform fundamentally doesn't support surprise link
> down if it considers the event a fatal error. That's sort of what this
> test was supposed to help catch, so we know which platforms can handle
> this and which can't.
>
> The test does check that the slot is hotplug capable before running,
> so it should only run on slots that claim to be able to handle the
> event. I just don't know of a good way to query platform firmware to
> learn what it will do in response to such an event.
It looks like the test is setting the Link Disable bit.  But this is not
a good simulation for hot-plug surprise removal testing or surprise link
down (SLD) testing, if that is the intent.  One reason is that Link
Disable does not invoke SLD semantics per the PCIe spec.  This is somewhat
of a moot point in this case, since the switch has the Hot-Plug Surprise
bit set, which also masks the SLD semantics in PCIe.

Also, having both the Hot-Plug Capable and Hot-Plug Surprise bits set means
the platform can tolerate the case where "an adapter present in this slot
might be removed from the system without any prior notification".  It
does not mean that a system can survive a link down under any other
circumstances, such as setting Link Disable, generating a Secondary Bus
Reset, or a true surprise link down event.  To the earlier point, I also
do not know of any way the OS can know a priori whether the platform can
handle surprise link down outside of the surprise removal case.  We can
look at standardizing a way to do that if OSes find it useful to know.
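
For illustration, here is a minimal kernel-side sketch of checking what the
port above the device advertises, using the standard pcie_capability helpers.
This is not necessarily what the blktests script does; it only shows which
Slot Capabilities bits are involved, and it still cannot tell you how
platform firmware will react to a link-down event:

#include <linux/pci.h>

/*
 * Sketch: does the PCIe port above the device claim to support surprise
 * removal?  Reads the Slot Capabilities register of the port.
 */
static bool slot_advertises_surprise_hotplug(struct pci_dev *port)
{
	u32 sltcap = 0;

	if (!pci_is_pcie(port))
		return false;

	pcie_capability_read_dword(port, PCI_EXP_SLTCAP, &sltcap);

	return (sltcap & PCI_EXP_SLTCAP_HPC) &&	/* Hot-Plug Capable */
	       (sltcap & PCI_EXP_SLTCAP_HPS);	/* Hot-Plug Surprise */
}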

Relative to this particular error, Link Disable doesn't clear Presence
Detect State, which would be cleared on a real surprise hot-plug removal
event, and this is probably why the system crashes.  What will happen is
that after the link goes to the Disabled state, the ongoing I/O will cause
MMIO accesses to the drive, and those will complete as Unsupported Request
(UR), which is an uncorrectable PCIe error (ERR_FATAL on the R730).  The
BIOS on the R730 is surprise-removal aware (Hot-Plug Surprise = 1), so it
will check whether the device is still present by reading Presence Detect
State.  If the device is not present, it will mask the error and let the OS
handle the device removal via the hot-plug interrupt(s).  If the device is
present, as in this case, then the BIOS will escalate the error to the OS as
a fatal NMI (current R730 platform policy is to mask only errors due to
removal).
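
To make the distinction concrete, here is a similar sketch (again only an
illustration, not what the test or the BIOS literally runs): after Link
Disable is set on the port, the Presence Detect State bit in Slot Status
remains set because the adapter is still physically in the slot, which is
exactly the case the R730 BIOS does not treat as a removal:

static void show_link_disable_vs_presence(struct pci_dev *port)
{
	u16 lnkctl = 0, sltsta = 0;

	pcie_capability_read_word(port, PCI_EXP_LNKCTL, &lnkctl);
	pcie_capability_read_word(port, PCI_EXP_SLTSTA, &sltsta);

	/*
	 * With Link Disable set but the drive still in the slot, expect
	 * Link Disable = 1 and Presence Detect State = 1.
	 */
	pci_info(port, "Link Disable=%d, Presence Detect State=%d\n",
		 !!(lnkctl & PCI_EXP_LNKCTL_LD),
		 !!(sltsta & PCI_EXP_SLTSTA_PDS));
}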

Going forward, these servers may report this sort of error as recoverable
via the GHES structures in APEI, which will allow the OS to recover from
this non-surprise-removal class of error as well.  In the (hopefully
near) future, the industry will move to DPC as the framework for this
sort of generic PCIe error handling/recovery, but there are architectural
changes needed that are currently being defined in the relevant
standards bodies.  Once the architecture is defined, it can be
implemented and tested to verify that test cases like this one pass.

-Austin





Re: [PATCH] lightnvm: pblk: add asynchronous partial read

2018-06-12 Thread Matias Bjørling

On 06/12/2018 04:59 PM, Javier Gonzalez wrote:

On 11 Jun 2018, at 22.53, Heiner Litz  wrote:

In the read path, partial reads are currently performed synchronously
which affects performance for workloads that generate many partial
reads. This patch adds an asynchronous partial read path as well as
the required partial read ctx.

Signed-off-by: Heiner Litz 
---
drivers/lightnvm/pblk-read.c | 179 ---
drivers/lightnvm/pblk.h  |  10 +++
2 files changed, 128 insertions(+), 61 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 7570ff6..026c708 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -231,74 +231,36 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
__pblk_end_io_read(pblk, rqd, true);
}

-static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
-struct bio *orig_bio, unsigned int bio_init_idx,
-unsigned long *read_bitmap)
+static void pblk_end_partial_read(struct nvm_rq *rqd)
{
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
-   struct bio *new_bio;
+   struct pblk *pblk = rqd->private;
+   struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
+   struct pblk_pr_ctx *pr_ctx = r_ctx->private;
+   struct bio *new_bio = rqd->bio;
+   struct bio *bio = pr_ctx->orig_bio;
struct bio_vec src_bv, dst_bv;
-   void *ppa_ptr = NULL;
-   void *src_p, *dst_p;
-   dma_addr_t dma_ppa_list = 0;
-   __le64 *lba_list_mem, *lba_list_media;
-   int nr_secs = rqd->nr_ppas;
+   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   int bio_init_idx = pr_ctx->bio_init_idx;
+   unsigned long *read_bitmap = &pr_ctx->bitmap;
+   int nr_secs = pr_ctx->orig_nr_secs;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-   int i, ret, hole;
-
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-   lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
-   new_bio = bio_alloc(GFP_KERNEL, nr_holes);
-
-   if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-   goto err;
-
-   if (nr_holes != new_bio->bi_vcnt) {
-   pr_err("pblk: malformed bio\n");
-   goto err;
-   }
-
-   for (i = 0; i < nr_secs; i++)
-   lba_list_mem[i] = meta_list[i].lba;
-
-   new_bio->bi_iter.bi_sector = 0; /* internal bio */
-   bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
-
-   rqd->bio = new_bio;
-   rqd->nr_ppas = nr_holes;
-   rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
-
-   if (unlikely(nr_holes == 1)) {
-   ppa_ptr = rqd->ppa_list;
-   dma_ppa_list = rqd->dma_ppa_list;
-   rqd->ppa_addr = rqd->ppa_list[0];
-   }
-
-   ret = pblk_submit_io_sync(pblk, rqd);
-   if (ret) {
-   bio_put(rqd->bio);
-   pr_err("pblk: sync read IO submission failed\n");
-   goto err;
-   }
-
-   if (rqd->error) {
-   atomic_long_inc(&pblk->read_failed);
-#ifdef CONFIG_NVM_DEBUG
-   pblk_print_failed_rqd(pblk, rqd, rqd->error);
-#endif
-   }
+   __le64 *lba_list_mem, *lba_list_media;
+   void *src_p, *dst_p;
+   int hole, i;

if (unlikely(nr_holes == 1)) {
struct ppa_addr ppa;

ppa = rqd->ppa_addr;
-   rqd->ppa_list = ppa_ptr;
-   rqd->dma_ppa_list = dma_ppa_list;
+   rqd->ppa_list = pr_ctx->ppa_ptr;
+   rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
rqd->ppa_list[0] = ppa;
}

+   /* Re-use allocated memory for intermediate lbas */
+   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
+   lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
+
for (i = 0; i < nr_secs; i++) {
lba_list_media[i] = meta_list[i].lba;
meta_list[i].lba = lba_list_mem[i];
@@ -316,7 +278,7 @@ static int pblk_partial_read(struct pblk *pblk, struct 
nvm_rq *rqd,
meta_list[hole].lba = lba_list_media[i];

src_bv = new_bio->bi_io_vec[i++];
-   dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
+   dst_bv = bio->bi_io_vec[bio_init_idx + hole];

src_p = kmap_atomic(src_bv.bv_page);
dst_p = kmap_atomic(dst_bv.bv_page);
@@ -334,19 +296,107 @@ static int pblk_partial_read(struct pblk *pblk, struct 
nvm_rq *rqd,
} while (hole < nr_secs);

bio_put(new_bio);
+   kfree(pr_ctx);

/* restore original request */
rqd->bio = NULL;
rqd->nr_ppas = nr_secs;

+   bio_endio(bio);
__pblk_end_io_read(pblk, rqd, false);
-   return NVM_IO_DONE;
+}
+
+static int pblk_setup_partial_read(struct pblk 

Re: [PATCH] lightnvm: pblk: add asynchronous partial read

2018-06-12 Thread Javier Gonzalez
> On 11 Jun 2018, at 22.53, Heiner Litz  wrote:
> 
> In the read path, partial reads are currently performed synchronously
> which affects performance for workloads that generate many partial
> reads. This patch adds an asynchronous partial read path as well as
> the required partial read ctx.
> 
> Signed-off-by: Heiner Litz 
> ---
> drivers/lightnvm/pblk-read.c | 179 ---
> drivers/lightnvm/pblk.h  |  10 +++
> 2 files changed, 128 insertions(+), 61 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 7570ff6..026c708 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -231,74 +231,36 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>   __pblk_end_io_read(pblk, rqd, true);
> }
> 
> -static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> -  struct bio *orig_bio, unsigned int bio_init_idx,
> -  unsigned long *read_bitmap)
> +static void pblk_end_partial_read(struct nvm_rq *rqd)
> {
> - struct pblk_sec_meta *meta_list = rqd->meta_list;
> - struct bio *new_bio;
> + struct pblk *pblk = rqd->private;
> + struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> + struct pblk_pr_ctx *pr_ctx = r_ctx->private;
> + struct bio *new_bio = rqd->bio;
> + struct bio *bio = pr_ctx->orig_bio;
>   struct bio_vec src_bv, dst_bv;
> - void *ppa_ptr = NULL;
> - void *src_p, *dst_p;
> - dma_addr_t dma_ppa_list = 0;
> - __le64 *lba_list_mem, *lba_list_media;
> - int nr_secs = rqd->nr_ppas;
> + struct pblk_sec_meta *meta_list = rqd->meta_list;
> + int bio_init_idx = pr_ctx->bio_init_idx;
> + unsigned long *read_bitmap = &pr_ctx->bitmap;
> + int nr_secs = pr_ctx->orig_nr_secs;
>   int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
> - int i, ret, hole;
> -
> - /* Re-use allocated memory for intermediate lbas */
> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
> -
> - new_bio = bio_alloc(GFP_KERNEL, nr_holes);
> -
> - if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
> - goto err;
> -
> - if (nr_holes != new_bio->bi_vcnt) {
> - pr_err("pblk: malformed bio\n");
> - goto err;
> - }
> -
> - for (i = 0; i < nr_secs; i++)
> - lba_list_mem[i] = meta_list[i].lba;
> -
> - new_bio->bi_iter.bi_sector = 0; /* internal bio */
> - bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
> -
> - rqd->bio = new_bio;
> - rqd->nr_ppas = nr_holes;
> - rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
> -
> - if (unlikely(nr_holes == 1)) {
> - ppa_ptr = rqd->ppa_list;
> - dma_ppa_list = rqd->dma_ppa_list;
> - rqd->ppa_addr = rqd->ppa_list[0];
> - }
> -
> - ret = pblk_submit_io_sync(pblk, rqd);
> - if (ret) {
> - bio_put(rqd->bio);
> - pr_err("pblk: sync read IO submission failed\n");
> - goto err;
> - }
> -
> - if (rqd->error) {
> - atomic_long_inc(&pblk->read_failed);
> -#ifdef CONFIG_NVM_DEBUG
> - pblk_print_failed_rqd(pblk, rqd, rqd->error);
> -#endif
> - }
> + __le64 *lba_list_mem, *lba_list_media;
> + void *src_p, *dst_p;
> + int hole, i;
> 
>   if (unlikely(nr_holes == 1)) {
>   struct ppa_addr ppa;
> 
>   ppa = rqd->ppa_addr;
> - rqd->ppa_list = ppa_ptr;
> - rqd->dma_ppa_list = dma_ppa_list;
> + rqd->ppa_list = pr_ctx->ppa_ptr;
> + rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
>   rqd->ppa_list[0] = ppa;
>   }
> 
> + /* Re-use allocated memory for intermediate lbas */
> + lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
> +
>   for (i = 0; i < nr_secs; i++) {
>   lba_list_media[i] = meta_list[i].lba;
>   meta_list[i].lba = lba_list_mem[i];
> @@ -316,7 +278,7 @@ static int pblk_partial_read(struct pblk *pblk, struct 
> nvm_rq *rqd,
>   meta_list[hole].lba = lba_list_media[i];
> 
>   src_bv = new_bio->bi_io_vec[i++];
> - dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
> + dst_bv = bio->bi_io_vec[bio_init_idx + hole];
> 
>   src_p = kmap_atomic(src_bv.bv_page);
>   dst_p = kmap_atomic(dst_bv.bv_page);
> @@ -334,19 +296,107 @@ static int pblk_partial_read(struct pblk *pblk, struct 
> nvm_rq *rqd,
>   } while (hole < nr_secs);
> 
>   bio_put(new_bio);
> + kfree(pr_ctx);
> 
>   /* restore original request */
>   rqd->bio = NULL;
>   rqd->nr_ppas = nr_secs;
> 
> + bio_endio(bio);
>   __pblk_end_io_read(pblk, rqd, false);
> - return NVM_IO_DONE;
> +}
> +

Re: Regarding the recent new blk-mq timeout handling

2018-06-12 Thread jianchao.wang


On 06/12/2018 09:01 PM, jianchao.wang wrote:
> Hi ming
> 
> Thanks for your kindly response.
> 
> On 06/12/2018 06:17 PM, Ming Lei wrote:
>> On Tue, Jun 12, 2018 at 6:04 PM, jianchao.wang
>>  wrote:
>>> Hi Jens and Christoph
>>>
>>> In the recent commit introducing the new blk-mq timeout handling, we don't
>>> have any protection of a timed-out request against the completion path. We
>>> just hold a request->ref count, which only prevents the request tag from
>>> being released and recycled, but does not prevent completion.
>>>
>>> For the scsi mid-layer, what happens if a request is in the error handler
>>> and a normal completion comes in at that moment?
>>
>> Per my understanding, the protection now needs to be done completely by
>> the driver.
>>
> 
> But it looks like the drivers are not yet prepared to take over this work.
> 

I modified the scsi_debug module as in the attachment
0001-scsi-debug-make-normal-completion-and-timeout-could-.patch
to simulate the scenario where the timeout and completion paths occur
concurrently. The system runs into a crash easily.
4.17-rc7 survived this test.

Maybe we could do as in the attachment
0001-blk-mq-protect-timed-out-request-against-completion-.patch
and then replace all of the blk_mq_complete_request() calls in the timeout
paths. That would preserve the ability to protect a timed-out request
against the completion path.
That patch also survived the test.

Thanks
Jianchao
>>
>> Thanks,
>> Ming Lei
>>
> 
From 640a67e7b4386ac42ee789f54dd0898ecd00f8f7 Mon Sep 17 00:00:00 2001
From: Jianchao Wang 
Date: Tue, 12 Jun 2018 12:04:26 +0800
Subject: [PATCH] scsi-debug: make normal completion and timeout could occur
 concurrently

Invoke blk_abort_request() to periodically force a request to time out
while it is being completed in workqueue context.

Signed-off-by: Jianchao Wang 
---
 drivers/scsi/scsi_debug.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 656c98e..2ca0280 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -4323,6 +4323,8 @@ static void setup_inject(struct sdebug_queue *sqp,
 	sqcp->inj_host_busy = !!(SDEBUG_OPT_HOST_BUSY & sdebug_opts);
 }
 
+static atomic_t g_abort_counter;
+
 /* Complete the processing of the thread that queued a SCSI command to this
  * driver. It either completes the command by calling cmnd_done() or
  * schedules a hr timer or work queue then returns 0. Returns
@@ -4459,6 +4461,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
 			sd_dp->issuing_cpu = raw_smp_processor_id();
 		sd_dp->defer_t = SDEB_DEFER_WQ;
 		schedule_work(&sd_dp->ew.work);
+		atomic_inc(&g_abort_counter);
+		if (atomic_read(&g_abort_counter)%2000 == 0) {
+			blk_abort_request(cmnd->request);
+			trace_printk("abort request tag %d\n", cmnd->request->tag);
+		}
 	}
 	if (unlikely((SDEBUG_OPT_Q_NOISE & sdebug_opts) &&
 		 (scsi_result == device_qfull_result)))
@@ -5843,6 +5850,7 @@ static int sdebug_driver_probe(struct device *dev)
 	struct Scsi_Host *hpnt;
 	int hprot;
 
+	atomic_set(&g_abort_counter, 0);
 	sdbg_host = to_sdebug_host(dev);
 
 	sdebug_driver_template.can_queue = sdebug_max_queue;
-- 
2.7.4

From fcc515b3a642c909e8b82d2a240014faff5acd44 Mon Sep 17 00:00:00 2001
From: Jianchao Wang 
Date: Tue, 12 Jun 2018 21:20:13 +0800
Subject: [PATCH] blk-mq: protect timed out request against completion path

Signed-off-by: Jianchao Wang 
---
 block/blk-mq.c | 22 +++---
 include/linux/blk-mq.h |  1 +
 include/linux/blkdev.h |  6 ++
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6332940..2714a23 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -473,6 +473,7 @@ static void __blk_mq_free_request(struct request *rq)
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 	const int sched_tag = rq->internal_tag;
 
+	WRITE_ONCE(rq->state, MQ_RQ_IDLE);
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
@@ -509,7 +510,6 @@ void blk_mq_free_request(struct request *rq)
 	if (blk_rq_rl(rq))
 		blk_put_rl(blk_rq_rl(rq));
 
-	WRITE_ONCE(rq->state, MQ_RQ_IDLE);
 	if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
 }
@@ -552,15 +552,17 @@ static void __blk_mq_complete_request_remote(void *data)
 	rq->q->softirq_done_fn(rq);
 }
 
-static void __blk_mq_complete_request(struct request *rq)
+/*
+ * The LLDD timeout path must invoke this interface to complete
+ * the request.
+ */
+void __blk_mq_complete_request(struct request *rq)
 {
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	bool shared = false;
 	int cpu;
 
-	if (cmpxchg(&rq->state, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE) !=
-			MQ_RQ_IN_FLIGHT)
-		return;
+	WARN_ON(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE);
 
 	if (rq->internal_tag != -1)
 		blk_mq_sched_completed_request(rq);
@@ -584,6 +586,7 @@ static void __blk_mq_complete_request(struct request *rq)
 	}
 	put_cpu();
 }
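
With this exported, an LLDD ->timeout() handler would then finish a timed-out
request roughly as in the sketch below, calling __blk_mq_complete_request()
instead of blk_mq_complete_request().  This is only an illustration of the
intended usage, not a real driver; it assumes the timeout path has already
moved the request to MQ_RQ_COMPLETE, as the WARN_ON in the patch expects:

#include <linux/blk-mq.h>

static enum blk_eh_timer_return example_timeout(struct request *rq, bool reserved)
{
	/*
	 * Abort the command in hardware and record its status here.  The
	 * normal completion path can no longer complete this request, so
	 * the timeout path finishes it with the new helper.
	 */
	__blk_mq_complete_request(rq);

	return BLK_EH_DONE;
}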

Re: Regarding the recent new blk-mq timeout handling

2018-06-12 Thread jianchao.wang
Hi ming

Thanks for your kindly response.

On 06/12/2018 06:17 PM, Ming Lei wrote:
> On Tue, Jun 12, 2018 at 6:04 PM, jianchao.wang
>  wrote:
>> Hi Jens and Christoph
>>
>> In the recent commit introducing the new blk-mq timeout handling, we don't
>> have any protection of a timed-out request against the completion path. We
>> just hold a request->ref count, which only prevents the request tag from
>> being released and recycled, but does not prevent completion.
>>
>> For the scsi mid-layer, what happens if a request is in the error handler
>> and a normal completion comes in at that moment?
> 
> Per my understanding, the protection now needs to be done completely by
> the driver.
> 

But it looks like the drivers are not yet prepared to take over this work.

Thanks
Jianchao


> 
> Thanks,
> Ming Lei
> 


Re: Regarding the recent new blk-mq timeout handling

2018-06-12 Thread Ming Lei
On Tue, Jun 12, 2018 at 6:04 PM, jianchao.wang
 wrote:
> Hi Jens and Christoph
>
> In the recent commit introducing the new blk-mq timeout handling, we don't
> have any protection of a timed-out request against the completion path. We
> just hold a request->ref count, which only prevents the request tag from
> being released and recycled, but does not prevent completion.
>
> For the scsi mid-layer, what happens if a request is in the error handler
> and a normal completion comes in at that moment?

Per my understanding, the protection now needs to be done completely by the driver.


Thanks,
Ming Lei


Re: [PATCH] nbd: set discard_alignment to the granularity

2018-06-12 Thread Wouter Verhelst
Hi Josef,

On Tue, Jun 05, 2018 at 11:41:23AM -0400, Josef Bacik wrote:
> Technically we should be able to get away with 0 as the
> discard_alignment, but there's no way currently for the protocol to
> indicate different alignments,

Actually there is, with the NBD_INFO_BLOCK_SIZE (and related)
response(s) to the NBD_OPT_GO message during negotiation, but
implementing that will probably require some more netlink messages.

(some of this is still being hashed out on the NBD mailing list; I'm sure
your insights on that would be welcome)

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
 Hacklab


Regarding the recent new blk-mq timeout handling

2018-06-12 Thread jianchao.wang
Hi Jens and Christoph

In the recent commit introducing the new blk-mq timeout handling, we don't
have any protection of a timed-out request against the completion path. We
just hold a request->ref count, which only prevents the request tag from
being released and recycled, but does not prevent completion.

For the scsi mid-layer, what happens if a request is in the error handler
and a normal completion comes in at that moment?

Or am I missing something about this commit?


Thanks
Jianchao


[PATCH blktests] check: add command line switch to test device drivers only

2018-06-12 Thread Johannes Thumshirn
Sometimes it's useful to run only the tests which exercise a specific
device driver, to verify that a patch for the driver doesn't introduce a
regression.

Running the whole test suite is just a waste of time in this case, so
provide a way to run only those tests that define a test_device() function
and not a test() function.

Signed-off-by: Johannes Thumshirn 
---
 check | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/check b/check
index 4baa8dde2436..639fcc43f09d 100755
--- a/check
+++ b/check
@@ -395,6 +395,12 @@ _run_test() {
. "tests/${TEST_NAME}"
 
if declare -fF test >/dev/null; then
+   if [[ -v DEVICE_ONLY ]]; then
+   SKIP_REASON="test excluded by user"
+   _output_notrun "$TEST_NAME"
+   return 0
+   fi
+
if declare -fF requires >/dev/null && ! requires; then
_output_notrun "$TEST_NAME"
return 0
@@ -546,6 +552,9 @@ Test runs:
   -x, --exclude=TEST exclude a test (or test group) from the list of
 tests to run
 
+  -d --device-only   only run test which use a test device from the
+ TEST_DEV config setting
+
 Miscellaneous:
   -h, --help display this help message and exit"
 
@@ -570,6 +579,7 @@ unset TEMP
 
 # Default configuration.
 QUICK_RUN=0
+DEVICE_ONLY=0
 EXCLUDE=()
 TEST_DEVS=()
 
@@ -592,6 +602,10 @@ while true; do
EXCLUDE+=("$2")
shift 2
;;
+   '-d'|'--device-only')
+   DEVICE_ONLY=1
+   shift 2
+   ;;
'-h'|'--help')
usage out
;;
@@ -609,6 +623,10 @@ if [[ QUICK_RUN -ne 0 && ! -v TIMEOUT ]]; then
_error "QUICK_RUN specified without TIMEOUT"
 fi
 
+if [[ DEVICE_ONLY -ne 0 && ${#TEST_DEVS[@]} -eq 0 ]]; then
+   _error "DEVICE_ONLY specified without TEST_DEVS"
+fi
+
 # Convert the exclude list to an associative array.
 TEMP_EXCLUDE=("${EXCLUDE[@]}")
 unset EXCLUDE
-- 
2.16.4



Re: [PATCH 1/6] block: add a bio_reuse helper

2018-06-12 Thread Kent Overstreet
On Mon, Jun 11, 2018 at 09:48:01PM +0200, Christoph Hellwig wrote:
> This abstracts out a way to reuse a bio without destroying the
> data pointers.

What is the point of this? What "data pointers" does it not destroy?

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/bio.c | 20 
>  include/linux/bio.h |  1 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 70c4e1b6dd45..fa1b7ab50784 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -308,6 +308,26 @@ void bio_reset(struct bio *bio)
>  }
>  EXPORT_SYMBOL(bio_reset);
>  
> +/**
> + * bio_reuse - prepare a bio for reuse
> + * @bio: bio to reuse
> + *
> + * Prepares an already setup and possible used bio for reusing it another
> + * time.  Compared to bio_reset() this preserves the bio size and the
> + * layout and contents of the bio vectors.
> + */
> +void bio_reuse(struct bio *bio)
> +{
> + unsigned int size = bio->bi_iter.bi_size;
> + unsigned short vcnt = bio->bi_vcnt;
> +
> + bio_reset(bio);
> +
> + bio->bi_iter.bi_size = size;
> + bio->bi_vcnt = vcnt;
> +}
> +EXPORT_SYMBOL_GPL(bio_reuse);
> +
>  static struct bio *__bio_chain_endio(struct bio *bio)
>  {
>   struct bio *parent = bio->bi_private;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index f08f5fe7bd08..15c871ab50db 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -475,6 +475,7 @@ extern void bio_init(struct bio *bio, struct bio_vec 
> *table,
>unsigned short max_vecs);
>  extern void bio_uninit(struct bio *);
>  extern void bio_reset(struct bio *);
> +void bio_reuse(struct bio *);
>  void bio_chain(struct bio *, struct bio *);
>  
>  extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned 
> int);
> -- 
> 2.17.1
> 