Re: [GIT PULL 02/58] lightnvm: prevent bd removal if busy

2017-10-13 Thread Javier González
> On 13 Oct 2017, at 17.58, Javier González wrote: > > >>> On 13 Oct 2017, at 17.35, Rakesh Pandit wrote: >>> On Fri, Oct 13, 2017 at 07:58:09AM -0700, Christoph Hellwig wrote: On Fri, Oct 13, 2017 at 02:45:51PM +0200, Matias Bjørling wrote: From: Rakesh Pandit When

Re: [PATCH V9 0/7] blk-mq-sched: improve sequential I/O performance

2017-10-13 Thread Ming Lei
On Fri, Oct 13, 2017 at 02:23:07PM -0600, Jens Axboe wrote: > On 10/13/2017 01:21 PM, Jens Axboe wrote: > > On 10/13/2017 01:08 PM, Jens Axboe wrote: > >> On 10/13/2017 12:05 PM, Ming Lei wrote: > >>> Hi Jens, > >>> > >>> In Red Hat internal storage test wrt. blk-mq scheduler, we found that I/O > >

Re: [PATCH V9 4/7] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Bart Van Assche
On Sat, 2017-10-14 at 02:05 +0800, Ming Lei wrote: > @@ -89,19 +89,36 @@ static bool blk_mq_sched_restart_hctx(struct > blk_mq_hw_ctx *hctx) > return false; > } > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_c

[PATCH 11/15] bcache: writeback rate shouldn't artifically clamp

2017-10-13 Thread Michael Lyle
The previous code artificially limited writeback rate to 100 blocks/second (NSEC_PER_MSEC), which is a rate that can be met on fast hardware. The rate limiting code works fine (though with decreased precision) up to 3 orders of magnitude faster, so use NSEC_PER_SEC. Additionally, ensure that

[PATCH 12/15] bcache: rearrange writeback main thread ratelimit

2017-10-13 Thread Michael Lyle
The time spent searching for things to write back "counts" for the actual rate achieved, so don't flush the accumulated rate with each chunk. This will maintain better fidelity to user-commanded rates, but it may slightly increase the burstiness of writeback. The writeback lock needs improvement

[PATCH 13/15] bcache: safeguard a dangerous addressing in closure_queue

2017-10-13 Thread Michael Lyle
From: Liang Chen The use of the union reduces the size of closure struct by taking advantage of the current size of its members. The offset of func in work_struct equals the size of the first three members, so that work.work_func will just reference the fourth member - fn. This is smart but dange

[PATCH 10/15] bcache: smooth writeback rate control

2017-10-13 Thread Michael Lyle
This works in conjunction with the new PI controller. Currently, in real-world workloads, the rate controller attempts to write back 1 sector per second. In practice, these minimum-rate writebacks are between 4k and 60k in test scenarios, since bcache aggregates and attempts to do contiguous writ

[PATCH 08/15] bcache: don't write back data if reading it failed

2017-10-13 Thread Michael Lyle
If an IO operation fails, and we didn't successfully read data from the cache, don't writeback invalid/partial data to the backing disk. Signed-off-by: Michael Lyle Reviewed-by: Coly Li --- drivers/md/bcache/writeback.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-)

[PATCH 05/15] bcache: Remove redundant set_capacity

2017-10-13 Thread Michael Lyle
From: Yijing Wang set_capacity() has been called in bcache_device_init(), remove the redundant one. Signed-off-by: Yijing Wang Reviewed-by: Eric Wheeler Acked-by: Coly Li --- drivers/md/bcache/super.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/m

[PATCH 07/15] bcache: remove unused parameter

2017-10-13 Thread Michael Lyle
From: Yijing Wang Parameter bio is no longer used, clean it. Signed-off-by: Yijing Wang Reviewed-by: Coly Li --- drivers/md/bcache/request.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 9ee137e

[PATCH 09/15] bcache: implement PI controller for writeback rate

2017-10-13 Thread Michael Lyle
bcache uses a control system to attempt to keep the amount of dirty data in cache at a user-configured level, while not responding excessively to transients and variations in write rate. Previously, the system was a PD controller; but the output from it was integrated, turning the Proportional ter
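As a rough illustration of the idea in the preview above, here is a minimal PI controller sketch. It is not the patch's code: the struct, field names, divisors and clamping bounds are all hypothetical, and the real bcache implementation may scale and bound its terms differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical controller state; names are illustrative, not bcache's. */
struct ex_pi_ctrl {
	int64_t integral;  /* running sum of past errors */
	int64_t p_inverse; /* divisor applied to the proportional term */
	int64_t i_inverse; /* divisor applied to the integral term */
	int64_t rate_min;  /* lower bound on the output rate */
	int64_t rate_max;  /* upper bound on the output rate */
};

static int64_t ex_clamp64(int64_t v, int64_t lo, int64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * One controller update. The error is how far the dirty amount is above
 * the configured target. The proportional term reacts to the current
 * error; the integral term reacts to error that has persisted over time.
 */
static int64_t ex_pi_update(struct ex_pi_ctrl *c, int64_t dirty, int64_t target)
{
	int64_t error = dirty - target;
	int64_t proportional;
	int64_t rate;

	assert(c->p_inverse > 0 && c->i_inverse > 0);
	assert(c->rate_min <= c->rate_max);

	proportional = error / c->p_inverse;
	c->integral += error;
	rate = proportional + c->integral / c->i_inverse;

	return ex_clamp64(rate, c->rate_min, c->rate_max);
}
```

With a small p_inverse and a large i_inverse, the output follows the current error closely while the integral slowly removes any steady-state offset. The clamp keeps the rate positive even when the cache holds less dirty data than the target.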

[PATCH 04/15] bcache: rewrite multiple partitions support

2017-10-13 Thread Michael Lyle
From: Coly Li Current partition support of bcache is confusing and buggy. It tries to trace non-continuous device minor numbers by an ida bit string, and mistakenly mixed bcache device index with minor numbers. This design generates several negative results, - Index of bcache device name is not c

[PATCH 15/15] bcache: MAINTAINERS: set bcache to MAINTAINED

2017-10-13 Thread Michael Lyle
Also add URL for IRC channel. Signed-off-by: Michael Lyle --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 73213039bdb7..b3203e082a1e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2568,7 +2568,8 @@ M:Michael Lyle M:

[PATCH 14/15] bcache: Add Michael Lyle to MAINTAINERS

2017-10-13 Thread Michael Lyle
From: Kent Overstreet Signed-off-by: Kent Overstreet --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 02441c2e8c55..73213039bdb7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2564,6 +2564,7 @@ S:Maintained F: drivers/net/hamradio/

[PATCH 06/15] bcache: update bio->bi_opf bypass/writeback REQ_ flag hints

2017-10-13 Thread Michael Lyle
From: Eric Wheeler Flag for bypass if the IO is for read-ahead or background, unless the read-ahead request is for metadata (eg, from gfs2). Bypass if: bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) && !(bio->bi_opf & REQ_META)) Writeback if:
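The bypass condition quoted above can be sketched as a small predicate. The flag values below are placeholders chosen for the example (hence the EX_ prefix); the kernel defines the real REQ_* bits, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; not the kernel's REQ_* bits. */
#define EX_REQ_RAHEAD     (1u << 0)
#define EX_REQ_BACKGROUND (1u << 1)
#define EX_REQ_META       (1u << 2)

/*
 * Bypass the cache for read-ahead or background I/O, unless the request
 * is also flagged as metadata.
 */
static bool ex_should_bypass(uint32_t bi_opf)
{
	return (bi_opf & (EX_REQ_RAHEAD | EX_REQ_BACKGROUND)) &&
	       !(bi_opf & EX_REQ_META);
}

/* Small self-check that doubles as a usage example. */
static void ex_should_bypass_selfcheck(void)
{
	assert(ex_should_bypass(EX_REQ_RAHEAD));
	assert(!ex_should_bypass(EX_REQ_RAHEAD | EX_REQ_META));
}
```

Note that, as written in the quoted condition, the metadata exception applies to background requests as well as read-ahead requests.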

[PATCH 02/15] bcache: check ca->alloc_thread initialized before wake up it

2017-10-13 Thread Michael Lyle
From: Coly Li In bcache code, sysfs entries are created before all resources get allocated, e.g. allocation thread of a cache set. There is a possibility of a NULL pointer dereference if a resource is accessed before it is initialized. Indeed Jorg Bornschein catches one on cache set allocation

[PATCH 01/15] bcache: Avoid nested function definition

2017-10-13 Thread Michael Lyle
From: Peter Foley Fixes below error with clang: ../drivers/md/bcache/sysfs.c:759:3: error: function definition is not allowed here { return *((uint16_t *) r) - *((uint16_t *) l); } ^ ../drivers/md/bcache/sysfs.c:789:32: error: use of undeclared identifier 'c

[PATCH 03/15] bcache: fix a comments typo in bch_alloc_sectors()

2017-10-13 Thread Michael Lyle
From: Coly Li Code comments in alloc.c:bch_alloc_sectors() mentions a function name find_data_bucket(), the correct function name should be pick_data_bucket() indeed. bch_alloc_sectors() is a quite important function in bcache allocation code, fixing the typo may help other people to have less co

[PATCH 00/15] bcache: series of patches for 4.15

2017-10-13 Thread Michael Lyle
Jens, and everyone: Here is the current series of work for inclusion in next and for 4.15's merge window. We may get some additional fixes, but this is most of the work we expect. All have been previously sent to these lists except the very last one. There's a lot of small cleanup patches and f

Re: [PATCH v2 2/2] bcache: rearrange writeback main thread ratelimit

2017-10-13 Thread Kent Overstreet
On Mon, Oct 09, 2017 at 12:37:30AM -0700, Michael Lyle wrote: > The time spent searching for things to write back "counts" for the > actual rate achieved, so don't flush the accumulated rate with each > chunk. > > This will maintain better fidelity to user-commanded rates, but it > may slightly in

Re: [PATCH V9 6/7] SCSI: allow to pass null rq to scsi_prep_state_check()

2017-10-13 Thread Bart Van Assche
On Sat, 2017-10-14 at 02:05 +0800, Ming Lei wrote: > In the following patch, we will implement scsi_get_budget() > which need to call scsi_prep_state_check() when rq isn't > dequeued yet. My understanding is that this change is only needed because scsi_mq_get_budget() calls scsi_prep_state_check()

Re: [PATCH V9 0/7] blk-mq-sched: improve sequential I/O performance

2017-10-13 Thread Jens Axboe
On 10/13/2017 01:21 PM, Jens Axboe wrote: > On 10/13/2017 01:08 PM, Jens Axboe wrote: >> On 10/13/2017 12:05 PM, Ming Lei wrote: >>> Hi Jens, >>> >>> In Red Hat internal storage test wrt. blk-mq scheduler, we found that I/O >>> performance is very bad with mq-deadline, especially for sequential I

Re: [PATCH V9 0/7] blk-mq-sched: improve sequential I/O performance

2017-10-13 Thread Jens Axboe
On 10/13/2017 01:08 PM, Jens Axboe wrote: > On 10/13/2017 12:05 PM, Ming Lei wrote: >> Hi Jens, >> >> In Red Hat internal storage test wrt. blk-mq scheduler, we found that I/O >> performance is very bad with mq-deadline, especially for sequential I/O >> on some multi-queue SCSI devices (lpfc, qla2

Re: [PATCH V9 0/7] blk-mq-sched: improve sequential I/O performance

2017-10-13 Thread Jens Axboe
On 10/13/2017 12:05 PM, Ming Lei wrote: > Hi Jens, > > In Red Hat internal storage test wrt. blk-mq scheduler, we found that I/O > performance is very bad with mq-deadline, especially for sequential I/O > on some multi-queue SCSI devices (lpfc, qla2xxx, SRP...) > > Turns out one big issue causes

[PATCH] block-throttle: avoid double charge

2017-10-13 Thread Shaohua Li
If a bio is throttled and split after throttling, the bio could be resubmitted and enter throttling again. This causes part of the bio to be charged multiple times. If the cgroup has an IO limit, the double charge will significantly harm the performance. The bio split becomes quite common a

[PATCH V9 2/7] blk-mq-sched: move actual dispatching into one helper

2017-10-13 Thread Ming Lei
So that it becomes easy to support to dispatch from sw queue in the following patch. No functional change. Reviewed-by: Bart Van Assche Reviewed-by: Omar Sandoval Suggested-by: Christoph Hellwig # for simplifying dispatch logic Signed-off-by: Ming Lei --- block/blk-mq-sched.c | 43 ++

[PATCH V9 3/7] sbitmap: introduce __sbitmap_for_each_set()

2017-10-13 Thread Ming Lei
We need to iterate ctx starting from any ctx in round robin way, so introduce this helper. Reviewed-by: Omar Sandoval Cc: Omar Sandoval Signed-off-by: Ming Lei --- include/linux/sbitmap.h | 64 - 1 file changed, 47 insertions(+), 17 deletions(-)
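The round-robin iteration described above can be sketched on a single 64-bit word. This is only an illustration under simplifying assumptions: the kernel's sbitmap spans multiple words and has its own callback type, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Callback invoked for each set bit; return 0 to stop the walk early. */
typedef int (*ex_bit_fn)(unsigned int bit, void *data);

/*
 * Visit every set bit in the low nbits of map, starting at bit 'start'
 * and wrapping around, so each bit is considered exactly once.
 */
static void ex_for_each_set_from(uint64_t map, unsigned int nbits,
				 unsigned int start, ex_bit_fn fn, void *data)
{
	unsigned int i;

	assert(nbits >= 1 && nbits <= 64 && start < nbits);

	for (i = 0; i < nbits; i++) {
		unsigned int bit = (start + i) % nbits;

		if (((map >> bit) & 1u) && !fn(bit, data))
			return;
	}
}

/* Helper for the usage example: records the order in which bits are visited. */
struct ex_visit_log {
	unsigned int bits[64];
	unsigned int count;
};

static int ex_record_bit(unsigned int bit, void *data)
{
	struct ex_visit_log *log = data;

	log->bits[log->count++] = bit;
	return 1; /* keep iterating */
}
```

For example, with bits 0, 2, 3 and 5 set in an 8-bit map and a start index of 3, the callback sees 3, 5, 0, 2. Starting from a different index on each pass means no single software queue is always served first.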

[PATCH V9 1/7] blk-mq-sched: dispatch from scheduler only after progress is made on ->dispatch

2017-10-13 Thread Ming Lei
When hw queue is busy, we shouldn't take requests from scheduler queue any more, otherwise it is difficult to do IO merge. This patch fixes the awful IO performance on some SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber is used by not taking requests if hw queue is busy. Reviewed-by: Oma

[PATCH V9 4/7] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
For SCSI devices, there is often per-request-queue depth, which need to be respected before queuing one request. The current blk-mq always dequeues request first, then calls .queue_rq() to dispatch the request to lld. One obvious issue of this way is that I/O merge may not be good, because when th
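A minimal sketch of the "reserve budget before dequeuing" idea described above. The types and function names are hypothetical stand-ins, not the blk_mq_ops callbacks themselves; the queue is reduced to counters so the ordering is easy to see.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a queue with a fixed device depth. */
struct ex_queue {
	int depth;     /* maximum number of requests the device accepts */
	int in_flight; /* requests currently holding budget */
	int pending;   /* requests still waiting in the scheduler queue */
};

/* Try to reserve one slot; fails when the device is already full. */
static bool ex_get_budget(struct ex_queue *q)
{
	if (q->in_flight >= q->depth)
		return false;
	q->in_flight++;
	return true;
}

/* Release a slot, e.g. on completion or when dispatch is abandoned. */
static void ex_put_budget(struct ex_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
}

/*
 * Dispatch loop: budget is reserved first, and a request is taken out of
 * the scheduler queue only if the reservation succeeded. Requests that
 * cannot be dispatched stay queued, where they remain merge candidates.
 */
static int ex_dispatch(struct ex_queue *q)
{
	int dispatched = 0;

	while (q->pending > 0 && ex_get_budget(q)) {
		q->pending--;
		dispatched++;
	}
	return dispatched;
}
```

With a depth of 3 and 10 pending requests, the first call dispatches 3 and leaves 7 in the scheduler queue; nothing more is dequeued until a slot is released.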

[PATCH V9 6/7] SCSI: allow to pass null rq to scsi_prep_state_check()

2017-10-13 Thread Ming Lei
In the following patch, we will implement scsi_get_budget() which need to call scsi_prep_state_check() when rq isn't dequeued yet. Signed-off-by: Ming Lei --- drivers/scsi/scsi_lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/sc

[PATCH V9 7/7] SCSI: implement .get_budget and .put_budget for blk-mq

2017-10-13 Thread Ming Lei
We need to tell blk-mq for reserving resource before queuing one request, so implement these two callbacks. Then blk-mq can avoid to dequeue request earlier, and IO merge can be improved a lot. Signed-off-by: Ming Lei --- drivers/scsi/scsi_lib.c | 75 ++---

[PATCH V9 5/7] blk-mq-sched: improve dispatching from sw queue

2017-10-13 Thread Ming Lei
SCSI devices use host-wide tagset, and the shared driver tag space is often quite big. Meantime there is also queue depth for each lun( .cmd_per_lun), which is often small, for example, on both lpfc and qla2xxx, .cmd_per_lun is just 3. So lots of requests may stay in sw queue, and we always flush

[PATCH V9 0/7] blk-mq-sched: improve sequential I/O performance

2017-10-13 Thread Ming Lei
Hi Jens, In Red Hat internal storage test wrt. blk-mq scheduler, we found that I/O performance is very bad with mq-deadline, especially for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx, SRP...) Turns out one big issue causes the performance regression: requests are still dequeu

Re: [PATCH 1/8] lib: Introduce sgl_alloc() and sgl_free()

2017-10-13 Thread Bart Van Assche
On Fri, 2017-10-13 at 10:43 -0700, Randy Dunlap wrote: > On 10/12/17 15:45, Bart Van Assche wrote: > > + * @sg: Scatterlist with one or more elements > > @sgl: Thanks Randy. I will fix this before I repost this series. Bart.

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Bart Van Assche
On Sat, 2017-10-14 at 01:29 +0800, Ming Lei wrote: > ->can_queue is size of the whole tag space shared by all LUNs, looks it isn't > reasonable to increase cmd_per_lun to .can_queue. Sorry but I disagree. Setting cmd_per_lun to a value lower than can_queue will result in suboptimal performance if

Re: [PATCH 1/8] lib: Introduce sgl_alloc() and sgl_free()

2017-10-13 Thread Randy Dunlap
On 10/12/17 15:45, Bart Van Assche wrote: > diff --git a/lib/sgl_alloc.c b/lib/sgl_alloc.c > new file mode 100644 > index ..d96b395dd5c8 > --- /dev/null > +++ b/lib/sgl_alloc.c > @@ -0,0 +1,102 @@ > +/** > + * sgl_free_order - free a scatterlist and its pages > + * @sg: Scatterlist wi

Re: [PATCH V8 4/7] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Jens Axboe
On 10/13/2017 11:24 AM, Ming Lei wrote: > For SCSI devices, there is often per-request-queue depth, which need > to be respected before queuing one request. > > The current blk-mq always dequeues request first, then calls .queue_rq() > to dispatch the request to lld. One obvious issue of this way

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
On Fri, Oct 13, 2017 at 05:08:52PM +, Bart Van Assche wrote: > On Sat, 2017-10-14 at 00:45 +0800, Ming Lei wrote: > > On Fri, Oct 13, 2017 at 04:31:04PM +, Bart Van Assche wrote: > > > On Sat, 2017-10-14 at 00:07 +0800, Ming Lei wrote: > > > > Actually it is in hot path, for example, lpfc a

Re: [PATCH V8 1/7] blk-mq-sched: fix scheduler bad performance

2017-10-13 Thread Jens Axboe
On 10/13/2017 11:24 AM, Ming Lei wrote: > When hw queue is busy, we shouldn't take requests from > scheduler queue any more, otherwise it is difficult to do > IO merge. > > This patch fixes the awful IO performance on some > SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber > is used by not

[PATCH V8 2/7] blk-mq-sched: move actual dispatching into one helper

2017-10-13 Thread Ming Lei
So that it becomes easy to support to dispatch from sw queue in the following patch. No functional change. Reviewed-by: Bart Van Assche Reviewed-by: Omar Sandoval Suggested-by: Christoph Hellwig # for simplifying dispatch logic Signed-off-by: Ming Lei --- block/blk-mq-sched.c | 43 ++

[PATCH V8 5/7] blk-mq-sched: improve dispatching from sw queue

2017-10-13 Thread Ming Lei
SCSI devices use host-wide tagset, and the shared driver tag space is often quite big. Meantime there is also queue depth for each lun( .cmd_per_lun), which is often small, for example, on both lpfc and qla2xxx, .cmd_per_lun is just 3. So lots of requests may stay in sw queue, and we always flush

[PATCH V8 1/7] blk-mq-sched: fix scheduler bad performance

2017-10-13 Thread Ming Lei
When hw queue is busy, we shouldn't take requests from scheduler queue any more, otherwise it is difficult to do IO merge. This patch fixes the awful IO performance on some SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber is used by not taking requests if hw queue is busy. Reviewed-by: Oma

[PATCH V8 4/7] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
For SCSI devices, there is often per-request-queue depth, which need to be respected before queuing one request. The current blk-mq always dequeues request first, then calls .queue_rq() to dispatch the request to lld. One obvious issue of this way is that I/O merge may not be good, because when th

[PATCH V8 6/7] SCSI: allow to pass null rq to scsi_prep_state_check()

2017-10-13 Thread Ming Lei
In the following patch, we will implement scsi_get_budget() which need to call scsi_prep_state_check() when rq isn't dequeued yet. Signed-off-by: Ming Lei --- drivers/scsi/scsi_lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/sc

[PATCH V8 7/7] SCSI: implement .get_budget and .put_budget for blk-mq

2017-10-13 Thread Ming Lei
We need to tell blk-mq for reserving resource before queuing one request, so implement these two callbacks. Then blk-mq can avoid to dequeue request earlier, and IO merge can be improved a lot. Signed-off-by: Ming Lei --- drivers/scsi/scsi_lib.c | 75 ++---

[PATCH V8 3/7] sbitmap: introduce __sbitmap_for_each_set()

2017-10-13 Thread Ming Lei
We need to iterate ctx starting from any ctx in round robin way, so introduce this helper. Reviewed-by: Omar Sandoval Cc: Omar Sandoval Signed-off-by: Ming Lei --- include/linux/sbitmap.h | 64 - 1 file changed, 47 insertions(+), 17 deletions(-)

[PATCH V8 0/7] blk-mq-sched: improve sequential I/O performance

2017-10-13 Thread Ming Lei
Hi Jens, In Red Hat internal storage test wrt. blk-mq scheduler, we found that I/O performance is very bad with mq-deadline, especially for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx, SRP...) Turns out one big issue causes the performance regression: requests are still dequeu

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Bart Van Assche
On Sat, 2017-10-14 at 00:45 +0800, Ming Lei wrote: > On Fri, Oct 13, 2017 at 04:31:04PM +, Bart Van Assche wrote: > > On Sat, 2017-10-14 at 00:07 +0800, Ming Lei wrote: > > > Actually it is in hot path, for example, lpfc and qla2xx's queue depth is > > > 3, > > > > Sorry but I doubt whether t

Re: [PATCH 0/2] Update the usage hints of submit_queues and no_sched

2017-10-13 Thread Jens Axboe
On 10/13/2017 10:24 AM, weiping zhang wrote: > Hi Jens, > > These two misc patches update the usage hints of the null_blk module > parameters. > > Because submit_queues was changed to 1 by d

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
On Fri, Oct 13, 2017 at 04:31:04PM +, Bart Van Assche wrote: > On Sat, 2017-10-14 at 00:07 +0800, Ming Lei wrote: > > Actually it is in hot path, for example, lpfc and qla2xx's queue depth is 3, > > Sorry but I doubt whether that is correct. More in general, I don't know any > modern > storag

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Jens Axboe
On 10/13/2017 10:31 AM, Bart Van Assche wrote: > On Sat, 2017-10-14 at 00:07 +0800, Ming Lei wrote: >> Actually it is in hot path, for example, lpfc and qla2xx's queue depth is 3, > > Sorry but I doubt whether that is correct. More in general, I don't know any > modern > storage HBA for which the

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Bart Van Assche
On Sat, 2017-10-14 at 00:07 +0800, Ming Lei wrote: > Actually it is in hot path, for example, lpfc and qla2xx's queue depth is 3, Sorry but I doubt whether that is correct. More in general, I don't know any modern storage HBA for which the default queue depth is so low. Bart.

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Jens Axboe
On 10/13/2017 10:22 AM, Ming Lei wrote: > On Fri, Oct 13, 2017 at 10:20:01AM -0600, Jens Axboe wrote: >> On 10/13/2017 10:17 AM, Ming Lei wrote: >>> On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: On 10/12/2017 06:19 PM, Ming Lei wrote: > On Thu, Oct 12, 2017 at 12:46:24PM -060

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Jens Axboe
On 10/13/2017 10:21 AM, Ming Lei wrote: > On Fri, Oct 13, 2017 at 10:19:04AM -0600, Jens Axboe wrote: >> On 10/13/2017 10:07 AM, Ming Lei wrote: >>> On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: On 10/12/2017 06:19 PM, Ming Lei wrote: > On Thu, Oct 12, 2017 at 12:46:24PM -060

[PATCH 2/2] null_blk: add usage hints for no_sched

2017-10-13 Thread weiping zhang
This parameter provides an option to disable the I/O scheduler when nullb* is in multi-queue mode. Signed-off-by: weiping zhang --- Documentation/block/null_blk.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt index 1f8d92c704

[PATCH 1/2] null_blk: update usage hints for submit_queues

2017-10-13 Thread weiping zhang
Update the range of submit_queues and correct the usage hints. Signed-off-by: weiping zhang --- Documentation/block/null_blk.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt index 3140dbd860d8..1f8d92c

[PATCH 0/2] Update the usage hints of submit_queues and no_sched

2017-10-13 Thread weiping zhang
Hi Jens, These two misc patches update the usage hints of the null_blk module parameters. Because submit_queues was changed to 1 by default, update its description. The second patch adds usage h

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
On Fri, Oct 13, 2017 at 10:20:01AM -0600, Jens Axboe wrote: > On 10/13/2017 10:17 AM, Ming Lei wrote: > > On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: > >> On 10/12/2017 06:19 PM, Ming Lei wrote: > >>> On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote: > On 10/12/2017 1

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
On Fri, Oct 13, 2017 at 10:19:04AM -0600, Jens Axboe wrote: > On 10/13/2017 10:07 AM, Ming Lei wrote: > > On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: > >> On 10/12/2017 06:19 PM, Ming Lei wrote: > >>> On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote: > On 10/12/2017 1

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Jens Axboe
On 10/13/2017 10:17 AM, Ming Lei wrote: > On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: >> On 10/12/2017 06:19 PM, Ming Lei wrote: >>> On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote: On 10/12/2017 12:37 PM, Ming Lei wrote: > For SCSI devices, there is often per-re

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Jens Axboe
On 10/13/2017 10:07 AM, Ming Lei wrote: > On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: >> On 10/12/2017 06:19 PM, Ming Lei wrote: >>> On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote: On 10/12/2017 12:37 PM, Ming Lei wrote: > For SCSI devices, there is often per-re

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: > On 10/12/2017 06:19 PM, Ming Lei wrote: > > On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote: > >> On 10/12/2017 12:37 PM, Ming Lei wrote: > >>> For SCSI devices, there is often per-request-queue depth, which need > >>> to be r

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Ming Lei
On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote: > On 10/12/2017 06:19 PM, Ming Lei wrote: > > On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote: > >> On 10/12/2017 12:37 PM, Ming Lei wrote: > >>> For SCSI devices, there is often per-request-queue depth, which need > >>> to be r

Re: [GIT PULL 02/58] lightnvm: prevent bd removal if busy

2017-10-13 Thread Javier González
> On 13 Oct 2017, at 17.35, Rakesh Pandit wrote: > >> On Fri, Oct 13, 2017 at 07:58:09AM -0700, Christoph Hellwig wrote: >>> On Fri, Oct 13, 2017 at 02:45:51PM +0200, Matias Bjørling wrote: >>> From: Rakesh Pandit >>> >>> When a virtual block device is formatted and mounted after creating >>>

Re: [GIT PULL 02/58] lightnvm: prevent bd removal if busy

2017-10-13 Thread Rakesh Pandit
On Fri, Oct 13, 2017 at 07:58:09AM -0700, Christoph Hellwig wrote: > On Fri, Oct 13, 2017 at 02:45:51PM +0200, Matias Bjørling wrote: > > From: Rakesh Pandit > > > > When a virtual block device is formatted and mounted after creating > > with "nvme lnvm create... -t pblk", a removal from "nvm lnv

Re: [GIT PULL 02/58] lightnvm: prevent bd removal if busy

2017-10-13 Thread Christoph Hellwig
On Fri, Oct 13, 2017 at 02:45:51PM +0200, Matias Bjørling wrote: > From: Rakesh Pandit > > When a virtual block device is formatted and mounted after creating > with "nvme lnvm create... -t pblk", a removal from "nvm lnvm remove" > would result in this: > > 446416.309757] bdi-block not registere

Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops

2017-10-13 Thread Jens Axboe
On 10/12/2017 06:19 PM, Ming Lei wrote: > On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote: >> On 10/12/2017 12:37 PM, Ming Lei wrote: >>> For SCSI devices, there is often per-request-queue depth, which need >>> to be respected before queuing one request. >>> >>> The current blk-mq always

Re: [GIT PULL 00/58] LightNVM updates for 4.15

2017-10-13 Thread Jens Axboe
On Fri, Oct 13 2017, Matias Bjørling wrote: > Hi Jens, > > A couple of patches for 4.15. > > Javier has improved garbage collection, statistics, memory pool usage, > and added support for single LUN configurations. He also made a lot of > bug fixes and cleanup patches. > > Rakesh has been fixin

Re: [PATCH 1/8] lib: Introduce sgl_alloc() and sgl_free()

2017-10-13 Thread Jens Axboe
On 10/12/2017 05:00 PM, Bart Van Assche wrote: > On Thu, 2017-10-12 at 16:52 -0600, Jens Axboe wrote: >> On 10/12/2017 04:45 PM, Bart Van Assche wrote: >>> +++ b/include/linux/sgl_alloc.h >>> @@ -0,0 +1,16 @@ >>> +#ifndef _SGL_ALLOC_H_ >>> +#define _SGL_ALLOC_H_ >>> + >>> +#include /* bool, gfp_t

[GIT PULL 02/58] lightnvm: prevent bd removal if busy

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit When a virtual block device is formatted and mounted after creating with "nvme lnvm create... -t pblk", a removal from "nvm lnvm remove" would result in this: 446416.309757] bdi-block not registered [446416.309773] [ cut here ] [446416.309780] WARNING:

[GIT PULL 03/58] lightnvm: protect target type list with correct locks

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit nvm_tgt_types list was protected by wrong lock for NVM_INFO ioctl call and can race with addition or removal of target types. Also unregistering target type was not protected correctly. Fixes: 5cd907853 ("lightnvm: remove nested lock conflict with mm") Signed-off-by: Rakesh

[GIT PULL 05/58] lightnvm: pblk: fix error path in pblk_lines_alloc_metadata

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit Use appropriate memory free calls based on allocation type used and also fix number of times free is called if kmalloc fails. Signed-off-by: Rakesh Pandit Reviewed-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-init.c | 5 - 1 file change

[GIT PULL 04/58] lightnvm: remove already calculated nr_chnls

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit Remove repeated calculation for number of channels while creating a target device. Signed-off-by: Rakesh Pandit Reviewed-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/lightnvm/core

[GIT PULL 06/58] lightnvm: include NVM Express driver if OCSSD is selected for build

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit Because NVM needs BLK_DEV_NVME, select it automatically if we mark NVM in config file before building kernel. Also append PCI to depends as select doesn't automatically add dependencies. Signed-off-by: Rakesh Pandit Signed-off-by: Matias Bjørling --- drivers/lightnvm/Kcon

[GIT PULL 09/58] lightnvm: pblk: improve error message if down_timeout fails

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit The two pr_err messages are useless as they don't differentiate error code. Signed-off-by: Rakesh Pandit Reviewed-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff

[GIT PULL 07/58] lightnvm: pblk: protect line bitmap while submitting meta io

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit It seems pblk_dealloc_page would race against pblk_alloc_pages for line bitmap for sector allocation. The chances are very low but might as well protect the bitmap properly. Signed-off-by: Rakesh Pandit Reviewed-by: Javier González Signed-off-by: Matias Bjørling --- driver

[GIT PULL 10/58] lightnvm: pblk: print incompatible line version correctly

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit Correct it by converting little endian to cpu endian and also define a macro for line version so that maintenance is easy. Signed-off-by: Rakesh Pandit Reviewed-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 2 +- drivers/lightnv

[GIT PULL 08/58] lightnvm: pblk: fix message if L2P MAP is in device

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit This usually happens if we are developing with qemu and ll2pmode has default value. Improve description. Signed-off-by: Rakesh Pandit Reviewed-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-init.c | 2 +- 1 file changed, 1 insertion(+), 1 del

[GIT PULL 12/58] lightnvm: pblk: initialize debug stat counter

2017-10-13 Thread Matias Bjørling
From: Javier González Initialize the stat counter for garbage collected reads. Fixes: a4bd217b43268 ("lightnvm: physical block device (pblk) target") Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-init.c | 1 + 1 file changed, 1 insertion(+) diff --gi

[GIT PULL 11/58] lightnvm: pblk: reuse pblk_gc_should_kick

2017-10-13 Thread Matias Bjørling
From: Rakesh Pandit This is a trivial change which reuses pblk_gc_should_kick instead of repeating it again in pblk_rl_free_lines_inc. Signed-off-by: Rakesh Pandit Made it apply to the common case. Reviewed-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c |

[GIT PULL 13/58] lightnvm: pblk: use right flag for GC allocation

2017-10-13 Thread Matias Bjørling
From: Javier González The data buffer for the GC path allocates virtual memory through vmalloc. When this change was introduced, a flag signaling kmalloc'ed memory was wrongly introduced. Use the right flag when creating a bio from this buffer. Fixes: de54e703a422 ("lightnvm: pblk: use vmalloc f

[GIT PULL 17/58] lightnvm: pblk: fix min size for page mempool

2017-10-13 Thread Matias Bjørling
From: Javier González pblk uses an internal page mempool for allocating pages on internal bios. The main two users of this memory pool are partial reads (reads with some sectors in cache and some on media) and padded writes, which need to add dummy pages to an existing bio already containing vali

[GIT PULL 14/58] lightnvm: pblk: free padded entries in write buffer

2017-10-13 Thread Matias Bjørling
From: Javier González When a REQ_FLUSH reaches pblk, the bio cannot be directly completed. Instead, data on the write buffer is flushed and the bio is completed on the completion path. This might require some sectors to be padded in order to guarantee a successful write. This patch fixes a memory

[GIT PULL 19/58] lightnvm: pblk: decouple read/erase mempools

2017-10-13 Thread Matias Bjørling
From: Javier González Since read and erase paths offer different guarantees for inflight I/Os, separate the mempools to set the right min_nr for each on creation. Reported-by: Jens Axboe Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 8 -

[GIT PULL 18/58] lightnvm: pblk: simplify work_queue mempool

2017-10-13 Thread Matias Bjørling
From: Javier González In pblk, we have a mempool to allocate a generic structure that we pass along workqueues. This is heavily used in the GC path in order to have enough inflight reads and fully utilize the GC bandwidth. However, the current GC path copies data to the host memory and puts it b

[GIT PULL 20/58] lightnvm: pblk: do not use a mempool for line bitmaps

2017-10-13 Thread Matias Bjørling
From: Javier González pblk holds two sector bitmaps: one to keep track of the mapped sectors while the line is active and another one to keep track of the invalid sectors. The latter is kept during the whole life of the line, until it is recycled. Since we cannot guarantee forward progress for th

[GIT PULL 21/58] lightnvm: pblk: remove checks on mempool alloc.

2017-10-13 Thread Matias Bjørling
From: Javier González As part of the mempool audit on pblk, remove unnecessary mempool allocation checks on mempools. Reported-by: Jens Axboe Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 4 drivers/lightnvm/pblk-read.c | 8 --

[GIT PULL 22/58] lightnvm: pblk: use constant for GC max inflight

2017-10-13 Thread Matias Bjørling
From: Javier González Use a constant to set the maximum number of inflight GC requests allowed. Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-gc.c | 4 ++-- drivers/lightnvm/pblk.h| 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --gi

[GIT PULL 23/58] lightnvm: pblk: normalize ppa namings

2017-10-13 Thread Matias Bjørling
From: Javier González Normalize the way we name ppa variables to improve code readability. Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 48 +++- 1 file changed, 25 insertions(+), 23 deletions(-) diff

[GIT PULL 24/58] lightnvm: pblk: refactor read lba sanity check

2017-10-13 Thread Matias Bjørling
From: Javier González Refactor lba sanity check on read path to avoid code duplication. Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-read.c | 29 ++--- 1 file changed, 10 insertions(+), 19 deletions(-) diff --git a/drivers/li

[GIT PULL 25/58] lightnvm: pblk: simplify data validity check on GC

2017-10-13 Thread Matias Bjørling
From: Javier González When a line is selected for recycling by the garbage collector (GC), the line state changes and the invalid bitmap is frozen, preventing invalidations from happening. Throughout the GC, the L2P map is checked to verify that no data being recycled has been updated. The last
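
The L2P check described above can be pictured with a small userspace sketch. It is not taken from the patch: the types and function names (ppa_t, struct l2p_map, gc_sector_still_valid, gc_remap_if_unchanged) are hypothetical stand-ins, and real pblk code would hold the appropriate lock around the lookup and update. The idea shown is only that a sector being recycled is still valid if the map still points at its old physical address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not pblk's real types. */
typedef uint64_t ppa_t;          /* physical address of one sector */

struct l2p_map {
    ppa_t *entries;              /* entries[lba] = current physical address */
    size_t nr_lbas;              /* number of logical addresses in the map */
};

/* A sector read from old_ppa is still valid only if the map still
 * points at old_ppa, i.e. no newer write has remapped the lba. */
bool gc_sector_still_valid(const struct l2p_map *map, size_t lba,
                           ppa_t old_ppa)
{
    if (lba >= map->nr_lbas)
        return false;
    return map->entries[lba] == old_ppa;
}

/* Point lba at new_ppa only if it has not been updated since the GC
 * read it. Returns true if the map was changed. */
bool gc_remap_if_unchanged(struct l2p_map *map, size_t lba,
                           ppa_t old_ppa, ppa_t new_ppa)
{
    if (!gc_sector_still_valid(map, lba, old_ppa))
        return false;
    map->entries[lba] = new_ppa;
    return true;
}
```

With a two-entry map {100, 200}, remapping lba 1 from 200 to 300 succeeds once; a second attempt that still assumes 200 is rejected, which is the case where a user write got there first.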

[GIT PULL 29/58] lightnvm: pblk: allocate bio size more accurately

2017-10-13 Thread Matias Bjørling
From: Javier González Wait until we know the exact number of ppas to be sent to the device, before allocating the bio. Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-rb.c| 5 +++-- drivers/lightnvm/pblk-write.c | 20 ++-- drivers/l

[GIT PULL 28/58] lightnvm: pblk: simplify path on REQ_PREFLUSH

2017-10-13 Thread Matias Bjørling
From: Javier González On REQ_PREFLUSH, directly tag the I/O context flags to signal a flush in the write to cache path, instead of finding the correct entry context and imposing a memory barrier. This simplifies the code and might potentially prevent race conditions when adding functionality to t

[GIT PULL 27/58] lightnvm: pblk: put bio on bio completion

2017-10-13 Thread Matias Bjørling
From: Javier González Simplify put bio by doing it on bio end_io instead of manually putting it on the completion path. Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 10 +++--- drivers/lightnvm/pblk-read.c | 1 - drivers/lightnvm

[GIT PULL 30/58] lightnvm: pblk: improve naming for internal req.

2017-10-13 Thread Matias Bjørling
From: Javier González Each request type sent to the LightNVM subsystem requires different metadata. Until now, we have tailored this metadata based on write, read and erase commands. However, pblk uses different metadata for internal writes that do not hit the write buffer. Instead of abusing the

[GIT PULL 26/58] lightnvm: pblk: refactor read path on GC

2017-10-13 Thread Matias Bjørling
From: Javier González Simplify the part of the garbage collector where data is read from the line being recycled and moved into an internal queue before being copied to the memory buffer. This allows us to get rid of a dedicated function, which introduces an unnecessary dependency on the code. Sign

[GIT PULL 32/58] lightnvm: pblk: use rqd->end_io for completion

2017-10-13 Thread Matias Bjørling
From: Javier González For consistency with the rest of pblk, use rqd->end_io to point to the function taking care of ending the request on the completion path. Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 7 --- drivers/lightnvm/pblk-rea

[GIT PULL 34/58] lightnvm: pblk: guarantee line integrity on reads

2017-10-13 Thread Matias Bjørling
From: Javier González When a line is recycled during garbage collection, reads can still be issued to the line. If the line is freed in the middle of this process, data corruption might occur. This patch guarantees that lines are not freed in the middle of reads that target them (lines). Specifi
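
The description above is cut off before it says how the guarantee is implemented, so the following is only one common way to get that property, sketched in userspace C under that assumption: a per-line reference count, where each inflight read holds a reference and the line is released when the last reference is dropped. The names (struct line_ref, line_ref_init, line_ref_get, line_ref_put) are hypothetical, and the plain int counter is single-threaded; kernel code would need an atomic counter.

```c
#include <stdbool.h>

/* Hypothetical illustration; not pblk's real line structure. */
struct line_ref {
    int refs;        /* 1 for the owner, plus 1 per inflight read */
    bool freed;      /* set once the last reference is dropped */
};

/* The owner (e.g. the GC recycling the line) holds the first reference. */
void line_ref_init(struct line_ref *l)
{
    l->refs = 1;
    l->freed = false;
}

/* A read takes a reference before touching the line. Fails if the
 * line has already been released. */
bool line_ref_get(struct line_ref *l)
{
    if (l->refs == 0)
        return false;
    l->refs++;
    return true;
}

/* Drop a reference; the line is released only when none remain. */
void line_ref_put(struct line_ref *l)
{
    if (--l->refs == 0)
        l->freed = true;
}
```

In this sketch, if a read takes a reference and the owner then drops its own, the line stays alive until the read drops its reference as well; only then is it marked freed, and later attempts to take a reference fail.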

[GIT PULL 31/58] lightnvm: pblk: refactor rqd alloc/free

2017-10-13 Thread Matias Bjørling
From: Javier González Refactor the rqd allocation and free functions so that all I/O types can use these helper functions. Signed-off-by: Javier González Signed-off-by: Matias Bjørling --- drivers/lightnvm/pblk-core.c | 40 ++-- drivers/lightnvm/pblk-re

[GIT PULL 36/58] lightnvm: pblk: remove I/O dependency on write path

2017-10-13 Thread Matias Bjørling
From: Javier González pblk schedules user I/O, metadata I/O and erases on the write path in order to minimize collisions at the media level. Until now, there has been a dependency between user and metadata I/Os that could lead to a deadlock as both take the per-LUN semaphore to schedule submissio
