Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
It's 240GB (half of a Samsung 850) in front of a 1TB 5400RPM disk. The size isn't critical. 1/8 is chosen to exceed 10% (default writeback dirty data thresh), it might need to be 1/6 on really big environments. It needs to be big enough that it takes more than 100 seconds to write back, but

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
Sorry I missed this question: > Is it the time from writeback starts to dirty reaches dirty target, or > the time from writeback starts to dirty reaches 0 ? Not quite either. I monitor the machine with zabbix; it's the time to when the backing disk reaches its background rate of activity / when

[PATCH] block/bio: Remove null checks before mempool_destroy in bioset_free

2017-10-06 Thread Tim Hansen
This patch removes redundant checks for null values on bio_pool and bvec_pool. Found using make coccicheck M=block/ on linux-net tree on the next-20170929 tag. Related to patch 9987695 that removed similar checks in bio-integrity. Signed-off-by: Tim Hansen ---

[GIT PULL] Block fixes for 4.14-rc4

2017-10-06 Thread Jens Axboe
Hi Linus, A collection of fixes for this series. This pull request contains: - NVMe pull request from Christoph, one uuid attribute fix, and one fix for the controller memory buffer address for remapped BARs. - use-after-free fix for bsg, from Benjamin Block. - bcache race/use-after-free fix

fio-based responsiveness test for MMTests

2017-10-06 Thread Paolo Valente
Hi Mel, I have been thinking of our (sub)discussion, in [1], on possible tests to measure responsiveness. First let me sum up that discuss in terms of the two main facts that we highlighted. On one side, - it is actually possible to measure the start-up time of some popular applications

Re: [PATCH] block/bio: Remove null checks before mempool_destroy in bioset_free

2017-10-06 Thread Jens Axboe
On 10/06/2017 12:45 PM, Tim Hansen wrote: > This patch removes redundant checks for null values on bio_pool and bvec_pool. > > Found using make coccicheck M=block/ on linux-net tree on the next-20170929 > tag. > > Related to patch 9987695 that removed similar checks in bio-integrity. Applied,

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
On Fri, Oct 6, 2017 at 11:09 AM, Coly Li wrote: > If I use a 1.8T hard disk as cached device, and 1TB SSD as cache device, > and set fio to write 500G dirty data in total. Is this configuration > close to the working set and cache size you suggested ? I think it's quicker and

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Coly Li
Hi Mike, On 2017/10/7 1:36 AM, Michael Lyle wrote: > It's 240GB (half of a Samsung 850) in front of a 1TB 5400RPM disk. > Copied. > The size isn't critical. 1/8 is chosen to exceed 10% (default > writeback dirty data thresh), it might need to be 1/6 on really big > environments. It needs to

Re: [PATCH] block: remove unnecessary NULL checks in bioset_integrity_free()

2017-10-06 Thread Jens Axboe
On 10/05/2017 12:09 PM, Tim Hansen wrote: > mempool_destroy() already checks for a NULL value being passed in, this > eliminates duplicate checks. > > This was caught by running make coccicheck M=block/ on linus' tree on commit > 77ede3a014a32746002f7889211f0cecf4803163 (current head as of this

[PATCH 0/2] lightnvm: patches for 4.15 on core

2017-10-06 Thread Javier González
These are 2 patches on the subsystem for 4.15. The first one is a fix on the passthrough path that enables fail fast, just as it is done on standard nvme passthrough commands. The second one implements a generic way to send sync I/O from targets. Through time, we ended up having _many_

Re: [PATCH] lightnvm: pblk: remove spinlock when freeing line metadata

2017-10-06 Thread Javier González
> On 6 Oct 2017, at 11.20, Andrey Ryabinin wrote: > > On 10/05/2017 11:35 AM, Hans Holmberg wrote: >> From: Hans Holmberg >> >> Lockdep complains about being in atomic context while freeing line >> metadata - and rightly so as we take a

Re: [PATCH v2] block/aoe: Convert timers to use timer_setup()

2017-10-06 Thread Jens Axboe
On 10/05/2017 05:13 PM, Kees Cook wrote: > In preparation for unconditionally passing the struct timer_list pointer to > all timer callbacks, switch to using the new timer_setup() and from_timer() > to pass the timer pointer explicitly. Applied to for-4.15/timer -- Jens Axboe

Re: [PATCH v2] block/laptop_mode: Convert timers to use timer_setup()

2017-10-06 Thread Jens Axboe
On 10/06/2017 02:20 AM, Christoph Hellwig wrote: >> -static void blk_rq_timed_out_timer(unsigned long data) >> +static void blk_rq_timed_out_timer(struct timer_list *t) >> { >> -struct request_queue *q = (struct request_queue *)data; >> +struct request_queue *q = from_timer(q, t,

[PATCH 2/2] lightnvm: implement generic path for sync I/O

2017-10-06 Thread Javier González
Implement a generic path for sending sync I/O on LightNVM. This allows reusing the standard synchronous path through blk_execute_rq(), instead of implementing a wait_for_completion on the target side (e.g., pblk). Signed-off-by: Javier González --- drivers/lightnvm/core.c

Re: [PATCH] block: remove unnecessary NULL checks in bioset_integrity_free()

2017-10-06 Thread Kyle Fortin
Hi Tim, On Oct 5, 2017, at 2:09 PM, Tim Hansen wrote: > > mempool_destroy() already checks for a NULL value being passed in, this > eliminates duplicate checks. > > This was caught by running make coccicheck M=block/ on linus' tree on commit >

Re: [PATCH] block: remove unnecessary NULL checks in bioset_integrity_free()

2017-10-06 Thread Tim Hansen
On Fri, Oct 06, 2017 at 01:04:25PM -0600, Jens Axboe wrote: > On 10/05/2017 12:09 PM, Tim Hansen wrote: > > mempool_destroy() already checks for a NULL value being passed in, this > > eliminates duplicate checks. > > > > This was caught by running make coccicheck M=block/ on linus' tree on > >

Re: [PATCH] block/bio: Remove null checks before mempool_destroy in bioset_free

2017-10-06 Thread Tim Hansen
On Fri, Oct 06, 2017 at 01:05:01PM -0600, Jens Axboe wrote: > On 10/06/2017 12:45 PM, Tim Hansen wrote: > > This patch removes redundant checks for null values on bio_pool and > > bvec_pool. > > > > Found using make coccicheck M=block/ on linux-net tree on the next-20170929 > > tag. > > > >

[ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-06 Thread Josef Bacik
Hello, One thing that comes up a lot every LSF is the fact that we have no general way that we do performance testing. Every fs developer has a set of scripts or things that they run with varying degrees of consistency, but nothing central that we all use. I for one am getting tired of finding

[PATCH V2 1/3] blk-stat: delete useless code

2017-10-06 Thread Shaohua Li
From: Shaohua Li Fix two issues: - the per-cpu stat flush is unnecessary; nobody uses the per-cpu stat except to sum it into the global stat. We can do the calculation there. The flush just wastes cpu time. - some fields are signed int/s64. I don't see the point. Cc: Omar Sandoval

[PATCH V2 3/3] blockcg: export latency info for each cgroup

2017-10-06 Thread Shaohua Li
From: Shaohua Li Export the latency info to user. The latency is a good sign to indicate if IO is congested or not. User can use the info to make decisions like adjust cgroup settings. Existing io.stat shows accumulated IO bytes and requests, but accumulated value for latency

[PATCH V2 0/3] block: export latency info for cgroups

2017-10-06 Thread Shaohua Li
From: Shaohua Li Hi, latency info is a good sign to determine if IO is healthy. The patches export such info to cgroup io.stat. I sent the first patch separately before, but since the latter depends on it, I include it here. Thanks, Shaohua V1->V2: improve the scalability

[PATCH V2 2/3] block: set request_list for request

2017-10-06 Thread Shaohua Li
From: Shaohua Li Legacy queue sets request's request_list, mq doesn't. This makes mq do the same thing, so we can find the cgroup of a request. Note, we really only use the blkg field of request_list; it's pointless to allocate a mempool for request_list in the mq case. Signed-off-by: Shaohua

Re: [PATCH v6 9/9] block, scsi: Make SCSI device suspend and resume work reliably

2017-10-06 Thread Ming Lei
On Wed, Oct 04, 2017 at 05:01:10PM -0700, Bart Van Assche wrote: > It is essential during suspend and resume that neither the filesystem > state nor the filesystem metadata in RAM changes. This is why while > the hibernation image is being written or restored that SCSI devices quiesce isn't used

Re: Circular locking dependency with pblk

2017-10-06 Thread Javier González
> On 6 Oct 2017, at 01.36, Dave Chinner wrote: > > On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier González wrote: >> Hi, >> >> lockdep is reporting a circular dependency when using XFS and pblk, >> which I am a bit confused about. >> >> This happens when XFS sends a

Re: [PATCH V9 13/15] mmc: block: Add CQE and blk-mq support

2017-10-06 Thread Adrian Hunter
On 02/10/17 11:32, Ulf Hansson wrote: > On 22 September 2017 at 14:37, Adrian Hunter wrote: >> Add CQE support to the block driver, including: >> - optionally using DCMD for flush requests >> - "manually" issuing discard requests >> - issuing read / write

Re: [PATCH] lightnvm: pblk: remove spinlock when freeing line metadata

2017-10-06 Thread Andrey Ryabinin
On 10/05/2017 11:35 AM, Hans Holmberg wrote: > From: Hans Holmberg > > Lockdep complains about being in atomic context while freeing line > metadata - and rightly so as we take a spinlock and end up calling > vfree that might sleep(in pblk_mfree). > > There is no

Why removing REQ_FAILFAST_DRIVER in LightNVM?

2017-10-06 Thread Javier González
Hi Christoph, I'm cleaning up lightnvm.c to use as much as possible the nvme helpers. I see that in Commit: d49187e97e94 "nvme: introduce struct nvme_request" you introduced: rq->cmd_flags &= ~REQ_FAILFAST_DRIVER on the lightnvm I/O path and that has propagated through the code as we added

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
Coly-- I did not say the result from the changes will be random. I said the result from your test will be random, because where the writeback position is making non-contiguous holes in the data is nondeterministic-- it depends where it is on the disk at the instant that writeback begins. There

[PATCH] block: add partition uuid into uevent as "PARTUUID"

2017-10-06 Thread Konstantin Khlebnikov
Both most common formats have uuid in addition to partition name: GPT: standard uuid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx DOS: 4 byte disk signature and 1 byte partition xxxxxxxx-xx Tools from util-linux use the same notation for them. Signed-off-by: Konstantin Khlebnikov

[PATCH RFC] blk-throttle: add feedback to cgroup writeback about throttled writes

2017-10-06 Thread Konstantin Khlebnikov
Throttler steals bio before allocating requests for them, thus throttled writeback never reaches congestion. This adds bit WB_write_throttled into per-cgroup bdi congestion control. It's set when write bandwidth limit is exceeded and throttler has at least one bio inside and cleared when last

[PATCH 1/2] lightnvm: pblk: cleanup unused and static functions

2017-10-06 Thread Javier González
Clean up unused and static functions across the whole codebase. Signed-off-by: Javier González --- drivers/lightnvm/pblk-core.c | 133 --- drivers/lightnvm/pblk-gc.c | 40 ++--- drivers/lightnvm/pblk-rl.c | 10

[PATCH 2/2] lightnvm: pblk: avoid being reported as hung on rated GC

2017-10-06 Thread Javier González
The amount of GC I/O on the write buffer is managed by the rate-limiter, which is calculated as a function of the number of available free blocks. When reaching the stable point, we risk having scheduled more I/Os for GC than are allowed on the write buffer. This would result in the GC semaphore

[PATCH 0/2] lightnvm: pblk fixes

2017-10-06 Thread Javier González
Two small extra patches for 4.15. The first one is a general cleanup. The second one is an easy fix to avoid being reported as a hung task when GC is rate-limited Javier González (2): lightnvm: pblk: cleanup unused and static functions lightnvm: pblk: avoid being reported as hung on rated GC

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
OK, here's some data: http://jar.lyle.org/~mlyle/writeback/ The complete test script is there to automate running writeback scenarios--- NOTE DONT RUN WITHOUT EDITING THE DEVICES FOR YOUR HARDWARE. Only one run each way, but they take 8-9 minutes to run, we can easily get more ;) I compared

Re: Why removing REQ_FAILFAST_DRIVER in LightNVM?

2017-10-06 Thread Christoph Hellwig
On Fri, Oct 06, 2017 at 11:19:09AM +0200, Javier González wrote: > on the lightnvm I/O path and that has propagated through the code as we > added more functionality. Can you explain why this is necessary? If I > can just remove it, it is much easier to do the cleanup. > > I have tested on our HW

Re: Why removing REQ_FAILFAST_DRIVER in LightNVM?

2017-10-06 Thread Javier González
> On 6 Oct 2017, at 13.59, Christoph Hellwig wrote: > > On Fri, Oct 06, 2017 at 11:19:09AM +0200, Javier González wrote: >> on the lightnvm I/O path and that has propagated through the code as we >> added more functionality. Can you explain why this is necessary? If I >> can

Re: [PATCH v2 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-06 Thread Christoph Hellwig
On Thu, Oct 05, 2017 at 09:32:33PM +0200, Ilya Dryomov wrote: > This is to avoid returning -EREMOTEIO in the following case: device > doesn't support WRITE SAME but scsi_disk::max_ws_blocks != 0, zeroout > is called with BLKDEV_ZERO_NOFALLBACK. Enter blkdev_issue_zeroout(), >

Re: Why removing REQ_FAILFAST_DRIVER in LightNVM?

2017-10-06 Thread Christoph Hellwig
On Fri, Oct 06, 2017 at 02:01:46PM +0200, Javier González wrote: > I think it is good to fail fast as any other nvme I/O command and then > recover in pblk if necessary. Note that we only do it for other nvme _passthrough_ commands - the actual I/O commands dot not get the failfast flag.

Re: Why removing REQ_FAILFAST_DRIVER in LightNVM?

2017-10-06 Thread Javier González
> On 6 Oct 2017, at 14.06, Christoph Hellwig wrote: > > On Fri, Oct 06, 2017 at 02:01:46PM +0200, Javier González wrote: >> I think it is good to fail fast as any other nvme I/O command and then >> recover in pblk if necessary. > > Note that we only do it for other nvme

Re: Why removing REQ_FAILFAST_DRIVER in LightNVM?

2017-10-06 Thread Javier González
> On 6 Oct 2017, at 14.08, Javier González wrote: > >> On 6 Oct 2017, at 14.06, Christoph Hellwig wrote: >> >> On Fri, Oct 06, 2017 at 02:01:46PM +0200, Javier González wrote: >>> I think it is good to fail fast as any other nvme I/O command and then >>>

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Coly Li
On 2017/10/6 6:42 PM, Michael Lyle wrote: > Coly-- > > Holy crap, I'm not surprised you don't see a difference if you're > writing with 512K size! The potential benefit from merging is much > less, and the odds of missing a merge is much smaller. 512KB is 5ms > sequential by itself on a

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
I will write a test bench and send results soon. Just please note-- you've crafted a test where there's not likely to be sequential data to writeback, and chosen a block size where there is limited difference between sequential and nonsequential writeback. Not surprisingly, you don't see a real

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Hannes Reinecke
On 10/06/2017 12:42 PM, Michael Lyle wrote: > Coly-- > > Holy crap, I'm not surprised you don't see a difference if you're > writing with 512K size! The potential benefit from merging is much > less, and the odds of missing a merge is much smaller. 512KB is 5ms > sequential by itself on a

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
Hannes-- Thanks for your input. Assuming there's contiguous data to writeback, the dataset size is immaterial; writeback gathers 500 extents from a btree, and writes back up to 64 of them at a time. With 8k extents, the amount of data the writeback code is juggling at a time is about 4

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Coly Li
On 2017/10/6 5:20 PM, Michael Lyle wrote: > Coly-- > > I did not say the result from the changes will be random. > > I said the result from your test will be random, because where the > writeback position is making non-contiguous holes in the data is > nondeterministic-- it depends where it is on

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Michael Lyle
Coly-- Holy crap, I'm not surprised you don't see a difference if you're writing with 512K size! The potential benefit from merging is much less, and the odds of missing a merge is much smaller. 512KB is 5ms sequential by itself on a 100MB/sec disk--- lots more time to wait to get the next

Re: [PATCH v2] block/laptop_mode: Convert timers to use timer_setup()

2017-10-06 Thread Christoph Hellwig
> -static void blk_rq_timed_out_timer(unsigned long data) > +static void blk_rq_timed_out_timer(struct timer_list *t) > { > - struct request_queue *q = (struct request_queue *)data; > + struct request_queue *q = from_timer(q, t, timeout); > > kblockd_schedule_work(&q->timeout_work);

Re: [PATCH V3 3/3] block: don't print message for discard error

2017-10-06 Thread Ming Lei
On Wed, Oct 04, 2017 at 07:52:45AM -0700, Shaohua Li wrote: > From: Shaohua Li > > discard error isn't fatal, don't flood discard error messages. > > Suggested-by: Ming Lei > Signed-off-by: Shaohua Li Reviewed-by: Ming Lei

Re: [PATCH v2 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-06 Thread Ilya Dryomov
On Fri, Oct 6, 2017 at 2:05 PM, Christoph Hellwig wrote: > On Thu, Oct 05, 2017 at 09:32:33PM +0200, Ilya Dryomov wrote: >> This is to avoid returning -EREMOTEIO in the following case: device >> doesn't support WRITE SAME but scsi_disk::max_ws_blocks != 0, zeroout >> is called

Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

2017-10-06 Thread Coly Li
On 2017/10/6 7:57 PM, Michael Lyle wrote: > OK, here's some data: http://jar.lyle.org/~mlyle/writeback/ > > The complete test script is there to automate running writeback > scenarios--- NOTE DONT RUN WITHOUT EDITING THE DEVICES FOR YOUR > HARDWARE. > > Only one run each way, but they take 8-9

Re: [RFC 5/5] pm: remove kernel thread freezing

2017-10-06 Thread Theodore Ts'o
On Fri, Oct 06, 2017 at 02:07:13PM +0200, Pavel Machek wrote: > > Yeah, I was not careful enough reading cover letter. Having series > where 1-4/5 are ready to go, and 5/5 not-good-idea for years to come > is quite confusing. 4/5 is not ready to go either, at the very least