It's 240GB (half of a Samsung 850) in front of a 1TB 5400RPM disk.
The size isn't critical. 1/8 is chosen to exceed 10% (the default
writeback dirty data thresh); it might need to be 1/6 on really big
environments. It needs to be big enough that it takes more than 100
seconds to write back, but
Sorry I missed this question:
> Is it the time from writeback starts to dirty reaches dirty target, or
> the time from writeback starts to dirty reaches 0 ?
Not quite either. I monitor the machine with zabbix; it's the time to
when the backing disk reaches its background rate of activity / when
This patch removes redundant checks for null values on bio_pool and bvec_pool.
Found using make coccicheck M=block/ on linux-net tree on the next-20170929 tag.
Related to patch 9987695 that removed similar checks in bio-integrity.
Signed-off-by: Tim Hansen
---
Hi Linus,
A collection of fixes for this series. This pull request contains:
- NVMe pull request from Christoph, one uuid attribute fix, and one fix
for the controller memory buffer address for remapped BARs.
- use-after-free fix for bsg, from Benjamin Block.
- bcache race/use-after-free fix
Hi Mel,
I have been thinking of our (sub)discussion, in [1], on possible tests
to measure responsiveness.
First let me sum up that discussion in terms of the two main facts that
we highlighted.
On one side,
- it is actually possible to measure the start-up time of some popular
applications
On 10/06/2017 12:45 PM, Tim Hansen wrote:
> This patch removes redundant checks for null values on bio_pool and bvec_pool.
>
> Found using make coccicheck M=block/ on linux-net tree on the next-20170929
> tag.
>
> Related to patch 9987695 that removed similar checks in bio-integrity.
Applied,
On Fri, Oct 6, 2017 at 11:09 AM, Coly Li wrote:
> If I use a 1.8T hard disk as cached device, and 1TB SSD as cache device,
> and set fio to write 500G dirty data in total. Is this configuration
> close to the working set and cache size you suggested ?
I think it's quicker and
Hi Mike,
On 2017/10/7 1:36 AM, Michael Lyle wrote:
> It's 240GB (half of a Samsung 850) in front of a 1TB 5400RPM disk.
>
Copied.
> The size isn't critical. 1/8 is chosen to exceed 10% (default
> writeback dirty data thresh), it might need to be 1/6 on really big
> environments. It needs to
On 10/05/2017 12:09 PM, Tim Hansen wrote:
> mempool_destroy() already checks for a NULL value being passed in, this
> eliminates duplicate checks.
>
> This was caught by running make coccicheck M=block/ on linus' tree on commit
> 77ede3a014a32746002f7889211f0cecf4803163 (current head as of this
These are 2 patches on the subsystem for 4.15.
The first one is a fix on the passthrough path that enables fail fast,
just as it is done on standard nvme passthrough commands.
The second one implements a generic way to send sync I/O from targets.
Through time, we ended up having _many_
> On 6 Oct 2017, at 11.20, Andrey Ryabinin wrote:
>
> On 10/05/2017 11:35 AM, Hans Holmberg wrote:
>> From: Hans Holmberg
>>
>> Lockdep complains about being in atomic context while freeing line
>> metadata - and rightly so as we take a
On 10/05/2017 05:13 PM, Kees Cook wrote:
> In preparation for unconditionally passing the struct timer_list pointer to
> all timer callbacks, switch to using the new timer_setup() and from_timer()
> to pass the timer pointer explicitly.
Applied to for-4.15/timer
--
Jens Axboe
On 10/06/2017 02:20 AM, Christoph Hellwig wrote:
>> -static void blk_rq_timed_out_timer(unsigned long data)
>> +static void blk_rq_timed_out_timer(struct timer_list *t)
>> {
>> -struct request_queue *q = (struct request_queue *)data;
>> +struct request_queue *q = from_timer(q, t,
Implement a generic path for sending sync I/O on LightNVM. This allows
reuse of the standard synchronous path through blk_execute_rq(), instead
of implementing a wait_for_completion on the target side (e.g., pblk).
Signed-off-by: Javier González
---
drivers/lightnvm/core.c
Hi Tim,
On Oct 5, 2017, at 2:09 PM, Tim Hansen wrote:
>
> mempool_destroy() already checks for a NULL value being passed in, this
> eliminates duplicate checks.
>
> This was caught by running make coccicheck M=block/ on linus' tree on commit
>
On Fri, Oct 06, 2017 at 01:04:25PM -0600, Jens Axboe wrote:
> On 10/05/2017 12:09 PM, Tim Hansen wrote:
> > mempool_destroy() already checks for a NULL value being passed in, this
> > eliminates duplicate checks.
> >
> > This was caught by running make coccicheck M=block/ on linus' tree on
> >
On Fri, Oct 06, 2017 at 01:05:01PM -0600, Jens Axboe wrote:
> On 10/06/2017 12:45 PM, Tim Hansen wrote:
> > This patch removes redundant checks for null values on bio_pool and
> > bvec_pool.
> >
> > Found using make coccicheck M=block/ on linux-net tree on the next-20170929
> > tag.
> >
> >
Hello,
One thing that comes up a lot at every LSF is the fact that we have no general
way of doing performance testing. Every fs developer has a set of scripts or
things that they run with varying degrees of consistency, but nothing central
that we all use. I for one am getting tired of finding
From: Shaohua Li
Fix two issues:
- the per-cpu stat flush is unnecessary; nobody uses the per-cpu stat except
to sum it into the global stat. We can do the calculation there. The flush
just wastes CPU time.
- some fields are signed int/s64. I don't see the point.
Cc: Omar Sandoval
From: Shaohua Li
Export the latency info to user. The latency is a good sign to indicate
if IO is congested or not. User can use the info to make decisions like
adjust cgroup settings.
Existing io.stat shows accumulated IO bytes and requests, but
accumulated value for latency
From: Shaohua Li
Hi,
latency info is a good sign to determine if IO is healthy. The patches export
such info to cgroup io.stat.
I sent the first patch separately before, but since the latter depends on it, I
include it here.
Thanks,
Shaohua
V1->V2: improve the scalability
From: Shaohua Li
Legacy queue sets a request's request_list; mq doesn't. This makes mq do
the same thing, so we can find the cgroup of a request. Note, we really
only use the blkg field of request_list; it's pointless to allocate a
mempool for request_list in the mq case.
Signed-off-by: Shaohua
On Wed, Oct 04, 2017 at 05:01:10PM -0700, Bart Van Assche wrote:
> It is essential during suspend and resume that neither the filesystem
> state nor the filesystem metadata in RAM changes. This is why while
> the hibernation image is being written or restored that SCSI devices
quiesce isn't used
> On 6 Oct 2017, at 01.36, Dave Chinner wrote:
>
> On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier González wrote:
>> Hi,
>>
>> lockdep is reporting a circular dependency when using XFS and pblk,
>> which I am a bit confused about.
>>
>> This happens when XFS sends a
On 02/10/17 11:32, Ulf Hansson wrote:
> On 22 September 2017 at 14:37, Adrian Hunter wrote:
>> Add CQE support to the block driver, including:
>> - optionally using DCMD for flush requests
>> - "manually" issuing discard requests
>> - issuing read / write
On 10/05/2017 11:35 AM, Hans Holmberg wrote:
> From: Hans Holmberg
>
> Lockdep complains about being in atomic context while freeing line
> metadata - and rightly so as we take a spinlock and end up calling
> vfree that might sleep (in pblk_mfree).
>
> There is no
Hi Christoph,
I'm cleaning up lightnvm.c to use as much as possible the nvme helpers.
I see that in Commit: d49187e97e94 "nvme: introduce struct nvme_request"
you introduced:
rq->cmd_flags &= ~REQ_FAILFAST_DRIVER
on the lightnvm I/O path and that has propagated through the code as we
added
Coly--
I did not say the result from the changes will be random.
I said the result from your test will be random, because where the
writeback position is making non-contiguous holes in the data is
nondeterministic-- it depends where it is on the disk at the instant
that writeback begins. There
Both most common formats have uuid in addition to partition name:
GPT: standard uuid ----
DOS: 4 byte disk signature and 1 byte partition -xx
Tools from util-linux use the same notation for them.
Signed-off-by: Konstantin Khlebnikov
The throttler steals bios before allocating requests for them, so
throttled writeback never reaches congestion.
This adds bit WB_write_throttled into per-cgroup bdi congestion control.
It's set when write bandwidth limit is exceeded and throttler has at least
one bio inside and cleared when last
Clean up unused and static functions across the whole codebase.
Signed-off-by: Javier González
---
drivers/lightnvm/pblk-core.c | 133 ---
drivers/lightnvm/pblk-gc.c | 40 ++---
drivers/lightnvm/pblk-rl.c | 10
The amount of GC I/O on the write buffer is managed by the rate-limiter,
which is calculated as a function of the number of available free
blocks. When reaching the stable point, we risk having scheduled more
I/Os for GC than are allowed on the write buffer. This would result in
the GC semaphore
Two small patches extra for 4.15
The first one is a general cleanup. The second one is an easy fix to
avoid being reported as a hung task when GC is rate-limited
Javier González (2):
lightnvm: pblk: cleanup unused and static functions
lightnvm: pblk: avoid being reported as hung on rated GC
OK, here's some data: http://jar.lyle.org/~mlyle/writeback/
The complete test script is there to automate running writeback
scenarios--- NOTE: DON'T RUN WITHOUT EDITING THE DEVICES FOR YOUR
HARDWARE.
Only one run each way, but they take 8-9 minutes to run, so we can easily
get more ;) I compared
On Fri, Oct 06, 2017 at 11:19:09AM +0200, Javier González wrote:
> on the lightnvm I/O path and that has propagated through the code as we
> added more functionality. Can you explain why this is necessary? If I
> can just remove it, it is much easier to do the cleanup.
>
> I have tested on our HW
> On 6 Oct 2017, at 13.59, Christoph Hellwig wrote:
>
> On Fri, Oct 06, 2017 at 11:19:09AM +0200, Javier González wrote:
>> on the lightnvm I/O path and that has propagated through the code as we
>> added more functionality. Can you explain why this is necessary? If I
>> can
On Thu, Oct 05, 2017 at 09:32:33PM +0200, Ilya Dryomov wrote:
> This is to avoid returning -EREMOTEIO in the following case: device
> doesn't support WRITE SAME but scsi_disk::max_ws_blocks != 0, zeroout
> is called with BLKDEV_ZERO_NOFALLBACK. Enter blkdev_issue_zeroout(),
>
On Fri, Oct 06, 2017 at 02:01:46PM +0200, Javier González wrote:
> I think it is good to fail fast as any other nvme I/O command and then
> recover in pblk if necessary.
Note that we only do it for other nvme _passthrough_ commands - the
actual I/O commands do not get the failfast flag.
> On 6 Oct 2017, at 14.06, Christoph Hellwig wrote:
>
> On Fri, Oct 06, 2017 at 02:01:46PM +0200, Javier González wrote:
>> I think it is good to fail fast as any other nvme I/O command and then
>> recover in pblk if necessary.
>
> Note that we only do it for other nvme
> On 6 Oct 2017, at 14.08, Javier González wrote:
>
>> On 6 Oct 2017, at 14.06, Christoph Hellwig wrote:
>>
>> On Fri, Oct 06, 2017 at 02:01:46PM +0200, Javier González wrote:
>>> I think it is good to fail fast as any other nvme I/O command and then
>>>
On 2017/10/6 6:42 PM, Michael Lyle wrote:
> Coly--
>
> Holy crap, I'm not surprised you don't see a difference if you're
> writing with 512K size! The potential benefit from merging is much
> less, and the odds of missing a merge is much smaller. 512KB is 5ms
> sequential by itself on a
I will write a test bench and send results soon.
Just please note-- you've crafted a test where there's not likely to
be sequential data to writeback, and chosen a block size where there
is limited difference between sequential and nonsequential writeback.
Not surprisingly, you don't see a real
On 10/06/2017 12:42 PM, Michael Lyle wrote:
> Coly--
>
> Holy crap, I'm not surprised you don't see a difference if you're
> writing with 512K size! The potential benefit from merging is much
> less, and the odds of missing a merge is much smaller. 512KB is 5ms
> sequential by itself on a
Hannes--
Thanks for your input.
Assuming there's contiguous data to writeback, the dataset size is
immaterial; writeback gathers 500 extents from a btree, and writes
back up to 64 of them at a time. With 8k extents, the amount of data
the writeback code is juggling at a time is about 4
On 2017/10/6 5:20 PM, Michael Lyle wrote:
> Coly--
>
> I did not say the result from the changes will be random.
>
> I said the result from your test will be random, because where the
> writeback position is making non-contiguous holes in the data is
> nondeterministic-- it depends where it is on
Coly--
Holy crap, I'm not surprised you don't see a difference if you're
writing with 512K size! The potential benefit from merging is much
less, and the odds of missing a merge is much smaller. 512KB is 5ms
sequential by itself on a 100MB/sec disk--- lots more time to wait to
get the next
> -static void blk_rq_timed_out_timer(unsigned long data)
> +static void blk_rq_timed_out_timer(struct timer_list *t)
> {
> - struct request_queue *q = (struct request_queue *)data;
> + struct request_queue *q = from_timer(q, t, timeout);
>
> kblockd_schedule_work(&q->timeout_work);
On Wed, Oct 04, 2017 at 07:52:45AM -0700, Shaohua Li wrote:
> From: Shaohua Li
>
> discard error isn't fatal, don't flood discard error messages.
>
> Suggested-by: Ming Lei
> Signed-off-by: Shaohua Li
Reviewed-by: Ming Lei
On Fri, Oct 6, 2017 at 2:05 PM, Christoph Hellwig wrote:
> On Thu, Oct 05, 2017 at 09:32:33PM +0200, Ilya Dryomov wrote:
>> This is to avoid returning -EREMOTEIO in the following case: device
>> doesn't support WRITE SAME but scsi_disk::max_ws_blocks != 0, zeroout
>> is called
On 2017/10/6 7:57 PM, Michael Lyle wrote:
> OK, here's some data: http://jar.lyle.org/~mlyle/writeback/
>
> The complete test script is there to automate running writeback
> scenarios--- NOTE: DON'T RUN WITHOUT EDITING THE DEVICES FOR YOUR
> HARDWARE.
>
> Only one run each way, but they take 8-9
On Fri, Oct 06, 2017 at 02:07:13PM +0200, Pavel Machek wrote:
>
> Yeah, I was not careful enough reading cover letter. Having series
> where 1-4/5 are ready to go, and 5/5 not-good-idea for years to come
> is quite confusing.
4/5 is not ready to go either, at the very least