Hi Coly--
We talked a bunch on IRC, so I'd just like to answer a couple of the
things again here for the benefit of everyone else / further
discussion.
On Mon, Oct 9, 2017 at 11:58 AM, Coly Li wrote:
> I do observe the upside of your patches: when I/O blocksize
> <=256kB, I
Sorry I missed this question:
> Is it the time from writeback starts to dirty reaches dirty target, or
> the time from writeback starts to dirty reaches 0 ?
Not quite either. I monitor the machine with zabbix; it's the time
until the backing disk reaches its background rate of activity / when
On Fri, Oct 6, 2017 at 11:09 AM, Coly Li wrote:
> If I use a 1.8T hard disk as cached device, and 1TB SSD as cache device,
> and set fio to write 500G dirty data in total. Is this configuration
> close to the working set and cache size you suggested ?
I think it's quicker and
Hi Mike,
On 2017/10/7 at 1:36 AM, Michael Lyle wrote:
> It's 240GB (half of a Samsung 850) in front of a 1TB 5400RPM disk.
>
Copied.
> The size isn't critical. 1/8 is chosen to exceed 10% (default
> writeback dirty data thresh), it might need to be 1/6 on really big
> environments. It needs to
It's 240GB (half of a Samsung 850) in front of a 1TB 5400RPM disk.
The size isn't critical. 1/8 is chosen to exceed 10% (the default
writeback dirty data thresh); it might need to be 1/6 in really big
environments. It needs to be big enough that it takes more than 100
seconds to write back, but
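A quick sanity check of the sizing rule above, using only the figures already given in the thread (240GB cache, 1/8 working set, 10% default threshold):

```python
# Working-set sizing: the test set must exceed the default 10% writeback
# dirty-data threshold of the 240GB cache device described above.
CACHE_GB = 240                    # half of a Samsung 850
working_set_gb = CACHE_GB / 8     # the 1/8 rule of thumb
threshold_gb = CACHE_GB * 0.10    # default writeback dirty data thresh

print(working_set_gb)                  # 30.0
print(threshold_gb)                    # 24.0
print(working_set_gb > threshold_gb)   # True: 1/8 clears the threshold
```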
OK, here's some data: http://jar.lyle.org/~mlyle/writeback/
The complete test script is there to automate running writeback
scenarios--- NOTE: DON'T RUN IT WITHOUT EDITING THE DEVICES FOR YOUR
HARDWARE.
Only one run each way, but they take 8-9 minutes to run, so we can
easily get more ;) I compared
Hannes--
Thanks for your input.
Assuming there's contiguous data to writeback, the dataset size is
immaterial; writeback gathers 500 extents from a btree, and writes
back up to 64 of them at a time. With 8k extents, the amount of data
the writeback code is juggling at a time is about 4
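The truncated figure above is presumably megabytes; a back-of-envelope check with the numbers the paragraph gives (500 extents gathered, up to 64 in flight, 8k each):

```python
# How much data the writeback code juggles at once, per the thread's
# figures: 500 extents gathered from the btree, up to 64 written back
# at a time, 8 KiB per extent.
EXTENT = 8 * 1024                 # 8 KiB
gathered = 500 * EXTENT           # extents tracked per scan
in_flight = 64 * EXTENT           # extents written back concurrently

print(gathered / 2**20)           # ~3.9 MiB tracked -- "about 4"
print(in_flight / 2**10)          # 512.0 KiB actually in flight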
I will write a test bench and send results soon.
Just please note-- you've crafted a test where there's not likely to
be sequential data to writeback, and chosen a block size where there
is limited difference between sequential and nonsequential writeback.
Not surprisingly, you don't see a real
Coly--
Holy crap, I'm not surprised you don't see a difference if you're
writing with 512K size! The potential benefit from merging is much
less, and the odds of missing a merge is much smaller. 512KB is 5ms
sequential by itself on a 100MB/sec disk--- lots more time to wait to
get the next
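The service-time arithmetic behind that claim, assuming the 100 MB/sec figure above:

```python
# A single 512 KB sequential write at ~100 MB/s occupies the disk for
# about 5 ms, so per-request overhead (and any missed merge) matters
# far less at this block size than at 8k.
BLOCK_BYTES = 512 * 1000          # 512 KB, decimal units to match MB/s
RATE = 100 * 1000**2              # 100 MB/s sequential throughput

transfer_ms = BLOCK_BYTES / RATE * 1000
print(transfer_ms)                # ~5.1 ms per write
```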
Coly--
I did not say the result from the changes will be random.
I said the result from your test will be random, because where the
writeback position is making non-contiguous holes in the data is
nondeterministic-- it depends where it is on the disk at the instant
that writeback begins. There
I think one of the problems here is that there are no consistent
requirements or figure of merit for performance.
You've argued against changes because of A) perceived impact to
front-end I/O latency, B) writeback rate at idle, C) peak
instantaneous writeback rate, D) time to writeback the entire
On Thu, Oct 5, 2017 at 10:38 AM, Coly Li wrote:
> [snip]
> In this test, without the bio reorder patches, writeback throughput is
> much faster; you may see the write request number and request merge
> number are also much higher than with the bio reorder patches. After
> around 10 minutes
On 2017/10/5 at 7:54 AM, Michael Lyle wrote:
> Coly---
>
> Thanks for running these tests.
>
Hi Mike,
You provided very detailed information for the PI controller patch, which
helped me understand it much better. In return, I spent several days
testing your bio reorder patches; you deserve it :-)
> The
Coly---
Thanks for running these tests.
The change is expected to improve performance when the application
writes to many adjacent blocks, out of order. Many database workloads
are like this. My VM provisioning / installation / cleanup workloads
have a lot of this, too.
I believe that it
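That workload pattern can be sketched in a few lines of plain Python (an illustration, not bcache code): blocks dirtied in random order still collapse into one contiguous run once the dirty keys are visited in sorted order.

```python
import random

def contiguous_runs(blocks):
    """Count maximal runs of consecutive block numbers, visiting them
    in sorted order -- the way writeback walks the dirty keys."""
    runs, prev = 0, None
    for b in sorted(blocks):
        if prev is None or b != prev + 1:
            runs += 1
        prev = b
    return runs

random.seed(1)
dirty = list(range(1000))      # application dirtied 1000 adjacent blocks...
random.shuffle(dirty)          # ...but in random (out-of-order) fashion
print(contiguous_runs(dirty))  # 1 -- a single sequential writeback pass
```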
On Sun, Oct 1, 2017 at 10:23 AM, Coly Li wrote:
> Hi Mike,
>
> Your data set is too small. Normally the bcache users I talk with use
> bcache for distributed storage clusters or commercial databases; their
> cache device is large and fast. It is possible we see different I/O
>
That's strange-- are you doing the same test scenario? How much
random I/O did you ask for?
My tests took 6-7 minutes to do the 30G of 8k not-repeating I/Os in a
30G file (about 9k IOPS for me-- it's actually significantly faster
but then starves every few seconds-- not new with these patches).
One final attempt to resend, because gmail has been giving me trouble
sending plain text mail.
Two instances of this. Tested as above, with a big set of random I/Os
that ultimately cover every block in a file (e.g. allowing sequential
writeback).
With the 5 patches, samsung 940 SSD cache +
Just one more note---
IO merging is not happening properly right now.
It's easy to get a case together where basically all the writeback is
sequential. E.g. if your writeback dirty data target is 15GB, do
something like:
$ sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
On Sat, Sep 30, 2017 at 1:03 AM, Coly Li wrote:
>>> If writeback_rate is not minimum value, it means there are front end
>>> write requests existing.
>>
>> This is wrong. Else we'd never make it back to the target.
>>
>
> Of course we can :-) When dirty data is far beyond dirty
Actually-- I give up.
You've initially bounced every single one of my patches, even the ones
that fix crash & data corruption bugs.
I spend 10x as much time fighting with you as writing stuff for
bcache, and basically every time it's turned out that you're wrong.
I will go do something else
Coly--
What you say is correct-- it has a few changes from current behavior.
- When writeback rate is low, it is more willing to do contiguous
I/Os. This provides an opportunity for the IO scheduler to combine
operations together. The cost of doing 5 contiguous I/Os and 1 I/O is
usually about
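The truncated cost comparison above can be finished with a toy seek/transfer model (the two timing constants below are assumptions for illustration, not numbers from the thread):

```python
SEEK_MS = 8.0          # assumed average seek + rotational delay
XFER_MS = 0.08         # assumed transfer time for one small I/O

one_io = SEEK_MS + XFER_MS                # 8.08 ms
five_contiguous = SEEK_MS + 5 * XFER_MS   # 8.40 ms: one seek, 5 transfers
five_scattered = 5 * (SEEK_MS + XFER_MS)  # ~40.4 ms: five seeks

# Five contiguous I/Os cost barely more than one; five scattered I/Os
# cost roughly five times as much.
print(one_io, five_contiguous, five_scattered)
```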
Ah. I will copy linux-block in the future, then. Thanks! -mpl
On Thu, Sep 28, 2017 at 9:22 PM, Coly Li wrote:
> Oh, it is because Christoph Hellwig suggested to Cc linux-block when
> posting bcache patches to linux-bcache list, then patches may have more
> eyes on them.
>
>
P.S. another concern with the current design is that for every backing
device, there's a separate writeback thread that may scan for dirty
blocks with very poor locking (starving other writes). With several of
these threads, there's no guarantee that they won't all stack up and
want to do
On 2017/9/29 at 12:15 PM, Michael Lyle wrote:
[snip]
> Also, why do you keep copying linux-block for every bit of feedback on
> this code? It seems like we probably don't want to clutter that list
> with this discussion.
It's presently guaranteed, but it sure seems like a good idea to
confirm that the keys are actually contiguous on the same volume.
What's the harm with checking-- it prevents the code from being
delicate if things change. If anything, this should get factored to
util.h with the check in place.
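A hypothetical sketch of the kind of defensive check being discussed (plain Python with invented names, not the kernel code): two keys are merge-contiguous only if they belong to the same backing volume (same inode) AND the second starts exactly where the first ends.

```python
from collections import namedtuple

# Toy stand-in for a bcache key: which backing volume it belongs to,
# where it starts, and how many sectors it covers.
Key = namedtuple("Key", ["inode", "offset", "size"])

def keys_contiguous(first, second):
    if first.inode != second.inode:   # different backing volumes
        return False
    return first.offset + first.size == second.offset

a = Key(inode=1, offset=0, size=16)
b = Key(inode=1, offset=16, size=16)
c = Key(inode=2, offset=32, size=16)
print(keys_contiguous(a, b))  # True
print(keys_contiguous(b, c))  # False -- inode differs
```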
From: Tang Junhui
Hello Mike:
> + if (KEY_INODE(>key) != KEY_INODE(>key))
> + return false;
Please remove this redundant code; all the keys in dc->writeback_keys
have the same KEY_INODE. It is guaranteed by refill_dirty().
Regards,
Tang Junhui
> Previously,
I just fixed the problem you pointed out, and ran a test where I did
sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
--name=test --filename=test --bs=8k --iodepth=256 --size=25G
--readwrite=randwrite --ramp_time=4
to generate a lot of 8k extents.
After letting things quiet
Tang--
This is a first step towards further stuff I want to do--
1. I want to allow blk_plug-- but to do that, a request needs to know
there are subsequent contiguous requests after it when issuing the
write. The new structure allows that.
2. I want to allow tuning to issue multiple requests
From: Tang Junhui
Hello Mike:
For the second question, I think this modification is somewhat complex;
can't we do something simpler to resolve it? I remember there were some
patches trying to avoid a too-small writeback rate. Coly, is there any
progress now?