Re: btrfs performance, sudden drop to 0 IOPs
P. Remek schrieb:
>> Yes, it was implemented for the purpose of allowing an application to
>> implement its own caching - probably for the sole purpose of doing it
>> "better" or more efficiently. But it simply does not work out that
>> well, at least with a COW fs. The original idea, "performance", is
>> more or less eaten away in a COW scenario - or worse. And that in turn
>> is why Linus said O_DIRECT is broken and should go away; use cache
>> hinting instead.
>
> Linus is saying to use things like madvise, but the fact is that in
> reality people are using O_DIRECT instead of it, so it is important to
> get it right.

Yeah, quite true - apparently... But as you already found, the O_DIRECT
implementation of btrfs is probably not the culprit.

> The case I am interested in is KVM. A virtual machine disk file is
> opened with O_DIRECT so that when the virtual machine is doing IO, it
> is not cached twice - first on the guest operating system level, and a
> second time on the hypervisor host operating system level. With
> O_DIRECT it is only cached in the guest.

In VirtualBox I enabled host-side caching on purpose and instead lowered
the VM's RAM. I don't know if VirtualBox does something like memory
ballooning, but usually I'd expect ballooning to push cache out of RAM -
so host-side caching may make sense. I never measured it, but it feels a
bit snappier to work inside the VirtualBox machine. Of course, the
recommendation depends on whether you are using ballooning, and on VM
density. In VirtualBox, this setting probably just turns off O_DIRECT.
And my VM images are set to nocow.

--
Replies to list only preferred.
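If memory serves, the host I/O cache is a per-storage-controller setting
in VirtualBox, so toggling it might look roughly like this (VM and
controller names here are placeholders, not from the thread):

    VBoxManage storagectl "MyVM" --name "SATA Controller" --hostiocache on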
Re: btrfs performance, sudden drop to 0 IOPs
On Fri, Feb 13, 2015 at 02:06:27PM +0100, P. Remek wrote:
> > I'd use a blktrace based tool like iowatcher or seekwatcher to see
> > what's really happening on the performance drops.
>
> So I used this command to see if there are any outstanding requests in
> the I/O scheduler queue when the performance drops to 0 IOPs:
>
> root@lab1:/# iostat -c -d -x -t -m /dev/sdi 1 1
>
> The output is:
>
> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> sdi        0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
>
> "avgqu-sz" gives the queue length (1-second average). So it really
> seems that the system is not stuck in the block I/O layer but in an
> upper layer instead (most likely the filesystem layer).
>
> I also created an ext4 filesystem on another pair of disks - so I was
> able to run simultaneous benchmarks, one for ext4 and one for btrfs
> (each having 4 SSDs assigned) - and when btrfs went down to 0 IOPs the
> ext4 fio benchmark kept generating high IOPs.
>
> I also tried to mount the system with nodatacow:
>
> /dev/sdi on /mnt/btrfs type btrfs (rw,nodatacow)
>
> It didn't help with the performance drops.

It's just weird, since 10s is far too long a stall for a filesystem. I
don't know what's happening and didn't see such behavior in my tests.
Perhaps try "perf record -a -g" to see what's going on.

Thanks,

-liubo
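Since the stalls reportedly last about 10 seconds, one way to capture a
system-wide profile across one of them is to record for a fixed window
while fio is running and then inspect where the time goes (the 30-second
window here is an arbitrary choice, not from the thread):

    perf record -a -g -- sleep 30
    perf report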
Re: btrfs performance, sudden drop to 0 IOPs
> We did benchmark Btrfs aio/dio performance before, and we noticed one
> big difference between COW and nocow: it is not only the checksums
> themselves, but checksums cost more metadata, which will make Btrfs
> performance drop suddenly for a while because of metadata reservation.

I mounted the filesystem with nodatacow, which should also switch off
the checksumming, but it didn't help - the sudden drops are still there.
Re: btrfs performance, sudden drop to 0 IOPs
> Yes, it was implemented for the purpose of allowing an application to
> implement its own caching - probably for the sole purpose of doing it
> "better" or more efficiently. But it simply does not work out that
> well, at least with a COW fs. The original idea, "performance", is more
> or less eaten away in a COW scenario - or worse. And that in turn is
> why Linus said O_DIRECT is broken and should go away; use cache hinting
> instead.

Linus is saying to use things like madvise, but the fact is that in
reality people are using O_DIRECT instead of it, so it is important to
get it right.

The case I am interested in is KVM. A virtual machine disk file is
opened with O_DIRECT so that when the virtual machine is doing IO, it is
not cached twice - first on the guest operating system level, and a
second time on the hypervisor host operating system level. With O_DIRECT
it is only cached in the guest.
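For reference, with QEMU/KVM this is normally selected through the drive
cache mode: cache=none opens the backing file with O_DIRECT, so data is
cached only inside the guest. A sketch of the relevant option (the image
path and device model here are illustrative, not from the thread):

    qemu-system-x86_64 -drive file=/mnt/btrfs/vm.img,format=raw,cache=none,if=virtio ...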
Re: btrfs performance, sudden drop to 0 IOPs
> I'd definitely suggest using NOCOW for any file you are doing O_DIRECT
> with, as you should see _much_ better performance that way, and also
> don't run the (theoretical) risk of some of the same types of
> corruption that swapfiles on BTRFS can cause.

I mounted the filesystem with nodatacow as follows, and it didn't help -
it still drops to 0 IOPs every couple of seconds.

/dev/sdi on /mnt/btrfs type btrfs (rw,nodatacow)
Re: btrfs performance, sudden drop to 0 IOPs
> I'd use a blktrace based tool like iowatcher or seekwatcher to see
> what's really happening on the performance drops.

So I used this command to see if there are any outstanding requests in
the I/O scheduler queue when the performance drops to 0 IOPs:

root@lab1:/# iostat -c -d -x -t -m /dev/sdi 1 1

The output is:

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdi        0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00

"avgqu-sz" gives the queue length (1-second average). So it really seems
that the system is not stuck in the block I/O layer but in an upper
layer instead (most likely the filesystem layer).

I also created an ext4 filesystem on another pair of disks - so I was
able to run simultaneous benchmarks, one for ext4 and one for btrfs
(each having 4 SSDs assigned) - and when btrfs went down to 0 IOPs the
ext4 fio benchmark kept generating high IOPs.

I also tried to mount the system with nodatacow:

/dev/sdi on /mnt/btrfs type btrfs (rw,nodatacow)

It didn't help with the performance drops.
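For the record, a raw blktrace capture during one of the drops might
look like this (the device name follows the examples above; the
60-second window is an assumption):

    blktrace -d /dev/sdi -w 60 -o sdi
    blkparse -i sdi | less

If the block layer receives essentially no requests during the stall,
that would corroborate the empty iostat queue above.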
Re: btrfs performance, sudden drop to 0 IOPs
Hello guys,

> On Thu, Feb 12, 2015 at 05:33:41AM +0100, Kai Krakow wrote:
>> Duncan <1i5t5.dun...@cox.net> schrieb:
>>
>>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>>
>>>> In the test, I use the --direct=1 parameter for fio, which basically
>>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>>> filesystem cache is bypassed and IO is sent directly to the
>>>> underlying storage. Are you saying that btrfs buffers writes despite
>>>> O_DIRECT?
>>>
>>> I'm out of my (admin, no claims at developer) league on that. I see
>>> someone else replied, and would defer to them on this.
>>
>> I don't think that O_DIRECT can work efficiently on COW filesystems.
>> It probably has a negative effect and cannot be faster than normal
>> access. Linus himself said one time that O_DIRECT is broken and should
>> go away, and that instead cache hinting should be used.
>>
>> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
>> the file system has to go through its COW logic first, which it
>> otherwise had buffered and done in the background. Bypassing the cache
>> is probably only a side-effect of O_DIRECT, not its purpose.
>
> Hmm, not true in btrfs; the COW logic mentioned above is nothing but
> allocating a NEW extent, and it's not done in the background.
>
> Comparing to the nocow logic, the main difference comes from
> a) COW files calculating checksums of the dirty data in DIO pages,
>    which nocow files don't need to, and
> b) their endio handlers.
>
> Or am I missing something?

We did benchmark Btrfs aio/dio performance before, and we noticed one
big difference between COW and nocow: it is not only the checksums
themselves, but checksums cost more metadata, which will make Btrfs
performance drop suddenly for a while because of metadata reservation.

> Thanks,
>
> -liubo
>
>> At least I'd try with a nocow file for the benchmark if you still have
>> to use O_DIRECT.

Best Regards,
Wang Shilong
Re: btrfs performance, sudden drop to 0 IOPs
On Thu, Feb 12, 2015 at 05:33:41AM +0100, Kai Krakow wrote:
> Duncan <1i5t5.dun...@cox.net> schrieb:
>
>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>
>>> In the test, I use the --direct=1 parameter for fio, which basically
>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>> filesystem cache is bypassed and IO is sent directly to the
>>> underlying storage. Are you saying that btrfs buffers writes despite
>>> O_DIRECT?
>>
>> I'm out of my (admin, no claims at developer) league on that. I see
>> someone else replied, and would defer to them on this.
>
> I don't think that O_DIRECT can work efficiently on COW filesystems. It
> probably has a negative effect and cannot be faster than normal access.
> Linus himself said one time that O_DIRECT is broken and should go away,
> and that instead cache hinting should be used.
>
> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
> the file system has to go through its COW logic first, which it
> otherwise had buffered and done in the background. Bypassing the cache
> is probably only a side-effect of O_DIRECT, not its purpose.

Hmm, not true in btrfs; the COW logic mentioned above is nothing but
allocating a NEW extent, and it's not done in the background.

Comparing to the nocow logic, the main difference comes from
a) COW files calculating checksums of the dirty data in DIO pages, which
   nocow files don't need to, and
b) their endio handlers.

Or am I missing something?

Thanks,

-liubo

> At least I'd try with a nocow file for the benchmark if you still have
> to use O_DIRECT.
Re: btrfs performance, sudden drop to 0 IOPs
Austin S Hemmelgarn schrieb:

> On 2015-02-11 23:33, Kai Krakow wrote:
>> Duncan <1i5t5.dun...@cox.net> schrieb:
>>
>>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>>
>>>> In the test, I use the --direct=1 parameter for fio, which basically
>>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>>> filesystem cache is bypassed and IO is sent directly to the
>>>> underlying storage. Are you saying that btrfs buffers writes despite
>>>> O_DIRECT?
>>>
>>> I'm out of my (admin, no claims at developer) league on that. I see
>>> someone else replied, and would defer to them on this.
>>
>> I don't think that O_DIRECT can work efficiently on COW filesystems.
>> It probably has a negative effect and cannot be faster than normal
>> access. Linus himself said one time that O_DIRECT is broken and should
>> go away, and that instead cache hinting should be used.
>>
>> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
>> the file system has to go through its COW logic first, which it
>> otherwise had buffered and done in the background. Bypassing the cache
>> is probably only a side-effect of O_DIRECT, not its purpose.
> IIUC, the original purpose of O_DIRECT was to allow the application to
> handle caching itself, instead of having the kernel do it. The issue is
> that it is (again, IIUC) a hard requirement for AIO, which is a
> performance booster for many use cases.

Yes, it was implemented for the purpose of allowing an application to
implement its own caching - probably for the sole purpose of doing it
"better" or more efficiently. But it simply does not work out that well,
at least with a COW fs. The original idea, "performance", is more or
less eaten away in a COW scenario - or worse. And that in turn is why
Linus said O_DIRECT is broken and should go away; use cache hinting
instead.

From that perspective, I concluded what I wrote: bypassing the cache is
only a side-effect. It didn't solve the problem the right way - it
unintentionally solved something else. So, to alleviate the design flaw,
you can only use it for its intended purpose on nocow files (or nocow
filesystems).

>> At least I'd try with a nocow file for the benchmark if you still have
>> to use O_DIRECT.
>
> I'd definitely suggest using NOCOW for any file you are doing O_DIRECT
> with, as you should see _much_ better performance that way, and also
> don't run the (theoretical) risk of some of the same types of
> corruption that swapfiles on BTRFS can cause.

Ditto.

--
Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
On 2015-02-11 23:33, Kai Krakow wrote:
> Duncan <1i5t5.dun...@cox.net> schrieb:
>
>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>
>>> In the test, I use the --direct=1 parameter for fio, which basically
>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>> filesystem cache is bypassed and IO is sent directly to the
>>> underlying storage. Are you saying that btrfs buffers writes despite
>>> O_DIRECT?
>>
>> I'm out of my (admin, no claims at developer) league on that. I see
>> someone else replied, and would defer to them on this.
>
> I don't think that O_DIRECT can work efficiently on COW filesystems. It
> probably has a negative effect and cannot be faster than normal access.
> Linus himself said one time that O_DIRECT is broken and should go away,
> and that instead cache hinting should be used.
>
> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
> the file system has to go through its COW logic first, which it
> otherwise had buffered and done in the background. Bypassing the cache
> is probably only a side-effect of O_DIRECT, not its purpose.

IIUC, the original purpose of O_DIRECT was to allow the application to
handle caching itself, instead of having the kernel do it. The issue is
that it is (again, IIUC) a hard requirement for AIO, which is a
performance booster for many use cases.

> At least I'd try with a nocow file for the benchmark if you still have
> to use O_DIRECT.

I'd definitely suggest using NOCOW for any file you are doing O_DIRECT
with, as you should see _much_ better performance that way, and also
don't run the (theoretical) risk of some of the same types of corruption
that swapfiles on BTRFS can cause.
Re: btrfs performance, sudden drop to 0 IOPs
On Mon, Feb 09, 2015 at 06:26:49PM +0100, P. Remek wrote:
> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:
>
> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.

I was doing similar tests in the last few days; the huge performance
difference comes from the AIO+DIO path, fs/direct-io.c, around line 1170:

	/*
	 * For file extending writes updating i_size before data writeouts
	 * complete can expose uninitialized blocks in dumb filesystems.
	 * In that case we need to wait for I/O completion even if asked
	 * for an asynchronous write.
	 */
	if (is_sync_kiocb(iocb))
		dio->is_async = false;
	else if (!(dio->flags & DIO_ASYNC_EXTEND) &&
		 (rw & WRITE) && end > i_size_read(inode))
		dio->is_async = false;
	else
		dio->is_async = true;

So you may like to play with fio's fallocate option; although it's
'posix' by default, which should have set a proper i_size for you, I
don't believe it unless I set it myself.

> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.
>
> Can somebody shed some light on what's happening?

I'd use a blktrace based tool like iowatcher or seekwatcher to see
what's really happening on the performance drops.

> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
> --size=10G --numjobs=1 --readwrite=randwrite

Since this is just a libaio-dio random write, I think it has nothing to
do with the progs side.

Thanks,

-liubo

> Environment:
> CPU: dual socket: E5-2630 v2
> RAM: 32 GB
> OS: Ubuntu server 14.10
> Kernel: 3.19.0-031900rc2-generic
> btrfs tools: Btrfs v3.14.1
> 2x LSI 9300 HBAs - SAS3 12Gb/s
> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>
> Regards,
> Premek
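One way to probe this theory would be to run the same job with and
without the file's size pre-extended, using fio's fallocate option
(option values as documented by fio; the other parameters mirror the
original benchmark command):

    fio --fallocate=none  --randrepeat=1 --ioengine=libaio --direct=1 \
        --name=test9 --filename=test9 --bs=4k --iodepth=256 --size=10G \
        --numjobs=1 --readwrite=randwrite
    fio --fallocate=posix --randrepeat=1 --ioengine=libaio --direct=1 \
        --name=test9 --filename=test9 --bs=4k --iodepth=256 --size=10G \
        --numjobs=1 --readwrite=randwrite

If the code path above is responsible, the fallocate=none run against a
fresh file should look like the slow first-run case.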
Re: btrfs performance, sudden drop to 0 IOPs
Duncan <1i5t5.dun...@cox.net> schrieb:

> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>
>> In the test, I use the --direct=1 parameter for fio, which basically
>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>> filesystem cache is bypassed and IO is sent directly to the underlying
>> storage. Are you saying that btrfs buffers writes despite O_DIRECT?
>
> I'm out of my (admin, no claims at developer) league on that. I see
> someone else replied, and would defer to them on this.

I don't think that O_DIRECT can work efficiently on COW filesystems. It
probably has a negative effect and cannot be faster than normal access.
Linus himself said one time that O_DIRECT is broken and should go away,
and that instead cache hinting should be used.

Think of this: for the _unbuffered_ direct-io request to be fulfilled,
the file system has to go through its COW logic first, which it
otherwise had buffered and done in the background. Bypassing the cache
is probably only a side-effect of O_DIRECT, not its purpose.

At least I'd try with a nocow file for the benchmark if you still have
to use O_DIRECT.

--
Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:

>> Those 70K IOPs are all the extra work the filesystem is doing in order
>> to track those 4 KiB COWed writes!
>
> This sounds like you are thinking that getting 70K IOPs is a bad thing,
> but I am testing performance, which means higher IOPs = better result.
> In other words, after the second run, when that target file already
> existed, the performance improved significantly.

Perhaps I'm wrong (I /did/ emphasize "suspect") here, but what I was
suggesting was...

Those higher IOPs are, I believe, fake - manufactured by the filesystem
as a result of splitting up the few larger extents into many smaller
extents due to COW-fragmentation. If I'm correct, the physical device
and the below-filesystem-level kernel layers (where I expect your IOPs
measure is sourced) are seeing this orders-of-magnitude increase in IOPs
due to breaking one original filesystem operation into perhaps hundreds
of effectively random individual 4k block operations, but the actual
thruput at the above-filesystem level is reduced.

There's certainly a potential in theory for such an effect on btrfs due
to COWing rewrites, and faced with those results, it is how I'd explain
them in a rather hand-wavy, not too low-level-technical way. But if it
doesn't match reality, then my understanding is insufficient and I'm
wrong. Wouldn't be the first time. =:^P

>> I suppose you're already aware that you're running a rather outdated
>> userspace/btrfs-progs (what I assume you meant by tools).
>
> I was hoping that btrfs-progs doesn't have any influence on runtime
> properties of the btrfs filesystem. As I am doing performance tests, I
> hope that the btrfs-progs version doesn't have any impact on the
> results.

I was simply pointing out the mismatch, in case you intended to actually
deploy, and potentially try to fix any problems with, that old a
userspace. As long as you're aware of the issue and won't be trying
btrfs check --repair or the like with that old a userspace, for runtime
testing, indeed, it shouldn't matter. So you're "hoping correctly". =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
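If the COW-fragmentation theory holds, it should be directly observable:
the extent count of the test file should balloon after a rewrite run.
filefrag (from e2fsprogs) reads the extent map via FIEMAP and works on
btrfs, so a quick check could look like this:

    filefrag test9       # extent count after the first run
    filefrag -v test9    # per-extent detail after a rewrite run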
Re: btrfs performance, sudden drop to 0 IOPs
On 2015-02-09 12:26, P. Remek wrote:
> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:

Based on what I know about BTRFS, I think that these issues actually
have distinct causes.

> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.

I've noticed that almost always, file creation on BTRFS is slower than
file re-writes. This seems to especially be the case when using AIO
and/or O_DIRECT (although O_DIRECT on a COW filesystem is _really_
complicated to get right). I don't know that there is really any way
currently to solve this, although it would be interesting to see if
fallocate'ing the files prior to the initial run would have any
significant performance impact.

> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.

I've seen this same behavior on a number of filesystems (not just BTRFS)
when using the default I/O scheduler with its default parameters,
especially on systems with high-performance storage. IIRC, Ubuntu 13.10
switched from the upstream default I/O scheduler (CFQ) to the Deadline
I/O scheduler because it has better performance (and is more
deterministic) on most cheap commodity desktop/laptop hardware. I've
found, however, that the Deadline scheduler actually tends to perform
worse than CFQ when used on higher-end server systems and/or SSDs,
although CFQ with default parameters only does marginally better. I'd
suggest experimenting with some of the parameters under /sys/block
(check the files in the Documentation/block directory of the Linux
kernel sources for information about what (almost) everything there
does).
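As a concrete starting point, the active scheduler and queue depth can
be inspected and changed per device at runtime (sdi matches the
benchmarked device from later in the thread; the values are only
examples, not recommendations):

    cat /sys/block/sdi/queue/scheduler           # e.g. noop deadline [cfq]
    echo deadline > /sys/block/sdi/queue/scheduler
    echo 1024 > /sys/block/sdi/queue/nr_requests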
Re: btrfs performance, sudden drop to 0 IOPs
> What I /suspect/ is happening, is that at the 10 GiB file size, on
> original file creation, btrfs is creating a large file of several
> comparatively large extents (possibly 1 GiB each, the nominal data
> chunk size, tho it can be larger on large enough filesystems). Note
> that btrfs will normally wait to sync, accumulating further writes into
> the file before actually writing it. By default it's 30 seconds, but
> there's a mount option to change that. So btrfs is probably waiting,
> then writing out all changes for the last 30 seconds at once, allowing
> it to use fairly large extents when it does so.

In the test, I use the --direct=1 parameter for fio, which basically
does O_DIRECT on the target file. O_DIRECT should guarantee that the
filesystem cache is bypassed and IO is sent directly to the underlying
storage. Are you saying that btrfs buffers writes despite O_DIRECT?

I also tried to mount the filesystem with the commit parameter set to
a) 1 second and b) 1000 seconds, as follows:

root@lab1:/# mount -o autodefrag,commit=1 /dev/mapper/prm-0 /mnt/vol1

It didn't change the behavior - after about 30-40 seconds of running
there is a drop to 0 IOPs that lasts about 20 seconds.

> Those 70K IOPs are all the extra work the filesystem is doing in order
> to track those 4 KiB COWed writes!

This sounds like you are thinking that getting 70K IOPs is a bad thing,
but I am testing performance, which means higher IOPs = better result.
In other words, after the second run, when that target file already
existed, the performance improved significantly. In the light of what
you are saying, it looks more like there is some higher overhead when
allocating a completely new block of data for the file, compared to the
overhead of a COW operation on an already existing block of data.

> I suppose you're already aware that you're running a rather outdated
> userspace/btrfs-progs (what I assume you meant by tools). Userspace
> versions sync with the kernel cycle, with a particular 3.x.0 version
> typically being released a couple weeks after the kernel of the same
> version, usually with a couple of 3.x.y y-update releases following
> before the next kernel-synced x-version bump.

I was hoping that btrfs-progs doesn't have any influence on runtime
properties of the btrfs filesystem. As I am doing performance tests, I
hope that the btrfs-progs version doesn't have any impact on the
results.

Regards,
P.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek schrieb:

> Not sure if it helps, but here it is:
>
> root@lab1:/mnt/vol1# btrfs filesystem df /mnt/vol1/
> Data, RAID10: total=116.00GiB, used=110.03GiB
> Data, single: total=8.00MiB, used=0.00
> System, RAID1: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=2.00GiB, used=563.72MiB
> Metadata, single: total=8.00MiB, used=0.00
> unknown, single: total=192.00MiB, used=0.00

This looks completely different from my output. Do you use the latest
btrfs-progs?

$ btrfs --version
Btrfs v3.18.2

$ btrfs fi us /
Overall:
    Device size:           2.71TiB
    Device allocated:      1.50TiB
    Device unallocated:    1.21TiB
    Used:                  1.37TiB
    Free (estimated):      1.33TiB  (min: 745.87GiB)
    Data ratio:            1.00
    Metadata ratio:        2.00
    Global reserve:        512.00MiB  (used: 0.00B)

Data,RAID0: Size:1.49TiB, Used:1.36TiB
   /dev/bcache0  507.00GiB
   /dev/bcache1  507.00GiB
   /dev/bcache2  507.00GiB

Metadata,RAID1: Size:6.00GiB, Used:3.99GiB
   /dev/bcache0    4.00GiB
   /dev/bcache1    4.00GiB
   /dev/bcache2    4.00GiB

System,RAID1: Size:32.00MiB, Used:100.00KiB
   /dev/bcache1   32.00MiB
   /dev/bcache2   32.00MiB

Unallocated:
   /dev/bcache0  414.51GiB
   /dev/bcache1  414.48GiB
   /dev/bcache2  414.48GiB

> On Mon, Feb 9, 2015 at 8:56 PM, Kai Krakow wrote:
>> P. Remek schrieb:
>>
>>> Hello,
>>>
>>> I am benchmarking Btrfs, and when benchmarking random writes with the
>>> fio utility, I noticed the following two things:
>>>
>>> 1) On the first run, when the target file doesn't exist yet,
>>> performance is about 8000 IOPs. On the second, and every other run,
>>> performance goes up to 70000 IOPs. It's a massive difference. The
>>> target file is the one created during the first run.
>>>
>>> 2) There are windows during the test where IOPs drop to 0, stay at 0
>>> for about 10 seconds, and then it goes back up again, and after a
>>> couple of seconds drops to 0 again. This is reproducible 100% of the
>>> time.
>>>
>>> Can somebody shed some light on what's happening?
>>
>> I'm not an expert or dev, but it's probably due to btrfs doing some
>> housekeeping under the hood. Could you check the output of "btrfs
>> filesystem usage /mountpoint" while running the test? I'd guess
>> there's some pressure on the global reserve during those times.
>>
>>> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
>>> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
>>> --size=10G --numjobs=1 --readwrite=randwrite
>>>
>>> Environment:
>>> CPU: dual socket: E5-2630 v2
>>> RAM: 32 GB
>>> OS: Ubuntu server 14.10
>>> Kernel: 3.19.0-031900rc2-generic
>>> btrfs tools: Btrfs v3.14.1
>>> 2x LSI 9300 HBAs - SAS3 12Gb/s
>>> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>>>
>>> Regards,
>>> Premek

--
Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek posted on Mon, 09 Feb 2015 18:26:49 +0100 as excerpted:

> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:
>
> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.

You say a file size of 10 GiB with a block size of 4 KiB, but don't say
whether you're using the autodefrag mount option, or whether you had set
nocow on the file at creation (generally done by setting it on the
directory, so new files inherit the option; chattr +C).

What I /suspect/ is happening is that at the 10 GiB file size, on
original file creation, btrfs is creating a large file of several
comparatively large extents (possibly 1 GiB each, the nominal data chunk
size, tho it can be larger on large enough filesystems). Note that btrfs
will normally wait to sync, accumulating further writes into the file
before actually writing it. By default it's 30 seconds, but there's a
mount option to change that. So btrfs is probably waiting, then writing
out all changes for the last 30 seconds at once, allowing it to use
fairly large extents when it does so.

Then when the file already exists, keeping in mind that btrfs is COW
(copy-on-write) and that by default it keeps two copies of metadata (dup
on a single device, or one each on two separate devices on a
multi-device filesystem) and one copy of data (single on a single
device; I believe raid0 on multi-device), it's having to COW individual
4K blocks within the file as they are rewritten. This is going to
massively fragment the file, driving up IOPs tremendously.

On top of that, each time a data fragment is written, there are going to
be two metadata updates due to the dup/raid1 metadata default, and while
they won't be updated immediately, every commit (30 seconds) those
metadata changes are going to replicate up the metadata tree to its
root.

So instead of having a few orderly GiB-ish size extents written, along
with their metadata, as at file creation, now you're writing a new
extent for each changed 4 KiB block, plus 2X metadata updates for each
one, plus, every commit, the updated metadata chain up to the root.
Those 70K IOPs are all the extra work the filesystem is doing in order
to track those 4 KiB COWed writes!

The autodefrag option will likely increase this even further, as it
doesn't prevent the COWs but instead queues up any files it detects as
fragmented for later cleanup via the autodefrag worker thread. This is
one reason the option isn't recommended for large (say quarter- to
half-gig-plus) heavy-internal-rewrite-pattern use-cases (typically VM
images or large database files), tho it works quite well for files up to
a couple hundred MiB or so (typical of firefox sqlite database files,
etc), since those get rewritten pretty fast.

The nocow file attribute can be used on these larger files, but it does
have additional implications. Nocow turns off btrfs compression for that
file, if you had it enabled (mount option), and also turns off
checksumming.

Turning off checksumming means btrfs will no longer detect file
corruption, but many databases and VM tools have their own corruption
detection and possibly correction schemes already, since they use them
on filesystems such as ext* that don't have builtin checksumming, so
turning off the btrfs checksumming and error detection for these files
isn't as bad as it would otherwise seem, and in many cases it prevents
the filesystem duplicating work that the application is already doing.

(Also, on btrfs, nocow must be set at file creation, when the file is
still zero-sized. As mentioned above, this is usually accomplished by
setting it on the directory and letting new files and subdirs inherit
the attribute.)

But with the nocow file attribute properly applied, these random
rewrites will be done in place - no cascading fragmentation and metadata
updates - and my guess is that you'll see the IOPs on existing nocow
files reduce to something far more sane as a result.

> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.

I recall this periodic behavior coming up in at least one earlier thread
as well, but I'm not a dev, just a btrfs user and list regular, and I
don't recall what the explanation was, unless it was related to internal
btrfs bookkeeping due to that 30-second commit cycle I mentioned above.
But I'm guessing that if you properly set nocow on the file, you'll
probably see this go away as well, since you won't be overwhelming btrfs
and the hardware with IOPs any longer. Perhaps someone with a better
understanding of the situation will jump in and explain this bit better
than I can.
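A minimal sketch of how that could be wired into this benchmark (paths
are illustrative; the key point is that +C is set on the directory
before fio creates the file, so the new file inherits it):

    mkdir /mnt/btrfs/nocow
    chattr +C /mnt/btrfs/nocow
    fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test9 \
        --filename=/mnt/btrfs/nocow/test9 --bs=4k --iodepth=256 \
        --size=10G --numjobs=1 --readwrite=randwrite
    lsattr /mnt/btrfs/nocow/test9    # should show the C attribute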
Re: btrfs performance, sudden drop to 0 IOPs
Not sure if it helps, but here it is:

root@lab1:/mnt/vol1# btrfs filesystem df /mnt/vol1/
Data, RAID10: total=116.00GiB, used=110.03GiB
Data, single: total=8.00MiB, used=0.00
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=2.00GiB, used=563.72MiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=192.00MiB, used=0.00

On Mon, Feb 9, 2015 at 8:56 PM, Kai Krakow wrote:
> P. Remek schrieb:
>
>> Hello,
>>
>> I am benchmarking Btrfs, and when benchmarking random writes with the
>> fio utility, I noticed the following two things:
>>
>> 1) On the first run, when the target file doesn't exist yet,
>> performance is about 8000 IOPs. On the second, and every other run,
>> performance goes up to 70000 IOPs. It's a massive difference. The
>> target file is the one created during the first run.
>>
>> 2) There are windows during the test where IOPs drop to 0, stay at 0
>> for about 10 seconds, and then it goes back up again, and after a
>> couple of seconds drops to 0 again. This is reproducible 100% of the
>> time.
>>
>> Can somebody shed some light on what's happening?
>
> I'm not an expert or dev, but it's probably due to btrfs doing some
> housekeeping under the hood. Could you check the output of "btrfs
> filesystem usage /mountpoint" while running the test? I'd guess there's
> some pressure on the global reserve during those times.
>
>> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
>> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
>> --size=10G --numjobs=1 --readwrite=randwrite
>>
>> Environment:
>> CPU: dual socket: E5-2630 v2
>> RAM: 32 GB
>> OS: Ubuntu server 14.10
>> Kernel: 3.19.0-031900rc2-generic
>> btrfs tools: Btrfs v3.14.1
>> 2x LSI 9300 HBAs - SAS3 12Gb/s
>> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>>
>> Regards,
>> Premek
>
> --
> Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek schrieb:

> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:
>
> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.
>
> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.
>
> Can somebody shed some light on what's happening?

I'm not an expert or dev, but it's probably due to btrfs doing some
housekeeping under the hood. Could you check the output of "btrfs
filesystem usage /mountpoint" while running the test? I'd guess there's
some pressure on the global reserve during those times.

> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
> --size=10G --numjobs=1 --readwrite=randwrite
>
> Environment:
> CPU: dual socket: E5-2630 v2
> RAM: 32 GB
> OS: Ubuntu server 14.10
> Kernel: 3.19.0-031900rc2-generic
> btrfs tools: Btrfs v3.14.1
> 2x LSI 9300 HBAs - SAS3 12Gb/s
> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>
> Regards,
> Premek

--
Replies to list only preferred.
btrfs performance, sudden drop to 0 IOPs
Hello,

I am benchmarking Btrfs, and when benchmarking random writes with the
fio utility, I noticed the following two things:

1) On the first run, when the target file doesn't exist yet, performance
is about 8000 IOPs. On the second, and every other run, performance goes
up to 70000 IOPs. It's a massive difference. The target file is the one
created during the first run.

2) There are windows during the test where IOPs drop to 0, stay at 0 for
about 10 seconds, and then it goes back up again, and after a couple of
seconds drops to 0 again. This is reproducible 100% of the time.

Can somebody shed some light on what's happening?

Command: fio --randrepeat=1 --ioengine=libaio --direct=1
--gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
--size=10G --numjobs=1 --readwrite=randwrite

Environment:
CPU: dual socket: E5-2630 v2
RAM: 32 GB
OS: Ubuntu server 14.10
Kernel: 3.19.0-031900rc2-generic
btrfs tools: Btrfs v3.14.1
2x LSI 9300 HBAs - SAS3 12Gb/s
8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s

Regards,
Premek