Re: btrfs performance, sudden drop to 0 IOPs
P. Remek schrieb:
>> Yes, it was implemented for the purpose of allowing an application to
>> implement its own caching - probably for the sole purpose of doing it
>> "better" or more efficiently. But it simply does not work out that
>> well, at least with a COW fs. The original idea, "performance", is
>> more or less eaten away in a COW scenario - or worse. And that in turn
>> is why Linus said O_DIRECT is broken and should go away; use cache
>> hinting instead.
>
> Linus is saying to use things like madvise, but the fact is that in
> reality people are using O_DIRECT instead of it, so it is important to
> get it right.

Yeah, quite true - apparently... But as you already found, the O_DIRECT
implementation of btrfs is probably not the culprit.

> The case I am interested in is KVM. A virtual machine disk file is
> opened with O_DIRECT so that when the virtual machine is doing IO, it
> is not cached twice - first on the guest operating system level, and a
> second time on the hypervisor host operating system level. With
> O_DIRECT it is only cached in the guest.

In VirtualBox I enabled host-side caching on purpose and instead lowered
the VM's RAM. I don't know if VirtualBox does something like memory
ballooning, but usually I'd expect ballooning to push cache out of RAM -
so host-side caching may make sense. I never measured it, but it feels a
bit snappier to work inside the VirtualBox machine. Of course, the
recommendation depends on whether you are using ballooning, and on VM
density. In VirtualBox, this setting probably just turns off O_DIRECT.
And my VM images are set to nocow.

--
Replies to list only preferred.
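If memory serves, the host I/O cache is a per-storage-controller setting
in VirtualBox, so toggling it might look roughly like this (VM and
controller names here are placeholders, not from the thread):

    VBoxManage storagectl "MyVM" --name "SATA Controller" --hostiocache on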
Re: btrfs performance, sudden drop to 0 IOPs
On Fri, Feb 13, 2015 at 02:06:27PM +0100, P. Remek wrote:
> > I'd use a blktrace based tool like iowatcher or seekwatcher to see
> > what's really happening on the performance drops.
>
> So I used this command to see if there are any outstanding requests in
> the I/O scheduler queue when the performance drops to 0 IOPs:
>
> root@lab1:/# iostat -c -d -x -t -m /dev/sdi 1 1
>
> The output is:
>
> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> sdi        0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
>
> "avgqu-sz" gives the queue length (1-second average). So it really
> seems that the system is not stuck in the block I/O layer but in an
> upper layer instead (most likely the filesystem layer).
>
> I also created an ext4 filesystem on another pair of disks - so I was
> able to run simultaneous benchmarks, one for ext4 and one for btrfs
> (each having 4 SSDs assigned) - and when btrfs went down to 0 IOPs the
> ext4 fio benchmark kept generating high IOPs.
>
> I also tried to mount the system with nodatacow:
>
> /dev/sdi on /mnt/btrfs type btrfs (rw,nodatacow)
>
> It didn't help with the performance drops.

It's just weird, since 10s is far too long a stall for a filesystem. I
don't know what's happening and didn't see such behavior in my tests.
Perhaps try "perf record -a -g" to see what's going on.

Thanks,

-liubo
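Since the stalls reportedly last about 10 seconds, one way to capture a
system-wide profile across one of them is to record for a fixed window
while fio is running and then inspect where the time goes (the 30-second
window here is an arbitrary choice, not from the thread):

    perf record -a -g -- sleep 30
    perf report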
Re: btrfs performance, sudden drop to 0 IOPs
> We did benchmark Btrfs aio/dio performance before, and we noticed one
> big difference between COW and nocow: it is not only the checksums
> themselves, but checksums cost more metadata, which will make Btrfs
> performance drop suddenly for a while because of metadata reservation.

I mounted the filesystem with nodatacow, which should also switch off
the checksumming, but it didn't help - the sudden drops are still there.
Re: btrfs performance, sudden drop to 0 IOPs
> Yes, it was implemented for the purpose of allowing an application to
> implement its own caching - probably for the sole purpose of doing it
> "better" or more efficiently. But it simply does not work out that
> well, at least with a COW fs. The original idea, "performance", is more
> or less eaten away in a COW scenario - or worse. And that in turn is
> why Linus said O_DIRECT is broken and should go away; use cache hinting
> instead.

Linus is saying to use things like madvise, but the fact is that in
reality people are using O_DIRECT instead of it, so it is important to
get it right.

The case I am interested in is KVM. A virtual machine disk file is
opened with O_DIRECT so that when the virtual machine is doing IO, it is
not cached twice - first on the guest operating system level, and a
second time on the hypervisor host operating system level. With O_DIRECT
it is only cached in the guest.
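For reference, with QEMU/KVM this is normally selected through the drive
cache mode: cache=none opens the backing file with O_DIRECT, so data is
cached only inside the guest. A sketch of the relevant option (the image
path and device model here are illustrative, not from the thread):

    qemu-system-x86_64 -drive file=/mnt/btrfs/vm.img,format=raw,cache=none,if=virtio ...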
Re: btrfs performance, sudden drop to 0 IOPs
> I'd definitely suggest using NOCOW for any file you are doing O_DIRECT
> with, as you should see _much_ better performance that way, and also
> don't run the (theoretical) risk of some of the same types of
> corruption that swapfiles on BTRFS can cause.

I mounted the filesystem with nodatacow as follows, and it didn't help -
it still drops to 0 IOPs every couple of seconds.

/dev/sdi on /mnt/btrfs type btrfs (rw,nodatacow)
Re: btrfs performance, sudden drop to 0 IOPs
> I'd use a blktrace based tool like iowatcher or seekwatcher to see
> what's really happening on the performance drops.

So I used this command to see if there are any outstanding requests in
the I/O scheduler queue when the performance drops to 0 IOPs:

root@lab1:/# iostat -c -d -x -t -m /dev/sdi 1 1

The output is:

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdi        0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00

"avgqu-sz" gives the queue length (1-second average). So it really seems
that the system is not stuck in the block I/O layer but in an upper
layer instead (most likely the filesystem layer).

I also created an ext4 filesystem on another pair of disks - so I was
able to run simultaneous benchmarks, one for ext4 and one for btrfs
(each having 4 SSDs assigned) - and when btrfs went down to 0 IOPs the
ext4 fio benchmark kept generating high IOPs.

I also tried to mount the system with nodatacow:

/dev/sdi on /mnt/btrfs type btrfs (rw,nodatacow)

It didn't help with the performance drops.
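For the record, a raw blktrace capture during one of the drops might
look like this (the device name follows the examples above; the
60-second window is an assumption):

    blktrace -d /dev/sdi -w 60 -o sdi
    blkparse -i sdi | less

If the block layer receives essentially no requests during the stall,
that would corroborate the empty iostat queue above.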
Re: btrfs performance, sudden drop to 0 IOPs
Hello guys,

> On Thu, Feb 12, 2015 at 05:33:41AM +0100, Kai Krakow wrote:
>> Duncan <1i5t5.dun...@cox.net> schrieb:
>>
>>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>>
>>>> In the test, I use the --direct=1 parameter for fio, which basically
>>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>>> filesystem cache is bypassed and IO is sent directly to the
>>>> underlying storage. Are you saying that btrfs buffers writes despite
>>>> O_DIRECT?
>>>
>>> I'm out of my (admin, no claims at developer) league on that. I see
>>> someone else replied, and would defer to them on this.
>>
>> I don't think that O_DIRECT can work efficiently on COW filesystems.
>> It probably has a negative effect and cannot be faster than normal
>> access. Linus himself said one time that O_DIRECT is broken and should
>> go away, and that instead cache hinting should be used.
>>
>> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
>> the file system has to go through its COW logic first, which it
>> otherwise had buffered and done in the background. Bypassing the cache
>> is probably only a side-effect of O_DIRECT, not its purpose.
>
> Hmm, not true in btrfs; the COW logic mentioned above is nothing but
> allocating a NEW extent, and it's not done in the background.
>
> Comparing to the nocow logic, the main difference comes from
> a) COW files calculating checksums of the dirty data in DIO pages,
>    which nocow files don't need to, and
> b) their endio handlers.
>
> Or am I missing something?

We did benchmark Btrfs aio/dio performance before, and we noticed one
big difference between COW and nocow: it is not only the checksums
themselves, but checksums cost more metadata, which will make Btrfs
performance drop suddenly for a while because of metadata reservation.

> Thanks,
>
> -liubo
>
>> At least I'd try with a nocow file for the benchmark if you still have
>> to use O_DIRECT.

Best Regards,
Wang Shilong
Re: btrfs performance, sudden drop to 0 IOPs
On Thu, Feb 12, 2015 at 05:33:41AM +0100, Kai Krakow wrote:
> Duncan <1i5t5.dun...@cox.net> schrieb:
>
>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>
>>> In the test, I use the --direct=1 parameter for fio, which basically
>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>> filesystem cache is bypassed and IO is sent directly to the
>>> underlying storage. Are you saying that btrfs buffers writes despite
>>> O_DIRECT?
>>
>> I'm out of my (admin, no claims at developer) league on that. I see
>> someone else replied, and would defer to them on this.
>
> I don't think that O_DIRECT can work efficiently on COW filesystems. It
> probably has a negative effect and cannot be faster than normal access.
> Linus himself said one time that O_DIRECT is broken and should go away,
> and that instead cache hinting should be used.
>
> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
> the file system has to go through its COW logic first, which it
> otherwise had buffered and done in the background. Bypassing the cache
> is probably only a side-effect of O_DIRECT, not its purpose.

Hmm, not true in btrfs; the COW logic mentioned above is nothing but
allocating a NEW extent, and it's not done in the background.

Comparing to the nocow logic, the main difference comes from
a) COW files calculating checksums of the dirty data in DIO pages, which
   nocow files don't need to, and
b) their endio handlers.

Or am I missing something?

Thanks,

-liubo

> At least I'd try with a nocow file for the benchmark if you still have
> to use O_DIRECT.
Re: btrfs performance, sudden drop to 0 IOPs
Austin S Hemmelgarn schrieb:

> On 2015-02-11 23:33, Kai Krakow wrote:
>> Duncan <1i5t5.dun...@cox.net> schrieb:
>>
>>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>>
>>>> In the test, I use the --direct=1 parameter for fio, which basically
>>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>>> filesystem cache is bypassed and IO is sent directly to the
>>>> underlying storage. Are you saying that btrfs buffers writes despite
>>>> O_DIRECT?
>>>
>>> I'm out of my (admin, no claims at developer) league on that. I see
>>> someone else replied, and would defer to them on this.
>>
>> I don't think that O_DIRECT can work efficiently on COW filesystems.
>> It probably has a negative effect and cannot be faster than normal
>> access. Linus himself said one time that O_DIRECT is broken and should
>> go away, and that instead cache hinting should be used.
>>
>> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
>> the file system has to go through its COW logic first, which it
>> otherwise had buffered and done in the background. Bypassing the cache
>> is probably only a side-effect of O_DIRECT, not its purpose.
> IIUC, the original purpose of O_DIRECT was to allow the application to
> handle caching itself, instead of having the kernel do it. The issue is
> that it is (again, IIUC) a hard requirement for AIO, which is a
> performance booster for many use cases.

Yes, it was implemented for the purpose of allowing an application to
implement its own caching - probably for the sole purpose of doing it
"better" or more efficiently. But it simply does not work out that well,
at least with a COW fs. The original idea, "performance", is more or
less eaten away in a COW scenario - or worse. And that in turn is why
Linus said O_DIRECT is broken and should go away; use cache hinting
instead.

From that perspective, I concluded what I wrote: bypassing the cache is
only a side-effect. It didn't solve the problem the right way - it
unintentionally solved something else. So, to alleviate the design flaw,
you can only use it for its intended purpose on nocow files (or nocow
filesystems).

>> At least I'd try with a nocow file for the benchmark if you still have
>> to use O_DIRECT.
>
> I'd definitely suggest using NOCOW for any file you are doing O_DIRECT
> with, as you should see _much_ better performance that way, and also
> don't run the (theoretical) risk of some of the same types of
> corruption that swapfiles on BTRFS can cause.

Ditto.

--
Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
On 2015-02-11 23:33, Kai Krakow wrote:
> Duncan <1i5t5.dun...@cox.net> schrieb:
>
>> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>>
>>> In the test, I use the --direct=1 parameter for fio, which basically
>>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>>> filesystem cache is bypassed and IO is sent directly to the
>>> underlying storage. Are you saying that btrfs buffers writes despite
>>> O_DIRECT?
>>
>> I'm out of my (admin, no claims at developer) league on that. I see
>> someone else replied, and would defer to them on this.
>
> I don't think that O_DIRECT can work efficiently on COW filesystems. It
> probably has a negative effect and cannot be faster than normal access.
> Linus himself said one time that O_DIRECT is broken and should go away,
> and that instead cache hinting should be used.
>
> Think of this: for the _unbuffered_ direct-io request to be fulfilled,
> the file system has to go through its COW logic first, which it
> otherwise had buffered and done in the background. Bypassing the cache
> is probably only a side-effect of O_DIRECT, not its purpose.

IIUC, the original purpose of O_DIRECT was to allow the application to
handle caching itself, instead of having the kernel do it. The issue is
that it is (again, IIUC) a hard requirement for AIO, which is a
performance booster for many use cases.

> At least I'd try with a nocow file for the benchmark if you still have
> to use O_DIRECT.

I'd definitely suggest using NOCOW for any file you are doing O_DIRECT
with, as you should see _much_ better performance that way, and also
don't run the (theoretical) risk of some of the same types of corruption
that swapfiles on BTRFS can cause.
Re: btrfs performance, sudden drop to 0 IOPs
On Mon, Feb 09, 2015 at 06:26:49PM +0100, P. Remek wrote:
> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:
>
> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.

I was doing similar tests in the last few days; the huge performance
difference comes from the AIO+DIO path, fs/direct-io.c, around line 1170:

	/*
	 * For file extending writes updating i_size before data writeouts
	 * complete can expose uninitialized blocks in dumb filesystems.
	 * In that case we need to wait for I/O completion even if asked
	 * for an asynchronous write.
	 */
	if (is_sync_kiocb(iocb))
		dio->is_async = false;
	else if (!(dio->flags & DIO_ASYNC_EXTEND) &&
		 (rw & WRITE) && end > i_size_read(inode))
		dio->is_async = false;
	else
		dio->is_async = true;

So you may like to play with fio's fallocate option; although it's
'posix' by default, which should have set a proper i_size for you, I
don't believe it unless I set it myself.

> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.
>
> Can somebody shed some light on what's happening?

I'd use a blktrace based tool like iowatcher or seekwatcher to see
what's really happening on the performance drops.

> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
> --size=10G --numjobs=1 --readwrite=randwrite

Since this is just a libaio-dio random write, I think it has nothing to
do with the progs side.

Thanks,

-liubo

> Environment:
> CPU: dual socket: E5-2630 v2
> RAM: 32 GB
> OS: Ubuntu server 14.10
> Kernel: 3.19.0-031900rc2-generic
> btrfs tools: Btrfs v3.14.1
> 2x LSI 9300 HBAs - SAS3 12Gb/s
> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>
> Regards,
> Premek
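One way to probe this theory would be to run the same job with and
without the file's size pre-extended, using fio's fallocate option
(option values as documented by fio; the other parameters mirror the
original benchmark command):

    fio --fallocate=none  --randrepeat=1 --ioengine=libaio --direct=1 \
        --name=test9 --filename=test9 --bs=4k --iodepth=256 --size=10G \
        --numjobs=1 --readwrite=randwrite
    fio --fallocate=posix --randrepeat=1 --ioengine=libaio --direct=1 \
        --name=test9 --filename=test9 --bs=4k --iodepth=256 --size=10G \
        --numjobs=1 --readwrite=randwrite

If the code path above is responsible, the fallocate=none run against a
fresh file should look like the slow first-run case.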
Re: btrfs performance, sudden drop to 0 IOPs
Duncan <1i5t5.dun...@cox.net> schrieb:

> P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:
>
>> In the test, I use the --direct=1 parameter for fio, which basically
>> does O_DIRECT on the target file. O_DIRECT should guarantee that the
>> filesystem cache is bypassed and IO is sent directly to the underlying
>> storage. Are you saying that btrfs buffers writes despite O_DIRECT?
>
> I'm out of my (admin, no claims at developer) league on that. I see
> someone else replied, and would defer to them on this.

I don't think that O_DIRECT can work efficiently on COW filesystems. It
probably has a negative effect and cannot be faster than normal access.
Linus himself said one time that O_DIRECT is broken and should go away,
and that instead cache hinting should be used.

Think of this: for the _unbuffered_ direct-io request to be fulfilled,
the file system has to go through its COW logic first, which it
otherwise had buffered and done in the background. Bypassing the cache
is probably only a side-effect of O_DIRECT, not its purpose.

At least I'd try with a nocow file for the benchmark if you still have
to use O_DIRECT.

--
Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted:

>> Those 70K IOPs are all the extra work the filesystem is doing in order
>> to track those 4 KiB COWed writes!
>
> This sounds like you are thinking that getting 70K IOPs is a bad thing,
> but I am testing performance, which means higher IOPs = better result.
> In other words, after the second run, when that target file already
> existed, the performance improved significantly.

Perhaps I'm wrong (I /did/ emphasize "suspect") here, but what I was
suggesting was...

Those higher IOPs are, I believe, fake - manufactured by the filesystem
as a result of splitting up the few larger extents into many smaller
extents due to COW-fragmentation. If I'm correct, the physical device
and the below-filesystem-level kernel layers (where I expect your IOPs
measure is sourced) are seeing this orders-of-magnitude increase in IOPs
due to breaking one original filesystem operation into perhaps hundreds
of effectively random individual 4k block operations, but the actual
thruput at the above-filesystem level is reduced.

There's certainly a potential in theory for such an effect on btrfs due
to COWing rewrites, and faced with those results, it is how I'd explain
them in a rather hand-wavy, not too low-level-technical way. But if it
doesn't match reality, then my understanding is insufficient and I'm
wrong. Wouldn't be the first time. =:^P

>> I suppose you're already aware that you're running a rather outdated
>> userspace/btrfs-progs (what I assume you meant by tools).
>
> I was hoping that btrfs-progs doesn't have any influence on runtime
> properties of the btrfs filesystem. As I am doing performance tests, I
> hope that the btrfs-progs version doesn't have any impact on the
> results.

I was simply pointing out the mismatch, in case you intended to actually
deploy, and potentially try to fix any problems with, that old a
userspace. As long as you're aware of the issue and won't be trying
btrfs check --repair or the like with that old a userspace, for runtime
testing, indeed, it shouldn't matter. So you're "hoping correctly". =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
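If the COW-fragmentation theory holds, it should be directly observable:
the extent count of the test file should balloon after a rewrite run.
filefrag (from e2fsprogs) reads the extent map via FIEMAP and works on
btrfs, so a quick check could look like this:

    filefrag test9       # extent count after the first run
    filefrag -v test9    # per-extent detail after a rewrite run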
Re: btrfs performance, sudden drop to 0 IOPs
On 2015-02-09 12:26, P. Remek wrote:
> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:

Based on what I know about BTRFS, I think that these issues actually
have distinct causes.

> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.

I've noticed that almost always, file creation on BTRFS is slower than
file re-writes. This seems to especially be the case when using AIO
and/or O_DIRECT (although O_DIRECT on a COW filesystem is _really_
complicated to get right). I don't know that there is really any way
currently to solve this, although it would be interesting to see if
fallocate'ing the files prior to the initial run would have any
significant performance impact.

> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.

I've seen this same behavior on a number of filesystems (not just BTRFS)
when using the default I/O scheduler with its default parameters,
especially on systems with high-performance storage. IIRC, Ubuntu 13.10
switched from the upstream default I/O scheduler (CFQ) to the Deadline
I/O scheduler because it has better performance (and is more
deterministic) on most cheap commodity desktop/laptop hardware. I've
found, however, that the Deadline scheduler actually tends to perform
worse than CFQ when used on higher-end server systems and/or SSDs,
although CFQ with default parameters only does marginally better. I'd
suggest experimenting with some of the parameters under /sys/block
(check the files in the Documentation/block directory of the Linux
kernel sources for information about what (almost) everything there
does).
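As a concrete starting point, the active scheduler and queue depth can
be inspected and changed per device at runtime (sdi matches the
benchmarked device from later in the thread; the values are only
examples, not recommendations):

    cat /sys/block/sdi/queue/scheduler           # e.g. noop deadline [cfq]
    echo deadline > /sys/block/sdi/queue/scheduler
    echo 1024 > /sys/block/sdi/queue/nr_requests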
Re: btrfs performance, sudden drop to 0 IOPs
> What I /suspect/ is happening, is that at the 10 GiB file size, on
> original file creation, btrfs is creating a large file of several
> comparatively large extents (possibly 1 GiB each, the nominal data
> chunk size, tho it can be larger on large enough filesystems). Note
> that btrfs will normally wait to sync, accumulating further writes into
> the file before actually writing it. By default it's 30 seconds, but
> there's a mount option to change that. So btrfs is probably waiting,
> then writing out all changes for the last 30 seconds at once, allowing
> it to use fairly large extents when it does so.

In the test, I use the --direct=1 parameter for fio, which basically
does O_DIRECT on the target file. O_DIRECT should guarantee that the
filesystem cache is bypassed and IO is sent directly to the underlying
storage. Are you saying that btrfs buffers writes despite O_DIRECT?

I also tried to mount the filesystem with the commit parameter set to
a) 1 second and b) 1000 seconds, as follows:

root@lab1:/# mount -o autodefrag,commit=1 /dev/mapper/prm-0 /mnt/vol1

It didn't change the behavior - after about 30-40 seconds of running
there is a drop to 0 IOPs that lasts about 20 seconds.

> Those 70K IOPs are all the extra work the filesystem is doing in order
> to track those 4 KiB COWed writes!

This sounds like you are thinking that getting 70K IOPs is a bad thing,
but I am testing performance, which means higher IOPs = better result.
In other words, after the second run, when that target file already
existed, the performance improved significantly. In the light of what
you are saying, it looks more like there is some higher overhead when
allocating a completely new block of data for the file, compared to the
overhead of a COW operation on an already existing block of data.

> I suppose you're already aware that you're running a rather outdated
> userspace/btrfs-progs (what I assume you meant by tools). Userspace
> versions sync with the kernel cycle, with a particular 3.x.0 version
> typically being released a couple weeks after the kernel of the same
> version, usually with a couple of 3.x.y y-update releases following
> before the next kernel-synced x-version bump.

I was hoping that btrfs-progs doesn't have any influence on runtime
properties of the btrfs filesystem. As I am doing performance tests, I
hope that the btrfs-progs version doesn't have any impact on the
results.

Regards,
P.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek schrieb:

> Not sure if it helps, but here it is:
>
> root@lab1:/mnt/vol1# btrfs filesystem df /mnt/vol1/
> Data, RAID10: total=116.00GiB, used=110.03GiB
> Data, single: total=8.00MiB, used=0.00
> System, RAID1: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=2.00GiB, used=563.72MiB
> Metadata, single: total=8.00MiB, used=0.00
> unknown, single: total=192.00MiB, used=0.00

This looks completely different from my output. Do you use the latest
btrfs-progs?

$ btrfs --version
Btrfs v3.18.2

$ btrfs fi us /
Overall:
    Device size:           2.71TiB
    Device allocated:      1.50TiB
    Device unallocated:    1.21TiB
    Used:                  1.37TiB
    Free (estimated):      1.33TiB  (min: 745.87GiB)
    Data ratio:            1.00
    Metadata ratio:        2.00
    Global reserve:        512.00MiB  (used: 0.00B)

Data,RAID0: Size:1.49TiB, Used:1.36TiB
   /dev/bcache0  507.00GiB
   /dev/bcache1  507.00GiB
   /dev/bcache2  507.00GiB

Metadata,RAID1: Size:6.00GiB, Used:3.99GiB
   /dev/bcache0    4.00GiB
   /dev/bcache1    4.00GiB
   /dev/bcache2    4.00GiB

System,RAID1: Size:32.00MiB, Used:100.00KiB
   /dev/bcache1   32.00MiB
   /dev/bcache2   32.00MiB

Unallocated:
   /dev/bcache0  414.51GiB
   /dev/bcache1  414.48GiB
   /dev/bcache2  414.48GiB

> On Mon, Feb 9, 2015 at 8:56 PM, Kai Krakow wrote:
>> P. Remek schrieb:
>>
>>> Hello,
>>>
>>> I am benchmarking Btrfs, and when benchmarking random writes with the
>>> fio utility, I noticed the following two things:
>>>
>>> 1) On the first run, when the target file doesn't exist yet,
>>> performance is about 8000 IOPs. On the second, and every other run,
>>> performance goes up to 70000 IOPs. It's a massive difference. The
>>> target file is the one created during the first run.
>>>
>>> 2) There are windows during the test where IOPs drop to 0, stay at 0
>>> for about 10 seconds, and then it goes back up again, and after a
>>> couple of seconds drops to 0 again. This is reproducible 100% of the
>>> time.
>>>
>>> Can somebody shed some light on what's happening?
>>
>> I'm not an expert or dev, but it's probably due to btrfs doing some
>> housekeeping under the hood. Could you check the output of "btrfs
>> filesystem usage /mountpoint" while running the test? I'd guess
>> there's some pressure on the global reserve during those times.
>>
>>> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
>>> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
>>> --size=10G --numjobs=1 --readwrite=randwrite
>>>
>>> Environment:
>>> CPU: dual socket: E5-2630 v2
>>> RAM: 32 GB
>>> OS: Ubuntu server 14.10
>>> Kernel: 3.19.0-031900rc2-generic
>>> btrfs tools: Btrfs v3.14.1
>>> 2x LSI 9300 HBAs - SAS3 12Gb/s
>>> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>>>
>>> Regards,
>>> Premek

--
Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek posted on Mon, 09 Feb 2015 18:26:49 +0100 as excerpted:

> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:
>
> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.

You say a file size of 10 GiB with a block size of 4 KiB, but don't say
whether you're using the autodefrag mount option, or whether you had set
nocow on the file at creation (generally done by setting it on the
directory, so new files inherit the option; chattr +C).

What I /suspect/ is happening is that at the 10 GiB file size, on
original file creation, btrfs is creating a large file of several
comparatively large extents (possibly 1 GiB each, the nominal data chunk
size, tho it can be larger on large enough filesystems). Note that btrfs
will normally wait to sync, accumulating further writes into the file
before actually writing it. By default it's 30 seconds, but there's a
mount option to change that. So btrfs is probably waiting, then writing
out all changes for the last 30 seconds at once, allowing it to use
fairly large extents when it does so.

Then when the file already exists, keeping in mind that btrfs is COW
(copy-on-write) and that by default it keeps two copies of metadata (dup
on a single device, or one each on two separate devices on a
multi-device filesystem) and one copy of data (single on a single
device; I believe raid0 on multi-device), it's having to COW individual
4K blocks within the file as they are rewritten. This is going to
massively fragment the file, driving up IOPs tremendously.

On top of that, each time a data fragment is written, there are going to
be two metadata updates due to the dup/raid1 metadata default, and while
they won't be updated immediately, every commit (30 seconds) those
metadata changes are going to replicate up the metadata tree to its
root.

So instead of having a few orderly GiB-ish size extents written, along
with their metadata, as at file creation, now you're writing a new
extent for each changed 4 KiB block, plus 2X metadata updates for each
one, plus, every commit, the updated metadata chain up to the root.
Those 70K IOPs are all the extra work the filesystem is doing in order
to track those 4 KiB COWed writes!

The autodefrag option will likely increase this even further, as it
doesn't prevent the COWs but instead queues up any files it detects as
fragmented for later cleanup via the autodefrag worker thread. This is
one reason the option isn't recommended for large (say quarter- to
half-gig-plus) heavy-internal-rewrite-pattern use-cases (typically VM
images or large database files), tho it works quite well for files up to
a couple hundred MiB or so (typical of firefox sqlite database files,
etc), since those get rewritten pretty fast.

The nocow file attribute can be used on these larger files, but it does
have additional implications. Nocow turns off btrfs compression for that
file, if you had it enabled (mount option), and also turns off
checksumming.

Turning off checksumming means btrfs will no longer detect file
corruption, but many databases and VM tools have their own corruption
detection and possibly correction schemes already, since they use them
on filesystems such as ext* that don't have builtin checksumming, so
turning off the btrfs checksumming and error detection for these files
isn't as bad as it would otherwise seem, and in many cases it prevents
the filesystem duplicating work that the application is already doing.

(Also, on btrfs, nocow must be set at file creation, when the file is
still zero-sized. As mentioned above, this is usually accomplished by
setting it on the directory and letting new files and subdirs inherit
the attribute.)

But with the nocow file attribute properly applied, these random
rewrites will be done in place - no cascading fragmentation and metadata
updates - and my guess is that you'll see the IOPs on existing nocow
files reduce to something far more sane as a result.

> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.

I recall this periodic behavior coming up in at least one earlier thread
as well, but I'm not a dev, just a btrfs user and list regular, and I
don't recall what the explanation was, unless it was related to internal
btrfs bookkeeping due to that 30-second commit cycle I mentioned above.
But I'm guessing that if you properly set nocow on the file, you'll
probably see this go away as well, since you won't be overwhelming btrfs
and the hardware with IOPs any longer. Perhaps someone with a better
understanding of the situation will jump in and explain this bit better
than I can.
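A minimal sketch of how that could be wired into this benchmark (paths
are illustrative; the key point is that +C is set on the directory
before fio creates the file, so the new file inherits it):

    mkdir /mnt/btrfs/nocow
    chattr +C /mnt/btrfs/nocow
    fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test9 \
        --filename=/mnt/btrfs/nocow/test9 --bs=4k --iodepth=256 \
        --size=10G --numjobs=1 --readwrite=randwrite
    lsattr /mnt/btrfs/nocow/test9    # should show the C attribute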
Re: btrfs performance, sudden drop to 0 IOPs
Not sure if it helps, but here it is:

root@lab1:/mnt/vol1# btrfs filesystem df /mnt/vol1/
Data, RAID10: total=116.00GiB, used=110.03GiB
Data, single: total=8.00MiB, used=0.00
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=2.00GiB, used=563.72MiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=192.00MiB, used=0.00

On Mon, Feb 9, 2015 at 8:56 PM, Kai Krakow wrote:
> P. Remek schrieb:
>
>> Hello,
>>
>> I am benchmarking Btrfs, and when benchmarking random writes with the
>> fio utility, I noticed the following two things:
>>
>> 1) On the first run, when the target file doesn't exist yet,
>> performance is about 8000 IOPs. On the second, and every other run,
>> performance goes up to 70000 IOPs. It's a massive difference. The
>> target file is the one created during the first run.
>>
>> 2) There are windows during the test where IOPs drop to 0, stay at 0
>> for about 10 seconds, and then it goes back up again, and after a
>> couple of seconds drops to 0 again. This is reproducible 100% of the
>> time.
>>
>> Can somebody shed some light on what's happening?
>
> I'm not an expert or dev, but it's probably due to btrfs doing some
> housekeeping under the hood. Could you check the output of "btrfs
> filesystem usage /mountpoint" while running the test? I'd guess there's
> some pressure on the global reserve during those times.
>
>> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
>> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
>> --size=10G --numjobs=1 --readwrite=randwrite
>>
>> Environment:
>> CPU: dual socket: E5-2630 v2
>> RAM: 32 GB
>> OS: Ubuntu server 14.10
>> Kernel: 3.19.0-031900rc2-generic
>> btrfs tools: Btrfs v3.14.1
>> 2x LSI 9300 HBAs - SAS3 12Gb/s
>> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>>
>> Regards,
>> Premek
>
> --
> Replies to list only preferred.
Re: btrfs performance, sudden drop to 0 IOPs
P. Remek schrieb:

> Hello,
>
> I am benchmarking Btrfs, and when benchmarking random writes with the
> fio utility, I noticed the following two things:
>
> 1) On the first run, when the target file doesn't exist yet,
> performance is about 8000 IOPs. On the second, and every other run,
> performance goes up to 70000 IOPs. It's a massive difference. The
> target file is the one created during the first run.
>
> 2) There are windows during the test where IOPs drop to 0, stay at 0
> for about 10 seconds, and then it goes back up again, and after a
> couple of seconds drops to 0 again. This is reproducible 100% of the
> time.
>
> Can somebody shed some light on what's happening?

I'm not an expert or dev, but it's probably due to btrfs doing some
housekeeping under the hood. Could you check the output of "btrfs
filesystem usage /mountpoint" while running the test? I'd guess there's
some pressure on the global reserve during those times.

> Command: fio --randrepeat=1 --ioengine=libaio --direct=1
> --gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
> --size=10G --numjobs=1 --readwrite=randwrite
>
> Environment:
> CPU: dual socket: E5-2630 v2
> RAM: 32 GB
> OS: Ubuntu server 14.10
> Kernel: 3.19.0-031900rc2-generic
> btrfs tools: Btrfs v3.14.1
> 2x LSI 9300 HBAs - SAS3 12Gb/s
> 8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s
>
> Regards,
> Premek

--
Replies to list only preferred.
btrfs performance, sudden drop to 0 IOPs
Hello,

I am benchmarking Btrfs, and when benchmarking random writes with the
fio utility, I noticed the following two things:

1) On the first run, when the target file doesn't exist yet, performance
is about 8000 IOPs. On the second, and every other run, performance goes
up to 70000 IOPs. It's a massive difference. The target file is the one
created during the first run.

2) There are windows during the test where IOPs drop to 0, stay at 0 for
about 10 seconds, and then it goes back up again, and after a couple of
seconds drops to 0 again. This is reproducible 100% of the time.

Can somebody shed some light on what's happening?

Command: fio --randrepeat=1 --ioengine=libaio --direct=1
--gtod_reduce=1 --name=test9 --filename=test9 --bs=4k --iodepth=256
--size=10G --numjobs=1 --readwrite=randwrite

Environment:
CPU: dual socket: E5-2630 v2
RAM: 32 GB
OS: Ubuntu server 14.10
Kernel: 3.19.0-031900rc2-generic
btrfs tools: Btrfs v3.14.1
2x LSI 9300 HBAs - SAS3 12Gb/s
8x SSD Ultrastar SSD1600MM 400GB SAS3 12Gb/s

Regards,
Premek