Re: Confused by performance
On 16/06/2010 21:35, Freddie Cash wrote:
> [snip a lot of fancy math that missed the point]

That's all well and good, but you missed the part where he said ext2 on a 5-way LVM stripe set is many times faster than btrfs on a 5-way btrfs stripe set. IOW, same 5-way stripe set, different filesystems and volume managers, and very different performance. And he's wondering why the method btrfs uses for striping is so much slower than the method LVM uses.

This could easily be explained by Roberto's theory and maths: if the LVM stripe set used large stripe sizes, so that the random reads were mostly served from a single disk, it would be fast. If the btrfs stripes were small, then it would be slow due to all the extra seeks.

Do we know anything about the stripe sizes used?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
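The seek-count argument above can be made concrete with a little arithmetic. Here is a minimal sketch of how many disks a single random read touches in a round-robin stripe set; the stripe sizes, read size, and disk count below are assumptions chosen for illustration, not values reported in the thread:

```python
def disks_touched(offset, length, stripe_size, n_disks):
    """Count the distinct disks a read of `length` bytes at byte
    `offset` touches in a round-robin stripe set: one seek per disk."""
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    n_stripes = last_stripe - first_stripe + 1
    # With round-robin placement, consecutive stripes land on
    # consecutive disks, so at most n_disks distinct disks are hit.
    return min(n_stripes, n_disks)

# A 16 KiB random read against 64 KiB stripes fits in one stripe:
print(disks_touched(0, 16 * 1024, 64 * 1024, 5))  # 1 disk, 1 seek
# The same read against 4 KiB stripes spans four stripes:
print(disks_touched(0, 16 * 1024, 4 * 1024, 5))   # 4 disks, 4 seeks
```

This is exactly the effect described: large stripes keep a small random read on one spindle, while small stripes fan every read out into several seeks.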
Re: Odd block-count behaviour
On Wed, Jun 16, 2010 at 01:47:30PM +0100, Hugo Mills wrote:
> Hi,
>
> I've just been copying large quantities of data from one btrfs volume to another, and while watching the progress of the copy, I've noticed something odd.
>
>     $ mv source-dir dest-dir
>     $ watch du -ms source-dir dest-dir
>
> This gives me a count of the size of the source and target directories every 2 seconds. As expected, the size of the source dir stays constant, and the size of the destination directory increases. Except when it doesn't. Occasionally, while copying, the size of the dest-dir *drops* by several (tens of) megabytes.
>
> I'm not too worried about this, as the data all seems to be copying OK, but it seems a bit odd, and I was wondering if there was a sane explanation for this behaviour.

If the files are small, they can be packed into the metadata btree. But this doesn't happen until the file is actually written. So we start with a worst-case estimate of the number of blocks the file will consume (4k) and then, when it is actually written, we update the metadata to reflect the number of blocks it is actually using (maybe 1 or 2). You can see this with a test:

    mkdir testdir
    cd testdir
    dd if=/dev/zero of=foo bs=512 count=1
    du -k .
    sync
    du -k .

-chris
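Chris's explanation can be modelled in a few lines. This is only an illustration of the accounting he describes (worst-case reservation first, adjustment after writeback), not btrfs code; the 4 KiB block size is from the message, the file count and sizes are invented:

```python
BLOCK = 4096  # worst-case reservation per small file, per the message above

def apparent_usage(pending_sizes, written_sizes):
    """Model du output: files not yet written back are charged a full
    worst-case block each; written small files are charged their real
    (inline-packed) size."""
    pending = len(pending_sizes) * BLOCK
    written = sum(written_sizes)
    return pending + written

# 1000 freshly created 512-byte files: du shows the worst case.
before = apparent_usage(pending_sizes=[512] * 1000, written_sizes=[])
# The same files after writeback packs them into the btree: du drops.
after = apparent_usage(pending_sizes=[], written_sizes=[512] * 1000)
print(before, after)  # 4096000 512000
```

The gap between the two numbers is the transient "drop" Hugo saw in his `watch du` output.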
Re: Mysteriously changing RAID levels
On Wed, Jun 16, 2010 at 08:12:03AM -0600, cwillu wrote:
> I just noticed today that "btrfs fi df /" no longer reports any RAID level on Metadata or Data. I know as of Jun 4 that Data was RAID1 and Metadata was DUP (I had posted my df output to IRC). I've already checked that I didn't revert to an old version of btrfs-progs, and am at a bit of a loss to explain it. I'm currently running 2.6.35-rc1. Any ideas what could have caused this?

Thanks for pointing this out, the ioctl needs to be updated. I'll get it fixed up.

-chris
Re: Confused by performance
On Wed, Jun 16, 2010 at 11:08:48AM -0700, K. Richard Pixley wrote:
> Once again I'm stumped by some performance numbers and hoping for some insight.
>
> Using an 8-core server, building in parallel, I'm building some code. Using ext2 over a 5-way (5-disk) LVM partition, I can build that code in 35 minutes. Tests with dd on the raw disks and LVM partitions show me that I'm getting near-linear improvement from the raw stripe, even with dd runs exceeding 10G, which convinces me that my disks and controller subsystem are capable of operating in parallel and in concert. hdparm -t numbers seem to support what I'm seeing from dd.
>
> Running the same build, with the same parallelism, over a btrfs (defaults) partition on a single drive, I see very consistent build times of around an hour, which is reasonable. I get a little under an hour on ext4 on a single disk, again very consistently.
>
> However, if I build a btrfs filesystem across the 5 disks, my build times decline to around 1.5-2 hours, although there's about a 30-minute variation between runs. If I build a btrfs filesystem across the 5-way LVM stripe, I get even worse performance, at around 2.5 hours per build, with about a 45-minute variation between runs.
>
> I can't explain these last two results. Any theories?

I suspect they come down to the different RAID levels done by btrfs, and maybe barriers. By default btrfs will duplicate metadata, so ext2 is doing much less metadata IO than btrfs does. Try:

    mkfs.btrfs -m raid0 -d raid0 /dev/xxx /dev/xxy ...

Then try:

    mount -o nobarrier /dev/xxx /mnt

Someone else mentioned blktrace; it would help explain things if you're interested in tracking this down.

-chris
Re: Still ENOSPC problems with 2.6.35-rc3
Yan, Zheng wrote (ao):
> what will happen if you keep deleting files using 2.6.35?

From the list "Things you don't want your fs developer to say" ;-)

PS: I am a very happy btrfs user on several systems (including ARM-based OpenRD-Client and SheevaPlug, and large 64-bit servers), so no flame intended ;-)

Sander
--
Humilis IT Services and Solutions
http://www.humilis.net
Re: Still ENOSPC problems with 2.6.35-rc3
On Thursday 17 June 2010, 02:47:07, Yan, Zheng wrote:
> On Thu, Jun 17, 2010 at 7:56 AM, Johannes Hirte johannes.hi...@fem.tu-ilmenau.de wrote:
>> On Thursday 17 June 2010, 01:12:54, Yan, Zheng wrote:
>>> On Thu, Jun 17, 2010 at 1:48 AM, Johannes Hirte johannes.hi...@fem.tu-ilmenau.de wrote:
>>>> With kernel 2.6.34 I ran into the ENOSPC problems that were reported on this list recently. The filesystem was somewhat over 90% full and most operations on it caused an oops. I was able to delete files by trial and error and freed up half of the filesystem space; operations on the other files still caused an oops. Some patches went into 2.6.35 that addressed this problem. Sadly they don't fix it but only avoid the oops. A simple 'ls' on this filesystem results in
>>>
>>> To avoid the ENOSPC oops, btrfs in 2.6.35 reserves more metadata space for system use than older btrfs did. If the FS has already run out of metadata space, using btrfs in 2.6.35 doesn't help.
>>>
>>> Yan, Zheng
>>
>> So how can this be fixed/avoided? There must be some free metadata space, since I was able to delete files, more than 20 GiB, mostly small files. Also, to my understanding, deleting files should free metadata space. Or am I getting something wrong here? 2.6.35 does change something, since I can delete more files where 2.6.34 oopses. But you're right, it doesn't help at all. So where is this space, and why can't it be used?
>
> what will happen if you keep deleting files using 2.6.35?

With 2.6.35 I'm able to continue deleting files, even those where 2.6.34 would oops. It's slow and gives me many of these warnings in dmesg, but they get deleted. I didn't try it to the end, but I can do so if you want. I've saved the affected filesystem to a separate partition, so I can test with it.

regards,
Johannes
Re: [PATCH][RFC] Complex filesystem operations: split and join
On Tuesday 15 June 2010 17:16:06 David Pottage wrote:
> On 15/06/10 11:41, Nikanth Karthikesan wrote:
>> I had a one-off use-case, where I had no free space, which made me think along this line.
>>
>> 1. We have the GNU split tool, for example, which I guess many of us use to split larger files to be transferred via smaller thumb drives. We cat many files back into one afterwards. [For this use case, one can simply dd with seek and skip and avoid split/cat completely, but we don't.]
>
> I am not sure how you gain here, as either way you have to do I/O to get the split files on and off the thumb drive. It might make sense if the thumb drive is formatted with btrfs and the file needs to be copied to another filesystem that can't handle large files (e.g. FAT-16), but I would say that is unlikely.

But you only have to do half as much I/O with those features implemented.

The old way is:
1. Have a file.
2. Split the file (in effect using twice as much drive space).
3. Copy the fragments to flash disks.

The btrfs way would be:
1. Have a file.
2. Split the file by using COW and referencing blocks in the original file (in effect using only a little more space after splitting).
3. Copy the fragments to flash disks.

The amount of I/O in the second case is limited to metadata operations; in the first case, all the data must be duplicated.
--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
Quality Management System compliant with ISO 9001:2000
xfstests: Failed 4 of 36 tests
I successfully ran "check" from xfstests on a 2.6.35-based system. I've pastebin'd the test output as well as my dmesg. Note, there is a backtrace in the dmesg.

output from check: http://pastebin.ubuntu.com/451219/
dmesg: http://pastebin.ubuntu.com/451222/

Feel free to let me know what other information I should have provided.

Brad

P.S. In the future, is this the way the mailing list would like logs handled?
--
Brad Figg brad.f...@canonical.com http://www.canonical.com
Re: btrfs: hanging processes - race condition?
On Thu, Jun 17, 2010 at 09:41:18AM +0800, Shaohua Li wrote:
> On Mon, Jun 14, 2010 at 09:28:29PM +0800, Chris Mason wrote:
>> On Sun, Jun 13, 2010 at 02:50:06PM +0800, Shaohua Li wrote:
>>> On Fri, Jun 11, 2010 at 10:32:07AM +0800, Yan, Zheng wrote:
>>>> On Fri, Jun 11, 2010 at 9:12 AM, Shaohua Li shaohua...@intel.com wrote:
>>>>> On Fri, Jun 11, 2010 at 01:41:41AM +0800, Jerome Ibanes wrote:
>>>>>> List,
>>>>>>
>>>>>> I ran into a hang issue (race condition: CPU is high when the server is idle, meaning that btrfs is hanging, and iowait is high as well) running 2.6.34 on debian/lenny on an x86_64 server (dual Opteron 275 w/ 16GB RAM). The btrfs filesystem lives on 18x300GB SCSI spindles, configured as RAID-0, as shown below:
>>>>>>
>>>>>> Label: none  uuid: bc6442c6-2fe2-4236-a5aa-6b7841234c52
>>>>>>   Total devices 18 FS bytes used 2.94TB
>>>>>>   devid  5 size 279.39GB used 208.33GB path /dev/cciss/c1d0
>>>>>>   devid 17 size 279.39GB used 208.34GB path /dev/cciss/c1d8
>>>>>>   devid 16 size 279.39GB used 209.33GB path /dev/cciss/c1d7
>>>>>>   devid  4 size 279.39GB used 208.33GB path /dev/cciss/c0d4
>>>>>>   devid  1 size 279.39GB used 233.72GB path /dev/cciss/c0d1
>>>>>>   devid 13 size 279.39GB used 208.33GB path /dev/cciss/c1d4
>>>>>>   devid  8 size 279.39GB used 208.33GB path /dev/cciss/c1d11
>>>>>>   devid 12 size 279.39GB used 208.33GB path /dev/cciss/c1d3
>>>>>>   devid  3 size 279.39GB used 208.33GB path /dev/cciss/c0d3
>>>>>>   devid  9 size 279.39GB used 208.33GB path /dev/cciss/c1d12
>>>>>>   devid  6 size 279.39GB used 208.33GB path /dev/cciss/c1d1
>>>>>>   devid 11 size 279.39GB used 208.33GB path /dev/cciss/c1d2
>>>>>>   devid 14 size 279.39GB used 208.33GB path /dev/cciss/c1d5
>>>>>>   devid  2 size 279.39GB used 233.70GB path /dev/cciss/c0d2
>>>>>>   devid 15 size 279.39GB used 209.33GB path /dev/cciss/c1d6
>>>>>>   devid 10 size 279.39GB used 208.33GB path /dev/cciss/c1d13
>>>>>>   devid  7 size 279.39GB used 208.33GB path /dev/cciss/c1d10
>>>>>>   devid 18 size 279.39GB used 208.34GB path /dev/cciss/c1d9
>>>>>> Btrfs v0.19-16-g075587c-dirty
>>>>>>
>>>>>> The filesystem, mounted in /mnt/btrfs, is hanging: no existing or new process can access it, however 'df' still displays the disk usage (3TB out of 5). The disks appear to be physically healthy. Please note that a significant number of files were placed on this filesystem, between 20 and 30 million files. The relevant kernel messages are displayed below:
>>>>>>
>>>>>> INFO: task btrfs-submit-0:4220 blocked for more than 120 seconds.
>>>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>> btrfs-submit- D 00010042e12f 0 4220 2 0x
>>>>>>  8803e584ac70 0046 4000 00011680
>>>>>>  8803f7349fd8 8803f7349fd8 8803e584ac70 00011680
>>>>>>  0001 8803ff99d250 8149f020 81150ab0
>>>>>> Call Trace:
>>>>>>  [813089f3] ? io_schedule+0x71/0xb1
>>>>>>  [811470be] ? get_request_wait+0xab/0x140
>>>>>>  [810406f4] ? autoremove_wake_function+0x0/0x2e
>>>>>>  [81143a4d] ? elv_rq_merge_ok+0x89/0x97
>>>>>>  [8114a245] ? blk_recount_segments+0x17/0x27
>>>>>>  [81147429] ? __make_request+0x2d6/0x3fc
>>>>>>  [81145b16] ? generic_make_request+0x207/0x268
>>>>>>  [81145c12] ? submit_bio+0x9b/0xa2
>>>>>>  [a01aa081] ? btrfs_requeue_work+0xd7/0xe1 [btrfs]
>>>>>>  [a01a5365] ? run_scheduled_bios+0x297/0x48f [btrfs]
>>>>>>  [a01aa687] ? worker_loop+0x17c/0x452 [btrfs]
>>>>>>  [a01aa50b] ? worker_loop+0x0/0x452 [btrfs]
>>>>>>  [81040331] ? kthread+0x79/0x81
>>>>>>  [81003674] ? kernel_thread_helper+0x4/0x10
>>>>>>  [810402b8] ? kthread+0x0/0x81
>>>>>>  [81003670] ? kernel_thread_helper+0x0/0x10
>>>>>
>>>>> This looks like the issue we saw too, http://lkml.org/lkml/2010/6/8/375. This is reproducible in our setup.
>>>>
>>>> I think I know the cause of http://lkml.org/lkml/2010/6/8/375. The code in the first do-while loop in btrfs_commit_transaction sets the current process to TASK_UNINTERRUPTIBLE state, then calls btrfs_start_delalloc_inodes, btrfs_wait_ordered_extents and btrfs_run_ordered_operations(). All of these functions may call cond_resched().

Hi,

When I test random write, I see a lot of threads jump into btree_writepages(), do nothing, and I/O throughput is zero for some time. Looks like there is a live lock. See the code of btree_writepages():

    if (wbc->sync_mode == WB_SYNC_NONE) {
        struct btrfs_root *root = BTRFS_I(mapping->host)->root;
        u64 num_dirty;
        unsigned long thresh = 32 * 1024 * 1024;

        if (wbc->for_kupdate)
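The skip-below-threshold logic quoted above can behave the way Shaohua describes: if every caller takes the WB_SYNC_NONE path while the dirty byte count sits under the threshold, each call returns without issuing any I/O. A toy model of that check, with the thread count and dirty totals invented for illustration (only the 32 MiB threshold comes from the quoted code):

```python
THRESH = 32 * 1024 * 1024  # 32 MiB, from the quoted btree_writepages()

def btree_writepages_model(sync_mode, num_dirty):
    """Return the bytes a caller would actually write back. Mirrors
    the quoted check: non-sync callers bail out while the dirty byte
    count is below the threshold."""
    if sync_mode == "WB_SYNC_NONE" and num_dirty < THRESH:
        return 0          # do nothing -- every such caller skips writeback
    return num_dirty      # otherwise, write everything back

# Eight threads polling with 16 MiB dirty: none of them do any I/O,
# and the dirty count never shrinks -- the "threads do nothing" state.
writes = [btree_writepages_model("WB_SYNC_NONE", 16 * 1024 * 1024)
          for _ in range(8)]
print(sum(writes))  # 0

# A sync caller (or crossing the threshold) makes progress.
print(btree_writepages_model("WB_SYNC_ALL", 16 * 1024 * 1024))  # 16777216
```

The threshold exists to batch btree writeback into larger, cheaper I/Os; the live-lock risk is that nothing guarantees some caller eventually takes the write-everything path.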