Re: Make existing snapshots read-only?
2012-05-28 12:37:00 -0600, Bruce Guenter:
> Is there any way to mark existing snapshots as read-only? Making new
> ones read-only is easy enough, but what about existing ones?
[...]

You can always do:

  btrfs sub snap -r vol vol-ro
  btrfs sub del vol
  mv vol-ro vol

--
Stephane
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Cloning a Btrfs partition
2011-12-08, 10:49(-05), Phillip Susi:
> On 12/7/2011 1:49 PM, BJ Quinn wrote:
>> What I need isn't really an equivalent "zfs send" -- my script can do
>> that. As I remember, zfs send was pretty slow too in a scenario like
>> this. What I need is to be able to clone a btrfs array somehow -- dd
>> would be nice, but as I said I end up with the identical UUID
>> problem. Is there a way to change the UUID of an array?
>
> No, btrfs send is exactly what you need. Using dd is slow because it
> copies unused blocks, and requires the source fs be unmounted.
[...]

Not necessarily: you can snapshot the devices (as in the method I
suggested). If your FS is already on a device-mapper device, you can
even get away with not unmounting it (freeze, reload the device-mapper
table with a snapshot-origin one, and thaw).

> and the destination be an empty partition. rsync is slow
> because it can't take advantage of the btrfs tree to quickly
> locate the files (or parts of them) that have changed. A
> btrfs send would solve all of these issues.
[...]

When you want to clone a FS onto a similar device or set of devices, a
tool like clone2fs or ntfsclone that copies only the used sectors
across sequentially would probably be a lot more efficient, as it
copies the data at the max speed of the drive, seeking as little as
possible.

--
Stephane
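The efficiency argument above can be made concrete: a used-block cloner
boils down to walking a list of allocated extents and copying only
those ranges, seeking past the holes. Here is a minimal sketch; the
extent list, file names, and sizes are all made up for illustration (a
real tool would read the allocation map from the source filesystem):

```shell
# Hypothetical used-block copy: copy only the allocated extents,
# given as "offset length" pairs in 512-byte sectors.
SRC=src.img
DST=dst.img

# Sample 64KiB source "device" and a made-up extent list.
dd if=/dev/zero of="$SRC" bs=512 count=128 2>/dev/null
printf '0 8\n64 16\n' > extents.txt

: > "$DST"
while read -r off len; do
  # seek/skip keep source and destination offsets identical,
  # so unused ranges are never read or written.
  dd if="$SRC" of="$DST" bs=512 skip="$off" seek="$off" \
     count="$len" conv=notrunc 2>/dev/null
done < extents.txt
```

Since the sample's last extent ends at sector 80, the destination comes
out 80*512 bytes long, with a hole where nothing was copied.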
Re: Cloning a Btrfs partition
2011-12-07, 12:35(-06), BJ Quinn:
> I've got a 6TB btrfs array (two 3TB drives in a RAID 0). It's
> about 2/3 full and has lots of snapshots. I've written a
> script that runs through the snapshots and copies the data
> efficiently (rsync --inplace --no-whole-file) from the main
> 6TB array to a backup array, creating snapshots on the backup
> array and then continuing on copying the next snapshot.
> Problem is, it looks like it will take weeks to finish.
>
> I've tried simply using dd to clone the btrfs partition, which
> technically appears to work, but then it appears that the UUID
> between the arrays is identical, so I can only mount one or
> the other. This means I can't continue to simply update the
> backup array with the new snapshots created on the main array
> (my script is capable of "catching up" the backup array with
> the new snapshots, but if I can't mount both arrays...).
[...]

You can mount them if you specify the devices upon mount.

Here's a method to transfer a full FS to another one with a different
layout. In this example, we're transferring from a FS on a 3GB device
(/dev/loop1) to a new FS on two 2GB devices (/dev/loop2, /dev/loop3):

truncate -s 3G a1
truncate -s 2G b1 b2
losetup /dev/loop1 a1
losetup /dev/loop2 b1
losetup /dev/loop3 b2

# our src FS on 1 disk:
mkfs.btrfs /dev/loop1
mkdir A B
mount /dev/loop1 A
# now we can fill it up, create subvolumes and snapshots...

# at this point, we decide to make a clone of it. To do that, we
# will make a snapshot of the device. For that, we need
# temporary storage as a block device. That could be a disk
# (like a USB key) or an nbd to another host, or anything. Here,
# I'm going to use a loop device to a file. You need enough
# space to store any modification done on the src FS while
# you're doing the transfer, plus whatever is needed to do the
# transfer itself (I can't tell you much about that).
truncate -s 100M sa
losetup /dev/loop4 sa
umount A
size=$(blockdev --getsize /dev/loop1)
echo 0 "$size" snapshot-origin /dev/loop1 | dmsetup create a
echo 0 "$size" snapshot /dev/loop1 /dev/loop4 N 8 | dmsetup create aSnap

# now we have /dev/mapper/a as the src device, which we can
# remount as such and use:
mount /dev/mapper/a A

# and aSnap as a writable snapshot of the src device, which we
# mount separately:
mount /dev/mapper/aSnap B

# The trick here is that we're going to add the two new devices
# to "B" and remove the snapshot one. btrfs will automatically
# migrate the data to the new devices:
btrfs device add /dev/loop2 /dev/loop3 B
btrfs device delete /dev/mapper/aSnap B
# END

Once that's completed, you should have a copy of A in B. You may want
to watch the status of the snapshot while you're transferring, to check
that it doesn't get full.

That method can't be used to do incremental "syncing" between two FSes,
for which you'd still need something similar to "zfs send" (speaking of
which, you may want to consider zfsonlinux, which is now reaching a
point where it's about as stable as btrfs, with the same performance
level if not better, and a lot more features. I'm doing the switch
myself while waiting for btrfs to be a bit more mature).

Because of the identical UUID, btrfs commands like "filesystem show"
will not always give sensible output. I tried to rename the fsid by
changing it in the superblocks, but it looks like it is also included
in a few other places where changing it manually breaks some checksums,
so I guess someone would have to write a tool to do that job. I'm
surprised it doesn't exist already (or maybe it does and I'm not aware
of it?).

--
Stephane
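One way to watch that fill level is "dmsetup status", whose snapshot
targets report "<used>/<total>" sectors in the fourth field. A small
sketch of turning that into a percentage (the status line below is a
fabricated sample, not output from a real aSnap device):

```shell
# "dmsetup status <dev>" for a snapshot target prints roughly:
#   <start> <length> snapshot <used>/<total> <metadata-sectors>
# The line below is a made-up sample standing in for
# "dmsetup status aSnap" on a live system.
status_line="0 204800 snapshot 10240/204800 16"

# Field 4 is "<used>/<total>"; turn it into an integer percentage.
pct=$(printf '%s\n' "$status_line" |
  awk '{ split($4, a, "/"); printf "%d", a[1] * 100 / a[2] }')

echo "snapshot ${pct}% full"
```

If that percentage reaches 100 before the "btrfs device delete"
finishes, the snapshot is invalidated and the transfer has to be
restarted with a bigger temporary device.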
Re: stripe alignment consideration for btrfs on RAID5
2011-11-23, 09:08(-08), Blair Zajac:
> On Nov 23, 2011, at 9:04 AM, Stephane CHAZELAS wrote:
>> Hiya,
>>
>> is there any recommendation out there to setup a btrfs FS on top
>> of hardware or software raid5 or raid6 wrt stripe/stride alignment?
>
> Isn't the advantage of having btrfs do all the raiding itself
> so one gets the checksums? If one puts btrfs on top of
> software or hardware raid, then if there is a checksum error,
> you don't have another copy of the data to fall back to. If
> one uses btrfs' raid1 or above for data and metadata, then you
> can suffer a checksum failure and get a good copy from another
> drive?
[...]

Yes, but btrfs doesn't support raid5 yet, I have a limited number of
drives I can connect to that system, and storage capacity is more
important for me than the odd chance of corruption of the occasional
sector (which can be mitigated by running regular RAID checks).

Also, my tests of btrfs raid10 didn't indicate it was reliable enough
yet (when a drive disappears and reappears, btrfs seems to get quite
confused).

--
Stephane
stripe alignment consideration for btrfs on RAID5
Hiya,

is there any recommendation out there to setup a btrfs FS on top of
hardware or software raid5 or raid6 wrt stripe/stride alignment?

From mkfs.btrfs, it doesn't look like there's much that can be adjusted
that would help, and what I'm asking might not even make sense for
btrfs, but I thought I'd just ask.

Thanks,
Stephane
mounting btrfs FS on zfs zvol hangs
Hiya,

yes, you'll probably think this is crazy, but after observing better
performance with btrfs in some workloads on md RAID5 than with btrfs'
builtin RAID10, I thought I'd try btrfs on a zfs (in-kernel, not fuse)
zvol (on raidz), just for a laugh. While this procedure worked for ext4
and xfs, for btrfs the mount hangs, suggesting there might be something
wrong with btrfs and/or zfs.

Here's what I'm doing:

zpool create X raidz /dev/sd{a,b,c,d,e,f}
zfs create -V 6T -o refreservation=0 X/Y
mkfs.btrfs /dev/zvol/X/Y
mount /dev/zvol/X/Y /mnt

backtrace for mount:

mount D 0009 0 2193 1761 0x
 880401b4d9a8 0082 0001 880401b4dfd8
 880401b4dfd8 880401b4dfd8 00012a40 8802092f
 88040ef7dc80 880401b4d988 88041fa732c0
Call Trace:
 [] ? __lock_page+0x70/0x70
 [] schedule+0x3f/0x60
 [] io_schedule+0x8f/0xd0
 [] sleep_on_page+0xe/0x20
 [] __wait_on_bit+0x5f/0x90
 [] wait_on_page_bit+0x78/0x80
 [] ? autoremove_wake_function+0x40/0x40
 [] read_extent_buffer_pages+0x3ca/0x430 [btrfs]
 [] ? btrfs_destroy_pinned_extent+0xb0/0xb0 [btrfs]
 [] btree_read_extent_buffer_pages.isra.62+0x8a/0xc0 [btrfs]
 [] read_tree_block+0x41/0x60 [btrfs]
 [] open_ctree+0xe75/0x1760 [btrfs]
 [] ? snprintf+0x34/0x40
 [] btrfs_fill_super.isra.38+0x78/0x150 [btrfs]
 [] ? disk_name+0xba/0xc0
 [] ? strlcpy+0x47/0x60
 [] btrfs_mount+0x3c6/0x470 [btrfs]
 [] mount_fs+0x43/0x1b0
 [] vfs_kern_mount+0x6a/0xc0
 [] do_kern_mount+0x54/0x110
 [] do_mount+0x1a4/0x260
 [] sys_mount+0x90/0xe0
 [] system_call_fastpath+0x16/0x1b

3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:27:26 UTC 2011 x86_64

Best regards,
Stephane
Re: btrfs-delalloc - threaded?
2011-11-22, 09:47(-05), Chris Mason:
> On Tue, Nov 22, 2011 at 02:30:07PM +0000, Stephane CHAZELAS wrote:
>> 2011-09-6, 11:21(-05), Andrew Carlson:
>> > I was doing some testing with writing out data to a BTRFS filesystem
>> > with the compress-force option. With 1 program running, I saw
>> > btrfs-delalloc taking about 1 CPU worth of time, much as could be
>> > expected. I then started up 2 programs at the same time, writing data
>> > to the BTRFS volume. btrfs-delalloc still only used 1 CPU worth of
>> > time. Is btrfs-delalloc threaded, to where it can use more than 1 CPU
>> > worth of time? Is there a threshold where it would start using more
>> > CPU?
>> [...]
>>
>> Hiya,
>>
>> I observe the same here. The bottleneck when writing data
>> sequentially seems to be btrfs-delalloc using 100% of the
>> time of one CPU.
>
> The compression is spread out to multiple CPUs. Using zlib on my 4 cpu
> box, I get 4 delalloc threads working on two concurrent dds.
>
> The thread hand off is based on the amount of work queued up to each
> thread, and you're probably just below the threshold where it kicks off
> another one. Are you using lzo or zlib?

Mounted with -o compress-force, so getting whatever the default
compression algorithm is.

> What is the workload you're using? We can make the compression code
> more aggressive at fanning out.
[...]

That was a basic test of:

head -c 40M /dev/urandom > a
(while :; do cat a; done) | pv -rab > b

(I expect the content of "a" to be cached in memory.)

Running "dstat -df" and top in parallel, with nothing else reading or
writing to that FS, btrfs maxes out at about 150MB/s, and zfs at about
400MB/s.

For the concurrent writing, replace pv with:

pv | tee b c d e > f

(I suppose there's a fair chance of this incurring disk seeking, so
reduced throughput is probably to be expected. I get the same kind of
throughput (maybe 15% more) with zfs raid5 in that case.)
--
Stephane
Re: btrfs-delalloc - threaded?
2011-09-6, 11:21(-05), Andrew Carlson:
> I was doing some testing with writing out data to a BTRFS filesystem
> with the compress-force option. With 1 program running, I saw
> btrfs-delalloc taking about 1 CPU worth of time, much as could be
> expected. I then started up 2 programs at the same time, writing data
> to the BTRFS volume. btrfs-delalloc still only used 1 CPU worth of
> time. Is btrfs-delalloc threaded, to where it can use more than 1 CPU
> worth of time? Is there a threshold where it would start using more
> CPU?
[...]

Hiya,

I observe the same here. The bottleneck when writing data sequentially
seems to be btrfs-delalloc using 100% of the time of one CPU.

If I do several writes in parallel, a few more btrfs-delalloc's appear
(3 when filling up 5 files concurrently), but btrfs-delalloc is still
the bottleneck. Interestingly, if I write to 10 files simultaneously, I
see only two btrfs-delalloc's and the throughput is lower.

That's on ubuntu 11.10, 3.0.0-13 amd64, 12 cores, 16GB DDR3 1333MHz
RAM, raid10 on 6 drives.

Note that zfsonlinux does perform a lot better in that regard (on a
raidz (ZFS raid5) on those same 6 drives): 50% CPU utilisation, maxing
out the disk bandwidth.

--
Stephane
Re: status of raid10 reliability
2011-11-17 17:09:25 +0000, Stephane CHAZELAS:
[...]
> Before setting up a new RAID10 btrfs array with 6 drives, I
> wanted to check how good it behaved in case of disk failure.
> I've not been too impressed. Is RAID10 btrfs support only
> meant for reading performance improvement?
>
> My test method was:
>
> Use the device-mapper to have devices mapped (linear) to loop
> devices
[...]
> Then write some data, and then use DM's error target to simulate
> a failing drive (all I/O ends up in error):
>
> # dmsetup suspend hd3; echo 0 $s error | dmsetup reload hd3; dmsetup resume hd3
[...]

Note that I did the same test with both md (raid10) and zfsonlinux
(raidz) and it worked as expected.

--
Stephane
status of raid10 reliability
Hiya,

Before setting up a new RAID10 btrfs array with 6 drives, I wanted to
check how good it behaved in case of disk failure. I've not been too
impressed. Is RAID10 btrfs support only meant for reading performance
improvement?

My test method was:

Use the device-mapper to have devices mapped (linear) to loop devices
(zsh syntax):

# l=({1..4})
# mv /dev/loop$^l .
# truncate -s1T $l
# s=$(blockdev --getsize /dev/loop1)
# for f ($l) losetup loop$f $f
# for f ($l) echo 0 $s linear loop$f 0 | dmsetup create hd$f
# mkfs.btrfs -m raid10 -d raid10 /dev/mapper/hd$^l
# d=(device=/dev/mapper/hd$^l)
# mount -o ${(j:,:)d} /dev/mapper/hd1 /mnt/3

Then write some data, and then use DM's error target to simulate a
failing drive (all I/O ends up in error):

# dmsetup suspend hd3; echo 0 $s error | dmsetup reload hd3; dmsetup resume hd3

Then write some more data. The FS doesn't become degraded
automatically. If I restore the drive:

# echo 0 $s linear loop3 0 | dmsetup create hd3

More funny things occur of course, as btrfs doesn't seem to have
registered it being broken. If I do a scrub with the failing drive, it
BUGs on:

[13960.286464] [ cut here ]
[13960.286484] kernel BUG at /home/blank/debian/kernel/release/linux-2.6/linux-2.6-3.1.0/debian/build/source_amd64_none/fs/btrfs/volumes.c:2891!
[13960.286496] invalid opcode: [#1] SMP [13960.286507] CPU 0 [13960.286510] Modules linked in: vboxnetadp(O) vboxnetflt(O) vboxdrv(O) ip6table_filter ip6_tables ebtable_nat acpi_cpufreq mperf ebtables cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp parport_pc ppdev lp parport rfcomm bnep binfmt_misc uinput deflate ctr twofish_generic twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic hmac crypto_null af_key fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt coretemp loop kvm_intel kvm uvcvideo bcm5974 videodev media v4l2_compat_ioctl32 cryptd snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm aes_x86_64 aes_generic snd_seq_midi ecb btusb bluetooth rfkill snd_rawmidi nouveau ttm snd_seq_midi_event drm_kms_helper snd_seq drm i2c_algo_bit mxm_wmi snd_timer wmi snd_seq_device joydev video battery snd ac apple_bl power_supply soundcore snd_page_alloc applesmc pcspkr input_polldev i2c_i801 i2c_core evdev button processor thermal_sys ext4 mbcache jbd2 crc16 btrfs zlib_deflate crc32c libcrc32c raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd dm_mirror dm_region_hash dm_log dm_mod sg sr_mod sd_mod cdrom crc_t10dif hid_apple usbhid hid ata_generic uhci_hcd firewire_ohci sata_sil24 ata_piix firewire_core crc_itu_t libata sky2 ehci_hcd scsi_mod usbcore [last unloaded: scsi_wait_scan] [13960.287012] [13960.287016] Pid: 12681, comm: btrfs-scrub-0 Tainted: GW O 3.1.0-1-amd64 #1 Apple Inc. 
MacBookPro4,1/Mac-F42C86C8 [13960.287037] RIP: 0010:[] [] __btrfs_map_block+0xfd/0x629 [btrfs] [13960.287061] RSP: 0018:880078c87cb0 EFLAGS: 00010282 [13960.287067] RAX: 0042 RBX: 880078c87d68 RCX: 54ee [13960.287076] RDX: RSI: 0046 RDI: 0246 [13960.287085] RBP: 8800378f6380 R08: 0002 R09: fffe [13960.287093] R10: 0001 R11: 88007e395c90 R12: [13960.287102] R13: 0008 R14: R15: 0001 [13960.287114] FS: () GS:88013fc0() knlGS: [13960.287126] CS: 0010 DS: ES: CR0: 8005003b [13960.287136] CR2: f76990a0 CR3: 000104319000 CR4: 06f0 [13960.287147] DR0: DR1: DR2: [13960.287159] DR3: DR6: 0ff0 DR7: 0400 [13960.287169] Process btrfs-scrub-0 (pid: 12681, threadinfo 880078c86000, task 880021698280) [13960.287184] Stack: [13960.287189] 88010001 0040 88008a1f40f8 [13960.287207] 88008a1f4100 880078c87d60 0001 0001 [13960.287223] 880078c87cf0 880109671000 880123416400 [13960.287242] Call Trace: [13960.287259] [] ? scrub_recheck_error+0x105/0x29b [btrfs] [13960.287280] [] ? scrub_checksum+0x75/0x372 [btrfs] [13960.287288] [] ? check_preempt_wakeup+0x122/0x18b [13960.287297] [] ? set_next_entity+0x32/0x52 [13960.287304] [] ? load_gs_index+0x7/0xa [13960.287312] [] ? __switch_to+0x15a/0x20e [13960.287331] [] ? worker_loop+0x16a/0x45d [btrfs] [13960.287341] [] ? __schedule+0x5ac/0x5c3 [13960.287360] [] ? btrfs_queue_worker+0x25b
ENOSPC on almost empty FS
Hiya,

trying to restore a FS from a backup (tgz) on a freshly made btrfs this
morning, I got ENOSPCs after about 100MB out of 4GB had been extracted.
strace indicates that the ENOSPCs are upon the open(O_WRONLY).

Restoring with:

mkfs.btrfs /dev/mapper/VG_USB-root
mount -o compress-force,ssd $_ /mnt
cd /mnt
pv ~/backup.tgz | gunzip | sudo bsdtar -xpSf - --numeric-owner

That's on a LVM LV with the PV on a USB key.

If I suspend the job and resume it, then the ENOSPCs go away. The only
way I could restore the backup was via rate-limiting the untar:

zcat ~/backup.tgz | pv -L 300 | sudo bsdtar -xpSf - --numeric-owner

Even that wasn't enough, as 3 files still triggered an ENOSPC, but I
did untar them separately afterwards.

That's with debian's 3.0.0-1 amd64 kernel.

Is that expected behavior due to the way allocation works in btrfs?

--
Stephane
Re: Unable to mount (or, why not to work late at night).
2011-10-28, 07:57(+07), Fajar A. Nugraha:
[...]
>> Already got 'em. Everything that tries to even think about modifying stuff
>> (btrfs-zero-log, btrfsck, and btrfs-debug-tree) all dump core:
>
> Your last resort (for now, anyway) might be using "restore" from
> Josef's btrfs-progs: https://github.com/josefbacik/btrfs-progs
>
> It might be able to copy some data.

I also have one FS in that same situation. I tried everything on it,
including that "restore" (which bailed out with those same error
messages IIRC).

The only thing that got me a bit further was to use an alternate
superblock, though that screwed the FS even further, as I needed to
reboot the machine after trying to mount it (mount hangs and some btrfs
tasks use all the CPU time).

Fortunately, for that one, I had a not-too-old backup at the block
device level.

--
Stephane
Re: btrfs fi defrag -c
2011-10-28, 10:25(+08), Li Zefan:
[...]
> # df . -h
> Filesystem            Size  Used Avail Use% Mounted on
> /home/lizf/tmp/a      2.0G  409M  1.4G  23% /mnt

OK, but why are we not gaining space after compression?

> And I was not surprised, as there's a regression.
>
> With this fix:
>
> http://marc.info/?l=linux-btrfs&m=131495014823121&w=2
[...]

Thanks. That's the one that's scheduled for 3.2 and maybe 3.1.x, right?

--
Stephane
btrfs fi defrag -c
I don't quite understand the behavior of "btrfs fi defrag".

~# truncate -s2G ~/a
~# mkfs.btrfs ~/a
nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
~# mount -o loop ~/a /mnt/1
/mnt/1# cd x
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   64K  1.8G   1% /mnt/1
/mnt/1# yes | head -c400M > a
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   64K  1.8G   1% /mnt/1
/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  402M  1.4G  23% /mnt/1
/mnt/1# btrfs fi defrag -c a

(exit status == 20, BTW)

(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  415M  994M  30% /mnt/1

No space gain; even lost 15M or 400M, depending on how you look at it.

/mnt/1# btrfs fi defrag a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  797M  612M  57% /mnt/1

Lost another 400M.

/mnt/1# ls -l
total 409600
-rw-r--r-- 1 root root 419430400 Oct 27 19:53 a
/mnt/1# btrfs fi balance .
/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  798M  845M  49% /mnt/1

Possibly reclaimed some of the space?

At the point where it said 612M free, if I do:

/mnt/1# cat < /dev/zero > b
cat: write error: No space left on device
/mnt/1# ls -lh b
-rw-r--r-- 1 root root 612M Oct 27 20:14 b

There was indeed 612M free.

When the FS is mounted with compress:

~# mkfs.btrfs ./a
nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
~# mount -o compress ./a /mnt/1
~# cd /mnt/1
/mnt/1# yes | head -c400M > a
/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   14M  1.8G   1% /mnt/1
/mnt/1# btrfs fi defrag -c ./a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   21M  1.4G   2% /mnt/1

Lost 400M?

/mnt/1# btrfs fi defrag ./a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   21M  1.4G   2% /mnt/1

I take it it doesn't uncompress?
I'm a bit confused here.

(That's with 3.0 amd64.)

--
Stephane
lseek hanging
This morning, I have a strange behavior when doing a "tail -f" on a log file. "cat log" runs successfully, but "tail -f log" hangs. Running a strace shows it hanging on lseek(3, 0, SEEK_CUR... 3 being the fd for that log file. In dmesg: [59881.520030] INFO: task btrfs-delalloc-:763 blocked for more than 120 seconds. [59881.527205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59881.535068] btrfs-delalloc- D 000100e2892b 0 763 2 0x [59881.542161] 88014738bc20 0046 4000 [59881.549673] 00012840 00012840 00012840 8801478c2580 [59881.557147] 00012840 88014738bfd8 00012840 00012840 [59881.568736] Call Trace: [59881.571219] [] wait_current_trans.clone.22+0xab/0xdc [btrfs] [59881.578589] [] ? wake_up_bit+0x2a/0x2a [59881.584012] [] ? _raw_spin_lock+0xe/0x10 [59881.589613] [] start_transaction+0xe3/0x231 [btrfs] [59881.596176] [] btrfs_join_transaction+0x15/0x17 [btrfs] [59881.603103] [] compress_file_range+0x297/0x515 [btrfs] [59881.609926] [] async_cow_start+0x35/0x4a [btrfs] [59881.616237] [] ? _raw_spin_lock_irq+0x1f/0x21 [59881.622277] [] worker_loop+0x19d/0x4cb [btrfs] [59881.628433] [] ? btrfs_queue_worker+0x27a/0x27a [btrfs] [59881.635330] [] kthread+0x82/0x8a [59881.640227] [] kernel_thread_helper+0x4/0x10 [59881.646160] [] ? kthread_worker_fn+0x14c/0x14c [59881.652293] [] ? gs_change+0x13/0x13 [59881.657555] INFO: task flush-btrfs-1:2675 blocked for more than 120 seconds. [59881.664617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59881.672477] flush-btrfs-1 D 000100e28931 0 2675 2 0x [59881.679628] 88013c559990 0046 8801 [59881.687118] 00012840 00012840 00012840 88013cf76140 [59881.694591] 00012840 88013c559fd8 00012840 00012840 [59881.702064] Call Trace: [59881.704536] [] ? lock_page+0x2f/0x2f [59881.709800] [] io_schedule+0x63/0x7e [59881.715042] [] sleep_on_page+0xe/0x12 [59881.720376] [] __wait_on_bit_lock+0x46/0x8f [59881.726241] [] ? 
pagevec_lru_move_fn+0xaa/0xc0 [59881.732372] [] __lock_page+0x66/0x6d [59881.737628] [] ? autoremove_wake_function+0x39/0x39 [59881.744173] [] ? should_resched+0xe/0x2e [59881.749779] [] lock_page+0x2a/0x2e [btrfs] [59881.755631] [] extent_write_cache_pages.clone.10.clone.17+0xba/0x28e [btrfs] [59881.764382] [] extent_writepages+0x47/0x5c [btrfs] [59881.770877] [] ? uncompress_inline.clone.32+0x119/0x119 [btrfs] [59881.778494] [] btrfs_writepages+0x27/0x29 [btrfs] [59881.784867] [] do_writepages+0x21/0x2a [59881.790302] [] writeback_single_inode+0xb5/0x1c6 [59881.796588] [] writeback_sb_inodes+0xbc/0x138 [59881.802683] [] writeback_inodes_wb+0x172/0x184 [59881.808795] [] wb_writeback+0x26c/0x3aa [59881.814297] [] wb_do_writeback+0x147/0x1a0 [59881.820081] [] ? schedule_timeout+0xb3/0xe3 [59881.825947] [] bdi_writeback_thread+0x8c/0x20f [59881.832056] [] ? wb_do_writeback+0x1a0/0x1a0 [59881.838062] [] kthread+0x82/0x8a [59881.842959] [] kernel_thread_helper+0x4/0x10 [59881.848896] [] ? kthread_worker_fn+0x14c/0x14c [59881.855005] [] ? gs_change+0x13/0x13 [59881.860267] INFO: task ntfsclone:2787 blocked for more than 120 seconds. [59881.866981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59881.874842] ntfsclone D 000100e28625 0 2787 2767 0x [59881.881929] 88013b421b88 0082 814044e0 [59881.889399] 00012840 00012840 00012840 88013bcba5c0 [59881.896895] 00012840 88013b421fd8 00012840 00012840 [59881.904388] Call Trace: [59881.906848] [] ? prepare_to_wait+0x76/0x81 [59881.912630] [] wait_current_trans.clone.22+0xab/0xdc [btrfs] [59881.919970] [] ? wake_up_bit+0x2a/0x2a [59881.925410] [] ? _raw_spin_lock+0xe/0x10 [59881.931014] [] start_transaction+0xe3/0x231 [btrfs] [59881.937572] [] btrfs_join_transaction+0x15/0x17 [btrfs] [59881.944479] [] btrfs_dirty_inode+0x2c/0x117 [btrfs] [59881.951051] [] __mark_inode_dirty+0x31/0x19e [59881.957069] [] ? 
mnt_clone_write+0x12/0x2a [59881.962867] [] file_update_time+0xed/0x111 [59881.968725] [] btrfs_file_aio_write+0x1a6/0x495 [btrfs] [59881.975657] [] do_sync_write+0xcb/0x108 [59881.981172] [] ? security_file_permission+0x2e/0x33 [59881.987743] [] vfs_write+0xac/0xff [59881.992814] [] sys_write+0x4a/0x6e [59881.997955] [] system_call_fastpath+0x16/0x1b [59882.003978] INFO: task tail:2789 blocked for more than 120 seconds. [59882.010257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59882.018137] tailD 000100e2ca63 0 2789 2604 0x0004 [59882.025225] 8801337c5e48 0082 8801337c5e28 8801 [59882.032698] 0001
Re: how stable are snapshots at the block level?
2011-10-25, 07:46(-04), Edward Ned Harvey:
[...]
> My suggestion to the OP of this thread is to use rsync for now, wait for
> btrfs send, or switch to zfs.
[...]

rsync won't work if you've got snapshot volumes, though (unless you're
prepared to have a backup copy thousands of times the size of the
original, or have a framework in place to replicate the snapshots on
the backup copy as soon as they are created (but before they're being
written to)).

To backup a btrfs FS with snapshots, the only option seems to be to
copy the block devices for now (or the other trick mentioned earlier).

--
Stephane
Re: how stable are snapshots at the block level?
2011-10-24, 09:59(-04), Edward Ned Harvey:
[...]
> If you are reading the raw device underneath btrfs, you are
> not getting the benefit of the filesystem checksumming. If
> you encounter an undetected read/write error, it will silently
> pass. Your data will be corrupted, you'll never know about it
> until you see the side-effects (whatever they may be).
[...]

I don't follow you here. If you're cloning a device holding a btrfs
FS, you'll clone the checksums as well. If there were errors, they
will be detected on the cloned FS as well?

> There is never a situation where block level copies have any
> advantage over something like btrfs send. Except perhaps
> forensics or espionage. But in terms of fast efficient
> reliable backups, btrfs send has every advantage and no
> disadvantage compared to block level copy.

$ btrfs send
ERROR: unknown command 'send'
Usage:
[...]

(from the 2011-10-12 integration branch). Am I missing something?

> There are many situations where btrfs send has an advantage
> over both block level and file level copies. It instantly
> knows all the relevant disk blocks to send, it preserves every
> property, it's agnostic about filesystem size or layout on
> either sending or receiving end, you have the option to create
> different configurations on each side, including compression
> etc. And so on.
[...]

That sounds like "zfs send"; I didn't know btrfs had it yet. My
understanding was that to clone/backup a btrfs FS, you could only clone
the block devices, or use the "device add" + "device del" trick with
some extra copy-on-write (LVM, nbd) layer.

--
Stephane
Re: how stable are snapshots at the block level?
2011-10-23, 17:19(+02), Mathijs Kwik:
[...]
> For this case (my laptop) I can stick to file-based rsync, but I think
> some guarantees should exist at the block level. Many virtual machines
> and cloud hosting services (like ec2) provide block-level snapshots.
> With xfs, I can freeze the filesystem for a short amount of time
> (<100ms), snapshot, unfreeze. I don't think such a lock/freeze feature
> exists for btrfs
[...]

That FS-freeze feature has been moved to the vfs layer, so it is now
available to any filesystem. You can either use xfs_io (see the -F
option to "freeze" for foreign FSes), as for xfs, or use fsfreeze from
util-linux.

Note that you can now thaw filesystems with a sysrq combination (for
instance, with xen, using "xm sysrq vm j").

For block-level snapshots, see also ddsnap (a device-mapper target,
unfortunately no longer maintained) and of course lvm (which doesn't
scale well with several snapshots, though).

--
Stephane
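The freeze/snapshot/thaw sequence described above can be scripted.
This is a hedged sketch only: the mount point, the lvcreate arguments,
and the DRY_RUN guard are all hypothetical, and by default the script
just prints the steps instead of executing them:

```shell
#!/bin/sh
# Freeze, snapshot, thaw -- sketch only. MNT and the lvcreate
# arguments are placeholders; with DRY_RUN=1 (the default) each
# step is printed rather than executed.
MNT=${MNT:-/mnt/data}
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run fsfreeze -f "$MNT"                         # block writes, flush dirty data
run lvcreate -s -n datasnap -L 1G /dev/vg/data # hypothetical LVM snapshot
run fsfreeze -u "$MNT"                         # thaw as soon as possible
```

Keeping the window between the two fsfreeze calls short matters,
because every write to the FS blocks while it is frozen.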
Re: BTRFS thinks that a device is mounted
2011-10-21, 00:39(+03), Nikos Voutsinas:
[...]
> ## Comment: Of course /dev/sdh is not mounted.
> mount | grep /dev/sdh
> root@lxc:~#
[...]

Note that mount(8) uses /etc/mtab to find out what is mounted, and if
that file is not a symlink to /proc/mounts, the information is not
necessarily correct. You can also have a look at /proc/mounts directly.

--
Stephane
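Reading the kernel's own table sidesteps a stale /etc/mtab entirely.
This small sketch just reprints /proc/mounts in mount(8)'s familiar
format:

```shell
# /proc/mounts is maintained by the kernel and is always current;
# a regular-file /etc/mtab is maintained by mount(8) and can go
# stale (e.g. across chroots or container setups).
while read -r dev mnt fstype opts rest; do
  printf '%s on %s type %s (%s)\n' "$dev" "$mnt" "$fstype" "$opts"
done < /proc/mounts
```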
Re: recursive subvolume delete
2011-10-02 16:38:21 +0200, krz...@gmail.com : > Also I think there are no real tools to find out which > directories are subvolumes/snapshots [...] On my system (Debian), there's the "mountpoint" command (from the initscripts package, from http://savannah.nongnu.org/projects/sysvinit) that will tell you that (it compares the st_dev of the given directory with that of directory/..). -- Stephane
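The st_dev comparison that mountpoint performs is easy to reproduce in a few lines of shell; since btrfs gives each subvolume its own st_dev, the same test spots subvolume roots as well as mount points. A sketch, not a replacement for the real tool:

```shell
# True if "$1" sits on a different st_dev than its parent directory,
# i.e. it is a mount point or a btrfs subvolume/snapshot root.
# (The root directory "/" is a special case this sketch does not handle.)
is_boundary() {
    [ "$(stat -c %d -- "$1")" != "$(stat -c %d -- "$1/..")" ]
}

is_boundary /proc && echo "/proc is a mount/subvolume boundary"
is_boundary /etc  || echo "/etc is just a plain directory here"
```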
Re: high CPU usage and low perf
2011-09-27 10:15:09 +0100, Stephane Chazelas: [...] > btrfs-transacti R running task 0 963 2 0x > 880143af7730 0001 ff10 880143af77b0 > 8801456da420 e86aa840 1000 > ffe4 8801462ba800 880109f9b540 88002a95eba8 > Call Trace: > [] ? tree_search_offset+0x18f/0x1b8 [btrfs] > [] ? btrfs_reserve_extent+0xb0/0x190 [btrfs] > [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] > [] ? __btrfs_cow_block+0x102/0x31e [btrfs] > [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] > [] ? __btrfs_cow_block+0x102/0x31e [btrfs] > [] ? btrfs_set_node_key+0x1a/0x20 [btrfs] > [] ? btrfs_cow_block+0x104/0x14e [btrfs] > [] ? btrfs_search_slot+0x162/0x4cb [btrfs] > [] ? btrfs_insert_empty_items+0x6a/0xba [btrfs] > [] ? run_clustered_refs+0x370/0x682 [btrfs] > [] ? btrfs_find_ref_cluster+0xd/0x13c [btrfs] > [] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs] > [] ? btrfs_commit_transaction+0x38f/0x709 [btrfs] > [] ? _raw_spin_lock+0xe/0x10 > [] ? join_transaction.clone.23+0xc1/0x200 [btrfs] [...] Any idea anyone? The above suggests btrfs struggles to allocate space, even though the FS is only 66% full. For now, my workaround is to reboot the system once a day. Not ideal... I'm also suspecting some data corruption, which I'm investigating now (on a file written via mmap()). Thanks, Stephane
Re: high CPU usage and low perf
2011-09-27 10:15:09 +0100, Stephane Chazelas: [...] > a btrfs file system of mine started to behave very poorly with > some btrfs kernel tasks taking 100% of CPU time. > > # btrfs fi show /dev/sdb > Label: none uuid: b3ce8b16-970e-4ba8-b9d2-4c7de270d0f1 > Total devices 3 FS bytes used 4.25TB > devid 2 size 2.73TB used 1.52TB path /dev/sdc > devid 1 size 2.70TB used 1.49TB path /dev/sda4 > devid 3 size 2.73TB used 1.52TB path /dev/sdb > > Btrfs v0.19-100-g4964d65 > > FS mounted with compress-force,noatime > > (Can't do a "filesystem df" just now, as there's a umount > running; there should be around 33% free). [...] The umount just returned. # btrfs fi df /backup Data, RAID0: total=4.20TB, used=4.20TB Data: total=8.00MB, used=7.97MB System, RAID1: total=8.00MB, used=344.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=162.75GB, used=59.30GB Metadata: total=8.00MB, used=0.00 It's now running fine again after reloading the btrfs module and remounting. -- Stephane
high CPU usage and low perf
Hiya, Recently, a btrfs file system of mine started to behave very poorly with some btrfs kernel tasks taking 100% of CPU time. # btrfs fi show /dev/sdb Label: none uuid: b3ce8b16-970e-4ba8-b9d2-4c7de270d0f1 Total devices 3 FS bytes used 4.25TB devid2 size 2.73TB used 1.52TB path /dev/sdc devid1 size 2.70TB used 1.49TB path /dev/sda4 devid3 size 2.73TB used 1.52TB path /dev/sdb Btrfs v0.19-100-g4964d65 FS mounted with compress-force,noatime (Can't do a "filesystem df" just now, as there's a umount running, there should be around 33% free). Kernel 3.0, with patch: http://www.spinics.net/lists/linux-btrfs/msg11023.html While the FS is running, I see for instance btrfs-transacti taking 100% CPU and iostat shows no disk activity. Writing performance is dreadful (a few kB/s). sysrq-t gives: btrfs-transacti R running task0 963 2 0x 880143af7730 0001 ff10 880143af77b0 8801456da420 e86aa840 1000 ffe4 8801462ba800 880109f9b540 88002a95eba8 Call Trace: [] ? tree_search_offset+0x18f/0x1b8 [btrfs] [] ? btrfs_reserve_extent+0xb0/0x190 [btrfs] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [] ? btrfs_set_node_key+0x1a/0x20 [btrfs] [] ? btrfs_cow_block+0x104/0x14e [btrfs] [] ? btrfs_search_slot+0x162/0x4cb [btrfs] [] ? btrfs_insert_empty_items+0x6a/0xba [btrfs] [] ? run_clustered_refs+0x370/0x682 [btrfs] [] ? btrfs_find_ref_cluster+0xd/0x13c [btrfs] [] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs] [] ? btrfs_commit_transaction+0x38f/0x709 [btrfs] [] ? _raw_spin_lock+0xe/0x10 [] ? join_transaction.clone.23+0xc1/0x200 [btrfs] [] ? wake_up_bit+0x2a/0x2a [] ? transaction_kthread+0x175/0x22a [btrfs] [] ? btrfs_congested_fn+0x86/0x86 [btrfs] [] ? kthread+0x82/0x8a [] ? kernel_thread_helper+0x4/0x10 [] ? kthread_worker_fn+0x14c/0x14c [] ? gs_change+0x13/0x13 After a while, with no FS activity, it does calm down though. 
umount has already used over 10 minutes of CPU time: # ps -flC umount F S UIDPID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 4 R root 6045 1853 65 80 0 - 2538 - 09:46 pts/200:11:06 umount /backup sysrq-t gives: [515954.295050] umount R running task0 6045 1853 0x [515954.295050] 88011131c600 0001 811cb1ee 88012c2fd598 [515954.295050] 8801456da420 1000 8800 8801456da420 [515954.295050] 88012c2fd578 a0327d96 880111bebb60 1000 [515954.295050] Call Trace: [515954.295050] [] ? tree_search_offset+0x18f/0x1b8 [btrfs] [515954.295050] [] ? need_resched+0x23/0x2d [515954.295050] [] ? kmem_cache_alloc+0x94/0x105 [515954.295050] [] ? btrfs_find_space_cluster+0xce/0x189 [btrfs] [515954.295050] [] ? find_free_extent.clone.64+0x549/0x8c7 [btrfs] [515954.295050] [] ? tree_search_offset+0x18f/0x1b8 [btrfs] [515954.295050] [] ? btrfs_reserve_extent+0xb0/0x190 [btrfs] [515954.295050] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [515954.295050] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [515954.295050] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [515954.295050] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [515954.295050] [] ? lookup_inline_extent_backref+0xa5/0x328 [btrfs] [515954.295050] [] ? __btrfs_free_extent+0xc3/0x55b [btrfs] [515954.295050] [] ? kfree+0x72/0x7b [515954.295050] [] ? btrfs_delayed_ref_lock+0x4a/0xa1 [btrfs] [515954.295050] [] ? run_clustered_refs+0x638/0x682 [btrfs] [515954.295050] [] ? btrfs_find_ref_cluster+0xc/0x13c [btrfs] [515954.295050] [] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs] [515954.295050] [] ? commit_cowonly_roots+0x78/0x18f [btrfs] [515954.295050] [] ? need_resched+0x23/0x2d [515954.295050] [] ? should_resched+0xe/0x2e [515954.295050] [] ? btrfs_commit_transaction+0x3ff/0x709 [btrfs] [515954.295050] [] ? _raw_spin_lock+0xe/0x10 [515954.295050] [] ? join_transaction.clone.23+0x1ca/0x200 [btrfs] [515954.295050] [] ? wake_up_bit+0x2a/0x2a [515954.295050] [] ? btrfs_sync_fs+0x9f/0xa7 [btrfs] [515954.295050] [] ? 
__sync_filesystem+0x66/0x7a [515954.295050] [] ? sync_filesystem+0x4c/0x50 [515954.295050] [] ? generic_shutdown_super+0x38/0xf6 [515954.295050] [] ? kill_anon_super+0x16/0x50 [515954.295050] [] ? deactivate_locked_super+0x26/0x4b [515954.295050] [] ? deactivate_super+0x3a/0x3e [515954.295050] [] ? mntput_no_expire+0xd0/0xd5 [515954.295050] [] ? sys_umount+0x2ee/0x31c [515954.295050] [] ? system_call_fastpath+0x16/0x1b Last time it happened, I hard rebooted the system, and it was fine for a while. This time, I'll try and let umount finish. Would anybody know what is happening and how to get out of it? Thanks. Stephane
Re: kernel BUG at fs/btrfs/inode.c:4676!
2011-06-06 12:19:56 +0200, Marek Otahal: > Hello, > the issue happens every time when i have to hard power-off my notebook > (suspend problems). > With kernel 2.6.39 the partition is unmountable, solution is to boot 2.6.38 > kernel which > 1/ is able to mount the partition, > 2/ by doing that fixes the problem so later .39 (after clean shutdown) can > mount it also. [...] I've just been hit by this (3.0). I dug up a 2.6.38 kernel and got it back running just the same. Has any progress made on this? [39564.802905] device fsid 01b919f7-32cd-4d09-be1c-1810249001b2 devid 1 transid 21097 /dev/mapper/VG_USB_debian-root [39565.555655] [ cut here ] [39565.555662] kernel BUG at /build/buildd-linux-2.6_3.0.0-3-amd64-9ClimQ/linux-2.6-3.0.0/debian/build/source_amd64_none/fs/btrfs/inode.c:4586! [39565.555668] invalid opcode: [#1] SMP [39565.555672] CPU 1 [39565.555674] Modules linked in: ext2 hfsplus nls_utf8 nls_cp437 vfat fat ip6table_filter ip6_tables ebtable_nat ebtables vboxnetadp(O) vboxnetflt(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT vboxdrv(O) xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp parport_pc ppdev lp parport rfcomm bnep bluetooth rfkill xt_multiport iptable_filter ip_tables x_tables snd_hrtimer acpi_cpufreq mperf cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats binfmt_misc fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ext3 jbd loop dm_crypt kvm_intel kvm uvcvideo videodev media v4l2_compat_ioctl32 nvidia(P) snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device evdev i7core_edac snd i2c_i801 edac_core pcspkr soundcore i2c_core asus_atk0110 snd_page_alloc button processor thermal_sys ext4 mbcache jbd2 crc16 btrfs zlib_deflate crc32c libcrc32c dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd sg sd_mod sr_mod crc_t10dif cdrom ata_generic 
usb_storage usbhid hid uas pata_jmicron firewire_ohci firewire_core crc_itu_t ahci libahci ehci_hcd libata scsi_mod r8169 mii usbcore [last unloaded: scsi_wait_scan] [39565.555806] [39565.555810] Pid: 18729, comm: mount Tainted: P IO 3.0.0-1-amd64 #1 System manufacturer System Product Name/P7P55D [39565.555817] RIP: 0010:[] [] btrfs_add_link+0x120/0x178 [btrfs] [39565.555850] RSP: 0018:8801c5273858 EFLAGS: 00010282 [39565.555854] RAX: ffef RBX: 8801caf96d90 RCX: 8802124d01d8 [39565.555858] RDX: 000e RSI: 8801948da880 RDI: 0292 [39565.555862] RBP: 880162b12800 R08: 0050 R09: 000d [39565.555866] R10: 000c R11: 00015670 R12: 8801caf7d1d8 [39565.555870] R13: 000b R14: 88017b3c0600 R15: 8801cecfb540 [39565.555875] FS: 7f1062f7b7e0() GS:88023fc2() knlGS: [39565.555879] CS: 0010 DS: ES: CR0: 80050033 [39565.555883] CR2: 7f84b7689000 CR3: 00017b112000 CR4: 06e0 [39565.555887] DR0: DR1: DR2: [39565.555891] DR3: DR6: 0ff0 DR7: 0400 [39565.555896] Process mount (pid: 18729, threadinfo 8801c5272000, task 880236e947f0) [39565.555899] Stack: [39565.555901] 0001 01c9 8801 000592ad [39565.555908] 000592ad 0001 880203d96e00 000c [39565.555915] 1000 8801cec8b7f0 8801c52739e8 8801caf7d1d8 [39565.555922] Call Trace: [39565.555948] [] ? add_inode_ref+0x2f3/0x385 [btrfs] [39565.555974] [] ? replay_one_buffer+0x181/0x1fb [btrfs] [39565.556000] [] ? alloc_extent_buffer+0x6f/0x295 [btrfs] [39565.556025] [] ? walk_down_log_tree+0x153/0x29c [btrfs] [39565.556050] [] ? walk_log_tree+0x81/0x196 [btrfs] [39565.556074] [] ? btrfs_read_fs_root_no_radix+0x166/0x1a5 [btrfs] [39565.556099] [] ? btrfs_recover_log_trees+0x192/0x297 [btrfs] [39565.556125] [] ? replay_one_dir_item+0xb3/0xb3 [btrfs] [39565.556148] [] ? btree_read_extent_buffer_pages.clone.63+0x6f/0xb2 [btrfs] [39565.556173] [] ? open_ctree+0x10f5/0x140e [btrfs] [39565.556180] [] ? string.clone.2+0x39/0x9f [39565.556187] [] ? sget+0x363/0x381 [39565.556207] [] ? btrfs_mount+0x228/0x470 [btrfs] [39565.556213] [] ? 
pcpu_next_pop+0x37/0x45 [39565.556219] [] ? cpumask_next+0x18/0x1d [39565.556224] [] ? pcpu_alloc+0x7b4/0x7cc [39565.556232] [] ? mount_fs+0x67/0x150 [39565.556241] [] ? vfs_kern_mount+0x58/0x97 [39565.556249] [] ? do_kern_mount+0x49/0xd8 [39565.556255] [] ? do_mount+0x690/0x6f6 [39565.556262] [] ? should_resched+0x5/0x24 [39565.556269] [] ? _cond_resched+0x9/0x20 [39565.556275] [] ? memdup_user+0x36/0x5b [39565.556280] [] ? sys_mount+0x88/0xc3 [39565.556287] [] ? system_call_fastpath+0x16/0x1b [39565.556291] Code: 89 f
Re: btrfs hung tasks
2011-07-28 10:22:54 -0400, Josef Bacik: > On Thu, Jul 28, 2011 at 07:23:43AM +0100, Stephane Chazelas wrote: > > Hiya, I got below those last night. That was 3 minutes after a > > bunch of rsync and ntfsclone processes started. > > > > It's the first time it happens. I upgraded from 3.0rc6 to 3.0 > > yesterday. > > > > Ok I fixed that recently and Chris just sent it to Linus. The patch you are > looking for is > > Btrfs: use a worker thread to do caching [...] Thanks a lot Josef, that seems to have done the trick. I've not reproduced the issue yet with that patch applied. Strange that I only get the issue with 3.0 and not 3.0rc6, though. -- Stephane
Re: btrfs hung tasks
2011-07-28 07:23:43 +0100, Stephane Chazelas: > Hiya, I got below those last night. That was 3 minutes after a > bunch of rsync and ntfsclone processes started. > > It's the first time it happens. I upgraded from 3.0rc6 to 3.0 > yesterday. [...] And again this morning, though at that point only one ntfsclone process was actively writing to the FS. At this point, I can read directories and stat(2) files on that FS, but reading or writing files hangs. I'll try and revert to 3.0rc6 to see if that makes a difference. call traces for some processes trying to read from the FS: cat D 8801424ee240 0 3478 1 0x0005 8801424ee240 0086 8801080497e8 8800494322e0 8801461908b0 00012800 880108049fd8 880108049fd8 00012800 8801424ee240 00012800 00012800 Call Trace: [] ? _raw_spin_lock_irqsave+0x9/0x25 [] ? btrfs_tree_lock+0x9a/0xa7 [btrfs] [] ? btrfs_spin_on_block+0x49/0x49 [btrfs] [] ? btrfs_set_path_blocking+0x21/0x32 [btrfs] [] ? btrfs_search_slot+0x3c6/0x4d6 [btrfs] [] ? btrfs_lookup_csum+0x65/0x105 [btrfs] [] ? btrfs_lookup_ordered_extent+0x2b/0x69 [btrfs] [] ? btrfs_find_ordered_sum+0x34/0xcc [btrfs] [] ? __btrfs_lookup_bio_sums+0x16f/0x2ed [btrfs] [] ? btrfs_submit_compressed_read+0x3b7/0x42e [btrfs] [] ? submit_one_bio+0x85/0xbc [btrfs] [] ? submit_extent_page.clone.16+0x118/0x1b9 [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? __extent_read_full_page+0x463/0x4cc [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? extent_readpages+0xb1/0xf6 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? __do_page_cache_readahead+0x124/0x1c8 [] ? ra_submit+0x1c/0x23 [] ? generic_file_aio_read+0x2a7/0x5c7 [] ? do_sync_read+0xb1/0xea [] ? _raw_spin_lock_irq+0xd/0x1a [] ? vfs_read+0x9f/0xf2 [] ? syscall_trace_enter+0xb5/0x15d [] ? sys_read+0x45/0x6b [] ? 
tracesys+0xd9/0xde wc D 8801424ef710 0 3495 1 0x0005 8801424ef710 0086 811ab802 88014951f5c0 8160b020 00012800 880109617fd8 880109617fd8 00012800 8801424ef710 00012800 00012800 Call Trace: [] ? delay_tsc+0x2b/0x68 [] ? _raw_spin_lock_irqsave+0x9/0x25 [] ? btrfs_tree_lock+0x9a/0xa7 [btrfs] [] ? btrfs_spin_on_block+0x49/0x49 [btrfs] [] ? map_private_extent_buffer+0xa3/0xc4 [btrfs] [] ? btrfs_lock_root_node+0x1d/0x3f [btrfs] [] ? btrfs_search_slot+0xe6/0x4d6 [btrfs] [] ? btrfs_header_generation.clone.17+0xf/0x14 [btrfs] [] ? btrfs_lookup_csum+0x65/0x105 [btrfs] [] ? btrfs_lookup_ordered_extent+0x2b/0x69 [btrfs] [] ? btrfs_find_ordered_sum+0x34/0xcc [btrfs] [] ? __btrfs_lookup_bio_sums+0x16f/0x2ed [btrfs] [] ? btrfs_submit_bio_hook+0xa4/0x129 [btrfs] [] ? submit_one_bio+0x85/0xbc [btrfs] [] ? submit_extent_page.clone.16+0x118/0x1b9 [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? __extent_read_full_page+0x463/0x4cc [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? extent_readpages+0xb1/0xf6 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? __do_page_cache_readahead+0x124/0x1c8 [] ? ra_submit+0x1c/0x23 [] ? generic_file_aio_read+0x26b/0x5c7 [] ? do_sync_read+0xb1/0xea [] ? _raw_spin_lock_irq+0xd/0x1a [] ? vfs_read+0x9f/0xf2 [] ? syscall_trace_enter+0xb5/0x15d [] ? sys_read+0x45/0x6b [] ? tracesys+0xd9/0xde tailD 88014651b750 0 3442 1844 0x0004 88014651b750 0082 880147b06508 8801495660c0 00012800 88010e05dfd8 88010e05dfd8 00012800 88014651b750 00012800 00012800 Call Trace: [] ? __mutex_lock_common.clone.5+0x114/0x179 [] ? mutex_lock+0x1a/0x2d [] ? generic_file_llseek+0x21/0x52 [] ? sys_lseek+0x3c/0x59 [] ? system_call_fastpath+0x16/0x1b rm D 8801424bd060 0 3504 1 0x0005 8801424bd060 0086 0001 8801495660c0 00012800 880118b21fd8 880118b21fd8 00012800 8801424bd060 00012800 00012800 Call Trace: [] ? _raw_spin_lock_irqsave+0x9/0x25 [] ? wait_current_trans.clone.22+0xa1/0xd0 [btrfs] [] ? 
wake_up_bit+0x23/0x23 [] ? start_transaction+0xd9/0x227 [btrfs] [] ? __unlink_start_trans+0x52/0x399 [btrfs] [] ? should_resched+0x5/0x24 [] ? _cond_resched+0x9/0x20 [] ? generic_permission+0xe/0x9b [] ? btrfs_unlink+0x1e/0xa4 [btrfs] [] ? vfs_unlink+0x65/0xbe [] ? do_unlinkat+0xc6/0x14d [] ? ptrace_report_syscall.clone.8+0x27/0x4f [] ? syscall_trace_enter+0xb5/0x15d [] ? tracesys+0xd9/0xde tee D 88013d9d4140 0 3506 1 0x0005 88013d9d4140 0082 880142634c40 880142634730 8160b020 00012800 88007e18bfd8 88007
btrfs hung tasks
Hiya, I got below those last night. That was 3 minutes after a bunch of rsync and ntfsclone processes started. It's the first time it happens. I upgraded from 3.0rc6 to 3.0 yesterday. sysrq-t output attached. [30961.476020] INFO: task kthreadd:2 blocked for more than 120 seconds. [30961.482414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30961.490271] kthreaddD 880149526e20 0 2 0 0x [30961.497489] 880149526e20 0046 88014fffb700 ea7eb520 [30961.504977] 880147045890 00012800 88014952ffd8 88014952ffd8 [30961.512525] 00012800 880149526e20 00012800 00012800 [30961.520179] Call Trace: [30961.522702] [] ? read_tsc+0x5/0x14 [30961.527850] [] ? timekeeping_get_ns+0xd/0x2a [30961.533861] [] ? lock_page+0x20/0x20 [30961.539134] [] ? io_schedule+0x5b/0x75 [30961.548853] [] ? sleep_on_page+0x9/0x10 [30961.554424] [] ? __wait_on_bit+0x3e/0x71 [30961.560042] [] ? wait_on_page_bit+0x6a/0x70 [30961.565912] [] ? autoremove_wake_function+0x2a/0x2a [30961.572478] [] ? migrate_pages+0x1b6/0x39b [30961.578243] [] ? suitable_migration_target+0x35/0x35 [30961.584877] [] ? compact_zone+0x68e/0x6a1 [30961.590567] [] ? shrink_dcache_memory+0x132/0x15d [30961.596934] [] ? compact_zone_order+0x9e/0xab [30961.602954] [] ? try_to_compact_pages+0x90/0xe6 [30961.609147] [] ? __alloc_pages_direct_compact+0xac/0x163 [30961.616124] [] ? __alloc_pages_nodemask+0x485/0x75c [30961.622665] [] ? copy_process+0x109/0x10eb [30961.628462] [] ? perf_event_task_sched_out+0x48/0x54 [30961.635108] [] ? do_fork+0xff/0x263 [30961.640262] [] ? kernel_thread+0x7b/0x83 [30961.645846] [] ? kthread_worker_fn+0x149/0x149 [30961.651955] [] ? gs_change+0x13/0x13 [30961.657193] [] ? kthreadd+0xe7/0x125 [30961.662434] [] ? finish_task_switch+0x84/0xaf [30961.668477] [] ? kernel_thread_helper+0x4/0x10 [30961.674587] [] ? tsk_fork_get_node+0x1a/0x1a [30961.680518] [] ? gs_change+0x13/0x13 [30961.685772] INFO: task btrfs-delayed-m:785 blocked for more than 120 seconds. 
[30961.692919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30961.700793] btrfs-delayed-m D 88014628a300 0 785 2 0x [30961.707874] 88014628a300 0046 88014549cc28 eb3f [30961.715347] 880147044ab0 00012800 8801484c3fd8 8801484c3fd8 [30961.722816] 00012800 88014628a300 00012800 00012800 [30961.730283] Call Trace: [30961.732750] [] ? _raw_spin_lock_irqsave+0x9/0x25 [30961.739056] [] ? btrfs_tree_lock+0x9a/0xa7 [btrfs] [30961.745524] [] ? btrfs_spin_on_block+0x49/0x49 [btrfs] [30961.752337] [] ? btrfs_lock_root_node+0x1d/0x3f [btrfs] [30961.759231] [] ? btrfs_search_slot+0xe6/0x4d6 [btrfs] [30961.765948] [] ? mutex_lock+0xd/0x2d [30961.771190] [] ? should_resched+0x5/0x24 [30961.776809] [] ? btrfs_lookup_inode+0x25/0x87 [btrfs] [30961.783527] [] ? need_resched+0x1a/0x23 [30961.789062] [] ? should_resched+0x5/0x24 [30961.794654] [] ? _cond_resched+0x9/0x20 [30961.800155] [] ? mutex_lock+0xd/0x2d [30961.805412] [] ? btrfs_update_delayed_inode+0x6b/0x126 [btrfs] [30961.812967] [] ? btrfs_async_run_delayed_node_done+0x9f/0x1cb [btrfs] [30961.821126] [] ? worker_loop+0x17e/0x498 [btrfs] [30961.827420] [] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [30961.834320] [] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [30961.841209] [] ? kthread+0x7a/0x82 [30961.846276] [] ? kernel_thread_helper+0x4/0x10 [30961.852381] [] ? kthread_worker_fn+0x149/0x149 [30961.858487] [] ? gs_change+0x13/0x13 [30961.863725] INFO: task btrfs-transacti:892 blocked for more than 120 seconds. [30961.870870] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30961.878727] btrfs-transacti D 880147e24ee0 0 892 2 0x [30961.885811] 880147e24ee0 0046 ea06f318 8800 [30961.893280] 8801495660c0 00012800 880144357fd8 880144357fd8 [30961.900743] 00012800 880147e24ee0 00012800 00012800 [30961.908210] Call Trace: [30961.910669] [] ? read_tsc+0x5/0x14 [30961.915737] [] ? lock_page+0x20/0x20 [30961.920978] [] ? io_schedule+0x5b/0x75 [30961.926390] [] ? 
sleep_on_page+0x9/0x10 [30961.931888] [] ? __wait_on_bit+0x3e/0x71 [30961.937473] [] ? wait_on_page_bit+0x6a/0x70 [30961.943320] [] ? autoremove_wake_function+0x2a/0x2a [30961.949871] [] ? lock_page+0x11/0x20 [btrfs] [30961.955817] [] ? extent_write_cache_pages.clone.10.clone.17+0xed/0x21f [btrfs] [30961.964730] [] ? pagevec_lookup_tag+0x18/0x1f [30961.970757] [] ? filemap_fdatawait_range+0x12c/0x14a [30961.977400] [] ? extent_writepages+0x44/0x5a [btrfs] [30961.984042] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [30961.991639] [] ? __filemap_fdatawrite_range+0x4b/0x50 [30961.998365] [] ? btrfs_wait_ordered_range+0x53/0x11
BUG() in btrfs-fixup (Was: btrfs invalid opcode)
2011-07-25 17:38:10 +0100, Jeremy Sanders: > I'm afraid this is a rather old kernel, 2.6.35.13-92.fc14.x86_64, but this > error looks rather similar to > http://www.spinics.net/lists/linux-btrfs/msg11053.html > > Has this been fixed? I was simultaneously doing rsyncs into different > subvolumes (one reading and one writing). [...] > [454244.123523] kernel BUG at fs/btrfs/inode.c:1528! [...] > [454244.124338] Pid: 3158, comm: btrfs-fixup-0 Not tainted > 2.6.35.13-92.fc14.x86_64 #1 C51MCP51/C51GM03 > [454244.124338] RIP: 0010:[] [] > btrfs_writepage_fixup_worker+0xde/0x118 [btrfs] [...] Hi Jeremy, glad I'm not the only one with that issue. That may renew the interest in it... I don't think much progress has been made on it. We could compare our experiences to see what contributes to its occurrence. It occurs (quite reproducibly) for me when rsyncing from a multi-device, multi-subvolume btrfs FS (mounted with compress-force) onto a single-device, no-subvolume btrfs FS, also mounted with compress-force. It also happens when the target is mounted with compress instead of compress-force, but not if I leave out "compress". I only get one occurrence of those BUG()s until I reboot. After the occurrence of that BUG(), I saw a number of misbehaviors that may or may not be linked to it: - btrfs eating all memory (mostly in the btrfs_inode_cache slab), resulting in a crash. That doesn't happen anymore since I started mounting with noatime and using CONFIG_SLUB (though I suspect it's noatime alone that did the trick) - occasionally, 20 to 95% of write(2) system calls to files on the source FS take 4 seconds, making it hardly usable. I also notice a flush-btrfs-1 stuck in "D" state. How does that compare with your experience? -- Stephane
Re: write(2) taking 4s
2011-07-18 20:37:25 +0100, Stephane Chazelas: > 2011-07-18 11:39:12 +0100, Stephane Chazelas: > > 2011-07-17 10:17:37 +0100, Stephane Chazelas: > > > 2011-07-16 13:12:10 +0100, Stephane Chazelas: > > > > Still on my btrfs-based backup system. I still see one BUG() > > > > reached in btrfs-fixup per boot time, no memory exhaustion > > > > anymore. There is now however something new: write performance > > > > is down to a few bytes per second. > > > [...] > > > > > > The condition that was causing that seems to have cleared by > > > itself this morning before 4am. > > > > > > flush-btrfs-1 and sync are still in D state. > > > > > > Can't really tell what cleared it. Could be when the first of > > > the rsyncs ended as all the other ones (and ntfsclones from nbd > > > devices) ended soon after > > [...] > > > > New nightly backup, and it's happening again. Started about 40 > > minutes after the start of the backup. > [...] > > Actively running at the moment are 1 rsync and 3 ntfsclone. > [...] > > And then again today. > > Interestingly, I "killall -STOP"ed all the ntfsclone and rsync > processes and: [...] > Now 95% of the write(2)s take 4 seconds (while it was about 15% > before I stopped the processes). [...] And this morning, after killing everything so that nothing was writing to the FS anymore, 95% of write(2)s were delayed as well (according to strace -Te write yes > file-on-btrfs). Then I rebooted (sysrq-b) and am trying btrfsck (from integration-20110705) on it, but btrfsck is using 8G of memory on a system that has only 5G so it's swapping in and out constantly and getting nowhere (and renders the system hardly usable) I found http://thread.gmane.org/gmane.comp.file-systems.btrfs/5716/focus=5728 from last year. Is that still the case? 
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1950 root 20 0 7684m 4.4g 232 R 4 91.1 4:22.87 btrfsck (and still growing) vmstat 1 procs ---memory--- ---swap-- ---io--- --system-- ---cpu--- r b swpd free buff cache si so bi bo in cs us sy id wa 2 2 3232016 115232 4524 3520 698 708 3305 716 991 570 3 1 56 39 0 2 3231816 111536 5976 3428 2964 532 4912 532 1569 683 1 0 46 53 0 2 3231144 105832 8144 3536 3140 24 532 424 1612 392 1 1 38 60 0 2 3231532 104964 8180 3684 2672 900 2708 900 1017 324 1 1 34 64 -- Stephane
Re: write(2) taking 4s
2011-07-18 11:39:12 +0100, Stephane Chazelas: > 2011-07-17 10:17:37 +0100, Stephane Chazelas: > > 2011-07-16 13:12:10 +0100, Stephane Chazelas: > > > Still on my btrfs-based backup system. I still see one BUG() > > > reached in btrfs-fixup per boot time, no memory exhaustion > > > anymore. There is now however something new: write performance > > > is down to a few bytes per second. > > [...] > > > > The condition that was causing that seems to have cleared by > > itself this morning before 4am. > > > > flush-btrfs-1 and sync are still in D state. > > > > Can't really tell what cleared it. Could be when the first of > > the rsyncs ended as all the other ones (and ntfsclones from nbd > > devices) ended soon after > [...] > > New nightly backup, and it's happening again. Started about 40 > minutes after the start of the backup. [...] > Actively running at the moment are 1 rsync and 3 ntfsclone. [...] And then again today. Interestingly, I "killall -STOP"ed all the ntfsclone and rsync processes and: # strace -tt -Te write yes > a-file-on-the-btrfs-fs 20:23:26.635848 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095223> 20:23:30.731391 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095769> 20:23:34.827390 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095788> 20:23:38.923388 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095771> Now 95% of the write(2)s take 4 seconds (while it was about 15% before I stopped the processes). [304257.760119] yes S 88001e8e3780 0 13179 13178 0x0001 [304257.760119] 88001e8e3780 0086 8160b020 [304257.760119] 000127c0 880074543fd8 880074543fd8 000127c0 [304257.760119] 88001e8e3780 880074542010 0286 00010286 [304257.760119] Call Trace: [304257.760119] [] ? schedule_timeout+0xa0/0xd7 [304257.760119] [] ? lock_timer_base+0x49/0x49 [304257.760119] [] ? shrink_delalloc+0x100/0x14e [btrfs] [304257.760119] [] ? 
btrfs_delalloc_reserve_metadata+0xf9/0x10b [btrfs] [304257.760119] [] ? btrfs_delalloc_reserve_space+0x20/0x3e [btrfs] [304257.760119] [] ? __btrfs_buffered_write+0x137/0x2dc [btrfs] [304257.760119] [] ? btrfs_dirty_inode+0x119/0x139 [btrfs] [304257.760119] [] ? btrfs_file_aio_write+0x395/0x42b [btrfs] [304257.760119] [] ? __switch_to+0x19c/0x288 [304257.760119] [] ? do_sync_write+0xb1/0xea [304257.760119] [] ? ptrace_notify+0x7f/0x9d [304257.760119] [] ? security_file_permission+0x18/0x2d [304257.760119] [] ? vfs_write+0xa4/0xff [304257.760119] [] ? syscall_trace_enter+0xb6/0x15b [304257.760119] [] ? sys_write+0x45/0x6e [304257.760119] [] ? tracesys+0xd9/0xde After killall -CONT, it's back to 15% write(2)s delayed. What's going on? -- Stephane
Re: write(2) taking 4s
2011-07-17 10:17:37 +0100, Stephane Chazelas: > 2011-07-16 13:12:10 +0100, Stephane Chazelas: > > Still on my btrfs-based backup system. I still see one BUG() > > reached in btrfs-fixup per boot time, no memory exhaustion > > anymore. There is now however something new: write performance > > is down to a few bytes per second. > [...] > > The condition that was causing that seems to have cleared by > itself this morning before 4am. > > flush-btrfs-1 and sync are still in D state. > > Can't really tell what cleared it. Could be when the first of > the rsyncs ended as all the other ones (and ntfsclones from nbd > devices) ended soon after [...] New nightly backup, and it's happening again. Started about 40 minutes after the start of the backup. system -net/total- ---procs--- --dsk/sda-dsk/sdb-dsk/sdc-- time | recv send|run blk new| read writ: read writ: read writ 17-07 20:19:18| 0 0 |0.0 0.0 0.0| 142k 31k: 119k 36k: 120k 33k 17-07 20:19:48|8087k 224k|1.2 5.3 0.1|2976k 98k: 793k 400k:2856k 375k 17-07 20:20:18|5174k 134k|0.8 4.6 0.9| 880k 179k: 830k 916k:1801k 825k 17-07 20:20:48|6634k 148k|1.3 4.9 0.2| 609k 101k:1259k 96k:2628k 98k 17-07 20:21:18|6725k 165k|0.7 5.8 0.0| 237k 442k: 975k 723k:1870k 644k 17-07 20:21:48|7100k 153k|0.7 5.4 0| 305k 83k:1124k 314k:2155k 274k 17-07 20:22:18|4440k 178k|0.5 5.3 0.0| 296k 1775B:2094k 240k:1663k 239k 17-07 20:22:48|8181k 220k|0.9 5.8 0| 360k 410B:1579k 196k:2065k 196k 17-07 20:23:18|8144k 228k|1.3 5.6 0| 348k 54k:1781k 216k:2213k 164k 17-07 20:23:48|5506k 185k|0.8 5.2 0.1| 307k0 :2040k0 :2166k0 17-07 20:24:18|6260k 206k|1.0 5.4 0.1| 474k 78k:2034k 285k:2218k 207k 17-07 20:24:48|8420k 314k|1.5 5.4 0| 313k 363k:2367k 391k:2182k 124k 17-07 20:25:18|8367k 247k|0.9 5.1 0.2| 475k 77k:1797k 75k:2220k 410B 17-07 20:25:48|7511k 179k|1.0 4.7 0| 406k 7646B:1596k 145k:2397k 147k 17-07 20:26:18|7930k 162k|0.7 5.1 0| 991k 410B:1468k 26k:2186k 26k 17-07 20:26:48|7757k 176k|1.0 5.3 0|1884k 26k:1147k 58k:2761k 32k [...] 
17-07 20:57:18|6917k 120k|0.3 4.1 0| 56k 410B: 65k 4506B: 213k 4506B
17-07 20:57:48|5698k 103k|0.1 4.0 0| 0 410B: 27k 6007B: 590k 6007B
17-07 20:58:18|6582k 117k|0.2 4.0 0| 229k 20k: 195k 956B: 290k 21k
17-07 20:58:48|6048k 110k|0.6 4.0 0.1| 32k 21k: 81k 410B: 331k 21k
17-07 20:59:18|8057k 138k|0.6 4.1 0| 42k 5871B: 33k 410B: 35k 5871B
17-07 20:59:48|7369k 145k|0.5 4.1 0| 59k 3959B: 230k 410B: 532k 3959B
17-07 21:00:18|8189k 140k|0.7 4.0 0| 53k 6007B: 58k 410B: 40k 6007B
17-07 21:00:48|7596k 137k|0.3 4.2 0| 24k 6690B: 250k 410B: 15k 5734B
17-07 21:01:18|8448k 145k|0.7 4.2 0| 24k 1365B: 325k 6827B: 15k 7646B
17-07 21:01:48|6821k 119k|0.3 4.0 0| 17k 410B: 175k 3004B: 11k 3004B
17-07 21:02:18|3614k 66k|0.7 2.7 0| 39k 410B: 538k 4779B: 45k 4779B
17-07 21:02:48| 417k 14k|0.5 1.3 0.3| 106k 1638B: 209k 4779B: 0 4779B
17-07 21:03:18| 353k 7979B|0.8 1.2 0| 0 1229B: 449k 2867B: 0 2867B
17-07 21:03:48| 327k 8981B|1.1 1.2 0| 0 410B: 686k 4506B: 43k 4506B
[...]
18-07 11:02:48| 243k 4866B|0.0 1.2 0.1| 0 2458B: 0 3550B: 0 3550B
18-07 11:03:18| 274k 5506B|0.1 1.2 0.1| 0 1775B: 0 3550B: 0 3550B
18-07 11:03:48| 238k 4851B|0.1 1.2 0.0| 0 4369B: 0 3550B: 0 3550B
18-07 11:04:18| 243k 4999B|0.1 1.1 0.1| 0 4506B: 0 3550B: 0 3550B
18-07 11:04:48| 288k 6488B|0.1 1.1 0.4| 0 2458B: 0 3550B: 0 3550B

Because that's after the weekend, there's not much to write.
What's holding up 3 of the backups is actually writing log data
like "xx% Completed". Actively running at the moment are 1 rsync
and 3 ntfsclones.

# strace -tt -s 2 -Te write -p 8771 -p 8567 -p 8856 -p 8403
Process 8771 attached - interrupt to quit
Process 8567 attached - interrupt to quit
Process 8856 attached - interrupt to quit
Process 8403 attached - interrupt to quit
[pid 8403] 11:12:26.539830 write(4, "es"..., 1024
[pid 8771] 11:12:26.540417 write(4, "hb"..., 4096
[pid 8567] 11:12:26.555211 write(1, " 3"..., 25
[pid 8856] 11:12:26.593232 write(1, " 6"..., 25
[pid 8403] 11:12:30.635257 <... write resumed> ) = 1024 <4.095271>
[pid 8403] 11:12:30.635309 write(4, "19"..., 112
[pid 8567] 11:12:30.635364 <... write resumed> ) = 25 <4.080091>
[pid 8856] 11:12:30.635553 <... write resumed> ) = 25 <4.042268>
[pid 8771] 11:12:30.635799 <... write resumed> ) = 4096 <4.095350>
[pid 8771] 11:12:30.636182 write(4, "hb"..., 4096
[pid 8567] 11:12:30.649904 write(1, " 3"..., 25
[pid 8403] 11:12:30.651452 <... write resumed> ) = 112 <0.015921>
[pid 8567] 11:12:30.651595 <... write resumed> ) = 25 <0.001640>
[pid 8403] 11:12:30.65
Re: write(2) taking 4s. (Was: Memory leak?)
2011-07-16 13:12:10 +0100, Stephane Chazelas:
> Still on my btrfs-based backup system. I still see one BUG()
> reached in btrfs-fixup per boot time, no memory exhaustion
> anymore. There is now however something new: write performance
> is down to a few bytes per second.
[...]

The condition that was causing that seems to have cleared by
itself this morning before 4am.

flush-btrfs-1 and sync are still in D state.

Can't really tell what cleared it. Could be when the first of
the rsyncs ended, as all the other ones (and ntfsclones from nbd
devices) ended soon after.

Cheers,
Stephane
Re: write(2) taking 4s. (Was: Memory leak?)
2011-07-16 13:12:10 +0100, Stephane Chazelas:
[...]
> ntfsclone (patched to only write modified clusters):
>
> # strace -Te write -p 4717
> Process 4717 attached - interrupt to quit
> write(1, " 65.16 percent completed\r", 25) = 25 <0.008996>
> write(1, " 65.16 percent completed\r", 25) = 25 <0.743358>
> write(1, " 65.16 percent completed\r", 25) = 25 <0.306582>
> write(1, " 65.17 percent completed\r", 25) = 25 <4.082723>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.006402>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.012582>
> write(1, " 65.17 percent completed\r", 25) = 25 <4.052504>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.012111>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.016001>
> write(1, " 65.17 percent completed\r", 25) = 25 <4.028017>
> write(1, " 65.18 percent completed\r", 25) = 25 <0.013365>
> write(1, " 65.18 percent completed\r", 25) = 25 <0.003963>
>
> (that's writing to a log file)
>
> See how many write(2)s take 4 seconds.
[...]

top - 17:14:18 up 1 day, 9:20, 3 users, load average: 1.00, 1.06, 1.11
Tasks: 146 total, 1 running, 145 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 25.0%id, 75.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 5094800k total, 4616412k used, 478388k free, 1420192k buffers
Swap: 4194300k total, 8720k used, 4185580k free, 2266240k cached

 PID USER PR NI VIRT  RES SHR S %CPU %MEM TIME+   P COMMAND
2156 root 20  0     0   0   0 D    0  0.0 0:00.02 0 flush-btrfs-1
6206 root 20  0 19112 1284 916 R  0  0.0 0:00.09 0 top
   1 root 20  0  8400  220 196 S  0  0.0 0:02.26 0 init

(all the other processes sleeping)

I suspect load 1 is for that flush-btrfs-1 in:

[86372.445554] flush-btrfs-1 D 88014438da30 0 2156 2 0x
[86372.445554] 88014438da30 0046 8100e366 88014438a9a0
[86372.445554] 000127c0 880021501fd8 880021501fd8 000127c0
[86372.445554] 88014438da30 880021500010 8100e366 81066ec6
[86372.445554] Call Trace:
[86372.445554] [] ? read_tsc+0x5/0x16
[86372.445554] [] ? read_tsc+0x5/0x16
[86372.445554] [] ? timekeeping_get_ns+0xd/0x2a
[86372.445554] [] ? __lock_page+0x63/0x63
[86372.445554] [] ? io_schedule+0x84/0xc3
[86372.445554] [] ? radix_tree_gang_lookup_tag_slot+0x7a/0x9f
[86372.445554] [] ? sleep_on_page+0x9/0xd
[86372.445554] [] ? __wait_on_bit_lock+0x3c/0x85
[86372.445554] [] ? __lock_page+0x5d/0x63
[86372.445554] [] ? autoremove_wake_function+0x2a/0x2a
[86372.445554] [] ? T.1090+0xba/0x234 [btrfs]
[86372.445554] [] ? extent_writepages+0x40/0x56 [btrfs]
[86372.445554] [] ? btrfs_submit_direct+0x403/0x403 [btrfs]
[86372.445554] [] ? writeback_single_inode+0xb8/0x1b8
[86372.445554] [] ? writeback_sb_inodes+0xc2/0x13b
[86372.445554] [] ? writeback_inodes_wb+0xfd/0x10f
[86372.445554] [] ? wb_writeback+0x213/0x330
[86372.445554] [] ? lock_timer_base+0x25/0x49
[86372.445554] [] ? wb_do_writeback+0x16d/0x1fc
[86372.445554] [] ? del_timer_sync+0x34/0x3e
[86372.445554] [] ? bdi_writeback_thread+0xc3/0x1ff
[86372.445554] [] ? wb_do_writeback+0x1fc/0x1fc
[86372.445554] [] ? wb_do_writeback+0x1fc/0x1fc
[86372.445554] [] ? kthread+0x7a/0x82
[86372.445554] [] ? kernel_thread_helper+0x4/0x10
[86372.445554] [] ? kthread_worker_fn+0x147/0x147
[86372.445554] [] ? gs_change+0x13/0x13

Also, if I run sync(1), it seems to never return.

[120348.788021] sync D 88011b3e1bc0 0 6215 1789 0x
[120348.788021] 88011b3e1bc0 0082 8160b020
[120348.788021] 000127c0 8800b0f25fd8 8800b0f25fd8 000127c0
[120348.788021] 88011b3e1bc0 8800b0f24010 88011b3e1bc0 00014fc127c0
[120348.788021] Call Trace:
[120348.788021] [] ? schedule_timeout+0x2d/0xd7
[120348.788021] [] ? __sync_filesystem+0x74/0x74
[120348.788021] [] ? wait_for_common+0xd1/0x14e
[120348.788021] [] ? try_to_wake_up+0x18c/0x18c
[120348.788021] [] ? __sync_filesystem+0x74/0x74
[120348.788021] [] ? __sync_filesystem+0x74/0x74
[120348.788021] [] ? writeback_inodes_sb_nr+0x72/0x78
[120348.788021] [] ? __sync_filesystem+0x4e/0x74
[120348.788021] [] ? iterate_supers+0x5e/0xab
[120348.788021] [] ? sys_sync+0x28/0x53
[120348.788021] [] ? system_call_fastpath+0x16/0x1b

--
Stephane
write(2) taking 4s. (Was: Memory leak?)
Still on my btrfs-based backup system. I still see one BUG()
reached in btrfs-fixup per boot time, no memory exhaustion
anymore. There is now however something new: write performance
is down to a few bytes per second.

I've got a few processes (rsync, patched ntfsclone, shells
mostly) writing to files at the same time on this server.

disk stats per second:

--dsk/sda-dsk/sdb-dsk/sdc--
 read writ: read writ: read writ
 264k 44k: 193k 44k: 225k 42k
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 60k: 0 0 : 0 0
 0 12k: 0 1176k: 0 1164k
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 40k: 0 0 :8192B 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 :4096B 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 324k 0 : 248k 0 : 548k 0
 0 4096B: 0 0 : 0 0
 352k 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 :4096B 0
 0 0 : 0 0 : 0 0
 0 80k: 0 0 : 0 0

rsync:

# strace -Ts0 -p 5015
Process 5015 attached - interrupt to quit
write(3, ""..., 1024) = 1024 <0.007700>
write(3, ""..., 1024) = 1024 <0.015822>
write(3, ""..., 1024) = 1024 <0.031853>
write(3, ""..., 1024) = 1024 <0.015881>
write(3, ""..., 1024) = 1024 <0.015911>
write(3, ""..., 1024) = 1024 <0.015796>
write(3, ""..., 1024) = 1024 <0.031946>
write(3, ""..., 1024) = 1024 <4.083854>
write(3, ""..., 1024) = 1024 <0.007925>
write(3, ""..., 1024) = 1024 <0.003776>
write(3, ""..., 1024) = 1024 <0.031862>
write(3, ""..., 1024) = 1024 <0.011807>
write(3, ""..., 1024) = 1024 <0.019742>
write(3, ""..., 1024) = 1024 <0.015857>
write(3, ""..., 1024) = 1024 <0.031833>
write(3, ""..., 1024) = 1024 <0.015789>
write(3, ""..., 1024) = 1024 <0.015926>
write(3, ""..., 1024) = 1024 <4.095967>
write(3, ""..., 1024) = 1024 <0.019798>
write(3, ""..., 1024) = 1024 <4.083682>
write(3, ""..., 1024) = 1024 <0.015398>
write(3, ""..., 1024) = 1024 <0.015951>
write(3, ""..., 1024) = 1024 <0.035837>
write(3, ""..., 1024) = 1024 <0.015962>
write(3, ""..., 1024) = 1024 <0.015909>
write(3, ""..., 1024) = 1024 <0.015967>
write(3, ""..., 48) = 48 <0.003782>
write(3, ""..., 1024) = 1024 <0.031802>
write(3, ""..., 1024) = 1024 <0.015811>
write(3, ""..., 1024) = 1024 <0.015944>
write(3, ""..., 1024) = 1024 <0.019810>
write(3, ""..., 1024) = 1024 <0.031948>

ntfsclone (patched to only write modified clusters):

# strace -Te write -p 4717
Process 4717 attached - interrupt to quit
write(1, " 65.16 percent completed\r", 25) = 25 <0.008996>
write(1, " 65.16 percent completed\r", 25) = 25 <0.743358>
write(1, " 65.16 percent completed\r", 25) = 25 <0.306582>
write(1, " 65.17 percent completed\r", 25) = 25 <4.082723>
write(1, " 65.17 percent completed\r", 25) = 25 <0.006402>
write(1, " 65.17 percent completed\r", 25) = 25 <0.012582>
write(1, " 65.17 percent completed\r", 25) = 25 <4.052504>
write(1, " 65.17 percent completed\r", 25) = 25 <0.012111>
write(1, " 65.17 percent completed\r", 25) = 25 <0.016001>
write(1, " 65.17 percent completed\r", 25) = 25 <4.028017>
write(1, " 65.18 percent completed\r", 25) = 25 <0.013365>
write(1, " 65.18 percent completed\r", 25) = 25 <0.003963>

(that's writing to a log file)

See how many write(2)s take 4 seconds.

No issue when writing to an ext4 FS. SMART status on all drives
OK. What else could I look at?

Attached is a sysrq-t output.

--
Stephane

sysrq-t.txt.xz
Description: Binary data
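The diagnosis above relies on running strace -T and eyeballing the per-call durations. That step can be automated with a small filter that prints only the slow writes. This is just a sketch, not from the original post: $PID is a placeholder for the process to watch, and the 1-second threshold is an arbitrary choice.

```shell
# Print only the write(2) calls that took longer than 1 second, using the
# <duration> suffix that strace -T appends to each line. Works for plain
# lines and for "<... write resumed>" lines alike, since the duration is
# always the last <...> field. $PID is hypothetical.
strace -T -e trace=write -p "$PID" 2>&1 |
  awk -F'[<>]' '/</ && $(NF-1) + 0 > 1 { print "slow write:", $0 }'
```

The awk program splits each line on angle brackets, so the second-to-last field is the duration; adding 0 forces a numeric comparison.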
Re: Memory leak?
2011-07-11 10:01:21 +0100, Stephane Chazelas:
[...]
> Same without dmcrypt. So to sum up, BUG() reached in btrfs-fixup
> thread when doing an
>
> - rsync (though I also got (back when on ubuntu and 2.6.38) at
>   least one occurrence using bsdtar | bsdtar)
> - of a large amount of data (with a large number of files),
>   though the bug occurs quite early, probably after having
>   transferred about 50-100GB
> - the source FS being btrfs with compress-force on 3 devices
>   (one of which slightly shorter than the others) and a lot of
>   subvolumes and snapshots (I'm now copying from read-only
>   snapshots but that happened with RW ones as well).
> - to a newly created btrfs fs
> - on one device (/dev/sdd or dmcrypt)
> - mounted with compress or compress-force.
>
> - noatime on either source or dest doesn't make a difference
>   (wrt the occurrence of fixup BUG())
> - can't reproduce it when dest is not mounted with compress
> - beside that BUG(),
>   - kernel memory is being used up (mostly in
>     btrfs_inode_cache) and can't be reclaimed (leading to crash
>     with oom killing everybody)
>   - the target FS can be unmounted but that does not reclaim
>     memory. However the *source* FS (that is not the one we tried
>     with and without compress) cannot be unmounted (umount hangs,
>     see another email for its stack trace).
>   - Only way to get out of there is reboot with sysrq-b
> - happens with 2.6.38, 2.6.39, 3.0.0rc6
> - CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
>   CONFIG_DEBUG_SLAB_LEAK, slub_debug don't tell us anything
>   useful (there's more info in /proc/slabinfo when
>   CONFIG_SLAB_DEBUG is on, see below)
> - happens with CONFIG_SLUB as well.
[...]

I don't know which of CONFIG_SLUB or noatime made it, but in that
setup with both enabled, I do get the BUG(), but the system memory
is not exhausted, even after rsync goes over the section with
millions of files where it used to cause the oom crash.

The only issue remaining then is that I can't umount the source
FS (which in turn causes reboot issues). We could still have 2 or
3 different issues here for all we know. The situation is a lot
more comfortable for me now though.

--
Stephane
Re: Memory leak?
2011-07-11 12:25:51 -0400, Chris Mason:
[...]
> > Also, when I resume the rsync (so it doesn't transfer the
> > already transferred files), it does BUG() again.
>
> Ok, could you please send along the exact rsync command you were
> running?
[...]

I did earlier, but here it is again:

rsync --archive --xattrs --hard-links --numeric-ids --sparse --acls /src/ /dst/

Also with:

(cd /src && bsdtar cf - .) | pv | (cd /dst && bsdtar -xpSf - --numeric-owner)

--
Stephane
Re: Memory leak?
2011-07-11 11:00:19 -0400, Chris Mason:
> Excerpts from Stephane Chazelas's message of 2011-07-11 05:01:21 -0400:
> > 2011-07-10 19:37:28 +0100, Stephane Chazelas:
> > > 2011-07-10 08:44:34 -0400, Chris Mason:
> > > [...]
> > > > Great, we're on the right track. Does it trigger with mount -o compress
> > > > instead of mount -o compress_force?
> > > [...]
> > >
> > > It does trigger. I get that same "invalid opcode".
> > >
> > > BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
> > > information than with SLAB_DEBUG.
> > >
> > > I'm trying now without dmcrypt. Then I won't have much bandwidth
> > > for testing.
> > [...]
> >
> > Same without dmcrypt. So to sum up, BUG() reached in btrfs-fixup
> > thread when doing an
[...]
> > - CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
> > CONFIG_DEBUG_SLAB_LEAK, slub_debug don't tell us anything
> > useful (there's more info in /proc/slabinfo when
> > CONFIG_SLAB_DEBUG is on, see below)
[...]
> This is some fantastic debugging, thank you. I know you tested with
> slab debugging turned on, did you have CONFIG_DEBUG_PAGEALLOC on as
> well?

Yes when using SLAB, not when using SLUB.

> It's probably something to do with a specific file, but pulling that
> file out without extra printks is going to be tricky. I'll see if I can
> reproduce it here.
[...]

For one occurrence, I know which file was being transferred at the
time of the crash (by looking at ctimes on the dest FS, see one of
my earlier emails). And after just checking on the latest BUG(),
it's a different one.

Also, when I resume the rsync (so it doesn't transfer the
already transferred files), it does BUG() again.

regards,
Stephane
Re: feature request: btrfs-image without zeroing data
2011-07-11 14:39:18 +0200, krz...@gmail.com :
> 2011/7/11 Stephane Chazelas :
[...]
> > See also
> > http://thread.gmane.org/gmane.comp.file-systems.btrfs/9675/focus=9820
> > for a way to transfer btrfs fs.
> >
> > (Add a layer of "copy-on-write" on the original devices (LVM
> > snapshots, nbd/qemu-nbd cow...), "btrfs add" the new device(s)
> > and then "btrfs del" of the cow'ed original devices.)
[...]
> Copying on block level (dd, lvm) is old trick, however this takes same
> ammount of time regardless of actual space used in filesystem. Hence
> this feature request. Images inside filesystem can copy only actualy
> used data and metadata, which dramaticly reduces copy times in large
> volumes that are not filled up...

The method I suggest doesn't copy the whole disks, please read
more carefully. It can also work to copy from a 3-disk setup to a
1-disk setup, or the other way round.

With btrfs, you can add devices to a FS dynamically; you can also
delete devices, in which case the data is transferred to the
other devices. The method I suggest uses that feature.

Cheers,
Stephane
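The add-then-delete migration described above can be sketched as a short script. This is only an illustration, not the exact commands from the thread: the device names and mount point are hypothetical, it uses the `btrfs device add`/`btrfs device delete` spelling of the subcommands, and it assumes a copy-on-write layer (LVM snapshot, qemu-nbd cow, ...) has already been put over the original devices so they are never written to. Setting DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/bin/sh
# Migrate a btrfs FS mounted on /mnt from /dev/old1..old3 to /dev/new1
# by growing the FS onto the target and then evacuating the old devices.
# All device paths are hypothetical placeholders.
run() { [ "$DRY_RUN" = 1 ] && echo "$@" || "$@"; }

run btrfs device add /dev/new1 /mnt        # grow the FS onto the target
for d in /dev/old1 /dev/old2 /dev/old3; do
    run btrfs device delete "$d" /mnt      # relocates used data off "$d"
done
```

Because `device delete` relocates only allocated chunks, the time taken scales with the data actually stored, which is the point being made in the reply.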
Re: feature request: btrfs-image without zeroing data
2011-07-11 02:00:51 +0200, krz...@gmail.com :
> Documentation says that btrfs-image zeros data. Feature request is for
> disabling this. btrfs-image could be used to copy filesystem to
> another drive (for example with snapshots, when copying it file by
> file would take much longer time or acctualy was not possible
> (snapshots)). btrfs-image in turn could be used to actualy shrink loop
> devices/sparse file containing btrfs - by copying filesystem to new
> loop device/sparse file.
>
> Also it would be nice if copying filesystem could occour without
> intermediate dump to a file...
[...]

I second that.

See also
http://thread.gmane.org/gmane.comp.file-systems.btrfs/9675/focus=9820
for a way to transfer btrfs fs.

(Add a layer of "copy-on-write" on the original devices (LVM
snapshots, nbd/qemu-nbd cow...), "btrfs add" the new device(s)
and then "btrfs del" of the cow'ed original devices.)

--
Stephane
Re: Memory leak?
2011-07-10 19:37:28 +0100, Stephane Chazelas:
> 2011-07-10 08:44:34 -0400, Chris Mason:
> [...]
> > Great, we're on the right track. Does it trigger with mount -o compress
> > instead of mount -o compress_force?
> [...]
>
> It does trigger. I get that same "invalid opcode".
>
> BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
> information than with SLAB_DEBUG.
>
> I'm trying now without dmcrypt. Then I won't have much bandwidth
> for testing.
[...]

Same without dmcrypt. So to sum up, BUG() reached in the
btrfs-fixup thread when doing:

- an rsync (though I also got (back when on ubuntu and 2.6.38) at
  least one occurrence using bsdtar | bsdtar)
- of a large amount of data (with a large number of files),
  though the bug occurs quite early, probably after having
  transferred about 50-100GB
- the source FS being btrfs with compress-force on 3 devices (one
  of which slightly shorter than the others) and a lot of
  subvolumes and snapshots (I'm now copying from read-only
  snapshots, but it happened with RW ones as well)
- to a newly created btrfs fs
- on one device (/dev/sdd or dmcrypt)
- mounted with compress or compress-force.

Further observations:

- noatime on either source or dest doesn't make a difference (wrt
  the occurrence of the fixup BUG())
- can't reproduce it when dest is not mounted with compress
- besides that BUG():
  - kernel memory is being used up (mostly in btrfs_inode_cache)
    and can't be reclaimed (leading to a crash with oom killing
    everybody)
  - the target FS can be unmounted, but that does not reclaim
    memory. However, the *source* FS (that is, not the one we
    tried with and without compress) cannot be unmounted (umount
    hangs, see another email for its stack trace)
  - the only way out is a reboot with sysrq-b
- happens with 2.6.38, 2.6.39, 3.0.0rc6
- CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
  CONFIG_DEBUG_SLAB_LEAK and slub_debug don't tell us anything
  useful (there's more info in /proc/slabinfo when
  CONFIG_SLAB_DEBUG is on, see below)
- happens with CONFIG_SLUB as well.
slabinfo sampled roughly every 60-70 seconds, including the
"globalstat" statistics:

slabinfo - version: 2.1 (statistics)
# name : tunables : slabdata : globalstat : cpustat
btrfs_inode_cache 77610 77610 409611 : tunables 24 128 : slabdata 77610 77610 0 : globalstat 77610 77610 77610000 000 : cpustat104 77609 98 5
btrfs_inode_cache 165696 165696 409611 : tunables 24 128 : slabdata 165696 165696 0 : globalstat 174592 166889 17311700 37 800 : cpustat 14375 174178 21198 1659
btrfs_inode_cache 173906 173906 409611 : tunables 24 128 : slabdata 173906 173906 0 : globalstat 231342 196133 22884880 37 800 : cpustat 24914 230649 75318 6338
btrfs_inode_cache 201190 201190 409611 : tunables 24 128 : slabdata 201190 201190 0 : globalstat 338963 201190 33145480 38 1100 : cpustat 53954 335583 173512 14834
btrfs_inode_cache 224106 224143 409611 : tunables 24 128 : slabdata 224106 224143 96 : globalstat 453173 267189 44210180 38 1300 : cpustat 77063 448023 277242 23875
btrfs_inode_cache 126520 126520 409611 : tunables 24 128 : slabdata 126520 126520 0 : globalstat 486327 267189 472461 320 38 1300 : cpustat 96675 479904 414073 35992
btrfs_inode_cache 144723 144723 409611 : tunables 24 128 : slabdata 144723 144723 0 : globalstat 537600 267189 521248 320 38 1500 : cpustat 114446 530048 459922 39849
btrfs_inode_cache 176590 176590 409611 : tunables 24 128 : slabdata 176590 176590 0 : globalstat 626027 267189 605212 320 38 3500 : cpustat 142336 616188 535659 46275
btrfs_inode_cache 225715 225752 409611 : tunables 24 128 : slabdata 225715 225752 96 : globalstat 766387 267189 739439 320 38 6000 : cpustat 181607 753564 653165 56404
btrfs_inode_cache 179039 179076 409611 : tunables 24 128 : slabdata 179039 179076 84 : globalstat 821296 267189 793315 480 38 6000 : cpustat 189640 808027 753396 65349
btrfs_inode_cache 139572 139609 409611 : tunables 24 128 : slabdata 139572 139609 96 : globalstat 890513 267189 858553 560 38 6000 : cpustat 214964 875265 874796 75992
btrfs_inode_cache 122064 122101 409611 : tunables 24 128 : slabdata 122064 122101 96 : globalstat 936515 267189 903015 720 38 6600 : cpustat 230345 920877 947006 82282
btrfs_inode_cache 136431 136468 409611 : tunables 24 128 : slabdata 136431 136468
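Watching btrfs_inode_cache grow by eye in raw slabinfo samples like the ones above is tedious; a one-liner can reduce each sample to an approximate memory figure. A sketch, assuming the standard /proc/slabinfo column order (name, active objects, total objects, object size in bytes):

```shell
# Report roughly how much memory btrfs_inode_cache holds:
# total objects ($3) times object size ($4), in MB.
awk '$1 == "btrfs_inode_cache" { printf "%s: %.1f MB\n", $1, $3 * $4 / 1048576 }' /proc/slabinfo
```

Run under watch(1) (e.g. `watch -n 60 ...`) this gives the same 60-70 second sampling as above, without the surrounding noise.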
Re: Memory leak?
2011-07-10 08:44:34 -0400, Chris Mason:
[...]
> Great, we're on the right track. Does it trigger with mount -o compress
> instead of mount -o compress_force?
[...]

It does trigger. I get that same "invalid opcode".

BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
information than with SLAB_DEBUG.

I'm trying now without dmcrypt. Then I won't have much bandwidth
for testing.

--
Stephane
Re: Memory leak?
2011-07-09 08:09:55 +0100, Stephane Chazelas:
> 2011-07-08 16:12:28 -0400, Chris Mason:
> [...]
> > > I'm running a "dstat -df" at the same time and I'm seeing
> > > substantive amount of disk writes on the disks that hold the
> > > source FS (and I'm rsyncing from read-only snapshot subvolumes
> > > in case you're thinking of atimes) almost more than onto the
> > > drive holding the target FS!?
> >
> > These are probably atime updates. You can completely disable them with
> > mount -o noatime.
> [...]
>
> I don't think it is. First, as I said, I'm rsyncing from
> read-only snapshots (and I could see atimes were not updated)
> and nothing else is running. Then now looking at the traces this
> morning, there was a lot written in the first minutes, then it
> calmed down.
[...]

How embarrassing, sorry. In that instance I wasn't rsyncing from
the right place, so I was indeed copying non-readonly volumes
before copying the readonly ones. So those writes were probably
due to atimes.

--
Stephane
Re: Memory leak?
2011-07-09 13:25:00 -0600, cwillu:
> On Sat, Jul 9, 2011 at 11:09 AM, Stephane Chazelas wrote:
> > 2011-07-08 11:06:08 -0400, Chris Mason:
> > [...]
> >> I would do two things. First, I'd turn off compress_force. There's no
> >> explicit reason for this, it just seems like the most likely place for
> >> a bug.
> > [...]
> >
> > I couldn't reproduce it with compress_force turned off, the
> > inode_cache reached 600MB but never stayed high.
> >
> > Not using compress_force is not an option for me though
> > unfortunately.
>
> Disabling compression doesn't decompress everything that's already compressed.

Yes. But the very issue here is that I get this problem when I
copy data onto an empty drive. If I don't enable compression
there, it simply doesn't fit. In this very case, support for
compression is the main reason why I use btrfs here.

--
Stephane
Re: Memory leak?
2011-07-08 11:06:08 -0400, Chris Mason:
[...]
> I would do two things. First, I'd turn off compress_force. There's no
> explicit reason for this, it just seems like the most likely place for
> a bug.
[...]

I couldn't reproduce it with compress_force turned off; the
inode_cache reached 600MB but never stayed high.

Not using compress_force is not an option for me though,
unfortunately.

--
Stephane
Re: Memory leak?
2011-07-08 12:17:54 -0400, Chris Mason:
[...]
> How easily can you recompile your kernel with more debugging flags?
> That should help narrow it down. I'm looking for CONFIG_SLAB_DEBUG (or
> slub) and CONFIG_DEBUG_PAGEALLOC
[...]

I tried that (with CONFIG_DEBUG_SLAB_LEAK as well), but it made no
difference whatsoever.

--
Stephane
A lot of writing to FS only read (Was: Memory leak?)
2011-07-09 08:09:55 +0100, Stephane Chazelas:
> 2011-07-08 16:12:28 -0400, Chris Mason:
> [...]
> > > I'm running a "dstat -df" at the same time and I'm seeing
> > > substantive amount of disk writes on the disks that hold the
> > > source FS (and I'm rsyncing from read-only snapshot subvolumes
> > > in case you're thinking of atimes) almost more than onto the
> > > drive holding the target FS!?
[...]

One thing I didn't mention is that before doing the rsync, I
deleted a few snapshot volumes (those were read-only snapshots of
read-only snapshots) and recreated them, and those are the ones
I'm rsyncing from (basically, I prepare an area to be rsynced,
made of the latest snapshots of a few subvolumes).

--
Stephane
Re: Memory leak?
2011-07-08 16:12:28 -0400, Chris Mason:
[...]
> > I'm running a "dstat -df" at the same time and I'm seeing
> > substantive amount of disk writes on the disks that hold the
> > source FS (and I'm rsyncing from read-only snapshot subvolumes
> > in case you're thinking of atimes) almost more than onto the
> > drive holding the target FS!?
>
> These are probably atime updates. You can completely disable them with
> mount -o noatime.
[...]

I don't think it is. First, as I said, I'm rsyncing from
read-only snapshots (and I could see atimes were not updated)
and nothing else is running. Then, looking at the traces this
morning, there was a lot written in the first minutes, then it
calmed down.

That's every 5 minutes (in megabytes written to the hard disks
making part of the source FS):

  976 *|**|**
 1786 *||
 2486 *|**|**
 3139 *||
 3720 *|*|*
 4226 *|**|**
 4756 *||***
 5134 **||***
 5573 ***|*|
 5991 |**|
 6423 |***|*
 6580 ||*
 6598 ||*
 6602 ||*
 6671 ||*
 6774 ||**
 6887 ||**
 6928 ||**
 6943 |*|**
 6954 |*|**
 6969 |*|**
 6982 |*|**
 6998 |*|**
 7016 |*|**
 7106 |*|**
 7143 |*|**
 7184 |*|***
 7216 |*|***
 7244 |*|***
 7315 |*|***
 7347 |**|***
 7358 |**|***
 7418 |**|***
 7432 |**|***
 7438 |**|***
 7438 |**|***
 7462 |**|***
 7483 |**|***
 7528 |**|***
 7529 |**|***
 7618 |**|
 7869 |***|
 8325 *||**
 8419 *||**
 8533 *|*|**
 8587 *|*|**
 8731 *|*|***
 8830 *|*|***
 8869 *|*|***
 8881 *|**|***
 8903 *|**|***
 8974 *|**|***
 9044 *|**|***
 9100 *|**|
 9492 *|***|*
 9564 *|***|*
 9640 *||*
 9684 *||*
 9756 *||*
 9803 *||*
 9836 *||*
 9845 *||**
 9881 *||**
 9989 *||**
10083 *|*|**
10172 *|*|**
10230 *|*|**
10250 *|*|***
10271 *|*|***
10291 *|*|***
10306 *|*|***
10314 *|*|***
10330 *|*|***
10395 *|*|***
10468 *|**|***
10641 *|**|***
10728 *|**|
10818 *|**|**
Re: Memory leak?
2011-07-08 12:15:08 -0400, Chris Mason:
[...]
> I'd definitely try without -o compress_force.
[...]

Just started that over the night. I'm running a "dstat -df" at
the same time, and I'm seeing a substantial amount of disk writes
on the disks that hold the source FS (and I'm rsyncing from
read-only snapshot subvolumes, in case you're thinking of
atimes), almost more than onto the drive holding the target FS!?

--dsk/sda-- --dsk/sdb-- --dsk/sdc-- --dsk/sdd--
 read  writ: read  writ: read  writ: read  writ
1000k 0 : 580k 0 :2176k 0 : 0 0
1192k 0 : 76k 0 : 988k 0 : 0 0
 436k 0 : 364k 0 :1984k 0 : 0 33M
 824k 0 : 812k 0 :4276k 0 : 0 0
3004k 0 :2868k 0 :5488k 0 : 0 0
 796k 564k: 640k 25M:2284k 4600k: 0 0
 108k 0 : 68k 23M: 648k 35M: 0 0
1712k 12k:1644k 12k:2476k 7864k: 0 0
 732k 0 : 620k 0 :3192k 0 : 0 0
1148k 0 :1116k 0 :3532k 0 : 0 19M
1392k 0 :1380k 0 :4416k 0 : 0 7056k
1336k 0 :1212k 0 :6664k 0 : 0 3148k
 820k 0 : 784k 0 :4528k 0 : 0 48M
1336k 0 :1340k 0 :3964k 0 : 0 8996k
1528k 0 :1280k 0 :2908k 0 : 0 0
1352k 0 :1216k 0 :3880k 0 : 0 0
 864k 0 : 888k 0 :1684k 0 : 0 0
1268k 0 :1208k 0 :3072k 0 : 0 0

(the source FS is on sda4+sdb+sdc, the target on sdd; sda also
has an ext4 FS)

How can that be? Some garbage collection, background defrag or
something like that? But then, if I stop the rsync, those writes
stop as well.

--
Stephane
Re: Memory leak?
2011-07-08 12:15:08 -0400, Chris Mason:
[...]
> You described this workload as rsync, is there anything else running?
[...]

Nope. Nothing else. And at least initially, that was onto an
empty drive, so a basic copy:

rsync --archive --xattrs --hard-links --numeric-ids --sparse --acls

Cheers,
Stephane
Re: Memory leak?
2011-07-08 12:17:54 -0400, Chris Mason:
[...]
> > Jun 5 00:58:10 BUG: Bad page state in process rsync pfn:1bfdf
> > Jun 5 00:58:10 page:ea61f8c8 count:0 mapcount:0 mapping: (null) index:0x2300
> > Jun 5 00:58:10 page flags: 0x110(dirty)
> > Jun 5 00:58:10 Pid: 1584, comm: rsync Tainted: G D C 2.6.38-7-server #35-Ubuntu
> > Jun 5 00:58:10 Call Trace:
>
> Ok, this one is really interesting. Did you get this after another oops
> or was it after a reboot?

After the oops above (a few hours after, though). The two reports
were together, with nothing in between in the kern.log. That was
the only occurrence though.

> How easily can you recompile your kernel with more debugging flags?
> That should help narrow it down. I'm looking for CONFIG_SLAB_DEBUG (or
> slub) and CONFIG_DEBUG_PAGEALLOC
[...]

I can try next week.

--
Stephane
Re: Memory leak?
2011-07-08 16:41:23 +0100, Stephane Chazelas:
> 2011-07-08 11:06:08 -0400, Chris Mason:
> [...]
> > So the invalidate opcode in btrfs-fixup-0 is the big problem. We're
> > either failing to write because we weren't able to allocate memory (and
> > not dealing with it properly) or there is a bigger problem.
> >
> > Does the btrfs-fixup-0 oops come before or after the ooms?
>
> Hi Chris, thanks for looking into this.
>
> It comes long before. Hours before there's any problem. So it
> seems unrelated.

Though every time I had the issue, there had been such an
"invalid opcode" before. But also, I only had both the "invalid
opcode" and the memory issue when doing that rsync onto the
external hard drive.

> > Please send along any oops output during the run. Only the first
> > (earliest) oops matters.
>
> There's always only one in between two reboots. I've sent two
> already, but here they are:
[...]

I dug up the traces from before I switched to debian (thinking
getting a newer kernel would improve matters) in case it helps:

Jun 4 18:12:58 [ cut here ]
Jun 4 18:12:58 kernel BUG at /build/buildd/linux-2.6.38/fs/btrfs/inode.c:1555!
Jun 4 18:12:58 invalid opcode: [#2] SMP
Jun 4 18:12:58 last sysfs file: /sys/devices/virtual/block/dm-2/dm/name
Jun 4 18:12:58 CPU 0
Jun 4 18:12:58 Modules linked in: sha256_generic cryptd aes_x86_64 aes_generic dm_crypt psmouse serio_raw xgifb(C+) i3200_edac edac_core nbd btrfs zlib_deflate libcrc32c xenbus_probe_frontend ums_cypress usb_storage uas e1000e ahci libahci
Jun 4 18:12:58
Jun 4 18:12:58 Pid: 416, comm: btrfs-fixup-0 Tainted: G D C 2.6.38-7-server #35-Ubuntu empty empty/Tyan Tank GT20 B5211
Jun 4 18:12:58 RIP: 0010:[] [] btrfs_writepage_fixup_worker+0x145/0x150 [btrfs]
Jun 4 18:12:58 RSP: 0018:88003cfddde0 EFLAGS: 00010246
Jun 4 18:12:58 RAX: RBX: ea04ca88 RCX:
Jun 4 18:12:58 RDX: 88003cfddd98 RSI: RDI: 8800152088b0
Jun 4 18:12:58 RBP: 88003cfdde30 R08: e8c09988 R09: 88003cfddd98
Jun 4 18:12:58 R10: R11: R12: 010ec000
Jun 4 18:12:58 R13: 880015208988 R14: R15: 010ecfff
Jun 4 18:12:58 FS: () GS:88003fc0() knlGS:
Jun 4 18:12:58 CS: 0010 DS: ES: CR0: 8005003b
Jun 4 18:12:58 CR2: 00e73fe8 CR3: 30fcc000 CR4: 06f0
Jun 4 18:12:58 DR0: DR1: DR2:
Jun 4 18:12:58 DR3: DR6: 0ff0 DR7: 0400
Jun 4 18:12:58 Process btrfs-fixup-0 (pid: 416, threadinfo 88003cfdc000, task 880036912dc0)
Jun 4 18:12:58 Stack:
Jun 4 18:12:58 880039c4e120 880015208820 88003cfdde90 880032da4b80
Jun 4 18:12:58 88003cfdde30 88003ce915a0 88003cfdde90 88003cfdde80
Jun 4 18:12:58 880036912dc0 88003ce915f0 88003cfddee0 a00c34f4
Jun 4 18:12:58 Call Trace:
Jun 4 18:12:58 [] worker_loop+0xa4/0x3a0 [btrfs]
Jun 4 18:12:58 [] ? worker_loop+0x0/0x3a0 [btrfs]
Jun 4 18:12:58 [] kthread+0x96/0xa0
Jun 4 18:12:58 [] kernel_thread_helper+0x4/0x10
Jun 4 18:12:58 [] ? kthread+0x0/0xa0
Jun 4 18:12:58 [] ? kernel_thread_helper+0x0/0x10
Jun 4 18:12:58 Code: 1f 80 00 00 00 00 48 8b 7d b8 48 8d 4d c8 41 b8 50 00 00 00 4c 89 fa 4c 89 e6 e8 37 d1 01 00 eb b6 48 89 df e8 8d 1a 07 e1 eb 9a <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55
Jun 4 18:12:58 RIP [] btrfs_writepage_fixup_worker+0x145/0x150 [btrfs]
Jun 4 18:12:58 RSP
Jun 4 18:12:58 ---[ end trace e5cf15794ff3ebdb ]---

And:

Jun 5 00:58:10 BUG: Bad page state in process rsync pfn:1bfdf
Jun 5 00:58:10 page:ea61f8c8 count:0 mapcount:0 mapping: (null) index:0x2300
Jun 5 00:58:10 page flags: 0x110(dirty)
Jun 5 00:58:10 Pid: 1584, comm: rsync Tainted: G D C 2.6.38-7-server #35-Ubuntu
Jun 5 00:58:10 Call Trace:
Jun 5 00:58:10 [] ? dump_page+0x9b/0xd0
Jun 5 00:58:10 [] ? bad_page+0xcc/0x120
Jun 5 00:58:10 [] ? prep_new_page+0x1a5/0x1b0
Jun 5 00:58:10 [] ? _raw_spin_lock+0xe/0x20
Jun 5 00:58:10 [] ? test_range_bit+0x111/0x150 [btrfs]
Jun 5 00:58:10 [] ? get_page_from_freelist+0x264/0x650
Jun 5 00:58:10 [] ? generic_bin_search.clone.42+0x19e/0x200 [btrfs]
Jun 5 00:58:10 [] ? __alloc_pages_nodemask+0x118/0x830
Jun 5 00:58:10 [] ? generic_bin_search.clone.42+0x19e/0x200 [btrfs]
Jun 5 00:58:10 [] ? _raw_spin_lock+0xe/0x20
Jun 5 00:58:10 [] ? get_partial_node+0x92/0xb0
Jun 5 00:58:10 [] ? btrfs_submit_compressed_read+0x15d/0x4e0 [btrfs]
Jun 5 00:58:10 [] ? alloc_pages_current+0xa5/0x110
Jun 5 00:58:10 [] ? btrfs_s
Re: Memory leak?
2011-07-08 11:06:08 -0400, Chris Mason: [...] > So the invalidate opcode in btrfs-fixup-0 is the big problem. We're > either failing to write because we weren't able to allocate memory (and > not dealing with it properly) or there is a bigger problem. > > Does the btrfs-fixup-0 oops come before or after the ooms? Hi Chris, thanks for looking into this. It comes long before. Hours before there's any problem. So it seems unrelated. > Please send along any oops output during the run. Only the first > (earliest) oops matters. There's always only one in between two reboots. I've sent two already, but here they are: Jul 1 18:25:57 [ cut here ] Jul 1 18:25:57 kernel BUG at /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! Jul 1 18:25:57 invalid opcode: [#1] SMP Jul 1 18:25:57 CPU 1 Jul 1 18:25:57 Modules linked in: sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt fuse snd_pcm psmouse tpm_tis tpm i2c_i801 snd_timer snd soundcore snd_page_alloc i3200_edac tpm_bios serio_raw evdev pcspkr processor button thermal_sys i2c_core container edac_core sg sr_mod cdrom ext4 mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress usb_storage sd_mod crc_t10dif uas uhci_hcd ahci libahci libata ehci_hcd e1000e scsi_mod usbcore [last unloaded: scsi_wait_scan] Jul 1 18:25:57 Jul 1 18:25:57 Pid: 747, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 empty empty/Tyan Tank GT20 B5211 Jul 1 18:25:57 RIP: 0010:[] [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] Jul 1 18:25:57 RSP: 0018:8801483ffde0 EFLAGS: 00010246 Jul 1 18:25:57 RAX: RBX: ea000496a430 RCX: Jul 1 18:25:57 RDX: RSI: 06849000 RDI: 880071c1fcb8 Jul 1 18:25:57 RBP: 06849000 R08: 0008 R09: 8801483ffd98 Jul 1 18:25:57 R10: dead00200200 R11: dead00100100 R12: 880071c1fd90 Jul 1 18:25:57 R13: R14: 8801483ffdf8 R15: 06849fff Jul 1 18:25:57 FS: () GS:88014fd0() knlGS: Jul 1 18:25:57 CS: 0010 DS: ES: CR0: 8005003b Jul 1 18:25:57 CR2: f7596000 CR3: 
00013def9000 CR4: 06e0 Jul 1 18:25:57 DR0: DR1: DR2: Jul 1 18:25:57 DR3: DR6: 0ff0 DR7: 0400 Jul 1 18:25:57 Process btrfs-fixup-0 (pid: 747, threadinfo 8801483fe000, task 88014672efa0) Jul 1 18:25:57 Stack: Jul 1 18:25:57 880071c1fc28 8800c70165c0 88011e61ca28 Jul 1 18:25:57 880146ef41c0 880146ef4210 880146ef41d8 Jul 1 18:25:57 880146ef41c8 880146ef4200 880146ef41e8 a01669fa Jul 1 18:25:57 Call Trace: Jul 1 18:25:57 [] ? worker_loop+0x186/0x4a1 [btrfs] Jul 1 18:25:57 [] ? schedule+0x5ed/0x61a Jul 1 18:25:57 [] ? btrfs_queue_worker+0x24a/0x24a [btrfs] Jul 1 18:25:57 [] ? btrfs_queue_worker+0x24a/0x24a [btrfs] Jul 1 18:25:57 [] ? kthread+0x7a/0x82 Jul 1 18:25:57 [] ? kernel_thread_helper+0x4/0x10 Jul 1 18:25:57 [] ? kthread_worker_fn+0x147/0x147 Jul 1 18:25:57 [] ? gs_change+0x13/0x13 Jul 1 18:25:57 Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 fb ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff <0f> 0b eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48 Jul 1 18:25:57 RIP [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] Jul 1 18:25:57 RSP Jul 1 18:25:57 ---[ end trace 9744d33381de3d04 ]--- Jul 4 12:50:51 [ cut here ] Jul 4 12:50:51 kernel BUG at /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! 
Jul 4 12:50:51 invalid opcode: [#1] SMP Jul 4 12:50:51 CPU 0 Jul 4 12:50:51 Modules linked in: lm85 dme1737 hwmon_vid coretemp ipmi_si ipmi_msghandler sha256_generic cryptd aes_x86_64 aes_generic cbc fuse dm_crypt snd_pcm snd_timer snd sg soundcore i3200_edac snd_page_alloc sr_mod processor tpm_tis i2c_i801 pl2303 pcspkr thermal_sys i2c_core tpm edac_core tpm_bios cdrom usbserial container evdev psmouse button serio_raw ext4 mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress sd_mod crc_t10dif usb_storage uas uhci_hcd ahci libahci ehci_hcd libata e1000e usbcore scsi_mod [last unloaded: i2c_dev] Jul 4 12:50:51 Jul 4 12:50:51 Pid: 764, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 empty empty/Tyan Tank GT20 B5211 Jul 4 12:50:51 RIP: 0010:[] [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] Jul 4 12:50:51 RSP: 0018:880147ffdde0 EFLAGS: 00010246 Jul 4 12:50:51 RAX: RBX: ea0004648098 RCX: Jul 4 12:50:51 RDX: RSI: 05854000 RDI: 8800073f18d0 Jul 4 12:50:51 RBP: 0
Re: Memory leak?
2011-07-06 09:11:11 +0100, Stephane Chazelas: > 2011-07-03 13:38:57 -0600, cwillu: > > On Sun, Jul 3, 2011 at 1:09 PM, Stephane Chazelas > > wrote: > [...] > > > Now, on a few occasions (actually, most of the time), when I > > > rsynced the data (about 2.5TB) onto the external drive, the > > > system would crash after some time with "Out of memory and no > > > killable process". Basically, something in kernel was allocating > > > the whole memory, then oom mass killed everybody and crash. > [...] > > Look at the output of slabtop (should be installed by default, procfs > > package), before rsync for comparison, and during. > > Hi, > > so, no crash this time [...] Another attempt, again onto an empty drive, this time with 3.0.0-rc6. Exact same scenario. slabinfo: extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache (in bytes) 2011-07-07_10:40 143264 29952 0 120832 2011-07-07_10:50 16088160 59593248 22814 196352 2011-07-07_11:00 19549728 83101824 25890 475776 2011-07-07_11:10 17695040 52438464 163416000 758976 2011-07-07_11:20 16723168 55321344 180821095040 2011-07-07_11:30 15468640 46624032 1665680001344256 2011-07-07_11:40 14651648 43962048 1439160001944640 2011-07-07_11:509010144262080097320002231616 2011-07-07_12:004224352364665692240002473280 2011-07-07_12:102528416 45676819840002511040 2011-07-07_12:202400640 42681614520002590336 2011-07-07_12:302241888 33696010520002590336 2011-07-07_12:404669632 10224864 368040002662080 2011-07-07_12:504386976432057680320002662080 2011-07-07_13:00428243217933765862662080 2011-07-07_13:104270816123177628480002662080 2011-07-07_13:204255328114192023480002662080 2011-07-07_13:304255328113443222320002662080 2011-07-07_13:404255328113068821920002662080 2011-07-07_13:504255328113068821720002662080 2011-07-07_14:004239840113068821680002665856 2011-07-07_14:1042127367259616 28422669632 2011-07-07_14:2070083209936576 231760002673408 2011-07-07_14:3069153929929088 225120002677184 2011-07-07_14:4070470409929088 
253920002680960 2011-07-07_14:5085377609929088 263680002707392 2011-07-07_15:00 10469888 12804480 401160002718720 2011-07-07_15:10 13195776 14028768 47082726272 2011-07-07_15:20 13571360 13946400 457480002730048 2011-07-07_15:30 19580704 16200288 545960002737600 2011-07-07_15:40 19449056 16192800 524120002737600 2011-07-07_15:50 19445184 16192800 524120002737600 2011-07-07_16:00 19425824 19450080 66482741376 2011-07-07_16:10 24858240 25994592 910760002756480 2011-07-07_16:20 24246464 25930944 774480002760256 2011-07-07_16:30 25477760 35144928 1313120002767808 2011-07-07_16:40 40625024 85512960 3264440002767808 2011-07-07_16:50 53247744 102390912 346562767808 2011-07-07_17:00 63465952 113390784 3884160002786688 2011-07-07_17:10 10020736 13523328 24962805568 2011-07-07_17:2089714249345024 28902809344 2011-07-07_17:309257952 31408416 78042805568 2011-07-07_17:409257952 31378464 768120002805568 2011-07-07_17:509257952 31374720 758280002805568 2011-07-07_18:006857312 13725504 21942805568 2011-07-07_18:106605632 10707840 378160002813120 2011-07-07_18:20 10768032 17252352 540320002820672 2011-07-07_18:30 21737408 74595456 2104920002824448 2011-07-07_18:40 14554848 16967808 412320002832000 2011-07-07_18:506594016 10281024 308760002832000 2011-07-07_19:006322976 11467872 408040002835776 2011-07-07_19:107798208 12227904 46522843328 2011-07-07_19:209927808 13279968 46962850880 2011-07-07_19:30 10237568 13272480 466440002854656 2011-07-07_19:40 17160704 16368768 56742865984 2011-07-07_19:50 26039200 29105856 1018280002881088 2011-07-07_20:00 27878400 33988032 1155280002881088 2011-07-07_20:10 30604288 39151008 125962881088 2011-07-07_20:20 31339968 39049920 1254760002881088 2011-07-07_20:30 31297376 39042432 1279280002881088 2011-07-07_20:40 31390304 39038688 1290480002881088 2011-07-07_20:50 31390304 39038688 127282888640 2011-07-07_21:00 31390304 39038688 1271880002892416 2011-07-07_21:10 31738784 39038688 127162892416 2011-07-07_21:20 40299776 49342176 173702896192 
2011-07-07_21:30 46041952 58904352 206176000
Re: Memory leak?
2011-07-07 16:20:20 +0800, Li Zefan:
[...]
> btrfs_inode_cache is a slab cache for in-memory inodes, which is of
> struct btrfs_inode.
[...]

Thanks Li. If that's a cache, the system should be able to reuse the
space there when it's low on memory, shouldn't it? What would be the
conditions under which that couldn't be done? (Like in my case, where
the OOM killer was invoked to free memory rather than that cache memory
being reclaimed.)

Best regards,
Stephane
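One quick check on that question is whether the kernel itself accounts the slab memory as reclaimable. Below is a minimal sketch (the `slab_split` helper name is mine); a shrinkable cache such as btrfs_inode_cache would normally be counted under SReclaimable, while memory the kernel cannot drop ends up under SUnreclaim:

```shell
#!/bin/sh
# slab_split: report reclaimable vs unreclaimable slab memory from
# /proc/meminfo-style input on stdin.
slab_split() {
    awk '/^SReclaimable:/ { r = $2 }
         /^SUnreclaim:/   { u = $2 }
         END { printf "reclaimable=%dkB unreclaimable=%dkB\n", r, u }'
}

# On a live system: slab_split < /proc/meminfo
printf 'SReclaimable: 2266256 kB\nSUnreclaim: 65432 kB\n' | slab_split
# → reclaimable=2266256kB unreclaimable=65432kB
```

If the bulk of the growth showed up under SUnreclaim, that would point at memory the shrinkers cannot free, which is consistent with the OOM behaviour described.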
Re: Memory leak?
2011-07-06 09:11:11 +0100, Stephane Chazelas:
[...]
> extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
> (in bytes)
[...]
> 01:00 267192640 668595744 2321646000 3418048
> 01:10 267192640 668595744 2321646000 3418048
> 01:20 267192640 668595744 2321646000 3418048
> 01:30 267192640 668595744 2321646000 3418048
> 01:40 267192640 668595744 2321646000 3418048
[...]

I've just come across
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=4b9465cb9e3859186eefa1ca3b990a5849386320

GIT> author    Chris Mason  Fri, 3 Jun 2011 13:36:29 +0000 (09:36 -0400)
GIT> committer Chris Mason  Sat, 4 Jun 2011 12:03:47 +0000 (08:03 -0400)
GIT> commit 4b9465cb9e3859186eefa1ca3b990a5849386320
GIT> tree   8fc06452fb75e52f6c1c2e2253c2ff6700e622fd
GIT> parent e7786c3ae517b2c433edc91714e86be770e9f1ce
GIT>
GIT> Btrfs: add mount -o inode_cache
GIT>
GIT> This makes the inode map cache default to off until we
GIT> fix the overflow problem when the free space crcs don't fit
GIT> inside a single page.

I would have thought that would have disabled that btrfs_inode_cache,
and I can see that patch is in 3.0.0-rc5 (I'm not mounting with -o
inode_cache). So, why those 2.2GiB in btrfs_inode_cache above?

-- 
Stephane
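As a sanity check that the option really is off, the active mount options can be read back from /proc/mounts. A sketch (the `has_mount_opt` helper is made up for illustration; field 4 of a mounts line is the comma-separated option list):

```shell
#!/bin/sh
# has_mount_opt: scan /proc/mounts-style input on stdin and test whether
# the filesystem mounted at $1 carries mount option $2.
has_mount_opt() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, a, ",")
            for (i = 1; i <= n; i++) if (a[i] == opt) found = 1
        }
        END { exit !found }'
}

# On a live system: has_mount_opt /backup inode_cache < /proc/mounts
printf '/dev/sdb1 /backup btrfs rw,compress-force=zlib 0 0\n' |
    has_mount_opt /backup inode_cache && echo enabled || echo disabled
# → disabled
```

That the cache still grows while the option is absent is exactly the puzzle raised above.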
Re: Memory leak?
2011-07-03 13:38:57 -0600, cwillu: > On Sun, Jul 3, 2011 at 1:09 PM, Stephane Chazelas > wrote: [...] > > Now, on a few occasions (actually, most of the time), when I > > rsynced the data (about 2.5TB) onto the external drive, the > > system would crash after some time with "Out of memory and no > > killable process". Basically, something in kernel was allocating > > the whole memory, then oom mass killed everybody and crash. [...] > Look at the output of slabtop (should be installed by default, procfs > package), before rsync for comparison, and during. Hi, so, no crash this time, but at the end of the rsync, there's a whole chunk of memory that is no longer available to processes (about 3.5G). As suggested by Carey, I watched /proc/slabinfo during the rsync process (see below for a report of the most significant ones over time). Does that mean that if the system had had less than 3G of RAM, it would have crashed? I tried to reclaim the space without success. I had a process allocate as much memory as it could. Then I unmounted the btrfs fs that rsync was copying onto (the one on LUKS). btrfs_inode_cache hardly changed. Then I tried to unmount the source (the one on 3 hard disks and plenty of subvolumes). umount hung. The FS disappeared from /proc/mounts. Here is the backtrace: [169270.268005] umount D 880145ebe770 0 24079 1290 0x0004 [169270.268005] 880145ebe770 0086 8160b020 [169270.268005] 00012840 880123bc7fd8 880123bc7fd8 00012840 [169270.268005] 880145ebe770 880123bc6010 7fffac84f4a8 0001 [169270.268005] Call Trace: [169270.268005] [] ? rwsem_down_failed_common+0xda/0x10e [169270.268005] [] ? call_rwsem_down_write_failed+0x13/0x20 [169270.268005] [] ? down_write+0x25/0x27 [169270.268005] [] ? deactivate_super+0x30/0x3d [169270.268005] [] ? sys_umount+0x2ea/0x315 [169270.268005] [] ? system_call_fastpath+0x16/0x1b iostat shows nothing being written to the drives. 
extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache (in bytes) 09:40 131560 0 0128 09:50 131560 0 2000128 rsync started at 9:52 10:00 15832608 87963264 146583 325440 10:10 386056 85071168 1444656000 832960 10:20 237600 33988032 15497690001355584 10:30 23300288 145209600 4872740002237056 10:40 24667104 148610016 5064920002304448 10:50 22479776 139655808 5118930002382464 11:004137672104169645920002425344 11:1054985923211776 127420002443648 11:204567904140889695990002452608 11:302276736130982446850002453696 11:402225696 42192019870002455424 11:501971552 90432 3830002466176 12:001761672 74016 3270002469760 12:101939608 85824 4010002473536 12:202136288 121824 5510002479168 12:302367288 135648 6190002486016 12:401380984 181152 9110002485696 12:501053272 20217610270002483712 13:001938200 21974411520002491712 13:102037112 22377612040002494528 13:201775664 24912012440002497216 13:301704560 36604817320002500608 13:401433344 4688642462501824 13:508553248 20505888 673320002503168 14:00 12682208 34351200 1419680002494208 14:10 18488800 50282784 1778030002500544 14:20 19435592 46767744 1635820002505920 14:30 18734936 44863488 1565010002507200 14:40 21865184 46053504 1601850002484928 14:50 24457664 46473120 1624460002499200 15:00 24401344 47700576 1667230002502784 15:10 31390304 63426240 2211790002521472 15:20 34594560 61365600 2142430002524160 15:30 33836704 60934752 2126950002526400 15:40 33358776 60598944 2114550002528320 15:50 34909952 62583840 2184920002526272 16:00 44326656 65875392 2301230002529792 16:10 45840608 66373632 2321140002532736 16:20 47848064 66577536 2320480002535872 16:30 48013152 6160 2406510002536128 16:40 47594184 67766976 2362410002536576 16:50 48144184 67739904 236122542144 17:00 48005848 67639392 2352980002544000 17:10 48253920 67661280 2353760002537216 17:20 48857952 67612032 2349950002536000 17:30 48514752 67611168 2349860002535488 17:40 48436872 67609728 2349240002534528 17:50 48902216 67765248 2356540002542400 18:00 49055160 67763520 
236022542912 18:10 48749712 67727520 235742550464 18:20 48631088 67705344 2355570002553280 18:30 49101096 6344 235713000220 18:40 48609264 67782816 2356010002558912 18:50 48480080 67808160 235595000256179
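For reference, a table like the one above can be collected with a small sampler over /proc/slabinfo. This is only a sketch (the `slab_bytes` helper is mine, and it assumes the slabinfo 2.1 field layout, where fields 3 and 4 are num_objs and objsize):

```shell
#!/bin/sh
# slab_bytes: from slabinfo-2.1-style input on stdin, print the total
# size in bytes (num_objs * objsize) of each cache named as an argument.
slab_bytes() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        { for (i = 1; i <= n; i++)
              if ($1 == w[i]) printf "%s=%d\n", $1, $3 * $4 }'
}

# A sampler along the lines of what produced the table above might be:
#   while sleep 600; do
#       date +%Y-%m-%d_%H:%M
#       slab_bytes extent_map delayed_node btrfs_inode_cache \
#           btrfs_free_space_cache < /proc/slabinfo
#   done >> slab.log
printf 'extent_map 100 128 112 36 1 : tunables 0 0 0 : slabdata 4 4 0\n' |
    slab_bytes extent_map
# → extent_map=14336
```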
Memory leak?
Hiya,

I've got a server using btrfs to implement a backup system. Basically,
every night, for a few machines, I rsync (and other methods) their file
systems into one btrfs subvolume each and then snapshot it. On that
server, the btrfs fs is on 3 3TB drives, mounted with compress-force.
Every week, I rsync the most recent snapshot of a selection of
subvolumes onto an encrypted (LUKS) external hard drive (3TB as well).

Now, on a few occasions (actually, most of the time), when I rsynced
the data (about 2.5TB) onto the external drive, the system would crash
after some time with "Out of memory and no killable process".
Basically, something in the kernel was allocating the whole memory,
then the OOM killer mass-killed everybody and the system crashed. That
was with Ubuntu's 2.6.38. I had then moved to Debian and 2.6.39 and
thought the problem was fixed, but it just happened again with 3.0.0rc5
while rsyncing onto an initially empty btrfs fs.

I'm going to resume the rsync again, and it's likely to happen again.
Is there anything simple (as I've got very little time to look into
that) I could do to help debug the issue? (I'm not 100% sure it's
btrfs's fault, but that's the most likely culprit.) For a start, I'll
switch the console to serial and watch /proc/vmstat. Anything else I
could do?

Note that that server has never crashed when doing a lot of IO at the
same time in a lot of subvolumes with remote hosts. It's only when
copying data to that external drive on LUKS that it seems to crash.

Cheers,
Stephane
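The nightly scheme described above can be sketched as a short script. The hostnames and the /backup path are invented for illustration, and the `do_backup` helper is mine; with RUN=echo it is a dry run that only prints the commands instead of executing them:

```shell
#!/bin/sh
# do_backup: rsync each machine into its own subvolume, then take a
# read-only, date-stamped snapshot of it. With RUN=echo the commands are
# only printed; unset RUN to execute for real (needs root + btrfs-progs).
do_backup() { # usage: do_backup <pool> <host>...
    pool=$1; shift
    for host in "$@"; do
        $RUN btrfs subvolume create "$pool/$host"   # first run only
        $RUN rsync -aHAX --delete "$host:/" "$pool/$host/"
        $RUN btrfs subvolume snapshot -r "$pool/$host" \
            "$pool/snapshots/$host-$(date +%F)"
    done
}

RUN=echo                       # dry run
do_backup /backup alpha beta   # alpha/beta are hypothetical machines
```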
Re: [3.0.0rc5] invalid opcode
2011-07-01 15:07:52 -0400, Josef Bacik: > On 07/01/2011 01:44 PM, Stephane Chazelas wrote: [...] > > [ 8203.192146] kernel BUG at > > /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! > > [ 8203.192210] invalid opcode: [#1] SMP [...] > > [ 8203.193219] Process btrfs-fixup-0 (pid: 747, threadinfo > > 8801483fe000, task 88014672efa0) [...] > > [ 8203.193538] [] ? worker_loop+0x186/0x4a1 [btrfs] > > [ 8203.193579] [] ? schedule+0x5ed/0x61a > > [ 8203.193624] [] ? btrfs_queue_worker+0x24a/0x24a > > [btrfs] > > [ 8203.193673] [] ? btrfs_queue_worker+0x24a/0x24a > > [btrfs] > > [ 8203.193714] [] ? kthread+0x7a/0x82 > > [ 8203.193750] [] ? kernel_thread_helper+0x4/0x10 > > [ 8203.193788] [] ? kthread_worker_fn+0x147/0x147 > > [ 8203.193825] [] ? gs_change+0x13/0x13 > > [ 8203.193859] Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 > > fb ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff > > <0f> 0b eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48 > > [ 8203.194087] RIP [] > > btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] > > [ 8203.194160] RSP > > [ 8203.194907] ---[ end trace 9744d33381de3d04 ]--- > > > > Should I be worried? > > > > A little, can you reproduce it? Thanks, [...] Well no. I have no idea what triggered it. It was in the middle of doing an rsync from one compressed btrfs fs on 3 HDs (1 partition and 2 HDs) onto a btrfs FS on one LUKS device (compressed as well). That rsync is still happily ongoing and the server looks fine. Nothing else happening. What's btrfs-fixup meant to do? Cheers, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[3.0.0rc5] invalid opcode
Hi, I just got one of those: [ 8203.192107] [ cut here ] [ 8203.192146] kernel BUG at /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! [ 8203.192210] invalid opcode: [#1] SMP [ 8203.192246] CPU 1 [ 8203.192256] Modules linked in: sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt fuse snd_pcm psmouse tpm_tis tpm i2c_i801 snd_timer snd soundcore snd_page_alloc i3200_edac tpm_bios serio_raw evdev pcspkr processor button thermal_sys i2c_core container edac_core sg sr_mod cdrom ext4 mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress usb_storage sd_mod crc_t10dif uas uhci_hcd ahci libahci libata ehci_hcd e1000e scsi_mod usbcore [last unloaded: scsi_wait_scan] [ 8203.192603] [ 8203.192630] Pid: 747, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 empty empty/Tyan Tank GT20 B5211 [ 8203.192697] RIP: 0010:[] [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] [ 8203.192781] RSP: 0018:8801483ffde0 EFLAGS: 00010246 [ 8203.192816] RAX: RBX: ea000496a430 RCX: [ 8203.192855] RDX: RSI: 06849000 RDI: 880071c1fcb8 [ 8203.192893] RBP: 06849000 R08: 0008 R09: 8801483ffd98 [ 8203.192932] R10: dead00200200 R11: dead00100100 R12: 880071c1fd90 [ 8203.192971] R13: R14: 8801483ffdf8 R15: 06849fff [ 8203.193010] FS: () GS:88014fd0() knlGS: [ 8203.193067] CS: 0010 DS: ES: CR0: 8005003b [ 8203.193103] CR2: f7596000 CR3: 00013def9000 CR4: 06e0 [ 8203.193141] DR0: DR1: DR2: [ 8203.193180] DR3: DR6: 0ff0 DR7: 0400 [ 8203.193219] Process btrfs-fixup-0 (pid: 747, threadinfo 8801483fe000, task 88014672efa0) [ 8203.193277] Stack: [ 8203.193304] 880071c1fc28 8800c70165c0 88011e61ca28 [ 8203.193371] 880146ef41c0 880146ef4210 880146ef41d8 [ 8203.193434] 880146ef41c8 880146ef4200 880146ef41e8 a01669fa [ 8203.193497] Call Trace: [ 8203.193538] [] ? worker_loop+0x186/0x4a1 [btrfs] [ 8203.193579] [] ? schedule+0x5ed/0x61a [ 8203.193624] [] ? btrfs_queue_worker+0x24a/0x24a [btrfs] [ 8203.193673] [] ? 
btrfs_queue_worker+0x24a/0x24a [btrfs] [ 8203.193714] [] ? kthread+0x7a/0x82 [ 8203.193750] [] ? kernel_thread_helper+0x4/0x10 [ 8203.193788] [] ? kthread_worker_fn+0x147/0x147 [ 8203.193825] [] ? gs_change+0x13/0x13 [ 8203.193859] Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 fb ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff <0f> 0b eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48 [ 8203.194087] RIP [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] [ 8203.194160] RSP [ 8203.194907] ---[ end trace 9744d33381de3d04 ]--- Should I be worried? Cheers, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Re: [btrfs-progs integration] incorrect argument checking for "btrfs sub snap -r"
2011-07-01 11:42:23 +0100, Hugo Mills:
[...]
> > > diff --git a/btrfs.c b/btrfs.c
> > > index e117172..b50c58a 100644
> > > --- a/btrfs.c
> > > +++ b/btrfs.c
> > > @@ -49,7 +49,7 @@ static struct Command commands[] = {
> > >   /*
> > >      avoid short commands different for the case only
> > >   */
> > > - { do_clone, 2,
> > > + { do_clone, -2,
> > >   "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
> > >     "Create a writable/readonly snapshot of the subvolume <source> with\n"
> > >     "the name <name> in the <dest> directory.",
> > > diff --git a/btrfs_cmds.c b/btrfs_cmds.c
> > > index 1d18c59..3415afc 100644
> > > --- a/btrfs_cmds.c
> > > +++ b/btrfs_cmds.c
> > > @@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
> > >           return 1;
> > >       }
> > >   }
> > > - if (argc - optind < 2) {
> > > + if (argc - optind != 2) {
> > >     fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
> > >     free(argv);
> > >     return 1;
> > >
> > Thanks for having another look at this. You are perfectly right. Should
> > we patch my patch or should I rework a corrected version? What do you
> > think Hugo?
>
> Could you send a follow-up patch with just the second hunk, please?
> I screwed up the process with this (processing patches too quickly to
> catch the review), and I've already published the patch with the first
> hunk, above, into the for-chris branch.

Hugo, not sure what you mean nor whom you're talking to, but I can
certainly copy-paste the second hunk from above here:

diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 1d18c59..3415afc 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
          return 1;
      }
  }
- if (argc - optind < 2) {
+ if (argc - optind != 2) {
    fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
    free(argv);
    return 1;

Cheers,
Stephane
[PATCH] Re: [btrfs-progs integration] incorrect argument checking for "btrfs sub snap -r"
2011-06-30 22:55:15 +0200, Andreas Philipp:
> On 30.06.2011 14:34, Stephane Chazelas wrote:
> > Looks like this was missing in integration-20110626 for the
> > readonly snapshot patch:
> >
> > diff --git a/btrfs.c b/btrfs.c
> > index e117172..be6ece5 100644
> > --- a/btrfs.c
> > +++ b/btrfs.c
> > @@ -49,7 +49,7 @@ static struct Command commands[] = {
> >   /*
> >      avoid short commands different for the case only
> >   */
> > - { do_clone, 2,
> > + { do_clone, -1,
> >   "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
> >     "Create a writable/readonly snapshot of the subvolume <source> with\n"
> >     "the name <name> in the <dest> directory.",
> >
> > Without that, "btrfs sub snap -r x y" would fail as it's not *2*
> > arguments.
> Unfortunately, this is not correct either. "-1" means that the minimum
> number of arguments is 1 and since we need at least <source> and <name>
> this is 2. So the correct version should be -2.
[...]

Sorry, without looking closely at the source, I assumed -1 meant "defer
the checking to the subcommand handler". do_clone will indeed return an
error if the number of arguments is less than expected (so with -2,
you'll get a different error message for "btrfs sub snap -r foo" than
for "btrfs sub snap foo"), but, by the way, it will not if there are
more arguments than expected.
So the patch should probably be:

diff --git a/btrfs.c b/btrfs.c
index e117172..b50c58a 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -49,7 +49,7 @@ static struct Command commands[] = {
  /*
     avoid short commands different for the case only
  */
- { do_clone, 2,
+ { do_clone, -2,
  "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
    "Create a writable/readonly snapshot of the subvolume <source> with\n"
    "the name <name> in the <dest> directory.",
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 1d18c59..3415afc 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
          return 1;
      }
  }
- if (argc - optind < 2) {
+ if (argc - optind != 2) {
    fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
    free(argv);
    return 1;

Cheers,
Stephane
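The logic of the corrected check is easy to see outside C as well. Here is the same "exactly two operands after option parsing" rule sketched with POSIX sh getopts (`snap_args_ok` is an illustrative name, not part of btrfs-progs):

```shell
#!/bin/sh
# snap_args_ok: mimic do_clone's fixed argument check. After getopts has
# consumed an optional -r, exactly two operands must remain; this is the
# shell counterpart of "argc - optind != 2". The old "< 2" test would
# have silently accepted extra operands.
snap_args_ok() {
    OPTIND=1
    while getopts r opt "$@"; do
        case $opt in
        r) : ;;          # read-only flag, ignored here
        *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    [ "$#" -eq 2 ]
}

snap_args_ok -r src dst && echo accepted     # → accepted
snap_args_ok -r src     || echo too few      # → too few
snap_args_ok src dst extra || echo too many  # → too many
```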
[PATCH] [btrfs-progs integration] incorrect argument checking for "btrfs sub snap -r"
Looks like this was missing in integration-20110626 for the readonly
snapshot patch:

diff --git a/btrfs.c b/btrfs.c
index e117172..be6ece5 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -49,7 +49,7 @@ static struct Command commands[] = {
  /*
     avoid short commands different for the case only
  */
- { do_clone, 2,
+ { do_clone, -1,
  "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
    "Create a writable/readonly snapshot of the subvolume <source> with\n"
    "the name <name> in the <dest> directory.",

Without that, "btrfs sub snap -r x y" would fail as it's not *2*
arguments.

-- 
Stephane
Re: subvolumes missing from "btrfs subvolume list" output
2011-06-30 11:18:42 +0200, Andreas Philipp:
[...]
> >> After that, I posted a patch to fix btrfs-progs, which Chris
> >> aggreed on:
> >>
> >> http://marc.info/?l=linux-btrfs&m=129238454714319&w=2
> [...]
> >
> > Great. Thanks a lot
> >
> > It fixes my problem indeed.
> >
> > Which brings me to my next question: where to find the latest
> > btrfs-progs if not at
> > git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
[...]
> Hugo Mills keeps an integration branch with nearly all patches to
> btrfs-progs applied. See
>
> http://www.spinics.net/lists/linux-btrfs/msg10594.html
>
> and for the last update
>
> http://www.spinics.net/lists/linux-btrfs/msg10890.html
[...]

Thanks. It might be worth adding a link to that to
https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories

Note that it (integration-20110626) doesn't seem to include the fix in
http://marc.info/?l=linux-btrfs&m=129238454714319&w=2 though.

-- 
Stephane
Re: subvolumes missing from "btrfs subvolume list" output
2011-06-30 08:47:38 +0800, Li Zefan:
> Stephane Chazelas wrote:
> > 2011-06-29 15:37:47 +0100, Stephane Chazelas:
> > [...]
> >> I found
> >> http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208
> >>
> >> which looks like the same issue, with Li Zefan saying he had a
> >> fix, but I couldn't find any mention that it was actually fixed.
> >>
> >> Has anybody got any update on that?
> > [...]
> >
> > I've found
> > http://thread.gmane.org/gmane.comp.file-systems.btrfs/8232
>
> After that, I posted a patch to fix btrfs-progs, which Chris aggreed on:
>
> http://marc.info/?l=linux-btrfs&m=129238454714319&w=2
[...]

Great. Thanks a lot. It fixes my problem indeed.

Which brings me to my next question: where to find the latest
btrfs-progs if not at
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
(that one hasn't been modified in 8 months)? Have changes for read-only
subvolumes, per-directory settings... been applied to some publicly
available repository?

Cheers,
Stephane
Re: subvolumes missing from "btrfs subvolume list" output
2011-06-29 15:37:47 +0100, Stephane Chazelas:
[...]
> I found
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208
>
> which looks like the same issue, with Li Zefan saying he had a
> fix, but I couldn't find any mention that it was actually fixed.
>
> Has anybody got any update on that?
[...]

I've found
http://thread.gmane.org/gmane.comp.file-systems.btrfs/8232
but no corresponding fix in ioctl.c's history:
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=history;f=fs/btrfs/ioctl.c

I'm under the impression that the issue has been forgotten about. From
what I managed to gather though, it seems that what's on disk is
correct; it's just the ioctl and/or "btrfs sub list" that's wrong. Am I
right?

(BTW, I forgot to mention the kernel version: 3.0rc4 amd64, btrfs tools
from git.)

Cheers,
Stephane
subvolumes missing from "btrfs subvolume list" output
Hiya,

I've got a btrfs FS with 84 subvolumes in it (some created with "btrfs
sub create", some with "btrfs sub snap" of the other ones). There's no
nesting of subvolumes at all (they are all direct children of the root
subvolume). "btrfs subvolume list" is only showing 80 subvolumes. The 4
missing ones (1 original volume, 3 snapshots) do exist on disk, and
files in there have different st_devs from any other subvolume.

How would I start investigating what's wrong? And how would I fix it?

I found
http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208
which looks like the same issue, with Li Zefan saying he had a fix, but
I couldn't find any mention that it was actually fixed.

Has anybody got any update on that?

Thanks in advance,
Stephane
Re: different st_dev's in one subvolume
2011-06-02 01:39:41 +0100, Stephane Chazelas: [...] > /mnt/1# zstat +device ./**/* > . 25 > A 26 > A/B 27 > A/B/inB 27 > A/inA 26 > A.snap 28 > A.snap/B 23 > A.snap/inA 28 > > Why does A.snap/B have a different st_dev from A.snap's? [...] > If I create another snap of A or A.snap, the "B" in there gets > the same st_dev (23). [...] And same inode, ctime, mtime, atime... And when I create a new snapshot, all those (regardless of where they are) have their times updated at once. I also noticed the st_nlink is always one but then came across http://thread.gmane.org/gmane.comp.file-systems.btrfs/4580 -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
different st_dev's in one subvolume
Hiya, please consider this: ~# truncate -s1G ./a ~# mkfs.btrfs ./a ~# sudo mount -o loop ./a /mnt/1 ~# cd /mnt/1 /mnt/1# ls /mnt/1# btrfs sub c A Create subvolume './A' /mnt/1# btrfs sub c A/B Create subvolume 'A/B' /mnt/1# touch A/inA A/B/inB /mnt/1# btrfs sub snap A A.snap Create a snapshot of 'A' in './A.snap' /mnt/1# zmodload zsh/stat /mnt/1# zstat +device ./**/* . 25 A 26 A/B 27 A/B/inB 27 A/inA 26 A.snap 28 A.snap/B 23 A.snap/inA 28 Why does A.snap/B have a different st_dev from A.snap's? Also: /mnt/1# touch A.snap/B/foo touch: cannot touch `A.snap/B/foo': Permission denied I can rmdir that directory OK though. Also note that the permissions are different: /mnt/1# ll A total 0 drwx-- 1 root root 6 Jun 2 00:54 B/ -rw-r--r-- 1 root root 0 Jun 2 00:54 inA /mnt/1# ll A.snap total 0 drwxr-xr-x 1 root root 0 Jun 2 01:29 B/ -rw-r--r-- 1 root root 0 Jun 2 00:54 inA If I create another snap of A or A.snap, the "B" in there gets the same st_dev (23). /mnt/1# btrfs sub create A.snap/B/C Create subvolume 'A.snap/B/C' ERROR: cannot create subvolume # btrfs sub snap A.snap/B B.snap ERROR: 'A.snap/B' is not a subvolume -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
2011-05-27 13:49:52 +0200, Andreas Philipp: [...] > > Thanks, I can understand that. What I don't get is how one creates > > a subvol with a top-level other than 5. I might be missing the > > obvious, though. > > > > If I do: > > > > btrfs sub create A btrfs sub create A/B btrfs sub snap A A/B/C > > > > A, A/B, A/B/C have their top-level being 5. How would I get a new > > snapshot to be a child of A/B for instance? > > > > In my case, 285, was not appearing in the btrfs sub list output, > > 287 was a child of 285 with path "data" while all I did was create > > a snapshot of 284 (path u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol > > 5) in u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 > > > > So I did manage to get a volume with a parent other than 5, but I > > did not ask for it. [...] > Reconsidering the explanations on btrfs subvolume list in this thread > I get the impression that a line in the output of btrfs subvolume list > with top level other than 5 indicates that the backrefs from one > subvolume to its parent are broken. > > What's your opinion on this? [...] Given that I don't really get what the parent-child relationship means in that context, I can't really comment. In effect, the snapshot had been created and was attached to the right directory (but didn't appear in the sub list), and there was an additional "data" volume that I had not asked for nor created that had the snapshot above as parent and that did appear in the sub list. It pretty much looks like a bug to me, I'd like to understand more so that I can maybe try and avoid running into it again. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
2011-05-27 10:45:23 +0100, Hugo Mills: [...] > > How could a "subvolume 285" become a "top level"? > > > How does one get a subvolume with a top-level other than "5"? > >This just means that subvolume 287 was created (somewhere) inside > subvolume 285. > >Due to the way that the FS trees and subvolumes work, there's no > global namespace structure in btrfs; that is, there's no single data > structure that represents the entirety of the file/directory hierarchy > in the filesystem. Instead, it's broken up into these sub-namespaces > called subvolumes, and we only record parent/child relationships for > each subvolume separately. The "full path" you get from "btrfs subv > list" is reconstructed from that information in userspace(*). [...] Thanks, I can understand that. What I don't get is how one creates a subvol with a top-level other than 5. I might be missing the obvious, though. If I do: btrfs sub create A btrfs sub create A/B btrfs sub snap A A/B/C A, A/B, A/B/C have their top-level being 5. How would I get a new snapshot to be a child of A/B for instance? In my case, 285, was not appearing in the btrfs sub list output, 287 was a child of 285 with path "data" while all I did was create a snapshot of 284 (path u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol 5) in u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 So I did manage to get a volume with a parent other than 5, but I did not ask for it. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
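[Editorial note: Hugo's description of per-subvolume backrefs can be sketched as a toy path reconstruction. Each subvolume stores only (parent tree ID, name within the parent), and userspace walks those records up to the FS tree (ID 5) to rebuild the "full path". The records below are hypothetical, reusing the 285/287 IDs from the thread, with directory components collapsed into the name for brevity.]

```python
#!/usr/bin/env python3
# Toy model of the userspace path reconstruction "btrfs subvolume list"
# performs: there is no global namespace tree, only per-subvolume
# (parent_id, name_in_parent) backrefs that are walked up to the FS
# tree (ID 5).
FS_TREE = 5

def full_path(subvols, subvol_id):
    """Rebuild a subvolume's full path from backref records."""
    parts = []
    while subvol_id != FS_TREE:
        parent, name = subvols[subvol_id]
        parts.append(name)
        subvol_id = parent
    return "/".join(reversed(parts))

# Hypothetical records mirroring the 285/287 case in the thread:
subvols = {
    285: (FS_TREE, "snapshots/2011-03-30"),
    287: (285, "data"),
}
print(full_path(subvols, 287))   # -> snapshots/2011-03-30/data
```

This also shows why a broken or missing backref for 285 would make both 285 and the reconstructed path of 287 disappear or look wrong in the listing, while the on-disk directory stays reachable.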
Re: strange btrfs sub list output
2011-05-27 10:12:24 +0100, Hugo Mills: [skipped useful clarification] > >That's all rather dense, and probably too much information. Hope > it's helpful, though. [...] It is, thanks. How would one end up in a situation where the output of "btrfs sub list ." has: ID 287 top level 285 path data How could a "subvolume 285" become a "top level"? How does one get a subvolume with a top-level other than "5"? -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
Is there a way to derive the subvolume ID from the stat(2) st_dev, by the way? # btrfs sub list . ID 256 top level 5 path a ID 257 top level 5 path b # zstat +dev . a b . 27 a 28 b 29 Are the dev numbers allocated in the same order as the subvolids? Would there be any /sys, /proc, ioctl interface to get this kind of information? -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
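[Editorial note: as far as I know there is no st_dev-to-subvolume-ID interface, but the reverse question (path to containing subvolume ID) is answerable through the BTRFS_IOC_INO_LOOKUP ioctl, the same mechanism btrfs-progs uses internally. A hedged sketch follows: the ioctl number is derived here from the kernel's _IOWR macro for a 4096-byte argument struct, and actually invoking it needs a path on a mounted btrfs (and, on older kernels, root).]

```python
#!/usr/bin/env python3
# Sketch: find the ID of the subvolume containing a path via
# BTRFS_IOC_INO_LOOKUP.  With treeid=0 and objectid=256 (the first free
# objectid, i.e. a subvolume's root directory), the kernel fills in the
# tree ID of the subvolume the open fd lives in.
import fcntl
import os
import struct

BTRFS_FIRST_FREE_OBJECTID = 256
# _IOWR(0x94, 18, struct btrfs_ioctl_ino_lookup_args):
#   dir (read|write) = 3 << 30, size 4096 << 16, type 0x94 << 8, nr 18
BTRFS_IOC_INO_LOOKUP = (3 << 30) | (4096 << 16) | (0x94 << 8) | 18

def subvol_id(path):
    """Return the tree (subvolume) ID containing `path` (btrfs only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # struct btrfs_ioctl_ino_lookup_args {
        #     __u64 treeid; __u64 objectid; char name[4080]; }
        args = bytearray(struct.pack("=QQ4080x", 0, BTRFS_FIRST_FREE_OBJECTID))
        fcntl.ioctl(fd, BTRFS_IOC_INO_LOOKUP, args)
        treeid = struct.unpack_from("=QQ", args)[0]
        return treeid
    finally:
        os.close(fd)

if __name__ == "__main__":
    import sys
    print(subvol_id(sys.argv[1] if len(sys.argv) > 1 else "."))
```

The encoding is consistent with the strace output later in this archive, where BTRFS_IOC_SNAP_DESTROY shows up as 0x5000940f (_IOW, same 0x94 type, nr 15).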
Re: strange btrfs sub list output
2011-05-27 10:21:03 +0200, Andreas Philipp: [...] > > What do those top-level IDs mean by the way? > The top-level ID associated with a subvolume is NOT the ID of this > particular subvolume but of the subvolume containing it. Since the > "root/initial" (sub-)volume always has ID 0, the subvolumes of "depth" > 1 will all have top-level ID set to 0. You need those top-level IDs to > correctly mount a specific subvolume by name. > > # mount /dev/dummy -o subvol=<name>,subvolrootid=<top-level-ID> /mountpoint > > Of course, you do not need them if you specify the subvolume to mount by > its ID. [...] Thanks Andreas for pointing out that subvolrootid (might be worth adding it to https://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options BTW). In my case, on a freshly made btrfs file system, subvolumes have top-level 5 (and neither volume with id 0 nor 5 appears in the btrfs sub list). All the top-levels are 5, and I don't even know how to create a subvolume with a different top-level there, so I wonder how that subvol that I had created with btrfs sub snap data snapshots/2011-03-30 ended up being a subvolume with ID 285 that doesn't appear in the "btrfs sub list" and contains a subvolume of "path" "data" in there (with its top-level being 285). All the other subvolumes and snapshots I've created in the exact same way are created with a top-level 5, have an entry in "btrfs sub list" and don't have subvolumes of their own. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
2011-05-26 22:22:03 +0100, Stephane Chazelas: [...] > I get a btrfs sub list output that I don't understand: > > # btrfs sub list /backup/ > ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data > ID 260 top level 5 path u2/linux/lvm/linux/var/data > ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11 > ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07 > ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07 > ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07 > ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26 > ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08 > ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22 > ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15 > ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14 > ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14 > ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14 > ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29 > ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26 > ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07 > ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01 > ID 277 top level 5 path u2/linux/lvm/linux/home/data > ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27 > ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27 > ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27 > ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data > ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19 > ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data > ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data > ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24 > ID 287 
top level 285 path data > ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data > ID 289 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11 > ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data > ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11 > ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11 [...] > There is no "/backup/data" directory. There is however a > /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that > contains the same thing as what I get if I mount the fs with > subvolid=287. And I did do a btrfs sub snap data > snapshots/2011-03/30 there. > > What could be the cause of that? How to fix it? > > In case that matters, there used to be more components in the > path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data. [...] I tried deleting the /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 subvolume (what seems to be id 287) and I get: # btrfs sub delete snapshots/2011-03-30 Delete subvolume '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30' ERROR: cannot delete '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30' With a strace, it tells me: ioctl(3, 0x5000940f, 0x7fffc7841a80)= -1 ENOTEMPTY (Directory not empty) Then I realised that there was a "data" directory in there and that snapshots/2011-03-30 was actually id 285 (which doesn't appear in the btrfs sub list) and "snapshots/2011-03-30/data" is id 287. What do those top-level IDs mean by the way? Then I was able to delete snapshots/2011-03-30/data, but snapshots/2011-03-30 still didn't appear in the list. Then I was able to delete snapshots/2011-03-30 and recreate it, and this time it was fine. Still don't know what happened there. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
strange btrfs sub list output
Hiya, I get a btrfs sub list output that I don't understand: # btrfs sub list /backup/ ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data ID 260 top level 5 path u2/linux/lvm/linux/var/data ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11 ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07 ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07 ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07 ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26 ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08 ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22 ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15 ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14 ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14 ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14 ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29 ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26 ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07 ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01 ID 277 top level 5 path u2/linux/lvm/linux/home/data ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27 ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27 ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27 ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19 ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24 ID 287 top level 285 path data ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data ID 289 top level 5 path 
u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11 ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11 ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11 See ID 287 above. There is no "/backup/data" directory. There is however a /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that contains the same thing as what I get if I mount the fs with subvolid=287. And I did do a btrfs sub snap data snapshots/2011-03/30 there. What could be the cause of that? How to fix it? In case that matters, there used to be more components in the path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data. Thanks, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: curious writes on mounted, not used btrfs filesystem
2011-05-22 11:52:37 +0200, Tomasz Chmielewski: [...] > Can you try running these commands yourself: > > iostat -k 1 > > > And in a second terminal: > > while true; do sync ; done > > > To see if your btrfs makes writes on sync each time? [...] Yes it does. And I see: <7>[38554.244219] sync(3354): dirtied inode 1 (?) on dm-11 <7>[38554.244740] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors) <7>[38554.244774] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors) <7>[38554.249963] sync(3354): WRITE block 128 on dm-11 (8 sectors) <7>[38554.250010] sync(3354): WRITE block 131072 on dm-11 (8 sectors) <7>[38567.312908] sync(3356): dirtied inode 1 (?) on dm-11 <7>[38567.313330] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors) <7>[38567.313350] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors) <7>[38567.313358] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors) <7>[38567.313366] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors) <7>[38567.313393] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors) <7>[38567.313403] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors) <7>[38567.325194] sync(3356): WRITE block 128 on dm-11 (8 sectors) <7>[38567.325244] sync(3356): WRITE block 131072 on dm-11 (8 sectors) <7>[38570.374449] sync(3358): dirtied inode 1 (?) on dm-11 <7>[38570.374976] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors) <7>[38570.375011] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors) <7>[38570.379221] sync(3358): WRITE block 128 on dm-11 (8 sectors) <7>[38570.379272] sync(3358): WRITE block 131072 on dm-11 (8 sectors) <7>[38572.170816] sync(3359): dirtied inode 1 (?) 
on dm-11 <7>[38572.171289] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors) <7>[38572.171300] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors) <7>[38572.171304] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors) <7>[38572.171308] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors) <7>[38572.171320] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors) <7>[38572.171325] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors) <7>[38572.180338] sync(3359): WRITE block 128 on dm-11 (8 sectors) <7>[38572.180386] sync(3359): WRITE block 131072 on dm-11 (8 sectors) <7>[38574.186559] sync(3360): dirtied inode 1 (?) on dm-11 <7>[38574.187090] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors) <7>[38574.187125] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors) <7>[38574.191602] sync(3360): WRITE block 128 on dm-11 (8 sectors) <7>[38574.191654] sync(3360): WRITE block 131072 on dm-11 (8 sectors) <7>[38576.370003] sync(3361): dirtied inode 1 (?) on dm-11 <7>[38576.370452] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors) <7>[38576.370470] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors) <7>[38576.370478] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors) <7>[38576.370485] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors) <7>[38576.370513] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors) <7>[38576.370523] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors) <7>[38576.379718] sync(3361): WRITE block 128 on dm-11 (8 sectors) <7>[38576.379766] sync(3361): WRITE block 131072 on dm-11 (8 sectors) Every other sync the same. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
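[Editorial note: a block_dump trace like the one quoted above can be aggregated per process with a short script. The sketch below is mine, not from the thread; it assumes 512-byte sectors and the exact "name(pid): WRITE block N on dev (S sectors)" message format shown.]

```python
#!/usr/bin/env python3
# Sketch: sum bytes written per (process, device) from
# /proc/sys/vm/block_dump style kernel messages, assuming 512-byte
# sectors and the message format seen in the quoted dmesg output.
import re
from collections import defaultdict

LINE = re.compile(r'(\S+)\((\d+)\): WRITE block \d+ on (\S+) \((\d+) sectors\)')

def written_per_process(lines):
    totals = defaultdict(int)
    for line in lines:
        m = LINE.search(line)
        if m:
            name, _pid, dev, sectors = m.groups()
            totals[(name, dev)] += int(sectors) * 512
    return dict(totals)

sample = [
    "<7>[38554.244740] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors)",
    "<7>[38554.249963] sync(3354): WRITE block 128 on dm-11 (8 sectors)",
]
print(written_per_process(sample))
# -> {('btrfs-submit-0', 'dm-11'): 28672, ('sync', 'dm-11'): 4096}
```

Feeding it the full dmesg excerpt above shows at a glance that btrfs-submit-0 accounts for nearly all the bytes written on each otherwise-idle sync.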
Re: curious writes on mounted, not used btrfs filesystem
2011-05-21 14:58:21 +0200, Tomasz Chmielewski: > I have a btrfs filesystem (2.6.39) which is mounted, but otherwise, not used: > > # lsof -n|grep /mnt/btrfs processes with open fds are one thing. You could also have loop devices set up on it, for instance. > # > > > I noticed that whenever I do "sync", btrfs will write for around 6.5s and > write 13 MB (see below). [...] You could try and play with /proc/sys/vm/block_dump to see what is being written (remember to disable logging of kernel messages by syslog). -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ssd option for USB flash drive?
2011-05-19 15:54:23 -0600, cwillu: [...] > Try with the "ssd_spread" mount option. [...] Thanks. I'll try that. > > I wonder now what credit to give to recommendations like in > > http://www.patriotmemory.com/forums/showthread.php?3696-HOWTO-Increase-write-speed-by-aligning-FAT32 > > http://linux-howto-guide.blogspot.com/2009/10/increase-usb-flash-drive-write-speed.html > > > > Doing an apt-get upgrade on that stick takes hours when the same > > takes a few minutes on an internal drive. > > Also, there's a package "libeatmydata" which will provide an > "eatmydata" command, which you can prefix your apt-get commands with. > This will disable the excessive sync calls that dpkg makes, and should > dramatically decrease the time for those sorts of things to finish. > Flash as found in thumb drives doesn't have much in the way of crash > guarantees anyway, so you're not really giving up much safety. Thanks. That's very useful indeed. Note that if you use that on aptitude/apt-get that means that the daemons started/restarted in the process will be affected, but it could be all the better in my case. Now, with that eatmydata, I'm thinking of trying qemu-nbd -c /dev/nbd0 /dev/mapper/original-device with that and have the rootfs mounted on that /dev/nbd0. That eatmydata could be a workaround to the problem I was mentioning at https://lists.ubuntu.com/archives/ubuntu-server-bugs/2010-June/037846.html -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ssd option for USB flash drive?
2011-05-19 21:04:54 +0200, Hubert Kario: > On Wednesday 18 of May 2011 00:02:52 Stephane Chazelas wrote: > > Hiya, > > > > I've not found much detail on what the "ssd" btrfs mount option > > did. Would it make sense to enable it to a fs on a USB flash > > drive? > > yes, enabling discard is pointless though (no USB storage supports it AFAIK). > > > I'm using btrfs (over LVM) on a Live Linux USB stick to benefit > > from btrfs's compression and am trying to improve the > > performance. > > ssd mode won't improve performance by much (if any). > > You need to remember that USB2.0 is limited to about 20-30MiB/s (depending on > CPU) so it will be slow no matter what you do Thanks Hubert for the feedback. Well, for hard drives over USB, I can get to 40MiB/s read and write easily. Here, I believe the bottleneck is the flash memory. With that particular USB flash drive Corsair Voyager GT 16GB, I can get 25MiB/s sequential read and 17MiB/s sequential write, but that falls down to about 3-5MiB/s random write. [...] > aligning logical blocks to erase blocks can give some performance but the > only > way to make it really fast is not to use USB [...] For something that fits in your pocket and is almost universally bootable, there are not so many other options. I tried changing the alignment on FAT32 and it didn't make any difference. Playing with /proc/sys/vm/block_dump, I could see chunks of 3, 4, 5 data sectors being written at once regardless of the cluster size being used anyway. Interestingly when a user process writes to /dev/sdx, block_dump shows 4k writes to /dev/sdx only regardless of the size of the user writes while if it goes via the filesystem I can see writes of up to 120k.
Also, I've very little knowledge of what happens at layers below the block device (scsi interface, usb-storage, and the device controller itself, for instance, I see /sys/block/sdi/queue/rotational is 1 for that usb stick, why, what does that mean in terms of performance and scheduling of read-writes?) I wonder now what credit to give to recommendations like in http://www.patriotmemory.com/forums/showthread.php?3696-HOWTO-Increase-write-speed-by-aligning-FAT32 http://linux-howto-guide.blogspot.com/2009/10/increase-usb-flash-drive-write-speed.html Doing an apt-get upgrade on that stick takes hours when the same takes a few minutes on an internal drive. If I boot a kvm virtual machine on that USB stick with a disk cache mode of "unsafe", that is writes are hardly ever flushed to underlying storage, then that becomes lightning fast (at the expense of possibly losing data in case of host failure, but I'm not too worried about that), and flushing writes to device upon VM shutdown only takes a couple of minutes. So I figured that if I could make sure writing to the flash device is asynchronous (and reads privileged), that would help. There are probably some solutions with aufs or fuse, but I thought there might be a solution in btrfs or some standard core layers usually underneath it. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ssd option for USB flash drive?
Hiya, I've not found much detail on what the "ssd" btrfs mount option did. Would it make sense to enable it for a fs on a USB flash drive? I'm using btrfs (over LVM) on a Live Linux USB stick to benefit from btrfs's compression and am trying to improve the performance. Would anybody have any recommendation on how to improve performance there? Like what would be the best way to enable/increase writeback buffer or any way to make sure writes are delayed and asynchronous? Would disabling read-ahead help? (at which level would it be done?). Any other tip (like disabling atime, aligning blocks/extents, figure out erase block sizes if relevant...)? Many thanks in advance, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: wrong values in "df" and "btrfs filesystem df"
2011-04-12 15:22:57 +0800, Miao Xie: [...] > But the algorithm of df command doesn't simulate the above allocation > correctly, this > simulated allocation just allocates the stripes from two disks, and then, > these two disks > have no free space, but the third disk still has 1.2TB free space, df command > thinks > this space can be used to make a new RAID0 block group and ignores it. This > is a bug, > I think. [...] Thanks a lot Miao for the detailed explanation. So, the disk space is not lost, it's just df not reporting the available space correctly. That's me relieved. It explains why I'm getting: # blockdev --getsize64 /dev/sda4 2967698087424 # blockdev --getsize64 /dev/sdb 3000592982016 # blockdev --getsize64 /dev/sdc 3000592982016 # truncate -s 2967698087424 a # truncate -s 3000592982016 b # truncate -s 3000592982016 c # losetup /dev/loop0 ./a # losetup /dev/loop1 ./b # losetup /dev/loop2 ./c # mkfs.btrfs a b c # btrfs device scan /dev/loop[0-2] Scanning for Btrfs filesystems in '/dev/loop0' Scanning for Btrfs filesystems in '/dev/loop1' Scanning for Btrfs filesystems in '/dev/loop2' # mount /dev/loop0 /mnt/1 # df -k /mnt/1 Filesystem 1K-blocks Used Available Use% Mounted on /dev/loop0 8758675828 56 5859474304 1% /mnt/1 # echo $(((8758675828 - 5859474304)*2**10)) 2968782360576 One disk worth of space lost according to df. While it should have been more something like $(((3000592982016-2967698087424)*2)) (about 60GB), or about 0 after the quasi-round-robin allocation patch, right? Best regards, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
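[Editorial note: the numbers above can be cross-checked directly; this sketch only restates the post's shell arithmetic in one place, with all sizes in bytes.]

```python
#!/usr/bin/env python3
# Restate the arithmetic from the df output above (all sizes in bytes).
disk_sizes = [2967698087424, 3000592982016, 3000592982016]   # sda4, sdb, sdc

blocks_1k, available_1k = 8758675828, 5859474304             # from df -k
unaccounted = (blocks_1k - available_1k) * 1024
print(unaccounted)               # 2968782360576: almost exactly one disk

# What would be genuinely unallocatable if every chunk had to stripe all
# three disks: the amount by which the two larger disks exceed the
# smaller one.
wasted_if_three_stripes = (3000592982016 - 2967698087424) * 2
print(wasted_if_three_stripes)   # 65789789184: the "about 60GB" in the post
```

So the df "Available" figure understates the free space by roughly one whole disk, while the true unallocatable remainder is at most ~61 GiB (and about zero once allocation spreads round-robin).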
Re: btrfs balancing start - and stop?
2011-04-06 12:43:50 +0100, Stephane Chazelas: [...] > The rate is going down. It's now down to about 14kB/s > > [658654.295752] btrfs: relocating block group 3919858106368 flags 20 > [671932.913235] btrfs: relocating block group 3919589670912 flags 20 > [686189.296126] btrfs: relocating block group 3919321235456 flags 20 > [701511.523990] btrfs: relocating block group 391905280 flags 20 > [718591.316339] btrfs: relocating block group 3918784364544 flags 20 > [725567.081031] btrfs: relocating block group 3918515929088 flags 20 > [744415.011581] btrfs: relocating block group 3918247493632 flags 20 > [762365.021458] btrfs: relocating block group 3917979058176 flags 20 > [780504.726067] btrfs: relocating block group 3917710622720 flags 20 [...] > At this rate, the balancing would be over in about 8 years. [...] Hurray! The btrfs balance eventually ran through after almost exactly 2 weeks. It didn't get down to 0: [1189505.152717] btrfs: found 60527 extents [1189505.440565] btrfs: relocating block group 3910731300864 flags 20 [1199805.071045] btrfs: found 60235 extents [1199805.447821] btrfs: relocating block group 3910462865408 flags 20 [1207914.737372] btrfs: found 58039 extents iostat reckons 9TB have been written to disk in the whole process (4.5TB read from them (!?)). There hasn't been any change in allocation though: # df -h /backup FilesystemSize Used Avail Use% Mounted on /dev/sda4 8.2T 3.5T 3.2T 53% /backup # btrfs fi df /backup Data, RAID0: total=3.42TB, used=3.41TB System, RAID1: total=16.00MB, used=228.00KB Metadata, RAID1: total=28.00GB, used=20.47GB # btrfs fi show Label: none uuid: a0ae35c4-51f2-405f-a4bb-e4f134b1d193 Total devices 3 FS bytes used 3.43TB devid4 size 2.73TB used 1.17TB path /dev/sdc devid3 size 2.73TB used 1.17TB path /dev/sdb devid2 size 2.70TB used 1.14TB path /dev/sda4 Btrfs Btrfs v0.19 Still 1.5TB missing. 
-- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: wrong values in "df" and "btrfs filesystem df"
2011-04-10 18:13:51 +0800, Miao Xie: [...] > >> # df /srv/MM > >> > >> Filesystem 1K-blocks Used Available Use% Mounted on > >> /dev/sdd15846053400 1593436456 2898463184 36% /srv/MM > >> > >> # btrfs filesystem df /srv/MM > >> > >> Data, RAID0: total=1.67TB, used=1.48TB > >> System, RAID1: total=16.00MB, used=112.00KB > >> System: total=4.00MB, used=0.00 > >> Metadata, RAID1: total=3.75GB, used=2.26GB > >> > >> # btrfs-show > >> > >> Label: MMedia uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2 > >>Total devices 3 FS bytes used 1.48TB > >>devid3 size 1.81TB used 573.76GB path /dev/sdb1 > >>devid2 size 1.81TB used 573.77GB path /dev/sde1 > >>devid1 size 1.82TB used 570.01GB path /dev/sdd1 > >> > >> Btrfs Btrfs v0.19 > >> > >> > >> > >> "df" shows an "Available" value which isn't related to any real value. > > > >I _think_ that value is the amount of space not allocated to any > > block group. If that's so, then Available (from df) plus the three > > "total" values (from btrfs fi df) should equal the size value from df. > > This value excludes the space that can not be allocated to any block group, > This feature was implemented to fix the bug df command add the disk space, > which > can not be allocated to any block group forever, into the "Available" value. > (see the changelog of the commit 6d07bcec969af335d4e35b3921131b7929bd634e) > > This implementation just like fake chunk allocation, but the fake allocation > just allocate the space from two of these three disks, doesn't spread the > stripes over all the disks, which has enough space. [...] Hi Miao, would you care to expand a bit on that. In Helmut's case above where all the drives have at least 1.2TB free, how would there be un-allocatable space? What's the implication of having disks of differing sizes? Does that mean that the extra space on larger disks is lost? 
Thanks, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
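[Editorial note: Miao's explanation can be made concrete with a toy allocator. This is a simplified model of my own, not the kernel's actual code: a RAID0 chunk takes an equal stripe from every disk that still has free space (minimum two), whereas the buggy df simulation only ever drew stripes from two disks.]

```python
#!/usr/bin/env python3
# Toy model of RAID0 chunk allocation over unequal disks (not the
# kernel's actual allocator).  Each chunk stripes equally over the n
# disks with the most free space; pass max_stripes=2 to mimic the buggy
# df simulation that only ever used two disks.
def usable_raid0(free, max_stripes=None):
    free = sorted(free, reverse=True)
    used = 0
    while True:
        n = sum(1 for f in free if f > 0)
        if max_stripes is not None:
            n = min(n, max_stripes)
        if n < 2:               # RAID0 needs at least two stripes
            return used
        step = free[n - 1]      # drain down to the smallest contributor
        used += step * n
        for i in range(n):
            free[i] -= step
        free.sort(reverse=True)

disks = [2967698087424, 3000592982016, 3000592982016]    # sda4, sdb, sdc
print(sum(disks) - usable_raid0(disks))                  # 0: nothing lost
print(sum(disks) - usable_raid0(disks, max_stripes=2))   # 2967698087424
```

Spreading over all disks leaves nothing unallocatable for these sizes, while the two-disk model strands exactly one disk's worth (2967698087424 bytes), matching the discrepancy in the df output quoted elsewhere in the thread.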
Re: wrong values in "df" and "btrfs filesystem df"
2011-04-09 10:11:41 +0100, Hugo Mills: [...] > > # df /srv/MM > > > > Filesystem 1K-blocks Used Available Use% Mounted on > > /dev/sdd15846053400 1593436456 2898463184 36% /srv/MM > > > > # btrfs filesystem df /srv/MM > > > > Data, RAID0: total=1.67TB, used=1.48TB > > System, RAID1: total=16.00MB, used=112.00KB > > System: total=4.00MB, used=0.00 > > Metadata, RAID1: total=3.75GB, used=2.26GB > > > > # btrfs-show > > > > Label: MMedia uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2 > > Total devices 3 FS bytes used 1.48TB > > devid3 size 1.81TB used 573.76GB path /dev/sdb1 > > devid2 size 1.81TB used 573.77GB path /dev/sde1 > > devid1 size 1.82TB used 570.01GB path /dev/sdd1 > > > > Btrfs Btrfs v0.19 > > > > > > > > "df" shows an "Available" value which isn't related to any real value. > >I _think_ that value is the amount of space not allocated to any > block group. If that's so, then Available (from df) plus the three > "total" values (from btrfs fi df) should equal the size value from df. [...] Well, $ echo $((2898463184 + 1.67*2**30 + 4*2**10 + 16*2**10*2 + 3.75*2**20*2)) 4699513214.079 I do get the same kind of discrepancy: $ df -h /mnt FilesystemSize Used Avail Use% Mounted on /dev/sdb 8.2T 3.5T 3.2T 53% /mnt $ sudo btrfs fi show Label: none uuid: ... Total devices 3 FS bytes used 3.43TB devid4 size 2.73TB used 1.17TB path /dev/sdc devid3 size 2.73TB used 1.17TB path /dev/sdb devid2 size 2.70TB used 1.14TB path /dev/sda4 $ sudo btrfs fi df /mnt Data, RAID0: total=3.41TB, used=3.41TB System, RAID1: total=16.00MB, used=232.00KB Metadata, RAID1: total=35.25GB, used=20.55GB $ echo $((3.2 + 3.41 + 2*16/2**20 + 2*35.25/2**10)) 6.678847656253 -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: cloning single-device btrfs file system onto multi-device one
2011-03-28 14:17:48 +0100, Stephane Chazelas:
[...]
> So here is how I transferred a 6TB btrfs on one 6TB raid5 device
> (on host src) over the network onto a btrfs on 3 3TB hard drives
[...]
> I then did a btrfs fi balance again and let it run through. However
> here is what I get:
[...]

Sorry, it didn't run through: it is still running (after 9 days), and
there are indications it could still be running 8 years from now (see
other thread).

There hasn't been any change in the amount of free space reported by
df since the beginning of the balance (there are still 2TB missing).

Cheers,
Stephane
Re: btrfs balancing start - and stop?
2011-04-04 20:07:54 +0100, Stephane Chazelas:
[...]
> > > 4.7 more days to go. And I reckon it will have written about 9
> > > TB to disk by that time (which is the total size of the volume,
> > > though only 3.8TB are occupied).
> >
> > Yes - that's the pessimistic estimation. As Hugo has explained it can
> > finish faster - just look to the data tomorrow again.
> [...]
>
> That may be an optimistic estimation actually, as there hasn't
> been much progress in the last 34 hours:
[...]

The rate is going down. It's now down to about 14kB/s:

[658654.295752] btrfs: relocating block group 3919858106368 flags 20
[671932.913235] btrfs: relocating block group 3919589670912 flags 20
[686189.296126] btrfs: relocating block group 3919321235456 flags 20
[701511.523990] btrfs: relocating block group 3919052800000 flags 20
[718591.316339] btrfs: relocating block group 3918784364544 flags 20
[725567.081031] btrfs: relocating block group 3918515929088 flags 20
[744415.011581] btrfs: relocating block group 3918247493632 flags 20
[762365.021458] btrfs: relocating block group 3917979058176 flags 20
[780504.726067] btrfs: relocating block group 3917710622720 flags 20

Even though it is reading and writing to disk at a much higher rate.
Here are the stats every second:

--dsk/sda-- --dsk/sdb-- --dsk/sdc--
 read  writ: read  writ: read  writ
   0     0 : 540k    0 :  12k    0
   0     0 : 704k    0 :  20k    0
   0     0 :1068k    0 :  24k    0
   0     0 : 968k    0 :   0     0
   0     0 : 932k    0 :4096B    0
   0     0 : 832k  880k: 152k 1320k
  60k 4096B: 880k  140k:   0    28M
  68k    0 : 308k    0 :4096B 9240k
   0    48k:   0     0 :   0  7852k
   0     0 : 576k 6192k:4096B   26M
   0     0 : 100k   18M:   0     0
   0     0 :  28k   10M:   0     0
   0     0 :   0  7020k:   0     0
   0     0 :  52k   13M:   0     0
   0    12k: 528k   17M:   0    12k
   0     0 : 884k    0 :8192B    0
   0     0 :1068k    0 :  20k    0
   0     0 : 660k    0 :   0     0
   0    40k: 776k    0 :4096B    0
   0     0 : 576k    0 :   0     0
   0     0 : 596k    0 :8192B    0
1096k   28k: 664k    0 :4096B    0
   0     0 : 660k    0 :   0     0
   0     0 : 592k    0 :8192B    0

At this rate, the balancing would be over in about 8 years.
Since the start of the balance:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              10.04         1.56         1.57    1228286    1237359
sdc             396.24         1.77         3.95    1397015    3115057
sdb             421.17         1.87         3.95    1473759    3115093

I think that's the end of my attempt to transfer that FS to another
machine (see other thread). I'll have to ditch that copy and try again
from scratch with another approach.

Before I do that, is there anything I can do to help investigate the
problem?

regards,
Stephane
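[Editorial aside, not part of the original message: the ~14kB/s figure quoted above can be sanity-checked, since consecutive "relocating block group" offsets in the dmesg excerpt differ by exactly 268435456 bytes, i.e. one 256MiB block group per interval. A sketch using the last two timestamps:]

```shell
# Each "relocating block group" dmesg line is one 256MiB block group
# below the previous one, so the relocation rate between two adjacent
# samples is 256MiB divided by the timestamp delta.
awk 'BEGIN {
  t1 = 762365.021458; t2 = 780504.726067   # last two timestamps quoted
  printf "%.1f KiB/s\n", 256 * 1024 / (t2 - t1)
}'
```

which comes out at roughly the 14kB/s the message reports.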
Re: btrfs balancing start - and stop?
2011-04-03 21:35:00 +0200, Helmut Hullen:
> Hello, Stephane,
>
> You wrote on 03.04.11:
>
> > balancing about 2 TByte needed about 20 hours.
> [...]
>
> >> Hugo has explained the limits of regarding
> >>
> >> dmesg | grep relocating
> >>
> >> or (more simple) the last lines of "dmesg" and looking for the
> >> "relocating" lines. But: what do these lines tell now? What is the
> >> (pessimistic) estimation when you extrapolate the data?
> [...]
>
> > 4.7 more days to go. And I reckon it will have written about 9
> > TB to disk by that time (which is the total size of the volume,
> > though only 3.8TB are occupied).
>
> Yes - that's the pessimistic estimation. As Hugo has explained it can
> finish faster - just look to the data tomorrow again.
[...]

That may be an optimistic estimation actually, as there hasn't
been much progress in the last 34 hours:

# dmesg | awk -F '[][ ]+' '/reloc/ && c++%5==0 {x=(n-$7)/($2-t)/1048576; printf "%s\t%s\t%.2f\t%*s\n", $2/3600, $7, x, x/3, ""; t=$2; n=$7}' | tr ' ' '*' | tail -40
125.629	4170039951360	11.93	***
125.641	4166818725888	70.99	***
125.699	4157155049472	43.87	**
125.753	4144270147584	63.34	*
125.773	4137827696640	84.98
125.786	4134606471168	64.39	*
125.823	4124942794752	70.09	***
125.87	4112057892864	71.66	***
125.887	4105615441920	100.60	*
125.898	4102394216448	81.26	***
125.935	4092730540032	69.06	***
126.33	4085751218176	4.69	*
131.904	4072597880832	0.63
132.082	4059712978944	19.20	**
132.12	4053270528000	45.52	***
132.138	4050049302528	45.60	***
132.225	4040385626112	29.68	*
132.267	4027500724224	81.17	***
132.283	4021058273280	106.31	***
132.29	4017837047808	110.42
132.316	4008173371392	100.54	*
132.358	3995288469504	81.18	***
132.475	3988846018560	14.62
132.514	3985624793088	21.55	***
132.611	3975961116672	26.40
132.663	3963076214784	65.31	*
132.678	3956633763840	120.11
132.685	3956365328384	10.26	***
137.701	3949922877440	0.34
137.709	3946701651968	106.54	***
137.744	3937037975552	72.10
137.889	3927105863680	18.18	**
137.901	3926837428224	5.85	*
141.555	3926300557312	0.04
141.93	3925226815488	0.76
151.227	3924421509120	0.02
151.491	3924153073664	0.27
151.712	3923616202752	0.64
165.301	3922542460928	0.02
174.346	3921737154560	0.02

At this rate (third field expressed in MiB/s), it could take months to
complete. iostat still reports writes at about 5MiB/s though.

Note that this system is not doing anything else at all.

There definitely seems to be scope for optimisation in the "balancing"
I'd say.

--
Stephane
Re: btrfs balancing start - and stop?
2011-04-01 21:26:00 +0200, Helmut Hullen:
> Hello, Stephane,

Hi Helmut,

> You wrote on 01.04.11:
>
> >> balancing about 2 TByte needed about 20 hours.
> [...]
>
> > I've got a balance running since Monday on a 9TB volume (3.5 of which
> > are used, 3.2 allegedly free), showing no sign of finishing soon.
> > Should I be worried?
>
> > Using /proc/sys/vm/block_dump, I can see it's seeking all over the
> > place, which is probably why throughput is not high. I can also see
> > it writing several times to the same sectors.
>
> Hugo has explained the limits of regarding
>
> dmesg | grep relocating
>
> or (more simple) the last lines of "dmesg" and looking for the
> "relocating" lines. But: what do these lines tell now? What is the
> (pessimistic) estimation when you extrapolate the data?
>
> (please excuse my gerlish)
[...]

$ dmesg | grep reloc | sed -e 1b -e '$!d'
[370178.209571] btrfs: relocating block group 5612075220992 flags 20
[546163.062739] btrfs: relocating block group 3923616202752 flags 20

So, if it's to go down to zero:

$ echo $((3923616202752*(546163.062739-370178.209571)/(5612075220992-3923616202752)/86400))
4.7332292856705722

4.7 more days to go. And I reckon it will have written about 9 TB to
disk by that time (which is the total size of the volume, though only
3.8TB are occupied).

--
Stephane
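[Editorial aside, not part of the original message: the one-shot estimate above can be wrapped into a small reusable helper. This is a sketch of my own; it assumes, as the message does, that the block group offsets fall linearly to zero, which the later messages in the thread show is optimistic.]

```shell
# Estimate days remaining for a balance from two dmesg samples:
# eta_days T1 BG1 T2 BG2, where T is the uptime timestamp and BG the
# "relocating block group" offset, assuming BG shrinks linearly to 0.
eta_days() {
  awk -v t1="$1" -v bg1="$2" -v t2="$3" -v bg2="$4" \
      'BEGIN { print bg2 * (t2 - t1) / (bg1 - bg2) / 86400 }'
}
eta_days 370178.209571 5612075220992 546163.062739 3923616202752
```

With the two samples from the message this reproduces the ~4.7 day figure.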
Re: btrfs balancing start - and stop?
On Fri, 2011-04-01 at 14:12 +0200, Helmut Hullen wrote:
> Hello, Struan,
>
> You wrote on 01.04.11:
>
> > 1) Is the balancing operation expected to take many hours (or days?)
> > on a filesystem such as this? Or are there known issues with the
> > algorithm that are yet to be addressed?
>
> May be. Balancing about 15 GByte needed about 2 hours (or less),
> balancing about 2 TByte needed about 20 hours.
[...]

I've got a balance running since Monday on a 9TB volume (3.5 of which
are used, 3.2 allegedly free), showing no sign of finishing soon.
Should I be worried?

Using /proc/sys/vm/block_dump, I can see it's seeking all over the
place, which is probably why throughput is not high. I can also see it
writing several times to the same sectors.

# df -h /backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       8.2T  3.5T  3.2T  53% /backup
# btrfs fi sh
Label: none  uuid: ...
        Total devices 3 FS bytes used 3.43TB
        devid 4 size 2.73TB used 1.16TB path /dev/sdc
        devid 3 size 2.73TB used 1.16TB path /dev/sdb
        devid 2 size 2.70TB used 1.14TB path /dev/sda4

Btrfs Btrfs v0.19
# ps -eolstart,args | grep balance
Mon Mar 28 11:18:18 2011 sudo btrfs fi balance /backup
Mon Mar 28 11:18:18 2011 btrfs fi balance /backup
# date
Fri Apr  1 19:28:40 BST 2011
# btrfs fi df /backup
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=27.75GB, used=20.47GB
# iostat -md
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              14.49         2.37         2.39     903123     913112
sdc             501.23         2.68         5.06    1022456    1928462
sdb             477.28         2.58         5.06     982853    1928482

It's already written more than the used space.

Cheers,
Stephane
Re: cloning single-device btrfs file system onto multi-device one
2011-03-28 14:24:03 +0100, Stephane Chazelas:
> 2011-03-23 12:13:45 +0700, Fajar A. Nugraha:
> > On Mon, Mar 21, 2011 at 11:24 PM, Stephane Chazelas wrote:
> > > AFAICT, compression is enabled at mount time and would
> > > only apply to newly created files. Is there a way to compress
> > > files already in a btrfs filesystem?
> >
> > You need to select the files manually (not possible to select a
> > directory), but yes, it's possible using "btrfs filesystem defragment
> > -c"
> [...]
>
> Thanks. However I find that for files that have snapshots, it
> ends up increasing disk usage instead of reducing it (size of
> the file + size of the compressed file, instead of size of the
> file).
>
> If I do the btrfs fi de on both the volume and its snapshot, I
> end up with some benefit only if the compression ratio is over
> 2 (and with more snapshots, there's little chance of getting any
> benefit at all). Also, with dozens of snapshots on a 4TB volume,
> it's likely to take weeks to do.
>
> Is there a way around that?
[...]

OK, sorry. I can see now that it's a FAQ. So the answer to my question
would be "no"...

--
Stephane
Re: cloning single-device btrfs file system onto multi-device one
2011-03-23 12:13:45 +0700, Fajar A. Nugraha:
> On Mon, Mar 21, 2011 at 11:24 PM, Stephane Chazelas wrote:
> > AFAICT, compression is enabled at mount time and would
> > only apply to newly created files. Is there a way to compress
> > files already in a btrfs filesystem?
>
> You need to select the files manually (not possible to select a
> directory), but yes, it's possible using "btrfs filesystem defragment
> -c"
[...]

Thanks. However I find that for files that have snapshots, it ends up
increasing disk usage instead of reducing it (size of the file + size
of the compressed file, instead of size of the file).

If I do the btrfs fi de on both the volume and its snapshot, I end up
with some benefit only if the compression ratio is over 2 (and with
more snapshots, there's little chance of getting any benefit at all).
Also, with dozens of snapshots on a 4TB volume, it's likely to take
weeks to do.

Is there a way around that?

Thanks,
Stephane
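[Editorial aside, not part of the original message: since "btrfs filesystem defragment -c" did not recurse into directories at the time, the usual workaround was to drive it with find. A sketch of my own; it does nothing about the snapshot-duplication problem discussed above, and the DRYRUN knob is a hypothetical convenience for previewing.]

```shell
# Compress existing files by running "btrfs filesystem defragment -c"
# on each regular file, since the tool does not recurse on its own.
# Set DRYRUN=echo to preview the commands instead of running them.
compress_existing() {
  find "$1" -xdev -type f -exec ${DRYRUN:-} btrfs filesystem defragment -c {} \;
}
# example (preview only): DRYRUN=echo compress_existing /mnt/vol
```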
Re: cloning single-device btrfs file system onto multi-device one
2011-03-22 18:06:29 -0600, cwillu:
> > I can mount it back, but not if I reload the btrfs module, in which
> > case I get:
> >
> > [ 1961.328280] Btrfs loaded
> > [ 1961.328695] device fsid df4e5454eb7b1c23-7a68fc421060b18b devid 1
> > transid 118 /dev/loop0
> > [ 1961.329007] btrfs: failed to read the system array on loop0
> > [ 1961.340084] btrfs: open_ctree failed
>
> Did you rescan all the loop devices (btrfs dev scan /dev/loop*) after
> reloading the module, before trying to mount again?

Thanks. That probably was the issue, that and using too big files on
too small volumes I'd guess.

I've tried it in real life and it seemed to work to some extent. So
here is how I transferred a 6TB btrfs on one 6TB raid5 device (on host
src) over the network onto a btrfs on 3 3TB hard drives (on host dst):

on src:

lvcreate -s -L100G -n snap /dev/VG/vol
nbd-server 12345 /dev/VG/snap

(if you're not lucky enough to have used lvm there, you can use
nbd-server's copy-on-write feature).

on dst:

nbd-client src 12345 /dev/nbd0
mount /dev/nbd0 /mnt
btrfs device add /dev/sdb /dev/sdc /dev/sdd /mnt
# in reality it was /dev/sda4 (a little under 3TB), /dev/sdb,
# /dev/sdc
btrfs device delete /dev/nbd0 /mnt

That was relatively fast (about 18 hours) but failed with an error.
Apparently, it managed to fill up the 3 3TB drives (as shown by btrfs
fi show). Usage for /dev/nbd0 was at 16MB though (?!)

I then did a "btrfs fi balance /mnt". I could see usage on the drives
go down quickly. However, that was writing data onto /dev/nbd0 so was
threatening to fill up my LVM snapshot. I then cancelled that by doing
a hard reset on "dst" (couldn't find any other way).

Upon reboot, I mounted /dev/sdb instead of /dev/nbd0 in case that made
a difference, and then ran the btrfs device delete /dev/nbd0 /mnt
again, which this time went through.

I then did a btrfs fi balance again and let it run through.
However here is what I get:

$ df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        8.2T  3.5T  3.2T  53% /mnt

Only 3.2T left. How would I reclaim the missing space?

$ sudo btrfs fi show
Label: none  uuid: ...
        Total devices 3 FS bytes used 3.43TB
        devid 4 size 2.73TB used 1.17TB path /dev/sdc
        devid 3 size 2.73TB used 1.17TB path /dev/sdb
        devid 2 size 2.70TB used 1.14TB path /dev/sda4
$ sudo btrfs fi df /mnt
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=35.25GB, used=20.55GB

So that kind of worked, but it is of little use to me as 2TB kind of
disappeared under my feet in the process.

Any ideas, anyone?

Thanks,
Stephane
Re: cloning single-device btrfs file system onto multi-device one
2011-03-21 16:24:50 +0000, Stephane Chazelas:
[...]
> I'm trying to move a btrfs FS that's on a hardware raid 5 (6TB
> large, 4 of which are in use) to another machine with 3 3TB HDs
> and preserve all the subvolumes/snapshots.
[...]

I tried one approach: export an LVM snapshot of the old fs as an nbd
device, mount it from the new machine (/dev/nbd0), then add the new
disks to the FS (btrfs device add) and then delete /dev/nbd0, which I'd
hoped would relocate all the extents onto the new disks.

I did some experiments with some loop devices but got all sorts of
results with different versions of kernels (debian unstable 2.6.37 and
2.6.38 amd64). Here is what I did:

dd seek=512 bs=1M of=./a < /dev/null
dd seek=256 bs=1M of=./b < /dev/null
dd seek=256 bs=1M of=./c < /dev/null
mkfs.btrfs ./a
losetup /dev/loop0 ./a
losetup /dev/loop1 ./b
losetup /dev/loop2 ./c
mount /dev/loop0 /mnt
yes | head -c 300M > /mnt/test
btrfs device add /dev/loop1 /mnt
btrfs device add /dev/loop2 /mnt
# btrfs filesystem balance /mnt
btrfs device delete /dev/loop0 /mnt

In 2.6.38, upon the "balance" as well as upon the "delete", it seemed
to go into a loop, the system at 70% wait, with some

btrfs: found 1 extents

messages 2 to 3 times per second in dmesg. I tried leaving it on for a
few hours and it didn't help. The only thing I could do was reboot.
Disk usage of the a, b, c files was not increasing, though dstat -d
showed some disk writing at ~500kB/s (so I suppose it was writing the
same blocks over and over and seeking a lot).

In 2.6.37, I managed to have it working once, though I don't know how
and never managed to reproduce it. Upon the delete, I can see some
relocations in the dmesg output, but then:

# btrfs device delete /dev/loop0 /mnt
ERROR: error removing the device '/dev/loop0'

(no error in dmesg)

Upon umount, here is what I find in dmesg:

[...]
[ 1802.357205] btrfs: relocating block group 0 flags 2
[ 1860.193351] ------------[ cut here ]------------
[ 1860.193373] WARNING: at /build/buildd-linux-2.6_2.6.37-2-amd64-bITS0h/linux-2.6-2.6.37/debian/build/source_amd64_none/fs/btrfs/volumes.c:544 __btrfs_close_devices+0xb5/0xd0 [btrfs]()
[ 1860.193379] Hardware name: MacBookPro4,1
[ 1860.193382] Modules linked in: btrfs libcrc32c hidp vboxnetadp vboxnetflt vboxdrv ip6table_filter ip6_tables ebtable_nat ebtables acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp parport_pc ppdev lp parport sco bnep rfcomm l2cap kvm_intel binfmt_misc kvm deflate ctr twofish_generic twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc cryptd aes_x86_64 aes_generic xcbc rmd160 sha512_generic sha256_generic sha1_generic hmac crypto_null af_key fuse nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc loop dm_crypt snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm uvcvideo videodev nouveau btusb bluetooth snd_seq_midi lib80211_crypt_tkip snd_rawmidi snd_seq_midi_event v4l1_compat rfkill snd_seq bcm5974 wl(P) ttm drm_kms_helper v4l2_compat_ioctl32 snd_timer snd_seq_device drm i2c_i801 i2c_algo_bit snd tpm_tis soundcore video snd_page_alloc lib80211 joydev i2c_core tpm tpm_bios battery ac applesmc input_polldev evdev pcspkr mbp_nvidia_bl output power_supply processor thermal_sys button ext4 mbcache jbd2 crc16 raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd dm_mirror dm_region_hash dm_log dm_mod zlib_deflate crc32c sg sd_mod sr_mod cdrom crc_t10dif hid_apple usbhid hid ata_generic sata_sil24 uhci_hcd ata_piix libata ehci_hcd scsi_mod usbcore sky2 firewire_ohci firewire_core crc_itu_t nls_base [last unloaded: uinput]
[ 1860.193550] Pid: 14808, comm: umount Tainted: P W 2.6.37-2-amd64 #1
[ 1860.193552] Call Trace:
[ 1860.193561] [] ? warn_slowpath_common+0x78/0x8c
[ 1860.193577] [] ? __btrfs_close_devices+0xb5/0xd0 [btrfs]
[ 1860.193593] [] ? btrfs_close_devices+0x1d/0x70 [btrfs]
[ 1860.193610] [] ? close_ctree+0x2cd/0x32f [btrfs]
[ 1860.193616] [] ? dispose_list+0xa7/0xb9
[ 1860.193627] [] ? btrfs_put_super+0x10/0x1d [btrfs]
[ 1860.193633] [] ? generic_shutdown_super+0x5c/0xd4
[ 1860.193638] [] ? kill_anon_super+0x9/0x40
[ 1860.193642] [] ? deactivate_locked_super+0x1e/0x3d
[ 1860.193647] [] ? sys_umount+0x2cf/0x2fa
[ 1860.193653] [] ? system_call_fastpath+0x16/0x1b
[ 1860.193656] ---[ end trace 4e4b8320dc6e70cc ]---

I can mount it back, but not if I reload the btrfs module, in which
case I get:

[ 1961.328280] Btrfs loaded
[ 1961.328695] device fsid df4e5454eb7b1c23-7a68fc421060b18b devid 1 transid 118 /dev/loop0
[ 1961.329007] btrfs: failed to read the system array on loop0
[ 1961.340084] btrfs: open_ctree failed
cloning single-device btrfs file system onto multi-device one
Hiya,

I'm trying to move a btrfs FS that's on a hardware raid 5 (6TB large,
4 of which are in use) to another machine with 3 3TB HDs and preserve
all the subvolumes/snapshots. Is there a way to do that without using a
software/hardware raid on the new machine (that is, just using btrfs
multi-device)?

If fewer than 3TB were occupied, I suppose I could just resize it so
that it fits on one 3TB hd, then copy device to device onto a 3TB disk,
add the 2 other ones and do a "balance", but here, I can't do that.

I suspect that if compression was enabled, the FS could fit on 3TB, but
AFAICT, compression is enabled at mount time and would only apply to
newly created files. Is there a way to compress files already in a
btrfs filesystem?

Any help would be appreciated.

Stephane
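[Editorial aside, not part of the original message: the shrink-and-copy plan mentioned above would look roughly like the sketch below. It is my own illustration, not applicable in this thread since more than 3TB were in use; the device names and the 2900g size are made up, and run() only prints each step rather than executing it.]

```shell
# The "resize, raw-copy, add, balance" plan sketched in the question.
# run() previews each command; drop the echo to execute for real.
run() { echo "+ $*"; }
run btrfs filesystem resize 2900g /mnt/old   # shrink FS below one 3TB disk
run umount /mnt/old
run dd if=/dev/old_raid of=/dev/sdb bs=1M    # device-to-device copy
run mount /dev/sdb /mnt/new
run btrfs device add /dev/sdc /dev/sdd /mnt/new
run btrfs filesystem balance /mnt/new        # spread extents over all disks
```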