Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
On 04.08.2016 18:53, Lutz Vieweg wrote:
>
> I was today hit by what I think is probably the same bug:
> A btrfs on a close-to-4TB sized block device, only half filled
> to almost exactly 2 TB, suddenly says "no space left on device"
> upon any attempt to write to it. The filesystem was NOT automatically
> switched to read-only by the kernel, I should mention.
>
> Re-mounting (which is a pain, as this filesystem is used for
> $HOMEs of a multitude of active users whom I have to kick from
> the server for doing things like re-mounting) removed the symptom
> for now, but from what I can read in the linux-btrfs mailing list
> archives, it's pretty likely the symptom will re-appear.
>
> Here are some more details:
>
> Software versions:
>> linux-4.6.1 (vanilla from kernel.org)
...
>
> dmesg output from the time the "no space left on device" symptom
> appeared:
>
>> [5171203.601620] WARNING: CPU: 4 PID: 23208 at fs/btrfs/inode.c:9261
>> btrfs_destroy_inode+0x263/0x2a0 [btrfs]
...
>> [5171230.306037] WARNING: CPU: 18 PID: 12656 at fs/btrfs/extent-tree.c:4233
>> btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]

Sounds like the bug I hit too.

To fix this you'll need:

crazy@zwerg:~/Work/linux-git$ git show 8b8b08cbf

commit 8b8b08cbfb9021af4b54b4175fc4c51d655aac8c
Author: Chris Mason
Date:   Tue Jul 19 05:52:36 2016 -0700

    Btrfs: fix delalloc accounting after copy_from_user faults

    Commit 56244ef151c3cd11 was almost but not quite enough to fix the
    reservation math after btrfs_copy_from_user returned partial copies.
    Some users are still seeing warnings in btrfs_destroy_inode, and with
    a long enough test run I'm able to trigger them as well.

    This patch fixes the accounting math again, bringing it much closer
    to the way it was before the sectorsize conversion Chandan did.  The
    problem is accounting for the offset into the page/sector when we do
    a partial copy.  This one just uses the dirty_sectors variable, which
    should already be updated properly.

    Signed-off-by: Chris Mason
    cc: sta...@vger.kernel.org # v4.6+

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f3f61d1..bcfb4a2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1629,13 +1629,11 @@ again:
 		 * managed to copy.
 		 */
 		if (num_sectors > dirty_sectors) {
-			/*
-			 * we round down because we don't want to count
-			 * any partial blocks actually sent through the
-			 * IO machines
-			 */
-			release_bytes = round_down(release_bytes - copied,
-				      root->sectorsize);
+
+			/* release everything except the sectors we dirtied */
+			release_bytes -= dirty_sectors <<
+				root->fs_info->sb->s_blocksize_bits;
+
 			if (copied > 0) {
 				spin_lock(&BTRFS_I(inode)->lock);
 				BTRFS_I(inode)->outstanding_extents++;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
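For readers following the accounting: here is a toy model (plain Python, not kernel code) of why the old round_down() math could leave too little reserved after a short copy that straddles a sector boundary. The 4 KiB sector size and the example offsets are assumptions for illustration only.

```python
SECTORSIZE = 4096            # assumed sector size for the example
BLOCKSIZE_BITS = 12          # log2(SECTORSIZE), i.e. sb->s_blocksize_bits

def dirty_sectors(offset, copied):
    """Number of sectors touched by a write of `copied` bytes at `offset`."""
    if copied == 0:
        return 0
    first = offset // SECTORSIZE
    last = (offset + copied - 1) // SECTORSIZE
    return last - first + 1

def old_release(release_bytes, copied):
    # old code: round_down(release_bytes - copied, sectorsize)
    return (release_bytes - copied) // SECTORSIZE * SECTORSIZE

def new_release(release_bytes, n_dirty):
    # new code: release_bytes -= dirty_sectors << s_blocksize_bits
    return release_bytes - (n_dirty << BLOCKSIZE_BITS)

# 4 sectors reserved; the copy faults after only 100 bytes, but those
# 100 bytes straddle a sector boundary (offset 4000), dirtying 2 sectors.
reserved = 4 * SECTORSIZE                # 16384 bytes reserved
n = dirty_sectors(4000, 100)             # -> 2
print(old_release(reserved, 100))        # 12288: keeps only 1 sector reserved
print(new_release(reserved, n))          # 8192:  keeps both dirty sectors
```

The old formula releases 12288 bytes, retaining reservation for one sector even though two were dirtied; the new formula retains exactly the dirtied sectors.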
Re: systemd KillUserProcesses=yes and btrfs scrub
On 30.07.2016 22:02, Chris Murphy wrote:
> Short version: When systemd-logind login.conf KillUserProcesses=yes,
> and the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and
> then logs out of the shell, the user space operation is killed, and
> btrfs scrub status reports that the scrub was aborted. [1]
>

How is this a bug? It is exactly what 'KillUserProcesses=yes' is expected to do..
Re: A lot warnings in dmesg while running thunderbird
On 21.07.2016 14:56, Chris Mason wrote:
> On 07/20/2016 01:50 PM, Gabriel C wrote:
>>
>> After 24h of running the program and thunderbird all is still fine here.
>>
>> I'll let it run one more day.. But it looks very good.
>>
>
> Thanks for your time in helping to track this down. It'll go into the
> next merge window and be cc'd to stable.
>

You are welcome :)

The test program was running without problems for 52h.. I think your fix is fine :)

Also feel free to add:

Tested-by: Gabriel Craciunescu <nix.or....@gmail.com>

to your commit.

Regards,
Gabriel
Re: A lot warnings in dmesg while running thunderbird
On 20.07.2016 15:50, Chris Mason wrote:
>
> On 07/19/2016 08:11 PM, Gabriel C wrote:
>>
>> On 19.07.2016 13:05, Chris Mason wrote:
>>> On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote:
>>>> Hi Chris,
>>>>
>>>> I am able to reproduce the issue with the 'short-write' program. But before
>>>> the call trace associated with btrfs_destroy_inode(), I see the following
>>>> call trace ...
>>>>
>>>> [ cut here ]
>>>> WARNING: CPU: 2 PID: 2311 at
>>>> /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303
>>>> btrfs_free_reserved_data_space_noquota+0xe8/0x100
>>>
>>> [ ... ]
>>>
>>> Ok, the problem is in how we're dealing with the offset into the sector when
>>> we fail.  The dirty_sectors variable already has this accounted in it, so
>>> this patch fixes it for me.  I ran overnight, but I'll let it go for a few
>>> days just to make sure:
>>>
>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>> index fac9b839..5842423 100644
>>> --- a/fs/btrfs/file.c
>>> +++ b/fs/btrfs/file.c
>>> @@ -1629,13 +1629,11 @@ again:
>>>  		 * managed to copy.
>>>  		 */
>>>  		if (num_sectors > dirty_sectors) {
>>> -			/*
>>> -			 * we round down because we don't want to count
>>> -			 * any partial blocks actually sent through the
>>> -			 * IO machines
>>> -			 */
>>> -			release_bytes = round_down(release_bytes - copied,
>>> -				      root->sectorsize);
>>> +
>>> +			/* release everything except the sectors we dirtied */
>>> +			release_bytes -= dirty_sectors <<
>>> +				root->fs_info->sb->s_blocksize_bits;
>>> +
>>>  			if (copied > 0) {
>>>  				spin_lock(&BTRFS_I(inode)->lock);
>>>  				BTRFS_I(inode)->outstanding_extents++;
>>>
>>
>> Since I guess you are testing this on the latest git code, I started to
>> test on the latest stable.
>
> Any v4.7-rc or v4.6 stable where the patch applies ;)
>
>>
>> Until now all seems fine .. your test program is still running without
>> triggering the bug.
>>
>> Also thunderbird is running without triggering the bug.
>>
>> I'll let it run overnight and report back.
>
> Great, thanks!

After 24h of running the program and thunderbird all is still fine here.

I'll let it run one more day.. But it looks very good.

Regards,
Gabriel
Re: A lot warnings in dmesg while running thunderbird
On 19.07.2016 13:05, Chris Mason wrote:
> On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote:
>> Hi Chris,
>>
>> I am able to reproduce the issue with the 'short-write' program. But before
>> the call trace associated with btrfs_destroy_inode(), I see the following
>> call trace ...
>>
>> [ cut here ]
>> WARNING: CPU: 2 PID: 2311 at
>> /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303
>> btrfs_free_reserved_data_space_noquota+0xe8/0x100
>
> [ ... ]
>
> Ok, the problem is in how we're dealing with the offset into the sector when
> we fail.  The dirty_sectors variable already has this accounted in it, so
> this patch fixes it for me.  I ran overnight, but I'll let it go for a few
> days just to make sure:
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index fac9b839..5842423 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1629,13 +1629,11 @@ again:
>  		 * managed to copy.
>  		 */
>  		if (num_sectors > dirty_sectors) {
> -			/*
> -			 * we round down because we don't want to count
> -			 * any partial blocks actually sent through the
> -			 * IO machines
> -			 */
> -			release_bytes = round_down(release_bytes - copied,
> -				      root->sectorsize);
> +
> +			/* release everything except the sectors we dirtied */
> +			release_bytes -= dirty_sectors <<
> +				root->fs_info->sb->s_blocksize_bits;
> +
>  			if (copied > 0) {
>  				spin_lock(&BTRFS_I(inode)->lock);
>  				BTRFS_I(inode)->outstanding_extents++;
>

Since I guess you are testing this on the latest git code, I started to
test on the latest stable.

Until now all seems fine .. your test program is still running without
triggering the bug.

Also thunderbird is running without triggering the bug.

I'll let it run overnight and report back.
Re: A lot warnings in dmesg while running thunderbird
On 08.07.2016 14:41, Chris Mason wrote:
> On 07/08/2016 05:57 AM, Gabriel C wrote:
>> 2016-07-07 21:21 GMT+02:00 Chris Mason <c...@fb.com>:
>>> On 07/07/2016 06:24 AM, Gabriel C wrote:
>>>> Hi,
>>>>
>>>> while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
>>>> other versions) I trigger the following:
>>>
>>> I definitely thought we had this fixed in v4.7-rc. Can you easily fsck
>>> this filesystem? Something strange is going on.
>>
>> Yes, btrfs check and btrfs check --check-data-csum are fine, no errors
>> found.
>>
>> If you want me to test any patches let me know.
>
> Can you please try a v4.5 stable kernel? I'm curious if this really is
> the same regression that I tried to fix in v4.7.

I'm on linux 4.5.7 now and everything is fine. I'm writing this email from
thunderbird.. which was not possible on 4.6.3 or 4.7-rc.

Let me know if you want me to test other kernels or whatever else may help
fixing this problem.

Regards,
Gabriel C
Re: A lot warnings in dmesg while running thunderbird
2016-07-08 14:41 GMT+02:00 Chris Mason <c...@fb.com>:
>
> On 07/08/2016 05:57 AM, Gabriel C wrote:
>>
>> 2016-07-07 21:21 GMT+02:00 Chris Mason <c...@fb.com>:
>>>
>>> On 07/07/2016 06:24 AM, Gabriel C wrote:
>>>>
>>>> Hi,
>>>>
>>>> while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
>>>> other versions) I trigger the following:
>>>
>>> I definitely thought we had this fixed in v4.7-rc.  Can you easily fsck
>>> this filesystem?  Something strange is going on.
>>
>> Yes, btrfs check and btrfs check --check-data-csum are fine, no errors
>> found.
>>
>> If you want me to test any patches let me know.
>
> Can you please try a v4.5 stable kernel?  I'm curious if this really is
> the same regression that I tried to fix in v4.7.

Sure, I'll test on 4.5.7 and let you know.
Re: A lot warnings in dmesg while running thunderbird
2016-07-07 21:21 GMT+02:00 Chris Mason <c...@fb.com>:
>
> On 07/07/2016 06:24 AM, Gabriel C wrote:
>>
>> Hi,
>>
>> while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
>> other versions) I trigger the following:
>
> I definitely thought we had this fixed in v4.7-rc.  Can you easily fsck
> this filesystem?  Something strange is going on.

Yes, btrfs check and btrfs check --check-data-csum are fine, no errors
found.

If you want me to test any patches let me know.

Regards,
Gabriel C
A lot warnings in dmesg while running thunderbird
] [] ? block_group_cache_tree_search+0xb1/0xd0 [btrfs]
[ 6509.253610] [] ? run_delalloc_nocow+0xa60/0xba0 [btrfs]
[ 6509.253627] [] ? run_delalloc_range+0x390/0x3b0 [btrfs]
[ 6509.253630] [] ? flush_tlb_page+0x35/0x90
[ 6509.253647] [] ? writepage_delalloc.isra.20+0xfb/0x170 [btrfs]
[ 6509.253664] [] ? __extent_writepage+0xb3/0x300 [btrfs]
[ 6509.253668] [] ? __set_page_dirty_nobuffers+0xea/0x140
[ 6509.253685] [] ? extent_write_cache_pages.isra.16.constprop.31+0x23c/0x350 [btrfs]
[ 6509.253702] [] ? extent_writepages+0x48/0x60 [btrfs]
[ 6509.253718] [] ? btrfs_direct_IO+0x360/0x360 [btrfs]
[ 6509.253723] [] ? __filemap_fdatawrite_range+0xa2/0xe0
[ 6509.253739] [] ? btrfs_fdatawrite_range+0x16/0x40 [btrfs]
[ 6509.253755] [] ? start_ordered_ops+0x10/0x20 [btrfs]
[ 6509.253771] [] ? btrfs_sync_file+0x41/0x360 [btrfs]
[ 6509.253775] [] ? do_fsync+0x33/0x60
[ 6509.253778] [] ? SyS_fsync+0x7/0x10
[ 6509.253782] [] ? entry_SYSCALL_64_fastpath+0x1a/0xa4
...

See http://paste.opensuse.org/view/simple/86078072 and
http://paste.opensuse.org/view/simple/87276071

This is from running thunderbird for just a few seconds; when I let it run
for a while I have to reboot the system.

$ uname -a
Linux zwerg 4.7.0-rc6 #1 SMP PREEMPT Tue Jul 5 07:48:39 CEST 2016 x86_64 x86_64 x86_64 GNU/Linux

btrfs-progs v4.6.1

sda is HW RAID0 ...

Jun 23 14:27:48 localhost kernel: scsi host0: Avago SAS based MegaRAID driver
Jun 23 14:27:48 localhost kernel: scsi 0:0:6:0: Direct-Access ATA WDC WD5002ABYS-5 3B06 PQ: 0 ANSI: 5
Jun 23 14:27:48 localhost kernel: scsi 0:0:7:0: Direct-Access ATA WDC WD5002ABYS-5 3B06 PQ: 0 ANSI: 5
Jun 23 14:27:48 localhost kernel: scsi 0:0:10:0: Direct-Access ATA ST500NM0011 FTM6 PQ: 0 ANSI: 5
Jun 23 14:27:48 localhost kernel: scsi 0:2:0:0: Direct-Access LSI MegaRAID SAS RMB 1.40 PQ: 0 ANSI: 5
...

mount | grep sda
/dev/sda1 on / type btrfs (rw,noatime,compress=lzo,space_cache,autodefrag,subvolid=5,subvol=/)

(tested with and without compression; with just the defaults the warnings
are still the same)

btrfs fi show
Label: none  uuid: 67b2e285-e331-42ad-8478-d78b17ea6970
	Total devices 1 FS bytes used 31.47GiB
	devid    1 size 1.36TiB used 37.06GiB path /dev/sda1

btrfs fi df /
Data, single: total=32.00GiB, used=30.43GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=2.50GiB, used=1.04GiB
GlobalReserve, single: total=368.00MiB, used=0.00B

Regards,
Gabriel C
Re: [survey] sysfs layout for btrfs
On Sat, 15 Aug 2015 07:40:40 +0800, Anand Jain wrote:
> Hello,
>
> As of now btrfs sysfs does not include the attributes for the volume
> manager part in its sysfs layout, so it's being developed, and there are
> two types of layout, below. So I have a quick survey to know which will
> be preferred. The contenders are:
>
> 1. FS and VM (volume manager) attributes [1] merged sysfs layout
>
>    /sys/fs/btrfs/fsid -- holds FS attrs; VM attrs will be added here.
>    /sys/fs/btrfs/fsid/devices/uuid [2] -- btrfs_devices attrs here
>
> 2. FS and VM attributes separated sysfs layout.
>
>    /sys/fs/btrfs/fsid -- as is, will continue to hold FS attributes.
>    /sys/fs/btrfs/pools/fsid/ -- will hold VM attributes
>    /sys/fs/btrfs/pools/fsid/devices/sdx -- btrfs_devices attrs here

My vote is for the first one. Lengthening the UI/API with /pools/ seems
unnecessary, and it's better to get attributes exposed earlier.
Re: btrfs filesystem show _exact_ freaking size?
On 18/11/2014 11:39, Robert White wrote:
> Howdy,
>
> How does one get the exact size (in blocks preferably, but bytes okay)
> of the filesystem inside a partition? I know how to get the partition
> size, but that's not useful when shrinking a partition...

dev_item.total_bytes in btrfs-show-super's output is what you're after.
Re: Manual deduplication would be useful
> Hello,
>
> For over a year now, I've been experimenting with stacked filesystems as
> a way to save on resources. A basic OS layer is shared among Containers,
> each of which stacks a layer with modifications on top of it. This
> approach means that Containers share buffer cache and loaded executables.
> Concrete technology choices aside, the result is rock-solid and the
> efficiency improvements are incredible, as documented here:
>
> http://rickywiki.vanrein.org/doku.php?id=openvz-aufs
>
> One problem with this setup is updating software. In lieu of
> stacking-support in package managers, it is necessary to do this on a
> per-Container basis, meaning that each installs their own versions,
> including overwrites of the basic OS layer. Deduplication could remedy
> this, but the generic mechanism is known from ZFS to be fairly
> inefficient.
>
> Interestingly however, this particular use case demonstrates that a much
> simpler deduplication mechanism than normally considered could be useful.
> It would suffice if the filesystem could check on manual hints, or
> stack-specifying hints, to see if overlaid files share the same file
> contents; when they do, deduplication could commence. This saves
> searching through the entire filesystem for every file or block written.
> It might also mean that the actual stacking is not needed, but instead a
> basic OS could be cloned to form a new basic install, and kept around for
> this hint processing.
>
> I'm not sure if this should ideally be implemented inside the stacking
> approach (where it would be stacking-implementation-specific) or in the
> filesystem (for which it might be too far off the main purpose), but I
> thought it wouldn't hurt to start a discussion on it, given that (1)
> filesystems nowadays service multiple instances, (2) filesystems like
> Btrfs are based on COW, and (3) deduplication is a goal but the generic
> mechanism could use some efficiency improvements.
>
> I hope having seen this approach is useful to you!

Have a look at bedup [1] (disclaimer: I wrote it).

The normal mode does incremental scans, and there's also a subcommand for
deduplicating files that you already know are identical:

    bedup dedup-files

The implementation in master uses a clone ioctl. Here is Mark Fasheh's
latest patch series to implement a dedup ioctl [2]; it also comes with a
command to work on listed files (btrfs-extent-same in [3]).

[1] https://github.com/g2p/bedup
[2] http://comments.gmane.org/gmane.comp.file-systems.btrfs/26310/
[3] https://github.com/markfasheh/duperemove
Re: Q: Why subvolumes?
> Now... since the snapshot's FS tree is a direct duplicate of the original
> FS tree (actually, it's the same tree, but they look like different
> things to the outside world), they share everything -- including things
> like inode numbers. This is OK within a subvolume, because we have the
> semantics that subvolumes have their own distinct inode-number spaces.
> If we could snapshot arbitrary subsections of the FS, we'd end up having
> to fix up inode numbers to ensure that they were unique -- which can't
> really be an atomic operation (unless you want to have the FS locked
> while the kernel updates the inodes of the billion files you just
> snapshotted).

I don't think so; I just checked some snapshots and the inos are the same.
Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

> The other thing to talk about here is that while the FS tree is a tree
> structure, it's not a direct one-to-one map to the directory tree
> structure. In fact, it looks more like a list of inodes, in inode order,
> with some extra info for easily tracking through the list. The B-tree
> structure of the FS tree is just a fast indexing method.
>
> So snapshotting a directory entry within the FS tree would require
> (somehow) making an atomic copy, or CoW copy, of only the parts of the
> FS tree that fall under the directory in question -- so you'd end up
> trying to take a sequence of records in the FS tree, of arbitrary size
> (proportional roughly to the number of entries in the directory) and
> copying them to somewhere else in the same tree in such a way that you
> can automatically dereference the copies when you modify them.
>
> So, ultimately, it boils down to being able to do CoW operations at the
> byte level, which is going to introduce huge quantities of extra
> metadata, and it all starts looking really awkward to implement (plus
> having to deal with the long time taken to copy the directory entries
> for the thing you're snapshotting).

Btrfs already does CoW of arbitrarily-large files (extent lists); doing the
same for directories doesn't seem impossible.
Re: Q: Why subvolumes?
On Tue, 23 Jul 2013 21:30:13 CEST, Hugo Mills wrote:
> On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:
>>> Now... since the snapshot's FS tree is a direct duplicate of the
>>> original FS tree (actually, it's the same tree, but they look like
>>> different things to the outside world), they share everything --
>>> including things like inode numbers. This is OK within a subvolume,
>>> because we have the semantics that subvolumes have their own distinct
>>> inode-number spaces. If we could snapshot arbitrary subsections of the
>>> FS, we'd end up having to fix up inode numbers to ensure that they
>>> were unique -- which can't really be an atomic operation (unless you
>>> want to have the FS locked while the kernel updates the inodes of the
>>> billion files you just snapshotted).
>>
>> I don't think so; I just checked some snapshots and the inos are the
>> same. Btrfs just changes the dev_id of subvolumes (somehow the vfs
>> allows this).
>
> That's what I said. Our current implementation allows different
> subvolumes to have the same inode numbers, which is what makes it work.
> If you threw out the concept of subvolumes, or allowed snapshots within
> subvolumes, then you'd be duplicating inodes within a subvolume, which
> is one reason it doesn't work.

Sorry for misreading you.

Directory snapshots can work by giving a new device number to the snapshot.
There is no need to update inode numbers in that case.
Re: Lots of harddrive chatter on after booting with btrfs on root (slow boot)
On Sat, 20 Jul 2013 17:15:50 +0200, Jason Russell wrote:
> I've also noted that this excessive hdd chatter does not occur
> immediately after a fresh format with arch on btrfs root. I've made
> some deductions/assumptions:
>
> This only seems to occur with btrfs roots. This only happens after some
> number of reboots OR after the partition fills up a little bit. I'm
> pretty sure I've ruled out everything except for the filesystem.

In my experience (as of 3.8 or so), Btrfs performance degrades on a
filled-up filesystem, even a comparatively new one. Various background
workers start to eat io according to atop.

> I have just done two clean installs to more thoroughly compare ext4 and
> btrfs roots. So far no excessive hdd chatter from btrfs. I have also
> seen what I have described on two other computers (different hardware
> entirely) where there is lots of hdd chatter from btrfs root, and
> nothing from ext4.
>
> Here are two threads:
> https://bbs.archlinux.org/viewtopic.php?pid=1117932
> https://bbs.archlinux.org/viewtopic.php?pid=1301684
Re: [PATCH 4/4] btrfs: offline dedupe
On Mon, 15 Jul 2013 13:55:51 -0700, Zach Brown wrote:
> I'd get rid of all this code by only copying each input argument on to
> the stack as it's needed and by getting rid of the writable output
> struct fields. (more on this later)
>
> As I said, I'd get rid of the output fields. Like the other vectored io
> syscalls, the return value can indicate the number of initial
> consecutive bytes that worked. When no progress was made then it can
> return errors.
>
> Userspace is left to sort out the resulting state and figure out the
> extents to retry in exactly the same way that it found the initial
> extents to attempt to dedupe in the first place.
>
> (And imagine strace trying to print the inputs and outputs. Poor, poor,
> strace! :))

The dedup branch that uses this syscall [1] doesn't compare files before
submitting them anymore (the kernel will do it, and ranges may not fit in
cache once I get rid of an unnecessary loop).

I don't have strong opinions on the return style, but it would be good to
have the syscall always make progress by finding at least one good range
before bailing out, and signaling which files were involved. With those
constraints, the current struct seems like the cleanest way to pass the
data.

The early return you suggest is a good idea if Mark agrees, but the return
condition should be something like: if one range with bytes_deduped != 0
doesn't get bytes_deduped incremented by iteration_len in this iteration,
bail out. That's sufficient to guarantee progress and to know which ranges
were involved.

> I hope this helps!
> - z

Thank you and everyone involved for the progress on this.

[1] https://github.com/g2p/bedup/tree/wip/dedup-syscall
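To make the proposed bail-out condition concrete, here is an illustrative model in plain Python (not the actual kernel logic; the per-range progress lists are an invented representation):

```python
def should_bail(progress_before, progress_after, iteration_len):
    """Model of the proposed early-return rule: stop the dedupe loop if
    any range that had already deduped some bytes (bytes_deduped != 0)
    failed to advance by iteration_len during this iteration."""
    for before, after in zip(progress_before, progress_after):
        if before != 0 and after - before != iteration_len:
            return True
    return False

# Both ranges advanced by a full iteration_len: keep going.
print(should_bail([0, 4096], [4096, 8192], 4096))      # False
# The first range had made progress but stalled: bail out,
# and the caller knows exactly which ranges were involved.
print(should_bail([4096, 8192], [4096, 12288], 4096))  # True
```

This captures the guarantee in the text: every range either keeps advancing in lockstep or the call returns, so at least one good range is always found before bailing.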
[XFSTESTS PATCH] btrfs: Test deduplication
---
The matching kernel patch is here:
https://github.com/g2p/linux/tree/v3.10%2Bextent-same
(rebased on 3.10, fixing a small conflict)

Requires the btrfs-extent-same command:
- http://permalink.gmane.org/gmane.comp.file-systems.btrfs/26579
- https://github.com/markfasheh/duperemove

 tests/btrfs/313     | 93 +
 tests/btrfs/313.out | 25 ++
 tests/btrfs/group   |  1 +
 3 files changed, 119 insertions(+)
 create mode 100755 tests/btrfs/313
 create mode 100644 tests/btrfs/313.out

diff --git a/tests/btrfs/313 b/tests/btrfs/313
new file mode 100755
index 000..04e4ccb
--- /dev/null
+++ b/tests/btrfs/313
@@ -0,0 +1,93 @@
+#! /bin/bash
+# FS QA Test No. 313
+#
+# Test the deduplication syscall
+#
+#---
+# Copyright (c) 2013 Red Hat, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+
+ESAME=`set_prog_path btrfs-extent-same`
+
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_command $ESAME
+_require_command $XFS_IO_PROG
+_require_scratch
+
+_scratch_mkfs > /dev/null
+_scratch_mount >> $seqres.full 2>&1
+
+fiemap() {
+    xfs_io -r -c fiemap $1 | tail -n +2
+}
+
+dedup() {
+    ! diff -q <(fiemap $1) <(fiemap $2)
+    $ESAME $(stat -c %s $1) $1 0 $2 0
+    diff -u <(fiemap $1) <(fiemap $2)
+}
+
+echo "Silence is golden"
+set -e
+
+v1=$SCRATCH_MNT/v1
+v2=$SCRATCH_MNT/v2
+v3=$SCRATCH_MNT/v3
+
+$BTRFS_UTIL_PROG subvolume create $v1
+$BTRFS_UTIL_PROG subvolume create $v2
+
+dd bs=1M status=none if=/dev/urandom of=$v1/file1 count=1
+dd bs=1M status=none if=/dev/urandom of=$v1/file2 count=1
+dd bs=1M status=none if=$v1/file1 of=$v2/file3
+dd bs=1M status=none if=$v1/file1 of=$v2/file4
+
+$BTRFS_UTIL_PROG subvolume snapshot -r $v2 $v3
+
+# identical, multiple volumes
+dedup $v1/file1 $v2/file3
+
+# not identical, same volume
+! $ESAME $((2**20)) $v1/file1 0 $v1/file2 0
+
+# identical, second file on a frozen volume
+dedup $v1/file1 $v3/file4
+
+_scratch_unmount
+_check_scratch_fs
+status=0
+exit
diff --git a/tests/btrfs/313.out b/tests/btrfs/313.out
new file mode 100644
index 000..eabe6be
--- /dev/null
+++ b/tests/btrfs/313.out
@@ -0,0 +1,25 @@
+QA output created by 313
+Silence is golden
+Create subvolume 'sdir/v1'
+Create subvolume 'sdir/v2'
+Create a readonly snapshot of 'sdir/v2' in 'sdir/v3'
+Files /dev/fd/63 and /dev/fd/62 differ
+Deduping 2 total files
+(0, 1048576): sdir/v1/file1
+(0, 1048576): sdir/v2/file3
+1 files asked to be deduped
+i: 0, status: 0, bytes_deduped: 1048576
+1048576 total bytes deduped in this operation
+Deduping 2 total files
+(0, 1048576): sdir/v1/file1
+(0, 1048576): sdir/v1/file2
+1 files asked to be deduped
+i: 0, status: 1, bytes_deduped: 0
+0 total bytes deduped in this operation
+Files /dev/fd/63 and /dev/fd/62 differ
+Deduping 2 total files
+(0, 1048576): sdir/v1/file1
+(0, 1048576): sdir/v3/file4
+1 files asked to be deduped
+i: 0, status: 0, bytes_deduped: 1048576
+1048576 total bytes deduped in this operation
diff --git a/tests/btrfs/group b/tests/btrfs/group
index bc6c256..4c868c8 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -7,5 +7,6 @@
 264 auto
 265 auto
 276 auto rw metadata
 284 auto
 307 auto quick
+313 auto
--
1.8.3.1.588.gb04834f
Re: Two identical copies of an image mounted result in changes to both images if only one is modified
On Thu, 20 Jun 2013 10:16:22 +0100, Hugo Mills wrote:
> On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
>> Hi,
>>
>> I've observed a rather strange behaviour while trying to mount two
>> identical copies of the same image to different mount points. Each
>> modification to one image is also performed in the second one.
>>
>> touch m2/hello
>> ls -la m1   # will now also include a file called hello
>>
>> Is this behaviour intentional and known, or should I create a
>> bug-report?
>
> It's known, and not desired behaviour. The problem is that you've ended
> up with two filesystems with the same UUID, and the FS code gets rather
> confused about that. The same problem exists with LVM snapshots (or
> other block-device-layer copies).
>
> The solution is a combination of a tool to scan an image and change the
> UUID (offline), and of some code in the kernel that detects when it's
> being told about a duplicate image (rather than an additional device in
> the same FS). Neither of these has been written yet, I'm afraid.

To clarify, the loop devices are properly distinct, but the first device
ends up mounted twice.

I've had a look at the vfs code, and it doesn't seem to be uuid-aware,
which makes sense because the uuid is a property of the superblock and
the fs structure doesn't expose it. It's a Btrfs problem.

Instead of redirecting to a different block device, Btrfs could and should
refuse to mount an already-mounted superblock when the block device
doesn't match, somewhere in or below btrfs_mount. Registering extra,
distinct superblocks for an already-mounted raid is a different matter,
but that isn't done through the mount syscall anyway.

>> I've deleted quite a bunch of files on my production system because of
>> this... Oops.
>
> I'm sorry to hear that. :(
>
> Hugo.
Re: Two identical copies of an image mounted result in changes to both images if only one is modified
>> Instead of redirecting to a different block device, Btrfs could and should refuse to mount an already-mounted superblock when the block device doesn't match, somewhere in or below btrfs_mount. Registering extra, distinct superblocks for an already mounted raid is a different matter, but that isn't done through the mount syscall anyway.
>
> The problem here is that you could quite legitimately mount /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are both part of the same filesystem. So you can't simply prevent mounting based on the device that the mount's being done with.

Okay. The check should rely on a list of known block devices for a given filesystem uuid.
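The check proposed here — a list of known block devices per filesystem UUID — can be modeled in a few lines. This is a hypothetical userspace sketch (the names `register_device` and `may_mount` are made up for illustration), not kernel code: a mount from a device that is a known member of the filesystem is allowed, while an unknown device carrying an already-known UUID is treated as a duplicate image and refused.

```python
# Sketch of the duplicate-UUID check discussed above (hypothetical
# userspace model with made-up names, not kernel code).  Each filesystem
# UUID maps to the set of block devices known to belong to it.
known_devices = {}  # uuid -> set of member device paths

def register_device(uuid, dev):
    # Called as member devices are discovered (cf. btrfs device scan).
    known_devices.setdefault(uuid, set()).add(dev)

def may_mount(uuid, dev):
    # A known member of the filesystem may be mounted; an unknown device
    # carrying an already-known UUID is a duplicate copy of the image,
    # not an additional member, and must be refused.
    members = known_devices.get(uuid)
    return members is None or dev in members

# Hugo's legitimate case: two devices, same UUID, same filesystem.
register_device("AA1234", "/dev/sda")
register_device("AA1234", "/dev/sdb")
assert may_mount("AA1234", "/dev/sda")
assert may_mount("AA1234", "/dev/sdb")
# Clemens' accident: a copied image on a loop device reuses the UUID.
assert not may_mount("AA1234", "/dev/loop0")
```

The real kernel-side fix is harder, since device registration itself happens by UUID, but the decision rule is the one sketched here.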
Re: Two identical copies of an image mounted result in changes to both images if only one is modified
> Thank you for your reply. I appreciate it. Unfortunately this issue is a deal killer for us. The ability to take very fast snapshots and replicate them to another site is key for us. We just can't use Btrfs with this setup. That's too bad. Good luck and thank you.

The issue we were discussing is how to fail early when there are duplicate UUIDs. Duplicate UUIDs will never be supported. If *your* problem has to do with fast snapshots and fast replication, that's supported; see btrfs send/receive.
Re: [PATCH 0/4] btrfs: offline dedupe v2
On 11/06/2013 22:31, Mark Fasheh wrote:
> Perhaps this isn't a limitation per se, but extent-same requires read/write access to the files we want to dedupe. During my last series I had a conversation with Gabriel de Perthuis about access checking where we tried to maintain the ability for a user to run extent-same against a readonly snapshot. In addition, I reasoned that since the underlying data won't change (at least to the user), we ought only require the files to be open for read. What I found however is that neither of these is a great idea ;)
>
> - We want to require that the inode be open for writing so that an unprivileged user can't do things like run dedupe on a performance-sensitive file that they might only have read access to. In addition I could see it as kind of a surprise (non-standard behavior) to an administrator that users could alter the layout of files they are only allowed to read.
>
> - Readonly snapshots won't let you open for write anyway (unsurprisingly, open() returns -EROFS). So that kind of kills the idea of them being able to open those files for write which we want to dedupe. That said, I still think being able to run this against a set of readonly snapshots makes sense, especially if those snapshots are taken for backup purposes. I'm just not sure how we can sanely enable it.

The check could be: if (fmode_write || cap_sys_admin). This isn't incompatible with mnt_want_write; that check is at the level of the superblocks and vfsmount, not the subvolume fsid.

> Code review is very much appreciated. Thanks,
> 	--Mark
>
> ChangeLog
>
> - check that we have appropriate access to each file before deduping. For the source, we only check that it is opened for read. Target files have to be open for write.
> - don't dedupe on readonly submounts (this is to maintain
> - check that we don't dedupe files with different checksumming states (compare BTRFS_INODE_NODATASUM flags)
> - get and maintain write access to the mount during the extent-same operation (mnt_want_write())
> - allocate our read buffers up front in btrfs_ioctl_file_extent_same() and pass them through for re-use on every call to btrfs_extent_same() (thanks to David Sterba dste...@suse.cz for reporting this)
> - As the read buffers could possibly be up to 1MB (depending on user request), we now conditionally vmalloc them.
> - removed redundant check for same inode. btrfs_extent_same() catches it now and bubbles the error up.
> - remove some unnecessary printks
>
> Changes from RFC to v1:
>
> - don't error on a large length value in btrfs extent-same; instead we just dedupe the maximum allowed. That way userspace doesn't have to worry about an arbitrary length limit.
> - btrfs_extent_same will now loop over the dedupe range at 1MB increments (for a total of 16MB per request)
> - cleaned up poorly coded while loop in __extent_read_full_page() (thanks to David Sterba dste...@suse.cz for reporting this)
> - included two fixes from Gabriel de Perthuis g2p.c...@gmail.com:
>   - allow dedupe across subvolumes
>   - don't lock compressed pages twice when deduplicating
> - removed some unused / poorly designed fields in btrfs_ioctl_same_args. This should also give us a bit more reserved bytes.
> - return -E2BIG instead of -ENOMEM when the arg list is too large (thanks to David Sterba dste...@suse.cz for reporting this)
> - Some more reserved bytes are now included as a result of some of my cleanups. Quite possibly we could add a couple more.
Re: [PATCH 0/4] btrfs: offline dedupe v2
On 11/06/2013 23:04, Mark Fasheh wrote:
> On Tue, Jun 11, 2013 at 10:56:59PM +0200, Gabriel de Perthuis wrote:
>>> What I found however is that neither of these is a great idea ;)
>>>
>>> - We want to require that the inode be open for writing so that an unprivileged user can't do things like run dedupe on a performance-sensitive file that they might only have read access to. In addition I could see it as kind of a surprise (non-standard behavior) to an administrator that users could alter the layout of files they are only allowed to read.
>>>
>>> - Readonly snapshots won't let you open for write anyway (unsurprisingly, open() returns -EROFS). So that kind of kills the idea of them being able to open those files for write which we want to dedupe. That said, I still think being able to run this against a set of readonly snapshots makes sense, especially if those snapshots are taken for backup purposes. I'm just not sure how we can sanely enable it.
>>
>> The check could be: if (fmode_write || cap_sys_admin). This isn't incompatible with mnt_want_write; that check is at the level of the superblocks and vfsmount, not the subvolume fsid.
>
> Oh ok, that's certainly better. I think we still have a problem though - how does a process get write access to a file from a ro-snapshot? If I open a file (as root) on a ro-snapshot on my test machine here I'll get -EROFS.

Your first series did work in that case. The process does get a read-only fd, but that's no obstacle for the ioctl.

> I'm a bit confused - how does mnt_want_write factor in here? I think that's for a totally separate kind of accounting, right?

It doesn't, it's just that I had spent a few minutes checking anyway.
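The rule that emerges from this exchange — source readable, and target either open for write or the caller privileged — is small enough to state as code. This is a hypothetical model of the check being discussed (made-up function name), not the actual kernel implementation:

```python
# Hypothetical model of the access rule under discussion, not kernel code.
# src_readable:  the source fd was opened for read (contents are provably
#                accessible, so dedupe can't be used to guess at them).
# dst_writable:  the target fd was opened for write.
# cap_sys_admin: the caller is privileged, which lets root dedupe files
#                inside read-only snapshots where open for write fails
#                with -EROFS.
def may_dedupe(src_readable, dst_writable, cap_sys_admin):
    return src_readable and (dst_writable or cap_sys_admin)

# Unprivileged users need a writable target; root can use a read-only fd.
assert may_dedupe(True, True, False)        # normal user, writable target
assert not may_dedupe(False, True, False)   # unreadable source: never
assert not may_dedupe(True, False, False)   # read-only target, no privilege
assert may_dedupe(True, False, True)        # root on a read-only snapshot
```

The point of the `cap_sys_admin` branch is exactly the ro-snapshot case above: the fd is read-only, but the ioctl can still proceed for a privileged caller.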
Re: [PATCH 4/4] btrfs: offline dedupe
>>> +#define BTRFS_MAX_DEDUPE_LEN	(16 * 1024 * 1024)
>>> +#define BTRFS_ONE_DEDUPE_LEN	(1 * 1024 * 1024)
>>> +
>>> +static long btrfs_ioctl_file_extent_same(struct file *file,
>>> +					 void __user *argp)
>>> +{
>>> +	struct btrfs_ioctl_same_args *args;
>>> +	struct btrfs_ioctl_same_args tmp;
>>> +	struct btrfs_ioctl_same_extent_info *info;
>>> +	struct inode *src = file->f_dentry->d_inode;
>>> +	struct file *dst_file = NULL;
>>> +	struct inode *dst;
>>> +	u64 off;
>>> +	u64 len;
>>> +	int args_size;
>>> +	int i;
>>> +	int ret;
>>> +	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
>>
>> The ioctl is available to non-root, so extra care should be taken with potential overflows etc. I haven't spotted anything so far.
>
> Sure. Actually, you got me thinking about some sanity checks... I need to add at least this check:
>
> 	if (btrfs_root_readonly(root))
> 		return -EROFS;
>
> which isn't in there as of now.

It's not needed and I'd rather do without; read-only snapshots and deduplication go together well for backups. Data and metadata are guaranteed to be immutable; extent storage isn't. This is also the case with raid.

> Also I don't really check the open mode (read, write, etc) on files passed in. We do this in the clone ioctl and it makes sense there since data (to the user) can change. With this ioctl though data won't ever change (even if the underlying extent does). So I left the checks out. A part of me is thinking we might want to be conservative to start with though and just add those type of checks in. Basically, I figure the source should be open for read at least and target files need write access.

I don't know of any privileged files that one would be able to open(2), but if this is available to unprivileged users the files all need to be open for reading so that it can't be used to guess at their contents. As long as root gets to bypass the checks (no btrfs_root_readonly) it doesn't hurt my use case.
Re: [PATCH 4/4] btrfs: offline dedupe
On Sat, 25 May 2013 00:38:27 CEST, Mark Fasheh wrote:
> On Fri, May 24, 2013 at 09:50:14PM +0200, Gabriel de Perthuis wrote:
>>> Sure. Actually, you got me thinking about some sanity checks... I need to add at least this check: if (btrfs_root_readonly(root)) return -EROFS; which isn't in there as of now.
>>
>> It's not needed and I'd rather do without; read-only snapshots and deduplication go together well for backups. Data and metadata are guaranteed to be immutable; extent storage isn't. This is also the case with raid.
>
> You're absolutely right - I miswrote the check I meant. Specifically, I was thinking about when the entire fs is readonly due to either some error or the user mounting with -o ro. So something more like:
>
> 	if (root->fs_info->sb->s_flags & MS_RDONLY)
> 		return -EROFS;
>
> I think that should be reasonable and wouldn't affect most use cases, right?

That's all right.

>>> Also I don't really check the open mode (read, write, etc) on files passed in. We do this in the clone ioctl and it makes sense there since data (to the user) can change. With this ioctl though data won't ever change (even if the underlying extent does). So I left the checks out. A part of me is thinking we might want to be conservative to start with though and just add those type of checks in. Basically, I figure the source should be open for read at least and target files need write access.
>>
>> I don't know of any privileged files that one would be able to open(2), but if this is available to unprivileged users the files all need to be open for reading so that it can't be used to guess at their contents. As long as root gets to bypass the checks (no btrfs_root_readonly) it doesn't hurt my use case.
>
> Oh ok, so this seems to make sense. How does this logic sound: We're not going to worry about write access since it would be entirely reasonable for the user to want to do this on a readonly submount (specifically for the purpose of deduplicating backups). Read access needs to be provided however so we know that the user has access to the file data. So basically, if a user can open any files for read, they can check their contents and dedupe them. Letting users dedupe files in, say, /etc seems kind of weird to me but I'm struggling to come up with a good explanation of why that should mean we limit this ioctl to root.
> 	--Mark

I agree with that model. Most of the code is shared with clone (and the copy_range RFC), which are unprivileged, so it doesn't increase the potential surface for bugs much.
Re: [PATCH 0/4] a structure for the disks scan for btrfs
On Fri, 17 May 2013 18:54:38 +0800, Anand Jain wrote:
> The idea was to introduce /dev/mapper to the scan for btrfs disks. However, first I found we need to congregate the disk scan procedure into a function, so it would help to consistently tune it across the btrfs-progs. As of now both fi show and dev scan use the disk scan, but they do it on their own. So here it would congregate btrfs disk scans at the function scan_devs_for_btrfs, add /dev/mapper to be used to scan for btrfs, and update its calling functions, plus a few bug fixes.

Just scan /dev/block/*. That contains all block devices.
Re: [PATCH 0/4] a structure for the disks scan for btrfs
> Just scan /dev/block/*. That contains all block devices.

Oh, this is about finding nicer names. Never mind.
Re: subvol copying
> A user of a workstation has a home directory /home/john as a subvolume. I wrote a cron job to make read-only snapshots of it under /home/john/backup, which was fortunate as they just ran a script that did something like rm -rf ~. Apart from copying dozens of gigs of data back, is there a good way of recovering it all? Whatever you suggest isn't going to work for this time (the copy is almost done) but will be useful for next time. Should I have put the backups under /backup instead so that I could just delete the corrupted subvol and make a read-write snapshot of the last good one?

You can move subvolumes at any time, as if they were regular directories. For example: move the backups to an external location, move what's left of the home to another location out of the way, and make a snapshot to restore.
Re: [PATCH 3/4] btrfs: Introduce extent_read_full_page_nolock()
> We want this for btrfs_extent_same. Basically readpage and friends do their own extent locking but for the purposes of dedupe, we want to have both files locked down across a set of readpage operations (so that we can compare data). Introduce this variant and a flag which can be set for extent_read_full_page() to indicate that we are already locked.

This one can get stuck in TASK_UNINTERRUPTIBLE:

[32129.522257] SysRq : Show Blocked State
[32129.524337]   task                        PC stack   pid father
[32129.526515] python          D 88021f394280     0 16281      1 0x0004
[32129.528656]  88020e079a48 0082 88013d3cdd40 88020e079fd8
[32129.530840]  88020e079fd8 88020e079fd8 8802138dc5f0 88013d3cdd40
[32129.533044]  1fff 88015286f440  0008
[32129.535285] Call Trace:
[32129.537522]  [816dcca9] schedule+0x29/0x70
[32129.539829]  [a02b4908] wait_extent_bit+0xf8/0x150 [btrfs]
[32129.542130]  [8107ea00] ? finish_wait+0x80/0x80
[32129.544463]  [a02b4f84] lock_extent_bits+0x44/0xa0 [btrfs]
[32129.546824]  [a02b4ff3] lock_extent+0x13/0x20 [btrfs]
[32129.549198]  [a02dc0cf] add_ra_bio_pages.isra.8+0x17f/0x2d0 [btrfs]
[32129.551602]  [a02dccfc] btrfs_submit_compressed_read+0x25c/0x4c0 [btrfs]
[32129.554028]  [a029d131] btrfs_submit_bio_hook+0x1d1/0x1e0 [btrfs]
[32129.556457]  [a02b2d07] submit_one_bio+0x67/0xa0 [btrfs]
[32129.558899]  [a02b7ecd] extent_read_full_page_nolock+0x4d/0x60 [btrfs]
[32129.561290]  [a02c8052] fill_data+0xb2/0x230 [btrfs]
[32129.563623]  [a02cd57e] btrfs_ioctl+0x1f7e/0x2560 [btrfs]
[32129.565924]  [816ddbae] ? _raw_spin_lock+0xe/0x20
[32129.568207]  [8119b907] ? inode_get_bytes+0x47/0x60
[32129.570472]  [811a8297] do_vfs_ioctl+0x97/0x560
[32129.572700]  [8119bb5a] ? sys_newfstat+0x2a/0x40
[32129.574882]  [811a87f1] sys_ioctl+0x91/0xb0
[32129.577008]  [816e64dd] system_call_fastpath+0x1a/0x1f

For anyone trying those patches, there's a fix here: https://github.com/g2p/linux/tree/v3.9%2Bbtrfs-extent-same
Re: [PATCH 3/4] btrfs: Introduce extent_read_full_page_nolock()
>> We want this for btrfs_extent_same. Basically readpage and friends do their own extent locking but for the purposes of dedupe, we want to have both files locked down across a set of readpage operations (so that we can compare data). Introduce this variant and a flag which can be set for extent_read_full_page() to indicate that we are already locked.
>
> This one can get stuck in TASK_UNINTERRUPTIBLE:
> [snip: same trace as in the previous message]

Side note, I wish btrfs used TASK_KILLABLE[1] instead.

[1]: https://lwn.net/Articles/288056/
Re: [RFC 0/5] BTRFS hot relocation support
> How will it compare to bcache? I'm currently thinking about buying an SSD, but bcache requires some effort in migrating the storage to use it. And after all those hassles I am not even sure it would work easily with a dracut-generated initramfs.

> On the side note: dm-cache, which is already in-kernel, does not need to reformat backing storage.

On the other hand dm-cache is somewhat complex to assemble, and letting the system automount the unsynchronised backing device is a recipe for data loss. It will need lvm integration to become really convenient to use. Anyway, here's a shameless plug for a tool that converts to bcache in-place: https://github.com/g2p/blocks#bcache-conversion
Re: Possible to deduplicate read-only snapshots for space-efficient backups
> Do you plan to support deduplication on a finer-grained basis than file level? As an example, in the end it could be interesting to deduplicate 1M blocks of huge files. Backups of VM images come to my mind as a good candidate. While my current backup script[1] takes care of this by using rsync --inplace, it won't consider files moved between two backup cycles. This is the main purpose I'm using bedup for on my backup drive. Maybe you could define another cutoff value to consider huge files for block-level deduplication?

I'm considering deduplicating aligned blocks of large files sharing the same size (VMs with the same baseline; those would ideally come pre-cowed, but rsync or scp could have broken that). It sounds simple, and was sort-of prompted by the new syscall taking short ranges, but it is tricky figuring out a sane heuristic (when to hash, when to bail, when to submit without comparing, what should be the source in the last case), and it's not something I have an immediate need for. It is also possible to use 9p (with standard cow and/or small-file dedup) and trade a bit of configuration for much more space-efficient VMs.

Finer-grained tracking of which ranges have changed, and maybe some caching of range hashes, would be a good first step before doing any crazy large-file heuristics. The hash caching would actually benefit all use cases.

> Regards, Kai
>
> [1]: https://gist.github.com/kakra/5520370
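The aligned-block idea for same-size files can be sketched as follows. This is only an illustration of the heuristic under discussion (the function name and block size are made up), not bedup code; the hash is merely a pre-filter, and a real tool would still byte-compare, or let the extent-same ioctl compare, before submitting a dedupe.

```python
import hashlib

BLOCK = 4096  # illustration; the discussion is about ~1M blocks on VM images

def common_aligned_blocks(path_a, path_b, block=BLOCK):
    # Return offsets of aligned, equal-content blocks in two same-sized
    # files: candidate ranges for a dedupe submission.
    offsets = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        off = 0
        while True:
            a, b = fa.read(block), fb.read(block)
            if len(a) < block or len(b) < block:
                break  # stop at EOF; ignore a short tail
            if hashlib.sha256(a).digest() == hashlib.sha256(b).digest():
                offsets.append(off)
            off += block
    return offsets
```

The hard parts named above (when to hash, when to bail, caching of range hashes) are precisely what this sketch leaves out: it always hashes everything.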
Re: [RFC 0/5] BTRFS hot relocation support
On Tue, 07 May 2013 23:58:08 +0200, Kai Krakow wrote:
> Gabriel de Perthuis g2p.c...@gmail.com wrote:
>>> On the side note: dm-cache, which is already in-kernel, does not need to reformat backing storage.
>>
>> On the other hand dm-cache is somewhat complex to assemble, and letting the system automount the unsynchronised backing device is a recipe for data loss.
>
> Yes, that was my first impression, too, after reading how it works. How safe is bcache on that matter?

The bcache superblock is there just to prevent the naked backing device from becoming available, so it's safe in that respect. LVM has something similar with hidden volumes.

>> Anyway, here's a shameless plug for a tool that converts to bcache in-place: https://github.com/g2p/blocks#bcache-conversion
>
> Did I say: I love your shameless plugs? ;-) I've read the docs for this tool with interest. Still I do not feel very comfortable with converting my storage for some unknown outcome. Sure, I can take backups (and by all means: I will). But it takes time: backup, try, restore, try again, maybe restore... I don't want to find out that it was all useless because it's just not ready to boot a multi-device btrfs through dracut. So you see, the point is: Will that work? I didn't see any docs answering my questions.

Try it with a throwaway filesystem inside a VM. The bcache list will appreciate the feedback on dracut, even if you don't make the switch for real.

> Of course, if it would work I'd happily contribute documentation to your project.

That would be very welcome.
Re: Possible to deduplicate read-only snapshots for space-efficient backups
On Wed, 08 May 2013 01:04:38 +0200, Kai Krakow wrote:
> Gabriel de Perthuis g2p.c...@gmail.com wrote:
>> It sounds simple, and was sort-of prompted by the new syscall taking short ranges, but it is tricky figuring out a sane heuristic (when to hash, when to bail, when to submit without comparing, what should be the source in the last case), and it's not something I have an immediate need for. It is also possible to use 9p (with standard cow and/or small-file dedup) and trade a bit of configuration for much more space-efficient VMs. Finer-grained tracking of which ranges have changed, and maybe some caching of range hashes, would be a good first step before doing any crazy large-file heuristics. The hash caching would actually benefit all use cases.
>
> Looking back to good old peer-2-peer days (I think we all got in touch with that one way or the other), one title pops back into my mind: tiger-tree-hash... I'm not really into it, but would it be possible to use tiger-tree hashes to find identical blocks? Even across different-sized files...

Possible, but bedup is all about doing as little io as it can get away with: doing streaming reads only when it has sampled that the files are likely duplicates, and not spending a ton of disk space for indexing. Hashing everything in the hope that there are identical blocks at unrelated places on the disk is a much more resource-intensive approach; Liu Bo is working on that, following ZFS's design choices.
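The do-as-little-io approach described above can be illustrated with a toy version of the sampling pre-filter (a sketch with made-up names, not bedup's actual implementation): only files that agree on size and on a few sampled regions are worth a full streaming comparison, so most non-duplicates are rejected after a handful of small reads.

```python
import os

SAMPLE = 4096  # bytes read per probe; an illustrative choice

def likely_duplicates(path_a, path_b, sample=SAMPLE):
    # Cheap pre-filter: reject on size mismatch, then probe the start,
    # middle and end of both files.  Passing all probes only means a
    # full comparison is worthwhile, not that the files are identical.
    size = os.path.getsize(path_a)
    if size != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for off in {0, max(0, size // 2), max(0, size - sample)}:
            fa.seek(off)
            fb.seek(off)
            if fa.read(sample) != fb.read(sample):
                return False
    return True
```

Contrast this with full-index designs: hashing every block of every file finds duplicates at unrelated offsets, but at a much higher io and storage cost.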
[PATCH] btrfs: don't stop searching after encountering the wrong item
The search ioctl skips items that are too large for a result buffer, but inline items of a certain size occurring before any search result is found would trigger an overflow and stop the search entirely.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641
Signed-off-by: Gabriel de Perthuis g2p.code+bt...@gmail.com
---
 fs/btrfs/ioctl.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 95d46cc..b3f0276 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1797,23 +1797,23 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 	for (i = slot; i < nritems; i++) {
 		item_off = btrfs_item_ptr_offset(leaf, i);
 		item_len = btrfs_item_size_nr(leaf, i);
-		if (item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
+		btrfs_item_key_to_cpu(leaf, key, i);
+		if (!key_in_sk(key, sk))
+			continue;
+
+		if (sizeof(sh) + item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
 			item_len = 0;

 		if (sizeof(sh) + item_len + *sk_offset > BTRFS_SEARCH_ARGS_BUFSIZE) {
 			ret = 1;
 			goto overflow;
 		}

-		btrfs_item_key_to_cpu(leaf, key, i);
-		if (!key_in_sk(key, sk))
-			continue;
-
 		sh.objectid = key->objectid;
 		sh.offset = key->offset;
 		sh.type = key->type;
 		sh.len = item_len;
 		sh.transid = found_transid;
--
1.8.2.1.419.ga0b97c6
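The effect of reordering the checks can be seen in a userspace model of the loop (a simplified sketch with illustrative constants, not the kernel code): before the patch, an item large enough to overflow the buffer triggers the overflow path before the key filter runs, aborting the search even though the item was never wanted; after the patch, unwanted items are skipped first.

```python
BUFSIZE = 16  # stand-in for BTRFS_SEARCH_ARGS_BUFSIZE
HDR = 4       # stand-in for sizeof(struct btrfs_ioctl_search_header)

def search_fixed(items, key_matches):
    # Model of copy_to_sk after the patch: filter on the key first, then
    # apply the size checks.  `items` is a list of (key, item_len) pairs;
    # returns the keys copied out.
    out, used = [], 0
    for key, item_len in items:
        if not key_matches(key):        # moved up by the patch
            continue
        if HDR + item_len > BUFSIZE:    # too large even alone: header only
            item_len = 0
        if HDR + item_len + used > BUFSIZE:
            break                       # buffer full: stop, caller resumes
        out.append(key)
        used += HDR + item_len
    return out

def search_broken(items, key_matches):
    # Pre-patch ordering: the size checks run before the key filter, so a
    # large item outside the search range aborts the whole search.
    out, used = [], 0
    for key, item_len in items:
        if item_len > BUFSIZE:
            item_len = 0
        if HDR + item_len + used > BUFSIZE:
            break                       # bug: fires on an unwanted item
        if not key_matches(key):
            continue
        out.append(key)
        used += HDR + item_len
    return out

# A large item outside the search range, then the item actually wanted.
items = [("outside", 14), ("inside", 4)]
match = lambda k: k == "inside"
assert search_broken(items, match) == []         # search stops too early
assert search_fixed(items, match) == ["inside"]  # large item is skipped
```

The model also shows the second change: the patched first check compares `sizeof(sh) + item_len` rather than `item_len` alone, so an oversized matching item degrades to a header-only result instead of a spurious overflow.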
[PATCH] btrfs: don't stop searching after encountering the wrong item
The search ioctl skips items that are too large for a result buffer, but inline items of a certain size occurring before any search result is found would trigger an overflow and stop the search entirely.

Cc: sta...@vger.kernel.org
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641
Signed-off-by: Gabriel de Perthuis g2p.code+bt...@gmail.com
---
(resent, with the correct header to have stable copied)

 fs/btrfs/ioctl.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2c02310..f49b62f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1794,23 +1794,23 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 	for (i = slot; i < nritems; i++) {
 		item_off = btrfs_item_ptr_offset(leaf, i);
 		item_len = btrfs_item_size_nr(leaf, i);
-		if (item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
+		btrfs_item_key_to_cpu(leaf, key, i);
+		if (!key_in_sk(key, sk))
+			continue;
+
+		if (sizeof(sh) + item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
 			item_len = 0;

 		if (sizeof(sh) + item_len + *sk_offset > BTRFS_SEARCH_ARGS_BUFSIZE) {
 			ret = 1;
 			goto overflow;
 		}

-		btrfs_item_key_to_cpu(leaf, key, i);
-		if (!key_in_sk(key, sk))
-			continue;
-
 		sh.objectid = key->objectid;
 		sh.offset = key->offset;
 		sh.type = key->type;
 		sh.len = item_len;
 		sh.transid = found_transid;
--
1.8.2.1.419.ga0b97c6
Re: Possible to deduplicate read-only snapshots for space-efficient backups
On Sun, 05 May 2013 12:07:17 +0200, Kai Krakow wrote:
> Hey list,
>
> I wonder if it is possible to deduplicate read-only snapshots.
>
> Background: I'm using a bash/rsync script[1] to back up my whole system on a nightly basis to an attached USB3 drive into a scratch area, then take a snapshot of this area. I'd like to have these snapshots immutable, so they should be read-only. Since rsync won't discover moved files but instead place a new copy of them in the backup, I'm running the wonderful bedup application[2] to deduplicate my backup drive from time to time, and it almost always gains back a good pile of gigabytes. The rest of the storage-space issues is taken care of by using rsync's inplace option (although this won't cover the case of files moved and changed between backup runs) and using compress-force=gzip.
>
> I've read about ongoing work to integrate offline (and even online) deduplication into the kernel so that this process can be made atomic (and even block-based instead of file-based). This would - to my understanding - result in the immutable attribute no longer being needed. So, given the fact above, and for the case read-only snapshots cannot be used for this application currently: will these patches address the problem so that read-only snapshots can be deduplicated? Or are read-only snapshots meant to be what the name suggests: immutable, even for deduplication?

There's no deep reason read-only snapshots should keep their storage immutable; they can be affected by raid rebalancing, for example. The current bedup restriction comes from the clone call; Mark Fasheh's dedup ioctl[3] appears to be fine with snapshots. The bedup integration (in a branch) is a work in progress at the moment. I need to fix a scan bug, tweak parameters for the latest kernel dedup patch, remove a lot of logic that is now unnecessary, and figure out the compatibility story.

> Regards, Kai
>
> [1]: https://gist.github.com/kakra/5520370
> [2]: https://github.com/g2p/bedup
> [3]: http://comments.gmane.org/gmane.comp.file-systems.btrfs/25062
Re: Best Practice - Partition, or not?
> Hello,
>
> If I want to manage a complete disk with btrfs, what's the best practice? Would it be best to create the btrfs filesystem on /dev/sdb, or would it be better to create just one partition from start to end and then do mkfs.btrfs /dev/sdb1?

Partitions (GPT) are always more flexible and future-proof. If you ever need to shrink the btrfs filesystem and give the space to another partition, or do a conversion to lvm/bcache/luks (shameless plug: https://github.com/g2p/blocks ), it'd be stupid to be locked into your current setup for want of a few megabytes of space before your filesystem.

> Would the same recommendation hold true if we're talking about huge disks, like 4TB or so?

More so, since it can be infeasible to move this much data.
Re: [PATCH v3 2/2] Btrfs: online data deduplication
>  #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
>  				    struct btrfs_ioctl_dev_replace_args)
> +#define BTRFS_IOC_DEDUP_REGISTER _IO(BTRFS_IOCTL_MAGIC, 54)

This number has already been used by the offline dedup patches.
Re: [PATCH 0/4] [RFC] btrfs: offline dedupe
> On Sat, Apr 20, 2013 at 05:49:25PM +0200, Gabriel de Perthuis wrote:
>>> Hi,
>>>
>>> The following series of patches implements in btrfs an ioctl to do offline deduplication of file extents.
>>
>> I am a fan of this patch, the API is just right. I just have a few tweaks to suggest to the argument checking.
>
> Awesome, thanks for looking this over!
>
>> At first the 1M limitation on the length was a bit inconvenient, but making repeated calls in userspace is okay and allows for better error recovery (for example, repeating the calls when only a portion of the ranges is identical). The destination file appears to be fragmented into 1M extents, but these are contiguous so it's not really a problem.
>
> Yeah I agree it's a bit inconvenient. To that end, I fixed things up so that instead of erroring, we just limit the dedupe to 1M. If you want to see what I'm talking about, the patch is at the top of my tree now: https://github.com/markfasheh/btrfs-extent-same/commit/b39f93c2e78385ceea850b59edbd759120543a8b
>
> This way userspace doesn't have to guess at what size is the max, and we can change it in the future, etc. Furthermore, I'm thinking it might even be better for us to just internally loop on the entire range asked for. That won't necessarily fix the issue where we fragment into 1M extents, but it would ease the interface even more. My only concern with looping over a large range would be the (almost) unbounded nature of the operation... For example, if someone passes in a 4 Gig range and 100 files to do that on, we could be in the ioctl for some time. The middle ground would be to loop like I was talking about but limit the maximum length (by just truncating the value, as above). The limit in this case would obviously be much larger than 1 megabyte but not so large that we can go off for an extreme amount of time. I'm thinking maybe 16 megabytes or so to start?
A cursor-style API could work here: make the offset and length parameters in/out, exit early in case of error or after the read quota has been used up. The caller can retry as long as the length is nonzero (and at least one block), and the syscall will return frequently enough that it won't block an unmount operation or concurrent access to the ranges. Requiring the offset or the length to align is spurious however; it doesn't translate to any alignment in the extent tree (especially with compression). Requiring a minimum length of a few blocks or dropping the alignment condition entirely would make more sense. I'll take a look at this. Some of those checks are there for my own sanity at the moment. I really like that the start offset should align but there's no reason that length can't be aligned to blocksize internally. Are you sure that extents don't have to start at block boundaries? If that's the case and we never have to change the start offset (to align it) then we could drop the check entirely. I've had a look, and btrfs fiemap only sets FIEMAP_EXTENT_NOT_ALIGNED for inline extents, so the alignment requirement makes sense. The caller should do the alignment and decide if it wants to extend a bit and accept a not-same status or shrink a bit, so just keep it as is and maybe add explanatory comments. I notice there is a restriction on cross-subvolume deduplication. Hopefully it can be lifted like it was done in 3.6 for the clone ioctl. Ok if it was removed from clone then this might be a spurious check on my part. Most of the real extent work in btrfs-same is done by the code from the clone ioctl. Good to have this code shared (compression support is another win). bedup will need feature parity to switch from one ioctl to the other. Deduplicating frozen subvolumes works fine, which is great for backups. 
Basic integration with bedup, my offline deduplication tool, is in an experimental branch: https://github.com/g2p/bedup/tree/wip/dedup-syscall Thanks to this, I look forward to shedding most of the caveats given in the bedup readme and some unnecessary subtleties in the code.

Again, I'm really glad this is working out for you :) I'll check out your bedup patch early this week. It will be instructive to see how another engineer uses the ioctl.

See ranges_same and dedup_fileset. The ImmutableFDs stuff can be removed, and the fact that dedup can be partially successful over a range will ripple through.

I've made significant updates and changes from the original. In particular the structure passed is more fleshed out, this series has a high degree of code sharing between itself and the clone code, and the locking has been updated. The ioctl accepts a struct:

struct btrfs_ioctl_same_args {
	__u64 logical_offset;	/* in - start of extent in source */
	__u64 length;		/* in - length of extent */
	__u16 total_files;	/* in - total elements in info array */

Nit: total_files sounds like it would count the source file. dest_count would be better. By the way, extent-same might be better named range-same, since there is no need for the input to fall on extent boundaries.
Re: [PATCH 0/4] [RFC] btrfs: offline dedupe
Hi, The following series of patches implements in btrfs an ioctl to do offline deduplication of file extents. I am a fan of this patch, the API is just right. I just have a few tweaks to suggest to the argument checking. At first the 1M limitation on the length was a bit inconvenient, but making repeated calls in userspace is okay and allows for better error recovery (for example, repeating the calls when only a portion of the ranges is identical). The destination file appears to be fragmented into 1M extents, but these are contiguous so it's not really a problem. Requiring the offset or the length to align is spurious however; it doesn't translate to any alignment in the extent tree (especially with compression). Requiring a minimum length of a few blocks or dropping the alignment condition entirely would make more sense. I notice there is a restriction on cross-subvolume deduplication. Hopefully it can be lifted like it was done in 3.6 for the clone ioctl. Deduplicating frozen subvolumes works fine, which is great for backups. Basic integration with bedup, my offline deduplication tool, is in an experimental branch: https://github.com/g2p/bedup/tree/wip/dedup-syscall Thanks to this, I look forward to shedding most of the caveats given in the bedup readme and some unnecessary subtleties in the code. I've made significant updates and changes from the original. In particular the structure passed is more fleshed out, this series has a high degree of code sharing between itself and the clone code, and the locking has been updated. The ioctl accepts a struct: struct btrfs_ioctl_same_args { __u64 logical_offset; /* in - start of extent in source */ __u64 length; /* in - length of extent */ __u16 total_files; /* in - total elements in info array */ Nit: total_files sounds like it would count the source file. dest_count would be better. By the way, extent-same might be better named range-same, since there is no need for the input to fall on extent boundaries. 
	__u16 files_deduped;	/* out - number of files that got deduped */
	__u32 reserved;
	struct btrfs_ioctl_same_extent_info info[0];
};

Userspace puts each duplicate extent (other than the source) in an item in the info array. As there can be multiple dedupes in one operation, each info item has its own status and 'bytes_deduped' member. This provides a number of benefits:

- We don't have to fail the entire ioctl because one of the dedupes failed.
- Userspace will always know how much progress was made on a file, as we always return the number of bytes deduped.

#define BTRFS_SAME_DATA_DIFFERS	1

/* For extent-same ioctl */
struct btrfs_ioctl_same_extent_info {
	__s64 fd;		/* in - destination file */
	__u64 logical_offset;	/* in - start of extent in destination */
	__u64 bytes_deduped;	/* out - total # of bytes we were able
				 * to dedupe from this file */
	/* status of this dedupe operation:
	 * 0 if dedupe succeeds
	 * < 0 for error
	 * == BTRFS_SAME_DATA_DIFFERS if data differs
	 */
	__s32 status;		/* out - see above description */
	__u32 reserved;
};
Re: Problem with building instructions for btrfs-tools in https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories
There is a missing dependency: liblzo2-dev. I suggest amending the wiki and adding liblzo2-dev to the apt-get line for Ubuntu/Debian.

Added. Other distros may need some additions too. Anyone can edit the wiki, as the spambots will attest; a ConfirmEdit captcha at signup would be nice.
Permanent uncancellable balance
Hello,

I have a filesystem that has become unusable because of a balance I can't stop. It is very close to full, and the balance is preventing me from growing it. It was started like this:

  sudo btrfs filesystem balance start -v -musage=60 -dusage=60 /srv/backups

It has been stuck at 0% across reboots and kernel upgrades (currently on 3.8.1), and cancelling it had no effect:

  Balance on '/srv/backups' is running
  0 out of about 5 chunks balanced (95 considered), 100% left

According to atop it is writing but not reading anything. Unmounting never terminates, and so does remounting read-only; the only way to temporarily kill it is to reboot. SIGKILL has no effect either. Is there *any* way I can get rid of it?
Re: Permanent uncancellable balance
On Sat, 02 Mar 2013 17:12:37 +0600, Roman Mamedov wrote:

Mount with the skip_balance option https://btrfs.wiki.kernel.org/index.php/Mount_options then you can issue btrfs fi balance cancel and it will succeed.

Excellent, thank you. I had just thought of doing the same thing with ro and it worked.
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
Hi,

After mounting the system with noatime the problem disappeared, like magic.

Incidentally, the current version of bedup uses a private mountpoint with noatime whenever you don't give it the path to a mounted volume. You can use it with no arguments or designate a filesystem by its uuid or /dev path.

All the writes must have come from the delayed metadata copy process. Once all the metadata copy-update was done, file system speed was back to normal, but once the new day broke, all the copying business needed to be done again... This describes 100% of the odd behavior. In particular, the problem apparently had nothing to do with my complex block device setup, nor with bedup, nor with unison. Thank you again, Andrew!

P.S. Maybe it is not for me to decide, but this small message about performance (not even labeled as a warning) in https://btrfs.wiki.kernel.org/index.php/Mount_options IMHO should be made more conspicuous, maybe put where the snapshot mechanism is described, or in the FAQ.

I'll try to fix it.
Corruption at start of files
Here is what I see in my kern.log (see below). For me this first happened when the filesystem was close to full (less than 1GB left), but someone on the irc channel mentioned a similar problem on suspend to ram. The files that have checksum failures end up with their first 4k filled with 0x01 bytes. They were seeing a lot of writes; things like firefox session data and cookie data, plus files that disappeared before I could call inode-resolve on them. I was running 3.6.3 when this happened; I've upgraded to -rcs since but I haven't tried to reproduce the bug deliberately. I didn't see relevant changes in the changelog. Oct 31 17:06:31 moulinex kernel: [93539.008465] BTRFS warning (device dm-16): Aborting unused transaction. Oct 31 17:06:31 moulinex kernel: [93539.011257] BTRFS warning (device dm-16): Aborting unused transaction. Oct 31 17:06:31 moulinex kernel: [93539.017137] BTRFS warning (device dm-16): Aborting unused transaction. Oct 31 17:06:46 moulinex kernel: [93554.728793] use_block_rsv: 16 callbacks suppressed Oct 31 17:06:46 moulinex kernel: [93554.728795] btrfs: block rsv returned -28 Oct 31 17:06:46 moulinex kernel: [93554.728796] [ cut here ] Oct 31 17:06:46 moulinex kernel: [93554.728818] WARNING: at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6323 use_block_rsv+0x19f/0x1b0 [btrfs]() Oct 31 17:06:46 moulinex kernel: [93554.728819] Hardware name: System Product Name Oct 31 17:06:46 moulinex kernel: [93554.728820] Modules linked in: snd_seq_dummy vhost_net macvtap macvlan xt_recent bnep rfcomm bluetooth snd_hrtimer nls_utf8 sch_fq_codel ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat bridge stp llc ppdev lp parport deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 glue_helper lrw serpent_generic xts gf128mul blowfish_generic blowfish_x86_64 blowfish_common cast5 des_generic xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo binfmt_misc 
dm_crypt snd_hda_codec_hdmi snd_hda_codec_realtek eeepc_wmi asus_wmi sparse_keymap coretemp kvm_intel kvm dm_multipath scsi_dh microcode arc4 joydev snd_hda_intel snd_hda_codec snd_hwdep snd_pcm rt61pci rt2x00pci rt2x00lib snd_seq_midi snd_rawmidi mac80211 snd_seq_midi_event snd_seq snd_timer snd_seq_device snd cfg80211 soundcore snd_page_alloc eeprom_93cx6 serio_raw lpc_ich mei mac_hid k8temp hwmon_vid i2c_nforce2 firewire_sbp2 firewire_core crc_itu_t psmouse ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_multiport xt_limit xt_tcpudp xt_addrtype xt_state ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables btrfs zlib_deflate libcrc32c raid10 raid0 multipath linear raid456 async_pq async_xor xor async_memcpy async_raid6_recov hid_generic raid6_pq async_tx hid_cherry usbhid hid raid1 ghash_clmulni_intel sata_via wmi aesni_intel ablk_helper cryptd aes_x86_64 r8169 i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: ipmi_msghandler]
Oct 31 17:06:46 moulinex kernel: [93554.728873] Pid: 2230, comm: btrfs-endio-wri Tainted: G W 3.6.3-030603-generic #201210211349
Oct 31 17:06:46 moulinex kernel: [93554.728874] Call Trace:
Oct 31 17:06:46 moulinex kernel: [93554.728880] [81056f6f] warn_slowpath_common+0x7f/0xc0
Oct 31 17:06:46 moulinex kernel: [93554.728882] [81056fca] warn_slowpath_null+0x1a/0x20
Oct 31 17:06:46 moulinex kernel: [93554.728889] [a01feedf] use_block_rsv+0x19f/0x1b0 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728897] [a020260d] btrfs_alloc_free_block+0x3d/0x220 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728904] [a01ef38d] ? balance_level+0xcd/0x890 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728906] [81332e10] ? rb_insert_color+0x110/0x150
Oct 31 17:06:46 moulinex kernel: [93554.728916] [a022f16c] ?
read_extent_buffer+0xbc/0x120 [btrfs] Oct 31 17:06:46 moulinex kernel: [93554.728918] [81178ebd] ? kmem_cache_alloc_trace+0x12d/0x150 Oct 31 17:06:46 moulinex kernel: [93554.728925] [a01ee3b2] __btrfs_cow_block+0x122/0x4f0 [btrfs] Oct 31 17:06:46 moulinex kernel: [93554.728927] [81136892] ? set_page_dirty+0x62/0x70 Oct 31 17:06:46 moulinex kernel: [93554.728930] [8169f37e] ? _raw_spin_lock+0xe/0x20 Oct 31 17:06:46 moulinex kernel: [93554.728936] [a01ee87c] btrfs_cow_block+0xfc/0x220 [btrfs] Oct 31 17:06:46 moulinex kernel: [93554.728943] [a01f29f8] btrfs_search_slot+0x368/0x740 [btrfs] Oct 31 17:06:46 moulinex kernel: [93554.728951] [a0206e84] btrfs_lookup_csum+0x74/0x190 [btrfs] Oct 31 17:06:46 moulinex kernel: [93554.728953] [81179cfc] ? kmem_cache_alloc+0x11c/0x150 Oct 31 17:06:46 moulinex kernel: [93554.728960]
Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df
On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:

On 2012-11-02 12:18, Martin Steigerwald wrote:

Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB in total. I understand the logic behind this, but this could be a bit confusing. But it makes sense: showing real allocation on the device level makes sense, cause that's what is really allocated on disk. Total makes some sense, cause that's what is being used from the tree by BTRFS.

Yes, me too. At first I was confused when you noticed this discrepancy. So I have to admit that it is not so obvious to understand. However we didn't find any way to make it more clear...

It still looks confusing at first…

We could use "Chunk(s) capacity" instead of total/size? I would like an opinion from an English point of view..

This is easy to fix, here's a mockup:

Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
   /dev/dm-0   3.50GB

            Data    Metadata  Metadata    System  System
            Single  Single    DUP         Single  DUP          Unallocated
/dev/dm-16  1.31TB  8.00MB    56.00GB     4.00MB  16.00MB      0.00
            ======  ======    ==========  ======  ===========  ====
Total       1.31TB  8.00MB    28.00GB ×2  4.00MB  8.00MB ×2    0.00
Used        1.31TB  0.00      5.65GB ×2   0.00    152.00KB ×2

Also, I don't know if you could use libblkid, but it finds more descriptive names than dm-NN (thanks to some smart sorting logic).
Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df
On Fri, 02 Nov 2012 20:31:56 +0100, Goffredo Baroncelli wrote:

On 11/02/2012 08:05 PM, Gabriel wrote:

On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:

On 2012-11-02 12:18, Martin Steigerwald wrote:

Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB in total. I understand the logic behind this, but this could be a bit confusing. But it makes sense: showing real allocation on the device level makes sense, cause that's what is really allocated on disk. Total makes some sense, cause that's what is being used from the tree by BTRFS.

Yes, me too. At first I was confused when you noticed this discrepancy. So I have to admit that it is not so obvious to understand. However we didn't find any way to make it more clear...

It still looks confusing at first…

We could use "Chunk(s) capacity" instead of total/size? I would like an opinion from an English point of view..

This is easy to fix, here's a mockup:

Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
   /dev/dm-0   3.50GB

            Data    Metadata  Metadata    System  System
            Single  Single    DUP         Single  DUP          Unallocated
/dev/dm-16  1.31TB  8.00MB    56.00GB     4.00MB  16.00MB      0.00
            ======  ======    ==========  ======  ===========  ====
Total       1.31TB  8.00MB    28.00GB ×2  4.00MB  8.00MB ×2    0.00
Used        1.31TB  0.00      5.65GB ×2   0.00    152.00KB ×2

Nice idea. Even though I like the opposite:

            Data    Metadata  Metadata    System  System
            Single  Single    DUP         Single  DUP        Unallocated
/dev/dm-16  1.31TB  8.00MB    28.00GB x2  4.00MB  8.00MB x2  0.00
            ======  ======    ==========  ======  =========  ====
Total       1.31TB  8.00MB    28.00GB     4.00MB  8.00MB     0.00
Used        1.31TB  0.00      5.65GB     0.00    152.00KB

However, how will your solution look when RAID5/RAID6 arrive? Hmm, maybe the solution is simpler: the x2 factor is applied only to the DUP profile. The other profiles span different disks.
That problem solved itself :)

As another option, we can add a field/line which reports the RAID factor:

Metadata,DUP: Size: 1.75GB, Used: 627.84MB, Raid factor: 2x
   /dev/dm-0   3.50GB

             Data    Metadata  Metadata  System  System
             Single  Single    DUP       Single  DUP       Unallocated
/dev/dm-16   1.31TB  8.00MB    56.00GB   4.00MB  16.00MB   0.00
             ======  ======    ========  ======  ========  ====
Raid factor  --      --        x2        --      x2        --
Total        1.31TB  8.00MB    28.00GB   4.00MB  8.00MB    0.00
Used         1.31TB  0.00      5.65GB    0.00    152.00KB

All fine options. Though if you remove the ×2 on the totals line, you should compute it instead (it looks like a tally; both sides of the == line should be equal). Now that I've started bikeshedding, here is something that I would find pretty much ideal:

                Data     Metadata            System              Unallocated
VolGroup/Btrfs
  Reserved      1.31TB   8.00MB + 2×28.00MB  16.00MB + 2×4.00MB  -
  Used          1.31TB   2× 5.65GB           2×152.00KB          -
                ======   ==================  ==================  ===
Total Reserved  1.31TB   56.00GB             24.00MB             -
      Used      1.31TB   11.30GB             304.00KB            -
      Free      12.34GB  44.70GB             23.70MB             -

Also, I don't know if you could use libblkid, but it finds more descriptive names than dm-NN (thanks to some smart sorting logic).

I don't think that it would be impossible to use libblkid, however it would be difficult to find space for longer device names.

I suggest cutting out the /dev and putting a line break after the name. The extra info makes it more human-friendly, and the line break may complicate machine parsing, but the non-tabular format is better at that anyway.
Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df
On Fri, 02 Nov 2012 22:06:04 +, Hugo Mills wrote:

On Fri, Nov 02, 2012 at 07:05:37PM +, Gabriel wrote:

On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:

On 2012-11-02 12:18, Martin Steigerwald wrote:

Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB in total. I understand the logic behind this, but this could be a bit confusing. But it makes sense: showing real allocation on the device level makes sense, cause that's what is really allocated on disk. Total makes some sense, cause that's what is being used from the tree by BTRFS.

Yes, me too. At first I was confused when you noticed this discrepancy. So I have to admit that it is not so obvious to understand. However we didn't find any way to make it more clear...

It still looks confusing at first…

We could use "Chunk(s) capacity" instead of total/size? I would like an opinion from an English point of view..

This is easy to fix, here's a mockup:

Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
   /dev/dm-0   3.50GB

I've not considered the full semantics of all this yet -- I'll try to do that tomorrow. However, I note that the ×2 here could become non-integer with the RAID-5/6 code (which is due Real Soon Now). In the first RAID-5/6 code drop, it won't even be simple to calculate where there are different-sized devices in the filesystem. Putting an exact figure on that number is potentially going to be awkward. I think we're going to need kernel help for working out what that number should be, in the general case.

DUP can be nested below a device because it represents same-device redundancy (purpose: survive smudges but not device failure). On the other hand, raid levels should occupy the same space on all linked devices (a necessary consequence of the guarantee that RAID5 can survive the loss of any device and RAID6 any two devices).
The two probably won't need to be represented at the same time except during a reshape, because I imagine DUP gets converted to RAID (1 or 5) as soon as the second device is added. A 1→2 reshape would look a bit like this (doing only the data column and skipping totals):

InitialDevice
  Reserved  1.21TB
  Used      1.21TB

RAID1(InitialDevice, SecondDevice)
  Reserved  1.31TB + 100GB
  Used      2× 100GB

RAID5, RAID6: same with fractions, n+1⁄n and n+2⁄n.

Again, I'm raising minor points based on future capabilities, but I feel it's worth considering them at this stage, even if the correct answer is yes, we'll do this now, and deal with any other problems later.

Hugo.

            Data    Metadata  Metadata    System  System
            Single  Single    DUP         Single  DUP          Unallocated
/dev/dm-16  1.31TB  8.00MB    56.00GB     4.00MB  16.00MB      0.00
            ======  ======    ==========  ======  ===========  ====
Total       1.31TB  8.00MB    28.00GB ×2  4.00MB  8.00MB ×2    0.00
Used        1.31TB  0.00      5.65GB ×2   0.00    152.00KB ×2

Also, I don't know if you could use libblkid, but it finds more descriptive names than dm-NN (thanks to some smart sorting logic).
Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df
On Fri, 02 Nov 2012 21:46:35 +, Michael Kjörling wrote:

On 2 Nov 2012 20:40 +, from g2p.c...@gmail.com (Gabriel):

Now that I've started bikeshedding, here is something that I would find pretty much ideal:

                Data     Metadata            System              Unallocated
VolGroup/Btrfs
  Reserved      1.31TB   8.00MB + 2×28.00GB  16.00MB + 2×4.00MB  -
  Used          1.31TB   2× 5.65GB           2×152.00KB
                ======   ==================  ==================  ===
Total Reserved  1.31TB   56.00GB             24.00MB             -
      Used      1.31TB   11.30GB             304.00KB
      Free      12.34GB  44.70GB             23.70MB             -

If we can take such liberties, then why bother with the 2× at all?

It does save a line.

Also, I think the B can go, since it's implied by talking about storage capacities. A lot of tools do this already; look at GNU df -h and ls -lh for just two examples. That gives you a few extra columns which can be used to make the table column spacing a little bigger even in an 80-column terminal.

Good idea.

I'm _guessing_ that you meant for metadata reserved to be 2 × 28 GB and not 2 × 28 MB, because otherwise the numbers really don't add up.

Feh, that's just a typo from when I swapped the 8.00M to the left.

                Data    Metadata        System          Unallocated
VolGroup/Btrfs
  Reserved      1.31T   8.00M + 28.00G  16.00M + 4.00M  -
  ResRedun      -       28.00G          4.00M           -
  Used          1.31T   5.65G           152.00K         -
  UseRedun      -       5.65G           152.00K         -
                =====   ==============  ==============  ===
Total Reserved  1.31T   56.01G          24.00M          -
      Used      1.31T   11.30G          304.00K         -
      Free      12.34G  44.71G          23.70M          -

This way, the numbers should add up nicely. (Redun for redundancy or something like that.) 8M + 28G + 28G = 56.01G, 5.65G + 5.65G = 11.30G, 56.01G - 11.30G = 44.71G. I'm not sure you couldn't even work 8.00M + 28.00G into a single 28.01G entry at Reserved/Metadata, with ResRedun/Metadata 28.00G. That would require some care when the units are different enough that the difference doesn't show up in the numbers, though, since then there is nothing to indicate that parts of the metadata is not stored in a redundant fashion.

I tried to work out DUP vs RAID redundancy in my message to Hugo. If some redundancy scheme (RAID 5?)
uses an oddball factor, that can still easily be expressed in a view like the above simply by displaying the user data and redundancy data separately, in exactly the same way.

And personally, I feel that a summary view like this, for Data, if an exact number cannot be calculated, should display the _minimum amount of available free space_, with free space being _usable by user files_. If I start copying a 12.0GB file onto the file system exemplified above, I most assuredly _don't_ want to get a report of device full after 10 GB! (You mating female dog, you told me I had 12.3 GB free, wrote 10 GB and now you're saying there's NO free space?! To hell with this, I'm switching to Windows!) That also saves this tool from having to take into account possible compression ratios for when file system level compression is enabled, savings from possible deduplication of data, etc etc. Of course it also means that the amount of free space may shrink by less than the size of the added data, but hey, that's a nice bonus if your disk grows bigger as you add more data to it. :-)

I think we can guarantee minimum amounts of free space, as long as data/metadata/system are segregated properly? OK, reshapes complicate this. For those we could take the worst case between now and the completed reshape. Or maybe add a second tally:

devices
===
total
  reserved
  used
  free
===
anticipated (reshaped 8% eta 3:12)
  reserved
  used
  free

I suggest cutting out the /dev and putting a line break after the name. The extra info makes it more human-friendly, and the line break may complicate machine parsing, but the non-tabular format is better at that anyway.

That might work well for anything under /dev, but what about things that aren't?

Absolute path for those, assuming it ever happens. And I stand by my earlier position that the tabular data shouldn't be machine-parsed anyway. As you say, the non-tabular format is better for that.
Re: find-new possibility of showing modified and deleted files/directories
On Thu, 01 Nov 2012 06:06:57 +0100, Arne Jansen wrote:

On 11/01/2012 02:28 AM, Shane Spencer wrote:

That's Plan B. I'll be making a btrfs stream decoder and doing in-place edits. I need to move stuff around to other filesystem types; otherwise I'd just store the stream or apply the stream to a remote snapshot.

That's the whole point of the btrfs-send design: it's very easy to receive on different filesystems. A generic receiver is in preparation. And to make it even more generic: a sender using the same stream format is also in preparation for zfs.

Consider the rsync bundle format as well. That should provide interoperability with any filesystem.
Re: find-new possibility of showing modified and deleted files/directories
On Thu, 01 Nov 2012 12:29:36 +0100, Arne Jansen wrote:

On 01.11.2012 12:00, Gabriel wrote:

On Thu, 01 Nov 2012 06:06:57 +0100, Arne Jansen wrote:

On 11/01/2012 02:28 AM, Shane Spencer wrote:

That's Plan B. I'll be making a btrfs stream decoder and doing in-place edits. I need to move stuff around to other filesystem types; otherwise I'd just store the stream or apply the stream to a remote snapshot.

That's the whole point of the btrfs-send design: it's very easy to receive on different filesystems. A generic receiver is in preparation. And to make it even more generic: a sender using the same stream format is also in preparation for zfs.

Consider the rsync bundle format as well. That should provide interoperability with any filesystem.

Rsync is an interactive protocol. The idea with send/receive is that the stream can be generated without any interaction with the receiver. You can store the stream somewhere, or replay it to many destinations.

Same with rsync's batch mode. Here is more about it: http://manpages.ubuntu.com/manpages/precise/man1/rsync.1.html#contenttoc21
Re: [RFC] Systemcall for offline deduplication
On Thu, 25 Oct 2012 23:26:14 -0700, Darrick J. Wong wrote:

Now, here's my proposal for fixing that: a BTRFS_IOC_SAME_RANGE ioctl would be ideal. It takes two file descriptors, two offsets, and one length, does some locking, checks that the ranges are identical (returns EINVAL if not), and defers to an implementation that works like clone_range with the metadata update and the writable-volume restriction moved out. I didn't go with something block-based or extent-based because with compression and fragmentation, extents would very easily fail to be aligned. Thoughts on this interface? Anyone interested in getting this implemented, or at least providing some guidance and patch review?

This sounds quite a bit like what Josef had proposed with the FILE_EXTENT_SAME ioctl a couple of years ago[1]. At the time, he was only interested in writing a userland dedupe program for various reasons, and afaict it hasn't gone anywhere. If you're going to do the comparing from userspace, I'd imagine you ought to have a better method to pin an extent than chattr +i...

The immutable hack is a bit lame, but it will have to stay until we get a good kernel API.

I guess you could create a temporary file, F_E_S the parts of the files you're trying to compare into the temp file, link together whichever parts you want to, and punch_hole the entire temp file before moving on. I think it's the case that if the candidate files get rewritten during the dedupe operation, the new data will be written elsewhere; the punch hole operation will release the disk space if its refcount becomes zero.

The FILE_EXTENT_SAME proposal is not the one I'd prefer. The parameters (fds, offsets, one length) are fine. It's not as extent-based as the name implies (no extents in the parameters), except that it still needs a single extent on the left side, which won't work for fragmented files.
That alone may be worked around by creating a new tempfile to use on the source side, but that has downsides: it will unshare extents and might actually increase disk use, and it won't work on read-only snapshots. It is better to just pass fragmented offsets to the kernel and not put workarounds that reduce visibility for the implementation. The restrictions for compressed or encrypted files and cross-subvolume dedup are also inconvenient. That makes me more interested in an implementation based on clone_range, which has neither limitation. That's the proposal above. The offline dedupe scheme seems like a good way to reclaim disk space if you don't mind having fewer copies of data. I'm happy with the gains, although they are entirely dependent on having a lot of redundant data in the first place. The messier the better. As for online dedupe (which seems useful for reducing writes), would it be useful if one could, given a write request, compare each of the dirty pages in that request against whatever else the fs has loaded in the page cache, and try to dedupe against that? We could probably speed up the search by storing hashes of whatever we have in the page cache and using that to find candidates for the memcmp() test. This of course is not a comprehensive solution, but (a) we combine it with offline dedupe later and (b) we don't make a disk write out data that we've recently read or written. Obviously you'd want to be able to opt-in to this sort of thing with an inode flag or something. That's another kettle of fish, and will require an entirely different approach. ZFS has some experience doing that. While their implementation may reduce writes it is at the cost of storing hashes of every block in RAM. 
[1] http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg07779.html

[1] https://github.com/g2p/bedup#readme
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Systemcall for offline deduplication
>> As for online dedupe (which seems useful for reducing writes), would it
>> be useful if one could, given a write request, compare each of the dirty
>> pages in that request against whatever else the fs has loaded in the
>> page cache, and try to dedupe against that? We could probably speed up
>> the search by storing hashes of whatever we have in the page cache and
>> using that to find candidates for the memcmp() test. This of course is
>> not a comprehensive solution, but (a) we combine it with offline dedupe
>> later and (b) we don't make a disk write out data that we've recently
>> read or written. Obviously you'd want to be able to opt-in to this sort
>> of thing with an inode flag or something.
>
> That's another kettle of fish, and will require an entirely different
> approach. ZFS has some experience doing that. While their implementation
> may reduce writes, it comes at the cost of storing hashes of every block
> in RAM.

Though your proposal is quite different from the ZFS thing, and might
actually be useful for a larger public, so forget I said anything about
it.
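The hash-to-find-candidates, memcmp-to-confirm idea quoted above can be shown with a toy index. `PageCacheIndex` is a hypothetical name for illustration only, nothing like the kernel's actual page cache; the point is that the hash only nominates a candidate, and the full byte compare is what authorizes sharing, so hash collisions can never corrupt data.

```python
import hashlib

class PageCacheIndex:
    """Toy candidate index over cached blocks: hash -> (inode, offset, data)."""

    def __init__(self):
        self._by_hash = {}

    def add(self, ino, offset, data):
        # Index a block that the fs already has in its cache.
        self._by_hash[hashlib.sha256(data).digest()] = (ino, offset, data)

    def find_duplicate(self, data):
        # Hash lookup finds a *candidate*; the equality check below is the
        # memcmp() step that confirms it before any dedupe happens.
        hit = self._by_hash.get(hashlib.sha256(data).digest())
        if hit is not None and hit[2] == data:
            return hit[0], hit[1]   # dedupe the write against this block
        return None                 # no confirmed duplicate: write normally
```

A real in-kernel version would have to cap the index size and drop entries as pages are evicted, which is where the ZFS-style RAM cost comes from.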
[PATCH] Fix a sign bug causing invalid memory access in the ino_paths ioctl.
To see the problem, create many hardlinks to the same file (120 should do
it), then look up paths by inode with:

    ls -i
    btrfs inspect inode-resolve -v $ino /mnt/btrfs

I noticed the memory layout of the fspath->val data had some
irregularities (some unnecessary gaps that stop appearing about halfway),
so I'm not sure there aren't any bugs left in it.
---
 fs/btrfs/backref.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 868cf5b..29d05c6 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1131,7 +1131,7 @@ char *btrfs_iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
 	int slot;
 	u64 next_inum;
 	int ret;
-	s64 bytes_left = size - 1;
+	s64 bytes_left = ((s64)size) - 1;
 	struct extent_buffer *eb = eb_in;
 	struct btrfs_key found_key;
 	int leave_spinning = path->leave_spinning;
--
1.7.12.117.gdc24c27
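The one-character cast matters because the subtraction happens in unsigned arithmetic before the value ever reaches the signed `bytes_left`. A minimal Python sketch of the same effect, assuming `size` is a 32-bit unsigned parameter as the kernel code suggests:

```python
U32_MASK = 0xFFFFFFFF  # simulate 32-bit unsigned wraparound

def bytes_left_buggy(size):
    # s64 bytes_left = size - 1;
    # The subtraction is done in u32 arithmetic, *then* widened to s64.
    return (size - 1) & U32_MASK

def bytes_left_fixed(size):
    # s64 bytes_left = ((s64)size) - 1;
    # Widen first, then subtract, so size == 0 yields -1 as intended.
    return size - 1

print(bytes_left_buggy(0))  # 4294967295: looks like a huge buffer remains
print(bytes_left_fixed(0))  # -1: correctly signals an exhausted buffer
```

With the buggy version, a zero-sized (or exhausted) buffer reports billions of bytes left, so later writes run past the allocation, which is exactly the invalid memory access the patch title describes.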
[PATCH] warn when skipping snapshots created with older kernels
Thanks, I fixed the objectid test. Apply with --scissors.

-- >8 --
Subject: [PATCH] btrfs send: warn when skipping snapshots created with older kernels.

This message is more explicit than "ERROR: could not resolve root_id",
the message that will be shown immediately before `btrfs send` bails.

Also skip invalid high OIDs, to prevent spurious warnings.

Signed-off-by: Gabriel de Perthuis <g2p.code+bt...@gmail.com>
---
 send-utils.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/send-utils.c b/send-utils.c
index a43d47e..03ca72a 100644
--- a/send-utils.c
+++ b/send-utils.c
@@ -224,13 +224,18 @@ int subvol_uuid_search_init(int mnt_fd, struct subvol_uuid_search *s)
 		if ((sh->objectid != 5 &&
 		     sh->objectid < BTRFS_FIRST_FREE_OBJECTID) ||
-		    sh->objectid == BTRFS_FREE_INO_OBJECTID)
+		    sh->objectid > BTRFS_LAST_FREE_OBJECTID)
 			goto skip;

 		if (sh->type == BTRFS_ROOT_ITEM_KEY) {
 			/* older kernels don't have uuids+times */
 			if (sh->len < sizeof(root_item)) {
 				root_item_valid = 0;
+				fprintf(stderr,
+					"Ignoring subvolume id %llu, "
+					"btrfs send needs snapshots "
+					"created with kernel 3.6+\n",
+					sh->objectid);
 				goto skip;
 			}
 			root_item_ptr = (struct btrfs_root_item *)
--
1.7.12.117.gdc24c27
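The fixed objectid test can be sketched in Python. The constant values below are assumptions mirroring the btrfs headers (the reserved high objectids such as BTRFS_FREE_INO_OBJECTID sit above BTRFS_LAST_FREE_OBJECTID, which is why the single `>` range check subsumes the old equality check):

```python
# Assumed values, mirroring the btrfs on-disk format headers.
BTRFS_FS_TREE_OBJECTID = 5
BTRFS_FIRST_FREE_OBJECTID = 256
BTRFS_LAST_FREE_OBJECTID = (1 << 64) - 256   # -256ULL in the kernel
BTRFS_FREE_INO_OBJECTID = (1 << 64) - 12     # -12ULL, above LAST_FREE

def should_skip(objectid):
    # Mirrors the patched condition: keep the fs tree (5) and anything in
    # [FIRST_FREE, LAST_FREE]; skip reserved low and high objectids.
    return ((objectid != BTRFS_FS_TREE_OBJECTID and
             objectid < BTRFS_FIRST_FREE_OBJECTID) or
            objectid > BTRFS_LAST_FREE_OBJECTID)
```

Because BTRFS_FREE_INO_OBJECTID is greater than BTRFS_LAST_FREE_OBJECTID, the range check catches it along with every other reserved high OID, so no separate equality test is needed.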
[PATCH] Warn when skipping snapshots created with older kernels.
This message is more explicit than "ERROR: could not resolve root_id",
the message that will be shown immediately before `btrfs send` bails.

Also skip invalid high OIDs.

Signed-off-by: Gabriel de Perthuis <g2p.code+bt...@gmail.com>
---
 send-utils.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/send-utils.c b/send-utils.c
index a43d47e..386aeb3 100644
--- a/send-utils.c
+++ b/send-utils.c
@@ -224,6 +224,7 @@ int subvol_uuid_search_init(int mnt_fd, struct subvol_uuid_search *s)
 		if ((sh->objectid != 5 &&
 		     sh->objectid < BTRFS_FIRST_FREE_OBJECTID) ||
+		    sh->objectid >= BTRFS_LAST_FREE_OBJECTID ||
 		    sh->objectid == BTRFS_FREE_INO_OBJECTID)
 			goto skip;

@@ -231,6 +232,11 @@ int subvol_uuid_search_init(int mnt_fd, struct subvol_uuid_search *s)
 			/* older kernels don't have uuids+times */
 			if (sh->len < sizeof(root_item)) {
 				root_item_valid = 0;
+				fprintf(stderr,
+					"Ignoring subvolume id %llu, "
+					"btrfs send needs snapshots "
+					"created with kernel 3.6+\n",
+					sh->objectid);
 				goto skip;
 			}
 			root_item_ptr = (struct btrfs_root_item *)
--
1.7.12.117.gdc24c27
[PATCH] bcp: fix off-by-one errors in path handling
This fixes a bug which causes the first character of each filename in the
destination to be omitted.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munte...@linux360.ro>
---
 bcp |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bcp b/bcp
index 5729e91..c6b4bef 100755
--- a/bcp
+++ b/bcp
@@ -137,7 +137,7 @@ for srci in xrange(0, src_args):
         statinfo = os.lstat(srcname)

         if srcname.startswith(src):
-            part = srcname[len(src) + 1:]
+            part = srcname[len(src):]

         if stat.S_ISLNK(statinfo.st_mode):
             copylink(srcname, dst, part, statinfo, None)
@@ -153,7 +153,7 @@ for srci in xrange(0, src_args):
         for f in filenames:
             srcname = os.path.join(dirpath, f)
             if srcname.startswith(src):
-                part = srcname[len(src) + 1:]
+                part = srcname[len(src):]

             statinfo = os.lstat(srcname)
             copyfile(srcname, dst, part, statinfo, None)
--
1.6.4.4
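The off-by-one is easy to see in isolation. A minimal sketch, assuming the source prefix ends with a path separator (if it did not, the fixed slice would instead keep a leading slash in the relative name):

```python
import os

src = "/mnt/src/"                         # assumed: prefix ends with "/"
srcname = os.path.join(src, "file.txt")   # "/mnt/src/file.txt"

# Buggy: skips one character *past* the prefix, eating the first
# letter of the relative name ("ile.txt" instead of "file.txt").
buggy = srcname[len(src) + 1:]

# Fixed: slice exactly at the end of the prefix.
fixed = srcname[len(src):]

print(buggy)  # ile.txt
print(fixed)  # file.txt
```

The `+ 1` would only be correct if `src` were guaranteed not to end with a separator; with a trailing slash already present, it double-counts the separator, hence the truncated filenames the patch describes.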