Re: how many chunk trees and extent trees present
On 04/18/2015 01:29 AM, David Sterba wrote:
> On Fri, Apr 17, 2015 at 09:19:11AM +, Hugo Mills wrote:
> > > In some article I read that in the future there will be more chunk
> > > trees / extent trees for a single btrfs. Is this true?
> >
> > I recall, many moons ago, Chris saying that there probably wouldn't be.
>
> More extent trees tied to a set of fs trees/subvolumes would be very
> useful for certain usecases *cough*encryption*cough*.

I didn't understand in full what the idea is here, but let me try:
would it not defeat the purpose of encryption, which is not to let the
disk have the un-encrypted data? Looks like I am missing something here.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Experimental btrfs encryption
> The way btrfs encryption interacts with the keyring APIs is important,
> and "reconsidering later" will potentially represent an API/ABI break.
> Please keep it in mind.

Yep. We would take considerable time to get the API frozen and
integrated, as once it's in, it's there forever. So it carries warnings
as experimental / RFC.

Thanks, Anand
[PATCH] Show a warning message if one of highest objectid reaches its max value
- It's better to show a warning message for the exceptional case that one
  of the highest objectids (in most cases, an inode number) reaches its
  max value, BTRFS_LAST_FREE_OBJECTID. Show this message only once to
  avoid filling dmesg with it.
- EOVERFLOW is a more appropriate return value for this case. ENOSPC is
  for the "No space left on device" case, and an objectid isn't related
  to any device.

Signed-off-by: Satoru Takeuchi
---
This patch can be applied to v4.5-rc6
---
 fs/btrfs/inode-map.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index e50316c..a4860fd 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -555,8 +555,10 @@ int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
 	int ret;
 
 	mutex_lock(&root->objectid_mutex);
-	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
-		ret = -ENOSPC;
+	if (WARN_ONCE(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID,
+		      "BTRFS: The highest objectid reaches its max value %llu.\n",
+		      BTRFS_LAST_FREE_OBJECTID)) {
+		ret = -EOVERFLOW;
 		goto out;
 	}
-- 
2.5.0
Re: Byte comparison before hashing
Nikolai Viktorovich wrote on 2016/03/03 20:50 -0300:
> First of all: Thanks for the great work on btrfs!
>
> How about taking only a few bytes of data from each block and comparing
> those, before attempting to hash? This way you could quickly discard
> blocks with really no similar data and hash only the ones that matched
> the "first look" byte comparison? This could lead to less CPU
> consumption, since hashing is an intense operation, and to a smaller
> memory footprint, since you don't need to store all those non-dedupable
> hashes.

I assume you're talking about in-band dedup. For that case, byte-by-byte
comparison is on our planned feature list, to allow using a faster hash
algorithm that has more collisions.

The reasons we don't do it at the beginning (and won't any time soon)
are:

1) Lack of a facility to read a page without an inode
   Reading a page inside one inode is easy, but without an inode --
   maybe I'm missing something, but at least I didn't find a proper and
   easy facility to do it now.

2) Delayed ref hell
   The current implementation already spends a lot of code handling
   possible delayed ref problems. If we do byte-by-byte comparison, we
   need to handle delayed refs for both extents (the one we read to
   compare and the one we are writing back).
   I'd prefer to do it only after we have a better way to deal with
   delayed refs, e.g. only permitting run_delayed_ref() inside
   commit_trans().

3) Performance
   If a hash algorithm needs byte-by-byte comparison, then not just the
   beginning but the whole range must be read out and compared. Some
   hashes -- IIRC md5 -- are easy to collide: an attacker can easily
   build two blocks which have the same md5 and the same first several
   bytes. Because of this, performance may be impacted in the hash-hit
   case (depending on memory pressure).

So we may eventually add such a feature to in-band dedup, but not now,
and not anytime soon.
Thanks, Qu
Re: Stray 4k extents with slow buffered writes
On Thu, Mar 03, 2016 at 02:13:09PM -0800, Liu Bo wrote: > On Thu, Mar 03, 2016 at 10:50:58PM +0100, Holger Hoffstätte wrote: > > On 03/03/16 21:47, Austin S. Hemmelgarn wrote: > > >> $mount | grep sdf > > >> /dev/sdf1 on /mnt/usb type btrfs > > >> (rw,relatime,space_cache=v2,subvolid=5,subvol=/) > > > Do you still see the same behavior with the old space_cache format? > > > This appears to be an issue of space management and allocation, so > > > this may be playing a part. > > > > I just did the clear_cache,space_cache=v1 dance. Now a download with > > bandwidth-limit=1M, dirty_expire=20s, commit=30 and *no* autodefrag > > first ended up looking like this: > > > > $filefrag -ek linux-4.5-rc6.tar.xz > > Filesystem type is: 9123683e > > File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes) > > ext: logical_offset:physical_offset: length: expected: flags: > >0:0..7427: 227197920.. 227205347: 7428: > >1: 7428.. 33027: 227205348.. 227230947: 25600: > >2:33028.. 53011: 227271164.. 227291147: 19984: 227230948: > >3:53012.. 72995: 227291148.. 227311131: 19984: > >4:72996.. 86291: 227311132.. 227324427: 13296: > > last,eof > > linux-4.5-rc6.tar.xz: 2 extents found > > > > Yay! But wait, there's more! > > > > $sync > > $filefrag -ek linux-4.5-rc6.tar.xz > > Filesystem type is: 9123683e > > File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes) > > ext: logical_offset:physical_offset: length: expected: flags: > >0:0..7423: 227197920.. 227205343: 7424: > >1: 7424..7427: 227169600.. 227169603: 4: 227205344: > >2: 7428.. 33023: 227205348.. 227230943: 25596: 227169604: > >3:33024.. 33027: 227169604.. 227169607: 4: 227230944: > >4:33028.. 53007: 227271164.. 227291143: 19980: 227169608: > >5:53008.. 53011: 227230948.. 227230951: 4: 227291144: > >6:53012.. 72991: 227291148.. 227311127: 19980: 227230952: > >7:72992.. 72995: 227230952.. 227230955: 4: 227311128: > >8:72996.. 86291: 227311132.. 
227324427: 13296: 227230956: > > last,eof > > linux-4.5-rc6.tar.xz: 9 extents found > > > > Now I'm like ¯\(ツ)/¯ > > Yeah, after sync, I also get this file layout.

OK... I think I've found why we get this weird layout: it's because btrfs
applies COW for overwrites while ext4 just updates them in place.

Here is my filefrag output after sync,

# !filefrag
filefrag -vb /mnt/btrfs/linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of /mnt/btrfs/linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
 ext logical physical expected length flags
   0       0    12352             5020
   1    5020    17376    17372        4
   2    5024   133504    17380    30908
   3   35932   195296   164412        4
   4   35936   164416   195300    30876
   5   66812   195300   195292    19480 eof
/mnt/btrfs/linux-4.5-rc6.tar.xz: 6 extents found

And the output of btrfs_dirty_pages; I grep for the first 4k single extent,

# trace-cmd report -i /tmp/trace.dat | grep "dirty_page" | grep $((5020 << 10)) -A 2 -B 2
wget-29482 [003] 783746.039682: bprint: btrfs_dirty_pages: page start 5124096 end 5132287
wget-29482 [003] 783746.039771: bprint: btrfs_dirty_pages: page start 5128192 end 5144575
wget-29482 [003] 783746.263238: bprint: btrfs_dirty_pages: page start 5140480 end 5148671
wget-29482 [003] 783746.263304: bprint: btrfs_dirty_pages: page start 5144576 end 5160959
wget-29482 [003] 783746.263546: bprint: btrfs_dirty_pages: page start 5156864 end 5165055

So it turns out that wget writes the data in an overlapping way: extent
[5140480, 4096) is written twice, and the second write to the extent can
trigger a COW write when the first write to the extent has finished its
endio.
With mount -o nodatacow,

# !filefrag
filefrag -vb /mnt/btrfs/linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of /mnt/btrfs/linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
 ext logical physical expected length flags
   0       0    12416            5292
   1    5292   133504    17708   35872
   2   41164   169376           30880
   3   72044   200256           14248 eof
/mnt/btrfs/linux-4.5-rc6.tar.xz: 2 extents found

Anyway, it's not due to any btrfs allocator bug (although I was thinking
it was and trying to find it out...).

Thanks,
-liubo

> With autodefrag the same happens, though it then eventually does the
> merging from 4k -> 256k. I went searching for that hardcoded 256k value
> and found it as default in ioctl.c:btrfs_defrag_file() when no threshold
> has been passed, as is the case for autodefrag. I'll try to increase that
>
Re: btrfs-progs: btrfs convert v2
Vytautas D wrote on 2016/03/03 15:33 +:
> Hi Qu, thanks for your work on btrfs convert.
>
> Does your btrfs convert rework (namely the patches at
> https://www.spinics.net/lists/linux-btrfs/msg49719.html) change any
> caveats described here ->
> https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3#Before_first_use

No caveats are changed. The only visible change is that, after convert,
metadata is all in the SINGLE profile rather than DUP.

The rework is aimed at fixing the chunk type mismatch problem. (A lot of
code just for a workaround-able problem...)

> i.e. do files larger than 1 GB still need a defrag, after which
> rollback is not possible?

Not sure why 1GB files need defrag. I assume it's related to the ext*
block group size: the default size is 128M for each block group, and
since each block group has metadata at its beginning, large files get
split into 120+M parts.

Thanks, Qu

Thanks, Vytas
Re: problems mounting subvolumes using nfs
On Fri, Mar 04, 2016 at 12:16:04AM +, niya levi wrote:
> i have a luks encrypted btrfs fileserver and have created this subvolume
> structure on the server for my samba shares
>
> samba                      top level
> ---|home                   subvolume
> ---|---|gen                subvolume
> ---|---|---|s4user         subvolume
> ---|profiles               subvolume
> ---|print_driver           subvolume
>
> these are my export configurations
>
> cat /etc/exports
> # /etc/exports - exports(5) - directories exported to NFS clients
> #
> /samba/home 192.168.1.0/24(rw,no_subtree_check,no_root_squash,sync,sec=none:sys:krb5:krb5i:krb5p,insecure)
> /samba/profiles 192.168.1.0/24(rw,no_subtree_check,no_root_squash,sync,sec=none:sys:krb5:krb5i:krb5p,insecure)
> /samba/Printer_drivers 192.168.1.0/24(rw,no_subtree_check,no_root_squash,sync,sec=none:sys:krb5:krb5i:krb5p,insecure)
>
> cat /etc/fstab
> # /etc/fstab: static file system information
> #
> # encrypted samba home partition
> /dev/mapper/samba_crypt  /samba  btrfs  defaults  0 0
>
> # mount
> /dev/mapper/samba_crypt on /samba type btrfs (rw,relatime,space_cache)
>
> # exportfs
> /samba/home            192.168.1.0/24
> /samba/profiles        192.168.1.0/24
> /samba/Printer_drivers 192.168.1.0/24

I don't know if this is related to your problem, but one thing you need
to do here is specify different fsid fields for each exported subvol on
the same FS. If you don't, NFS can't tell which subvol was intended, and
will export the same subvol for each one. So:

/samba/home            -fsid=0x1729  192.168.1.0/24
/samba/profiles        -fsid=0x172a  192.168.1.0/24
/samba/Printer_drivers -fsid=0x172b  192.168.1.0/24

I suspect that you may not be able to export nested subvols, either
(which is probably your conclusion below). I haven't actually tested
that theory, though. (I don't tend to use nested subvols at all; they're
more trouble than they're worth in most cases, in my experience.)
I would suggest mounting, or bind-mounting, to /samba/$whatever every
individual subvol that you want to be able to export, and putting in a
separate export entry for it in /etc/exports.

Hugo.

> # btrfs subvolume list /samba
> ID 257 gen 266 top level 5 path Printer_drivers
> ID 258 gen 292 top level 5 path home
> ID 259 gen 291 top level 5 path profiles
> ID 261 gen 295 top level 258 path home/.snapshots
> ID 262 gen 28 top level 261 path home/.snapshots/1/snapshot
> ID 267 gen 262 top level 258 path home/gen
> ID 268 gen 293 top level 267 path home/gen/s4user
> ID 275 gen 61 top level 261 path home/.snapshots/12/snapshot
> ID 278 gen 295 top level 268 path home/gen/s4user/.snapshots
> ID 280 gen 72 top level 278 path home/gen/s4user/.snapshots/1/snapshot
> ID 321 gen 143 top level 261 path home/.snapshots/36/snapshot
> ID 322 gen 144 top level 278 path home/gen/s4user/.snapshots/22/snapshot
>
> on the server using autofs
> i can mount /samba/profile and /samba/print_driver
> i can also mount /samba/home and enter the gen and gen/s4user subvolumes
> i cannot mount /samba/home/gen or /samba/home/gen/s4user
> from what i can gather the problem is to do with the subvolume layout of
> /samba/home/gen and /samba/home/gen/s4user or the export of /samba/home
> or with the automount specification in auto.master or auto.samba_home
> any pointers to getting the mounts to work would be appreciated
>
> # uname -a
> Linux tardis 4.1.18-1-ARCH #1 SMP Sat Feb 20 17:48:11 MST 2016 armv7l GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.4
>
> # btrfs fi show
> Label: 'smbhome'  uuid: 1cec48c5-a2c5-490f-8e9b-63f325721169
> Total devices 1 FS bytes used 1.39MiB
> devid 1 size 149.05GiB used 2.02GiB path /dev/mapper/samba_crypt
>
> # btrfs fi df /samba
> Data, single: total=8.00MiB, used=256.00KiB
> System, DUP: total=8.00MiB, used=16.00KiB
> Metadata, DUP: total=1.00GiB, used=1.12MiB
> GlobalReserve, single: total=16.00MiB, used=0.00B
>
> cat auto.master
> /samba/home
/etc/autofs/auto.samba_home
> /samba/profiles /etc/autofs/auto_profiles
> /- /etc/autofs/auto.prindrvs
> +dir:/etc/autofs/auto.master.d
> +auto.master
>
> # cat auto_profiles
> * -fstype=nfs,rw,nosuid,hard,sec=sys,vers=3 my.server.my.domain:/samba/profiles/&
>
> # cat auto.prindrvs
> /samba/Printer_drivers -fstype=nfs,rw,nosuid,hard,sec=sys,vers=3 my.server.my.domain:/samba/Printer_drivers
>
> # cat auto.samba_home
> * -fstype=nfs,rw,nosuid,hard,vers=3,sec=sys my.server.my.domain:/samba/home/&
>
> # ls -al /samba/home/gen/s4user
> ls: cannot access /samba/home/gen/s4user: No such file or directory
>
> # journalctl -f -u autofs
> handle_packet: type = 3
> handle_packet_missing_indirect: token 53, name gen, request pid 22842
> attempting to mount entry /samba/home/gen
> lookup_mount: lookup(file): looking up gen
> lookup_mount: lookup(file): gen ->
> -fstype=nfs,rw,nosuid,hard,vers=3,sec=sys
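Concretely, Hugo's mount-each-subvol-and-export-it-separately suggestion might look something like the fragment below. The paths follow the poster's layout, but the fsid values are arbitrary examples (any small number unique per export works; avoid fsid=0, which NFS reserves for the export root), and the mount/export option details should be adapted to the original configuration:

```
# /etc/fstab -- mount each subvolume to be exported at its own path
/dev/mapper/samba_crypt  /samba/home/gen         btrfs  subvol=home/gen         0 0
/dev/mapper/samba_crypt  /samba/home/gen/s4user  btrfs  subvol=home/gen/s4user  0 0

# /etc/exports -- one entry per subvolume, each with its own fsid
/samba/home             192.168.1.0/24(rw,no_subtree_check,fsid=101)
/samba/home/gen         192.168.1.0/24(rw,no_subtree_check,fsid=102)
/samba/home/gen/s4user  192.168.1.0/24(rw,no_subtree_check,fsid=103)
```

With each nested subvol mounted and exported under its own fsid, the autofs maps can then point at the per-subvol exports instead of paths inside /samba/home.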
problems mounting subvolumes using nfs
i have a luks encrypted btrfs fileserver and have created this subvolume
structure on the server for my samba shares

samba                      top level
---|home                   subvolume
---|---|gen                subvolume
---|---|---|s4user         subvolume
---|profiles               subvolume
---|print_driver           subvolume

these are my export configurations

cat /etc/exports
# /etc/exports - exports(5) - directories exported to NFS clients
#
/samba/home 192.168.1.0/24(rw,no_subtree_check,no_root_squash,sync,sec=none:sys:krb5:krb5i:krb5p,insecure)
/samba/profiles 192.168.1.0/24(rw,no_subtree_check,no_root_squash,sync,sec=none:sys:krb5:krb5i:krb5p,insecure)
/samba/Printer_drivers 192.168.1.0/24(rw,no_subtree_check,no_root_squash,sync,sec=none:sys:krb5:krb5i:krb5p,insecure)

cat /etc/fstab
# /etc/fstab: static file system information
#
# encrypted samba home partition
/dev/mapper/samba_crypt  /samba  btrfs  defaults  0 0

# mount
/dev/mapper/samba_crypt on /samba type btrfs (rw,relatime,space_cache)

# exportfs
/samba/home            192.168.1.0/24
/samba/profiles        192.168.1.0/24
/samba/Printer_drivers 192.168.1.0/24

# btrfs subvolume list /samba
ID 257 gen 266 top level 5 path Printer_drivers
ID 258 gen 292 top level 5 path home
ID 259 gen 291 top level 5 path profiles
ID 261 gen 295 top level 258 path home/.snapshots
ID 262 gen 28 top level 261 path home/.snapshots/1/snapshot
ID 267 gen 262 top level 258 path home/gen
ID 268 gen 293 top level 267 path home/gen/s4user
ID 275 gen 61 top level 261 path home/.snapshots/12/snapshot
ID 278 gen 295 top level 268 path home/gen/s4user/.snapshots
ID 280 gen 72 top level 278 path home/gen/s4user/.snapshots/1/snapshot
ID 321 gen 143 top level 261 path home/.snapshots/36/snapshot
ID 322 gen 144 top level 278 path home/gen/s4user/.snapshots/22/snapshot

on the server using autofs
i can mount /samba/profile and /samba/print_driver
i can also mount /samba/home and enter the gen and gen/s4user subvolumes
i cannot mount /samba/home/gen or /samba/home/gen/s4user
from what i can gather the problem is to do with the subvolume
layout of /samba/home/gen and /samba/home/gen/s4user or the export of
/samba/home or with the automount specification in auto.master or
auto.samba_home
any pointers to getting the mounts to work would be appreciated

# uname -a
Linux tardis 4.1.18-1-ARCH #1 SMP Sat Feb 20 17:48:11 MST 2016 armv7l GNU/Linux

# btrfs --version
btrfs-progs v4.4

# btrfs fi show
Label: 'smbhome'  uuid: 1cec48c5-a2c5-490f-8e9b-63f325721169
Total devices 1 FS bytes used 1.39MiB
devid 1 size 149.05GiB used 2.02GiB path /dev/mapper/samba_crypt

# btrfs fi df /samba
Data, single: total=8.00MiB, used=256.00KiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

cat auto.master
/samba/home      /etc/autofs/auto.samba_home
/samba/profiles  /etc/autofs/auto_profiles
/-               /etc/autofs/auto.prindrvs
+dir:/etc/autofs/auto.master.d
+auto.master

# cat auto_profiles
* -fstype=nfs,rw,nosuid,hard,sec=sys,vers=3 my.server.my.domain:/samba/profiles/&

# cat auto.prindrvs
/samba/Printer_drivers -fstype=nfs,rw,nosuid,hard,sec=sys,vers=3 my.server.my.domain:/samba/Printer_drivers

# cat auto.samba_home
* -fstype=nfs,rw,nosuid,hard,vers=3,sec=sys my.server.my.domain:/samba/home/&

# ls -al /samba/home/gen/s4user
ls: cannot access /samba/home/gen/s4user: No such file or directory

# journalctl -f -u autofs
handle_packet: type = 3
handle_packet_missing_indirect: token 53, name gen, request pid 22842
attempting to mount entry /samba/home/gen
lookup_mount: lookup(file): looking up gen
lookup_mount: lookup(file): gen -> -fstype=nfs,rw,nosuid,hard,vers=3,sec=sys my.server.my.domain:/samba/home/&
parse_mount: parse(sun): expanded entry: -fstype=nfs,rw,nosuid,hard,vers=3,sec=sys my.server.my.domain:/samba/home/gen
parse_mount: parse(sun): gathered options: fstype=nfs,rw,nosuid,hard,vers=3,sec=sys
parse_mount: parse(sun): dequote("my.server.my.domain:/samba/home/gen") -> my.server.my.domain:/samba/home/gen
parse_mount: parse(sun): core of entry:
options=fstype=nfs,rw,nosuid,hard,vers=3,sec=sys, loc=my.server.my.domain:/samba/home/gen
sun_mount: parse(sun): mounting root /samba/home, mountpoint gen, what my.server.my.domain:/samba/home/gen, fstype nfs, options rw,nosuid,hard,vers=3,sec=sys
mount_mount: mount(nfs): root=/samba/home name=gen what=my.server.my.domain:/samba/home/gen, fstype=nfs, options=rw,nosuid,hard,vers=3,sec=sys
mount_mount: mount(nfs): nfs options="rw,nosuid,hard,vers=3,sec=sys", nobind=0, nosymlink=0, ro=0
mount_mount: mount(nfs): calling mkdir_path /samba/home/gen
mount_mount: mount(nfs): calling mount -t nfs -s -o rw,nosuid,hard,vers=3,sec=sys my.server.my.domain:/samba/home/gen /samba/home/gen
>> mount.nfs: access denied by server while mounting my.server.my.domain:/samba/home/gen
mount(nfs): nfs: mount
Byte comparison before hashing
First of all: Thanks for the great work on btrfs!

How about taking only a few bytes of data from each block and comparing
those, before attempting to hash? This way you could quickly discard
blocks with really no similar data and hash only the ones that matched
the "first look" byte comparison? This could lead to less CPU
consumption, since hashing is an intense operation, and to a smaller
memory footprint, since you don't need to store all those non-dedupable
hashes.
[btrfs-progs suspected BUG] Scrub info with Ubuntu-style subvol-as-root setup
Hi,

Setup: btrfs root, but Ubuntu style: /@/ with root=@ as a kernel boot
option. Been using it happily for years; not even sure if Ubuntu still
uses this scheme for btrfs installs. The btrfs fs is a partition on my
GPT HDD. My fstab contains subvol=@ as an option on the btrfs line, and
adding/subtracting compress,autodefrag makes no difference to this.

From Linux 4.4, I think, "btrfs scrub status" reports:

"scrub status for fdd6a335-6edf---102cc8f5
no stats available
total bytes scrubbed: 0.00B with 0 errors"

at all times, including after finishing. I think it must be to do with
the subvol mounted as root; pretty sure that's the cause. The scrub
appears to be doing something, but no info is available. I haven't
investigated much; it's not a show-stopper. I have waited until the
4.4.1 btrfs-progs release before reporting, JIC.

Just to be clear: I've been using this setup for installs on everything
from Debian, Arch, Exherbo, CentOS, CoreOS (!), Gentoo, Funtoo ..
without problems. I've never actually used Ubuntu - knowingly! Is there
a heads-up about using this subvol-as-root setup I've missed? Does
anyone else use it? It's handy for rollbacks.

Thank you all.
Re: Stray 4k extents with slow buffered writes
On Thu, Mar 03, 2016 at 10:50:58PM +0100, Holger Hoffstätte wrote: > On 03/03/16 21:47, Austin S. Hemmelgarn wrote: > >> $mount | grep sdf > >> /dev/sdf1 on /mnt/usb type btrfs > >> (rw,relatime,space_cache=v2,subvolid=5,subvol=/) > > Do you still see the same behavior with the old space_cache format? > > This appears to be an issue of space management and allocation, so > > this may be playing a part. > > I just did the clear_cache,space_cache=v1 dance. Now a download with > bandwidth-limit=1M, dirty_expire=20s, commit=30 and *no* autodefrag > first ended up looking like this: > > $filefrag -ek linux-4.5-rc6.tar.xz > Filesystem type is: 9123683e > File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes) > ext: logical_offset:physical_offset: length: expected: flags: >0:0..7427: 227197920.. 227205347: 7428: >1: 7428.. 33027: 227205348.. 227230947: 25600: >2:33028.. 53011: 227271164.. 227291147: 19984: 227230948: >3:53012.. 72995: 227291148.. 227311131: 19984: >4:72996.. 86291: 227311132.. 227324427: 13296: last,eof > linux-4.5-rc6.tar.xz: 2 extents found > > Yay! But wait, there's more! > > $sync > $filefrag -ek linux-4.5-rc6.tar.xz > Filesystem type is: 9123683e > File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes) > ext: logical_offset:physical_offset: length: expected: flags: >0:0..7423: 227197920.. 227205343: 7424: >1: 7424..7427: 227169600.. 227169603: 4: 227205344: >2: 7428.. 33023: 227205348.. 227230943: 25596: 227169604: >3:33024.. 33027: 227169604.. 227169607: 4: 227230944: >4:33028.. 53007: 227271164.. 227291143: 19980: 227169608: >5:53008.. 53011: 227230948.. 227230951: 4: 227291144: >6:53012.. 72991: 227291148.. 227311127: 19980: 227230952: >7:72992.. 72995: 227230952.. 227230955: 4: 227311128: >8:72996.. 86291: 227311132.. 227324427: 13296: 227230956: last,eof > linux-4.5-rc6.tar.xz: 9 extents found > > Now I'm like ¯\(ツ)/¯ Yeah, after sync, I also get this file layout. 
> With autodefrag the same happens, though it then eventually does the
> merging from 4k -> 256k. I went searching for that hardcoded 256k value
> and found it as default in ioctl.c:btrfs_defrag_file() when no threshold
> has been passed, as is the case for autodefrag. I'll try to increase that
> and see how much I can destroy.
>
> Also, rsync with --bwlimit=1m does _not_ seem to create files like this:
>
> $rsync (..)
> $filefrag -ek linux-4.4.4.tar.bz2
> Filesystem type is: 9123683e
> File size of linux-4.4.4.tar.bz2 is 105008928 (102548 blocks of 1024 bytes)
>  ext: logical_offset: physical_offset: length: expected: flags:
>    0:     0..   4095: 227197920.. 227202015:  4096:
>    1:  4096..  25599: 227202016.. 227223519: 21504:
>    2: 25600..  51199: 227271164.. 227296763: 25600: 227223520:
>    3: 51200..  76799: 227296764.. 227322363: 25600:
>    4: 76800.. 102547: 227322364.. 227348111: 25748: last,eof
> linux-4.4.4.tar.bz2: 2 extents found
>
> Which looks exactly as one would expect, probably - as Chris' mail
> just explained - it doesn't use O_APPEND, whereas wget apparently does.

Interesting, my strace log shows wget doesn't open the file with O_APPEND.

open("linux-4.5-rc6.tar.xz", O_WRONLY|O_CREAT|O_EXCL, 0666) = 4

Thanks,
-liubo

> > I'd be somewhat curious to see if something similar happens on other
> > filesystems with such low writeback timeouts. My thought in this
> > case is that the issue is that BTRFS's allocator isn't smart enough
> > to try and merge new extents into existing ones when possible.
>
> ext4 creates 1-2 extents, regardless of method.
>
> Holger
Re: Stray 4k extents with slow buffered writes
On 03/03/16 21:47, Austin S. Hemmelgarn wrote:
>> $mount | grep sdf
>> /dev/sdf1 on /mnt/usb type btrfs
>> (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
> Do you still see the same behavior with the old space_cache format?
> This appears to be an issue of space management and allocation, so
> this may be playing a part.

I just did the clear_cache,space_cache=v1 dance. Now a download with
bandwidth-limit=1M, dirty_expire=20s, commit=30 and *no* autodefrag
first ended up looking like this:

$filefrag -ek linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:     0..  7427: 227197920.. 227205347:  7428:
   1:  7428.. 33027: 227205348.. 227230947: 25600:
   2: 33028.. 53011: 227271164.. 227291147: 19984: 227230948:
   3: 53012.. 72995: 227291148.. 227311131: 19984:
   4: 72996.. 86291: 227311132.. 227324427: 13296: last,eof
linux-4.5-rc6.tar.xz: 2 extents found

Yay! But wait, there's more!

$sync
$filefrag -ek linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:     0..  7423: 227197920.. 227205343:  7424:
   1:  7424..  7427: 227169600.. 227169603:     4: 227205344:
   2:  7428.. 33023: 227205348.. 227230943: 25596: 227169604:
   3: 33024.. 33027: 227169604.. 227169607:     4: 227230944:
   4: 33028.. 53007: 227271164.. 227291143: 19980: 227169608:
   5: 53008.. 53011: 227230948.. 227230951:     4: 227291144:
   6: 53012.. 72991: 227291148.. 227311127: 19980: 227230952:
   7: 72992.. 72995: 227230952.. 227230955:     4: 227311128:
   8: 72996.. 86291: 227311132.. 227324427: 13296: 227230956: last,eof
linux-4.5-rc6.tar.xz: 9 extents found

Now I'm like ¯\(ツ)/¯

With autodefrag the same happens, though it then eventually does the
merging from 4k -> 256k.
I went searching for that hardcoded 256k value and found it as default
in ioctl.c:btrfs_defrag_file() when no threshold has been passed, as is
the case for autodefrag. I'll try to increase that and see how much I
can destroy.

Also, rsync with --bwlimit=1m does _not_ seem to create files like this:

$rsync (..)
$filefrag -ek linux-4.4.4.tar.bz2
Filesystem type is: 9123683e
File size of linux-4.4.4.tar.bz2 is 105008928 (102548 blocks of 1024 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:     0..   4095: 227197920.. 227202015:  4096:
   1:  4096..  25599: 227202016.. 227223519: 21504:
   2: 25600..  51199: 227271164.. 227296763: 25600: 227223520:
   3: 51200..  76799: 227296764.. 227322363: 25600:
   4: 76800.. 102547: 227322364.. 227348111: 25748: last,eof
linux-4.4.4.tar.bz2: 2 extents found

Which looks exactly as one would expect; probably - as Chris' mail just
explained - it doesn't use O_APPEND, whereas wget apparently does.

> I'd be somewhat curious to see if something similar happens on other
> filesystems with such low writeback timeouts. My thought in this
> case is that the issue is that BTRFS's allocator isn't smart enough
> to try and merge new extents into existing ones when possible.

ext4 creates 1-2 extents, regardless of method.

Holger
Re: incomplete conversion to RAID1?
Hi Duncan,

> Of course either way assumes you don't run into some bug that will
> prevent removal of that chunk, perhaps exactly the same one that kept it
> from being removed during the normal raid1 conversion. If that happens,
> the devs may well be interested in tracking it down, as I'm not aware of
> anything similar being posted to the list.

I've made up-to-date backups of this volume. Is one of these two methods
more likely to trigger a potential bug? Also, this potential bug, if it's
not just cosmetic, wouldn't silently corrupt something in my pool, right?
It's when things won't fail loudly and immediately that concerns me, but
if that's not an issue then I'd prefer to try to gather potentially
useful data.

Thanks again for such a great, and super informative, reply. I've been
swamped with work so haven't finished replying to your last one (Re:
btrfs-progs 4.4 with linux-3.16.7 (with truncation of extents patch),
Fri, 05 Feb 2016 21:58:26 -0800). To briefly reply: over the last 3.5
years I've spent countless hours reading everything I could find on
btrfs and zfs, and I chose to start testing btrfs in the fall of 2015.
Currently I'm working on a major update of the Debian wiki btrfs page, I
plan to package kdave's btrfsmaintenance scripts, and additionally to
publish some convenience scripts I use to make staying up-to-date with
one's preferred LTS kernel a two-command affair.

One thing I'd like to see on btrfs.wiki.kernel.org is an "at a glance"
table of btrfs features ranked by riskiness. Say:

1) Safest configuration; keep backups, as always, just in case.
2) Features that might cause issues or that only occasionally trigger
   issues.
3) Still very experimental; only people who intend to help with
   development and debugging should use these.
4) Risk of corrupted data; your backups are useless.

The benefit is that all distributions' wikis could then point to this
table.
I've read OpenSuSE has patches to disable features in at least 3) and
4), and maybe in 2), so maybe it wouldn't be useful for them... but for
everyone else... :-)

Also, I think it would be neat to have a list of subtle bugs that could
benefit from more people trying to find them, and also a list of stuff
to test that would provide the data necessary to help fix the "btrfs
pools need to be babysat" issues I've read so often about. I'm not
really able to understand anything more complex than a simple utility
program, so the most I can help out with is writing reports,
documentation, packaging, and some distribution integration stuff.

I'll send more questions in our other thread wrt updating the Debian
wiki next week. It will be a bunch of stuff like "Does btrfs send > to a
file count as a backup as of linux-4.4.x, or should you still be using
another method?"

Kind regards,
Nicholas

On 3 March 2016 at 00:53, Duncan <1i5t5.dun...@cox.net> wrote:
> Nicholas D Steeves posted on Wed, 02 Mar 2016 20:25:46 -0500 as excerpted:
>
>> btrfs fi show
>> Label: none  uuid: 2757c0b7-daf1-41a5-860b-9e4bc36417d3
>> Total devices 2 FS bytes used 882.28GiB
>> devid 1 size 926.66GiB used 886.03GiB path /dev/sdb1
>> devid 2 size 926.66GiB used 887.03GiB path /dev/sdc1
>>
>> But this is what's troubling:
>>
>> btrfs fi df /.btrfs-admin/
>> Data, RAID1: total=882.00GiB, used=880.87GiB
>> Data, single: total=1.00GiB, used=0.00B
>> System, RAID1: total=32.00MiB, used=160.00KiB
>> Metadata, RAID1: total=4.00GiB, used=1.41GiB
>> GlobalReserve, single: total=496.00MiB, used=0.00B
>>
>> Do I still have 1.00GiB that isn't in RAID1?
>
> You have a 1 GiB empty data chunk still in single mode, explaining both
> the extra line in btrfs fi df, and the 1 GiB discrepancy between the two
> device usage values in btrfs fi show.
> > It's empty, so it contains no data or metadata, and is thus more a > "cosmetic oddity" than a real problem, but wanting to be rid of it is > entirely understandable, and I'd want it gone as well. =:^) > > Happily, it should be easy enough to get rid of using balance filters. > There are at least two such filters that should do it, so take your > pick. =:^) > > btrfs balance start -dusage=0 > > This is the one I normally use. -d is of course for data chunks. usage=N > says only balance chunks with less than or equal to N% usage, this > normally being used as a quick way to combine several partially used > chunks into fewer chunks, releasing the space from the reclaimed chunks > back to unallocated. Of course usage=0 means only deal with fully empty > chunks, so they don't have to be rewritten at all and can be directly > reclaimed. > > This used to be needed somewhat often, as until /relatively/ recent > kernels (tho a couple years ago now, 3.17 IIRC), btrfs wouldn't > automatically reclaim those chunks as it usually does now, and a manual > balance had to be done to reclaim them. Btrfs normally reclaims those on > its own now, but probably missed that one somewhere in your conversion >
Re: Stray 4k extents with slow buffered writes
On Thu, Mar 03, 2016 at 01:28:29PM +0100, Holger Hoffstätte wrote: > > Here's an observation that is not a bug (as in data corruption), just > somewhat odd and unnecessary behaviour. It could be considered a > performance or scalability bug. > > I've noticed that slow buffered writes create a huge number of > unnecessary 4k sized extents. At first I wrote it off as odd buffering > behaviour of the application (a download manager), but it can be easily > reproduced. For example: We saw this here with slowly appending log files. The basic problem is the VM triggers dirty writeback on the tail end of the file and either starts on the last page or our clustering code pulls in the last page. It leads to latencies because we start writing the last page in the file to disk and the application has to wait for the IO to finish before it can append to the file again. I'll get our patches in for the next merge window, it's basically just: don't write the last incomplete page in the file if we're O_APPEND and it isn't a data integrity writeback. We may want to rework it to drop the O_APPEND check. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Stray 4k extents with slow buffered writes
On 2016-03-03 14:53, Holger Hoffstätte wrote: On 03/03/16 19:33, Liu Bo wrote: On Thu, Mar 03, 2016 at 01:28:29PM +0100, Holger Hoffstätte wrote: (..) I've noticed that slow buffered writes create a huge number of unnecessary 4k sized extents. At first I wrote it off as odd buffering behaviour of the application (a download manager), but it can be easily reproduced. For example: On a fresh new btrfs, I cannot reproduce the fragmented layout with "wget --limit-rate=1m", For better effect lower the bandwidth, 100k or so.
[root@10-11-17-236 btrfs]# filefrag -v -b linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
ext logical physical expected length flags
  0       0  143744           5264
  1    5264  149008          35884
  2   41148  220848  184892      4
So you also have one, after ~35 MB. See below.
  3   41152  184896  220852  35948
  4   77100  220852  220844   9192 eof
linux-4.5-rc6.tar.xz: 4 extents found
No sync? filefrag is a notorious liar. ;) It changes things because you likely have a higher value set for vm/dirty_expire_centisecs or dirty_bytes explicitly configured; I have it set to 1000 (10s) to prevent large writebacks from choking everything. The default is probably still 30s aka 3000. Last I looked (about a month ago), the default was still 3000. I understand that I should get smaller extents overall, but not the stray 4k sized ones in regular intervals. Can you gather your mount options and 'btrfs fi show/df' output? I can reproduce that on another machine/drive where it also initially didn't show the 4k extents in a parallel-running filefrag, but did after a sync (when the extents were written). That was surprising. Anyway, it's just an external scratch drive... the mount options really don't matter much: $mount | grep sdf /dev/sdf1 on /mnt/usb type btrfs (rw,relatime,space_cache=v2,subvolid=5,subvol=/) Do you still see the same behavior with the old space_cache format? 
This appears to be an issue of space management and allocation, so this may be playing a part.
$btrfs fi df /mnt/usb
Data, single: total=4.00GiB, used=3.31GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.00GiB, used=4.45MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
$btrfs fi show /mnt/usb
Label: 'Test' uuid: 1d37a067-5b7d-4dcf-b2c1-7c5745b9c7a5
Total devices 1 FS bytes used 3.32GiB
devid 1 size 111.79GiB used 5.03GiB path /dev/sdf1
I then remounted with -o commit=300 and set dirty_expire_centisecs=10000 (100s). That results in a single large extent, even after sync, so writeback expiry and commit definitely play a part. Here is what it looks like when both dirty_expire and commit are set to a very low 5s: I'd be somewhat curious to see if something similar happens on other filesystems with such low writeback timeouts. My thought in this case is that the issue is that BTRFS's allocator isn't smart enough to try and merge new extents into existing ones when possible.
$filefrag -ek linux-4.4.4.tar.bz2
Filesystem type is: 9123683e
File size of linux-4.4.4.tar.bz2 is 105008928 (102548 blocks of 1024 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:      0..   5199: 227197920.. 227203119:   5200:
   1:   5200..   5203: 227169600.. 227169603:      4: 227203120:
   2:   5204..  15407: 227203124.. 227213327:  10204: 227169604:
   3:  15408..  20623: 227213332.. 227218547:   5216: 227213328:
   4:  20624..  20627: 227169604.. 227169607:      4: 227218548:
   5:  20628..  30831: 227218552.. 227228755:  10204: 227169608:
   6:  30832..  36047: 227228760.. 227233975:   5216: 227228756:
   7:  36048..  36051: 227169608.. 227169611:      4: 227233976:
   8:  36052..  41263: 227233980.. 227239191:   5212: 227169612:
   9:  41264..  46479: 227271164.. 227276379:   5216: 227239192:
  10:  46480..  46483: 227239196.. 227239199:      4: 227276380:
  11:  46484..  51695: 227276384.. 227281595:   5212: 227239200:
  12:  51696..  61903: 227281600.. 227291807:  10208: 227281596:
  13:  61904..  61907: 227239200.. 227239203:      4: 227291808:
  14:  61908..  67119: 227291812.. 227297023:   5212: 227239204:
  15:  67120..  77327: 227297028.. 227307235:  10208: 227297024:
  16:  77328..  77331: 227239204.. 227239207:      4: 227307236:
  17:  77332..  82543: 227307240.. 227312451:   5212: 227239208:
  18:  82544..  92751: 227312456.. 227322663:  10208: 227312452:
  19:  92752..  92755: 227239208.. 227239211:      4: 227322664:
  20:  92756..  97967: 227322668.. 227327879:   5212: 227239212:
  21:  97968.. 102547: 227239212.. 227243791:   4580: 227327880: last,eof
linux-4.4.4.tar.bz2: 22 extents found
There's definitely a pattern here. What I find particularly interesting here is that
Re: Stray 4k extents with slow buffered writes
On 03/03/16 19:33, Liu Bo wrote: > On Thu, Mar 03, 2016 at 01:28:29PM +0100, Holger Hoffstätte wrote: (..) >> I've noticed that slow buffered writes create a huge number of >> unnecessary 4k sized extents. At first I wrote it off as odd buffering >> behaviour of the application (a download manager), but it can be easily >> reproduced. For example: > > On a fresh new btrfs, I cannot reproduce the fragmented layout with "wget > --limit-rate=1m", For better effect lower the bandwidth, 100k or so.
> [root@10-11-17-236 btrfs]# filefrag -v -b linux-4.5-rc6.tar.xz
> Filesystem type is: 9123683e
> File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
> ext logical physical expected length flags
>   0       0  143744           5264
>   1    5264  149008          35884
>   2   41148  220848  184892      4
So you also have one, after ~35 MB. See below.
>   3   41152  184896  220852  35948
>   4   77100  220852  220844   9192 eof
> linux-4.5-rc6.tar.xz: 4 extents found
No sync? filefrag is a notorious liar. ;) It changes things because you likely have a higher value set for vm/dirty_expire_centisecs or dirty_bytes explicitly configured; I have it set to 1000 (10s) to prevent large writebacks from choking everything. The default is probably still 30s aka 3000. I understand that I should get smaller extents overall, but not the stray 4k sized ones in regular intervals. > Can you gather your mount options and 'btrfs fi show/df' output? I can reproduce that on another machine/drive where it also initially didn't show the 4k extents in a parallel-running filefrag, but did after a sync (when the extents were written). That was surprising. 
Anyway, it's just an external scratch drive... the mount options really don't matter much: $mount | grep sdf /dev/sdf1 on /mnt/usb type btrfs (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
$btrfs fi df /mnt/usb
Data, single: total=4.00GiB, used=3.31GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.00GiB, used=4.45MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
$btrfs fi show /mnt/usb
Label: 'Test' uuid: 1d37a067-5b7d-4dcf-b2c1-7c5745b9c7a5
Total devices 1 FS bytes used 3.32GiB
devid 1 size 111.79GiB used 5.03GiB path /dev/sdf1
I then remounted with -o commit=300 and set dirty_expire_centisecs=10000 (100s). That results in a single large extent, even after sync, so writeback expiry and commit definitely play a part. Here is what it looks like when both dirty_expire and commit are set to a very low 5s:
$filefrag -ek linux-4.4.4.tar.bz2
Filesystem type is: 9123683e
File size of linux-4.4.4.tar.bz2 is 105008928 (102548 blocks of 1024 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:      0..   5199: 227197920.. 227203119:   5200:
   1:   5200..   5203: 227169600.. 227169603:      4: 227203120:
   2:   5204..  15407: 227203124.. 227213327:  10204: 227169604:
   3:  15408..  20623: 227213332.. 227218547:   5216: 227213328:
   4:  20624..  20627: 227169604.. 227169607:      4: 227218548:
   5:  20628..  30831: 227218552.. 227228755:  10204: 227169608:
   6:  30832..  36047: 227228760.. 227233975:   5216: 227228756:
   7:  36048..  36051: 227169608.. 227169611:      4: 227233976:
   8:  36052..  41263: 227233980.. 227239191:   5212: 227169612:
   9:  41264..  46479: 227271164.. 227276379:   5216: 227239192:
  10:  46480..  46483: 227239196.. 227239199:      4: 227276380:
  11:  46484..  51695: 227276384.. 227281595:   5212: 227239200:
  12:  51696..  61903: 227281600.. 227291807:  10208: 227281596:
  13:  61904..  61907: 227239200.. 227239203:      4: 227291808:
  14:  61908..  67119: 227291812.. 227297023:   5212: 227239204:
  15:  67120..  77327: 227297028.. 227307235:  10208: 227297024:
  16:  77328..  77331: 227239204.. 227239207:      4: 227307236:
  17:  77332..  82543: 227307240.. 227312451:   5212: 227239208:
  18:  82544..  92751: 227312456.. 227322663:  10208: 227312452:
  19:  92752..  92755: 227239208.. 227239211:      4: 227322664:
  20:  92756..  97967: 227322668.. 227327879:   5212: 227239212:
  21:  97968.. 102547: 227239212.. 227243791:   4580: 227327880: last,eof
linux-4.4.4.tar.bz2: 22 extents found
There's definitely a pattern here. Out of curiosity I also tried the above run with autodefrag enabled, and that helped a little bit: it merges those 4k extents into 256k-sized ones with the adjacent followup extent. That was nice, but still a bit unexpected since we've been told autodefrag is for random writes. It also doesn't really explain the original behaviour. I guess I need to add autodefrag everywhere now. :) Thanks, Holger
Re: Stray 4k extents with slow buffered writes
On Thu, Mar 03, 2016 at 01:28:29PM +0100, Holger Hoffstätte wrote: > > Here's an observation that is not a bug (as in data corruption), just > somewhat odd and unnecessary behaviour. It could be considered a > performance or scalability bug. > > I've noticed that slow buffered writes create a huge number of > unnecessary 4k sized extents. At first I wrote it off as odd buffering > behaviour of the application (a download manager), but it can be easily > reproduced. For example: > > holger>wget --limit-rate=1m > https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/linux-4.5-rc6.tar.xz > (..downloads with 1 MB/s..) > holger>sync > holger>filefrag -ek linux-4.5-rc6.tar.xz
> Filesystem type is: 9123683e
> File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
>  ext: logical_offset: physical_offset: length: expected: flags:
>    0:      0..  14219: 230807476.. 230821695:  14220:
>    1:  14220..  14223: 230838148.. 230838151:      4: 230821696:
>    2:  14224..  29215: 230822324.. 230837315:  14992: 230838152:
>    3:  29216..  44207: 230838152.. 230853143:  14992: 230837316:
>    4:  44208..  44211: 230869576.. 230869579:      4: 230853144:
>    5:  44212..  59199: 230853968.. 230868955:  14988: 230869580:
>    6:  59200..  74191: 230869588.. 230884579:  14992: 230868956:
>    7:  74192..  74195: 230898332.. 230898335:      4: 230884580:
>    8:  74196..  86291: 230885620.. 230897715:  12096: 230898336: last,eof
> linux-4.5-rc6.tar.xz: 9 extents found
> > Slower writes will generate even more extents; another ~200MB file > had >900 extents. > > As expected defragment will collapse these stray extents with their > successors: > > holger>btrfs filesystem defragment linux-4.5-rc6.tar.xz > > holger>sync > holger>filefrag -ek linux-4.5-rc6.tar.xz
> Filesystem type is: 9123683e
> File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
>  ext: logical_offset: physical_offset: length: expected: flags:
>    0:      0..  14219: 230807476.. 230821695:  14220:
>    1:  14220..  29215: 230922128.. 230937123:  14996: 230821696:
>    2:  29216..  44207: 230838152.. 230853143:  14992: 230937124:
>    3:  44208..  59199: 230937124.. 230952115:  14992: 230853144:
>    4:  59200..  74191: 230869588.. 230884579:  14992: 230952116:
>    5:  74192..  86291: 230952116.. 230964215:  12100: 230884580: last,eof
> linux-4.5-rc6.tar.xz: 6 extents found
> > The obviously page-sized 4k extents happen to coincide with the 30s tx commit > (2 * ~15 MB at 1 MB/s). It looks like a benign race, as if the last dirty page > gets special treatment instead of being merged into wherever it should go. > That just seems wasteful to me. > > Anyone got an idea? On a fresh new btrfs, I cannot reproduce the fragmented layout with "wget --limit-rate=1m",
[root@10-11-17-236 btrfs]# filefrag -v -b linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
ext logical physical expected length flags
  0       0  143744           5264
  1    5264  149008          35884
  2   41148  220848  184892      4
  3   41152  184896  220852  35948
  4   77100  220852  220844   9192 eof
linux-4.5-rc6.tar.xz: 4 extents found
My mount point has: /dev/loop0 /mnt/btrfs btrfs rw,seclabel,relatime,space_cache,subvolid=5,subvol=/ 0 0 And I'm using 4.5.0-rc4.
# btrfs fi show /mnt/btrfs
Label: none uuid: 599d68ae-b874-4db1-a3c3-33c4b783bfdd
Total devices 1 FS bytes used 94.62MiB
devid 1 size 2.00GiB used 436.75MiB path /dev/loop0
# btrfs fi df /mnt/btrfs
Data, single: total=216.00MiB, used=94.40MiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=102.38MiB, used=208.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B
Can you gather your mount options and 'btrfs fi show/df' output? Thanks, -liubo
Re: [PATCH] fstests: remove call to _need_to_be_root from btrfs/118
On Thu, Mar 03, 2016 at 07:32:57AM +, fdman...@kernel.org wrote: > From: Filipe Manana> > The function _need_to_be_root does not exist anymore as of commit > 56ff01f471c9 ("xfstests: remove _need_to_be_root"). > > A v2 of the patch that added test btrfs/118 without calling this > function was sent but not picked up [1]; instead v1 was picked. > > So fix this now. > > [1] https://patchwork.kernel.org/patch/8354831/ > > Signed-off-by: Filipe Manana Reviewed-by: David Sterba
[PATCH] fstests: remove call to _need_to_be_root from btrfs/118
From: Filipe Manana
The function _need_to_be_root does not exist anymore as of commit 56ff01f471c9 ("xfstests: remove _need_to_be_root"). A v2 of the patch that added test btrfs/118 without calling this function was sent but not picked up [1]; instead v1 was picked. So fix this now. [1] https://patchwork.kernel.org/patch/8354831/ Signed-off-by: Filipe Manana
---
 tests/btrfs/118 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/btrfs/118 b/tests/btrfs/118
index 3ed1cbe..740cb92 100755
--- a/tests/btrfs/118
+++ b/tests/btrfs/118
@@ -45,7 +45,6 @@ _cleanup()
 . ./common/dmflakey

 # real QA test starts here
-_need_to_be_root
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
--
2.7.0.rc3
Re: [PATCH] btrfs-convert: document -O|--features flag
On Wed, Mar 02, 2016 at 04:00:28PM +, Vytas Dauksa wrote: > Copy-pasted description found at mkfs.btrfs. I did not bother with > feature list as it seemed to be incomplete. Convert has a smaller set of supported features, so it's better to leave the list out and refer to the -O list-all command. Applied, thanks.
> ---
> Documentation/btrfs-convert.asciidoc | 7 +++
> 1 file changed, 7 insertions(+)
>
> diff --git a/Documentation/btrfs-convert.asciidoc b/Documentation/btrfs-convert.asciidoc
> index ca3417f..3ec7d55 100644
> --- a/Documentation/btrfs-convert.asciidoc
> +++ b/Documentation/btrfs-convert.asciidoc
> @@ -83,6 +83,13 @@ rollback to the original ext2/3/4 filesystem if possible
> set filesystem label during conversion
> -L|--copy-label::
> use label from the converted filesystem
> +-O|--features <feature1>[,<feature2>...]::
> +A list of filesystem features turned on at btrfs-convert time. Not all features
> +are supported by old kernels. To disable a feature, prefix it with '^'.
> ++
> +To see all available features that btrfs-convert supports run:
> ++
> ++btrfs-convert -O list-all+
> -p|--progress::
> show progress of conversion, on by default
> --no-progress::
> --
> 1.9.1
btrfs-progs: btrfs convert v2
Hi Qu, thanks for your work on btrfs convert. Does your btrfs convert work (namely the https://www.spinics.net/lists/linux-btrfs/msg49719.html patches) change any of the caveats described here -> https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3#Before_first_use i.e. do files larger than 1 GB still need a defrag, after which rollback is not possible? Thanks, Vytas
Re: Again, no space left on device while rebalancing and recipe doesn't work
2016-03-03 6:57 GMT+02:00 Duncan <1i5t5.dun...@cox.net>: > > Your issue isn't the same, because all your space was allocated, > leaving only 1 MiB unallocated, which isn't normally enough to allocate a > new chunk to rewrite the data or metadata from the old chunks into. > > That's a known issue, with known workarounds as dealt with in the FAQ. > Ah, thanks, well it was surprising for me that balance failed with out of space when both data and metadata had not all been used, and I thought it could just use space from those... especially as the FAQ says: > If there is a lot of allocated but unused data or metadata chunks, > a balance may reclaim some of that allocated space. This is the > main reason for running a balance on a single-device filesystem. so I think a regular balance should be smart enough to solve this on its own and shouldn't need any options specified.
Stray 4k extents with slow buffered writes
Here's an observation that is not a bug (as in data corruption), just somewhat odd and unnecessary behaviour. It could be considered a performance or scalability bug. I've noticed that slow buffered writes create a huge number of unnecessary 4k sized extents. At first I wrote it off as odd buffering behaviour of the application (a download manager), but it can be easily reproduced. For example:
holger>wget --limit-rate=1m https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/linux-4.5-rc6.tar.xz
(..downloads with 1 MB/s..)
holger>sync
holger>filefrag -ek linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:      0..  14219: 230807476.. 230821695:  14220:
   1:  14220..  14223: 230838148.. 230838151:      4: 230821696:
   2:  14224..  29215: 230822324.. 230837315:  14992: 230838152:
   3:  29216..  44207: 230838152.. 230853143:  14992: 230837316:
   4:  44208..  44211: 230869576.. 230869579:      4: 230853144:
   5:  44212..  59199: 230853968.. 230868955:  14988: 230869580:
   6:  59200..  74191: 230869588.. 230884579:  14992: 230868956:
   7:  74192..  74195: 230898332.. 230898335:      4: 230884580:
   8:  74196..  86291: 230885620.. 230897715:  12096: 230898336: last,eof
linux-4.5-rc6.tar.xz: 9 extents found
Slower writes will generate even more extents; another ~200MB file had >900 extents. As expected defragment will collapse these stray extents with their successors:
holger>btrfs filesystem defragment linux-4.5-rc6.tar.xz
holger>sync
holger>filefrag -ek linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:      0..  14219: 230807476.. 230821695:  14220:
   1:  14220..  29215: 230922128.. 230937123:  14996: 230821696:
   2:  29216..  44207: 230838152.. 230853143:  14992: 230937124:
   3:  44208..  59199: 230937124.. 230952115:  14992: 230853144:
   4:  59200..  74191: 230869588.. 230884579:  14992: 230952116:
   5:  74192..  86291: 230952116.. 230964215:  12100: 230884580: last,eof
linux-4.5-rc6.tar.xz: 6 extents found
The obviously page-sized 4k extents happen to coincide with the 30s tx commit (2 * ~15 MB at 1 MB/s). It looks like a benign race, as if the last dirty page gets special treatment instead of being merged into wherever it should go. That just seems wasteful to me. Anyone got an idea? -h
Re: [RFC] Experimental btrfs encryption
Qu Wenruo cn.fujitsu.com> writes: > > - As of now uses "user" keytype, I am still considering/ > >evaluating other key type such as logon. > > UI things can always be reconsidered later. > Never a big problem. This is not only a UI concern, but an API/ABI concern. To use eCryptFS as an example, on mount(2) you pass the information for it to look up the key in the kernel keyring, and it looks for a key of type "user" - I have personally written code that uses trusted and encrypted keys, and the raw mount(2) call (sans any of the eCryptFS userspace libraries) to mount eCryptFS filesystems sealed to the TPM. If eCryptFS switched to using another key type, my code would cease to work unless the filesystem jumped through hoops to do so in a backwards-compatible way. Similarly, while eCryptFS uses a "user" key, it requires that key have a specific structure - as a result, the encrypted keys support added a key type of "ecryptfs" to create random keys with the appropriate structure, meaning the key type for unencrypted keys and the encrypted key key type field differ. This is surprising and non-obvious, and took some time to figure out in my implementation. The way btrfs encryption interacts with the keyring APIs is important, and "reconsidering later" will potentially represent an API/ABI break. Please keep it in mind.
Re: how many chunk trees and extent trees present
On Fri, Feb 26, 2016 at 09:29:29AM +0800, Qu Wenruo wrote: > > > David Sterba wrote on 2015/04/17 19:29 +0200: > > On Fri, Apr 17, 2015 at 09:19:11AM +, Hugo Mills wrote: > >>> In some article I read that in the future there will be more chunk trees/extent > >>> trees for a single btrfs. Is this true. > >> > >> I recall, many moons ago, Chris saying that there probably wouldn't > >> be. > > > > More extent trees tied to a set of fs trees/subvolumes would be very > > useful for certain usecases *cough*encryption*cough*. > > BTW, will such a design make reflink between different sets of extents > fall back to normal copy? Yes, the actual reflink will not be possible. > And I'm pretty sure that inband dedup will be affected too... Yes.
Re: [PATCH] Btrfs: fix loading of orphan roots leading to BUG_ON
On Thu, Mar 3, 2016 at 6:29 AM, Qu Wenruo wrote:
>
> wrote on 2016/03/02 15:49 +:
>>
>> From: Filipe Manana
>>
>> When looking for orphan roots during mount we can end up hitting a
>> BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
>> replayed and qgroups are enabled. This is because after a log tree is
>> replayed, a transaction commit is made, which triggers qgroup extent
>> accounting which in turn does backref walking which ends up reading and
>> inserting all roots in the radix tree fs_info->fs_root_radix, including
>> orphan roots (deleted snapshots). So after the log tree is replayed, when
>> finding orphan roots we hit the BUG_ON with the following trace:
>>
>> [118209.182438] [ cut here ]
>> [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
>> [118209.184074] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic
>> ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm
>> psmouse
>> processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4
>> crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix
>> libata
>> virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
>> [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: GW
>> 4.5.0-rc5-btrfs-next-24+ #1
>> [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS by qemu-project.org 04/01/2014
>> [118209.186318] task: 8801ec131040 ti: 8800af34c000 task.ti:
>> 8800af34c000
>> [118209.186318] RIP: 0010:[] []
>> btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
>> [118209.186318] RSP: 0018:8800af34faa8 EFLAGS: 00010246
>> [118209.186318] RAX: ffef RBX: ffef RCX:
>> 0001
>> [118209.186318] RDX: 8000 RSI: 0001 RDI:
>>
>> [118209.186318] RBP: 8800af34fb08 R08: 0001 R09:
>>
>> [118209.186318] R10: 8800af34f9f0 R11: 6db6db6db6db6db7 R12:
>> 880171b97000
>> [118209.186318] R13: 8801ca9d65e0 R14: 8800afa2e000 R15:
>> 1600
>> [118209.186318] FS: 7f5bcb914840() GS:88023edc()
>> knlGS:
>> [118209.186318] CS: 0010 DS: ES: CR0: 8005003b
>> [118209.186318] CR2: 7f5bcaceb5d9 CR3: b49b5000 CR4:
>> 06e0
>> [118209.186318] Stack:
>> [118209.186318] fbff 010230ff 0101
>> ff84
>> [118209.186318] fbff 30ff 0101
>> 880082348000
>> [118209.186318] 8800afa2e000 8800afa2e000
>>
>> [118209.186318] Call Trace:
>> [118209.186318] [] open_ctree+0x1e37/0x21b9 [btrfs]
>> [118209.186318] [] btrfs_mount+0x97e/0xaed [btrfs]
>> [118209.186318] [] ? trace_hardirqs_on+0xd/0xf
>> [118209.186318] [] mount_fs+0x67/0x131
>> [118209.186318] [] vfs_kern_mount+0x6c/0xde
>> [118209.186318] [] btrfs_mount+0x1ac/0xaed [btrfs]
>> [118209.186318] [] ? trace_hardirqs_on+0xd/0xf
>> [118209.186318] [] ? lockdep_init_map+0xb9/0x1b3
>> [118209.186318] [] mount_fs+0x67/0x131
>> [118209.186318] [] vfs_kern_mount+0x6c/0xde
>> [118209.186318] [] do_mount+0x8a6/0x9e8
>> [118209.186318] [] SyS_mount+0x77/0x9f
>> [118209.186318] [] entry_SYSCALL_64_fastpath+0x12/0x6b
>> [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49
>> 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75
>> 02 <0f> 0b
>> 4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
>> [118209.186318] RIP []
>> btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
>> [118209.186318] RSP
>> [118209.230735] ---[ end trace 83938f987d85d477 ]---
>>
>> So fix this by not treating the error -EEXIST, returned when attempting
>> to insert a root already inserted by the backref walking code, as an
>> error.
>>
>> The following test case for xfstests reproduces the bug:
>>
>> seq=`basename $0`
>> seqres=$RESULT_DIR/$seq
>> echo "QA output created by $seq"
>> tmp=/tmp/$$
>> status=1 # failure is the default!
>> trap "_cleanup; exit \$status" 0 1 2 3 15
>>
>> _cleanup()
>> {
>> _cleanup_flakey
>> cd /
>> rm -f $tmp.*
>> }
>>
>> # get standard environment, filters and checks
>> . ./common/rc
>> . ./common/filter
>> . ./common/dmflakey
>>
>> # real QA test starts here
>> _supported_fs btrfs
>> _supported_os Linux
>> _require_scratch
>> _require_dm_target flakey
>> _require_metadata_journaling $SCRATCH_DEV
>>
>> rm -f $seqres.full
>>
>> _scratch_mkfs >>$seqres.full 2>&1
>> _init_flakey
>> _mount_flakey
>>
>> _run_btrfs_util_prog quota enable $SCRATCH_MNT
>>
>> # Create 2 directories with one file in one of them.
>> # We use these just to trigger a transaction commit later, moving the
Re: [PATCH] Btrfs: fix loading of orphan roots leading to BUG_ON
On Thu, Mar 3, 2016 at 4:31 AM, Duncan <1i5t5.dun...@cox.net> wrote: > fdmanana posted on Wed, 02 Mar 2016 15:49:38 + as excerpted: > >> When looking for orphan roots during mount we can end up hitting a >> BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is >> replayed and qgroups are enabled. > > This should hit 4.6, right? Will it hit 4.5 before release? It's not the first time you've asked a similar question, and if it's targeted at me, all I can tell you is I don't know. It's the maintainers (Chris, Josef, David) who decide when to pick patches and for which releases. > > Because I wasn't sure of current quota functionality status, but this bug > obviously resets the counter on my ongoing "two kernel cycles with no > known quota bugs before you try to use quotas" recommendation. You shouldn't spread such affirmations with such a level of certainty every time a user reports a problem. There are many bugs affecting the last 2 to 3 releases, but there are also many bugs present since btrfs was added to the linux kernel tree, and many others present for 2+ years, etc. > > Meanwhile, what /is/ current quota feature status? Other than this bug, > is it now considered known bug free, or is more quota reworking and/or > bug fixing known to be needed for 4.6 and beyond? > > IOW, given that two release cycles no known bugs counter, are we > realistically looking at that being 4.8, or are we now looking at 4.9 or > beyond for reasonable quota stability? I don't know. I generally don't actively look at qgroups, and I'm not a user either. You can only draw conclusions based on user bug reports. Probably there aren't more bugs for qgroups than there are for send/receive or even non-btrfs specific features for example. > > -- > Duncan - List replies preferred. No HTML msgs. > "Every nonfree program has a lord, a master -- > and if you use the program, he is your master." 
> Richard Stallman

--
Filipe David Manana,

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix loading of orphan roots leading to BUG_ON
Duncan wrote on 2016/03/03 07:44 +:
> Qu Wenruo posted on Thu, 03 Mar 2016 14:26:45 +0800 as excerpted:
>
>> Never heard of the 2 release cycles principle, but it seems not
>> flexible enough. From this point of view, every time Filipe (just an
>> example, as he finds the most bugs and corner cases) finds one, some
>> part or even the whole of btrfs is not stable for 4 months.
>
> Well, once a feature is considered stable (relative to the rest of
> btrfs, anyway), the two-releases-without-bugs counter/clock no longer
> resets, and it's simply bugs in otherwise stable features, as opposed
> to more bugs in still unstable features resetting the clock on those
> features and preventing them from reaching (my measure of) stability.
>
> And it's more or less just a general rule of thumb, anyway, with per-
> feature and per-bug adjustments based on the amount of new code in the
> feature and how critical the bug is or isn't. But in my experience it
> /does/ seem a relatively good rule-of-thumb guideline, provided it is
> taken /as/ a rule-of-thumb guideline and not applied too inflexibly.
>
> The other factor, however, would be the relative view into bugs... I
> suppose it's reasonably well known that, in practice, one has to be a
> bit cautious about evaluating the stability of a project by the raw
> number of scary-looking problems reported on the mailing list or bug
> tracker, in part because that's what those are /for/, and while they're
> good at tracking that, they don't normally yield a good picture at all
> of the hundreds or thousands or tens of thousands or millions actually
> using the project without any problems at all.
>
> By the same token, the developer's viewpoint of a feature is likely to
> see quite a few more bugs, due to simple familiarity with the topic and
> exposure on multiple channels (IRC/btrfs-list/private-mail/lkml/
> filesystems-list/kernel-bugzilla/one-or-more-distro-lists...), than
> someone like me, a simple user/admin, tracking perhaps one or two of
> those channels.
> There's a lot of feature bugs that a feature developer is going to be
> aware of that simply won't rise to my level of consciousness. But by
> the same token, if multiple people are suddenly reporting an issue, as
> will likely happen for the serious bugs, I'm likely to see one and
> possibly more reports of it here, and /will/ be aware of it.
>
> So what I'm saying is that, at my level of awareness at least, and
> assuming it is taken as the rule-of-thumb guideline I intend it as, the
> two-releases-without-a-known-bug-in-a-feature /guideline/ has, in my
> experience, turned out to be reasonably practical and reliable, tho I'd
> expect it would not and could not be workable at that level if applied
> by a feature dev, because by definition that feature dev is going to
> know about way more bugs in the feature than I will, and as you said,
> applying it in that case would mean the feature /never/ stabilizes.
>
> Does that make more sense, now?

Makes sense now.

> Now back to the specific feature in question, btrfs quotas. Thanks for
> affirming that they are in general considered workable now. I'll
> mention that as the developer perspective when I make my
> recommendations, and it will certainly positively influence my own
> recommendations as well, tho I'll mention that there are still
> corner-case bugs being worked out, so I recommend following quota-
> related discussion and patches on the list for now as well.
>
> But considering it ready for normal use is already beyond what I felt
> ready to recommend before, so it's already a much more positive
> recommendation than previously, even if it's still tempered with "but
> keep up with the list discussion and current on your kernels, and be
> aware there are still occasional corner cases being worked out" as a
> caveat, which, it should be said, is only slightly stronger than the
> general recommendation for btrfs itself.

Thanks for your recommendation about qgroups. I'm also seeking feedback
from end users, to either spot corner cases or enhance the UI-related
design.
Thanks,
Qu