[PATCH] btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fails
From: Zhao Lei <zhao...@cn.fujitsu.com>

When read_tree_block() failed, we can see following dmesg:

[  134.371389] BUG: unable to handle kernel NULL pointer dereference at 0063
[  134.372236] IP: [813a4a51] free_extent_buffer+0x21/0x90
[  134.372236] PGD 0
[  134.372236] Oops: [#1] SMP
[  134.372236] Modules linked in:
[  134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f046843d2455aa231747b5a07a999a9f3d_+ #115
[  134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[  134.372236] task: 88003b6e1a00 ti: 880011e6 task.ti: 880011e6
[  134.372236] RIP: 0010:[813a4a51] [813a4a51] free_extent_buffer+0x21/0x90
...
[  134.372236] Call Trace:
[  134.372236]  [81379aa1] free_root_extent_buffers+0x91/0xb0
[  134.372236]  [81379c3d] free_root_pointers+0x17d/0x190
[  134.372236]  [813801b0] open_ctree+0x1ca0/0x25b0
[  134.372236]  [8144d017] ? disk_name+0x97/0xb0
[  134.372236]  [813558aa] btrfs_mount+0x8fa/0xab0
...

Reason:
 read_tree_block() changed to return error number on fail,
 and this value (not NULL) is set to tree_root->node, then
 subsequent code will run to:
   free_root_pointers()
   ->free_root_extent_buffers()
   ->free_extent_buffer()
   ->atomic_read((extent_buffer *)(-E_XXX)->refs);
 and trigger above error.

Fix:
 Set tree_root->node to NULL on fail to make error_handle code happy.

Signed-off-by: Zhao Lei <zhao...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a9aadb2..f556c37 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2842,6 +2842,7 @@ int open_ctree(struct super_block *sb,
 	    !extent_buffer_uptodate(chunk_root->node)) {
 		printk(KERN_ERR "BTRFS: failed to read chunk root on %s\n",
 		       sb->s_id);
+		chunk_root->node = NULL;
 		goto fail_tree_roots;
 	}
 	btrfs_set_root_node(&chunk_root->root_item, chunk_root->node);
@@ -2879,7 +2880,7 @@ retry_root_backup:
 	    !extent_buffer_uptodate(tree_root->node)) {
 		printk(KERN_WARNING "BTRFS: failed to read tree root on %s\n",
 		       sb->s_id);
-
+		tree_root->node = NULL;
 		goto recovery_tree_root;
 	}
-- 
1.8.5.1
Re: I'd like a -r flag on btrfs subvolume delete
On 16 July 2015 at 11:35, Chris Murphy li...@colorremedies.com wrote:
> On Wed, Jul 15, 2015 at 6:11 PM, Johannes Ernst johannes.er...@gmail.com wrote:
>> Cleaning this all up is a bit of pain, and btrfs subvolume delete -r dir would solve it nicely.
[snip]
> How is all of this backed up properly? How is it restored properly?
>
> I think recursive snapshotting and subvolume deletion is not a good idea. I think it's a complicated and inelegant work around for improper subvolume organization.

I for one would love to see authoritative documentation on proper subvolume organization. I was completely lost when writing snazzer and have so far received very little guidance or even offers of opinions on this ML.

I've had to create my own logic in my scripts that automatically walk all subvolumes on all filesystems, for the simple reason that explicitly enumerating it all for dozens of servers becomes a significant administration burden.

I have different retention needs for /var (particularly /var/cache) than I do for /home, for example, so carving up my snapshots so that I can easily drop them from those parts of my filesystems which have a high churn rate (= more unique extents, occupying a lot of disk) and yet aren't as important (I need to retain fewer of them) is very useful.
Re: I'd like a -r flag on btrfs subvolume delete
On Wed, Jul 15, 2015 at 9:12 PM, Paul Harvey csir...@gmail.com wrote:
> On 16 July 2015 at 11:35, Chris Murphy li...@colorremedies.com wrote:
>> How is all of this backed up properly? How is it restored properly?
>>
>> I think recursive snapshotting and subvolume deletion is not a good idea. I think it's a complicated and inelegant work around for improper subvolume organization.
>
> I for one would love to see authoritative documentation on proper subvolume organization.

The choice of "improper" wasn't ideal on my part. There's nothing directly wrong with nested subvolumes. But if you then combine them with snapshots and rollbacks, there are consequences that include more complication. If more than one thing is doing snapshots and rollbacks, it requires some rules as to who can snapshot what, and where those things go in order to avoid being snapshot again by some other tool, and then how things get reassembled. There are different kinds of rollbacks, so that needs some rules too or it'll just lead to confusion.

> I was completely lost when writing snazzer and have so far received very little guidance or even offers of opinions on this ML.

A couple of developers have suggested the folly of nested subvolumes several times. Discovering the consequences of organizing subvolumes is a work in progress. I've mentioned a couple times over the years that distros are inevitably going to end up with fragmented and mutually incompatible approaches if they don't actively engage each other cooperatively. And that's turned out to be correct, as Fedora, Ubuntu and SUSE all do things differently with their Btrfs organization.

> I've had to create my own logic in my scripts that automatically walk all subvolumes on all filesystems for the simple reason that explicitly enumerating it all for dozens of servers becomes a significant administration burden. I have different retention needs for /var (particularly /var/cache) than I do for /home, for example, so carving up my snapshots so that I can easily drop them from those parts of my filesystems which have a high churn rate (= more unique extents, occupying a lot of disk) and yet aren't as important (I need to retain fewer of them) is very useful.

At the moment, I like the idea of subvolumes pretty much only at the top level of the file system (subvolid 5), and I like the naming convention suggested here, under the section "What We Propose":
http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html

I don't really like the colons, because those are special characters, so now I have to type three characters for each one of those. But anyway, those then get assembled in FHS form via fstab using the subvol= or subvolid= mount option, or whatever replaces fstab eventually. This way you can snapshot different subvolumes at different rates with different cleanup policies, while keeping all of them out of the normally mounted FHS path. A side plus is that this also puts old libraries outside the FHS path, sort of like they're in a jail.

-- 
Chris Murphy
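For illustration, the flat top-level layout described above might look like this (a minimal sketch; the device, mount points and subvolume names are hypothetical):

  # mount -o subvolid=5 /dev/sda1 /mnt/pool
  # btrfs subvolume create /mnt/pool/root
  # btrfs subvolume create /mnt/pool/home
  # btrfs subvolume create /mnt/pool/var
  # mkdir /mnt/pool/snapshots

/etc/fstab then reassembles the FHS view via subvol=, e.g.:

  /dev/sda1  /      btrfs  subvol=root  0 0
  /dev/sda1  /home  btrfs  subvol=home  0 0
  /dev/sda1  /var   btrfs  subvol=var   0 0

With that arrangement, snapshots of root, home and var can live under /mnt/pool/snapshots with independent retention policies, outside the normally mounted paths, and no snapshot ever lands inside another subvolume.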
Re: I'd like a -r flag on btrfs subvolume delete
On Wed, Jul 15, 2015 at 6:11 PM, Johannes Ernst johannes.er...@gmail.com wrote:
> Cleaning this all up is a bit of pain, and btrfs subvolume delete -r dir would solve it nicely.

It's come up before:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg42455.html
http://lists.freedesktop.org/archives/systemd-devel/2015-April/030297.html

I'm concerned about the interaction of machinectl snapshots of its own subvolumes, and rpm-ostree, and snapper snapshots. The only really convincing argument for nested subvolumes I've read is as an explicit break from being included in the snapshotting above it in the hierarchy. So the /var/lib/machines organization burdens other projects or users with the problem of how / or /var is snapshot, and then, when a rollback happens, how to properly reassemble the system that includes /var/lib/machines subvolumes that are now in a different tree. How does this all get located and assembled properly at boot time?

How is all of this backed up properly? How is it restored properly?

I think recursive snapshotting and subvolume deletion is not a good idea. I think it's a complicated and inelegant workaround for improper subvolume organization.

-- 
Chris Murphy
Re: Anyone tried out btrbk yet?
Marc MERLIN wrote (ao):
> On Wed, Jul 15, 2015 at 10:03:16AM +1000, Paul Harvey wrote:
>> The way it works in snazzer (and btrbk and I think also btrfs-sxbackup as well), local snapshots continue to happen as normal (Eg. daily or hourly) and so when your backup media or backup server is finally available again, the size of each individual incremental is still the same as usual, it just has to perform more of them.
>
> Good point. My system is not as smart. Every night, it'll make a new backup and only send one incremental and hope it gets there. It doesn't make a bunch of incrementals and send multiple. The other options do a better job here.

FWIW, I've written a bunch of scripts for making backups. The lot has grown over the past years to what it is now. Not very pretty to see, but reliable.

The subvolumes backupadmin, home, root, rootvolume and var are snapshotted every hour. Each subvolume has its own entry in crontab for the actual backup, for example rootvolume once a day, home and backupadmin every hour.

The scripts use tar to make a full backup every first backup of a subvolume that month, an incremental daily backup, and an incremental hourly backup if applicable. For a full backup the oldest available snapshot for that month is used, regardless of when the backup is started. This way the backup of each subvolume can be spread so as not to overload a system.

Backups run in the idle queue so as not to hinder other processes, are compressed with lbzip2 to utilize all cores, and are encrypted with gpg for obvious reasons. In my tests lbzip2 gives the best size/speed ratio compared to lzop, xz, bzip2, gzip, pxz and lz4(hc).

The script outputs what files and directories are in the backup to the backupadmin subvolume. This data is compressed with lz4hc, as lz4hc is the fastest to decompress (useful to determine which archive contains what you want restored).

Archives get transferred to a remote server by ftp, as ftp is the leanest way of file transfer and supports resume. The initial connection is encrypted to hide the username/password, but as the archive is already encrypted, the data channel is not. The ftp transfer is throttled to only use part of the available bandwidth.

A daily running script checks for archives which have not been transferred yet, due to the remote server being unavailable or a failed connection or the like, and retransmits those archives.

Snapshots and archives are pruned based on disk usage (yet another script).

Restore can be done by hand from snapshots (obviously), or by a script from the local archive if still available, or from the remote archive. The restore script can search a specific date-time range, and checks both local and remote for the availability of an archive that contains what is wanted. A bare metal restore can be done by fetching the archives from the remote host and piping them directly into gpg/tar: no need for additional local storage and no delay. First the monthly full backup is restored, then every daily incremental since, and then every hourly since the youngest daily, if applicable. tar incremental restore is smart, and removes the files and directories that were removed between backups.

	Sander
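To make the shape of that pipeline concrete, here is a minimal sketch of one full run as described above (the paths, snapshot name and GPG recipient are made up, and the real scripts additionally handle scheduling, throttling and the FTP upload):

  # Full backup of 'home' from the oldest snapshot of the month; the
  # --listed-incremental state file is what makes the following runs incremental.
  SNAP=/backupadmin/snapshots/home.2015-07-01
  ionice -c3 nice tar --listed-incremental=/backupadmin/state/home.snar \
      -cf - -C "$SNAP" . \
    | lbzip2 \
    | gpg --encrypt --recipient backup@example.org \
    > /backupadmin/archives/home.2015-07-01.full.tar.bz2.gpg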
Re: BTRFS raid6 unmountable after a couple of days of usage.
On 14/07/15 11:25 PM, Austin S Hemmelgarn wrote:
> On 2015-07-14 07:49, Austin S Hemmelgarn wrote:
>> So, after experiencing this same issue multiple times (on almost a dozen different kernel versions since 4.0) and ruling out the possibility of it being caused by my hardware (or at least, the RAM, SATA controller and disk drives themselves), I've decided to report it here. The general symptom is that raid6 profile filesystems that I have are working fine for multiple weeks, until I either reboot or otherwise try to remount them, at which point the system refuses to mount them.
>
> Further updates, I just tried mounting the filesystem from the image above again, this time passing device= options for each device in the FS, and it seems to be working fine now. I've tried this with the other filesystems however, and they still won't mount.

I have experienced a similar problem on a raid1 with kernels from 3.17 onward, following a kernel panic. I have found that passing the other device as the main device to mount will often work. E.g.

# mount -o device=/dev/sdb,device=/dev/sdc /dev/sdb /mountpoint
open_ctree failed

# mount -o device=/dev/sdb,device=/dev/sdc /dev/sdc /mountpoint
mounts correctly.

If I then do an immediate umount and try again I get the same thing, but after some time using the filesystem, I can umount and either device works for the mount again.
Re: Anyone tried out btrbk yet?
BTW, is anybody else experiencing btrfs-cleaner consuming heavy resources for a very long time when snapshots are removed? Note the TIME on one of these btrfs-cleaner processes.

top - 13:01:15 up 21:09,  2 users,  load average: 5.30, 4.80, 3.83
Tasks: 315 total,   3 running, 312 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us, 50.2 sy,  0.0 ni, 47.8 id,  1.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16431800 total,   177448 free,  1411876 used, 14842476 buff/cache
KiB Swap:  8257532 total,  8257316 free,      216 used. 14420732 avail Mem

  PID USER   PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4134 root   20   0       0      0      0 R 100.0  0.0   2:41.40 btrfs-cleaner
 4183 root   20   0       0      0      0 R  99.7  0.0 191:11.33 btrfs-cleaner

On Wed, Jul 15, 2015 at 9:42 AM, Donald Pearson donaldwhpear...@gmail.com wrote:
> Implementation question about your scripts Marc..
>
> I've set up some routines for different backup and retention intervals and periods in cron but quickly ran in to stepping on my own toes by the locking mechanism.
>
> I could just disable the locking but I'm not sure if that's the best approach and I don't know what it was implemented to prevent in the first place.
>
> Thoughts?
>
> Thanks,
> Donald
>
> On Wed, Jul 15, 2015 at 3:00 AM, Sander san...@humilis.net wrote:
>> [snip -- Sander's description of his backup scripts, quoted in full; see his message earlier in this thread.]
Re: BTRFS raid6 unmountable after a couple of days of usage.
On 2015-07-14 19:20, Chris Murphy wrote:
> On Tue, Jul 14, 2015 at 7:25 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
>> On 2015-07-14 07:49, Austin S Hemmelgarn wrote:
>>> So, after experiencing this same issue multiple times (on almost a dozen different kernel versions since 4.0) and ruling out the possibility of it being caused by my hardware (or at least, the RAM, SATA controller and disk drives themselves), I've decided to report it here. The general symptom is that raid6 profile filesystems that I have are working fine for multiple weeks, until I either reboot or otherwise try to remount them, at which point the system refuses to mount them. I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been seeing this with versions of both since 4.0. Output of 'btrfs fi show' for the most recent fs that I had this issue with:
>>>
>>> Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
>>>         Total devices 4 FS bytes used 9.70GiB
>>>         devid    1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0
>>>         devid    2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1
>>>         devid    3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2
>>>         devid    4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3
>>>
>>> btrfs-progs v4.1
>>>
>>> Each of the individual LVs that are in the FS is just a flat chunk of space on a separate disk from the others. The FS itself passes btrfs check just fine (no reported errors, exit value of 0), but the kernel refuses to mount it with the message 'open_ctree failed'. I've run btrfs chunk recover and attached the output from that. Here's a link to an image from 'btrfs image -c9 -w': https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0 (That link will expire in 30 days, let me know if you need access to it beyond that). The filesystems in question all see relatively light but consistent usage as targets for receiving daily incremental snapshots for on-system backups (and because I know someone will mention it, yes, I do have other backups of the data, these are just my online backups).
>>
>> Further updates, I just tried mounting the filesystem from the image above again, this time passing device= options for each device in the FS, and it seems to be working fine now. I've tried this with the other filesystems however, and they still won't mount.
>
> And it's the same message with the usual suspects: recovery, ro,recovery? How about degraded even though it's not degraded? And what about 'btrfs rescue zero-log'?

Yeah, same result for both, and zero-log didn't help (although that kind of doesn't surprise me, as it was cleanly unmounted).

> Of course it's weird that btrfs check doesn't complain, but mount does. I don't understand that, so it's good you've got an image. If either recovery or zero-log fix the problem, my understanding is this suggests hardware did something Btrfs didn't expect.

I've run into cases in the past where this happens, although not recently (last time I remember it happening was back around 3.14 I think); and, interestingly, running check --repair in those cases did fix things, although that didn't complain about any issues either.

I've managed to get the other filesystems I was having issues with mounted again with the device= options and clear_cache after running btrfs dev scan a couple of times. It seems to me (at least from what I'm seeing) that there is some metadata that isn't synchronized properly between the disks.

I've heard mention from multiple sources of similar issues happening occasionally with raid1 back around kernel 3.16-3.17, and passing a different device to mount helping with that.
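Spelled out, the workaround described above is roughly the following (the mount point is illustrative; the device names are the ones from the fi show output above):

  # btrfs dev scan
  # btrfs dev scan
  # mount -o device=/dev/mapper/vg-altroot.0,device=/dev/mapper/vg-altroot.1,\
device=/dev/mapper/vg-altroot.2,device=/dev/mapper/vg-altroot.3,clear_cache \
      /dev/mapper/vg-altroot.0 /mnt/altroot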
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Fri, Jul 10, 2015 at 09:36:45AM -0400, Austin S Hemmelgarn wrote:
> > Technically it's not really a bit. The snapshot relation is determined by the parent uuid value of a subvolume.
>
> I'm actually kind of curious, is the parent UUID actually used for anything outside of send/receive?

AFAIK no.

> > > which in turn means that certain tasks are more difficult to script robustly.
> >
> > I don't deny the interface/output is imperfect for scripting purposes, maybe we can provide filters that would satisfy your usecase.
>
> Personally, I don't really do much direct scripting of BTRFS related tasks (although that might change if I can convince my boss that we should move to BTRFS for our server systems). Most of my complaint with the current arrangement is primarily aesthetic more than anything else.

Ok understood, thanks.
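For anyone who wants to look at the parent UUID being discussed, it is visible from userspace with something like the following (flags as I remember them from btrfs-progs 4.x, so double-check the man page; the paths are placeholders):

  # btrfs subvolume show /path/to/snapshot      # prints a Parent UUID field
  # btrfs subvolume list -u -q /path/to/fs      # -u: subvolume uuid, -q: parent uuid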
I'd like a -r flag on btrfs subvolume delete
Rationale: cleaning up after containers, which may have created their own subvolumes. E.g. systemd-nspawn --boot --directory dir, where dir is a subvolume. When done with the container, deleting dir directly doesn't work, because we now also have a subvolume at dir/var/lib/machines, and obviously there may be more that the container might have created.

Cleaning this all up is a bit of pain, and btrfs subvolume delete -r dir would solve it nicely.

Cheers,

Johannes Ernst

Blog: http://upon2020.com/
Twitter: @Johannes_Ernst
GPG key: http://upon2020.com/public/pubkey.txt
Check out UBOS, the Linux distro for personal servers I work on: http://ubos.net/
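Until such a flag exists, a rough workaround might look like this (an untested sketch: it assumes paths contain no newlines, relies on every btrfs subvolume root having inode number 256, and read-only snapshots nested inside would need their ro property cleared first):

  #!/bin/sh
  # Hypothetical helper: delete a subvolume and everything nested inside it.
  dir="$1"
  # Find nested subvolume roots (inode 256 on btrfs), deepest paths first,
  # so each parent is free of child subvolumes by the time it is deleted.
  find "$dir" -mindepth 1 -inum 256 | sort -r | while read -r sub; do
      btrfs subvolume delete "$sub"
  done
  btrfs subvolume delete "$dir"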
Re: Anyone tried out btrbk yet?
On Wed, Jul 15, 2015 at 09:42:28AM -0500, Donald Pearson wrote:
> Implementation question about your scripts Marc..

Make sure you Cc me then, I could have missed that Email :)

> I've set up some routines for different backup and retention intervals and periods in cron but quickly ran in to stepping on my own toes by the locking mechanism. I could just disable the locking but I'm not sure if that's the best approach and I don't know what it was implemented to prevent in the first place.

Try --postfix servername; it'll add the destination server in the snapshot rotation and the lockfile. Otherwise, you can just trivially modify the script to take --lock as an argument, or you can even ln -s btrfs-subvolume-backup btrfs-subvolume-backupserver2 and the script will automatically use /var/run/btrfs-subvolume-backupserver2 as a lockfile.

Hope this helps.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Anyone tried out btrbk yet?
On Wed, Jul 15, 2015 at 01:02:29PM -0500, Donald Pearson wrote:
> BTW, is anybody else experiencing btrfs-cleaner consuming heavy resources for a very long time when snapshots are removed?

Yes, that's normal. It spends a long time to reclaim blocks and free them, especially if they are on a hard drive and not SSD.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: BTRFS raid6 unmountable after a couple of days of usage.
On Wed, Jul 15, 2015 at 10:15 AM, Hugo Mills h...@carfax.org.uk wrote:
> There is at least one superblock on every device, usually two, and often three. Each superblock contains the virtual address of the roots of the root tree, the chunk tree and the log tree. Those are useless without having the chunk tree, so there's also some information about the chunk tree appended to the end of each superblock to bootstrap the virtual address space lookup.

So maybe Austin can use btrfs-show-super -a on every device and see if there's anything different on some of the devices that shouldn't be different? There must be something the kernel is tripping over that the user space tools aren't, for some reason.

-- 
Chris Murphy
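Something along these lines would make any per-device differences easy to spot (device names taken from the fi show output earlier in the thread; some fields, like the embedded per-device dev_item, are expected to differ between devices, while the generation and tree root addresses should not):

  # for dev in /dev/mapper/vg-altroot.{0,1,2,3}; do
  >     btrfs-show-super -a "$dev" > "/tmp/super.${dev##*/}.txt"
  > done
  # diff /tmp/super.vg-altroot.0.txt /tmp/super.vg-altroot.1.txt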
Re: [PATCH] Revert btrfs-progs: mkfs: create only desired block groups for single device
On Wed, Jul 15, 2015 at 08:40:44AM +0800, Qu Wenruo wrote:
> BTW, for the mkfs test case, it will be delayed for a while as the following bugs are making things quite tricky.

Good, thanks, no rush at the moment. The next release will probably be in line with kernel 4.2, with the usual exception of important bugfixes.

> 1) fsck ignore chunk errors and return 0.
>    Cause is known and easy to fix, but if fixed, most of fsck test won't pass. As the following bug is causing problem.
>
> 2) btrfs-image restore bug, causing missing dev_extent for DUP chunk.
>    Investigating. That's the reason causing a lot of dev extent missing in mkfs test.

The image dumps may be intentionally incomplete, so not all reported errors are necessarily a problem. The restored filesystem should set the METADUMP bit in the superblock, so this can help.
Re: BTRFS raid6 unmountable after a couple of days of usage.
On Wed, Jul 15, 2015 at 5:07 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
> I've managed to get the other filesystems I was having issues with mounted again with the device= options and clear_cache after running btrfs dev scan a couple of times. It seems to me (at least from what I'm seeing) that there is some metadata that isn't synchronized properly between the disks.

OK, see if this logic follows without mistakes:

The fs metadata is raid6, and therefore is broken up across all drives. Since you successfully captured an image of the file system with btrfs-image, clearly the user space tool is finding a minimum of n-2 drives. If it didn't complain of missing drives, it found n drives.

And yet the kernel is not finding n drives. And even with degraded it still won't mount, therefore it's not finding n-2 drives.

By drives I mean either the physical device, or more likely whatever minimal metadata is necessary for assembling all devices into a volume. I don't know what that nugget of information is that's on each physical device, separate from the superblocks (which I think are distributed at logical addresses and therefore not on every physical drive), and whether we have any tools to extract just that and debug it.

-- 
Chris Murphy
Re: Anyone tried out btrbk yet?
Implementation question about your scripts Marc..

I've set up some routines for different backup and retention intervals and periods in cron, but quickly ran in to stepping on my own toes by the locking mechanism.

I could just disable the locking but I'm not sure if that's the best approach, and I don't know what it was implemented to prevent in the first place.

Thoughts?

Thanks,
Donald

On Wed, Jul 15, 2015 at 3:00 AM, Sander san...@humilis.net wrote:
> [snip -- Sander's description of his backup scripts, quoted in full; see his message earlier in this thread.]
Re: BTRFS raid6 unmountable after a couple of days of usage.
On Wed, Jul 15, 2015 at 09:45:17AM -0600, Chris Murphy wrote:
> On Wed, Jul 15, 2015 at 5:07 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
>> I've managed to get the other filesystems I was having issues with mounted again with the device= options and clear_cache after running btrfs dev scan a couple of times. It seems to me (at least from what I'm seeing) that there is some metadata that isn't synchronized properly between the disks.
>
> OK see if this logic follows without mistakes:
>
> The fs metadata is raid6, and therefore is broken up across all drives. Since you successfully captured an image of the file system with btrfs-image, clearly the user space tool is finding a minimum of n-2 drives. If it didn't complain of missing drives, it found n drives. And yet the kernel is not finding n drives. And even with degraded it still won't mount, therefore it's not finding n-2 drives.
>
> By drives I mean either the physical device, or more likely whatever minimal metadata is necessary for assembling all devices into a volume. I don't know what that nugget of information is that's on each physical device, separate from the superblocks (which I think are distributed at logical addresses and therefore not on every physical drive), and whether we have any tools to extract just that and debug it.

There is at least one superblock on every device, usually two, and often three. Each superblock contains the virtual address of the roots of the root tree, the chunk tree and the log tree. Those are useless without having the chunk tree, so there's also some information about the chunk tree appended to the end of each superblock to bootstrap the virtual address space lookup.

The information at the end of the superblock seems to be a list of packed (key, struct btrfs_chunk) pairs for the System chunks. The struct btrfs_chunk contains info about the chunk as a whole, and each stripe making it up. The stripe information is a devid, an offset (presumably a physical address on the device), and a UUID.

So, from btrfs dev scan the kernel has all the devid to (major, minor) mappings for devices. From one device, it reads a superblock, gets the list of (devid, offset) for the System chunks at the end of that superblock, and can then identify the location of the System chunks to read the full chunk tree. Once it's got the chunk tree, it can do virtual-to-physical lookups, and the root tree and log tree locations make sense.

I don't know whether btrfs-image works any differently from that, or if so, how it differs.

   Hugo.

-- 
Hugo Mills             | Radio is superior to television: the pictures are
hugo@... carfax.org.uk | better
http://carfax.org.uk/  |
PGP: E2AB1DE4          |