[PATCH] btrfs: Avoid NULL pointer dereference in free_extent_buffer() when read_tree_block() fails

2015-07-15 Thread Zhaolei
From: Zhao Lei zhao...@cn.fujitsu.com

When read_tree_block() fails, we can see the following dmesg:
 [  134.371389] BUG: unable to handle kernel NULL pointer dereference at 0063
 [  134.372236] IP: [813a4a51] free_extent_buffer+0x21/0x90
 [  134.372236] PGD 0
 [  134.372236] Oops:  [#1] SMP
 [  134.372236] Modules linked in:
 [  134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f046843d2455aa231747b5a07a999a9f3d_+ #115
 [  134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
 [  134.372236] task: 88003b6e1a00 ti: 880011e6 task.ti: 880011e6
 [  134.372236] RIP: 0010:[813a4a51]  [813a4a51] free_extent_buffer+0x21/0x90
 ...
 [  134.372236] Call Trace:
 [  134.372236]  [81379aa1] free_root_extent_buffers+0x91/0xb0
 [  134.372236]  [81379c3d] free_root_pointers+0x17d/0x190
 [  134.372236]  [813801b0] open_ctree+0x1ca0/0x25b0
 [  134.372236]  [8144d017] ? disk_name+0x97/0xb0
 [  134.372236]  [813558aa] btrfs_mount+0x8fa/0xab0
 ...

Reason:
 read_tree_block() was changed to return an error number on failure,
 and this value (not NULL) is set to tree_root->node, so the subsequent
 code runs into:
  free_root_pointers()
  ->free_root_extent_buffers()
  ->free_extent_buffer()
  ->atomic_read((extent_buffer *)(-E_XXX)->refs);
 and triggers the above oops.

Fix:
 Set tree_root->node to NULL on failure to keep the error-handling code
 happy.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a9aadb2..f556c37 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2842,6 +2842,7 @@ int open_ctree(struct super_block *sb,
 	    !extent_buffer_uptodate(chunk_root->node)) {
 		printk(KERN_ERR "BTRFS: failed to read chunk root on %s\n",
 		       sb->s_id);
+		chunk_root->node = NULL;
 		goto fail_tree_roots;
 	}
 	btrfs_set_root_node(&chunk_root->root_item, chunk_root->node);
@@ -2879,7 +2880,7 @@ retry_root_backup:
 	    !extent_buffer_uptodate(tree_root->node)) {
 		printk(KERN_WARNING "BTRFS: failed to read tree root on %s\n",
 		       sb->s_id);
-
+		tree_root->node = NULL;
 		goto recovery_tree_root;
 	}
 
-- 
1.8.5.1



Re: I'd like a -r flag on btrfs subvolume delete

2015-07-15 Thread Paul Harvey
On 16 July 2015 at 11:35, Chris Murphy li...@colorremedies.com wrote:
 On Wed, Jul 15, 2015 at 6:11 PM, Johannes Ernst
 johannes.er...@gmail.com wrote:

 Cleaning this all up is a bit of pain, and
 btrfs subvolume delete -r dir
 would solve it nicely.

[snip]

 How is all of this backed up properly? How is it restored properly? I
 think recursive snapshotting and subvolume deletion is not a good
 idea. I think it's a complicated and inelegant workaround for
 improper subvolume organization.

I for one would love to see authoritative documentation on proper
subvolume organization. I was completely lost when writing snazzer and
have so far received very little guidance or even offers of opinions
on this ML.

I've had to create my own logic in my scripts that automatically walk
all subvolumes on all filesystems for the simple reason that
explicitly enumerating it all for dozens of servers becomes a
significant administration burden.
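
For what it's worth, the core of that walk is not much more than this (a
sketch, not the actual snazzer code; note that 'btrfs subvolume list'
prints paths relative to the top-level subvolume):

findmnt -t btrfs -n -o TARGET | sort -u | while read -r mnt; do
    echo "== $mnt =="
    btrfs subvolume list "$mnt"
done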

I have different retention needs for /var (particularly /var/cache)
than I do for /home, for example, so carving up my snapshots so that I
can easily drop them from those parts of my filesystems which have a
high churn rate (= more unique extents, occupying a lot of disk) and
yet aren't as important (I need to retain fewer of them) is very
useful.


Re: I'd like a -r flag on btrfs subvolume delete

2015-07-15 Thread Chris Murphy
On Wed, Jul 15, 2015 at 9:12 PM, Paul Harvey csir...@gmail.com wrote:
 On 16 July 2015 at 11:35, Chris Murphy li...@colorremedies.com wrote:

 How is all of this backed up properly? How is it restored properly? I
 think recursive snapshotting and subvolume deletion is not a good
 idea. I think it's a complicated and inelegant workaround for
 improper subvolume organization.

 I for one would love to see authoritative documentation on proper
 subvolume organization.

The choice of "improper" wasn't ideal on my part. There's nothing
directly wrong with nested subvolumes. But if you then combine them
with snapshots and rollbacks, there are consequences that include more
complication. If more than one thing is doing snapshots and
rollbacks, it requires some rules as to who can snapshot what and
where those things go, in order to avoid being snapshotted again by some
other tool, and then how things get reassembled. There are different
kinds of rollbacks, so that needs some rules too or it'll just lead to
confusion.


 I was completely lost when writing snazzer and
 have so far received very little guidance or even offers of opinions
 on this ML.

A couple of developers have pointed out the folly of nested subvolumes
several times. Discovering the consequences of how subvolumes are
organized is a work in progress. I've mentioned a couple of times over
the years that distros are inevitably going to end up with fragmented and
mutually incompatible approaches if they don't actively engage each
other cooperatively. And that's turned out to be correct, as Fedora,
Ubuntu and SUSE all do things differently with their Btrfs
organization.



 I've had to create my own logic in my scripts that automatically walk
 all subvolumes on all filesystems for the simple reason that
 explicitly enumerating it all for dozens of servers becomes a
 significant administration burden.

 I have different retention needs for /var (particularly /var/cache)
 than I do for /home, for example, so carving up my snapshots so that I
 can easily drop them from those parts of my filesystems which have a
 high churn rate (= more unique extents, occupying a lot of disk) and
 yet aren't as important (I need to retain fewer of them) is very
 useful.

At the moment, I like the idea of subvolumes pretty much only at the
top level of the file system (subvolid 5), and I like the naming
convention suggested here:
http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html
under the section What We Propose

I don't really like the colons because those are special characters, so
now I have to type three characters for each one of those. But anyway,
those then get assembled in FHS form via fstab using the subvol= or
subvolid= mount option, or whatever replaces fstab eventually.

This way you can snapshot different subvolumes at different rates with
different cleanup policies while keeping all of them out of the
normally mounted FHS path. A side plus is that this also puts old
libraries outside the FHS path, sort of like they're in a jail.
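
Concretely, the fstab side of that could look something like this (device
and subvolume names are made up):

/dev/sda2  /      btrfs  defaults,subvol=root  0 0
/dev/sda2  /home  btrfs  defaults,subvol=home  0 0
/dev/sda2  /var   btrfs  defaults,subvol=var   0 0

With everything at the top level, any subvolume not named in a subvol=
line (snapshots, old trees) stays out of the mounted FHS path and is only
reachable from a separate subvolid=5 mount.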



-- 
Chris Murphy


Re: I'd like a -r flag on btrfs subvolume delete

2015-07-15 Thread Chris Murphy
On Wed, Jul 15, 2015 at 6:11 PM, Johannes Ernst
johannes.er...@gmail.com wrote:

 Cleaning this all up is a bit of pain, and
 btrfs subvolume delete -r dir
 would solve it nicely.

It's come up before:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg42455.html
http://lists.freedesktop.org/archives/systemd-devel/2015-April/030297.html

I'm concerned about the interaction of machinectl snapshots of its own
subvolumes, and rpm-ostree, and snapper snapshots.

The only really convincing argument for nested subvolumes I've read is
as an explicit break from being included in snapshots taken higher up
in the hierarchy.

So the /var/lib/machines organization burdens other projects or users
with the problem of what happens when / or /var is snapshotted and then
rolled back, and how to properly reassemble a system whose
/var/lib/machines subvolumes are now in a different tree. How does all
of this get located and assembled properly at boot time?

How is all of this backed up properly? How is it restored properly? I
think recursive snapshotting and subvolume deletion is not a good
idea. I think it's a complicated and inelegant workaround for
improper subvolume organization.


-- 
Chris Murphy


Re: Anyone tried out btrbk yet?

2015-07-15 Thread Sander
Marc MERLIN wrote (ao):
 On Wed, Jul 15, 2015 at 10:03:16AM +1000, Paul Harvey wrote:
  The way it works in snazzer (and btrbk and I think also btrfs-sxbackup
  as well), local snapshots continue to happen as normal (Eg. daily or
  hourly) and so when your backup media or backup server is finally
  available again, the size of each individual incremental is still the
  same as usual, it just has to perform more of them.
  
 Good point. My system is not as smart. Every night, it'll make a new
 backup and only send one incremental and hope it gets there. It doesn't
 make a bunch of incrementals and send multiple.
 
 The other options do a better job here.

FWIW, I've written a bunch of scripts for making backups. The lot has
grown over the past years to what it is now. Not very pretty to look at,
but reliable.

The subvolumes backupadmin, home, root, rootvolume and var are snapshotted
every hour.

Each subvolume has its own entry in crontab for the actual backup.
For example, rootvolume once a day, home and backupadmin every hour.
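
Schematically, those entries look something like this (the script name and
times here are invented):

# m h dom mon dow  command
0 3 * * *  /usr/local/sbin/backup-subvol rootvolume
0 * * * *  /usr/local/sbin/backup-subvol home
0 * * * *  /usr/local/sbin/backup-subvol backupadmin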

The scripts use tar to make a full backup on the first backup of a
subvolume each month, an incremental daily backup, and an incremental
hourly backup if applicable.

For a full backup the oldest available snapshot for that month is used,
regardless of when the backup is started. This way the backups of the
subvolumes can be spread out so as not to overload the system.

Backups run in the idle queue so they do not hinder other processes, are
compressed with lbzip2 to utilize all cores, and are encrypted with gpg
for obvious reasons. In my tests lbzip2 gives the best size/speed ratio
compared to lzop, xz, bzip2, gzip, pxz and lz4(hc).
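
Schematically, one such run boils down to a pipeline like this (the paths,
snapshot name and gpg recipient are placeholders, not the actual scripts):

ionice -c3 nice tar --create -f - \
    --listed-incremental=/backupadmin/home.snar \
    -C /snapshots/home.20150715 . \
  | lbzip2 \
  | gpg --encrypt --recipient backup@example.org \
  > /backupadmin/home.20150715.tar.bz2.gpg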

The script writes a listing of the files and directories in each backup to
the backupadmin subvolume. This listing is compressed with lz4hc, as lz4hc
is the fastest to decompress (useful for determining which archive contains
what you want restored).

Archives get transferred to a remote server by ftp, as ftp is the leanest
way of transferring files and supports resume. The initial connection is
encrypted to hide the username/password, but as the archive is already
encrypted, the data channel is not. The ftp transfer is throttled to
use only part of the available bandwidth.

A daily script checks for archives which have not been transferred yet,
because the remote server was unavailable or the connection failed or the
like, and retransmits those archives.

Snapshots and archives are pruned based on disk usage (yet another
script).

Restores can be done by hand from snapshots (obviously), or by a script
from the local archive if still available, or from the remote archive.

The restore script can search a specific date-time range, and checks
both local and remote storage for the availability of an archive that
contains what is wanted.

A bare metal restore can be done by fetching the archives from the
remote host and piping them directly into gpg/tar. No need for additional
local storage, and no delay. First the monthly full backup is restored,
then every daily incremental since, and then every hourly incremental since
the youngest daily, if applicable. tar's incremental restore is smart and
removes the files and directories that were removed between backups.

Sander


Re: BTRFS raid6 unmountable after a couple of days of usage.

2015-07-15 Thread Ryan Bourne


On 14/07/15 11:25 PM, Austin S Hemmelgarn wrote:

On 2015-07-14 07:49, Austin S Hemmelgarn wrote:

So, after experiencing this same issue multiple times (on almost a
dozen different kernel versions since 4.0) and ruling out the
possibility of it being caused by my hardware (or at least, the RAM,
SATA controller and disk drives themselves), I've decided to report it
here.

The general symptom is that raid6 profile filesystems that I have are
working fine for multiple weeks, until I either reboot or otherwise
try to remount them, at which point the system refuses to mount them.





Further updates, I just tried mounting the filesystem from the image
above again, this time passing device= options for each device in the
FS, and it seems to be working fine now.  I've tried this with the other
filesystems however, and they still won't mount.



I have experienced a similar problem on a raid1 with kernels from 3.17
onward following a kernel panic.

I have found that passing the other device as the main device to mount
will often work.
E.g.

# mount -o device=/dev/sdb,device=/dev/sdc /dev/sdb /mountpoint
open_ctree failed

# mount -o device=/dev/sdb,device=/dev/sdc /dev/sdc /mountpoint
mounts correctly.

If I then do an immediate umount and try again I get the same thing, but
after some time using the filesystem, I can umount and either device
works for the mount again.






Re: Anyone tried out btrbk yet?

2015-07-15 Thread Donald Pearson
BTW, is anybody else experiencing btrfs-cleaner consuming heavy
resources for a very long time when snapshots are removed?

Note the TIME on one of these btrfs-cleaner processes.

top - 13:01:15 up 21:09,  2 users,  load average: 5.30, 4.80, 3.83
Tasks: 315 total,   3 running, 312 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us, 50.2 sy,  0.0 ni, 47.8 id,  1.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16431800 total,   177448 free,  1411876 used, 14842476 buff/cache
KiB Swap:  8257532 total,  8257316 free,  216 used. 14420732 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4134 root      20   0       0      0      0 R 100.0  0.0   2:41.40 btrfs-cleaner
 4183 root      20   0       0      0      0 R  99.7  0.0 191:11.33 btrfs-cleaner

On Wed, Jul 15, 2015 at 9:42 AM, Donald Pearson
donaldwhpear...@gmail.com wrote:
 Implementation question about your scripts Marc..

 I've set up some routines for different backup and retention intervals
 and periods in cron but quickly ran in to stepping on my own toes by
 the locking mechanism.  I could just disable the locking but I'm not
 sure if that's the best approach and I don't know what it was
 implemented to prevent in the first place.

 Thoughts?

 Thanks,
 Donald

 On Wed, Jul 15, 2015 at 3:00 AM, Sander san...@humilis.net wrote:
 [snip]


Re: BTRFS raid6 unmountable after a couple of days of usage.

2015-07-15 Thread Austin S Hemmelgarn

On 2015-07-14 19:20, Chris Murphy wrote:

On Tue, Jul 14, 2015 at 7:25 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:

On 2015-07-14 07:49, Austin S Hemmelgarn wrote:


So, after experiencing this same issue multiple times (on almost a dozen
different kernel versions since 4.0) and ruling out the possibility of it
being caused by my hardware (or at least, the RAM, SATA controller and disk
drives themselves), I've decided to report it here.

The general symptom is that raid6 profile filesystems that I have are
working fine for multiple weeks, until I either reboot or otherwise try to
remount them, at which point the system refuses to mount them.

I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been
seeing this with versions of both since 4.0.

Output of 'btrfs fi show' for the most recent fs that I had this issue
with:
  Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
          Total devices 4 FS bytes used 9.70GiB
          devid    1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0
          devid    2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1
          devid    3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2
          devid    4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3

  btrfs-progs v4.1

Each of the individual LVS that are in the FS is just a flat chunk of
space on a separate disk from the others.

The FS itself passes btrfs check just fine (no reported errors, exit value
of 0), but the kernel refuses to mount it with the message 'open_ctree
failed'.

I've run btrfs chunk recover and attached the output from that.

Here's a link to an image from 'btrfs image -c9 -w':
https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0
(That link will expire in 30 days, let me know if you need access to it
beyond that).

The filesystems in question all see relatively light but consistent usage
as targets for receiving daily incremental snapshots for on-system backups
(and because I know someone will mention it, yes, I do have other backups of
the data, these are just my online backups).


Further updates, I just tried mounting the filesystem from the image above
again, this time passing device= options for each device in the FS, and it
seems to be working fine now.  I've tried this with the other filesystems
however, and they still won't mount.



And it's the same message with the usual suspects: recovery, ro,recovery?
How about degraded even though it's not degraded? And what about
'btrfs rescue zero-log'?
Yeah, same result for both, and zero-log didn't help (although that kind 
of doesn't surprise me, as it was cleanly unmounted).


Of course it's weird that btrfs check doesn't complain, but mount
does. I don't understand that, so it's good you've got an image. If
either recovery or zero-log fix the problem, my understanding is this
suggests hardware did something Btrfs didn't expect.
I've run into cases in the past where this happens, although not 
recently (last time I remember it happening was back around 3.14 I 
think); and, interestingly, running check --repair in those cases did 
fix things, although that didn't complain about any issues either.


I've managed to get the other filesystems I was having issues with 
mounted again with the device= options and clear_cache after running 
btrfs dev scan a couple of times.  It seems to me (at least from what 
I'm seeing) that there is some metadata that isn't synchronized properly 
between the disks.  I've heard mention from multiple sources of similar 
issues happening occasionally with raid1 back around kernel 3.16-3.17, 
and passing a different device to mount helping with that.
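
In concrete terms that recipe amounts to something like this (the mount
point is arbitrary; devices as in the 'btrfs fi show' output above):

btrfs device scan
mount -o clear_cache,device=/dev/mapper/vg-altroot.0,device=/dev/mapper/vg-altroot.1,\
device=/dev/mapper/vg-altroot.2,device=/dev/mapper/vg-altroot.3 \
    /dev/mapper/vg-altroot.0 /mnt/altroot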







Re: btrfs subvolume clone or fork (btrfs-progs feature request)

2015-07-15 Thread David Sterba
On Fri, Jul 10, 2015 at 09:36:45AM -0400, Austin S Hemmelgarn wrote:
  Technically it's not really a bit. The snapshot relation is determined
  by the parent uuid value of a subvolume.
 I'm actually kind of curious, is the parent UUID actually used for 
 anything outside of send/receive?

AFAIK no.

  which in turn means that certain
  tasks are more difficult to script robustly.
 
  I don't deny the interface/output is imperfect for scripting purposes,
  maybe we can provide filters that would satisfy your usecase.
 
 Personally, I don't really do much direct scripting of BTRFS related 
 tasks (although that might change if I can convince my boss that we 
 should move to BTRFS for our server systems).  Most of my complaint with 
 the current arrangement is primarily aesthetic more than anything else.

Ok understood, thanks.


I'd like a -r flag on btrfs subvolume delete

2015-07-15 Thread Johannes Ernst
Rationale: cleaning up after containers, which may have created their own 
subvolumes.

E.g.
systemd-nspawn --boot --directory dir

where dir is a subvolume. When done with the container, deleting dir directly
doesn't work, because we now also have a subvolume at
dir/var/lib/machines
and obviously there may be more that the container might have created.

Cleaning this all up is a bit of pain, and
btrfs subvolume delete -r dir
would solve it nicely.
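
Until such a flag exists, the manual equivalent is roughly this (just a
sketch; it assumes the filesystem's top level is mounted at $top, since
btrfs subvolume list prints paths relative to it, and that the paths
contain no spaces):

top=/mnt/btrfs-top          # top-level (subvolid=5) mount
dir=machines/mycontainer    # subvolume to delete, relative to $top
# delete the deepest nested subvolumes first, then the subvolume itself
btrfs subvolume list "$top" | awk '{print $NF}' | grep "^$dir/" | sort -r |
while read -r sub; do
    btrfs subvolume delete "$top/$sub"
done
btrfs subvolume delete "$top/$dir"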

Cheers,



Johannes Ernst
Blog: http://upon2020.com/
Twitter: @Johannes_Ernst
GPG key: http://upon2020.com/public/pubkey.txt
Check out UBOS, the Linux distro for personal servers I work on: 
http://ubos.net/



Re: Anyone tried out btrbk yet?

2015-07-15 Thread Marc MERLIN
On Wed, Jul 15, 2015 at 09:42:28AM -0500, Donald Pearson wrote:
 Implementation question about your scripts Marc..

make sure you Cc me then, I could have missed that Email :)

 I've set up some routines for different backup and retention intervals
 and periods in cron but quickly ran in to stepping on my own toes by
 the locking mechanism.  I could just disable the locking but I'm not
 sure if that's the best approach and I don't know what it was
 implemented to prevent in the first place.

Try --postfix servername
it'll add the destination server to the snapshot rotation and the
lockfile.

Otherwise, you can just trivially modify the script to take --lock as an
argument, or you can even
ln -s btrfs-subvolume-backup btrfs-subvolume-backupserver2
and the script will automatically create /var/run/btrfs-subvolume-backupserver2
as its lockfile.

Hope this helps.
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: Anyone tried out btrbk yet?

2015-07-15 Thread Marc MERLIN
On Wed, Jul 15, 2015 at 01:02:29PM -0500, Donald Pearson wrote:
 BTW, is anybody else experiencing btrfs-cleaner consuming heavy
 resources for a very long time when snapshots are removed?
 
Yes, that's normal. It spends a long time reclaiming blocks and freeing
them, especially if they are on a hard drive and not an SSD.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: BTRFS raid6 unmountable after a couple of days of usage.

2015-07-15 Thread Chris Murphy
On Wed, Jul 15, 2015 at 10:15 AM, Hugo Mills h...@carfax.org.uk wrote:

There is at least one superblock on every device, usually two, and
 often three. Each superblock contains the virtual address of the roots
 of the root tree, the chunk tree and the log tree. Those are useless
 without having the chunk tree, so there's also some information about
 the chunk tree appended to the end of each superblock to bootstrap the
 virtual address space lookup.

So maybe Austin can use btrfs-show-super -a on every device and see if
there's anything different on some of the devices that shouldn't be
different? There must be something the kernel is tripping over that
the user space tools aren't, for some reason.
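
Something along these lines, perhaps (device names taken from the btrfs fi
show output earlier in the thread; the per-device dev_item fields will of
course legitimately differ):

for dev in /dev/mapper/vg-altroot.{0,1,2,3}; do
    btrfs-show-super -a "$dev" > "/tmp/super.${dev##*/}.txt"
done
diff /tmp/super.vg-altroot.0.txt /tmp/super.vg-altroot.1.txt  # and so on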




-- 
Chris Murphy


Re: [PATCH] Revert btrfs-progs: mkfs: create only desired block groups for single device

2015-07-15 Thread David Sterba
On Wed, Jul 15, 2015 at 08:40:44AM +0800, Qu Wenruo wrote:
 BTW, for the mkfs test case, it will be delayed for a while as the 
 following bugs are making things quite tricky.

Good, thanks, no rush at the moment. The next release will be probably
in line with kernel 4.2 with the usual exception of important bugfixes.

 1) fsck ignores chunk errors and returns 0.
 The cause is known and easy to fix, but if fixed, most of the fsck tests
 won't pass, as the following bug is causing problems.
 
 2) A btrfs-image restore bug, causing missing dev_extents for DUP chunks.
 Investigating; that's the reason for a lot of missing dev extents in the
 mkfs tests.

The image dumps may be intentionally incomplete so not all reported
errors are necessarily a problem. The restored filesystem should set the
METADUMP bit in the superblock so this can help.


Re: BTRFS raid6 unmountable after a couple of days of usage.

2015-07-15 Thread Chris Murphy
On Wed, Jul 15, 2015 at 5:07 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 I've managed to get the other filesystems I was having issues with mounted
 again with the device= options and clear_cache after running btrfs dev scan
 a couple of times.  It seems to me (at least from what I'm seeing) that
 there is some metadata that isn't synchronized properly between the disks.

OK see if this logic follows without mistakes:

The fs metadata is raid6, and therefore is broken up across all
drives. Since you successfully captured an image of the file system
with btrfs-image, clearly user space tool is finding a minimum of n-2
drives. If it didn't complain of missing drives, it found n drives.

And yet the kernel is not finding n drives. And even with degraded it
still won't mount, therefore it's not finding n-2 drives.

By drives I mean either the physical device, or more likely whatever
minimal metadata is necessary for assembling all devices into a
volume. I don't know what that nugget of information is that's on each
physical device, separate from the superblocks (which I think are
distributed at logical addresses and therefore not on every physical
drive), and if we have any tools to extract just that and debug it.



-- 
Chris Murphy


Re: Anyone tried out btrbk yet?

2015-07-15 Thread Donald Pearson
Implementation question about your scripts Marc..

I've set up some routines for different backup and retention intervals
and periods in cron but quickly ran in to stepping on my own toes by
the locking mechanism.  I could just disable the locking but I'm not
sure if that's the best approach and I don't know what it was
implemented to prevent in the first place.

Thoughts?

Thanks,
Donald

On Wed, Jul 15, 2015 at 3:00 AM, Sander san...@humilis.net wrote:
 [snip]


Re: BTRFS raid6 unmountable after a couple of days of usage.

2015-07-15 Thread Hugo Mills
On Wed, Jul 15, 2015 at 09:45:17AM -0600, Chris Murphy wrote:
 On Wed, Jul 15, 2015 at 5:07 AM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
  I've managed to get the other filesystems I was having issues with mounted
  again with the device= options and clear_cache after running btrfs dev scan
  a couple of times.  It seems to me (at least from what I'm seeing) that
  there is some metadata that isn't synchronized properly between the disks.
 
 OK see if this logic follows without mistakes:
 
 The fs metadata is raid6, and therefore is broken up across all
 drives. Since you successfully captured an image of the file system
 with btrfs-image, clearly user space tool is finding a minimum of n-2
 drives. If it didn't complain of missing drives, it found n drives.
 
 And yet the kernel is not finding n drives. And even with degraded it
 still won't mount, therefore it's not finding n-2 drives.
 
 By drives I mean either the physical device, or more likely whatever
 minimal metadata is necessary for assembling all devices into a
 volume. I don't know what that nugget of information is that's on each
 physical device, separate from the superblocks (which I think is
 distributed at logical addresses and therefore not on every physical
 drive), and if we have any tools to extract just that and debug it.

   There is at least one superblock on every device, usually two, and
often three. Each superblock contains the virtual address of the roots
of the root tree, the chunk tree and the log tree. Those are useless
without having the chunk tree, so there's also some information about
the chunk tree appended to the end of each superblock to bootstrap the
virtual address space lookup.

   The information at the end of the superblock seems to be a list of
packed (key, struct btrfs_chunk) pairs for the System chunks. The
struct btrfs_chunk contains info about the chunk as a whole, and each
stripe making it up. The stripe information is a devid, an offset
(presumably in physical address on the device), and a UUID.

   So, from btrfs dev scan the kernel has all the devid to (major,
minor) mappings for devices. From one device, it reads a superblock,
gets the list of (devid, offset) for the System chunks at the end of
that superblock, and can then identify the location of the System
chunks to read the full chunk tree. Once it's got the chunk tree, it
can do virtual-physical lookups, and the root tree and log tree
locations make sense.
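
   To poke at that bootstrap data from user space, btrfs-show-super should
be able to dump it directly (assuming a btrfs-progs of this vintage, where
-f asks for the full superblock including the embedded system chunk array):

btrfs-show-super -f /dev/sdb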

   I don't know whether btrfs-image works any differently from that,
or if so, how it differs.

   Hugo.

-- 
Hugo Mills | Radio is superior to television: the pictures are
hugo@... carfax.org.uk | better
http://carfax.org.uk/  |
PGP: E2AB1DE4  |

