Re: state of btrfs snapshot limitations?

2018-09-19 Thread James A. Robinson
On Wed, Sep 19, 2018 at 4:04 PM Pete  wrote:
> snapshots.  You need to delete it out of them as well which defeats the
> idea of read only snapshots if you are using them.

I wouldn't say it defeats the idea of read-only snapshots. If you want
to be able to "go back in time" and see what changed, you have to
pay the price, right?  At least until someone figures out a quantum
computer that can generate all possible states of data at once!

> I've since made /tmp a subvolume to prevent snap-shotting to partially
> mitigate this.  I'm wondering if I should make /lib/modules one for the
> same reason.  In previous posts on this mailing list people have

Yes, in my scheme I'm excluding things like tmp and the trash folder
when I populate the actual backup directory.

I decided on an 'opt-in' scheme where I have to explicitly select
the folders to send over to the backup server, rather than
attempt to track everything on disk.  Of course I am in danger
of missing an important directory and not realizing it until it's too
late.

> Now I've seen it I can't un-see it!

I think we should just redefine end-of-line to 0x00 to let him have
his 0x0a! :)

So with my scheme, similar to yours, I've got a bit of overlap in
that I take a snapshot under hourly as well as a daily snapshot,
but my hourly will rotate off at the end of the 24 hour period.

Here's an example list of the snapshots accumulated so far; you can
see that the positions change as time proceeds and older snapshots
are replaced with newer ones for the hour and minute buckets.

$ sudo btrfs subvolume list /snapshot/
ID 259 gen 2271 top level 5 path c
ID 1024 gen 1064 top level 5 path d/2018/0915
ID 1106 gen 1216 top level 5 path d/2018/0916
ID 1227 gen 1435 top level 5 path d/2018/0917
ID 1348 gen 1681 top level 5 path d/2018/0918
ID 1566 gen 2017 top level 5 path h/1700
ID 1571 gen 2026 top level 5 path h/1800
ID 1576 gen 2035 top level 5 path h/1900
ID 1581 gen 2044 top level 5 path h/2000
ID 1586 gen 2053 top level 5 path h/2100
ID 1591 gen 2062 top level 5 path h/2200
ID 1596 gen 2071 top level 5 path h/2300
ID 1602 gen 2081 top level 5 path h/0000
ID 1607 gen 2090 top level 5 path h/0100
ID 1612 gen 2099 top level 5 path h/0200
ID 1617 gen 2108 top level 5 path h/0300
ID 1622 gen 2119 top level 5 path h/0400
ID 1627 gen 2133 top level 5 path h/0500
ID 1632 gen 2148 top level 5 path h/0600
ID 1637 gen 2159 top level 5 path h/0700
ID 1642 gen 2169 top level 5 path h/0800
ID 1648 gen 2182 top level 5 path h/0900
ID 1654 gen 2193 top level 5 path h/1000
ID 1660 gen 2203 top level 5 path h/1100
ID 1666 gen 2213 top level 5 path h/1200
ID 1672 gen 2224 top level 5 path h/1300
ID 1678 gen 2239 top level 5 path h/1400
ID 1684 gen 2253 top level 5 path h/1500
ID 1687 gen 2260 top level 5 path m/30
ID 1688 gen 2263 top level 5 path m/45
ID 1689 gen 2266 top level 5 path d/2018/0919
ID 1690 gen 2267 top level 5 path h/1600
ID 1691 gen 2268 top level 5 path m/00
ID 1692 gen 2271 top level 5 path m/15

The script as it currently stands:

#!/bin/bash
#
# snapshots - sync and snapshot at intervals
#
# this script is assumed to be run on a 15 minute cycle,
# at 00, 15, 30, and 45 minutes after the hour.  It will
# sync ${source} to ${volume}/c/ and then take
# snapshots:
#
# ${volume}/d/yyyy/mmdd
# ${volume}/h/hhmm
# ${volume}/m/mm
#
# The daily snapshot is taken when run at 00:00
# The hourly snapshot is taken when run at *:00 minutes.
# The minute snapshot is taken every time it is run.
#
# So the directory structure created under ${volume} will be:
#
#    c: the most recently synced data from /backup
#
#    d: a daily snapshot with the naming scheme yyyy/mmdd
#
#    h: an hourly snapshot with the naming scheme hhmm, note
#    that it is on a 24-hour cycle, so if it is currently 13:30 then
#    snapshots 0000 through 1300 are from today, and snapshots
#    1400 through 2300 are from yesterday.
#
#    m: a minute snapshot (00, 15, 30, 45), also on a cycle, meaning
#    if it is 14:20 then 00 and 15 are from 14:00 and 14:15, and 30 and
#    45 are from 13:30 and 13:45.
#
umask 0077

# fully qualified path to backup
source="/backup/";

# btrfs volume where snapshots are managed
volume=/snapshot

# local lock dir / pid file
lockdir="/var/tmp/snapshots.lock";
pid="${lockdir}/pid";

# compute current year, month, day of the month, hour, and minute
t=($(/bin/date +"%Y %m %d %H %M"));
year=${t[0]};
month=${t[1]};
day=${t[2]};
hour=${t[3]};
min=${t[4]};

function unlock {
    rm -rf "${lockdir}"
}

# is another instance already running?
mkdir "${lockdir}" 2>/dev/null
if [ "$?" != "0" ]; then
PID=$(/bin/cat "$pid");
if /bin/kill -0 "$PID" >/dev/null 2>&1; then
exit;
fi
else
trap unlock QUIT TERM EXIT INT
echo "$$" > "${pid}";
fi

# if volume is not mounted, terminate
if ! /bin/mount | /bin/grep -q "${volume}"; then
/bin/echo "$0: snapshot aborted, ${volume} is not mounted";
exit;
fi

# update 'c' subvolume
if [ ! -d "${volume}/c" ]; then

Re: state of btrfs snapshot limitations?

2018-09-19 Thread Pete
On 09/19/2018 03:41 PM, Piotr Pawłow wrote:
> Hello,
>> If the limit is 100 or less I'd need to use a more complicated
>> rotation scheme.
> 
> If you just want to thin them out over time without having selected "special" 
> monthly, yearly etc snapshots, then my favorite scheme is to just compare the 
> age of a snapshot to the distance to its neighbours, and if the distance is 
> less than age / constant then delete it. If the constant is, for example, 12, 
> then it will start thinning out hourly snapshots after around 12 hours, 
> monthly after 12 months etc.
> 
> This is how it looks after 2 years with daily snapshots and the constant=6:
> 
> backup-20160328143825
> backup-20161210043001

User not dev here, but I thought I'd share my experience.  That scheme
looks really interesting.  Though achieving it sounded like it might be
a little complex, a single piece of script covers it.

My approach is to snapshot on three timeframes: daily, weekly and
monthly.  I store approximately 30 days' worth of daily snapshots, a
year's worth of weekly snapshots and a year's worth of monthly snapshots.
On reflection, however, if I retain a year's worth of weekly snapshots
then the monthly snapshots are redundant.  Perhaps a little adjustment
is in order.

However, there are pitfalls, which I ironically hit today.  This is not
a btrfs issue, but a simple consequence of snapshotting a system with a
reasonable amount of change: the volume of data stored grows owing to
the many changed files, and hence free space shrinks even if the 'master'
subvolumes are kept tidy.  It does not matter how well you tidy the data
in the 'master' if the old, redundant data is still hiding in the
snapshots.  You need to delete it out of them as well, which defeats the
idea of read-only snapshots if you are using them.  For example, today I
deleted redundant kernel modules from both the root subvolume and the
snapshots, and similarly for /tmp; only then did I free up 55 GB (!) to
give myself some free space.  (I have been frequently updating my kernel
and left some debugging options on, resulting in many copies of far
larger kernel modules than intended.)

I've since made /tmp a subvolume, so it is excluded from snapshots, to
partially mitigate this.  I'm wondering if I should make /lib/modules one
for the same reason.  In previous posts on this mailing list people have
recommended making various cache and tmp directories separate subvolumes
to reduce the loss of available disk space from snapshotting churning
files that have little retention value.  I'm wondering whether some guide
to snapshotting is warranted, to make people aware of the management
actions that might be required?


> I have a horrid perl "one-liner" to do the thinning (caution! it deletes 
> subvolumes without asking!):
> 
> perl -e 'for(@ARGV){open($in,"-|",qw(btrfs subvolume show),$_);$ts{$_}=(map{/: \t+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \+\d{4})$/ or die "error parsing: $_\n";0+`date --date "$1" +%s` or die $!}grep{/Creation time:/}<$in>)[0]}@s=sort{$ts{$b}<=>$ts{$a}}keys%ts;while(@s>2){($s1,$s2,$s3)=@s;if(($ts{$s1}-$ts{$s3})/2<(time-$ts{$s2})/12){system(qw(btrfs subvolume delete),$s2);$s[1]=$s1};shift@s}' [snapshot ...]
> 
> (hey, everything can be a one-liner if you allow unlimited line length!)
> 

Now I've seen it I can't un-see it!





state of btrfs snapshot limitations?

2018-09-19 Thread Piotr Pawłow
Hello,
> If the limit is 100 or less I'd need to use a more complicated
> rotation scheme.

If you just want to thin them out over time without having selected "special" 
monthly, yearly etc snapshots, then my favorite scheme is to just compare the 
age of a snapshot to the distance to its neighbours, and if the distance is 
less than age / constant then delete it. If the constant is, for example, 12, 
then it will start thinning out hourly snapshots after around 12 hours, monthly 
after 12 months etc.

This is how it looks after 2 years with daily snapshots and the constant=6:

backup-20160328143825
backup-20161210043001
backup-20170424033001
backup-20170830033001
backup-20171102043001
backup-20180105043001
backup-20180310043001
backup-20180411033001
backup-20180513033001
backup-20180614033001
backup-20180630033001
backup-20180716033001
backup-20180801033001
backup-20180809033001
backup-20180817033001
backup-20180825033001
backup-20180829033001
backup-20180902033001
backup-20180906033001
backup-20180908033001
backup-20180910033001
backup-20180912033001
backup-20180914033001
backup-20180915033001
backup-20180916033001
backup-20180917033001
backup-20180918033001
backup-20180919033001

Notice how I have 6 daily snapshots (from 09-14 to 09-19), then I have at least 
1 snapshot from each month 6 months back (04 to 09) and I would have at least 1 
snapshot from each year for 6 years if I kept them longer. I delete the oldest 
snapshot when free space gets too low.

I have a horrid perl "one-liner" to do the thinning (caution! it deletes 
subvolumes without asking!):

perl -e 'for(@ARGV){open($in,"-|",qw(btrfs subvolume show),$_);$ts{$_}=(map{/: \t+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \+\d{4})$/ or die "error parsing: $_\n";0+`date --date "$1" +%s` or die $!}grep{/Creation time:/}<$in>)[0]}@s=sort{$ts{$b}<=>$ts{$a}}keys%ts;while(@s>2){($s1,$s2,$s3)=@s;if(($ts{$s1}-$ts{$s3})/2<(time-$ts{$s2})/12){system(qw(btrfs subvolume delete),$s2);$s[1]=$s1};shift@s}' [snapshot ...]

(hey, everything can be a one-liner if you allow unlimited line length!)

I will take this opportunity to tidy it up a bit (below). Maybe someone else 
will find it useful or have some ideas for improvements. I would really like to 
avoid parsing "btrfs subvolume show" output (maybe python-btrfs can read 
subvolume creation time?)

#!/usr/bin/perl
use strict;
use warnings;

# map snapshot names to timestamps
my %ts;
for (@ARGV) {
    # run "btrfs subvolume show" for each snapshot
    open( my $in, "-|", qw(btrfs subvolume show), $_ );
    # convert "Creation time" from btrfs output to timestamp
    $ts{$_} = (
        map {
            /: \t+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \+\d{4})$/
              or die "error parsing: $_\n";
            # using "date" command to parse "Creation time"
            # who needs modules from CPAN right? ;)
            0 + `date --date "$1" +%s` or die $!
        } grep { /Creation time:/ } <$in>
    )[0];
}
# sort snapshot names by timestamps
my @s = sort { $ts{$b} <=> $ts{$a} } keys %ts;
while ( @s > 2 ) {
    my ( $s1, $s2, $s3 ) = @s;
    # compare average distance to age / 12
    # change 12 to some other value to keep more or less snapshots
    # higher value = more snapshots
    if ( ( $ts{$s1} - $ts{$s3} ) / 2 < ( time - $ts{$s2} ) / 12 ) {
        # caution! this runs "btrfs subvolume delete"
        # you can put "echo" before "btrfs" for a "dry run"
        system( qw(btrfs subvolume delete), $s2 );
        # we deleted $s2 snapshot, so replace $s2 with $s1
        $s[1] = $s1;
    }
    shift @s;
}
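
Assuming the script above is saved as, say, thin-snapshots.pl (the name
and paths are just for illustration), running it is a matter of handing
it the snapshot paths:

$ sudo perl thin-snapshots.pl /mnt/snapshots/backup-*

With "echo" prepended inside the system() call, as noted in the comments,
it only prints the "btrfs subvolume delete" commands it would have run.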



Re: state of btrfs snapshot limitations?

2018-09-15 Thread Qu Wenruo


On 2018/9/15 5:05 AM, James A. Robinson wrote:
> The mail archive seems to indicate this list is appropriate
> for not only the technical coding issues, but also for user
> questions, so I wanted to pose a question here.  If I'm
> wrong about that, I apologize in advance.
> 
> The page
> 
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
> 
> talks about the basic snapshot capabilities of btrfs and led
> me to look up what, if any, limits might apply.  I find some
> threads from a few years ago that talk about limiting the
> number of snapshots for a volume to 100.

This is mostly related to send and quota, and maybe to
snapshot/subvolume removal.
(And personally I would recommend only 20 snapshots.)

Both of them need to do backref walks in their core functionality.
The increased number of references introduced by snapshots can have a
huge impact, especially on quota.

We have some plans to enhance this, but for now, if send/quota is
important to you, it's highly recommended to limit the number of
snapshots to a reasonable number.

> 
> The reason I'm curious is I wanted to try and use the
> snapshot capability as a way of keeping a 'history' of a
> backup volume I maintain.  The backup doesn't change a
> lot over time, but small changes are made to files within
> it daily.

Then normally it leads to a bit of a dilemma.

Currently the most common way to know how much exclusively used space
one snapshot takes is btrfs quota (qgroups).
But a lot of snapshots bring a large performance impact, sometimes an
unacceptable one.

If one doesn't need to account for how much space a snapshot really
takes, it won't be a problem though.


Despite the above, I'd like to point out that a snapshot is not a backup
(which I believe everyone already knows).

Furthermore, for btrfs specifically, since the file trees (snapshots and
subvolumes) still share the same chunk/extent/csum trees, if one of those
essential trees gets corrupted (especially the extent tree), you may not
be able to mount the fs (or at least not mount it read-write).

So it's still pretty important to take real backups.

Thanks,
Qu

> 
> The Plan 9 OS has a nice archival filesystem that lets you
> easily maintain snapshots, and has various tools that make
> it simple to keep a /snapshot/yyyy/mmdd snapshot going back
> for the life of the filesystem.
> 
> I wanted to try and replicate the basic functionality of
> that history using a non-plan-9 filesystem.  At first I
> tried rsnapshot but I find its technique of rotating and
> deleting backups is thrashing the disks to the point that it
> can't keep up with the rotations (the cp -al is fast, but
> the periodic rm -rf of older snapshots kills the disk).
> 
> With btrfs I was thinking perhaps I could more efficiently
> maintain the archive of changes over time using a snapshot.
> If this is an awful thought and I should just go away,
> please let me know.
> 
> If the limit is 100 or less I'd need to use a more complicated
> rotation scheme.  For example with a layout like the
> following:
> 
> min/
> hour/
> day/
> month/
> year/
> 
> The idea being each bucket, min, hour, day, month, would
> be capped and older snapshots would be removed and replaced
> with newer ones over time.
> 
> so with a 15-minute snapshot cycle I'd end up with
> 
> min/[00,15,30,45]
> hour/[00-23]
> day/[01-31]
> month/[01-12]
> year/[2018,2019,...]
> 
> (72+ snapshots with room for a few years worth of yearly's).
> 
> But if things have changed with btrfs over the past few
> years and number of snapshots scales much higher, I would
> use the easier scheme:
> 
> /min/[00,15,30,45]
> /hourly/[00-23]
> /daily/yyyy/mmdd/
> 
> with 365 snapshots added per additional year.
> 





Re: state of btrfs snapshot limitations?

2018-09-15 Thread Hans van Kranenburg
On 09/15/2018 05:56 AM, James A. Robinson wrote:
> [...]
> 
> I've got to read up a bit more on subvolumes, I am missing some
> context from the warnings given by Chris regarding per-subvolume
> options.

Btrfs lets you mount the filesystem multiple times, e.g. with a
different subvolume id, so you can mount part of the filesystem somewhere.

Some of the mount options (many btrfs specific ones, like space_cache*,
autodefrag etc) get a value when doing the first mount, and subsequent
ones cannot change them any more, because they're filesystem wide behavior.

Some others can be changed on each individual mount (like the atime
options), and when omitting them you get the non-optimal default again.
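
For example, mounting two subvolumes of the same filesystem (device,
subvolume names and mountpoints below are purely illustrative):

$ sudo mount -o noatime,space_cache=v2,subvol=root /dev/sdb1 /mnt/root
$ sudo mount -o noatime,subvol=data /dev/sdb1 /mnt/data

The filesystem-wide option (space_cache=v2 here) is fixed by the first
mount; the per-mount option noatime has to be repeated on the second
one, otherwise that mount falls back to the relatime default.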

-- 
Hans van Kranenburg


Re: state of btrfs snapshot limitations?

2018-09-14 Thread James A. Robinson
Wow, thanks to everyone for all that information; I'm going to have to
take some time to digest everything. :)

I just wanted to quickly say one thing: As Duncan surmised, I'm not
treating this as my primary backup, but more of an experimental add-on
feature.  The primary backup goes to an ext4 partition that is then
rsynced to a 'current' btrfs subvolume; the latter is what I'm taking
snapshots of.  These both share a RAID 1+0 enclosure.  The partitions
are oversized, 3TB each, so based on this thread that size is something
to watch out for if I use up a lot more of the space.

For other backups I've also got an Amazon Glacier-based backup (large
"offsite" but painful to recover with) via Arq, and an Apple
TimeCapsule backup (easy point-and-click restore for a single item, but
damn flakey and painfully slow for anything complicated).  So I'm not
trusting any one device or scheme.

As an aside, one of the things I loved about reading the Plan 9 papers
was their description of their three-tier storage, where they had "infinite"
long term storage via a WORM array, SCSI disks on big servers for the
"cache," and then local disk / memory for the files they were working
on right at that moment (http://doc.cat-v.org/plan_9/misc/cw/cw.pdf).

I do like the idea of a periodic "copy to a clean disk" and "mkfs the
old disk" scheme instead of complicated in-place rebuilding and/or
defragmentation.  I think I have enough capacity that I will probably
try that if/when it becomes necessary.

I've got to read up a bit more on subvolumes, I am missing some
context from the warnings given by Chris regarding per-subvolume
options.


Re: state of btrfs snapshot limitations?

2018-09-14 Thread Duncan
James A. Robinson posted on Fri, 14 Sep 2018 14:05:29 -0700 as excerpted:

> The mail archive seems to indicate this list is appropriate for not only
> the technical coding issues, but also for user questions, so I wanted to
> pose a question here.  If I'm wrong about that, I apologize in advance.

User questions are fine here.  In fact, there are a number of non-dev 
regulars here who normally take the non-dev level questions.  I'm one of 
them. =:^)

> The page
> 
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
> 
> talks about the basic snapshot capabilities of btrfs and led me to look
> up what, if any, limits might apply.  I find some threads from a few
> years ago that talk about limiting the number of snapshots for a volume
> to 100.

Btrfs is optimized to make snapshotting very fast -- on an atomic copy-on-
write tree-based filesystem like btrfs it's pretty much just taking a new 
reference pointing at the current tree head so nothing in it disappears, 
and that's very fast -- but maintenance that works with existing 
snapshots (and other references) is often slower and doesn't always scale 
so nicely.  While from btrfs' perspective there's nothing "magical" about 
the number 100, in human terms it is of course easy to remember, and it's 
very roughly where the number of snapshots starts to take its toll on the 
time required for various filesystem maintenance tasks, including 
deleting snapshots, balance, fsck, quota maintenance, etc.

So the number of snapshots you can get away with depends primarily on 
three things:

1) Easiest and biggest factor:  If you don't need quotas, simply keeping
that functionality turned off makes a big difference.  If you /do/ need
them, turning them off temporarily for maintenance such as a rebalance,
then doing a quota rescan when the balance is completed, can be the
difference between a balance taking days or weeks (with quotas on and
constantly updating during the balance) vs. hours to a couple of days
(with quotas off during the balance); a command-level sketch of that
sequence follows after point 3.  Quite a number of people have posted
questions about balance not being practical (or even thinking it was
hung) as it was taking "forever", who found that simply turning quotas
off (sometimes they didn't even know they were on, it was a distro
setting) fixed the problem, and that balance completed in a reasonable
time after that.

(There have recently been patches to avoid some of the worst constant 
rescanning during balance, but as my own use-case doesn't require either 
quotas or snapshotting, I'm not following their status, and if quotas 
aren't required keeping them off will remain simplest and most efficient 
in any case.)

2) Use-case need for maintenance:  While (almost) any periodic-
snapshotting use-case is going to need snapshot thinning and thus 
snapshot removal as routine maintenance, some use-cases, particularly at 
the large scale, aren't going to find less routine maintenance tasks like 
full balance (converting between raid levels or adding/deleting devices 
to/from an existing filesystem) or check --repair, etc, useful; they'll 
simply swap in a hot-spare backup and mkfs the former working copy they 
would have otherwise needed maintenance on, because it's easier/simpler/
faster for them than trying to repair or change the device config of the 
existing filesystem, and their operating parameters already require the 
hot-spare resources for other reasons.

This is likely why a working fsck repair mechanism wasn't a high priority 
early on, and why it still has "holes" in the types of damage it can 
repair.  The big users such as facebook and oracle funding development 
simply don't find that sort of functionality useful as they hot-swap 
instead.  

But even for more "normal/personal" use-cases, if adding a device and 
rebalancing to make efficient use of it, or if repairing a broken 
filesystem when you already have the valuable stuff on it backed up 
anyway, is going to take days, with no guarantee all the problems will be 
fixed in any case for the repair case, even if it's going to take 
dropping by the local computer/electronics (super-)store for a new disk 
or three (remember the multi-device case), it may well make more sense to 
do that than to take days doing the repair/device-add with the existing
filesystem.

Obviously if you aren't going to be repairing the filesystem or adding/
removing devices, the time that takes isn't a factor you need to worry 
about, and snapshot-deletion times are likely to be the only thing you 
need to worry about in terms of snapshot numbers scaling.

3) Backing-device speed, ssd vs. spinning-rust, etc, matters, but not as 
much as you might think, because for some filesystem maintenance 
operations, particularly with large numbers of snapshots/reflinks, parts 
of them are cpu- or memory-bound, not IO-bound.
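
In command terms, the quotas-off-during-balance sequence from point 1
might look something like this (the mountpoint and the balance filter
are illustrative, not a prescription):

$ sudo btrfs quota disable /mnt/data
$ sudo btrfs balance start -dusage=50 /mnt/data
$ sudo btrfs quota enable /mnt/data
$ sudo btrfs quota rescan -w /mnt/data   # -w waits for the rescan to finish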


So while 100 snapshots is a convenient number as a recommendation, it 
really depends.  On slow systems with quotas on and 

Re: state of btrfs snapshot limitations?

2018-09-14 Thread Chris Murphy
On Fri, Sep 14, 2018 at 3:05 PM, James A. Robinson
 wrote:

> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
>
> talks about the basic snapshot capabilities of btrfs and led
> me to look up what, if any, limits might apply.  I find some
> threads from a few years ago that talk about limiting the
> number of snapshots for a volume to 100.

It does seem variable and I'm not certain what the pattern is that
triggers pathological behavior. There's a container thread from about a
year ago with someone using docker on Btrfs with more than 100K
containers per day, but I don't know the turnover rate. That person
does say it's deletion that's expensive, but not intolerably so.

My advice is to come up with as many strategies as you can implement,
because if one strategy starts to implode with terrible performance,
you can just bail on it (or try fixing it, or submitting bug reports
to make Btrfs better down the road, etc.), and yet you still have one
or more other strategies that are still viable.

By strategy, you might want to implement both your ideal and
conservative approaches, and also something in the middle. Also, it's
reasonable to mirror those strategies on a different storage stack,
e.g. LVM thin volumes and XFS. LVM thin volumes are semi-cheap to
create, and semi-cheap to delete; whereas Btrfs snapshots are almost
free to create, and expensive to delete (varies depending on changes
in it or the subvolume it's created from). But if the LVM thin pool's
metadata pool runs out of space, it's big trouble. I expect to lose
all the LV's if that ever happens. Also, this strategy doesn't have
send/receive, so ordinary use of rsync is expensive since it reads and
compares both source and destination. The first answer for this
question contains a possible workaround based on hard links.

https://serverfault.com/questions/489289/handling-renamed-files-or-directories-in-rsync


With Btrfs, a big issue for scalability is the extent tree, which is
shared among all snapshots and subvolumes. Therefore, the bigger the
file system gets, in effect the more fragile the extent tree becomes.
The other thing is that btrfs check is super slow with large volumes;
some people have file systems of a dozen or more TiB that take days to
check.

I also agree with the noatime suggestion from Hans. Note this is a
per-subvolume, mount-time option, so if you're using the subvol= or
subvolid= mount options, you need noatime every time; once per file
system isn't enough.
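
As a concrete sketch, that means repeating noatime on every btrfs line
in /etc/fstab (the UUID and subvolume names here are made up):

UUID=1234abcd-...  /      btrfs  subvol=root,noatime  0  0
UUID=1234abcd-...  /home  btrfs  subvol=home,noatime  0  0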



-- 
Chris Murphy


Re: state of btrfs snapshot limitations?

2018-09-14 Thread James A. Robinson
Thanks very much for the useful information.  I'll give the simple
scheme a try, after I adjust mount preferences.

Jim


Re: state of btrfs snapshot limitations?

2018-09-14 Thread Hans van Kranenburg
Hi,

On 09/14/2018 11:05 PM, James A. Robinson wrote:
> The mail archive seems to indicate this list is appropriate
> for not only the technical coding issues, but also for user
> questions, so I wanted to pose a question here.  If I'm
> wrong about that, I apologize in advance.

It's fine. Your observation is correct.

> The page
> 
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
> 
> talks about the basic snapshot capabilities of btrfs and led
> me to look up what, if any, limits might apply.  I find some
> threads from a few years ago that talk about limiting the
> number of snapshots for a volume to 100.
> 
> The reason I'm curious is I wanted to try and use the
> snapshot capability as a way of keeping a 'history' of a
> backup volume I maintain.  The backup doesn't change a
> lot over time, but small changes are made to files within
> it daily.

The 100 above is just a number because users ask "ok, but *how* many?".

As far as I know, the real thing that causes complexity for the
filesystem is how much actual change is being made to the subvolume
all the time, after it has been snapshotted.

Creating btrfs snapshots is cheap. Only as soon as you start making
modifications does the subvolume in which you make the changes begin to
diverge from the other ones which share the same history. And changes
mean not only changes to data (changing, adding, removing files), but
also pure metadata changes (e.g. using the touch command on a file).

When just using the snapshots, opening and reading files etc, this
should however not be a big problem.

But other btrfs-specific actions are affected, like balance, device
remove, and using quota.

In any case, make sure:
- you are not using quota / qgroups (highly affected by this sort of
complexity)
- you *always* mount with noatime (which is not the default, and yes,
noatime, not relatime or anything else) to prevent unnecessary metadata
changes which cause exactly this kind of complexity to happen (a quick
check for both follows below).
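
Both are quick to verify from a shell (the mountpoint is illustrative):

$ sudo btrfs qgroup show /mnt/data
# an error about quotas not being enabled is exactly what you want here
$ findmnt -t btrfs -o TARGET,OPTIONS
# every btrfs mount listed should include noatime in its options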

When doing this, and not having to use btrfs balance and add/remove
disks, and if the data doesn't change much over time (especially if it's
just adding new stuff all the time), you are likely able to have far
more snapshots of the thing.

> The Plan 9 OS has a nice archival filesystem that lets you
> easily maintain snapshots, and has various tools that make
> it simple to keep a /snapshot/yyyy/mmdd snapshot going back
> for the life of the filesystem.
> 
> I wanted to try and replicate the basic functionality of
> that history using a non-plan-9 filesystem.  At first I
> tried rsnapshot but I find its technique of rotating and
> deleting backups is thrashing the disks to the point that it
> can't keep up with the rotations (the cp -al is fast, but
> the periodic rm -rf of older snapshots kills the disk).

Yes, btrfs snapshots are already a huge improvement compared to that.
(Also, cp -l causes modifications to also show up in the "snapshots",
because it's still the same file.)

> With btrfs I was thinking perhaps I could more efficiently
> maintain the archive of changes over time using a snapshot.
> If this is an awful thought and I should just go away,
> please let me know.
> 
> If the limit is 100 or less I'd need to use a more complicated
> rotation scheme.  For example with a layout like the
> following:
> 
> min/
> hour/
> day/
> month/
> year/
> 
> The idea being each bucket, min, hour, day, month, would
> be capped and older snapshots would be removed and replaced
> with newer ones over time.
> 
> so with a 15-minute snapshot cycle I'd end up with
> 
> min/[00,15,30,45]
> hour/[00-23]
> day/[01-31]
> month/[01-12]
> year/[2018,2019,...]
> 
> (72+ snapshots with room for a few years worth of yearly's).
> 
> But if things have changed with btrfs over the past few
> years and number of snapshots scales much higher, I would
> use the easier scheme:
> 
> /min/[00,15,30,45]
> /hourly/[00-23]
> /daily/yyyy/mmdd/
> 
> with 365 snapshots added per additional year.

There are tools available that can do this for you. The one I use is
btrbk, https://github.com/digint/btrbk (probably packaged in your
favorite linux distro).

I'd say, just try it. Add a snapshot schedule in your btrbk config, and
set it to never expire older ones. Then, just see what happens, and only
if you start seeing things slow down a lot, start worrying about what to
do, and let us know how far you got.
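
For reference, a minimal btrbk.conf for that kind of "snapshot and never
expire" experiment might look roughly like this (the paths are
illustrative, and the option names should be double-checked against the
btrbk documentation):

# keep every snapshot; never expire anything automatically
snapshot_preserve_min  all

volume /mnt/btr_pool
  snapshot_dir  btrbk_snapshots
  subvolume     backup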

Have fun,

P.S. Here's an unfinished page from a tutorial that I'm writing that is
still heavily under construction, which touches the subject of
snapshotting data and metadata. Maybe it might help to explain
"complexity starts when changing things" more:

https://github.com/knorrie/python-btrfs/blob/tutorial/tutorial/cows.md

-- 
Hans van Kranenburg


state of btrfs snapshot limitations?

2018-09-14 Thread James A. Robinson
The mail archive seems to indicate this list is appropriate
for not only the technical coding issues, but also for user
questions, so I wanted to pose a question here.  If I'm
wrong about that, I apologize in advance.

The page

https://btrfs.wiki.kernel.org/index.php/Incremental_Backup

talks about the basic snapshot capabilities of btrfs and led
me to look up what, if any, limits might apply.  I find some
threads from a few years ago that talk about limiting the
number of snapshots for a volume to 100.

The reason I'm curious is I wanted to try and use the
snapshot capability as a way of keeping a 'history' of a
backup volume I maintain.  The backup doesn't change a
lot over time, but small changes are made to files within
it daily.

The Plan 9 OS has a nice archival filesystem that lets you
easily maintain snapshots, and has various tools that make
it simple to keep a /snapshot/yyyy/mmdd snapshot going back
for the life of the filesystem.

I wanted to try and replicate the basic functionality of
that history using a non-plan-9 filesystem.  At first I
tried rsnapshot but I find its technique of rotating and
deleting backups is thrashing the disks to the point that it
can't keep up with the rotations (the cp -al is fast, but
the periodic rm -rf of older snapshots kills the disk).

With btrfs I was thinking perhaps I could more efficiently
maintain the archive of changes over time using a snapshot.
If this is an awful thought and I should just go away,
please let me know.

If the limit is 100 or less I'd need to use a more complicated
rotation scheme.  For example with a layout like the
following:

min/
hour/
day/
month/
year/

The idea being each bucket, min, hour, day, month, would
be capped and older snapshots would be removed and replaced
with newer ones over time.

so with a 15-minute snapshot cycle I'd end up with

min/[00,15,30,45]
hour/[00-23]
day/[01-31]
month/[01-12]
year/[2018,2019,...]

(72+ snapshots with room for a few years worth of yearly's).

But if things have changed with btrfs over the past few
years and number of snapshots scales much higher, I would
use the easier scheme:

/min/[00,15,30,45]
/hourly/[00-23]
/daily/yyyy/mmdd/

with 365 snapshots added per additional year.