Re: efficiency of btrfs cow

2011-03-23 Thread Brian J. Murrell
On 11-03-06 11:06 AM, Calvin Walton wrote:
 
 To see exactly what's going on, you should use the btrfs filesystem df
 command to see how space is being allocated for data and metadata
 separately:

OK.  So with an empty filesystem, before my first copy (i.e. the base
that the next copy will CoW from) df reports:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/btrfs--test-btrfs--test
                     922746880        56 922746824   1% /mnt/btrfs-test

and btrfs fi df reports:

Data: total=8.00MB, used=0.00
Metadata: total=1.01GB, used=24.00KB
System: total=12.00MB, used=4.00KB

after the first copy, df and btrfs fi df report:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/btrfs--test-btrfs--test
                     922746880 121402328 801344552  14% /mnt/btrfs-test

root@linux:/mnt/btrfs-test# cat .snapshots/monthly.22/metadata/btrfs_df-stop
Data: total=110.01GB, used=109.26GB
Metadata: total=5.01GB, used=3.26GB
System: total=12.00MB, used=24.00KB

So it's clear that total usage (as reported by df) was 121,402,328 KB,
but Metadata has two values:

Metadata: total=5.01GB, used=3.26GB

What's the difference between total and used?  And for that matter,
what's the difference between the total and used for Data
(total=110.01GB, used=109.26GB)?

Even if I take the larger values (i.e. the total values) for Data and
Metadata (each converted to KB first) and add them up, I get
120,607,211.52 KB, which is not quite the 121,402,328 KB that df
reports; there is a 795,116.48 KB discrepancy.

In any case, which value from btrfs fi df should I be subtracting from
df's accounting to get a real accounting of the amount of data used?

Cheers,
b.





Re: efficiency of btrfs cow

2011-03-23 Thread Chester
I'm not a developer, but I think it goes something like this:
btrfs doesn't write the filesystem on the entire device/partition at
format time; rather, it dynamically increases the size of the
filesystem as data is used. That's why formatting a disk with btrfs
can be so fast.

On Wed, Mar 23, 2011 at 12:39 PM, Brian J. Murrell
br...@interlinx.bc.ca wrote:

 On 11-03-06 11:06 AM, Calvin Walton wrote:
 
  To see exactly what's going on, you should use the btrfs filesystem df
  command to see how space is being allocated for data and metadata
  separately:

 [...]

 In any case, which value from btrfs fi df should I be subtracting from
 df's accounting to get a real accounting of the amount of data used?



Re: efficiency of btrfs cow

2011-03-23 Thread Brian J. Murrell
On 11-03-23 11:53 AM, Chester wrote:
 I'm not a developer, but I think it goes something like this:
 btrfs doesn't write the filesystem on the entire device/partition at
 format time; rather, it dynamically increases the size of the
 filesystem as data is used. That's why formatting a disk with btrfs
 can be so fast.

Indeed, this much is understood, which is why I am using btrfs fi df to
try to determine how much of the increase in raw device usage is due to
the dynamic allocation of metadata.

Cheers,
b.





Re: efficiency of btrfs cow

2011-03-23 Thread Kolja Dummann
 So it's clear that total usage (as reported by df) was 121,402,328 KB, but
 Metadata has two values:

 Metadata: total=5.01GB, used=3.26GB

 What's the difference between total and used?  And for that matter,
 what's the difference between the total and used for Data
 (total=110.01GB, used=109.26GB)?


total is the space allocated (reserved) for a kind of usage (metadata
or data); space allocated for one kind of usage can't be used for
anything else. The used value is how much of that allocated space is
actually filled.
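
To put concrete numbers on that, using the figures from earlier in the
thread (illustrative shell arithmetic, taking 1GB as 1024*1024 KB):

echo '(110.01 + 5.01) * 1024 * 1024' | bc   # space reserved for chunks: ~120607212 KB
echo '(109.26 + 3.26) * 1024 * 1024' | bc   # space filled inside them: ~117985772 KB

Note that Brian's df figure (121,402,328 KB) sits close to the first
(allocated) sum, not the second.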

The wiki gives an overview of how to interpret the values:

https://btrfs.wiki.kernel.org/index.php/FAQ#btrfs_filesystem_df_.2Fmountpoint

cheers Kolja.


efficiency of btrfs cow

2011-03-06 Thread Brian J. Murrell
I have a backup volume on an ext4 filesystem that is using rsync and
its --link-dest option to create hard-linked incremental backups.  I
am sure everyone here is familiar with the technique, but in case
anyone isn't, each backup effectively does:

# cp -al /backup/previous-backup/ /backup/current-backup
# rsync -aAHX ... --exclude /backup / /backup/current-backup
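
The same effect can be had in a single rsync invocation using
--link-dest (a sketch; unchanged files become hard links into the
previous backup, changed files are copied in full):

# rsync -aAHX ... --exclude /backup --link-dest=/backup/previous-backup/ \
    / /backup/current-backup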

The shortcoming of this, of course, is that a change of just 1 byte in
a (possibly huge) file requires the whole file to be recopied to the
backup.

btrfs and its CoW capability to the rescue -- again, no surprise to
anyone here.

So I replicated a few of the directories in my backup volume to a btrfs
volume using snapshots for each backup to take advantage of CoW and with
any luck, avoid entire file duplication where only some subset of the
file has changed.

Overall, it seems that I saw success.  Most backups on btrfs were
smaller than their source, and overall, for all of the backups
replicated, the use was less.  There were some, however, that were
significantly larger.  Here's the analysis:

  Backup       btrfs   ext4  btrfs/ext4
  ----------  ------  -----  ----------
monthly.22:  112GiB 113GiB  98%
monthly.21:   14GiB  14GiB  95%
monthly.20:   19GiB  20GiB  94%
monthly.19:   12GiB  13GiB  94%
monthly.18:5GiB   6GiB  87%
monthly.17:   11GiB  12GiB  92%
monthly.16:8GiB  10GiB  82%
monthly.15:   16GiB  11GiB 146%
monthly.14:   19GiB  20GiB  94%
monthly.13:   21GiB  22GiB  96%
monthly.12:   61GiB  67GiB  91%
monthly.11:   24GiB  22GiB 106%
monthly.10:   22GiB  19GiB 114%
 monthly.9:   12GiB  13GiB  90%
 monthly.8:   15GiB  17GiB  91%
 monthly.7:9GiB  11GiB  87%
 monthly.6:8GiB   9GiB  85%
 monthly.5:   16GiB  18GiB  91%
 monthly.4:   13GiB  15GiB  89%
 monthly.3:   11GiB  19GiB  62%
 monthly.2:   29GiB  22GiB 134%
 monthly.1:   23GiB  24GiB  94%
 monthly.0:5GiB   5GiB  94%
 Total:  497GiB 512GiB  96%

btrfs usage is calculated from the df value of the filesystem before
and after each backup.  ext4 (rsync, really) usage is calculated with
du -xks on the whole backup volume, which, as you know, counts a
multiply hard-linked file's space only once.
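
In other words, each per-backup number comes from something like the
following (a hypothetical reconstruction; the mount points are made up):

before=$(df -Pk /mnt/btrfs-test | awk 'NR==2 {print $3}')   # Used column, in KB
# ... run the rsync for this backup ...
after=$(df -Pk /mnt/btrfs-test | awk 'NR==2 {print $3}')
echo "btrfs cost of this backup: $((after - before)) KB"
du -xks /backup-ext4    # ext4 side: hard-linked files counted once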

So as you can see, for the most part btrfs with CoW was more efficient,
but in some cases (i.e. monthly.15, monthly.11, monthly.10, monthly.2)
it was less efficient.

Taking the biggest anomaly, monthly.15, a du of just that directory on
both the btrfs and ext4 filesystems shows results I would expect:

btrfs: 136,876,580 monthly.15
ext4:  142,153,928 monthly.15

Yet the before and after df results show the btrfs usage higher than
ext4.  Is there some periodic jump in overhead used by btrfs that
would account for this mysterious increased usage in some of the copies?

Any other ideas for the anomalous results?

Cheers,
b.





Re: efficiency of btrfs cow

2011-03-06 Thread Calvin Walton
On Sun, 2011-03-06 at 10:46 -0500, Brian J. Murrell wrote:
 I have a backup volume on an ext4 filesystem that is using rsync and
 it's --link-dest option to create hard-linked incremental backups.  I
 am sure everyone here is familiar with the technique but in case anyone
 isn't basically it's effectively doing (each backup):

 So I replicated a few of the directories in my backup volume to a btrfs
 volume using snapshots for each backup to take advantage of CoW and with
 any luck, avoid entire file duplication where only some subset of the
 file has changed.
 
 Overall, it seems that I saw success.  Most backups on btrfs were
 smaller than their source, and overall, for all of the backups
 replicated, the use was less.  There were some, however, that were
 significantly larger.  Here's the analysis:

 Taking the biggest anomaly, monthly.15, a du of just that directory on
 both the btrfs and ext4 filesystems shows results I would expect:
 
 btrfs: 136,876,580 monthly.15
 ext4:  142,153,928 monthly.15
 
 Yet the before and after df results show the btrfs usage higher than
 ext4.  Is there some periodic jump in overhead used by btrfs that
 would account for this mysterious increased usage in some of the copies?

There actually is such a periodic jump in overhead, caused by the way
that btrfs dynamically allocates space for metadata as new files are
created, which it does whenever the ratio of free metadata space
reaches a threshold (it's probably more complicated than that, but
close enough for now).

To see exactly what's going on, you should use the btrfs filesystem df
command to see how space is being allocated for data and metadata
separately:

ayu ~ # btrfs fi df /
Data: total=266.01GB, used=249.35GB
System, DUP: total=8.00MB, used=36.00KB
Metadata, DUP: total=3.62GB, used=1.93GB
ayu ~ # df -h /
FilesystemSize  Used Avail Use% Mounted on
/dev/sda4 402G  254G  145G  64% /

If you use the btrfs tool's df command to account for space in your
testing, you should get much more accurate results.
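
For example, the data figure alone can be pulled out for before/after
accounting with something like this (a sketch; it assumes the output
format shown above):

ayu ~ # btrfs fi df / | awk -F'used=' '/^Data/ {print $2}'
249.35GB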

-- 
Calvin Walton calvin.wal...@kepstin.ca



Re: efficiency of btrfs cow

2011-03-06 Thread Brian J. Murrell
On 11-03-06 11:06 AM, Calvin Walton wrote:
 
 There actually is such a periodic jump in overhead,

Ahh.  So my instincts were correct.

 caused by the way
 that btrfs dynamically allocates space for metadata as new files are
 created, which it does whenever the ratio of free metadata space
 reaches a threshold (it's probably more complicated than that, but
 close enough for now).

Sounds fair enough.

 To see exactly what's going on, you should use the btrfs filesystem df
 command to see how space is being allocated for data and metadata
 separately:
 
 ayu ~ # btrfs fi df /
 Data: total=266.01GB, used=249.35GB
 System, DUP: total=8.00MB, used=36.00KB
 Metadata, DUP: total=3.62GB, used=1.93GB
 ayu ~ # df -h /
 FilesystemSize  Used Avail Use% Mounted on
 /dev/sda4 402G  254G  145G  64% /
 
 If you use the btrfs tool's df command to account for space in your
 testing, you should get much more accurate results.

Indeed!  Unfortunately that tool seems to be completely silent on my system:

# btrfs filesystem df /mnt/btrfs-test/
# btrfs filesystem df /mnt/btrfs-test

Where /mnt/btrfs-test is the mount point of the device I created the
btrfs filesystem on, i.e.:

# grep btrfs /proc/mounts
/dev/mapper/btrfs--test-btrfs--test /mnt/btrfs-test btrfs rw,relatime 0 0

My btrfs-tools appears to be from 20101101.  The changelog says:

  * Merging upstream version 0.19+20101101.

Cheers,
b.





Re: efficiency of btrfs cow

2011-03-06 Thread Calvin Walton
On Sun, 2011-03-06 at 23:02 +0700, Fajar A. Nugraha wrote:
 On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell br...@interlinx.bc.ca 
 wrote:
  # cp -al /backup/previous-backup/ /backup/current-backup
  # rsync -aAHX ... --exclude /backup / /backup/current-backup
 
  The shortcoming of this, of course, is that a change of just 1 byte
  in a (possibly huge) file requires the whole file to be recopied to
  the backup.
 
 If you have snapshots anyway, why not:
 - create a snapshot before each backup run
 - use the same directory (e.g. just /backup), no need to cp anything
 - add --inplace to rsync

To add a bit to this: if you *do not* use the --inplace option on rsync,
rsync will rewrite the entire file, instead of updating the existing
file!
This of course negates some of the benefits of btrfs's COW support when
doing incremental backups.
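
Putting those three steps together, each backup run would look
something like this (a minimal sketch; the subvolume layout and
snapshot naming are assumptions, not from the thread):

# btrfs subvolume snapshot /backup /backup/.snapshots/backup-$(date +%F)
# rsync -aAHX ... --inplace --exclude /backup / /backup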

-- 
Calvin Walton calvin.wal...@kepstin.ca



Re: efficiency of btrfs cow

2011-03-06 Thread Brian J. Murrell
On 11-03-06 11:17 AM, Calvin Walton wrote:
 
 To add a bit to this: if you *do not* use the --inplace option on rsync,
 rsync will rewrite the entire file, instead of updating the existing
 file!

Of course.  As I mentioned to Fajar previously, I am indeed using
--inplace when copying from the existing archive to the new btrfs archive.

 This of course negates some of the benefits of btrfs's COW support when
 doing incremental backups.

Absolutely.

b.






Re: efficiency of btrfs cow

2011-03-06 Thread Brian J. Murrell
On 11-03-06 11:02 AM, Fajar A. Nugraha wrote:
 
 If you have snapshots anyway, why not:
 - create a snapshot before each backup run
 - use the same directory (e.g. just /backup), no need to cp anything
 - add --inplace to rsync

Which is exactly what I am doing.  There is no cp involved in making
the btrfs copies of the existing backup.  It's simply rsync -aAXH ...
--inplace from the existing backup archive to the new btrfs archive.

Cheers,
b.






Re: efficiency of btrfs cow

2011-03-06 Thread Freddie Cash
On Sun, Mar 6, 2011 at 8:02 AM, Fajar A. Nugraha l...@fajar.net wrote:
 On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell br...@interlinx.bc.ca 
 wrote:
 # cp -al /backup/previous-backup/ /backup/current-backup
 # rsync -aAHX ... --exclude /backup / /backup/current-backup

 The shortcoming of this, of course, is that a change of just 1 byte
 in a (possibly huge) file requires the whole file to be recopied to
 the backup.

 If you have snapshots anyway, why not:
 - create a snapshot before each backup run
 - use the same directory (e.g. just /backup), no need to cp anything
 - add --inplace to rsync

You may also want to test with and without --no-whole-file.  That's
most useful when the two filesystems are on the same system, and it
should reduce the amount of data copied around, as it forces rsync to
use file deltas.  This is very much a win on ZFS, which is also CoW,
so it should be a win on Btrfs as well.
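
For a local replication like Brian's, that would look something like
this (illustrative paths; rsync normally switches to --whole-file when
both ends are local, so the flag has to be given explicitly):

# rsync -aAHX --inplace --no-whole-file /backup-ext4/monthly.15/ /mnt/btrfs-test/monthly.15/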


-- 
Freddie Cash
fjwc...@gmail.com