Re: Various Questions

2011-01-09 Thread Carl Cook

I'd rather not do the copy again unless necessary, as it took a day.

Directories look identical, but who knows?  I'm going to try and figure out how 
to do a file-by-file crc check, for peace of mind.
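
One way such a file-by-file check could be done, as a minimal sketch only: list a CRC for every regular file in each tree with cksum, then diff the two lists.  The helper names tree_sums and compare_trees are made up for illustration, and the sketch assumes both trees are mounted locally.

```shell
#!/bin/sh
# Sketch: per-file CRCs for a tree, sorted by relative path, so that two
# trees can be compared with diff.  Helper names are hypothetical.
tree_sums() {
    # cd first so the paths in the output are relative to the tree root
    (cd "$1" && find . -type f -exec cksum {} + | sort -k 3)
}

# compare_trees SRC DST: prints differing entries, returns 0 when identical
compare_trees() {
    a=$(mktemp) && b=$(mktemp) || return 2
    tree_sums "$1" > "$a"
    tree_sums "$2" > "$b"
    rc=0
    diff "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

Usage would be something like compare_trees /media/disk /home; no output means every file has the same CRC and size on both sides.  rsync itself can do a similar comparison with --checksum together with --dry-run and --itemize-changes, which reads every file on both sides but changes nothing.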


On Sat 08 January 2011 17:26:25 Freddie Cash wrote:
 On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook cac...@quantum-sci.com wrote:
 
  In addition to the questions below, if anyone has a chance could you advise
  on why my destination drive has more data than the source after this
  command:
  # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
  sending incremental file list
 
 What happens if you delete /home, then run the command again, but
 without the *?  You generally don't use wildcards for the source or
 destination when using rsync.  You just tell it which directory to
 start in.
 
 If you do an ls /home and ls /media/disk are they different?
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Various Questions

2011-01-09 Thread Alan Chandler



On 09/01/11 13:37, Fajar A. Nugraha wrote:
 On Sun, Jan 9, 2011 at 8:16 PM, Carl Cook cac...@quantum-sci.com wrote:
  I'd rather not do the copy again unless necessary, as it took a day.

  Directories look identical, but who knows?  I'm going to try and figure out
  how to do a file-by-file crc check, for peace of mind.

 try du --apparent-size -slh
 It should rule out any differences caused by sparse files and hardlinks.

  On Sat 08 January 2011 17:26:25 Freddie Cash wrote:
   On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook cac...@quantum-sci.com wrote:
    In addition to the questions below, if anyone has a chance could you
    advise on why my destination drive has more data than the source after
    this command:
    # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home

 Are you SURE you didn't get the command mixed up? The last argument to
 rsync should be the destination. Your command looks like you're
 copying things to /home.


What is also important is the use of * - the shell expands it before
rsync runs, and it does not match names beginning with a dot, so all the
. files at the top level are NOT being copied.

rsync uses a / at the end of the source to decide whether you want the
directory itself put into the destination or just its contents.  A / at
the end of the source means copy the contents.

This could be the reason why the destination has more data than the
source (I am not sure of the exact scope of --delete): --delete may not
be deleting /home/.* files, if there are any there.
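
The point about * can be checked without rsync at all.  A small sketch, assuming a POSIX shell with default globbing (no dotglob or similar option set); the helper names are invented for the demonstration:

```shell
#!/bin/sh
# Sketch: count what "dir/*" expands to versus what the directory holds.
count_glob() {
    # The shell expands the pattern before any command sees it; by default
    # names starting with "." are not matched.
    set -- "$1"/*
    # An unmatched pattern is left as-is, so check that the first word exists
    [ -e "$1" ] || { echo 0; return; }
    echo "$#"
}

count_all() {
    # -A lists dot files too, but not "." and ".."
    ls -A "$1" | wc -l
}

demo=$(mktemp -d)
touch "$demo/visible" "$demo/.hidden"
echo "glob: $(count_glob "$demo")  all: $(count_all "$demo")"
rm -rf "$demo"
```

The glob sees one entry where the directory holds two.  Giving rsync the directory with a trailing / (for example /media/disk/ as the source) leaves the directory walk to rsync, which then includes the dot files and lets --delete consider the whole top level of the destination.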


--
Alan Chandler
http://www.chandlerfamily.org.uk



Re: Various Questions

2011-01-08 Thread Carl Cook

In addition to the questions below, if anyone has a chance could you advise on
why my destination drive has more data than the source after this command:
# rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
sending incremental file list
sent 658660 bytes  received 2433 bytes  1322186.00 bytes/sec
total size is 1355368091626  speedup is 2050192.77

# df /media/disk
Filesystem      1K-blocks       Used  Available Use% Mounted on
/dev/md2       1868468340 1315408384  553059956  71% /media/disk
# df /home
Filesystem      1K-blocks       Used  Available Use% Mounted on
/dev/sdb       3907029168 1325491836 2581537332  34% /home
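
For scale, the gap between the two Used columns can be worked out with plain shell arithmetic.  This is only a sketch using the figures from the df output above; the variable names are illustrative.

```shell
#!/bin/sh
# Used figures (1K-blocks) copied from the df output above
src_used=1315408384   # /media/disk
dst_used=1325491836   # /home

diff_kib=$((dst_used - src_used))
diff_mib=$((diff_kib / 1024))
# Integer arithmetic only: express the excess in hundredths of a percent
excess_hundredths=$((diff_kib * 10000 / src_used))

echo "extra on destination: ${diff_kib} KiB (about ${diff_mib} MiB)"
echo "relative excess: ${excess_hundredths} hundredths of a percent"
```

That comes to 10083452 KiB, roughly 9.6 GiB, or a little under 0.8% more on the destination than on the source.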




On Fri 07 January 2011 10:55:43 Carl Cook wrote:
 
 Wow, this rsync and backup system is pretty amazing.  I've always just tarred 
 each directory manually, but now find I can RELIABLY automate backups, and 
 have SOLID versioning to boot.  Thanks to everyone who advised, especially 
 Freddie and Anthony.
 
 I am still waiting for hardware for my backup server, but have been 
 preparing.  On the backup server I'll be doing pull backups for everything 
 except my phone (which is connected intermittently).  I'm going to set up a 
 cron script on the backup server to pull backups once a week (as opposed to 
 once/mo which I've done for 12 years).  I am at a loss how to lock the
 database on the HTPC while exporting the dump, as per Lloyd Standish, but
 will study it.  (Freddie gave a nice script, but it doesn't seem to
 lock/flush first.)  I also don't know how to email results/success/fail on
 completion, as I'm not a very good coder.
 
 But here is my proposed cron:
 btrfs subvolume snapshot hex:///home /media/backups/snapshots/hex-{DATE}
 rsync --archive --hard-links --delete-during --delete-excluded --inplace 
 --numeric-ids -e ssh --exclude-from=/media/backups/exclude-hex hex:///home 
 /media/backups/hex
 btrfs subvolume snapshot droog:///home /media/backups/snapshots/droog-{DATE}
 rsync --archive --hard-links --delete-during --delete-excluded --inplace 
 --numeric-ids -e ssh --exclude-from=/media/backups/exclude-droog 
 droog:///home /media/backups/droog
 
 My root filesystems are ext4, so I guess they cannot be snapshotted before 
 backup.  My home directories are/will be BTRFS though.
 
 
 On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
  I'd suggest at least 
  mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
  if you really want raid0
 
  I don't fully understand -m or -d.  Why would this make a truer raid0 than
  with no options?
 
 I am beginning to suspect that this is the -default- behavior, as described 
 in the wiki:
 # Create a filesystem across four drives (metadata mirrored, data striped)
 
 Should I turn off the writeback cache on each drive when running BTRFS?
 
 
 


Re: Various Questions

2011-01-08 Thread Ian! D. Allen
On Sat, Jan 08, 2011 at 05:25:19AM -0800, Carl Cook wrote:
 In addition to the questions below, if anyone has a chance could you
 advise on why my destination drive has more data than the source after
 this command:
 # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
 sending incremental file list
 sent 658660 bytes  received 2433 bytes  1322186.00 bytes/sec
 total size is 1355368091626  speedup is 2050192.77
 
 # df /media/disk
 Filesystem      1K-blocks       Used  Available Use% Mounted on
 /dev/md2       1868468340 1315408384  553059956  71% /media/disk
 # df /home
 Filesystem      1K-blocks       Used  Available Use% Mounted on
 /dev/sdb       3907029168 1325491836 2581537332  34% /home

This has little to do with btrfs; it happens with many file systems due
to file system infrastructure details such as directory sizes, sparse
file handling, file fragmentation, etc.

For example: If you have a directory with a huge number of file names
in it, the actual directory disk space used will be large and will not
be reclaimed when you delete all the file names from the directory.
You would have to remove the directory itself and recreate it to reclaim
that space.  Also, using rsync without --sparse (which can't work with
--inplace), sparse files on the source may get expanded to take real
disk blocks on the destination.
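
The sparse-file effect is easy to see on a single file.  A rough sketch, assuming GNU coreutils (truncate and du --apparent-size) and a file system that supports holes; the helper names are invented here:

```shell
#!/bin/sh
# Sketch: a sparse file's apparent size versus the blocks it occupies.
# Assumes GNU coreutils (truncate, du --apparent-size).
apparent_kib()  { du --apparent-size -k "$1" | cut -f1; }
allocated_kib() { du -k "$1" | cut -f1; }

f=$(mktemp)
truncate -s 10M "$f"          # 10 MiB of holes, no data written
echo "apparent:  $(apparent_kib "$f") KiB"
echo "allocated: $(allocated_kib "$f") KiB"
rm -f "$f"
```

The apparent size is 10240 KiB while the allocated figure stays far smaller.  A copy that writes the zeros out instead of recreating the holes would occupy the full 10240 KiB on the destination, which is one way a destination can end up larger than its source.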

Unless you use dd to copy a partition exactly, including all the file
system infrastructure details, any copy you make will be subject to the
vagaries of how the file system decides to lay out the data.

-- 
| Ian! D. Allen  -  idal...@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/


Re: Various Questions

2011-01-08 Thread Freddie Cash
On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook cac...@quantum-sci.com wrote:

 In addition to the questions below, if anyone has a chance could you advise
 on why my destination drive has more data than the source after this command:
 # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
 sending incremental file list

What happens if you delete /home, then run the command again, but
without the *?  You generally don't use wildcards for the source or
destination when using rsync.  You just tell it which directory to
start in.

If you do an ls /home and ls /media/disk are they different?

-- 
Freddie Cash
fjwc...@gmail.com


Various Questions

2011-01-07 Thread Carl Cook

On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
 I'd suggest at least 
 mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
 if you really want raid0

I don't fully understand -m or -d.  Why would this make a truer raid0 than with
no options?


Is it necessary to use fdisk on new drives in creating a BTRFS multi-drive 
array?  Or is this all that's needed:
# mkfs.btrfs /dev/sdb /dev/sdc
# btrfs filesystem show

Is this related to 'subvolumes'?  The FAQ implies that a subvolume is like a 
directory, but also like a partition.  What's the rationale for being able to 
create a subvolume under a subvolume, as Hubert says so he can use the 
shadow_copy module for samba to publish the snapshots  to windows clients.  I 
don't have any windows clients, but what difference does his structure make?

I know that when using SATA+LVM you should turn off the writeback cache on the
drive, as it doesn't do cache flushing, and ensure NCQ is on.  But does this
also apply to a BTRFS array?  If so, is this done in rc.local with
hdparm -I /dev/sdb
hdparm -I /dev/sdc


How do you know what options to rsync are on by default?  I can't find this 
anywhere.  For example, it seems to me that --perms -ogE  --hard-links and 
--delete-excluded should be on by default, for a true sync?

If using the  --numeric-ids switch for rsync, do you just have to manually make 
sure the IDs and usernames are the same on source and destination machines?

For files that fail to transfer, wouldn't it be wise to use  --partial-dir=DIR 
to at least recover part of lost files?

The rsync man page says that rsync uses ssh by default, but is that the case?  
I think -e may be related to engaging ssh, but don't understand the explanation.

So for my system where there is a backup server, I guess I run the rsync daemon 
on the backup server which presents a port, then when the other systems decide 
it's time for a backup (cron) they:
- stop mysql, dump the database somewhere, start mysql;
- connect to the backup server's rsync port and dump their data to (hopefully) 
some specific place there.
Right?






Re: Various Questions

2011-01-07 Thread C Anthony Risinger
On Fri, Jan 7, 2011 at 11:15 AM, Carl Cook cac...@quantum-sci.com wrote:

 On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
 I'd suggest at least
 mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
 if you really want raid0

 I don't fully understand -m or -d.  Why would this make a truer raid0 than
 with no options?

this will give you RAID0 for your data, but RAID1 for your metadata,
making it less likely that the FS itself gets corrupted, even though
you will lose some data in crash cases, if i understand correctly.

 Is it necessary to use fdisk on new drives in creating a BTRFS multi-drive 
 array?  Or is this all that's needed:
 # mkfs.btrfs /dev/sdb /dev/sdc
 # btrfs filesystem show

depends on whether you need /boot partitions or other partitions.
what you have works fine though.

 Is this related to 'subvolumes'?  The FAQ implies that a subvolume is like a 
 directory, but also like a partition.  What's the rationale for being able to 
 create a subvolume under a subvolume, as Hubert says so he can use the 
 shadow_copy module for samba to publish the snapshots  to windows clients.  
 I don't have any windows clients, but what difference does his structure make?

just his preference to put it there... the snapshot of a snapshot can
go anywhere.  it doesn't have to reside under its parent; the
parent was just used as a base, and it's not bound to it in any way AFAIK.

 How do you know what options to rsync are on by default?  I can't find this 
 anywhere.  For example, it seems to me that --perms -ogE  --hard-links and 
 --delete-excluded should be on by default, for a true sync?

the links and command Freddie Cash posted are a really good base to work from.

 So for my system where there is a backup server, I guess I run the rsync 
 daemon on the backup server which presents a port, then when the other 
 systems decide it's time for a backup (cron) they:
 - stop mysql, dump the database somewhere, start mysql;
 - connect to the backup server's rsync port and dump their data to 
 (hopefully) some specific place there.
 Right?

you don't have to stop mysql, you just need to freeze any new,
incoming writes, and flush (ie. let finish) whatever is happening
right now.  this ensures mysql is _internally_ consistent on the disk.

see comment by Lloyd Standish here:

http://dev.mysql.com/doc/refman/5.1/en/backup-methods.html

C Anthony


Re: Various Questions

2011-01-07 Thread Freddie Cash
On Fri, Jan 7, 2011 at 9:15 AM, Carl Cook cac...@quantum-sci.com wrote:
 How do you know what options to rsync are on by default?  I can't find this 
 anywhere.  For example, it seems to me that --perms -ogE  --hard-links and 
 --delete-excluded should be on by default, for a true sync?

Who cares which ones are on by default?  List the ones you want to use
on the command-line, every time.  That way, if the defaults change,
your setup won't.

 If using the  --numeric-ids switch for rsync, do you just have to manually 
 make sure the IDs and usernames are the same on source and destination 
 machines?

You use the --numeric-ids switch so that it *doesn't* matter if the
IDs/usernames are the same.  It just sends the ID number on the wire.
Sure, if you do an ls on the backup box, the username will appear to
be messed up.  But if you compare the user ID assigned to the file
with the user ID in the backed-up etc/passwd file, they match.
Then, if you ever need to restore the HTPC from backups, the
etc/passwd file is transferred over, the user IDs are transferred
over, and when you do an ls on the HTPC, everything matches up
correctly.
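
To see why the names can look wrong while the numbers are right: the file system stores only the numeric ID, and the name is looked up on whichever machine runs ls.  A small sketch, assuming GNU stat; the helper names are made up:

```shell
#!/bin/sh
# Sketch: show a file's owner as stored (numeric) and as displayed (name).
# Assumes GNU stat; the name comes from the local passwd database.
owner_uid()  { stat -c '%u' "$1"; }
owner_name() { stat -c '%U' "$1"; }

f=$(mktemp)
echo "uid on disk: $(owner_uid "$f"), shown by ls as: $(owner_name "$f")"
rm -f "$f"
```

On the backup box the same number may map to a different name, or to none at all, but the number itself is what gets restored.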

 For files that fail to transfer, wouldn't it be wise to use  
 --partial-dir=DIR to at least recover part of lost files?

Or, just run rsync again, if the connection is dropped.

 The rsync man page says that rsync uses ssh by default, but is that the case? 
  I think -e may be related to engaging ssh, but don't understand the 
 explanation.

Does it matter what the default is, if you specify exactly how you
want it to work on the command-line?

 So for my system where there is a backup server, I guess I run the rsync 
 daemon on the backup server which presents a port, then when the other 
 systems decide it's time for a backup (cron) they:
 - stop mysql, dump the database somewhere, start mysql;
 - connect to the backup server's rsync port and dump their data to 
 (hopefully) some specific place there.
 Right?

That's one way (push backups).  It works ok for small numbers of
systems being backed up.  But get above a handful of machines, and it
gets very hard to time everything so that you don't hammer the disks
on the backup server.

Pull backups (backups server does everything) work better, in my
experience.  Then you just script things up once, run 1 script, worry
about 1 schedule, and everything is stored on the backups server.  No
need to run rsync daemons everywhere, just run the rsync client, using
-e ssh, and let it do everything.

If you need it to run a script on the remote machine first, that's
easy enough to do:
  - ssh to remote system, run script to stop DBs, dump DBs, snapshot
FS, whatever
  - then run rsync
  - ssh to remote system, run script to start DBs, delete snapshot, whatever
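
The three steps above could be sketched roughly as follows.  The pre-backup and post-backup script paths are hypothetical placeholders and not something from this thread, and the sketch defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch of a pull backup run from the backup server.  The pre/post script
# paths are hypothetical placeholders.
DRY_RUN=${DRY_RUN:-1}    # default: print commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

pull_backup() {
    host=$1
    dest=$2
    # 1. quiesce on the remote side (stop or lock DBs, dump, snapshot, ...)
    run ssh "$host" /usr/local/sbin/pre-backup || return 1
    # 2. pull the data; keep rsync's exit status for the caller
    rc=0
    run rsync --archive --hard-links --delete --numeric-ids -e ssh \
        "$host:/home/" "$dest/" || rc=$?
    # 3. always undo step 1, even if rsync failed
    run ssh "$host" /usr/local/sbin/post-backup
    return "$rc"
}

pull_backup hex /media/backups/hex
```

Setting DRY_RUN=0 would run the commands for real.  Running the post step unconditionally is a design choice in this sketch, so that a failed transfer does not leave the databases stopped.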

You're starting to over-think things.  Keep it simple, don't worry
about defaults, specify everything you want to do, and do it all from
the backups box.

-- 
Freddie Cash
fjwc...@gmail.com


Re: Various Questions

2011-01-07 Thread Carl Cook

Wow, this rsync and backup system is pretty amazing.  I've always just tarred 
each directory manually, but now find I can RELIABLY automate backups, and have 
SOLID versioning to boot.  Thanks to everyone who advised, especially Freddie 
and Anthony.

I am still waiting for hardware for my backup server, but have been preparing.  
On the backup server I'll be doing pull backups for everything except my phone 
(which is connected intermittently).  I'm going to set up a cron script on the 
backup server to pull backups once a week (as opposed to once/mo which I've 
done for 12 years).  I am at a loss how to lock the database on the HTPC
while exporting the dump, as per Lloyd Standish, but will study it.  (Freddie
gave a nice script, but it doesn't seem to lock/flush first.)  I also don't know
how to email results/success/fail on completion, as I'm not a very good coder.

But here is my proposed cron:
btrfs subvolume snapshot hex:///home /media/backups/snapshots/hex-{DATE}
rsync --archive --hard-links --delete-during --delete-excluded --inplace 
--numeric-ids -e ssh --exclude-from=/media/backups/exclude-hex hex:///home 
/media/backups/hex
btrfs subvolume snapshot droog:///home /media/backups/snapshots/droog-{DATE}
rsync --archive --hard-links --delete-during --delete-excluded --inplace 
--numeric-ids -e ssh --exclude-from=/media/backups/exclude-droog droog:///home 
/media/backups/droog

My root filesystems are ext4, so I guess they cannot be snapshotted before 
backup.  My home directories are/will be BTRFS though.
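
A note on the proposed commands, offered as a sketch rather than a tested setup: btrfs subvolume snapshot works on local paths, so it cannot take hex:///home as its source, and rsync's remote syntax is host:/path.  One possible arrangement, assuming /media/backups/hex and /media/backups/droog are btrfs subvolumes on the backup server, is to pull with rsync first and then snapshot the local copy under a dated name.  The function below only prints the commands; backup_cmds is an invented name.

```shell
#!/bin/sh
# Sketch: print the two commands for one host.  Assumes /media/backups/<host>
# is a btrfs subvolume on the backup server; nothing is executed here.
backup_cmds() {
    host=$1
    stamp=$(date +%Y-%m-%d)
    # rsync remote syntax is host:/path; the trailing / copies the contents
    echo "rsync --archive --hard-links --delete-during --delete-excluded" \
         "--inplace --numeric-ids -e ssh" \
         "--exclude-from=/media/backups/exclude-$host" \
         "$host:/home/ /media/backups/$host/"
    # snapshot the local copy after the sync, named by date
    echo "btrfs subvolume snapshot /media/backups/$host" \
         "/media/backups/snapshots/$host-$stamp"
}

backup_cmds hex
backup_cmds droog
```

As for emailing results, cron normally mails whatever a job prints to the job's owner, or to the address in a MAILTO line in the crontab, so a script that prints its outcome may be enough.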


On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
 I'd suggest at least 
 mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
 if you really want raid0

 I don't fully understand -m or -d.  Why would this make a truer raid0 than
 with no options?

I am beginning to suspect that this is the -default- behavior, as described in 
the wiki:
# Create a filesystem across four drives (metadata mirrored, data striped)

Should I turn off the writeback cache on each drive when running BTRFS?
