Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-09 Thread dan
Just playing devil's advocate here, as this conversation has already chosen
its direction.

Would it be reasonable to compare two filesystems based on their journal?  I
am assuming that essentially all BackupPC installations are on a journaled
filesystem or could be upgraded to one.  Wouldn't replaying that journal to a
tar file, sending the tar file to the remote host, then restoring the
journal be pretty efficient?

I have no experience with this, just wanted to throw another concept out
there.


On Mon, Dec 8, 2008 at 10:49 PM, Jeffrey J. Kosowsky
<[EMAIL PROTECTED]> wrote:

> Holger Parplies wrote at about 04:10:17 +0100 on Tuesday, December 9, 2008:
>  > Hi,
>  >
>  > Jeffrey J. Kosowsky wrote on 2008-12-08 09:37:16 -0500 [Re:
>  > [BackupPC-users] Advice on creating duplicate backup server]:
>  > >
>  > > It just hit me that given the known architecture of the pool and cpool
>  > > directories shouldn't it be possible to come up with a scheme that
>  > > works better than either rsync (which can choke on too many hard
>  > > links) and 'dd' (which has no notion of incremental and requires you
>  > > to resize the filesystem etc.).
>  >
>  > yes, that hit someone on the list several years ago (I don't remember
>  > the name, sorry). I implemented the idea he sketched (well, more or
>  > less, there's some work left to make it really useful).
>  >
>  > > My thought is as follows:
>  > > 1. First, recurse through the pc directory to create a list of
>  > >    files/paths and the corresponding pool links.
>  > >    Note that finding the pool links can be done in one of several
>  > >    ways:
>  > >    - Method 1: Create a sorted list of pool files (which should be
>  > >      significantly shorter than the list of all files due to the
>  > >      nature of pooling and therefore require less memory than rsync)
>  > >      and then look up the links.
>  >
>  > Wrong. You need one entry per inode that points to an arbitrary path
>  > (the first one you copy). Every file(*) is in the pool, meaning a list
>  > of all pool files is exactly what you need. A different way to look at
>  > it: every file with a link count > 1 is a pooled file, and it's these
>  > files that cause rsync&co problems, not single link files. (Well, yes,
>  > rsync pre-3 needed a complete list of all files.)
> OK. I had assumed (wrongly) that rsync needed to keep track of each
> file that is hard-linked, not just one copy.
> Still, there are some savings in knowing that you can find your one
> copy in the pool and you don't have to look through the pc tree at all.
>  >
>  > (*) Files that are not in the pool:
>  > 1.) 0-byte files. They take up no file system blocks, so pooling them
>  >     saves only inodes. Not pooling them makes things simpler.
>  > 2.) log files (they get appended to; that would make pooling somewhat
>  >     difficult; besides, what chance is there of a pool hit?),
>  >     backups files (including backups.old)
>  > attrib files are pooled, contrary to popular belief, and that makes
>  > sense, because they are often identical with the same attrib file from
>  > the previous backup(s).
> Yes. I am aware of this from the routines I wrote to check/fix pool
> consistency and missing links to the pool
>  >
>  >
>  > The algorithm I implemented is somewhat similar:
>  > 1.) Walk pool/, cpool/ and pc/, printing information on the files and
>  >     directories to a file (which will be quite large; by default I put
>  >     it on the destination pool FS, because there should be large
>  >     amounts of space there).
>  > 2.) Sort the file with the 'sort' command. The lines in the file are
>  >     designed such that they will be sorted into a meaningful order:
>  >     - directories first, so I can create them and subsequently not
>  >       worry about whether the place I want to copy/link a file to
>  >       already exists or not
>  >     - files next, sorted by inode number, with the (c)pool file
>  >       preceding its pc/ links
>  >       The consequence is that I get all references to one inode on
>  >       adjacent lines. The first time, I copy the file. For the
>  >       repetitions, I link to the first copy. All I need to keep in
>  >       memory is something like one line from the file list, one
>  >       "previous inode number", one "file name of previous inode".
>  > 'sort' handles huge files quite nicely [...]

Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 04:10:17 +0100 on Tuesday, December 9, 2008:
 > Hi,
 > 
 > Jeffrey J. Kosowsky wrote on 2008-12-08 09:37:16 -0500 [Re: [BackupPC-users] 
 > Advice on creating duplicate backup server]:
 > > 
 > > It just hit me that given the known architecture of the pool and cpool
 > > directories shouldn't it be possible to come up with a scheme that
 > > works better than either rsync (which can choke on too many hard
 > > links) and 'dd' (which has no notion of incremental and requires you
 > > to resize the filesystem etc.).
 > 
 > yes, that hit someone on the list several years ago (I don't remember the
 > name, sorry). I implemented the idea he sketched (well, more or less, there's
 > some work left to make it really useful).
 > 
 > > My thought is as follows:
 > > 1. First, recurse through the pc directory to create a list of
 > >files/paths and the corresponding pool links.
 > >Note that finding the pool links can be done in one of several
 > >ways:
 > >- Method 1: Create a sorted list of pool files (which should be
 > >  significantly shorter than the list of all files due to the
 > > nature of pooling and therefore require less memory than rsyn)
 > > and then look up the links.
 > 
 > Wrong. You need one entry per inode that points to an arbitrary path (the
 > first one you copy). Every file(*) is in the pool, meaning a list of all pool
 > files is exactly what you need. A different way to look at it: every file with
 > a link count > 1 is a pooled file, and it's these files that cause rsync&co
 > problems, not single link files. (Well, yes, rsync pre-3 needed a complete
 > list of all files.)
OK. I had assumed (wrongly) that rsync needed to keep track of each
file that is hard-linked, not just one copy.
Still, there are some savings in knowing that you can find your one
copy in the pool and you don't have to look through the pc tree at all.
 > 
 > (*) Files that are not in the pool:
 > 1.) 0-byte files. They take up no file system blocks, so pooling them
 > saves only inodes. Not pooling them makes things simpler.
 > 2.) log files (they get appended to; that would make pooling somewhat
 > difficult; besides, what chance is there of a pool hit?),
 > backups files (including backups.old)
 > attrib files are pooled, contrary to popular belief, and that makes
 > sense, because they are often identical with the same attrib file from
 > the previous backup(s).
Yes. I am aware of this from the routines I wrote to check/fix pool
consistency and missing links to the pool
 > 
 > 
 > The algorithm I implemented is somewhat similar:
 > 1.) Walk pool/, cpool/ and pc/, printing information on the files and
 > directories to a file (which will be quite large; by default I put it
 > on the destination pool FS, because there should be large amounts of
 > space there).
 > 2.) Sort the file with the 'sort' command. The lines in the file are
 > designed such that they will be sorted into a meaningful order:
 > - directories first, so I can create them and subsequently not worry
 >   about whether the place I want to copy/link a file to already exists
 >   or not
 > - files next, sorted by inode number, with the (c)pool file preceding its
 >   pc/ links
 >   The consequence is that I get all references to one inode on adjacent
 >   lines. The first time, I copy the file. For the repetitions, I link to
 >   the first copy. All I need to keep in memory is something like one line
 >   from the file list, one "previous inode number", one "file name of
 >   previous inode".
 > 'sort' handles huge files quite nicely, but it seems to create large
 > (amounts of) files under /tmp, possibly under $TMPDIR if you set that (not
 > sure). You need to make sure you've got the space, but if you're copying a
 > multi-GB/TB pool, you probably have. My guess is that the necessary amount
 > of space roughly equals the size of the file I'm sorting.
 > 3.) Walk the sorted file, line by line, creating directories and copying files
 > (with File::Copy::cp, but I plan to change that to PoolWrite, so I can add
 > (part of) one pool to an existing second pool, or something that
 > communicates over TCP/IP, so I can copy to a different machine) and
 > linking files (with Perl function link()).
 > In theory, a pool could also be compressed or uncompressed on the fly
 > (uncompressed for copying to zfs, for instance).

Yes... I was thinking very similarly, though [...]

Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2008-12-08 09:37:16 -0500 [Re: [BackupPC-users] 
Advice on creating duplicate backup server]:
> 
> It just hit me that given the known architecture of the pool and cpool
> directories shouldn't it be possible to come up with a scheme that
> works better than either rsync (which can choke on too many hard
> links) and 'dd' (which has no notion of incremental and requires you
> to resize the filesystem etc.).

yes, that hit someone on the list several years ago (I don't remember the
name, sorry). I implemented the idea he sketched (well, more or less, there's
some work left to make it really useful).

> My thought is as follows:
> 1. First, recurse through the pc directory to create a list of
>files/paths and the corresponding pool links.
>Note that finding the pool links can be done in one of several
>ways:
>- Method 1: Create a sorted list of pool files (which should be
>  significantly shorter than the list of all files due to the
>nature of pooling and therefore require less memory than rsyn)
>and then look up the links.

Wrong. You need one entry per inode that points to an arbitrary path (the
first one you copy). Every file(*) is in the pool, meaning a list of all pool
files is exactly what you need. A different way to look at it: every file with
a link count > 1 is a pooled file, and it's these files that cause rsync&co
problems, not single link files. (Well, yes, rsync pre-3 needed a complete
list of all files.)

(*) Files that are not in the pool:
1.) 0-byte files. They take up no file system blocks, so pooling them
saves only inodes. Not pooling them makes things simpler.
2.) log files (they get appended to; that would make pooling somewhat
difficult; besides, what chance is there of a pool hit?),
backups files (including backups.old)
attrib files are pooled, contrary to popular belief, and that makes
sense, because they are often identical with the same attrib file from
the previous backup(s).


The algorithm I implemented is somewhat similar:
1.) Walk pool/, cpool/ and pc/, printing information on the files and
directories to a file (which will be quite large; by default I put it
on the destination pool FS, because there should be large amounts of
space there).
2.) Sort the file with the 'sort' command. The lines in the file are
designed such that they will be sorted into a meaningful order:
- directories first, so I can create them and subsequently not worry
  about whether the place I want to copy/link a file to already exists
  or not
- files next, sorted by inode number, with the (c)pool file preceding its
  pc/ links
  The consequence is that I get all references to one inode on adjacent
  lines. The first time, I copy the file. For the repetitions, I link to
  the first copy. All I need to keep in memory is something like one line
  from the file list, one "previous inode number", one "file name of
  previous inode".
'sort' handles huge files quite nicely, but it seems to create large
(amounts of) files under /tmp, possibly under $TMPDIR if you set that (not
sure). You need to make sure you've got the space, but if you're copying a
multi-GB/TB pool, you probably have. My guess is that the necessary amount
of space roughly equals the size of the file I'm sorting.
3.) Walk the sorted file, line by line, creating directories and copying files
(with File::Copy::cp, but I plan to change that to PoolWrite, so I can add
(part of) one pool to an existing second pool, or something that
communicates over TCP/IP, so I can copy to a different machine) and
linking files (with Perl function link()).
In theory, a pool could also be compressed or uncompressed on the fly
(uncompressed for copying to zfs, for instance).


Once again, because people seem to be determined to miss the point: it's *not*
processing by sorted inode numbers in order to save disk seeks that is the
point, it's the fact that the 'link' system call takes two paths

link $source_path, $dest_path; # to use Perl notation

while the 'stat' system call gives you only an inode number. To link a
filename to a previously copied inode, you need to know the name you copied it
to. A general purpose tool can't know when it will need the information, so it
needs to keep information on all inodes with link count > 1 it has encountered.
You can keep a mapping of inode_number->file_name in memory for a few thousand
files, but not for hundreds of millions. By sorting the list by inode number,
I can be sure that I'll never need the info for one inode again once I've
reached the next inode, so I only have to keep info for one file in memory,
regardless of the total number of files.
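
To make the above concrete, here is a minimal, illustrative sketch of the
approach (it is not Holger's actual script, which is not included in the
thread). Ownership, permissions, timestamps and filenames containing newlines
are ignored, and the two command-line arguments are assumed to be absolute
paths; treat it as a sketch of the idea, not a finished tool.

#!/usr/bin/perl
# Phase 1 walks cpool/, pool/ and pc/ and writes one sortable record per
# directory ("D ...") and per file ("F <inode> <path>"). After sort(1),
# directories come first and files are grouped by inode number, so phase 2
# copies the first path seen for each inode and hard-links every later path
# to it. Whichever path happens to come first gets copied and the rest become
# links -- the resulting hard-link structure is the same either way.
use strict;
use warnings;
use File::Find;
use File::Copy qw(cp);

my ($src, $dst) = @ARGV;        # e.g. /var/lib/backuppc /mnt/newpool
my $list = "$dst/filelist";     # keep the (large) list on the destination FS

open(my $out, '>', $list) or die "open $list: $!";
find({ no_chdir => 1, wanted => sub {
    (my $rel = $File::Find::name) =~ s{^\Q$src\E/?}{};
    return if $rel eq '';
    my @st = lstat($File::Find::name);
    if ( -d _ ) {
        print $out "D 0 $rel\n";                    # 'D' sorts before 'F'
    } elsif ( -f _ ) {
        printf $out "F %020d %s\n", $st[1], $rel;   # zero-padded inode number
    }
} }, map { "$src/$_" } grep { -d "$src/$_" } qw(cpool pool pc));
close($out);

$ENV{LC_ALL} = 'C';             # byte-wise sort keeps parents before children
system('sort', '-o', $list, $list) == 0 or die "sort failed";

my ($prev_ino, $prev_path) = (-1, '');
open(my $in, '<', $list) or die "open $list: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($type, $ino, $rel) = split / /, $line, 3;
    if ( $type eq 'D' ) {
        mkdir "$dst/$rel";                          # parents were created first
    } elsif ( $ino == $prev_ino ) {
        link "$dst/$prev_path", "$dst/$rel" or warn "link $rel: $!";
    } else {
        cp("$src/$rel", "$dst/$rel") or warn "copy $rel: $!";
        ($prev_ino, $prev_path) = ($ino, $rel);
    }
}
close($in);

As noted above, sort(1) may need several GB of temporary space under /tmp (or
$TMPDIR) for a pool of this size.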

Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread dan
You could mess around with LVM snapshots.  I hear that you can make an LVM
snapshot and rsync that over, then restore it to the backup LVM.  I have not
tried this but have seen examples around the net.

Have you tried rsync3?  It works for me. I don't quite have 3TB so I can't
really advise you at that size; I'm not sure where the line is on the file
count that rsync3 can't handle.

ZFS would be ideal for this, but you have to make the leap to a
Solaris/OpenSolaris kernel.  ZFS-FUSE is completely non-functional for
BackupPC, as it will crash as soon as you start hitting the filesystem and
the delayed write caching kicks in.  ZFS on FreeBSD is not mature enough and
tends to crash out under heavy I/O.

With ZFS it works something like this:
http://blogs.sun.com/clive/resource/zfs_repl.ksh

You can send a full ZFS snapshot like
zfs send /pool/[EMAIL PROTECTED] | ssh remotehost zfs recv -v
/remotepool/remotefs
or send an incremental afterwards with
zfs send -i /pool/[EMAIL PROTECTED] | ssh remotehost zfs recv -F -v
/remotepool/remotefs

Feel free to compress the ssh stream with -C if you like, but I would first
check your bandwidth usage and see if you are using the whole thing.  If
not, then the compression will slow you down.

The real downside here is the switch to Solaris if you are a Linux person.
You can also try Nexenta, which is the OpenSolaris kernel on a Debian/Ubuntu
userland complete with apt.

You also get filesystem-level compression with ZFS, so you don't need to
compress your pool.  This should make recovering files outside of BackupPC a
little more convenient.

How is a tape backup taking 1-2 weeks?  For 3TB, 1 week works out to about
5.2MB/s.  If you are that I/O constrained, nothing is going to work right for
you.  How full is your pool?

You could also consider not keeping a copy of the pool remotely, but rather
pulling a tar backup off the BackupPC system on some schedule and sending
that to the remote machine for storage.

The problem with using NBD or anything like that with 'dd' is that there is
no resume support, and with 3TB you are likely to get errors every now and
then.  Even with a full T1 you are stuck with at least 6 hours going by
theoretical numbers, and are probably looking at 50% more than that.

As far as some other scheme for syncing up the pools goes, the hard links
will get you.

You could use find to traverse the entire pool and take down some info on
each file, such as name, size, type, etc., and then use some fancy Perl to
sort this out into manageable groups and then use rsync on individual files.


On Mon, Dec 8, 2008 at 7:37 AM, Jeffrey J. Kosowsky
<[EMAIL PROTECTED]> wrote:

> Stuart Luscombe wrote at about 10:02:04 + on Monday, December 8, 2008:
>  > Hi there,
>  >
>  > I've been struggling with this for a little while now so I thought it
>  > about time I got some help!
>  >
>  > We currently have a server running BackupPC v3.1.0 which has a pool of
>  > around 3TB and we've got to a stage where a tape backup of the pool is
>  > taking 1-2 weeks, which isn't effective at all.  The decision was made to
>  > buy a server that is an exact duplicate of our current one and have it
>  > hosted in another building, as a 2 week old backup isn't ideal in the
>  > event of a disaster.
>  >
>  > I've got the OS (CentOS) installed on the new server and have installed
>  > BackupPC v3.1.0, but I'm having problems working out how to sync the pool
>  > with the main backup server. I managed to rsync the cpool folder without
>  > any real bother, but the pool folder is the problem, if I try an rsync it
>  > eventually dies with an 'out of memory' error (the server has 8GB), and a
>  > cp -a didn't seem to work either, as the server filled up, assumedly as
>  > it's not copying the hard links correctly?
>  >
>  > So my query here really is am I going the right way about this? If not,
>  > what's the best method to take so that say once a day the duplicate
>  > server gets updated.
>  >
>  > Many Thanks
>
> It just hit me that given the known architecture of the pool and cpool
> directories shouldn't it be possible to come up with a scheme that
> works better than either rsync (which can choke on too many hard
> links) and 'dd' (which has no notion of incremental and requires you
> to resize the filesystem etc.).
>
> My thought is as follows:
> 1. First, recurse through the pc directory to create a list of
>    files/paths and the corresponding pool links.
>    Note that finding the pool links can be done in one of several
>    ways:
>    - Method 1: Create a sorted list of pool files (which should be
>      significantly shorter than the list of all files due to the
>      nature of pooling and therefore require less memory than rsync)
>      and then look up the links.
>    - Method 2: Calculate the md5sum file path of the file to determine
>      where it is in the pool. Where necessary, determine which member
>      of a chain of duplicates it is.
>    - Method 3: Not possible yet [...]

Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Jeffrey J. Kosowsky
Stuart Luscombe wrote at about 10:02:04 + on Monday, December 8, 2008:
 > Hi there,
 > 
 >  
 > 
 > I've been struggling with this for a little while now so I thought it about
 > time I got some help!
 > 
 >  
 > 
 > We currently have a server running BackupPC v3.1.0 which has a pool of
 > around 3TB and we've got to a stage where a tape backup of the pool is
 > taking 1-2 weeks, which isn't effective at all.  The decision was made to
 > buy a server that is an exact duplicate of our current one and have it
 > hosted in another building, as a 2 week old backup isn't ideal in the event
 > of a disaster.
 > 
 >  
 > 
 > I've got the OS (CentOS) installed on the new server and have installed
 > BackupPC v3.1.0, but I'm having problems working out how to sync the pool
 > with the main backup server. I managed to rsync the cpool folder without any
 > real bother, but the pool folder is the problem, if I try an rsync it
 > eventually dies with an 'out of memory' error (the server has 8GB), and a cp
 > -a didn't seem to work either, as the server filled up, assumedly as it's
 > not copying the hard links correctly?
 > 
 >  
 > 
 > So my query here really is am I going the right way about this? If not,
 > what's the best method to take so that say once a day the duplicate server
 > gets updated.
 > 
 >  
 > 
 > Many Thanks

It just hit me that given the known architecture of the pool and cpool
directories shouldn't it be possible to come up with a scheme that
works better than either rsync (which can choke on too many hard
links) and 'dd' (which has no notion of incremental and requires you
to resize the filesystem etc.).

My thought is as follows:
1. First, recurse through the pc directory to create a list of
   files/paths and the corresponding pool links.
   Note that finding the pool links can be done in one of several
   ways:
   - Method 1: Create a sorted list of pool files (which should be
 significantly shorter than the list of all files due to the
 nature of pooling and therefore require less memory than rsync)
 and then look up the links.
   - Method 2: Calculate the md5sum file path of the file to determine
     where it is in the pool. Where necessary, determine which member of
     a chain of duplicates it is.
   - Method 3: Not possible yet but would be possible if the md5sum
 file paths were appended to compressed backups. This would add very
 little to the storage but it would allow you to very easily
 determine the right link. If so then you could just read the link
 path from the file. 

  Files with only 1 link (i.e. no hard links) would be tagged for
  straight copying.

2. Then rsync *just* the pool -- this should be no problem since by
   definition there are no hard links within the pool itself

3. Finally, run through the list generated in #1 to create the new pc
   directory by creating the necessary links (and for files with no
   hard links, just copy/rsync them)

The above could also be easily adapted to allow for "incremental" syncing.
Specifically, in #1, you would use rsync to just generate a list of
*changed* files in the pc directory. In #2, you would continue to use
rsync to just sync *changed* pool entries. In #3 you would only act on
the shortened incremental sync list generated in #1.
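
As a rough illustration of what step 3 might look like on the duplicate
server, here is a sketch that assumes step 1 produced a tab-separated list of
"pc-path<TAB>pool-path" pairs relative to the BackupPC top directory. That
file format, and the /var/lib/backuppc location, are assumptions made for the
example, not part of the proposal above.

#!/usr/bin/perl
# Recreate the pc/ tree on the duplicate server by hard-linking each pc/
# file to its already-rsynced (c)pool entry. An empty second field marks
# a file with no pool link (to be copied/rsynced separately, per step 3).
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

my $topdir = '/var/lib/backuppc';             # assumed destination top dir
my $list   = shift @ARGV or die "usage: $0 link-list\n";

open(my $fh, '<', $list) or die "open $list: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($pcfile, $poolfile) = split /\t/, $line, 2;
    make_path(dirname("$topdir/$pcfile"));    # backup dirs may not exist yet
    if ( defined $poolfile && $poolfile ne '' ) {
        link "$topdir/$poolfile", "$topdir/$pcfile"
            or warn "link $pcfile -> $poolfile failed: $!\n";
    }
}
close($fh);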

The more I think about it, the more I LIKE the idea of appending the
md5sum file paths to compressed pool files (Method #3), since this
would make the above very fast. (Note: if I were implementing this, I
would also include the chain number in cases where there are multiple
files with the same md5sum path, and of course BackupPC_nightly would
then have to adjust this any time it changed the chain numbering.)

Even without the above, Method #1 would still be much less memory
intensive than rsync, and Method #2, while potentially a little slow,
would require very little memory and wouldn't be nearly that bad if
you are doing incremental backups.

--
Just as an FYI, if anyone wants to implement Method #2, here is the
routine I use to generate the md5sum file path from a (compressed)
file (note that it is based on the analogous uncompressed version in
Lib.pm).

use BackupPC::Lib;
use BackupPC::Attrib;
use BackupPC::FileZIO;

use constant _128KB   => 131072;
use constant _1MB => 1048576;

# Compute the MD5 digest of a compressed file. This is the compressed
# file version of the Lib.pm function File2MD5.
# For efficiency we don't use the whole file for big files
#   - for files <= 256K we use the file size and the whole file.
#   - for files <= 1M we use the file size, the first 128K and
# the last 128K.
#   - for files > 1M, we use the file size, the first 128K and
# the 8th 128K (ie: the 128K up to 1MB).
# See the documentation for a discussion of the tradeoffs in
# how much data we use and how many collisions we get.
#
# Returns the MD5 digest (a hex string) [...]
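
Since the archive cuts the routine off here, the following is only a rough
reconstruction of how such a digest could be computed with BackupPC::FileZIO,
following the chunking rules described in the comments above. It is not the
original routine, and it is not guaranteed to reproduce BackupPC's pool
digests byte-for-byte.

# Decompress the pool file, keep the first 1MB of uncompressed data while
# counting the total uncompressed size, then hash the regions listed above.
use strict;
use warnings;
use Digest::MD5;
use BackupPC::Lib;
use BackupPC::FileZIO;

use constant _128KB => 131072;
use constant _1MB   => 1048576;

sub zFile2MD5
{
    my($file, $compressLevel) = @_;

    my $fh = BackupPC::FileZIO->open($file, 0, $compressLevel)
        or return undef;

    my($buf, $head, $size) = ('', '', 0);
    while ( (my $n = $fh->read(\$buf, 65536)) > 0 ) {
        $head .= $buf if length($head) < _1MB;   # keep only the first 1MB
        $size += $n;                             # but count everything
    }
    $fh->close();
    return "" if $size == 0;                     # 0-byte files aren't pooled

    my $md5 = Digest::MD5->new;
    $md5->add($size);                            # file size always goes in
    if ( $size <= 2 * _128KB ) {                 # <= 256K: whole file
        $md5->add(substr($head, 0, $size));
    } elsif ( $size <= _1MB ) {                  # <= 1M: first and last 128K
        $md5->add(substr($head, 0, _128KB));
        $md5->add(substr($head, $size - _128KB, _128KB));
    } else {                                     # > 1M: first 128K + 128K up to 1MB
        $md5->add(substr($head, 0, _128KB));
        $md5->add(substr($head, _1MB - _128KB, _128KB));
    }
    return $md5->hexdigest;
}

The resulting hex digest then maps to the pool location the same way Lib.pm's
MD52Path does (the first three hex digits become nested subdirectories under
pool/ or cpool/), with a numeric suffix added for additional members of a
collision chain.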

Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Thomas Karcher
Hi,

> > Instead of trying to sync the pool, can't you just run a second  
> > BackupPC server that also backs up your machines?
> If you don't need the current backup history on the redundant server, save
> yourself the pain of the initial pool copy and just follow this path -
> presuming network and client load constraints allow you to.

What do you guys think of DRBD as a solution to this?

Benefits:
- fast synchronization
- backup server failover possible
- no changes to BackupPC itself or the backup strategy

Caveats:
- the very same device parameters on both servers: identical file system
and size etc.
- if not on a cluster file system, only one side gets to read/modify the
data


Thomas






Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Adam Goryachev

Holger Parplies wrote:
> Hi,
> 
> Nils Breunese (Lemonbit) wrote on 2008-12-08 12:23:40 +0100 [Re: 
> [BackupPC-users] Advice on creating duplicate backup server]:
>> Stuart Luscombe wrote:
>>
>>> I've got the OS (CentOS) installed on the new server and have
>>> installed BackupPC v3.1.0, but I'm having problems working out how
>>> to sync the pool with the main backup server.
> 
> I can't help you with *keeping* the pools in sync (other than recommending to

How about my personal favourite, enbd or nbd? I've had success using
this to mirror a fileserver with its "hot standby" partner... Simply
set up all your drives as needed in your "master", format, and use as
needed (as you already have), then set up your drives in your slave (ie,
raid1/5/6/etc) but don't format them. Then set up nbd/enbd (very simple:
run one command on the slave and one on the master). Now, the tricky
part, follow carefully:
1) umount the drive on the master.
2) create a raid1 array on the master with one device being the
filesystem you unmounted in (1) and the second device "missing"
3) hot-add the device from nbd (/dev/nbd/0) to your new raid1 array
4) use mdadm to configure the device in (3) as write-mostly, or
write-only if possible.

Now you have a real-time mirror on a remote machine. If everything goes
pear-shaped, you do something like this:
1) Make sure the master is dead...
2) kill nbd on the slave
3) mount the device you used for nbd as /var/lib/backuppc
4) start BackupPC on the slave

(PS: this assumes you have some method of syncing the other system
configs between the two machines (hint: rsync)...)

You may need to experiment a bit, but perhaps LVM + snapshots might help
as well.

Of course, the simplest method to ensure off-site and up-to-date backups
is to simply run a second independent BackupPC server, assuming you have
enough time and bandwidth...

Hope that helps...

Regards,
Adam



Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Holger Parplies
Hi,

Nils Breunese (Lemonbit) wrote on 2008-12-08 12:23:40 +0100 [Re: 
[BackupPC-users] Advice on creating duplicate backup server]:
> Stuart Luscombe wrote:
> 
> > I've got the OS (CentOS) installed on the new server and have
> > installed BackupPC v3.1.0, but I'm having problems working out how
> > to sync the pool with the main backup server.

I can't help you with *keeping* the pools in sync (other than recommending to
run the backups from both servers, like Nils said), but I may be able to help
you with an initial copy - presuming 'dd' doesn't work, which would be the
preferred method. Can you mount either the old pool on the new machine or the
new pool on the old machine via NFS? Or even better, put both disk sets in one
machine for copying? You would need to shut down BackupPC for the duration of
the copy - is that feasible? 3TB means you're facing about 10 hours even with
'dd', fast hardware and no intervening network - anything more complicated
will obviously take longer. Your pool size is 3TB - how large is the file
system it is on? Is the destination device at least the same size?
How many files are there in your pool?

> > I managed to rsync the  
> > cpool folder without any real bother, but the pool folder is the  
> > problem,

Err, 'pool' or 'pc'? ;-)

> > and a cp -a didn't seem to work
> > either, as the server filled up, assumedly as it's not copying the
> > hard links correctly?

That is an interesting observation. I was always wondering exactly in which
way cp would fail.

> > So my query here really is am I going the right way about this? If  
> > not, what's the best method to take so that say once a day the
> > duplicate server gets updated.

Well, Dan, zfs? ;-)
Presuming we can get an initial copy done (does anyone have any ideas on how
to *verify* a 3TB pool copy?), would migrating the BackupPC servers to an
OpenSolaris kernel be an option, or is that too "experimental"?

> Check the archives for a *lot* of posts on this subject. The general  
> conclusion is that copying or rsyncing big pools just doesn't work  
> because of the large number of hardlinks used by BackupPC. Using rsync  
> 3.x instead of 2.x seems to need a lot less memory, but it just ends  
> at some point.

Because the basic problem for *any general purpose tool* remains: you need a
full inode number to file name mapping for *all files* (there are next to no
files with only one link in a pool FS), meaning *at least* something like 50
bytes per file, probably significantly more. You do the maths: with 100
million files that is already on the order of 5 GB of RAM.

Apparently, cp simply ignores hardlinks once malloc() starts failing, but I'm
just guessing.

This doesn't mean it can't be done. It just means *general purpose tools* will
start to fail at some point.

> A lot of people run into this when they want to migrate  
> their pool to another machine or bigger hard drive. In that case the  
> usual advice is to use dd to copy the partition and then grow the  
> filesystem once it's copied over.

The only problem being that this limits you to the same FS with the same
parameters (meaning if you've set up an ext3 FS with too high or too low
inodes to block ratio, you can't fix it this way). And the fact remains that
copying huge amounts of data simply takes time.

> Instead of trying to sync the pool, can't you just run a second  
> BackupPC server that also backs up your machines?

If you don't need the current backup history on the redundant server, save
yourself the pain of the initial pool copy and just follow this path -
presuming network and client load constraints allow you to.

One other thing: is your pool size due to the amount of backed up data or due
to a long backup history? If you just want to ensure you have a recent version
of your data (but not the complete backup history) in the event of a
catastrophe, archives (rather than a copy of the complete pool) may be what
you're looking for.

Regards,
Holger



Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Nils Breunese (Lemonbit)
Stuart Luscombe wrote:

>
> I’ve got the OS (CentOS) installed on the new server and have  
> installed BackupPC v3.1.0, but I’m having problems working out how  
> to sync the pool with the main backup server. I managed to rsync the  
> cpool folder without any real bother, but the pool folder is the  
> problem, if I try an rsync it eventually dies with an ‘out of  
> memory’ error (the server has 8GB), and a cp –a didn’t seem to work  
> either, as the server filled up, assumedly as it’s not copying the  
> hard links correctly?
>
> So my query here really is am I going the right way about this? If  
> not, what’s the best method to take so that say once a day the  
> duplicate server gets updated.

Check the archives for a *lot* of posts on this subject. The general  
conclusion is that copying or rsyncing big pools just doesn't work  
because of the large number of hardlinks used by BackupPC. Using rsync  
3.x instead of 2.x seems to need a lot less memory, but it just ends  
at some point. A lot of people run into this when they want to migrate  
their pool to another machine or bigger hard drive. In that case the  
usual advice is to use dd to copy the partition and then grow the  
filesystem once it's copied over.

dd'ing your complete pool every day isn't going to work either I  
guess. Instead of trying to sync the pool, can't you just run a second  
BackupPC server that also backs up your machines?

Nils Breunese.


Re: [BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Johan Ehnberg
Stuart Luscombe wrote:
> Hi there,
> 
>  
> 
> I’ve been struggling with this for a little while now so I thought it 
> about time I got some help!
> 
>  
> 
> We currently have a server running BackupPC v3.1.0 which has a pool of 
> around 3TB and we’ve got to a stage where a tape backup of the pool 
> is taking 1-2 weeks, which isn’t effective at all.  The decision was 
> made to buy a server that is an exact duplicate of our current one and 
> have it hosted in another building, as a 2 week old backup isn’t ideal 
> in the event of a disaster.
> 
>  
> 
> I’ve got the OS (CentOS) installed on the new server and have installed 
> BackupPC v3.1.0, but I’m having problems working out how to sync the 
> pool with the main backup server. I managed to rsync the cpool folder 
> without any real bother, but the pool folder is the problem, if I try an 
> rsync it eventually dies with an ‘out of memory’ error (the server has 
> 8GB), and a cp –a didn’t seem to work either, as the server filled up, 
> assumedly as it’s not copying the hard links correctly?
> 
>  
> 
> So my query here really is am I going the right way about this? If not, 
> what’s the best method to take so that say once a day the duplicate 
> server gets updated.
> 
>  
> 
> Many Thanks
> 

All of the folders have to be on the same filesystem, and they should 
all be synced in a single rsync run. Otherwise rsync won't know about 
the hardlinks.

Also, have you tried 'cp -a --preserve=all'? It should be mostly 
redundant but may be worth a shot.

Best regards,
Johan




[BackupPC-users] Advice on creating duplicate backup server

2008-12-08 Thread Stuart Luscombe
Hi there,

 

I've been struggling with this for a little while now so I thought it about
time I got some help!

 

We currently have a server running BackupPC v3.1.0 which has a pool of
around 3TB and we've got to a stage where a tape backup of the pool is
taking 1-2 weeks, which isn't effective at all.  The decision was made to
buy a server that is an exact duplicate of our current one and have it
hosted in another building, as a 2 week old backup isn't ideal in the event
of a disaster.

 

I've got the OS (CentOS) installed on the new server and have installed
BackupPC v3.1.0, but I'm having problems working out how to sync the pool
with the main backup server. I managed to rsync the cpool folder without any
real bother, but the pool folder is the problem, if I try an rsync it
eventually dies with an 'out of memory' error (the server has 8GB), and a cp
-a didn't seem to work either, as the server filled up, assumedly as it's
not copying the hard links correctly?

 

So my query here really is am I going the right way about this? If not,
what's the best method to take so that say once a day the duplicate server
gets updated.

 

Many Thanks

 

--

Stuart Luscombe

Systems Administrator

Dementia Research Centre

8-11 Queen Square

WC1N 3BG London

Direct: 08451 555 000 72 3875

Web : http://www.dementia.ion.ucl.ac.uk  

 
