Re: [HACKERS] Database backup mechanism

2007-02-12 Thread Robert Treat
On Friday 09 February 2007 08:16, Doug Knight wrote:
 I would also be interested in any creative ways to reduce the size and
 time it takes to back up databases/clusters. We were just having a
 conversation about this yesterday. We were mulling over things like using
 rsync to back up only the files in the database directory tree that
 actually changed, or maybe doing a selective backup of files based on
 modification times, etc., but we were unsure whether this would be a safe,
 reliable way to back up a reduced set of data.


The biggest factor in backing up terabyte-scale databases is the amount of 
activity you have going on.  If you have a fairly static data set, you can 
try using rsync or custom pg_dump scripts that pipe specific tables to 
something like split as needed. Both of these methods will require some 
finagling, so make sure to test them before relying on them.  You could 
probably also use something like Slony, although the initial data copy will 
be painful; again, on a fairly static set of data it could work. 

If you're doing large amounts of transactional activity, these methods are 
going to become unworkable.  We push about 2 GB of WAL an hour on one of our 
systems, and the only method that seems workable is PITR, with weekly 
filesystem snapshots as the base and the xlogs copied offline for replay.  
It's still tricky to get right, but it seems to work.  
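For anyone heading down that road, a minimal sketch of the weekly base-backup 
step might look like the commands below. The paths, user, and label are made 
up for illustration, and it assumes archive_command is already shipping WAL 
segments somewhere safe; the authoritative procedure is the Continuous 
Archiving chapter of the manual.

    # assumes archive_command already copies finished WAL to /backup/wal_archive
    psql -U postgres -c "SELECT pg_start_backup('weekly_base');"
    # file-level copy (or filesystem snapshot) of the data directory, server running
    rsync -a /var/lib/pgsql/data/ /backup/base/
    psql -U postgres -c "SELECT pg_stop_backup();"
    # keep the base copy plus every WAL segment archived since the backup started

Recovery then means restoring the base copy and replaying the archived WAL 
via a restore_command in recovery.conf.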

-- 
Robert Treat
Database Architect
OmniTI Computer Consulting, Inc.



[HACKERS] Database backup mechanism

2007-02-09 Thread RaviKumar.Mandala
Hi Folks,

We have a requirement to deal with large databases, on the order of 
terabytes, when we go into production. What is the best database backup 
mechanism, and what are the possible issues?

pg_dump can back up a database, but the dump file is limited by the OS 
file-size limit. What about the option of compressing the dump file? How 
much time does it generally take for large databases? I heard that it can 
take far too long (even one or more days), though I haven't tried it out 
myself.

What about taking a zipped backup of the database directory? We tried this 
out, but the checkpoint data in the pg_xlog directory is also being backed 
up. Since these logs keep growing from day one of database creation, the 
backup size increases drastically.
Can we back up certain subdirectories without loss of information or 
consistency?

Any quick comments/suggestions in this regard would be very helpful.

Thanks in advance,
Ravi Kumar Mandala

Re: [HACKERS] Database backup mechanism

2007-02-09 Thread Heikki Linnakangas

[EMAIL PROTECTED] wrote:
We have a requirement to deal with large databases, on the order of 
terabytes, when we go into production. What is the best database backup 
mechanism, and what are the possible issues?


It depends.

Make sure you read Chapter 23, Backup and Restore, in the user manual:

http://www.postgresql.org/docs/8.2/interactive/backup.html

It discusses pg_dump and restore, as well as file system level backup. 
You'll probably want to set up continuous archiving, which allows you to 
take a file-system level backup without shutting down the database.
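To give a concrete idea of what that involves, the core of the setup is a 
single postgresql.conf line telling the server how to copy each finished WAL 
segment somewhere safe. The destination below is only a placeholder; the 
manual chapter above covers the matching restore_command used at recovery 
time.

    # postgresql.conf -- example archive location only
    archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'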


What about taking a zipped backup of the database directory? We tried this 
out, but the checkpoint data in the pg_xlog directory is also being backed 
up. Since these logs keep growing from day one of database creation, the 
backup size increases drastically.


The number of WAL files in the pg_xlog directory is controlled by the 
checkpoint_segments configuration parameter.
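As a rough rule of thumb (per the manual, not a tuning recommendation), 
pg_xlog normally holds no more than about 2 * checkpoint_segments + 1 
segment files of 16 MB each, for example:

    # postgresql.conf -- illustrative value only
    checkpoint_segments = 3   # pg_xlog then stays around (2*3 + 1) * 16 MB, roughly 112 MB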


Can we back up certain subdirectories without loss of information or 
consistency?


No.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Database backup mechanism

2007-02-09 Thread Doug Knight
I would also be interested in any creative ways to reduce the size and
time it takes to back up databases/clusters. We were just having a
conversation about this yesterday. We were mulling over things like using
rsync to back up only the files in the database directory tree that
actually changed, or maybe doing a selective backup of files based on
modification times, etc., but we were unsure whether this would be a safe,
reliable way to back up a reduced set of data.

Doug Knight
WSI Inc.
Andover, MA
 
On Fri, 2007-02-09 at 12:45 +0530, [EMAIL PROTECTED] wrote:
 
 Hi Folks, 
 
 We have a requirement to deal with large databases, on the order of
 terabytes, when we go into production. What is the best database
 backup mechanism, and what are the possible issues? 
 
 pg_dump can back up a database, but the dump file is limited by the
 OS file-size limit. What about the option of compressing the dump file?
 How much time does it generally take for large databases? I heard that
 it can take far too long (even one or more days), though I haven't
 tried it out myself. 
 
 What about taking a zipped backup of the database directory? We tried
 this out, but the checkpoint data in the pg_xlog directory is also
 being backed up. Since these logs keep growing from day one of database
 creation, the backup size increases drastically. 
 Can we back up certain subdirectories without loss of information or
 consistency? 
 
 Any quick comments/suggestions in this regard would be very helpful. 
 
 Thanks in advance, 
 Ravi Kumar Mandala


Re: [HACKERS] Database backup mechanism

2007-02-09 Thread Andrew Dunstan

[EMAIL PROTECTED] wrote:


Hi Folks,

We have a requirement to deal with large databases, on the order of 
terabytes, when we go into production. What is the best database 
backup mechanism, and what are the possible issues?


pg_dump can back up a database, but the dump file is limited by the OS 
file-size limit. What about the option of compressing the dump file? 
How much time does it generally take for large databases? I heard that 
it can take far too long (even one or more days), though I haven't 
tried it out myself.


What about taking a zipped backup of the database directory? We tried 
this out, but the checkpoint data in the pg_xlog directory is also being 
backed up. Since these logs keep growing from day one of database 
creation, the backup size increases drastically.
Can we back up certain subdirectories without loss of information or 
consistency?


Any quick comments/suggestions in this regard would be very helpful.



Please ask in the correct forum, either pgsql-general or pgsql-admin. 
This list is strictly for discussion of development of postgres, not 
usage questions.


(If all you need is a pg_dump backup, maybe you could just pipe its 
output to something like 'split -a 5 -b 1000m - mybackup')
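Spelled out, that pipeline might look like the following; the database name 
and chunk size are placeholders, and the compressed variant is an optional 
extra rather than part of the suggestion above.

    # dump into 1000 MB pieces named mybackupaaaaa, mybackupaaaab, ...
    pg_dump mydb | split -a 5 -b 1000m - mybackup
    # restore by concatenating the pieces back into psql
    cat mybackup* | psql mydb
    # compressed variant (gunzip the concatenated stream before psql on restore):
    # pg_dump mydb | gzip | split -a 5 -b 1000m - mybackup.gz.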


cheers

andrew



Re: [HACKERS] Database backup mechanism

2007-02-09 Thread Richard Huxton

Doug Knight wrote:

I would also be interested in any creative ways to reduce the size and
time it takes to back up databases/clusters. We were just having a
conversation about this yesterday. We were mulling over things like using
rsync to back up only the files in the database directory tree that
actually changed, or maybe doing a selective backup of files based on
modification times, etc., but we were unsure whether this would be a safe,
reliable way to back up a reduced set of data.


Most virtual-layer filesystems (or volume managers) should offer a snapshot 
facility that lets the database think it's writing to the files while you 
see a static version. You could rsync that against an older file-level copy 
as the base copy in a PITR backup.


Note - even with a snapshot facility you need to use PITR or stop the 
database to get a guaranteed working copy.
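As one purely illustrative example, on a box whose data directory sits on 
LVM, the snapshot-plus-rsync step could look roughly like this; the volume 
group, sizes, and mount points are assumptions about the local setup.

    # freeze a point-in-time view of the volume holding the data directory
    lvcreate --snapshot --size 10G --name pgdata_snap /dev/vg0/pgdata
    mount -o ro /dev/vg0/pgdata_snap /mnt/pgdata_snap
    # rsync the frozen view against the previous file-level base copy
    rsync -a --delete /mnt/pgdata_snap/ /backup/base/
    umount /mnt/pgdata_snap
    lvremove -f /dev/vg0/pgdata_snap

As noted above, a copy taken this way is only guaranteed usable when paired 
with PITR (or a cleanly stopped server).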


--
  Richard Huxton
  Archonet Ltd
