Re: [GENERAL] Problem with restoring from backup on 9.0.2

2011-01-28 Thread hubert depesz lubaczewski
On Thu, Jan 27, 2011 at 10:23:52PM +0000, Thom Brown wrote:
 Depesz, did you ever resolve this?

Robert Treat did some digging. Current status is: slave backups work as
long as you don't enable hot standby.

I will be working on an omnipitr-backup-slave fix, but can't give you an
ETA at the moment.

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problem with restoring from backup on 9.0.2

2011-01-27 Thread Thom Brown
On 27 December 2010 18:24, hubert depesz lubaczewski dep...@depesz.com wrote:
 [original report and log snipped - quoted in full in the message below]

Depesz, did you ever resolve this?

[GENERAL] Problem with restoring from backup on 9.0.2

2010-12-27 Thread hubert depesz lubaczewski
Hi,
This mail will be relatively long, because I need to explain in detail
what I do and how I do it.

Simple summary:

When restoring from a hot backup, without xlogs in pg_xlog/ but instead
using recovery.conf to get xlogs from the WAL archive, I get "WAL ends
before consistent recovery point" in a case where it shouldn't happen.

Details:

I have the following setup:

The master pg is sending WAL segments to 2 places:
  - /shared/walarchive - long-term storage, compressed content
  - /shared/4backups   - short-term storage, just for doing hot backups on the
    slave, not compressed

For sending I use omnipitr-archive with 2 -dr options, roughly as sketched below.
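
A minimal sketch of what this looks like on the master (the omnipitr path is a
placeholder, logging and compression options are omitted, and only the -dr
destination option mentioned above is assumed):

# postgresql.conf on the master (hypothetical sketch)
archive_mode    = on
archive_command = '/opt/omnipitr/bin/omnipitr-archive -dr /shared/walarchive -dr /shared/4backups "%p"'
# compression of the /shared/walarchive copy is configured separately and not shown here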

The slave pg is configured to read WAL segments from /shared/4backups (using
`cp -i ...`) *and* then to switch to streaming replication (see the sketch below).
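
A minimal sketch of such a recovery.conf on the slave (hostname and user are
placeholders, not my actual settings):

standby_mode     = 'on'
restore_command  = 'cp -i /shared/4backups/%f "%p"'
primary_conninfo = 'host=master.example.com user=replication'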

Additionally, on the slave there is an archive cleanup program, which cleans
/shared/4backups, but only if the /flags/backup file is not present (it is
present only while a backup is in progress); its logic is sketched below.
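
Roughly, the cleanup logic is equivalent to something like this (the retention
rule and paths are placeholders; this is not the actual program):

#!/bin/bash
# skip cleanup entirely while a backup is in progress
[ -e /flags/backup ] && exit 0
# otherwise drop short-term WAL segments older than a few hours
find /shared/4backups -maxdepth 1 -type f -name '0*' -mmin +360 -delete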

To make backups I use omnipitr-backup-slave, which does the following (a rough
sketch of the key steps is shown after the list):

1. creates the /flags/backup file
2. gets a snapshot of pg_controldata
3. makes backup_label in $PGDATA, where START WAL LOCATION is taken from
"Latest checkpoint's REDO location" from step #2, and CHECKPOINT
LOCATION comes from "Latest checkpoint location" in #2
4. compresses the whole $PGDATA
5. gets a snapshot of pg_controldata
6. waits until a new snapshot of pg_controldata (another one is taken every 5
seconds) has a value of "Latest checkpoint location" different from the one
in step #5
7. makes a .backup file in /shared/4backups that contains the same
information as #3, but with STOP WAL LOCATION taken from "Latest
checkpoint location" from #6
8. compresses /shared/4backups to a secondary backup file
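
For illustration, steps 2-3 and 5-7 boil down to something like the following
(a simplified sketch, not the actual omnipitr-backup-slave code; the segment
file name normally shown next to START WAL LOCATION and all error handling are
omitted, and PGDATA is assumed to be set):

#!/bin/bash
ckpt() { pg_controldata "$PGDATA" | sed -n 's/^Latest checkpoint location: *//p'; }
redo() { pg_controldata "$PGDATA" | sed -n "s/^Latest checkpoint's REDO location: *//p"; }

# steps 2-3: write backup_label from the current control data
cat > "$PGDATA/backup_label" <<EOF
START WAL LOCATION: $(redo)
CHECKPOINT LOCATION: $(ckpt)
START TIME: $(date -u '+%Y-%m-%d %H:%M:%S GMT')
LABEL: OmniPITR_Slave_Hot_Backup
EOF

# step 4: compress $PGDATA (omitted)

# steps 5-6: wait until a checkpoint newer than the post-compression one appears
base=$(ckpt)
while [ "$(ckpt)" = "$base" ]; do
    sleep 5
done

# step 7: $(ckpt) now provides STOP WAL LOCATION for the .backup file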

Afterwards I get 2 files: data.tar.gz and xlog.tar.gz.

Now, I took those files, moved them to a 3rd server, and uncompressed them.

Afterwards I got a PGDATA directory, with xlogs in pg_xlog, and no
recovery.conf.
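
For concreteness, the unpacking can be as simple as this (the target directory
and tarball paths are placeholders; I am assuming the tarballs unpack so that
the xlogs land in $PGDATA/pg_xlog):

=$ mkdir -p /var/lib/pgsql/restore && cd /var/lib/pgsql/restore
=$ tar xzf /path/to/data.tar.gz
=$ tar xzf /path/to/xlog.tar.gz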

When starting from this backup, pg starts, reads all necessary segments
from pg_xlog, finishes recovery, and comes up without any problem as a
standalone server.

But when, after extraction and before starting, I remove all xlog files,
move the .backup file from pg_xlog to /shared/walarchive, and set up
recovery.conf using a normal omnipitr-restore invocation to get WAL segments
from /shared/walarchive (roughly as sketched below), pg starts to behave weirdly.
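
In shell terms the preparation is roughly this (sketch only; paths are
placeholders, and a gunzip-based restore_command is shown merely to illustrate
the shape - the real setup calls omnipitr-restore instead):

=$ mv $PGDATA/pg_xlog/0*.backup /shared/walarchive/
=$ rm $PGDATA/pg_xlog/0*

recovery.conf then contains a single line along the lines of:

restore_command = 'gunzip -c /shared/walarchive/%f.gz > "%p"'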

It starts, reads all segments from the first one mentioned in backup_label
to the latest available in /shared/walarchive, and then it fails with
"WAL ends before consistent recovery point" - but this is *well* past
the point at which it should know that the state is consistent: the .backup
file puts the stop location at 33/2C24C800, while the log below shows that
recovery had already replayed WAL well into segment 3A/BD before failing!

File contents from my situation:

=$ cat backup_label.old 
START WAL LOCATION: 33/2213B200 (file 000100330022)
CHECKPOINT LOCATION: 33/23101410
START TIME: 2010-12-27 11:53:44 GMT
LABEL: OmniPITR_Slave_Hot_Backup

(the file had already been renamed to .old by the pg recovery process)

=$ zcat /mnt/db/prod/walarchive/000100330022.0013B200.backup.gz 
START WAL LOCATION: 33/2213B200 (file 000100330022)
STOP WAL LOCATION: 33/2C24C800 (file 00010033002C)
CHECKPOINT LOCATION: 33/23101410
START TIME: 2010-12-27 11:53:44 GMT
START TIME: 2010-12-27 12:06:17 GMT
LABEL: OmniPITR_Slave_Hot_Backup

Last 20 lines from the log:

=$ tail -n 20 pg_log/postgresql-2010-12-27_172759.log 
2010-12-27 17:44:16 UTC  [15768]: [1934-1] LOG:  restored log file 0001003A00B1 from archive
2010-12-27 17:44:16 UTC  [15768]: [1935-1] LOG:  restored log file 0001003A00B2 from archive
2010-12-27 17:44:16 UTC  [15768]: [1936-1] LOG:  restored log file 0001003A00B3 from archive
2010-12-27 17:44:17 UTC  [15768]: [1937-1] LOG:  restored log file 0001003A00B4 from archive
2010-12-27 17:44:17 UTC  [15768]: [1938-1] LOG:  restored log file 0001003A00B5 from archive
2010-12-27 17:44:18 UTC  [15768]: [1939-1] LOG:  restored log file 0001003A00B6 from archive
2010-12-27 17:44:18 UTC  [15768]: [1940-1] LOG:  restored log file 0001003A00B7 from archive
2010-12-27 17:44:19 UTC  [15768]: [1941-1] LOG:  restored log file 0001003A00B8 from archive
2010-12-27 17:44:19 UTC  [15768]: [1942-1] LOG:  restored log file 0001003A00B9 from archive
2010-12-27 17:44:20 UTC  [15768]: [1943-1] LOG:  restored log file 0001003A00BA from archive
2010-12-27 17:44:20 UTC  [15768]: [1944-1] LOG:  restored log file 0001003A00BB from archive
2010-12-27 17:44:21 UTC  [15768]: [1945-1] LOG:  restored log file 0001003A00BC from archive
2010-12-27 17:44:21 UTC  [15768]: [1946-1] LOG:  restored log file 0001003A00BD from archive
2010-12-27 17:44:23 UTC  [15768]: [1947-1] LOG:  could not open file pg_xlog/0001003A00BE (log file 58, segment 190): No such file or directory
2010-12-27 17:44:23 UTC  [15768]: [1948-1] LOG:  redo done at 3A/BDFFF6E8
2010-12-27 17:44:23 UTC  [15768]: [1949-1] LOG:  last completed transaction was at log time 2010-12-27 17:43:23.558028+00