Re: [GENERAL] Problem with restoring from backup on 9.0.2
On Thu, Jan 27, 2011 at 10:23:52PM +0000, Thom Brown wrote:
> Depesz, did you ever resolve this?

Robert Treat did some digging. Current status is: slave backups work as long as you don't enable hot standby. I will be working on an omnipitr-backup-slave fix, but can't give you an ETA at the moment.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
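[Editor's note: until that fix is out, the workaround implied above is simply not to enable hot standby. A minimal sketch, assuming it is the backup-source slave's postgresql.conf that the status refers to:

    # postgresql.conf on the slave that omnipitr-backup-slave runs against
    # ('off' is already the default in 9.0 -- the point, per the status
    # above, is not to turn it on while these backups need to stay restorable)
    hot_standby = off
]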
Re: [GENERAL] Problem with restoring from backup on 9.0.2
Depesz, did you ever resolve this?

On 27 December 2010 18:24, hubert depesz lubaczewski <dep...@depesz.com> wrote:
> [... original message quoted in full; see the post below ...]
[GENERAL] Problem with restoring from backup on 9.0.2
hi,
this mail will be relatively long, because I need to explain in detail what I do and how.

Simple summary: when restoring from a hot backup, without xlogs in pg_xlog/ but instead using recovery.conf to get xlogs from the WAL archive, I get "WAL ends before consistent recovery point" in a case where it shouldn't happen.

Details. I have the following setup:

master pg is sending wal segments to 2 places:
- /shared/walarchive - long-term storage, compressed content
- /shared/4backups - short-term storage, just to do hot backups on the slave, not compressed
for sending I use omnipitr-archive, with 2 -dr options.

slave pg is configured to read wal segments from /shared/4backups (using `cp -i ...`) *and* then to switch to streaming replication (a sketch of this recovery.conf follows below).

additionally, on the slave there is an archive cleanup program, which cleans /shared/4backups, but only if the /flags/backup file is not present (it is present only while a backup is being taken).

to make backups I use omnipitr-backup-slave, which does the following (a rough shell sketch of steps 2-3 and 5-6 follows below):
1. creates the /flags/backup file
2. gets a snapshot of pg_controldata
3. makes backup_label in $PGDATA, where START WAL LOCATION is taken from "Latest checkpoint's REDO location" from step #2, and CHECKPOINT LOCATION comes from "Latest checkpoint location" in #2
4. compresses the whole $PGDATA
5. gets a snapshot of pg_controldata
6. waits until a new snapshot of pg_controldata (it makes another one every 5 seconds) has a value of "Latest checkpoint location" different from the one in step #5
7. makes a .backup file in /shared/4backups that contains the same information as #3, but with STOP WAL LOCATION taken from "Latest checkpoint location" from #6
8. compresses /shared/4backups to a secondary backup file

afterwards I get 2 files - data.tar.gz and xlog.tar.gz.

Now. I took those files, moved them to a 3rd server, and uncompressed them. Afterwards I had a PGDATA directory, with xlogs in pg_xlog, and no recovery.conf. When starting from this backup, pg starts, reads all necessary segments from pg_xlog, finishes recovery, and runs without any problem as a standalone server.

but when, after extraction and before starting, I remove all xlog files, move the .backup file from pg_xlog to /shared/walarchive, and set up recovery.conf using normal omnipitr-restore to get wal segments from /shared/walarchive - pg starts to behave weirdly. It starts, reads all segments from the first one mentioned in backup_label to the latest one available in /shared/walarchive, and then fails with "WAL ends before consistent recovery point" - but this is *well* past the point at which it should know that the state is consistent!
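[Editor's note: for reference, a sketch of the slave's recovery.conf described above. Only the /shared/4backups path and the `cp -i` restore come from the setup description; the host and user in primary_conninfo are hypothetical:

    # recovery.conf on the slave (sketch; connection values are made up)
    standby_mode = 'on'
    # read wal segments from the short-term archive first...
    restore_command = 'cp -i /shared/4backups/%f "%p"'
    # ...then switch to streaming replication once the archive is exhausted
    primary_conninfo = 'host=master.example.com port=5432 user=replicator'
]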
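[Editor's note: a rough shell sketch of steps 2-3 and 5-6. The pg_controldata field names are real; the variable names, the loop shape, and the skipped segment-file-name arithmetic are illustrative, not OmniPITR's actual code:

    # steps 2-3: snapshot pg_controldata, build backup_label from it
    redo=$(pg_controldata "$PGDATA" | sed -n "s/^Latest checkpoint.s REDO location: *//p")
    ckpt=$(pg_controldata "$PGDATA" | sed -n "s/^Latest checkpoint location: *//p")
    # (the real tool also derives the 24-character segment name for the
    #  "(file ...)" part; that arithmetic is omitted here)
    cat > "$PGDATA/backup_label" <<EOF
    START WAL LOCATION: $redo (file ...)
    CHECKPOINT LOCATION: $ckpt
    START TIME: $(date -u '+%Y-%m-%d %H:%M:%S GMT')
    LABEL: OmniPITR_Slave_Hot_Backup
    EOF

    # step 5: another snapshot, taken after $PGDATA has been compressed
    base=$(pg_controldata "$PGDATA" | sed -n "s/^Latest checkpoint location: *//p")

    # step 6: poll every 5 seconds until "Latest checkpoint location" moves
    while [ "$(pg_controldata "$PGDATA" | sed -n "s/^Latest checkpoint location: *//p")" = "$base" ]; do
        sleep 5
    done
]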
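[Editor's note: the failing restore then used a recovery.conf along these lines. The actual omnipitr-restore command line isn't quoted in the message, so a plain gunzip pipeline stands in for it here:

    # recovery.conf on the 3rd server (sketch; the real setup used
    # omnipitr-restore -- this stand-in just does the equivalent
    # decompress-from-/shared/walarchive step)
    restore_command = 'gunzip -c /shared/walarchive/%f.gz > "%p"'
]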
Files content from my situation:

=$ cat backup_label.old
START WAL LOCATION: 33/2213B200 (file 000100330022)
CHECKPOINT LOCATION: 33/23101410
START TIME: 2010-12-27 11:53:44 GMT
LABEL: OmniPITR_Slave_Hot_Backup

(the file already got renamed to .old by the pg recovery process)

=$ zcat /mnt/db/prod/walarchive/000100330022.0013B200.backup.gz
START WAL LOCATION: 33/2213B200 (file 000100330022)
STOP WAL LOCATION: 33/2C24C800 (file 00010033002C)
CHECKPOINT LOCATION: 33/23101410
START TIME: 2010-12-27 11:53:44 GMT
START TIME: 2010-12-27 12:06:17 GMT
LABEL: OmniPITR_Slave_Hot_Backup

last 20 lines from the log:

=$ tail -n 20 pg_log/postgresql-2010-12-27_172759.log
2010-12-27 17:44:16 UTC [15768]: [1934-1] LOG: restored log file 0001003A00B1 from archive
2010-12-27 17:44:16 UTC [15768]: [1935-1] LOG: restored log file 0001003A00B2 from archive
2010-12-27 17:44:16 UTC [15768]: [1936-1] LOG: restored log file 0001003A00B3 from archive
2010-12-27 17:44:17 UTC [15768]: [1937-1] LOG: restored log file 0001003A00B4 from archive
2010-12-27 17:44:17 UTC [15768]: [1938-1] LOG: restored log file 0001003A00B5 from archive
2010-12-27 17:44:18 UTC [15768]: [1939-1] LOG: restored log file 0001003A00B6 from archive
2010-12-27 17:44:18 UTC [15768]: [1940-1] LOG: restored log file 0001003A00B7 from archive
2010-12-27 17:44:19 UTC [15768]: [1941-1] LOG: restored log file 0001003A00B8 from archive
2010-12-27 17:44:19 UTC [15768]: [1942-1] LOG: restored log file 0001003A00B9 from archive
2010-12-27 17:44:20 UTC [15768]: [1943-1] LOG: restored log file 0001003A00BA from archive
2010-12-27 17:44:20 UTC [15768]: [1944-1] LOG: restored log file 0001003A00BB from archive
2010-12-27 17:44:21 UTC [15768]: [1945-1] LOG: restored log file 0001003A00BC from archive
2010-12-27 17:44:21 UTC [15768]: [1946-1] LOG: restored log file 0001003A00BD from archive
2010-12-27 17:44:23 UTC [15768]: [1947-1] LOG: could not open file pg_xlog/0001003A00BE (log file 58, segment 190): No such file or directory
2010-12-27 17:44:23 UTC [15768]: [1948-1] LOG: redo done at 3A/BDFFF6E8
2010-12-27 17:44:23 UTC [15768]: [1949-1] LOG: last completed transaction was at log time 2010-12-27 17:43:23.558028+00
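[Editor's note: in 9.0, "WAL ends before consistent recovery point" is raised when recovery runs out of WAL before reaching the minimum recovery ending location recorded in pg_control. Not from the original thread, but a quick way to see what consistency target the restored cluster is carrying (the field name is as pg_controldata prints it in 9.0):

    =$ pg_controldata $PGDATA | grep 'Minimum recovery ending location'

If that location is ahead of the STOP WAL LOCATION from the .backup file above, recovery will refuse to declare consistency even well past 33/2C24C800, which would match the behaviour shown in the log.]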