Re: [HACKERS] testing HS/SR - invalid magic number

2010-04-15 Thread Erik Rijkers
On Wed, April 14, 2010 08:23, Heikki Linnakangas wrote:
 Erik Rijkers wrote:
 This replication test that was working well earlier (it ran daily), stopped 
 working

 This is probably because of this change:

 date: 2010/04/12 09:52:29;  author: heikki;  state: Exp;  lines: +71 -23
 Change the logic to decide when to delete old WAL segments, so that it
 doesn't take into account how far the WAL senders are. This way a hung
 WAL sender doesn't prevent old WAL segments from being recycled/removed
 in the primary, ultimately causing the disk to fill up. Instead add
 standby_keep_segments setting to control how many old WAL segments are
 kept in the primary. This also makes it more reliable to use streaming
 replication without WAL archiving, assuming that you set
 standby_keep_segments high enough.

 If you generate enough WAL records in the master that the standby can't
 keep up, the primary will eventually delete a WAL segment that hasn't
 been streamed to the standby yet, hence the requested WAL segment
 00010032 has already been removed error.

ah, I hadn't seen that change, and setting standby_keep_segments
to some higher value (I chose 1 for now) prevents these errors.

It shouldn't
 remove the segment before it's successfully archived, though, and your
 logs show that the standby can't find that file in the archive either.
 Is archiving set up properly?

It wasn't really - I was archiving on the primary server, instead of copying
to the standby server.  I'll fix that, thanks.


Btw, typo alert:  documentation 18.5.4, subhead 'standby_keep_segments',
says: 'replciation' for 'replication'.

http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#RUNTIME-CONFIG-REPLICATION


thanks,


Erik Rijkers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - invalid magic number

2010-04-15 Thread Heikki Linnakangas
Erik Rijkers wrote:
 Btw, typo alert:  documentation 18.5.4, subhead 'standby_keep_segments',
 says: 'replciation' for 'replication'.
 
 http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#RUNTIME-CONFIG-REPLICATION

Thanks, fixed.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - invalid magic number

2010-04-14 Thread Heikki Linnakangas
Erik Rijkers wrote:
 On Wed, April 14, 2010 02:34, Erik Rijkers wrote:
 This replication test that was working well earlier (it ran daily), stopped 
 working
 after reinstall of new instances of cvs HEAD. I think the change must have 
 been today (or at least
 recent).

 This test copies a schema from a 8.4.3 instance to the 9.0 primary:

 pg_dump -c -h /tmp -p 5432 -n public --no-owner --no-privileges ms \
| psql -1qtA -h /tmp -p 7575 -d replicas

 
 The above does indeed not work anymore, but the logfiles I sent a little 
 earlier were from
  a different pg_dump statements:
 
 pg_dump -c -h /tmp -p 5432 -t public.mytable --no-owner --no-privileges ms \
   | psql -1qtA -h /tmp -p 7575 -d replicas
 
 ... copying table by table to see if that improved matters, which it didn't.

Streaming replication is completely oblivious of what's in the WAL it
streams, you would probably get the same result with bulk insert or update.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - invalid magic number

2010-04-14 Thread Heikki Linnakangas
Erik Rijkers wrote:
 This replication test that was working well earlier (it ran daily), stopped 
 working
 after reinstall of new instances of cvs HEAD. I think the change must have 
 been today (or at least
 recent).
 ...
 -- logfile standby:
 ...
 2010-04-14 02:21:11 CEST 5601   start=2010-04-14 02:18:22 CEST FATAL:  could 
 not receive data from
 WAL stream: FATAL:  requested WAL segment 00010032 has 
 already been removed
 
 cp: cannot stat 
 `/var/data1/pg_stuff/dump/hotslave/replication_archive/00010032':
 No such file or directory
 2010-04-14 02:21:11 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  invalid 
 magic number  in
 log file 0, segment 50, offset 13795328
 cp: cannot stat 
 `/var/data1/pg_stuff/dump/hotslave/replication_archive/00010032':
 No such file or directory
 2010-04-14 02:21:11 CEST 5784   start=2010-04-14 02:21:11 CEST LOG:  
 streaming replication
 successfully connected to primary

This is probably because of this change:

 date: 2010/04/12 09:52:29;  author: heikki;  state: Exp;  lines: +71 -23
 Change the logic to decide when to delete old WAL segments, so that it
 doesn't take into account how far the WAL senders are. This way a hung
 WAL sender doesn't prevent old WAL segments from being recycled/removed
 in the primary, ultimately causing the disk to fill up. Instead add
 standby_keep_segments setting to control how many old WAL segments are
 kept in the primary. This also makes it more reliable to use streaming
 replication without WAL archiving, assuming that you set
 standby_keep_segments high enough.

If you generate enough WAL records in the master that the standby can't
keep up, the primary will eventually delete a WAL segment that hasn't
been streamed to the standby yet, hence the requested WAL segment
00010032 has already been removed error. It shouldn't
remove the segment before it's successfully archived, though, and your
logs show that the standby can't find that file in the archive either.
Is archiving set up properly?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] testing HS/SR - invalid magic number

2010-04-13 Thread Erik Rijkers
This replication test that was working well earlier (it ran daily), stopped 
working
after reinstall of new instances of cvs HEAD. I think the change must have been 
today (or at least
recent).

This test copies a schema from a 8.4.3 instance to the 9.0 primary:

pg_dump -c -h /tmp -p 5432 -n public --no-owner --no-privileges ms \
   | psql -1qtA -h /tmp -p 7575 -d replicas


I tried it twice, with a complete reinstall from the cvs repository;
last one just now (created 20100414 0128), and once a few hours earlier.





-- logfile primary:

LOG:  database system was shut down at 2010-04-14 02:15:01 CEST
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
2010-04-14 02:15:14 CEST 14756   start=2010-04-14 02:15:14 CEST LOG:  database 
system was shut
down at 2010-04-14 02:15:12 CEST
2010-04-14 02:15:14 CEST 14752   start=2010-04-14 02:15:13 CEST LOG:  database 
system is ready to
accept connections
2010-04-14 02:18:23 CEST 15100 [unknown] rijkers start=2010-04-14 02:18:22 CEST 
LOG:  replication
connection authorized: user=rijkers host=127.0.0.1 port=55248
[...]
2010-04-14 02:21:10 CEST 15100 [unknown] rijkers start=2010-04-14 02:18:22 CEST 
FATAL:  requested
WAL segment 00010032 has already been removed
2010-04-14 02:21:11 CEST 15377 [unknown] rijkers start=2010-04-14 02:21:11 CEST 
LOG:  replication
connection authorized: user=rijkers host=127.0.0.1 port=50869
2010-04-14 02:21:11 CEST 15377 [unknown] rijkers start=2010-04-14 02:21:11 CEST 
FATAL:  requested
WAL segment 00010032 has already been removed
2010-04-14 02:21:16 CEST 15384 [unknown] rijkers start=2010-04-14 02:21:16 CEST 
LOG:  replication
connection authorized: user=rijkers host=127.0.0.1 port=50870
2010-04-14 02:21:16 CEST 15384 [unknown] rijkers start=2010-04-14 02:21:16 CEST 
FATAL:  requested
WAL segment 00010032 has already been removed
2010-04-14 02:21:21 CEST 15400 [unknown] rijkers start=2010-04-14 02:21:21 CEST 
LOG:  replication
connection authorized: user=rijkers host=127.0.0.1 port=50871
2010-04-14 02:21:22 CEST 15400 [unknown] rijkers start=2010-04-14 02:21:21 CEST 
FATAL:  requested
WAL segment 00010032 has already been removed
2010-04-14 02:21:26 CEST 15406 [unknown] rijkers start=2010-04-14 02:21:26 CEST 
LOG:  replication
connection authorized: user=rijkers host=127.0.0.1 port=50872

etc.


-- logfile standby:
2010-04-14 02:18:22 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  database 
system was
interrupted; last known up at 2010-04-14 02:17:54 CEST
2010-04-14 02:18:22 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  entering 
standby mode
cp: cannot stat 
`/var/data1/pg_stuff/dump/hotslave/replication_archive/00010001':
No such file or directory
2010-04-14 02:18:23 CEST 5601   start=2010-04-14 02:18:22 CEST LOG:  streaming 
replication
successfully connected to primary
2010-04-14 02:18:24 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  redo 
starts at 0/120
2010-04-14 02:18:24 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  consistent 
recovery state
reached at 0/200
2010-04-14 02:18:24 CEST 5597   start=2010-04-14 02:18:22 CEST LOG:  database 
system is ready to
accept read only connections
2010-04-14 02:21:11 CEST 5601   start=2010-04-14 02:18:22 CEST FATAL:  could 
not receive data from
WAL stream: FATAL:  requested WAL segment 00010032 has already 
been removed

cp: cannot stat 
`/var/data1/pg_stuff/dump/hotslave/replication_archive/00010032':
No such file or directory
2010-04-14 02:21:11 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  invalid 
magic number  in
log file 0, segment 50, offset 13795328
cp: cannot stat 
`/var/data1/pg_stuff/dump/hotslave/replication_archive/00010032':
No such file or directory
2010-04-14 02:21:11 CEST 5784   start=2010-04-14 02:21:11 CEST LOG:  streaming 
replication
successfully connected to primary
2010-04-14 02:21:11 CEST 5784   start=2010-04-14 02:21:11 CEST FATAL:  could 
not receive data from
WAL stream: FATAL:  requested WAL segment 00010032 has already 
been removed

cp: cannot stat 
`/var/data1/pg_stuff/dump/hotslave/replication_archive/00010032':
No such file or directory
2010-04-14 02:21:11 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  invalid 
magic number  in
log file 0, segment 50, offset 13795328
cp: cannot stat 
`/var/data1/pg_stuff/dump/hotslave/replication_archive/00010032':
No such file or directory
2010-04-14 02:21:16 CEST 5790   start=2010-04-14 02:21:16 CEST LOG:  streaming 
replication
successfully connected to primary
2010-04-14 02:21:16 CEST 5790   start=2010-04-14 02:21:16 CEST FATAL:  could 
not receive data from
WAL stream: FATAL:  requested WAL segment 00010032 has already 
been removed

Re: [HACKERS] testing HS/SR - invalid magic number

2010-04-13 Thread Erik Rijkers
On Wed, April 14, 2010 02:34, Erik Rijkers wrote:
 This replication test that was working well earlier (it ran daily), stopped 
 working
 after reinstall of new instances of cvs HEAD. I think the change must have 
 been today (or at least
 recent).

 This test copies a schema from a 8.4.3 instance to the 9.0 primary:

 pg_dump -c -h /tmp -p 5432 -n public --no-owner --no-privileges ms \
| psql -1qtA -h /tmp -p 7575 -d replicas


The above does indeed not work anymore, but the logfiles I sent a little 
earlier were from
 a different pg_dump statements:

pg_dump -c -h /tmp -p 5432 -t public.mytable --no-owner --no-privileges ms \
  | psql -1qtA -h /tmp -p 7575 -d replicas

... copying table by table to see if that improved matters, which it didn't.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers