subject:"\[HACKERS\] Hot Backup with rsync fails at pg

On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

 +1 ... I found that overly cute also.

It would have been even easier to move StartupCLOG() later, but then
we'd need a big comment explaining why CLOG starts up at one point and
subtrans starts up at another point, since that is very confusing way
of doing things. I wrote it that way first and it definitely looks
strange.

It's much easier to understand that StartupCLOG() is actually a no-op
and that we need to trim the clog at the end of recovery in all cases.

The patch isn't meant to be cute, just a better of way of expressing
what needs to be done, so I think the patch should stay that way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Heikki Linnakangas


On 27.10.2011 02:29, Florian Pflug wrote:

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
   The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot () advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.


No, clog extension is WAL-logged while holding the XidGenLock. At step 
3, LogStandbySnapshot() would block until the clog-extension record is 
written to WAL, so crash recovery would see and replay that record 
before calling StartupCLOG().


That can happen during hot standby, though, because StartupCLOG() is 
called earlier.



My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts().

Here's what I image CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() do modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.


Hmm, I don't think that fully fixes the problem. Even if you're certain 
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might 
decide to truncate it away again while the checkpoint is running.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Robert Haas

On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

 +1 ... I found that overly cute also.

 It would have been even easier to move StartupCLOG() later, but then
 we'd need a big comment explaining why CLOG starts up at one point and
 subtrans starts up at another point, since that is very confusing way
 of doing things. I wrote it that way first and it definitely looks
 strange.

 It's much easier to understand that StartupCLOG() is actually a no-op
 and that we need to trim the clog at the end of recovery in all cases.

If it's a no-op, why have it at all?  I know we have a bunch of places
in the code where we have empty stubs where there used to be
initialization or cleanup code, but I've never found that particularly
good style.  If something no longer requires initialization in a
certain place, I think we should nuke the whole function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug

On Oct27, 2011, at 08:57 , Heikki Linnakangas wrote:
 On 27.10.2011 02:29, Florian Pflug wrote:
 Per my theory about the cause of the problem in my other mail, I think you
 might see StartupCLOG failures even during crash recovery, provided that
 wal_level was set to hot_standby when the primary crashed. Here's how
 
 1) We start a checkpoint, and get as far as LogStandbySnapshot()
 2) A backend does AssignTransactionId, and gets as far as 
 GetTransactionoId().
   The assigned XID requires CLOG extension.
 3) The checkpoint continues, and LogStandbySnapshot () advances the
   checkpoint's nextXid to the XID assigned in (2).
 4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.
 
 Then StartupCLOG() would fail at the end of recovery, because we'd end up
 with a nextXid whose corresponding CLOG page doesn't exist.
 
 No, clog extension is WAL-logged while holding the XidGenLock. At step 3,
 LogStandbySnapshot() would block until the clog-extension record is written
 to WAL, so crash recovery would see and replay that record before calling
 StartupCLOG().

Hm, true. But it still seems wrong for LogStandbySnapshot() to modify the
checkpoint's nextXid, and even more wrong to do that only if wal_mode =
hot_standby. Plus, I think it's a smart idea to verify that the required
parts of the CLOG are available at the start of recovery. Because if they're
missing, the data on the standby *will* be corrupted. Is there any argument
against doiing LogStandbySnapshot() earlier (i.e., at the time oldestActiveXid
is computed)?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com 
 wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

 +1 ... I found that overly cute also.

 It would have been even easier to move StartupCLOG() later, but then
 we'd need a big comment explaining why CLOG starts up at one point and
 subtrans starts up at another point, since that is very confusing way
 of doing things. I wrote it that way first and it definitely looks
 strange.

 It's much easier to understand that StartupCLOG() is actually a no-op
 and that we need to trim the clog at the end of recovery in all cases.

 If it's a no-op, why have it at all?  I know we have a bunch of places
 in the code where we have empty stubs where there used to be
 initialization or cleanup code, but I've never found that particularly
 good style.  If something no longer requires initialization in a
 certain place, I think we should nuke the whole function.

It is a no-op for exactly the same reason other similar functions are
no-ops - it used to do something but now does not.

Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
will immediately ask why?.
IMHO it's easier to have an obviously named function than a comment -
its less invasive for a backpatch as well.

I'm following current code style. If you wish to change that, feel
free to change this and all other locations that do this. Until then,
doing this makes most sense and follows current coding style.

If I had done it the way you suggest, I don't doubt someone would say
in about 6 months Which idiot removed StartupClog()?.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug f...@phlo.org wrote:

 Per my theory about the cause of the problem in my other mail, I think you
 might see StartupCLOG failures even during crash recovery, provided that
 wal_level was set to hot_standby when the primary crashed. Here's how

 1) We start a checkpoint, and get as far as LogStandbySnapshot()
 2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
  The assigned XID requires CLOG extension.
 3) The checkpoint continues, and LogStandbySnapshot () advances the
  checkpoint's nextXid to the XID assigned in (2).
 4) We crash after writing the checkpoint record, but before the CLOG
  extension makes it to the disk, and before any trace of the XID assigned
  in (2) makes it to the xlog.

 Then StartupCLOG() would fail at the end of recovery, because we'd end up
 with a nextXid whose corresponding CLOG page doesn't exist.

Clog extension holds XidGenLock, as does LogStandbySnapshot, which
specifically excludes the above scenario.


 Quite aside from that concern, I think it's probably not a good idea
 for the nextXid value of a checkpoint to depend on whether wal_level
 was set to hot_standby or not. Our recovery code is already quite complex
 and hard to test, and this just adds one more combination that has to
 be thought about while coding and that needs to be tested.

 My suggestion is to fix the CLOG problem in that same way that you fixed
 the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
 CheckPointGuts().

 Here's what I image CreateCheckPoint() should look like:

 1) LogStandbySnapshot() and fill out oldestActiveXid
 2) Fill out REDO
 3) Wait for concurrent commits
 4) Fill out nextXid and the other fields
 5) CheckPointGuts()
 6) Rest

 It's then no longer necessary for LogStandbySnapshot() do modify
 the nextXid, since we fill out nextXid after LogStandbySnapshot() and
 will thus derive a higher value than LogStandbySnapshot() would have.

 We could then also fold GetOldestActiveTransactionId() back into
 your proposed LogStandbySnapshot() and thus don't need two ProcArray
 traversals.

I think you make a good case for doing this.

However, I'm concerned that moving LogStandbySnapshot() in a backpatch
seems more risky than it's worth. We could easily introduce a new bug
into what we would all agree is a complex piece of code. Minimal
change seems best in this case.

And also, 2 ProcArray traversals is not a problem there.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug

On Oct27, 2011, at 15:51 , Simon Riggs wrote:
 On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug f...@phlo.org wrote:
 Here's what I image CreateCheckPoint() should look like:
 
 1) LogStandbySnapshot() and fill out oldestActiveXid
 2) Fill out REDO
 3) Wait for concurrent commits
 4) Fill out nextXid and the other fields
 5) CheckPointGuts()
 6) Rest
 
 It's then no longer necessary for LogStandbySnapshot() do modify
 the nextXid, since we fill out nextXid after LogStandbySnapshot() and
 will thus derive a higher value than LogStandbySnapshot() would have.
 
 We could then also fold GetOldestActiveTransactionId() back into
 your proposed LogStandbySnapshot() and thus don't need two ProcArray
 traversals.
 
 I think you make a good case for doing this.
 
 However, I'm concerned that moving LogStandbySnapshot() in a backpatch
 seems more risky than it's worth. We could easily introduce a new bug
 into what we would all agree is a complex piece of code. Minimal
 change seems best in this case.

OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
Your proposed patch changes that, which also carries a risk since something
could depend on these values being in sync. Especially since both the logged
snapshot and oldestActiveXid influence the snapshot tracking on the slave.

But since you wrote most of that code, your judgement about the relative
risks of these two approaches obviously out-weights mine.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Tom Lane

Simon Riggs si...@2ndquadrant.com writes:
 On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 It's much easier to understand that StartupCLOG() is actually a no-op
 and that we need to trim the clog at the end of recovery in all cases.

 If it's a no-op, why have it at all?

 It is a no-op for exactly the same reason other similar functions are
 no-ops - it used to do something but now does not.

 Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
 will immediately ask why?.

I think it's a good point that StartupCLog doesn't exist in a vacuum
but should be parallel to the init functions for the other SLRU modules.
So at this point I think I agree with Simon's approach.  However, the
obvious next question is whether those other modules don't need to be
changed also, and if not why not.

Another issue is that if StartupCLog is left as a no-op, what will
happen if someone mistakenly tries to access clog before the trim
function is called?  It would be a good idea to make sure that such
a thing results in an easily-identifiable failure.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug f...@phlo.org wrote:

 I think you make a good case for doing this.

 However, I'm concerned that moving LogStandbySnapshot() in a backpatch
 seems more risky than it's worth. We could easily introduce a new bug
 into what we would all agree is a complex piece of code. Minimal
 change seems best in this case.

 OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
 Your proposed patch changes that, which also carries a risk since something
 could depend on these values being in sync. Especially since both the logged
 snapshot and oldestActiveXid influence the snapshot tracking on the slave.

 But since you wrote most of that code, your judgement about the relative
 risks of these two approaches obviously out-weights mine.

We must move oldestActiveXid since that is the source of a bug. There
is no need to move LogStandbySnapshot(), so I am suggesting we don't
do that for the backpatch. I was going to implement it the way you
suggest in HEAD, since I agree that is a cleaner way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Thu, Oct 27, 2011 at 3:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 However, the
 obvious next question is whether those other modules don't need to be
 changed also, and if not why not.

Good point.

StartupSubtrans() is also changed by this patch, since it will be
supplied with an earlier initialisation value.

StartupMultiXact() didn't need changing, I thought, but I will review further.

 Another issue is that if StartupCLog is left as a no-op, what will
 happen if someone mistakenly tries to access clog before the trim
 function is called?  It would be a good idea to make sure that such
 a thing results in an easily-identifiable failure.

The old StartupCLOG() didn't do anything that was essential to using
the clog, which is why its a no-op.

You can still use the clog, just with zero startup.

Maybe setting the current page should go in at startup, will think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug

On Oct27, 2011, at 16:30 , Simon Riggs wrote:
 On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug f...@phlo.org wrote:
 
 I think you make a good case for doing this.
 
 However, I'm concerned that moving LogStandbySnapshot() in a backpatch
 seems more risky than it's worth. We could easily introduce a new bug
 into what we would all agree is a complex piece of code. Minimal
 change seems best in this case.
 
 OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
 Your proposed patch changes that, which also carries a risk since something
 could depend on these values being in sync. Especially since both the logged
 snapshot and oldestActiveXid influence the snapshot tracking on the slave.
 
 But since you wrote most of that code, your judgement about the relative
 risks of these two approaches obviously out-weights mine.
 
 We must move oldestActiveXid since that is the source of a bug. There
 is no need to move LogStandbySnapshot(), so I am suggesting we don't
 do that for the backpatch. I was going to implement it the way you
 suggest in HEAD, since I agree that is a cleaner way.

Sound good.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Tue, Oct 25, 2011 at 10:06 PM, Chris Redekop ch...@replicon.com wrote:
 Chris, can you rearrange the backup so you copy the pg_control file as
 the first act after the pg_start_backup?
 I tried this and it doesn't seem to make any difference.

It won't, that was a poor initial diagnosis on my part.

 I also tried the
 patch and I can no longer reproduce the subtrans error

Good

, however instead it
 now it starts up, but never gets to the point where it'll accept
 connections.  It starts up but if I try to do anything I always get FATAL:
  the database system is starting up...even if the load is removed from the
 primary, the standby still never finishes starting up.
...
 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG:  consistent state delayed
 because recovery snapshot incomplete
...

This is a different problem and has already been reported by one of
your colleagues in a separate thread, and answered in detail by me
there. There is no bug related to this error message.

From here, it looks like the published fixes the originally reported problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Oct25, 2011, at 14:51 , Simon Riggs wrote:
 On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org wrote:
 
 What I don't understand is how this affects the CLOG. How does 
 oldestActiveXID
 factor into CLOG initialization?
 
 It is an entirely different error.

Ah, OK. I assumed that you believe the wrong oldestActiveXID computation
solved both the SUBTRANS-related *and* the CLOG-related errors, since you
said We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly at the start of the mail.

 Chris' clog error was caused by a file read error. The file was
 opened, we did a seek within the file and then the call to read()
 failed to return a complete page from the file.
 
 The xid shown is 22811359, which is the nextxid in the control file.
 
 So StartupClog() must have failed trying to read the clog page from disk.

Yep.

 That isn't a Hot Standby problem, a recovery problem nor is it certain
 its a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably
not a pilot error since it happens even for backups taken with pg_basebackup(),
so the only explanation other than a PostgreSQL bug is broken hardware or
a pretty serious kernel/filesystem bug.

 OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
 files during recovery, so maybe we can think of a way to make recovery
 cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in
the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Oct25, 2011, at 13:39 , Florian Pflug wrote:
 On Oct25, 2011, at 11:13 , Simon Riggs wrote:
 On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote:
 We are starting recovery at the right place but we are initialising
 the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
 being derived later than it should be, which can cause problems if
 this then means that whole pages are unitialised in subtrans. The bug
 only shows up if you do enough transactions (2048 is always enough) to
 move to the next subtrans page between the redo pointer and the
 checkpoint record while at the same time we do not have a long running
 transaction that spans those two points. That's just enough to happen
 reasonably frequently on busy systems and yet just enough to have
 slipped through testing.
 
 We must either
 
 1. During CreateCheckpoint() we should derive oldestActiveXid before
 we derive the redo location
 
 (1) looks the best way forwards in all cases.
 
 Let me see if I understand this
 
 The probem seems to be that we currently derive oldestActiveXid end the end of
 the checkpoint, just before writing the checkpoint record. Since we use
 oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
 that checkpoint record (but after the REDO location, of course) may very well
 contain XIDs earlier than that wrongly derived oldestActiveXID, and if attempt
 to touch these XID's SUBTRANS state, we error out.
 
 Your patch seems sensible, because the checkpoint logically occurs at the
 REDO location not the checkpoint's location, so we ought to log an 
 oldestActiveXID
 corresponding to that location.

Thinking about this some more (and tracing through the code), I realized that
things are a bit more complicated.

What we actually need to ensure, I think, is that the XID we pass to 
StartupSUBTRANS()
is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT records. Which, at 
first
glance, implies that we ought to use the nextId at the *beginning* of the 
checkpoint
for SUBTRANS initialization. At second glace, however, that'd be wrong, because
backends emit XLOG_XACT_ASSIGNMENT only every PGPROC_MAX_CACHED_SUBXIDS sub-xid
assignment. Thus, an XLOG_XACT_ASSIGNMENT written *after* the checkpoint has 
started
may contain sub-XIDs which were assigned *before* the checkpoint has started.

Using oldestActiveXID works around that because we guarantee that sub-XIDs are 
always
larger than their parent XIDs and because only active transactions can produce
XLOG_XACT_ASSIGNMENT records.

So your patch is fine, but I think the reasoning about why oldestActiveXID is
the correct value for StartupSUBTRANS deserves an explanation somewhere.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Wed, Oct 26, 2011 at 12:16 PM, Florian Pflug f...@phlo.org wrote:

 Chris' clog error was caused by a file read error. The file was
 opened, we did a seek within the file and then the call to read()
 failed to return a complete page from the file.

 The xid shown is 22811359, which is the nextxid in the control file.

 So StartupClog() must have failed trying to read the clog page from disk.

 Yep.

 That isn't a Hot Standby problem, a recovery problem nor is it certain
 its a PostgreSQL problem.

 It's very likely that it's a PostgreSQL problem, though. It's probably
 not a pilot error since it happens even for backups taken with 
 pg_basebackup(),
 so the only explanation other than a PostgreSQL bug is broken hardware or
 a pretty serious kernel/filesystem bug.

The way forwards here is for someone to show the clog file that causes
the error and find out why the call to read() fails.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk

On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs si...@2ndquadrant.com wrote:

It's very likely that it's a PostgreSQL problem, though. It's probably
not a pilot error since it happens even for backups taken with
pg_basebackup(),
so the only explanation other than a PostgreSQL bug is broken hardware or
a pretty serious kernel/filesystem bug.

The way forwards here is for someone to show the clog file that causes
the error and find out why the call to read() fails.

Sorry, I thought the problem was obvious. Either that, of I've
completely missed something in these threads... I'll admit to not
following this one very closely anymore...

When the backup started, the clog was small. So on the recovering
instance, the clog is small. PostgreSQL is supposed to be able to
deal with any file as it was when the backup starts.

When the backup is stopped, clog is big. But that file was copied
after the backup was started, not after the backup finished. So its
size is only guarenteed to be as big as it was when the backup
started. Recovery is responsible for extending it as it was extended
during the backup period on the master.

The read fails because their is no data at the location it's trying to
read from, because clog hasn't been extended yet by recovery.

a.
--
Aidan Van Dyk Create like a god,
ai...@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 The read fails because their is no data at the location it's trying to
 read from, because clog hasn't been extended yet by recovery.

You don't actually know that, though I agree it seems a reasonable
guess and was my first thought also.

The error is very specifically referring to 22811359, which is the
nextxid from pg_control and updated by checkpoint.

22811359 is mid-way through a clog page, so prior xids will already
have been allocated, pages extended and then those pages fsyncd before
the end of pg_start_backup().  So it shouldn't be possible for that
page to be absent from the base backup, unless the base backup was
taken without a preceding checkpoint, which seems is not the case from
the script output.

Note that if you are correct, then the solution is to extend clog,
which Florian disagrees with as a solution.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Oct26, 2011, at 15:12 , Simon Riggs wrote:
 On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote:
 
 The read fails because their is no data at the location it's trying to
 read from, because clog hasn't been extended yet by recovery.
 
 You don't actually know that, though I agree it seems a reasonable
 guess and was my first thought also.

The actual error message also supports that theory. Here's the relevant
snippet from the OP's log (Found in 
ca9fd2fe.1d8d2%linas.virba...@continuent.com)

2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL:  Could not read from file pg_clog/0001 at 
offset 32768: Success.

Note that it says Success at the end of the second log entry. That
can only happen, I think, if we're trying to read the page adjacent to
the last page in the file. The seek would be successfull, and the subsequent
read() would indicate EOF by returning zero bytes. None of the calls would
set errno. If there was a real IO error, read() would set errno, and if the
page wasn't adjacent to the last page in the file, seek() would set errno.
In both cases we'd see the corresponding error messag, not Success.

 The error is very specifically referring to 22811359, which is the
 nextxid from pg_control and updated by checkpoint.

Where does that XID come from? The reference to that XID in the archives
that I can find is in your message
CA+U5nMKUUoA8kRG=itfso5nzue3x_kdjz78eaun3_fkmq-u...@mail.gmail.com

 22811359 is mid-way through a clog page, so prior xids will already
 have been allocated, pages extended and then those pages fsyncd before
 the end of pg_start_backup().  So it shouldn't be possible for that
 page to be absent from the base backup, unless the base backup was
 taken without a preceding checkpoint, which seems is not the case from
 the script output.

Or unless the nextId we store in the checkpoint is for some reason higher
than it should be. Or unless nextId somehow gets mangled during recovery.
Or unless there's some interaction between VACUUM and CHECKPOINTS that
we're overlooking...

 Note that if you are correct, then the solution is to extend clog,
 which Florian disagrees with as a solution.

That's not what I said. As you said, the CLOG page corresponding to nextId
*should* always be accessible at the start of recovery (Unless whole file
has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
Yet the error suggest that the CLOG is, in fact, too short. What I said
is that we shouldn't apply any fix (for the CLOG problem) before we understand
the reason for that apparent contradiction.

Doing it nevertheless to get rid of this seems dangerous. What happens, for
example, to the CLOG state of transactions earlier than the checkpoint's
nextId? There COMMIT record may very well lie before the checkpoint's REDO
pointer, so the CLOG we copied better contained their correct state. Yet if
it does, then why isn't the nextId's CLOG page accessible?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Chris Redekop

 And I think they also reported that if they didn't run hot standby,
 but just normal recovery into a new master, it didn't have the problem
 either, i.e. without hotstandby, recovery ran, properly extended the
 clog, and then ran as a new master fine.

Yes this is correct...attempting to start as hotstandby will produce the
pg_clog error repeatedly and then without changing anything else, just
turning hot standby off it will start up successfully.

 This fits the OP's observation ob the
 problem vanishing when pg_start_backup() does an immediate checkpoint.

Note that this is *not* the behaviour I'm seeingit's possible it happens
more frequently without the immediate checkpoint, but I am seeing it happen
even with the immediate checkpoint.

 This is a different problem and has already been reported by one of
 your colleagues in a separate thread, and answered in detail by me
 there. There is no bug related to this error message.

Excellent...I will continue this discussion in that thread.

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Oct26, 2011, at 15:57 , Florian Pflug wrote:
 As you said, the CLOG page corresponding to nextId
 *should* always be accessible at the start of recovery (Unless whole file
 has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
 Yet the error suggest that the CLOG is, in fact, too short. What I said
 is that we shouldn't apply any fix (for the CLOG problem) before we understand
 the reason for that apparent contradiction.

Ha! I think I've got a working theory.

In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
record, and then call CheckPointGuts() which does the actual writing and 
fsyncing.
So far, that fine. If a transaction ID is assigned before we compute the
checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() 
will
make sure the new CLOG page goes to disk.

But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
fine too, except that LogStandbySnapshot() re-assigned the *current* value of
ShmemVariableCache-nextXid to the checkpoint's nextXid field.

Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
whose CLOG page hasn't necessarily made it to the disk yet. The longer 
CheckPointGuts()
takes to finish it's work the more likely it becomes (assuming that CLOG writing
and syncing doesn't happen at the very end). This fits the OP's observation ob 
the
problem vanishing when pg_start_backup() does an immediate checkpoint.

I dunno how to this fix, though, since I don't really understand why
LogStandbySnapshot() needs to modify the checkpoint's nextXid.Simon, is there 
some
documentation on what assumptions the hot standby code makes about the various 
XID
fields included in a checkpoint?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk

On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug f...@phlo.org wrote:
On Oct26, 2011, at 15:12 , Simon Riggs wrote:
On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote:

The read fails because their is no data at the location it's trying to
read from, because clog hasn't been extended yet by recovery.

You don't actually know that, though I agree it seems a reasonable
guess and was my first thought also.

The actual error message also supports that theory. Here's the relevant
snippet from the OP's log (Found in
ca9fd2fe.1d8d2%linas.virba...@continuent.com)

2011-09-21 13:41:05 CEST FATAL: could not access status of transaction
1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file pg_clog/0001 at
offset 32768: Success.

Note that it says Success at the end of the second log entry. That
can only happen, I think, if we're trying to read the page adjacent to
the last page in the file. The seek would be successfull, and the subsequent
read() would indicate EOF by returning zero bytes. None of the calls would
set errno. If there was a real IO error, read() would set errno, and if the
page wasn't adjacent to the last page in the file, seek() would set errno.
In both cases we'd see the corresponding error messag, not Success.

And even more pointedly, in the original go around on this:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056

He reported that clog/ after pg_start_backup call:
-rw--- 1 postgres postgres 8192 Sep 23 14:31

Changed during the rsync phase to this:
-rw--- 1 postgres postgres 16384 Sep 23 14:33

But on the slave, of course, it was copied before it was extend so it
was the original size (that's ok, that's the point of recovery after
the backup):
-rw--- 1 postgres postgres 8192 Sep 23 14:31

With the error:
2011-09-23 14:33:46 CEST FATAL: could not access status of transaction 37206
2011-09-23 14:33:46 CEST DETAIL: Could not read from file
pg_clog/ at offset 8192: Success.

And that error happens *before* recovery even can get attempted.

And that if he copied the recent clog/ from the master, it did start up.

And I think they also reported that if they didn't run hot standby,
but just normal recovery into a new master, it didn't have the problem
either, i.e. without hotstandby, recovery ran, properly extended the
clog, and then ran as a new master fine.

--
Aidan Van Dyk Create like a god,
ai...@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Oct26, 2011, at 17:36 , Chris Redekop wrote:
  And I think they also reported that if they didn't run hot standby,
  but just normal recovery into a new master, it didn't have the problem
  either, i.e. without hotstandby, recovery ran, properly extended the
  clog, and then ran as a new master fine.
 
 Yes this is correct...attempting to start as hotstandby will produce the
 pg_clog error repeatedly and then without changing anything else, just
 turning hot standby off it will start up successfully.

Yup, because with hot standby disabled (on the client side), StartupCLOG()
happens after recovery has completed. That, at the very least, makes the
problem very unlikely to occur in the non-hot-standby case. I'm not sure
it's completely impossible, though.

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
   The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot () advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

  This fits the OP's observation ob the
  problem vanishing when pg_start_backup() does an immediate checkpoint.
 
 Note that this is *not* the behaviour I'm seeingit's possible it happens
 more frequently without the immediate checkpoint, but I am seeing it happen
 even with the immediate checkpoint.

Yeah, I should have said of the problem's likelihood decreasing instead
of vanishing. The point is, the longer the checkpoint takes, the higher
the chance the nextId is advanced far enough to require a CLOG extension.

That alone isn't enough to trigger the error - the CLOG extension must also
*not* make it to the disk before the checkpoint completes - but it's
a required precondition for the error to occur.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug f...@phlo.org wrote:
 On Oct26, 2011, at 15:57 , Florian Pflug wrote:
 As you said, the CLOG page corresponding to nextId
 *should* always be accessible at the start of recovery (Unless whole file
 has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
 Yet the error suggest that the CLOG is, in fact, too short. What I said
 is that we shouldn't apply any fix (for the CLOG problem) before we 
 understand
 the reason for that apparent contradiction.

 Ha! I think I've got a working theory.

 In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
 record, and then call CheckPointGuts() which does the actual writing and 
 fsyncing.
 So far, that fine. If a transaction ID is assigned before we compute the
 checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() 
 will
 make sure the new CLOG page goes to disk.

 But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
 CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
 fine too, except that LogStandbySnapshot() re-assigned the *current* value of
 ShmemVariableCache-nextXid to the checkpoint's nextXid field.

 Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), 
 but
 before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
 whose CLOG page hasn't necessarily made it to the disk yet. The longer 
 CheckPointGuts()
 takes to finish it's work the more likely it becomes (assuming that CLOG 
 writing
 and syncing doesn't happen at the very end). This fits the OP's observation 
 ob the
 problem vanishing when pg_start_backup() does an immediate checkpoint.

This is the correct explanation. I've just come back into Wifi range,
so I was just writing to you with this explanation but your original
point that nextxid must be wrong deserves credit. OTOH I was just
waiting to find out what the reason for the physical read was, rather
than guessing.

Notice that the nextxid value isn't wrong, its just not the correct
value to use for starting clog.

As it turns out the correct fix is actually just to skip StartupClog()
until the end of recovery because it does nothing useful when executed
at that time. When I wrote the original code I remember thinking that
StartupClog() is superfluous at that point.

Brewing a patch now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Brewing a patch now.

Latest thinking... confirmations or other error reports please.

This fixes both the subtrans and clog bugs in one patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


oldestActiveXid_fixed.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Brewing a patch now.

 Latest thinking... confirmations or other error reports please.

 This fixes both the subtrans and clog bugs in one patch.

I'll be looking to commit that tomorrow afternoon as two separate
patches with appropriate credits.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Chris Redekop

FYI I have given this patch a good test and can now no longer reproduce
either the subtrans nor the clog error.  Thanks guys!


On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
  On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
 
  Brewing a patch now.
 
  Latest thinking... confirmations or other error reports please.
 
  This fixes both the subtrans and clog bugs in one patch.

 I'll be looking to commit that tomorrow afternoon as two separate
 patches with appropriate credits.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

On Oct26, 2011, at 18:08 , Simon Riggs wrote:
 On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug f...@phlo.org wrote:
 On Oct26, 2011, at 15:57 , Florian Pflug wrote:
 Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), 
 but
 before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
 whose CLOG page hasn't necessarily made it to the disk yet. The longer 
 CheckPointGuts()
 takes to finish it's work the more likely it becomes (assuming that CLOG 
 writing
 and syncing doesn't happen at the very end). This fits the OP's observation 
 ob the
 problem vanishing when pg_start_backup() does an immediate checkpoint.
 
 As it turns out the correct fix is actually just to skip StartupClog()
 until the end of recovery because it does nothing useful when executed
 at that time. When I wrote the original code I remember thinking that
 StartupClog() is superfluous at that point.

Hm, that fixes the problem in the hot standby case, but as I said in my
reply to Chris Redekop, normal crash recovery is also at risk (although
the probability of hitting the bug is much smaller there). Here's my
reasoning from that other mail:

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
  The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot () advances the
  checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
  extension makes it to the disk, and before any trace of the XID assigned
  in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

Quite aside from that concern, I think it's probably not a good idea
for the nextXid value of a checkpoint to depend on whether wal_level
was set to hot_standby or not. Our recovery code is already quite complex
and hard to test, and this just adds one more combination that has to
be thought about while coding and that needs to be tested.

My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts(). 

Here's what I image CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() do modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.

We could then also fold GetOldestActiveTransactionId() back into
your proposed LogStandbySnapshot() and thus don't need two ProcArray
traversals.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Robert Haas

On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Brewing a patch now.

 Latest thinking... confirmations or other error reports please.

 This fixes both the subtrans and clog bugs in one patch.

I don't see the point of changing StartupCLOG() to be an empty
function and adding a new function TrimCLOG() that does everything
StartupCLOG() used to do.  Seems simpler to just move the calls to
StartupCLOG() wherever they need to be - i.e. remove the one that
happens before WAL replay, and extricate the one at end-of-recovery
from the if block which currently contains it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

+1 ... I found that overly cute also.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Simon Riggs

On Mon, Oct 24, 2011 at 7:13 AM, Florian Pflug f...@phlo.org wrote:

 I think Simon's theory that we're starting recovery from the wrong place,
 i.e. should start with an earlier WAL location, is probably correct. The
 question is, why?

Err, that's not what I said and I don't mean that. Having said that,
what I said about pg_control being invalid would imply that, so is
wrong also.

We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
being derived later than it should be, which can cause problems if
this then means that whole pages are unitialised in subtrans. The bug
only shows up if you do enough transactions (2048 is always enough) to
move to the next subtrans page between the redo pointer and the
checkpoint record while at the same time we do not have a long running
transaction that spans those two points. That's just enough to happen
reasonably frequently on busy systems and yet just enough to have
slipped through testing.

We must either

1. During CreateCheckpoint() we should derive oldestActiveXid before
we derive the redo location

2. Change the way subtrans pages are initialized during recovery so we
don't rely on oldestActiveXid

I need to think some more before a decision on this in my own mind,
but I lean towards doing (1) as a longer term fix and doing (2) as a
short term fix for existing releases. I expect to have a fix later
today.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Simon Riggs

On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote:

 We are starting recovery at the right place but we are initialising
 the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
 being derived later than it should be, which can cause problems if
 this then means that whole pages are unitialised in subtrans. The bug
 only shows up if you do enough transactions (2048 is always enough) to
 move to the next subtrans page between the redo pointer and the
 checkpoint record while at the same time we do not have a long running
 transaction that spans those two points. That's just enough to happen
 reasonably frequently on busy systems and yet just enough to have
 slipped through testing.

 We must either

 1. During CreateCheckpoint() we should derive oldestActiveXid before
 we derive the redo location

 2. Change the way subtrans pages are initialized during recovery so we
 don't rely on oldestActiveXid

 I need to think some more before a decision on this in my own mind,
 but I lean towards doing (1) as a longer term fix and doing (2) as a
 short term fix for existing releases. I expect to have a fix later
 today.

(1) looks the best way forwards in all cases.

Patch attached. Will be backpatched to 9.0

I think it is possible to avoid taking XidGenLock during
GetRunningTransactions() now, but I haven't included that change in
this patch.

Any other comments before commit?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


oldestActiveXid_fixed.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Florian Pflug

On Oct25, 2011, at 11:13 , Simon Riggs wrote:
 On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote:
 We are starting recovery at the right place but we are initialising
 the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
 being derived later than it should be, which can cause problems if
 this then means that whole pages are unitialised in subtrans. The bug
 only shows up if you do enough transactions (2048 is always enough) to
 move to the next subtrans page between the redo pointer and the
 checkpoint record while at the same time we do not have a long running
 transaction that spans those two points. That's just enough to happen
 reasonably frequently on busy systems and yet just enough to have
 slipped through testing.
 
 We must either
 
 1. During CreateCheckpoint() we should derive oldestActiveXid before
 we derive the redo location

 (1) looks the best way forwards in all cases.

Let me see if I understand this

The probem seems to be that we currently derive oldestActiveXid end the end of
the checkpoint, just before writing the checkpoint record. Since we use
oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
that checkpoint record (but after the REDO location, of course) may very well
contain XIDs earlier than that wrongly derived oldestActiveXID, and if attempt
to touch these XID's SUBTRANS state, we error out.

Your patch seems sensible, because the checkpoint logically occurs at the
REDO location not the checkpoint's location, so we ought to log an 
oldestActiveXID
corresponding to that location.

What I don't understand is how this affects the CLOG. How does oldestActiveXID
factor into CLOG initialization?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Simon Riggs

On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org wrote:

 What I don't understand is how this affects the CLOG. How does oldestActiveXID
 factor into CLOG initialization?

It is an entirely different error.

Chris' clog error was caused by a file read error. The file was
opened, we did a seek within the file and then the call to read()
failed to return a complete page from the file.

The xid shown is 22811359, which is the nextxid in the control file.

So StartupClog() must have failed trying to read the clog page from disk.

That isn't a Hot Standby problem, a recovery problem nor is it certain
its a PostgreSQL problem.

OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
files during recovery, so maybe we can think of a way to make recovery
cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Chris Redekop

 Chris, can you rearrange the backup so you copy the pg_control file as
 the first act after the pg_start_backup?

I tried this and it doesn't seem to make any difference.  I also tried the
patch and I can no longer reproduce the subtrans error, however instead it
now it starts up, but never gets to the point where it'll accept
connections.  It starts up but if I try to do anything I always get FATAL:
 the database system is starting up...even if the load is removed from the
primary, the standby still never finishes starting up.  Attached below is
a log of one of these startup attempts.  In my testing with the patch
applied approx 3 in 10 attempts start up successfully, 7 in 10 attempts go
into the db is starting up statethe pg_clog error is still there, but
seems much harder to reproduce nowI've seen it only once since applying
the patch (out of probably 50 or 60 under-load startup attempts).  It does
seem to be moody like that thoit will be very difficult to reproduce
for a while, and then it will happen damn-near every time for a
while...weirdness

On a bit of a side note, I've been thinking of changing my scripts so that
they perform an initial rsync prior to doing the
startbackup-rsync-stopbackup just so that the second rsync will be
fasterso that the backup is in progress for a shorter period of time, as
while it is running it will stop other standbys from starting upthis
shouldn't cause any issues eh?


2011-10-25 13:43:24.035 MDT [15072]: [1-1] LOG:  database system was
interrupted; last known up at 2011-10-25 13:43:11 MDT
2011-10-25 13:43:24.035 MDT [15072]: [2-1] LOG:  creating missing WAL
directory pg_xlog/archive_status
2011-10-25 13:43:24.037 MDT [15072]: [3-1] LOG:  entering standby mode
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION 2/CF00
2011-10-25 13:43:24.041 MDT [15073]: [1-1] LOG:  streaming replication
successfully connected to primary
2011-10-25 13:43:24.177 MDT [15092]: [1-1] FATAL:  the database system is
starting up
2011-10-25 13:43:24.781 MDT [15072]: [4-1] DEBUG:  checkpoint record is at
2/CF81A478
2011-10-25 13:43:24.781 MDT [15072]: [5-1] DEBUG:  redo record is at
2/CF20; shutdown FALSE
2011-10-25 13:43:24.781 MDT [15072]: [6-1] DEBUG:  next transaction ID:
0/4634700; next OID: 1188228
2011-10-25 13:43:24.781 MDT [15072]: [7-1] DEBUG:  next MultiXactId: 839;
next MultiXactOffset: 1686
2011-10-25 13:43:24.781 MDT [15072]: [8-1] DEBUG:  oldest unfrozen
transaction ID: 1669, in database 1
2011-10-25 13:43:24.781 MDT [15072]: [9-1] DEBUG:  transaction ID wrap limit
is 2147485316, limited by database with OID 1
2011-10-25 13:43:24.783 MDT [15072]: [10-1] DEBUG:  resetting unlogged
relations: cleanup 1 init 0
2011-10-25 13:43:24.791 MDT [15072]: [11-1] DEBUG:  initializing for hot
standby
2011-10-25 13:43:24.791 MDT [15072]: [12-1] LOG:  consistent recovery state
reached at 2/CF81A4D0
2011-10-25 13:43:24.791 MDT [15072]: [13-1] LOG:  redo starts at 2/CF20
2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG:  consistent state delayed
because recovery snapshot incomplete
2011-10-25 13:43:25.019 MDT [15072]: [15-1] CONTEXT:  xlog redo  running
xacts:
nextXid 4634700 latestCompletedXid 4634698 oldestRunningXid 4634336; 130
xacts:
4634336 4634337 4634338 4634339 4634340 4634341 4634342 4634343 4634344
4634345
4634346 4634347 4634348 4634349 4634350 4634351 4634352 4634353 4634354
4634355
4634356 4634357 4634358 4634359 4634360 4634361 4634362 4634363 4634364
4634365
4634366 4634367 4634368 4634369 4634370 4634371 4634515 4634516 4634517
4634518
4634519 4634520 4634521 4634522 4634523 4634524 4634525 4634526 4634527
4634528
4634529 4634530 4634531 4634532 4634533 4634534 4634535 4634536 4634537
4634538
4634539 4634540 4634541 4634542 4634543 4634385 4634386 4634387 4634388
4634389
4634390 4634391 4634392 4634393 4634394 4634395 4634396 4634397 4634398
4634399
4634400 4634401 4634402 4634403 4634404 4634405 4634406 4634407 4634408
4634409
4634410 4634411 4634412 4634413 4634414 4634415 4634416 4634417 4634418
4634419
4634420 4634579 4634580 4634581 4634582 4634583 4634584 4634585 4634586
4634587
4634588 4634589 4634590 4634591 4634592 4634593 4634594 4634595 4634596
4634597
4634598 4634599 4634600 4634601 4634602 4634603 4634604 4634605 4634606
4634607;
 subxid ovf
2011-10-25 13:43:25.240 MDT [15130]: [1-1] FATAL:  the database system is
starting up
DEBUG:  standby sync_rep_test has now caught up with primary
2011-10-25 13:43:26.304 MDT [15167]: [1-1] FATAL:  the database system is
starting up
2011-10-25 13:43:27.366 MDT [15204]: [1-1] FATAL:  the database system is
starting up
2011-10-25 13:43:28.426 MDT [15241]: [1-1] FATAL:  the database system is
starting up
2011-10-25 13:43:29.461 MDT [15275]: [1-1] FATAL:  the database system is
starting up
and so on...


On Tue, Oct 25, 2011 at 6:51 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Chris Redekop


 That isn't a Hot Standby problem, a recovery problem nor is it certain
 its a PostgreSQL problem.

Do you have any theories on this that I could help investigate?  It happens
even when using pg_basebackup and it persists until another sync is
performed, so the files must be in some state that that it can't recover
fromwithout understanding the internals just viewing from an
outside perspective, I don't really see how this could not be a PostgreSQL
problem

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-24 Thread Florian Pflug

On Oct24, 2011, at 01:27 , Simon Riggs wrote:
 FATAL:  could not access status of transaction 21110784
 which, in pg_subtrans, is the first xid on a new subtrans page. So we
 have missed zeroing a page.
 
 pg_control shows ... Latest checkpoint's oldestActiveXID:  2111
 which shows quite clearly that the pg_control file is later than it should be.

But shouldn't pg_control be largely irrelevant in a hot backup scenario? Most
(all?) of the information contained therein should be overwritten with the
contents of the checkpoint referenced by the backup label, shouldn't it?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-24 Thread Florian Pflug

On Oct23, 2011, at 22:48 , Daniel Farina wrote:
 It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
 hot-standby path functionality) to be called before that code is
 executed, but it is anyway right now.

I think the idea is to check that the CLOG part which recovery *won't*
overwrite is consistent (or rather, given the simplicity of the check,
at least accessible)

Heikki said the following somewhere else in this thread when I suggested
something similar to your proposal:

 There are pretty clear rules on what state clog can be in. When you launch 
 postmaster in a standby:
 
 * Any clog preceding the nextXid from the checkpoint record we start 
 recovery from, must either be valid, or the clog file must be missing 
 altogether (which can happen when it was vacuumed away while the backup in 
 progress - if the clog is still needed at the end of backup it must not be 
 missing, of course).
 * Any clog following nextXid can be garbled or missing.
 
 Recovery will overwrite any clog after nextXid from the WAL, but not the 
 clog before it.

I think Simon's theory that we're starting recovery from the wrong place,
i.e. should start with an earlier WAL location, is probably correct. The
question is, why?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Daniel Farina

On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop ch...@replicon.com wrote:
 Well, on the other hand maybe there is something wrong with the data.
  Here's the test/steps I just did -
 1. I do the pg_basebackup when the master is under load, hot slave now will
 not start up but warm slave will.
 2. I start a warm slave and let it catch up to current
 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
 restart'
 4. The postgres fails to restart with the same error.
 5. I turn hot_standby back off and postgres starts back up fine as a warm
 slave
 6. I then turn off the load, the slave is all caught up, master and slave
 are both sitting idle
 7. I, again, change 'hot_standby=on' and do a service restart
 8. Again it fails, with the same error, even though there is no longer any
 load.
 9. I repeat this warmstart/hotstart cycle a couple more times until to my
 surprise, instead of failing, it successfully starts up as a hot standby
 (this is after maybe 5 minutes or so of sitting idle)
 So...given that it continued to fail even after the load had been turned of,
 that makes me believe that the data which was copied over was invalid in
 some way.  And when a checkpoint/logrotation/somethingelse occurred when not
 under load it cleared itself upI'm shooting in the dark here
 Anyone have any suggestions/ideas/things to try?

Having digged at this a little -- but not too much -- the problem
seems to be that postgres is reading the commit logs way, way too
early, that is to say, before it has played enough WAL to be
'consistent' (the WAL between pg_start and pg_stop backup).  I have
not been able to reproduce this problem (I think) after the message
from postgres suggesting it has reached a consistent state; at that
time I am able to go into hot-standby mode.

The message is like: consistent recovery state reached at %X/%X.
(this is the errmsg)

It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
hot-standby path functionality) to be called before that code is
executed, but it is anyway right now.  I'm not sure if this oversight
is simply an oversight, or indicative of a misplaced assumption
somewhere.  Basically, my thoughts for a fix are to suppress
hot_standby = on (in spirit) before the consistent recovery state is
reached.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Simon Riggs

On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina dan...@heroku.com wrote:
 On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop ch...@replicon.com wrote:
 Well, on the other hand maybe there is something wrong with the data.
  Here's the test/steps I just did -
 1. I do the pg_basebackup when the master is under load, hot slave now will
 not start up but warm slave will.
 2. I start a warm slave and let it catch up to current
 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
 restart'
 4. The postgres fails to restart with the same error.
 5. I turn hot_standby back off and postgres starts back up fine as a warm
 slave
 6. I then turn off the load, the slave is all caught up, master and slave
 are both sitting idle
 7. I, again, change 'hot_standby=on' and do a service restart
 8. Again it fails, with the same error, even though there is no longer any
 load.
 9. I repeat this warmstart/hotstart cycle a couple more times until to my
 surprise, instead of failing, it successfully starts up as a hot standby
 (this is after maybe 5 minutes or so of sitting idle)
 So...given that it continued to fail even after the load had been turned of,
 that makes me believe that the data which was copied over was invalid in
 some way.  And when a checkpoint/logrotation/somethingelse occurred when not
 under load it cleared itself upI'm shooting in the dark here
 Anyone have any suggestions/ideas/things to try?

 Having digged at this a little -- but not too much -- the problem
 seems to be that postgres is reading the commit logs way, way too
 early, that is to say, before it has played enough WAL to be
 'consistent' (the WAL between pg_start and pg_stop backup).  I have
 not been able to reproduce this problem (I think) after the message
 from postgres suggesting it has reached a consistent state; at that
 time I am able to go into hot-standby mode.

 The message is like: consistent recovery state reached at %X/%X.
 (this is the errmsg)

 It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
 hot-standby path functionality) to be called before that code is
 executed, but it is anyway right now.  I'm not sure if this oversight
 is simply an oversight, or indicative of a misplaced assumption
 somewhere.  Basically, my thoughts for a fix are to suppress
 hot_standby = on (in spirit) before the consistent recovery state is
 reached.

Not sure about that, but I'll look at where this comes from.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Simon Riggs

On Sun, Oct 16, 2011 at 2:33 AM, Chris Redekop ch...@replicon.com wrote:

 pg_subtrans: http://pastebin.com/qAXEHAQt

I confirm this as a HS issue and will investigate from here.

FATAL:  could not access status of transaction 21110784
which, in pg_subtrans, is the first xid on a new subtrans page. So we
have missed zeroing a page.

pg_control shows ... Latest checkpoint's oldestActiveXID:  2111
which shows quite clearly that the pg_control file is later than it should be.

Chris, can you rearrange the backup so you copy the pg_control file as
the first act after the pg_start_backup?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Simon Riggs

On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina dan...@heroku.com wrote:

 Having digged at this a little -- but not too much -- the problem
 seems to be that postgres is reading the commit logs way, way too
 early, that is to say, before it has played enough WAL to be
 'consistent' (the WAL between pg_start and pg_stop backup).  I have
 not been able to reproduce this problem (I think) after the message
 from postgres suggesting it has reached a consistent state; at that
 time I am able to go into hot-standby mode.

The WAL appears too early because the other control info is later than
it should be.

So this is approx backwards and nothing related to consistent state,
but thanks for drawing my attention to this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-17 Thread Chris Redekop

I can confirm that both the pg_clog and pg_subtrans errors do occur when
using pg_basebackup instead of rsync.  The data itself seems to be fine
because using the exact same data I can start up a warm standby no problem,
it is just the hot standby that will not start up.


On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop ch...@replicon.com wrote:

   Linas, could you capture the output of pg_controldata *and* increase
 the
   log level to DEBUG1 on the standby? We should then see nextXid value of
   the checkpoint the recovery is starting from.
 
  I'll try to do that whenever I'm in that territory again... Incidentally,
  recently there was a lot of unrelated-to-this-post work to polish things
 up
  for a talk being given at PGWest 2011 Today :)
 
   I also checked what rsync does when a file vanishes after rsync
 computed the
   file list, but before it is sent. rsync 3.0.7 on OSX, at least,
 complains
   loudly, and doesn't sync the file. It BTW also exits non-zero, with a
 special
   exit code for precisely that failure case.
 
  To be precise, my script has logic to accept the exit code 24, just as
  stated in PG manual:
 
  Docs For example, some versions of rsync return a separate exit code for
  Docs vanished source files, and you can write a driver script to
 accept
  Docs this exit code as a non-error case.

 I also am running into this issue and can reproduce it very reliably.  For
 me, however, it happens even when doing the fast backup like so:
 pg_start_backup('whatever', true)...my traffic is more write-heavy than
 linas's tho, so that might have something to do with it.  Yesterday it
 reliably errored out on pg_clog every time, but today it is
 failing sporadically on pg_subtrans (which seems to be past where the
 pg_clog error was)the only thing that has changed is that I've changed
 the log level to debug1I wouldn't think that could be related though.
  I've linked the requested pg_controldata and debug1 logs for both errors.
  Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
 pg_controldata, and then the postgres debug1 log produced from a subsequent
 startup attempt.

 pg_clog: http://pastebin.com/mTfdcjwH
 pg_subtrans: http://pastebin.com/qAXEHAQt

 Any workarounds would be very appreciated.would copying clog+subtrans
 before or after the rest of the data directory (or something like that) make
 any difference?

 Thanks!

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-17 Thread Chris Redekop

Well, on the other hand maybe there is something wrong with the data.
 Here's the test/steps I just did -
1. I do the pg_basebackup when the master is under load, hot slave now will
not start up but warm slave will.
2. I start a warm slave and let it catch up to current
3. On the slave I change 'hot_standby=on' and do a 'service postgresql
restart'
4. The postgres fails to restart with the same error.
5. I turn hot_standby back off and postgres starts back up fine as a warm
slave
6. I then turn off the load, the slave is all caught up, master and slave
are both sitting idle
7. I, again, change 'hot_standby=on' and do a service restart
8. Again it fails, with the same error, even though there is no longer any
load.
9. I repeat this warmstart/hotstart cycle a couple more times until to my
surprise, instead of failing, it successfully starts up as a hot standby
(this is after maybe 5 minutes or so of sitting idle)

So...given that it continued to fail even after the load had been turned of,
that makes me believe that the data which was copied over was invalid in
some way.  And when a checkpoint/logrotation/somethingelse occurred when not
under load it cleared itself upI'm shooting in the dark here

Anyone have any suggestions/ideas/things to try?


On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop ch...@replicon.com wrote:

 I can confirm that both the pg_clog and pg_subtrans errors do occur when
 using pg_basebackup instead of rsync.  The data itself seems to be fine
 because using the exact same data I can start up a warm standby no problem,
 it is just the hot standby that will not start up.


 On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop ch...@replicon.com wrote:

   Linas, could you capture the output of pg_controldata *and* increase
 the
   log level to DEBUG1 on the standby? We should then see nextXid value
 of
   the checkpoint the recovery is starting from.
 
  I'll try to do that whenever I'm in that territory again...
 Incidentally,
  recently there was a lot of unrelated-to-this-post work to polish things
 up
  for a talk being given at PGWest 2011 Today :)
 
   I also checked what rsync does when a file vanishes after rsync
 computed the
   file list, but before it is sent. rsync 3.0.7 on OSX, at least,
 complains
   loudly, and doesn't sync the file. It BTW also exits non-zero, with a
 special
   exit code for precisely that failure case.
 
  To be precise, my script has logic to accept the exit code 24, just as
  stated in PG manual:
 
  Docs For example, some versions of rsync return a separate exit code
 for
  Docs vanished source files, and you can write a driver script to
 accept
  Docs this exit code as a non-error case.

 I also am running into this issue and can reproduce it very reliably.  For
 me, however, it happens even when doing the fast backup like so:
 pg_start_backup('whatever', true)...my traffic is more write-heavy than
 linas's tho, so that might have something to do with it.  Yesterday it
 reliably errored out on pg_clog every time, but today it is
 failing sporadically on pg_subtrans (which seems to be past where the
 pg_clog error was)the only thing that has changed is that I've changed
 the log level to debug1I wouldn't think that could be related though.
  I've linked the requested pg_controldata and debug1 logs for both errors.
  Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
 pg_controldata, and then the postgres debug1 log produced from a subsequent
 startup attempt.

 pg_clog: http://pastebin.com/mTfdcjwH
 pg_subtrans: http://pastebin.com/qAXEHAQt

 Any workarounds would be very appreciated.would copying clog+subtrans
 before or after the rest of the data directory (or something like that) make
 any difference?

 Thanks!

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-15 Thread Chris Redekop

  Linas, could you capture the output of pg_controldata *and* increase the
  log level to DEBUG1 on the standby? We should then see nextXid value of
  the checkpoint the recovery is starting from.

 I'll try to do that whenever I'm in that territory again... Incidentally,
 recently there was a lot of unrelated-to-this-post work to polish things
up
 for a talk being given at PGWest 2011 Today :)

  I also checked what rsync does when a file vanishes after rsync computed
the
  file list, but before it is sent. rsync 3.0.7 on OSX, at least,
complains
  loudly, and doesn't sync the file. It BTW also exits non-zero, with a
special
  exit code for precisely that failure case.

 To be precise, my script has logic to accept the exit code 24, just as
 stated in PG manual:

 Docs For example, some versions of rsync return a separate exit code for
 Docs vanished source files, and you can write a driver script to accept
 Docs this exit code as a non-error case.

I also am running into this issue and can reproduce it very reliably.  For
me, however, it happens even when doing the fast backup like so:
pg_start_backup('whatever', true)...my traffic is more write-heavy than
linas's tho, so that might have something to do with it.  Yesterday it
reliably errored out on pg_clog every time, but today it is
failing sporadically on pg_subtrans (which seems to be past where the
pg_clog error was)the only thing that has changed is that I've changed
the log level to debug1I wouldn't think that could be related though.
 I've linked the requested pg_controldata and debug1 logs for both errors.
 Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
pg_controldata, and then the postgres debug1 log produced from a subsequent
startup attempt.

pg_clog: http://pastebin.com/mTfdcjwH
pg_subtrans: http://pastebin.com/qAXEHAQt

Any workarounds would be very appreciated.would copying clog+subtrans
before or after the rest of the data directory (or something like that) make
any difference?

Thanks!

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-29 Thread Linas Virbalas

 Linas, could you capture the output of pg_controldata *and* increase the
 log level to DEBUG1 on the standby? We should then see nextXid value of
 the checkpoint the recovery is starting from.

I'll try to do that whenever I'm in that territory again... Incidentally,
recently there was a lot of unrelated-to-this-post work to polish things up
for a talk being given at PGWest 2011 Today :)

 I also checked what rsync does when a file vanishes after rsync computed the
 file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
 loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
 exit code for precisely that failure case.

To be precise, my script has logic to accept the exit code 24, just as
stated in PG manual:

Docs For example, some versions of rsync return a separate exit code for
Docs vanished source files, and you can write a driver script to accept
Docs this exit code as a non-error case.

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-29 Thread Florian Pflug

On Sep29, 2011, at 17:44 , Linas Virbalas wrote:
 I also checked what rsync does when a file vanishes after rsync computed the
 file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
 loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
 exit code for precisely that failure case.
 
 To be precise, my script has logic to accept the exit code 24, just as
 stated in PG manual:
 
 Docs For example, some versions of rsync return a separate exit code for
 Docs vanished source files, and you can write a driver script to accept
 Docs this exit code as a non-error case.

Oh, cool. I was about to suggest that we add something along these lines
to the docs - guess I should RTFM from time to time ;-)

best regards,
Florian Pflug

 

 
 --
 Sincerely,
 Linas Virbalas
 http://flyingclusters.blogspot.com/
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-27 Thread Florian Pflug

On Sep23, 2011, at 21:10 , Robert Haas wrote:
 So the actual error message in the last test was:
 
 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 
 1188673
 
 ...but we can't tell if that was before or after nextXid, which seems
 like it would be useful to know.
 
 If Linas can rerun his experiment, but also capture the output of
 pg_controldata before firing up the standby for the first time, then
 we'd able to see that information.

Hm, wouldn't pg_controldata quite certainly show a nextId beyond the clog
if copied after pg_clog/*?

Linas, could you capture the output of pg_controldata *and* increase the
log level to DEBUG1 on the standby? We should then see nextXid value of
the checkpoint the recovery is starting from.

FWIW, I've had a few more theories about what's going on, but none survived
after looking at the code. My first guess was that there maybe are circumstances
under which the nextId from the control file, instead of the one from the
pre-backup checkpoint, ends up becoming the standby's nextXid. But there doesn't
seem to be a way for that to happen.

My next theory was that something increments nextIdx before StartupCLOG().
The only possible candidate seems to be PrescanPreparedTransactions(), which
does increment nextXid if it's smaller than some sub-xid of a prepared xact.
But we only call that before StartupCLOG() if we're starting from a
shutdown checkpoint, which shouldn't be the case for the OP.

I also checked what rsync does when a file vanishes after rsync computed the
file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
exit code for precisely that failure case.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-24 Thread Daniel Farina

On Fri, Sep 23, 2011 at 9:45 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 Unfortunately, it's impossible, because the error message Could not read
 from file pg_clog/0001 at offset 32768: Success is shown (and startup
 aborted) before the turn for redo starts at message arrives.

 It looks to me that pg_clog/0001 exists, but it shorter than recovery
 expects. Which shouldn't happen, of course, because the start-backup
 checkpoint should flush all the clog that's needed by recovery to disk
 before the backup procedure begins to them.

 I think the point here is that recover *never starts*.  Something in
 the standby startup is looking for a value in a clog block that
 recovery hadn't had a chance to replay (produce) yet.

 Ah.  I think you are right - Heikki made the same point.  Maybe some
 of the stuff that happens just after this comment:

        /*
         * Initialize for Hot Standby, if enabled. We won't let backends in
         * yet, not until we've reached the min recovery point specified in
         * control file and we've established a recovery snapshot from a
         * running-xacts WAL record.
         */


 ...actually needs to be postponed until after we've reached consistency?

We have a number of backups that are like this, and the problem is
entirely reproducible for those.  We always get around it by disabling
hot standby for a while (until consistency is reached) I poked at
xlog.c a bit, and to me seems entirely likely that StartupCLOG is
being called early -- way too early, or at least parts of it.
Presumably(?) it is being called so early in the hot standby path so
that the status of transactions can be known for the purposes of
querying, but it's happening before consistency is reached, ergo not
many invariants (outside of checkpointed things like pg_controldata)
are likely to hold...such as clog being the right size to locate the
transaction status of a page.

Anyway, sorry for dropping the ball on pushing that one; we've been
using this workaround for a while after taking a look at the mechanism
and deciding it was probably not a problem (except for a sound night's
sleep).  We've now seen this dozens of times.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Linas Virbalas

On 9/22/11 6:59 PM, Euler Taveira de Oliveira eu...@timbira.com wrote:

 If needed, I could do that, if I had the exact procedure... Currently,
 during the start of the backup I take the following information:
 
 Just show us the output of pg_start_backup and part of the standby log with
 the following message 'redo starts at' and the subsequent messages up to the
 failure.

Unfortunately, it's impossible, because the error message Could not read
from file pg_clog/0001 at offset 32768: Success is shown (and startup
aborted) before the turn for redo starts at message arrives.

For comparison purposes, here is the log from the original message:

2011-09-21 13:41:05 CEST LOG:  database system was interrupted; last known
up at 2011-09-21 13:40:50 CEST
Restoring 0014.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/0014.history': No such file
or directory
Restoring 0013.history
2011-09-21 13:41:05 CEST LOG:  restored log file 0013.history from
archive
! 2011-09-21 13:41:05 CEST LOG:  entering standby mode
Restoring 0013000600DC
2011-09-21 13:41:05 CEST LOG:  restored log file 0013000600DC
from archive
Restoring 0013000600DB
2011-09-21 13:41:05 CEST LOG:  restored log file 0013000600DB
from archive
2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction
1188673
2011-09-21 13:41:05 CEST DETAIL:  Could not read from file pg_clog/0001 at
offset 32768: Success.
2011-09-21 13:41:05 CEST LOG:  startup process (PID 13819) exited with exit
code 1
2011-09-21 13:41:05 CEST LOG:  aborting startup due to startup process
Failure

As you can see, there is no redo starts at message. If we compare this to
a healthy standby startup, we can see that the pg_clog error appears after
the entering standby mode, but before the redo starts at, hence the
latter is not reached. Healthy log example:

2011-09-23 09:52:31 CEST LOG:  database system was interrupted; last known
up at 2011-09-23 09:52:15 CEST
Restoring 0007.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/0007.history': No such file
or directory
Restoring 0006.history
2011-09-23 09:52:31 CEST LOG:  restored log file 0006.history from
archive
2011-09-23 09:52:31 CEST LOG:  entering standby mode
Restoring 00060065
2011-09-23 09:52:31 CEST LOG:  restored log file 00060065
from archive
2011-09-23 09:52:31 CEST LOG:  redo starts at 0/6520
2011-09-23 09:52:31 CEST LOG:  consistent recovery state reached at
0/6600
2011-09-23 09:52:31 CEST LOG:  database system is ready to accept read only
connections
Restoring 00060066
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00060066': No
such file or directory
Restoring 0007.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/0007.history': No such file
or directory
2011-09-23 09:52:31 CEST LOG:  streaming replication successfully connected
to primary

--
Best regards,
Linas Virbalas
http://flyingclusters.blogspot.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load


On 23.09.2011 11:02, Linas Virbalas wrote:

On 9/22/11 6:59 PM, Euler Taveira de Oliveiraeu...@timbira.com  wrote:


If needed, I could do that, if I had the exact procedure... Currently,
during the start of the backup I take the following information:


Just show us the output of pg_start_backup and part of the standby log with
the following message 'redo starts at' and the subsequent messages up to the
failure.


Unfortunately, it's impossible, because the error message Could not read
from file pg_clog/0001 at offset 32768: Success is shown (and startup
aborted) before the turn for redo starts at message arrives.


It looks to me that pg_clog/0001 exists, but it shorter than recovery 
expects. Which shouldn't happen, of course, because the start-backup 
checkpoint should flush all the clog that's needed by recovery to disk 
before the backup procedure begins to them.


Can you do ls -l pg_clog in the master and the backup, and 
pg_controldata on the backup dir, and post the results, please?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Florian Pflug

On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote:
 On 23.09.2011 11:02, Linas Virbalas wrote:
 On 9/22/11 6:59 PM, Euler Taveira de Oliveiraeu...@timbira.com  wrote:
 
 If needed, I could do that, if I had the exact procedure... Currently,
 during the start of the backup I take the following information:
 
 Just show us the output of pg_start_backup and part of the standby log with
 the following message 'redo starts at' and the subsequent messages up to the
 failure.
 
 Unfortunately, it's impossible, because the error message Could not read
 from file pg_clog/0001 at offset 32768: Success is shown (and startup
 aborted) before the turn for redo starts at message arrives.
 
 It looks to me that pg_clog/0001 exists, but it shorter than recovery 
 expects. Which shouldn't happen, of course, because the start-backup 
 checkpoint should flush all the clog that's needed by recovery to disk before 
 the backup procedure begins to them.

Yeah. What confuses me though is that we fail while trying to read from the 
clog. When do we do that during normal (non-standby) recovery?

One other possibility would be that the problem is somehow triggered by vacuum 
running while the start-backup checkpoint is commencing. That'd explain why the 
problem goes away with immediate checkpoints, because those make the windows 
mach smaller. But I haven't got a concrete theory of whats happening..

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load


On 23.09.2011 11:48, Florian Pflug wrote:

On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote:

On 23.09.2011 11:02, Linas Virbalas wrote:

On 9/22/11 6:59 PM, Euler Taveira de Oliveiraeu...@timbira.com   wrote:


If needed, I could do that, if I had the exact procedure... Currently,
during the start of the backup I take the following information:


Just show us the output of pg_start_backup and part of the standby log with
the following message 'redo starts at' and the subsequent messages up to the
failure.


Unfortunately, it's impossible, because the error message Could not read
from file pg_clog/0001 at offset 32768: Success is shown (and startup
aborted) before the turn for redo starts at message arrives.


It looks to me that pg_clog/0001 exists, but it shorter than recovery expects. 
Which shouldn't happen, of course, because the start-backup checkpoint should 
flush all the clog that's needed by recovery to disk before the backup 
procedure begins to them.


Yeah. What confuses me though is that we fail while trying to read from the 
clog. When do we do that during normal (non-standby) recovery?


I believe the OP is setting up a standby. He didn't say if it's a hot 
standby, though. That changes the startup sequence a bit.


At the end of recovery, we read the last clog page, containing the 
next XID that will be assigned to a transaction, and zero out the future 
part of that clog file (StartupCLOG). In hot standby, we do that 
earlier, after reading the checkpoint record but before we start 
replaying records.



One other possibility would be that the problem is somehow triggered by vacuum 
running while the start-backup checkpoint is commencing. That'd explain why the 
problem goes away with immediate checkpoints, because those make the windows 
mach smaller. But I haven't got a concrete theory of whats happening..


Hmm, that's a good theory. The logic in SimpleLruPhysicakReadPage() 
tolerates non-existent files, because the old clog files might've been 
truncated away by a vacuum after the backup started. That's OK, because 
they will be recreated, and later deleted again, during the WAL replay. 
But what if something like this happens:


0. pg_start_backup().
1. rsync starts. It gets a list of all files. It notes that pg_clog/0001 
exists.

2. vacuum runs, and deletes pg_clog/0001
3. rsync starts to copy pg_clog/0001. Oops, it's gone. Instead of 
leaving it out altogether, it includes it as an empty file.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Linas Virbalas

On 9/23/11 12:05 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 It looks to me that pg_clog/0001 exists, but it shorter than recovery
 expects. Which shouldn't happen, of course, because the start-backup
 checkpoint should flush all the clog that's needed by recovery to disk
 before the backup procedure begins to them.
 
 Yeah. What confuses me though is that we fail while trying to read from the
 clog. When do we do that during normal (non-standby) recovery?
 
 I believe the OP is setting up a standby. He didn't say if it's a hot
 standby, though. That changes the startup sequence a bit.

I confirm that it is a Hot Standby with Streaming Replication.

 One other possibility would be that the problem is somehow triggered by
 vacuum running while the start-backup checkpoint is commencing. That'd
 explain why the problem goes away with immediate checkpoints, because those
 make the windows mach smaller. But I haven't got a concrete theory of whats
 happening..
 
 Hmm, that's a good theory. The logic in SimpleLruPhysicakReadPage()
 tolerates non-existent files, because the old clog files might've been
 truncated away by a vacuum after the backup started. That's OK, because
 they will be recreated, and later deleted again, during the WAL replay.
 But what if something like this happens:
 
 0. pg_start_backup().
 1. rsync starts. It gets a list of all files. It notes that pg_clog/0001
 exists.
 2. vacuum runs, and deletes pg_clog/0001
 3. rsync starts to copy pg_clog/0001. Oops, it's gone. Instead of
 leaving it out altogether, it includes it as an empty file.

I cannot confirm that. I have tried this scenario one more time and was
observing how pg_clog/ looked on the master and standby. Here's an overview:

1. pg_start_backup('base_backup') - waiting...

2. Checking the size of pg_clog periodically on the master:

-bash-3.2$ ls -l pg_clog
total 8
-rw--- 1 postgres postgres 8192 Sep 23 14:31 
-bash-3.2$ ls -l pg_clog
total 8
-rw--- 1 postgres postgres 8192 Sep 23 14:31 
-bash-3.2$ ls -l pg_clog
total 16

3. Somewhere just before (1) step releases and we get into rsync phase, the
pg_clog size changes:

-rw--- 1 postgres postgres 16384 Sep 23 14:33 
-bash-3.2$ ls -l pg_clog
total 16
-rw--- 1 postgres postgres 16384 Sep 23 14:33 
-bash-3.2$ ls -l pg_clog
total 16
-rw--- 1 postgres postgres 16384 Sep 23 14:33 

4. The rsync does transfer the file:

...
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,881 INFO
management.script.ScriptPlugin base/16394/16405_fsm
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,881 INFO
management.script.ScriptPlugin base/16394/pg_internal.init
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12691
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12696
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12697
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/pg_internal.init
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_clog/
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_multixact/offsets/
INFO   | jvm 1| 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_notify/
...

But on the standby its size is the old one (thus, it seems, that the size
changed after the rsync transfer and before the pg_stop_backup() was
called):

ls -l pg_clog/
total 8
-rw--- 1 postgres postgres 8192 Sep 23 14:31 

5. Hence, the server doesn't start and pg_log complains:

2011-09-23 14:33:45 CEST LOG:  streaming replication successfully connected
to primary
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  could not access status of transaction
37206
2011-09-23 14:33:46 CEST DETAIL:  Could not read from file pg_clog/ at
offset 8192: Success.

6. Now, if I do something that, of course, should never be done, and copy
this  file from master to the standby soon enough (without even
starting/stopping backup), the standby starts up successfully.

--
Hope this helps,
Linas Virbalas
http://flyingclusters.blogspot.com/


-- 
Sent via pgsql-hackers mailing list

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Robert Haas

On Fri, Sep 23, 2011 at 8:47 AM, Linas Virbalas
linas.virba...@continuent.com wrote:
 But on the standby its size is the old one (thus, it seems, that the size
 changed after the rsync transfer and before the pg_stop_backup() was
 called):

Now that seems pretty weird - I don't think that file should ever shrink.

Are you sure that the rsync isn't starting until after
pg_start_backup() completes?  Because if you were starting it just a
tiny bit early, that would explain the observed symptoms perfectly, I
think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Linas Virbalas

 But on the standby its size is the old one (thus, it seems, that the size
 changed after the rsync transfer and before the pg_stop_backup() was
 called):

 Now that seems pretty weird - I don't think that file should ever shrink.

It seems, I was not clear in my last example. The pg_clog file doesn't shrink. 
On the contrary, when rsync kicks in it is still small, but after it completes 
and somewhere around the pg_stop_backup() call, the pg_clog file grows on the 
master. Hence, it is left smaller on the standby.

 Are you sure that the rsync isn't starting until after
 pg_start_backup() completes?

100% - the procedure is scripted in a single threaded application, so rsync is 
started only after pg_start_backup(...) returns.

 Because if you were starting it just a
 tiny bit early, that would explain the observed symptoms perfectly, I
 think.

I agree, but that is not the case.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Aidan Van Dyk

On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

Unfortunately, it's impossible, because the error message Could not read
from file pg_clog/0001 at offset 32768: Success is shown (and startup
aborted) before the turn for redo starts at message arrives.

It looks to me that pg_clog/0001 exists, but it shorter than recovery
expects. Which shouldn't happen, of course, because the start-backup
checkpoint should flush all the clog that's needed by recovery to disk
before the backup procedure begins to them.

I think the point here is that recover *never starts*. Something in
the standby startup is looking for a value in a clog block that
recovery hadn't had a chance to replay (produce) yet.

So the standby is looking into the data directory *before* recovery
has had a chance to run, and based on that,goes to look for something
in clog page that wasn't guarenteed to exists at the start of the
backup period, and bombing out before recovery has a chance to start
replaying WAL and write the new clog page.

--
Aidan Van Dyk Create like a god,
ai...@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Alvaro Herrera


Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
 On 9/23/11 12:05 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 But on the standby its size is the old one (thus, it seems, that the size
 changed after the rsync transfer and before the pg_stop_backup() was
 called):
 
 ls -l pg_clog/
 total 8
 -rw--- 1 postgres postgres 8192 Sep 23 14:31 

Sounds like rsync is caching the file size at the start of the run, and
then copying that many bytes, ignoring the growth that occurred after it
started.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Magnus Hagander

On Sep 23, 2011 5:59 PM, Alvaro Herrera alvhe...@commandprompt.com
wrote:


 Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
  On 9/23/11 12:05 PM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com wrote:

  But on the standby its size is the old one (thus, it seems, that the
size
  changed after the rsync transfer and before the pg_stop_backup() was
  called):
 
  ls -l pg_clog/
  total 8
  -rw--- 1 postgres postgres 8192 Sep 23 14:31 

 Sounds like rsync is caching the file size at the start of the run, and
 then copying that many bytes, ignoring the growth that occurred after it
 started.


That pretty much matches what Daniel does when he got the same failure case.


Is this not allowed? Shouldn't wal replay fix this?

/Magnus

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Andres Freund

Hi,

On Wednesday 21 Sep 2011 16:44:30 Linas Virbalas wrote:
 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file pg_clog/0001
 at offset 32768: Success.
Any chance you can attach gdb to the startup process and provide a backtrace 
from the place where this message is printed?

Greetings,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load


On 23.09.2011 19:03, Magnus Hagander wrote:

On Sep 23, 2011 5:59 PM, Alvaro Herreraalvhe...@commandprompt.com
wrote:



Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:

On 9/23/11 12:05 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:



But on the standby its size is the old one (thus, it seems, that the

size

changed after the rsync transfer and before the pg_stop_backup() was
called):

ls -l pg_clog/
total 8
-rw--- 1 postgres postgres 8192 Sep 23 14:31 


Sounds like rsync is caching the file size at the start of the run, and
then copying that many bytes, ignoring the growth that occurred after it
started.


That pretty much matches what Daniel does when he got the same failure case.

Is this not allowed? Shouldn't wal replay fix this?


That's OK. The effect is the same as if rsync had copied the file at the 
start.


What's not OK is to store the file as empty or truncated file in the 
backup, when the file is deleted from pg_clog while rsync is running. 
The file must have length = the length the file had when backup 
started, or the file must be omitted from the backup altogether.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Robert Haas

On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 Unfortunately, it's impossible, because the error message Could not read
 from file pg_clog/0001 at offset 32768: Success is shown (and startup
 aborted) before the turn for redo starts at message arrives.

 It looks to me that pg_clog/0001 exists, but it shorter than recovery
 expects. Which shouldn't happen, of course, because the start-backup
 checkpoint should flush all the clog that's needed by recovery to disk
 before the backup procedure begins to them.

 I think the point here is that recover *never starts*.  Something in
 the standby startup is looking for a value in a clog block that
 recovery hadn't had a chance to replay (produce) yet.

Ah.  I think you are right - Heikki made the same point.  Maybe some
of the stuff that happens just after this comment:

/*
 * Initialize for Hot Standby, if enabled. We won't let backends in
 * yet, not until we've reached the min recovery point specified in
 * control file and we've established a recovery snapshot from a
 * running-xacts WAL record.
 */


...actually needs to be postponed until after we've reached consistency?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Florian Pflug

On Sep23, 2011, at 18:03 , Magnus Hagander wrote:
 On Sep 23, 2011 5:59 PM, Alvaro Herrera alvhe...@commandprompt.com wrote:
  Sounds like rsync is caching the file size at the start of the run, and
  then copying that many bytes, ignoring the growth that occurred after it
  started.
 
 That pretty much matches what Daniel does when he got the same failure case.
 
 Is this not allowed? Shouldn't wal replay fix this?

I don't see how it could be forbidden. ISTM that we absolutely *have* to be
able to deal with each byte of a file's date, including it's meta-data, being in
any state it was between the time pg_start_backup() returned and 
pg_stop_backup()
was issued. With the individual states being in no way synchronized.
(OK, in reality restricting this to individual 512-byte blocks is probably 
enough,
but still...).

This, I think, is also the reasons we need to force full_page_writes to on
during a hot backup. If a page was modified at all between pg_start_backup() and
pg_stop_backup(), we essentially have to assume it's totally garbled. 

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Florian Pflug

On Sep23, 2011, at 18:45 , Robert Haas wrote:
 Ah.  I think you are right - Heikki made the same point.  Maybe some
 of the stuff that happens just after this comment:
 
/*
 * Initialize for Hot Standby, if enabled. We won't let backends in
 * yet, not until we've reached the min recovery point specified in
 * control file and we've established a recovery snapshot from a
 * running-xacts WAL record.
 */
 
 
 ...actually needs to be postponed until after we've reached consistency?

I came the the same conclusion. It seems the before we've reached consistency,
we shouldn't attempt to read anything because the data can be pretty arbitrarily
garbled.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load