Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-17 Thread Andres Freund
On 2013-01-17 01:38:31 -0500, Tom Lane wrote:
 But having said that ... are we sure this code is not actually broken?
 ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
 cannot safely attempt to send an error message to the client either;
 but ereport(FATAL) will try exactly that.

You're absolutely right.

ISTM, to fix it we would have to either provide a COMERROR like facility
for FATAL errors or just remove the requirement of raising an error
exactly there.
If I remember what I tried before correctly the latter seems to involve
setting openssl into nonblocking mode and check for the saved error in
the EINTR loop in be-secure and re-raise the error from there.

Do we want to backport either - there hasn't been any report that I
could link to it right now, but on the other hand it could possibly
cause rather ugly problems (data leakage, segfaults, code execution
aren't all that improbable)?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-17 Thread Tom Lane
Pavan Deolasee pavan.deola...@gmail.com writes:
 On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
 cannot safely attempt to send an error message to the client either;
 but ereport(FATAL) will try exactly that.

 I thought since FATAL will force the backend to exit, we don't care much
 about corrupted OpenSSL state. I even thought that's why we raise ERROR to
 FATAL so that the backend can start in a clean state. But clearly I'm
 missing a point here because you don't think that way.

If we were to simply exit(1), leaving the kernel to close the client
socket, it'd be safe enough because control would never have returned to
OpenSSL.  But this code doesn't do that.  What we're looking at is that
we've interrupted OpenSSL at some arbitrary point, and now we're going
to make fresh calls to it to try to pump the FATAL error message out to
the client.  It seems fairly unlikely that that's safe.  I'm not sure
I credit Andres' worry of arbitrary code execution, but I do fear that
OpenSSL could get confused to the point of freezing up, or even more
likely that it would transmit garbage to the client, which rather
defeats the purpose.

Don't see a nice fix.  The COMMERROR approach (ie, don't try to send
anything to the client, only the log) is not nice at all since the
client would get the impression that the server crashed.  On the other
hand, anything else requires waiting till we get control back from
OpenSSL, which might be a long time, and meanwhile we're still holding
locks that prevent WAL recovery from proceeding.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-17 Thread Andres Freund
On 2013-01-17 10:19:23 -0500, Tom Lane wrote:
 Pavan Deolasee pavan.deola...@gmail.com writes:
  On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
  cannot safely attempt to send an error message to the client either;
  but ereport(FATAL) will try exactly that.
 
  I thought since FATAL will force the backend to exit, we don't care much
  about corrupted OpenSSL state. I even thought that's why we raise ERROR to
  FATAL so that the backend can start in a clean state. But clearly I'm
  missing a point here because you don't think that way.
 
 If we were to simply exit(1), leaving the kernel to close the client
 socket, it'd be safe enough because control would never have returned to
 OpenSSL.  But this code doesn't do that.  What we're looking at is that
 we've interrupted OpenSSL at some arbitrary point, and now we're going
 to make fresh calls to it to try to pump the FATAL error message out to
 the client.  It seems fairly unlikely that that's safe.  I'm not sure
 I credit Andres' worry of arbitrary code execution, but I do fear that
 OpenSSL could get confused to the point of freezing up, or even more
 likely that it would transmit garbage to the client, which rather
 defeats the purpose.

I don't think its likely either, I seem to remember it copying arround
function pointers though, so it seems possible with some bad luck.

 Don't see a nice fix.  The COMMERROR approach (ie, don't try to send
 anything to the client, only the log) is not nice at all since the
 client would get the impression that the server crashed.  On the other
 hand, anything else requires waiting till we get control back from
 OpenSSL, which might be a long time, and meanwhile we're still holding
 locks that prevent WAL recovery from proceeding.

I think we can make openssl return pretty much immediately if we assume
recv() can reliably interrupted by signals, possibly by setting the
socket to nonblocking in the signal handler.
We just need to tell openssl not to retry immediately and we should be
fine. Given that quite some people use openssl with nonblocking sockets,
that code path should be reasonably safe.

That still requires ugliness around saving the error and reraising it
after returning from openssl though...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Abhijit Menon-Sen
At 2012-12-29 14:23:45 -0500, sfr...@snowman.net wrote:

 Regarding the actual comment, here's the wording that I'd use:

Sorry for nitpicking, but we can't long jumps made me cringe.
Here's a slightly more condensed version:

/*
 * We can't use ereport(ERROR) here, because any longjmps
 * in DoingCommandRead state run the risk of violating our
 * protocol or the SSL protocol, by interrupting OpenSSL in
 * the middle of changing its internal state.
 *
 * Currently, the only option is to promote ERROR to FATAL
 * until we figure out a better way to handle errors in this
 * state.
 */

Patch along these lines attached, which also removes trailing
whitespace from the original patch.

-- Abhijit
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 407c548..5b952d4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2908,6 +2908,17 @@ ProcessInterrupts(void)
 			DisableNotifyInterrupt();
 			DisableCatchupInterrupt();
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
+
+			/*
+			 * We can't use ereport(ERROR) here, because any longjmps
+			 * in DoingCommandRead state run the risk of violating our
+			 * protocol or the SSL protocol, by interrupting OpenSSL in
+			 * the middle of changing its internal state.
+			 *
+			 * Currently, the only option is to promote ERROR to FATAL
+			 * until we figure out a better way to handle errors in this
+			 * state.
+			 */
 			if (DoingCommandRead)
 ereport(FATAL,
 		(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Pavan Deolasee
On Thu, Jan 17, 2013 at 9:26 AM, Abhijit Menon-Sen a...@2ndquadrant.comwrote:

 At 2012-12-29 14:23:45 -0500, sfr...@snowman.net wrote:
 
  Regarding the actual comment, here's the wording that I'd use:

 Sorry for nitpicking, but we can't long jumps made me cringe.
 Here's a slightly more condensed version:

 /*
  * We can't use ereport(ERROR) here, because any longjmps
  * in DoingCommandRead state run the risk of violating our
  * protocol or the SSL protocol, by interrupting OpenSSL in
  * the middle of changing its internal state.
  *
  * Currently, the only option is to promote ERROR to FATAL
  * until we figure out a better way to handle errors in this
  * state.
  */

 Patch along these lines attached, which also removes trailing
 whitespace from the original patch.


Thanks Stephen and Abhijit for improving the comments. I like this wording.
So +1 from my side. Abhijit, do you want to add the patch and change the CF
status appropriately ?

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 Sorry for nitpicking, but we can't long jumps made me cringe.

Agreed :-(

 Here's a slightly more condensed version:

I find this still not quite right, because where it's placed, it applies
to both the DoingCommandRead and !DoingCommandRead branches of the
if/else statement.  The wording would be okay if placed inside the first
if branch, but I think visually that would look ugly.  I'm inclined to
suggest wording it like

* If we're in DoingCommandRead state, we can't use ereport(ERROR),
* because any longjmp would risk interrupting OpenSSL operations
* and thus confusing that library and/or violating wire protocol.

plus your second para as-is.

But having said that ... are we sure this code is not actually broken?
ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
cannot safely attempt to send an error message to the client either;
but ereport(FATAL) will try exactly that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Pavan Deolasee
On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:


 But having said that ... are we sure this code is not actually broken?


I'm not.


 ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
 cannot safely attempt to send an error message to the client either;
 but ereport(FATAL) will try exactly that.


I thought since FATAL will force the backend to exit, we don't care much
about corrupted OpenSSL state. I even thought that's why we raise ERROR to
FATAL so that the backend can start in a clean state. But clearly I'm
missing a point here because you don't think that way.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-29 Thread Stephen Frost
Pavan,

* Pavan Deolasee (pavan.deola...@gmail.com) wrote:
 On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee 
 pavan.deola...@gmail.comwrote:
  Thanks Andres. I also read the original thread and I now understand why we
  are using FATAL here, at least until we have a better solution. Obviously
  the connection reset is no good either because as someone commented in the
  original discussion, I thought that I'm seeing a server crash while it was
  not.
 
 How about attached comment to be added at the appropriate place ? I've
 extracted this mostly from Tom's explanation in the original thread.

I was hoping to see an update to the actual error messages returned in
this patch..  I agree that it's good to add the comments but that
doesn't do anything to help the user out in this case.

Regarding the actual comment, here's the wording that I'd use:

---
If we are in DoingCommandRead state, we can't use ereport(ERROR)
because we can't long jumps in this state.  If we attempt to
longjmps in this state, we not only risk breaking protocol at
our level, but also risk leaving openssl in an inconsistent
state, either violating the ssl protocol or having its internal
state trashed because it was interrupted while in the middle of
changing that state.

Currently, the only option is to promote ERROR to FATAL until
we figure out a way to handle errors more effectively while in
this state.
---

If you agree with that wording update, can you update the patch?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Andres Freund
Hi,

On 2012-12-04 12:30:43 +0530, Pavan Deolasee wrote:
 I was trying some simple queries on a Hot Standby with streaming
 replication.

 On standby, I do this:
 postgres=# begin transaction isolation level repeatable read;
 BEGIN
 postgres=# explain verbose select count(b) from test WHERE a  10;

 On master, I insert a bunch of tuples in the table and run VACUUM ANALYZE.
 postgres=# INSERT INTO test VALUES (generate_series(110001,12), 'foo',
 1);
 INSERT 0 1
 postgres=# VACUUM ANALYZE test;
 VACUUM

 After max_standby_streaming_delay, the standby starts cancelling the
 queries. I get an error like this on the standby:
 postgres=# explain verbose select count(b) from test WHERE a  10;
 FATAL:  terminating connection due to conflict with recovery
 DETAIL:  User query might have needed to see row versions that must be
 removed.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost. Attempting reset: Succeeded.

 So I've couple questions/concerns here

 1. Why to throw a FATAL error here ? A plain ERROR should be enough to
 abort the transaction. There are four places in ProcessInterrupts() where
 we throw these kind of errors and three of them are FATAL.

The problem here is that were in IDLE IN TRANSACTION in this case. Which
currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
anything).

There are two problems making this non-trivial. For one, while we're in
IDLE IN TXN the client doesn't expect a response on a protocol level, so
we can't simply ereport() at that time.
For another, when were in IDLE IN TXN we're potentially inside openssl
so we can't jump out of there anyway because that would quite likely
corrupt the internal state of openssl.

I tried to fix this before (c.f. Idle in transaction cancellation or
similar) but while I had some kind of fix for the first issue (i saved
the error and reported it later when the protocol state allows it) I
missed the jumping out of openssl bit. I think its not that hard to
solve though. I remember having something preliminary but I never had
the time to finish it. If I remember correctly the trick was to set
openssl into non-blocking mode temporarily and return to the caller
inside be-secure.c:my_sock_read.
At that location ProcessInterrupts can run safely, error out silently,
and reraise the error once were in the right protocol state.

 2836 else if (RecoveryConflictPending  RecoveryConflictRetryable)
 2837 {
 2838 pgstat_report_recovery_conflict(RecoveryConflictReason);
 2839 ereport(FATAL,
 2840 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 2841   errmsg(terminating connection due to conflict with
 recovery),
 2842  errdetail_recovery_conflict()));
 2843 }
 2844 else if (RecoveryConflictPending)
 2845 {
 2846 /* Currently there is only one non-retryable recovery
 conflict */
 2847 Assert(RecoveryConflictReason ==
 PROCSIG_RECOVERY_CONFLICT_DATABASE);
 2848 pgstat_report_recovery_conflict(RecoveryConflictReason);
 2849 ereport(FATAL,
 2850 (errcode(ERRCODE_DATABASE_DROPPED),
 2851   errmsg(terminating connection due to conflict with
 recovery),
 2852  errdetail_recovery_conflict()));
 2853 }

 AFAICS the first of these should be ereport(ERROR). Otherwise irrespective
 of whether RecoveryConflictRetryable is true or false, we will always
 ereport(FATAL).

Which is fine, because were below if (ProcDiePending). Note there's a
separate path for QueryCancelPending. We go on to kill connections once
the normal conflict handling has tried several times.

 2. For my test, the error message itself looks wrong because I did not
 actually remove any rows on the master. VACUUM probably marked a bunch of
 pages as all-visible and that should have triggered a conflict on the
 standby in order to support index-only scans. IMHO we should  improve the
 error message to avoid any confusion. Or we can add a new ProcSignalReason
 to differentiate between a cancel due to clean up vs visibilitymap_set()
 operation.

I think we desparately need to improve *all* of these message with
significantly more detail (cause for cancellation, relation, current
xid, conflicting xid, current/last query).

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 1:44 PM, Andres Freund and...@2ndquadrant.comwrote:


 
  After max_standby_streaming_delay, the standby starts cancelling the
  queries. I get an error like this on the standby:
  postgres=# explain verbose select count(b) from test WHERE a  10;
  FATAL:  terminating connection due to conflict with recovery
  DETAIL:  User query might have needed to see row versions that must be
  removed.
  HINT:  In a moment you should be able to reconnect to the database and
  repeat your command.
  server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
  The connection to the server was lost. Attempting reset: Succeeded.
 
  So I've couple questions/concerns here
 
  1. Why to throw a FATAL error here ? A plain ERROR should be enough to
  abort the transaction. There are four places in ProcessInterrupts() where
  we throw these kind of errors and three of them are FATAL.

 The problem here is that were in IDLE IN TRANSACTION in this case. Which
 currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
 anything).

 There are two problems making this non-trivial. For one, while we're in
 IDLE IN TXN the client doesn't expect a response on a protocol level, so
 we can't simply ereport() at that time.
 For another, when were in IDLE IN TXN we're potentially inside openssl
 so we can't jump out of there anyway because that would quite likely
 corrupt the internal state of openssl.

 I tried to fix this before (c.f. Idle in transaction cancellation or
 similar) but while I had some kind of fix for the first issue (i saved
 the error and reported it later when the protocol state allows it) I
 missed the jumping out of openssl bit. I think its not that hard to
 solve though. I remember having something preliminary but I never had
 the time to finish it. If I remember correctly the trick was to set
 openssl into non-blocking mode temporarily and return to the caller
 inside be-secure.c:my_sock_read.


Thanks Andres. I also read the original thread and I now understand why we
are using FATAL here, at least until we have a better solution. Obviously
the connection reset is no good either because as someone commented in the
original discussion, I thought that I'm seeing a server crash while it was
not.




 
  AFAICS the first of these should be ereport(ERROR). Otherwise
 irrespective
  of whether RecoveryConflictRetryable is true or false, we will always
  ereport(FATAL).

 Which is fine, because were below if (ProcDiePending). Note there's a
 separate path for QueryCancelPending. We go on to kill connections once
 the normal conflict handling has tried several times.


Ok. Understood.I now see that every path below if (ProcDiePending) will
call FATAL, albeit with different error codes. That explains the current
code.




 I think we desparately need to improve *all* of these message with
 significantly more detail (cause for cancellation, relation, current
 xid, conflicting xid, current/last query).


I agree.

Thanks,
Pavan


-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee pavan.deola...@gmail.comwrote:



 Thanks Andres. I also read the original thread and I now understand why we
 are using FATAL here, at least until we have a better solution. Obviously
 the connection reset is no good either because as someone commented in the
 original discussion, I thought that I'm seeing a server crash while it was
 not.


How about attached comment to be added at the appropriate place ? I've
extracted this mostly from Tom's explanation in the original thread.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


clarify-fatal-error-in-conflict-resolve.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby, conflict resolution

2009-01-26 Thread Simon Riggs

On Sun, 2009-01-25 at 16:19 +, Simon Riggs wrote:
 On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:
 
  Ok, then I think we have a little race condition. The startup process 
  doesn't get any reply indicating that the target backend has
  processed 
  the SIGINT and set the cached conflict LSN. The target backend might 
  move ahead using the old LSN for a little while, even though the
  startup 
  process has already gone ahead and replayed a vacuum record.
  
  Another tiny issue is that it looks like a new conflict LSN always 
  overwrites the old one. But you should always use the oldest
  conflicted 
  LSN in the checks, not the newest.
 
 That makes it easier, because it is either not set, or it is set and
 does not need to be reset as new conflict LSNs appear.
 
 I can see a simple scheme emerging, which I will detail tomorrow
 morning.

Rather than signalling, we could use a hasconflict boolean for each proc
in a shared data structure. It can be read without spinlock, but should
only be written while holding spinlock.

Each time we read a block we check if hasconflict is set. If it is, we
grab spinlock, recheck if it is set, if so read the conflict details,
clear the flag and drop the spinlock.


The aim of this type of conflict resolution was to reduce the footprint
of users that would be effected and defer it as much as possible. We've
spent time getting the latestCompletedXid, but we know deriving that
value is very difficult in the btree case at least. So what I would like
to do is pass the relid of a conflict across as well and use that to
reduce the footprint, now that we are performing the test inside the
buffer manager.

We would keep a relid cache with a very small number of relids, perhaps
just one, maybe as many as 4 or 8, so that we can fit relids and
associated LSNs in a single cache line. We can match the relid using a
simple for loop, which we know is well optimised when there is no
dependency between the elements of the loop and the loop has a
compile-time fixed number of iterations.

I would be inclined to make this a separate shared memory area rather
than try to weld that onto PGPROC. We could index that using backendid.

If the relid cache overflows, we just apply a general LSN value.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby, conflict resolution

2009-01-26 Thread Heikki Linnakangas

Simon Riggs wrote:

Rather than signalling, we could use a hasconflict boolean for each proc
in a shared data structure. It can be read without spinlock, but should
only be written while holding spinlock.

Each time we read a block we check if hasconflict is set. If it is, we
grab spinlock, recheck if it is set, if so read the conflict details,
clear the flag and drop the spinlock.


Yeah, that seems workable.


The aim of this type of conflict resolution was to reduce the footprint
of users that would be effected and defer it as much as possible. We've
spent time getting the latestCompletedXid, but we know deriving that
value is very difficult in the btree case at least. So what I would like
to do is pass the relid of a conflict across as well and use that to
reduce the footprint, now that we are performing the test inside the
buffer manager.


I agree that would be useful, but I'd prefer to keep it simple for now...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby, conflict resolution

2009-01-25 Thread Simon Riggs

On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:

 Ok, then I think we have a little race condition. The startup process 
 doesn't get any reply indicating that the target backend has
 processed 
 the SIGINT and set the cached conflict LSN. The target backend might 
 move ahead using the old LSN for a little while, even though the
 startup 
 process has already gone ahead and replayed a vacuum record.
 
 Another tiny issue is that it looks like a new conflict LSN always 
 overwrites the old one. But you should always use the oldest
 conflicted 
 LSN in the checks, not the newest.

That makes it easier, because it is either not set, or it is set and
does not need to be reset as new conflict LSNs appear.

I can see a simple scheme emerging, which I will detail tomorrow
morning.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby, conflict resolution

2009-01-23 Thread Simon Riggs

On Fri, 2009-01-23 at 17:51 +0200, Heikki Linnakangas wrote:

 In ERROR mode, you don't really want to interrupt the target backend. In 
 ReadBuffer, you're checking a global variable, 
 BufferRecoveryConflictPending on each call, and if it's set, you check 
 the buffer's LSN against the LSN of the earliest LSN conflicting LSN, 
 and throw an error if it's greater than that. Why do we jump through so 
 many hoops to get the earliest conflicting LSN to where it's needed? At 
 the moment:
 
 1. Startup process sets the LSN in the target backend's PGPROC entry, 
 and signals it with SIGINT.
 2. The target backend receives the signal; ProcessInterrupts is called 
 either immediately or at the next CHECK_FOR_INTERRUPTS() call.
 3. ProcessInterrupts reads the value from PGPROC, and passes it to bufmgr.c
 
 ISTM that if ReadBuffer read the value directly from the PGPROC entry, 
 there would be no need for the signaling (in the ERROR mode).

That is possible and I considered it. If we did it that way we would
need to read the PGPROC each time we read a buffer. AFAICS we would need
to use a spinlock to do that since reading an XLogRecPtr would not be
atomic.

So doing it the way I've done it allows us to use a local variable which
can be more easily cached and avoids the locking overhead.

We do still need to signal anyway for the FATAL case, so we're not
significantly affecting the patch footprint by changing that.

 Correct me if I'm wrong, but I thought the idea of this new conflict 
 resolution was that the startup process doesn't need to wait for the 
 target backend to die. Instead, the target backend knows to commit 
 suicide if it stumbles into a buffer that's been modified in a 
 conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
 looks like we still wait.

err, no, that's just an oversight, not intentional.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby, conflict resolution

2009-01-23 Thread Heikki Linnakangas

Simon Riggs wrote:

On Fri, 2009-01-23 at 17:51 +0200, Heikki Linnakangas wrote:
ISTM that if ReadBuffer read the value directly from the PGPROC entry, 
there would be no need for the signaling (in the ERROR mode).


That is possible and I considered it. If we did it that way we would
need to read the PGPROC each time we read a buffer. AFAICS we would need
to use a spinlock to do that since reading an XLogRecPtr would not be
atomic.


Hmm, I think you could make it lock-free by ordering the instructions 
carefully. Not sure it's worth the code complexity, though.


Correct me if I'm wrong, but I thought the idea of this new conflict 
resolution was that the startup process doesn't need to wait for the 
target backend to die. Instead, the target backend knows to commit 
suicide if it stumbles into a buffer that's been modified in a 
conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
looks like we still wait.


err, no, that's just an oversight, not intentional.


Ok, then I think we have a little race condition. The startup process 
doesn't get any reply indicating that the target backend has processed 
the SIGINT and set the cached conflict LSN. The target backend might 
move ahead using the old LSN for a little while, even though the startup 
process has already gone ahead and replayed a vacuum record.


Another tiny issue is that it looks like a new conflict LSN always 
overwrites the old one. But you should always use the oldest conflicted 
LSN in the checks, not the newest.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby, conflict resolution

2009-01-23 Thread Simon Riggs

On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:

  Correct me if I'm wrong, but I thought the idea of this new conflict 
  resolution was that the startup process doesn't need to wait for the 
  target backend to die. Instead, the target backend knows to commit 
  suicide if it stumbles into a buffer that's been modified in a 
  conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
  looks like we still wait.
  
  err, no, that's just an oversight, not intentional.
 
 Ok, then I think we have a little race condition. The startup process 
 doesn't get any reply indicating that the target backend has processed 
 the SIGINT and set the cached conflict LSN. The target backend might 
 move ahead using the old LSN for a little while, even though the startup 
 process has already gone ahead and replayed a vacuum record.

Hah! You had me either way. Very neat :-)

I'll think some more and reply, but not tonight.

 Another tiny issue is that it looks like a new conflict LSN always 
 overwrites the old one. But you should always use the oldest conflicted 
 LSN in the checks, not the newest.

OK

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers