Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Andrew Dunstan



Hannu Krosing wrote:

Actually we came up with a solution to this - use filesystem level
snapshots (like LVM2+XFS or ZFS), and redirect backends with
long-running queries to use fs snapshot mounted to a different
mountpoint.

I don't think Simon has yet put full support for it in code, but it is
clearly _the_ solution for those who want to eat the cake and have it
too.

  



How does that work if you're using multiple file systems via tablespaces
(e.g. indexes in a different TS)?


cheers

andrew



Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Gregory Stark
Hannu Krosing ha...@krosing.net writes:

 Actually we came up with a solution to this - use filesystem level
 snapshots (like LVM2+XFS or ZFS), and redirect backends with
 long-running queries to use fs snapshot mounted to a different
 mountpoint.

Uhm, how do you determine which snapshot to direct the backend to? There could
have been several generations of tuples in that tid since your query started.
Do you take a filesystem snapshot every time there's a vacuum-snapshot conflict
and record which MVCC snapshot goes with which filesystem snapshot?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Robert Haas
 I don't see any way around the fact that when a tuple is removed, it's
 gone and can't be accessed by queries. Either you don't remove it, or
 you kill the query.

 Actually we came up with a solution to this - use filesystem level
 snapshots (like LVM2+XFS or ZFS), and redirect backends with
 long-running queries to use fs snapshot mounted to a different
 mountpoint.

 I don't think Simon has yet put full support for it in code, but it is
 clearly _the_ solution for those who want to eat the cake and have it
 too.

I think _the_ solution is to notice when you're about to vacuum a page
that is still visible to a running backend on the standby, and save
that page off to a separate cache of old page versions (perhaps using
the relation fork mechanism).  I suspect filesystem-level snapshots
can be made to work, but it's never going to be a portable solution,
and I suspect you're going to need a lot of the same bookkeeping
anyway (like, when you have page X in shared buffers, which version of
page X is it, anyway?).
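
No such cache exists in the v9d patch; purely to illustrate the bookkeeping
question above, a side cache of pre-vacuum page images could look something
like the following standalone sketch (every name and structure in it is
invented, and a real version would live in shared memory and spill to a
relation fork):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE   8192
#define CACHE_SLOTS 64

typedef struct OldPageVersion
{
    uint32_t relfilenode;       /* which relation */
    uint32_t blkno;             /* which block */
    uint32_t last_visible_xid;  /* newest snapshot that may still need it */
    int      used;
    char     image[PAGE_SIZE];  /* pre-cleanup page contents */
} OldPageVersion;

static OldPageVersion cache[CACHE_SLOTS];

/* Called just before WAL replay applies a cleanup record to a page. */
static void
stash_old_version(uint32_t rel, uint32_t blkno, uint32_t xid, const char *page)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
    {
        if (!cache[i].used)
        {
            cache[i].relfilenode = rel;
            cache[i].blkno = blkno;
            cache[i].last_visible_xid = xid;
            cache[i].used = 1;
            memcpy(cache[i].image, page, PAGE_SIZE);
            return;
        }
    }
    /* a real implementation would evict or spill to the fork here */
}

/* A backend with an old snapshot asks for the page as of its xid. */
static const char *
lookup_old_version(uint32_t rel, uint32_t blkno, uint32_t snapshot_xid)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].used && cache[i].relfilenode == rel &&
            cache[i].blkno == blkno &&
            snapshot_xid <= cache[i].last_visible_xid)
            return cache[i].image;
    return NULL;    /* fall through to the live page */
}

int
main(void)
{
    char page[PAGE_SIZE] = "tuples as of xid 90";

    stash_old_version(16384, 7, 100, page);   /* vacuum about to hit blk 7 */
    const char *old = lookup_old_version(16384, 7, 95);
    printf("%s\n", old ? old : "live page");
    return 0;
}

This is exactly where the "which version of page X is it, anyway?" question
bites: shared buffers would need to distinguish the live page from any cached
old version.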

...Robert



Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Hannu Krosing
On Tue, 2009-02-03 at 08:40 -0500, Andrew Dunstan wrote:
 
 Hannu Krosing wrote:
  Actually we came up with a solution to this - use filesystem level
  snapshots (like LVM2+XFS or ZFS), and redirect backends with
  long-running queries to use fs snapshot mounted to a different
  mountpoint.
 
  I don't think Simon has yet put full support for it in code, but it is
  clearly _the_ solution for those who want to eat the cake and have it
  too.
 

 
 
 How does that work if you're using multiple file systems via tablespaces 
 (e.g. indexes in a different TS)?

Basically the same way we do WAL shipping up to 8.3: using external scripts.
Once we have enough experience, we could try to move the actual
snapshot-mount-switch inside the core.

-- 
--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Hannu Krosing
On Tue, 2009-02-03 at 09:14 -0500, Robert Haas wrote:
  I don't see any way around the fact that when a tuple is removed, it's
  gone and can't be accessed by queries. Either you don't remove it, or
  you kill the query.
 
  Actually we came up with a solution to this - use filesystem level
  snapshots (like LVM2+XFS or ZFS), and redirect backends with
  long-running queries to use fs snapshot mounted to a different
  mountpoint.
 
  I don't think Simon has yet put full support for it in code, but it is
  clearly _the_ solution for those who want to eat the cake and have it
  too.
 
 I think _the_ solution is to notice when you're about to vacuum a page
 that is still visible to a running backend on the standby, and save
 that page off to a separate cache of old page versions (perhaps using
 the relation fork mechanism).  I suspect filesystem-level snapshots
 can be made to work, but it's never going to be a portable solution,
 and I suspect you're going to need a lot of the same bookkeeping
 anyway (like, when you have page X in shared buffers, which version of
 page X is it, anyway?).

The idea was to switch file pointers inside the backend needing old
versions (and then flush caches if needed), so the only bookkeeping you
need is which fs snapshots must be kept and which can be released.



-- 
--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Simon Riggs

On Tue, 2009-02-03 at 09:14 -0500, Robert Haas wrote:

 I think _the_ solution is to notice when you're about to vacuum a page
 that is still visible to a running backend on the standby, and save
 that page off to a separate cache of old page versions (perhaps using
 the relation fork mechanism).

I'll let you write that, for the next release...

The problem with all of this is that we've been talking about it for 8 months
now and various opinions are held by people. What is being presented is
a broad set of options (summarised from the Wiki):

1. Wait then cancel
a) always for databases, tablespaces and locks
b) deferred cancelation for buffer changes

2. We connect to Primary server from Standby server and keep a
transaction open using contrib/dblink functions, then commit as soon as
we are done.

3. We pause recovery by issuing a pg_recovery_pause() function, or start
server in pause mode using recovery_started_paused = on.

Yes, it's a critical area to the success of the feature. But this is
enough for first release and for us to get user feedback.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Hannu Krosing
On Tue, 2009-02-03 at 13:50 +0000, Gregory Stark wrote:
 Hannu Krosing ha...@krosing.net writes:
 
  Actually we came up with a solution to this - use filesystem level
  snapshots (like LVM2+XFS or ZFS), and redirect backends with
  long-running queries to use fs snapshot mounted to a different
  mountpoint.
 
 Uhm, how do you determine which snapshot to direct the backend to? There could
 have been several generations of tuples in that tid since your query started.
 Do you take a filesystem snapshot every time there's a vacuum-snapshot conflict
 and record which MVCC snapshot goes with which filesystem snapshot?

The most sensible thing to do seems to be to wait for some configurable
period (say a few seconds or a few minutes), delaying WAL apply, then take
the snapshot, mount it, and redirect all running transactions to use the
_current_ filesystem snapshot, and then resume WAL application.

As each of the transactions running on a saved fs snapshot completes, it
is switched back to the main/live fs view.

When the last such transaction ends, the snapshot is unmounted and
released.
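
For concreteness, the lifecycle above as a standalone sketch (the volume
group, volume names and mountpoint are made up; the lvcreate/mount/lvremove
invocations are ordinary LVM2 usage, not code from the patch):

#include <stdio.h>
#include <stdlib.h>

static int
run(const char *cmd)
{
    printf("+ %s\n", cmd);
    return system(cmd);
}

int
main(void)
{
    /* 1. delay WAL apply for the configured grace period (not shown) */

    /* 2. snapshot the data volume and mount it read-only */
    run("lvcreate --snapshot --size 1G --name pgsnap /dev/vg0/pgdata");
    run("mount -o ro /dev/vg0/pgsnap /mnt/pgsnap");

    /* 3. redirect the long-running backends to /mnt/pgsnap, then resume
     *    WAL apply against the live volume */

    /* 4. once the last transaction using the snapshot ends, release it */
    run("umount /mnt/pgsnap");
    run("lvremove -f /dev/vg0/pgsnap");
    return 0;
}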

-- 
--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Simon Riggs

On Tue, 2009-02-03 at 08:40 -0500, Andrew Dunstan wrote:
 
 Hannu Krosing wrote:
  Actually we came up with a solution to this - use filesystem level
  snapshots (like LVM2+XFS or ZFS), and redirect backends with
  long-running queries to use fs snapshot mounted to a different
  mountpoint.
 
  I don't think Simon has yet put full support for it in code, but it is
  clearly _the_ solution for those who want to eat the cake and have it
  too.

 How does that work if you're using multiple file systems via tablespaces 
 (e.g. indexes in a different TS)?

It's a great idea and easy to do, but I can't do everything in one go.

The main reasons not to are multiple file system difficulties and lack
of a mainstream Linux solution, and more simply lack of time and test
resources.

So not now, maybe later.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Robert Haas
On Tue, Feb 3, 2009 at 9:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2009-02-03 at 09:14 -0500, Robert Haas wrote:
 I think _the_ solution is to notice when you're about to vacuum a page
 that is still visible to a running backend on the standby, and save
 that page off to a separate cache of old page versions (perhaps using
 the relation fork mechanism).

 I'll let you write that, for the next release...

LOL.  How many sponsorship dollars are available for that project?

 The problem with all of this is we've been talking about it for 8 months
 now and various opinions are held by people. What is being presented is
 a broad set of options (summarised from Wiki)

I think everyone understands that these are things we might want to do
down the line, not things we need to have now.  For this release, I
was under the impression that we'd pretty much settled on implementing
(1) and maybe (3) but not (2) from the below list.

 1. Wait then cancel
 a) always for databases, tablespaces and locks
 b) deferred cancelation for buffer changes

 2. We connect to Primary server from Standby server and keep a
 transaction open using contrib/dblink functions, then commit as soon as
 we are done.

 3. We pause recovery by issuing a pg_recovery_pause() function, or start
 server in pause mode using recovery_started_paused = on.

 Yes, it's a critical area to the success of the feature. But this is
 enough for first release and for us to get user feedback.

I completely agree.

...Robert



Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Simon Riggs

On Tue, 2009-02-03 at 15:55 +0100, Andres Freund wrote:
 Hi,
 
 On 02/03/2009 02:26 PM, Hannu Krosing wrote:
  I don't see any way around the fact that when a tuple is removed, it's
  gone and can't be accessed by queries. Either you don't remove it, or
  you kill the query.
  Actually we came up with a solution to this - use filesystem level
  snapshots (like LVM2+XFS or ZFS), and redirect backends with
  long-running queries to use fs snapshot mounted to a different
  mountpoint.
 Isn't that really, really expensive?
 
 A single write on the master logical volume yields writes of PE size 
 for _every_ single snapshot (the first time the block is touched); 
 considering that there could be quite many such snapshots, I don't think 
 that this is really feasible: I/O quite possibly might be saturated.

If we did that we would provide an option to select the MVCC snapshot
that went with the filesystem snapshot. There need not be one per user.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Hannu Krosing
On Tue, 2009-02-03 at 10:19 -0500, Robert Haas wrote:
 On Tue, Feb 3, 2009 at 9:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Tue, 2009-02-03 at 09:14 -0500, Robert Haas wrote:
  I think _the_ solution is to notice when you're about to vacuum a page
  that is still visible to a running backend on the standby, and save
  that page off to a separate cache of old page versions (perhaps using
  the relation fork mechanism).
 
  I'll let you write that, for the next release...
 
 LOL.  How many sponsorship dollars are available for that project?
 
  The problem with all of this is we've been talking about it for 8 months
  now and various opinions are held by people. What is being presented is
  a broad set of options (summarised from Wiki)
 
 I think everyone understands that these are things we might want to do
 down the line, not things we need to have now.  For this release, I
 was under the impression that we'd pretty much settled on implementing
 (1) and maybe (3) but not (2) from the below list.

(2) is something that you can always do manually if you need it. So no
reason to support it in HS code explicitly.

Once you keep a trx open on the master, 1 and 3 should not happen anymore
until you close that trx.

  1. Wait then cancel
  a) always for databases, tablespaces and locks
  b) deferred cancelation for buffer changes
 
  2. We connect to Primary server from Standby server and keep a
  transaction open using contrib/dblink functions, then commit as soon as
  we are done.
 
  3. We pause recovery by issuing a pg_recovery_pause() function, or start
  server in pause mode using recovery_started_paused = on.
 
  Yes, it's a critical area to the success of the feature. But this is
  enough for first release and for us to get user feedback.
 
 I completely agree.
 
 ...Robert
 




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Hannu Krosing
On Tue, 2009-02-03 at 14:28 +0000, Simon Riggs wrote:
 On Tue, 2009-02-03 at 08:40 -0500, Andrew Dunstan wrote:
  
  Hannu Krosing wrote:
   Actually we came up with a solution to this - use filesystem level
   snapshots (like LVM2+XFS or ZFS), and redirect backends with
   long-running queries to use fs snapshot mounted to a different
   mountpoint.
  
   I don't think Simon has yet put full support for it in code, but it is
   clearly _the_ solution for those who want to eat the cake and have it
   too.
 
  How does that work if you're using multiple file systems via tablespaces 
  (e.g. indexes in a different TS)?
 
 It's a great idea and easy to do, but I can't do everything in one go.
 
 The main reasons not to are multiple file system difficulties and lack
 of a mainstream Linux solution, and more simply lack of time and test
 resources.

A more general, but also a lot harder, solution would be going back to the
roots and implementing what the original Postgres 4.2 and earlier versions
were meant to do - namely, VACUUM was not meant to just discard older
versions, but rather to move them to WORM storage (write once, read many
was all the rage back then :) .

If we did that in a way that each relation, at least on the warm standby,
has its own archive fork, possibly in a separate tablespace for
cheaper storage, then we could basically apply WALs as fast as we want and
just move the old versions to the archive. It will be slower(ish),
especially for HOT updates, but may be a good solution for lots of
use cases.

And the decision to do the archiving on the master and WAL-copy it to the
slave, or just do it on the slave only, could probably be left to the user.

Reintroducing keeping old tuples forever would also allow us to bring
back the time travel feature, that is 

SELECT ... AS OF 'yesterday afternoon'::timestamp;

which was thrown out around the time we got WAL logging.

To be really useful we should also have some way to know trx timestamps,
but that can easily be done using the ticker feature from Slony or
SkyTools/pgQ, which could be run as a separate server thread similar to
what we do with the background writer, autovacuum, etc. now.

 
 So not now, maybe later.
 




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Andres Freund

Hi,

On 02/03/2009 02:26 PM, Hannu Krosing wrote:

I don't see any way around the fact that when a tuple is removed, it's
gone and can't be accessed by queries. Either you don't remove it, or
you kill the query.

Actually we came up with a solution to this - use filesystem level
snapshots (like LVM2+XFS or ZFS), and redirect backends with
long-running queries to use fs snapshot mounted to a different
mountpoint.

Isn't that really, really expensive?

A single write on the master logical volume yields writes of PE size 
for _every_ single snapshot (the first time the block is touched); 
considering that there could be quite many such snapshots, I don't think 
that this is really feasible: I/O quite possibly might be saturated.


The default PE size is 4MB, but on most bigger systems it is set to a 
larger size, so it just gets worse for bigger systems.
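
To put rough numbers on that, a standalone back-of-envelope calculation (the
extent size is the LVM2 default mentioned above; the page size and snapshot
count are illustrative assumptions):

#include <stdio.h>

int
main(void)
{
    long pe_bytes   = 4L * 1024 * 1024; /* default LVM2 physical extent */
    long page_bytes = 8192;             /* one PostgreSQL block */
    int  snapshots  = 4;                /* hypothetical live fs snapshots */

    /* the first write into an untouched extent copies it once per snapshot */
    long copied = pe_bytes * snapshots;

    printf("one %ld-byte page write can trigger %ld bytes of COW I/O (%ldx)\n",
           page_bytes, copied, copied / page_bytes);
    return 0;
}

With these numbers a single 8 kB write costs 16 MB of copy-out, a 2048x
amplification, and it only grows with the extent size.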


Sure, one might say that this is an LVM deficiency, but I do not know 
of any snapshot-able block layer that does it differently.


Andres



Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Simon Riggs

On Tue, 2009-02-03 at 18:09 +0200, Hannu Krosing wrote:
 On Tue, 2009-02-03 at 14:28 +0000, Simon Riggs wrote:
  On Tue, 2009-02-03 at 08:40 -0500, Andrew Dunstan wrote:
   
   Hannu Krosing wrote:
Actually we came up with a solution to this - use filesystem level
snapshots (like LVM2+XFS or ZFS), and redirect backends with
long-running queries to use fs snapshot mounted to a different
mountpoint.
   
I don't think Simon has yet put full support for it in code, but it is
clearly _the_ solution for those who want to eat the cake and have it
too.
  
   How does that work if you're using multiple file systems via tablespaces 
   (e.g. indexes in a different TS)?
  
  It's a great idea and easy to do, but I can't do everything in one go.
  
  The main reasons not to are multiple file system difficulties and lack
  of a mainstream Linux solution, and more simply lack of time and test
  resources.
 
 A more general, but also a lot harder, solution would be going back to the
 roots and implementing what the original Postgres 4.2 and earlier versions
 were meant to do - namely, VACUUM was not meant to just discard older
 versions, but rather to move them to WORM storage (write once, read many
 was all the rage back then :) .
 
 If we did that in a way that each relation, at least on the warm standby,
 has its own archive fork, possibly in a separate tablespace for
 cheaper storage, then we could basically apply WALs as fast as we want and
 just move the old versions to the archive. It will be slower(ish),
 especially for HOT updates, but may be a good solution for lots of
 use cases.
 
 And the decision to do the archiving on the master and WAL-copy it to the
 slave, or just do it on the slave only, could probably be left to the user.
 
 Reintroducing keeping old tuples forever would also allow us to bring
 back the time travel feature, that is 
 
 SELECT ... AS OF 'yesterday afternoon'::timestamp;
 
 which was thrown out around the time we got WAL logging.
 
 To be really useful we should also have some way to know trx timestamps,
 but that can easily be done using the ticker feature from Slony or
 SkyTools/pgQ, which could be run as a separate server thread similar to
 what we do with the background writer, autovacuum, etc. now.

That might be the way to do the "Named Restore Points" feature that is
frequently requested.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-02-03 Thread Hannu Krosing
On Wed, 2009-01-28 at 22:19 +0200, Heikki Linnakangas wrote:
 Tom Lane wrote:
...
  Well, those unexpectedly cancelled queries could have represented
  critical functionality too.  I think this argument calls the entire
  approach into question.  If there is no safe setting for the parameter
  then we need to find a way to not have the parameter.
 
 We've gone through that already. Different ideas were hashed out around 
 September. There's four basic feasible approaches to what to do when an 
 incoming WAL record conflicts with a running read-only query:
 
 1. Kill the query. (max_standby_delay=0)
 2. Wait for the query to finish before continuing (max_standby_delay=-1)
 3. Have a feedback loop from standby to master, feeding an OldestXmin to 
 the master, preventing it from removing tuples that are still needed in 
 the standby.
 4. Allow the query to continue, knowing that it will return wrong results.
 
 I don't consider 4 to be an option. Option 3 has its own set of 
 drawbacks, as a standby can then cause bloat in the master, and in any 
 case we're not going to have it in this release. And then there's some 
 middle ground, like "wait a while and then kill the query" 
 (max_standby_delay > 0).
 
 I don't see any way around the fact that when a tuple is removed, it's 
 gone and can't be accessed by queries. Either you don't remove it, or 
 you kill the query.

Actually we came up with a solution to this - use filesystem level
snapshots (like LVM2+XFS or ZFS), and redirect backends with
long-running queries to use fs snapshot mounted to a different
mountpoint.

I don't think Simon has yet put full support for it in code, but it is
clearly _the_ solution for those who want to eat the cake and have it
too.

 
 I think the max_standby_delay setting is fairly easy to explain. It 
 shouldn't be too hard for a DBA to set it correctly.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
 




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Gregory Stark

I skimmed through the Hot Standby patch for a preliminary review. I noted the
following things, some minor tweaks, some just questions. None of the things I
noted are big issues unless some of the questions uncover issues.


1) This code is obviously a cut-pasto:

+   else if (strcmp(tok1, "max_standby_delay") == 0)
+   {
+   errno = 0;
+   maxStandbyDelay = (TransactionId) strtoul(tok2, NULL, 0);
+   if (errno == EINVAL || errno == ERANGE)
+   ereport(FATAL,
+(errmsg("max_standby_delay is not a valid number: \"%s\"",
+tok2)));

Have you ever tested the behaviour with max_standby_delay = -1 ?

Also, the default max_standby_delay is currently 0 -- ie, kill queries
immediately upon detecting any conflicts at all -- which I don't really think
anyone would be happy with. I still *strongly* feel the default has to be the
non-destructive conservative -1.


2) This hard coded constant of 500ms seems arbitrary to me. If your use case
is a heavily-used reporting engine you might get this message all the time. I
think this either has to be configurable (warn_standby_delay?) or based on
max_standby_delay or some other parameter somehow.

+   /*
+* log that we have been waiting for a while now...
+*/
+   if (!logged && standbyWait_ms > 500)


3) These two blocks of code seem unsatisfactory:

!   /*
!* Keep track of the block number of the lastBlockVacuumed, so
!* we can scan those blocks as well during WAL replay. This then
!* provides concurrency protection and allows btrees to be used
!* while in recovery.
!*/
!   if (lastBlockVacuumed > vstate->lastBlockVacuumed)
!   vstate->lastBlockVacuumed = lastBlockVacuumed;
! 


+   /*
+* XXXHS we don't actually need to read the block, we
+* just need to confirm it is unpinned. If we had a special call
+* into the buffer manager we could optimise this so that
+* if the block is not in shared_buffers we confirm it as unpinned.
+*/
+   buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno, 
RBM_NORMAL);
+   if (BufferIsValid(buffer))
+   {
+   LockBufferForCleanup(buffer);   
+   UnlockReleaseBuffer(buffer);
+   }

Aside from the XXX comment (I thought we actually had such a call now, but if
not shouldn't we just add one instead of carping?) I'm not convinced this
handles all the cases that can arise. Notably, what happens if a previous
vacuum died in the middle of the scan?

I think we have a vacuum id which we already use with btrees for the same
issue. It seems to me this would be more robust if you stamped the xlog record
with that id number and ensured it matched the id of the lastBlockVacuumed?
Then you could start from 0 if the id changes.


4) Why is this necessary?

+   if (IsRecoveryProcessingMode() && 
+   locktag->locktag_type == LOCKTAG_OBJECT && 
+   lockmode > AccessShareLock)
+   ereport(ERROR,
+   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("cannot acquire lockmode %s on database objects while 
recovery is in progress", 
+   lockMethodTable->lockModeNames[lockmode]),
+errhint("Only AccessShareLock can be acquired on database 
objects during recovery.")));
+ 

Obviously we can't lock records (SELECT FOR UPDATE) since that requires write
access. But shouldn't we be able to do manual LOCK TABLE calls?

I hope this isn't the only interlock we have against trying to do write i/o or
DDL against tables...?


5) The code for killing off conflicting transactions looks odd to me but I
haven't really traced through it to see what it's doing. It seems to kill off
just one process? What if there are several holding the lock in question?
Also, it doesn't seem to take into account how long the various transactions
have held the lock?

Is there a risk of, for example, if you have a long report running against a
table and lots of OLTP requests against the same table it seems the conflict
resolution code would fire randomly into the crowd and hit mostly innocent
OLTP transactions until eventually it found the culprit report?


Also, it kills off backends using SIGINT which I *think* Tom went through and
made safe earlier this release, right?

In any case if the backend doesn't die off promptly we wait forever with no
further warnings or log messages. I would think we should at least print some
sort of message here, even if it's a "can't happen" elog. The doubling thing
is probably unnecessary too in this case.

+   if (!XLogRecPtrIsValid(conflict_LSN))
+   {
+   /* wait awhile for it to die */
+   pg_usleep(wontDieWait * 5000L);
+   wontDieWait *= 2;
+   }
+   }


6) I still don't understand why you need unobserved_xids. We don't need this
in normal running, an xid we don't know for certain is committed is exactly
the same as a transaction we know is currently running or aborted. So why do
you need it during HS?

Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Simon Riggs

On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:

 I skimmed through the Hot Standby patch for a preliminary review. I noted the
 following things, some minor tweaks, some just questions. None of the things I
 noted are big issues unless some of the questions uncover issues.

Thanks for your attention.

 1) This code is obviously a cut-pasto:
 
 +   else if (strcmp(tok1, "max_standby_delay") == 0)
 +   {
 +   errno = 0;
 +   maxStandbyDelay = (TransactionId) strtoul(tok2, NULL, 0);
 +   if (errno == EINVAL || errno == ERANGE)
 +   ereport(FATAL,
 +(errmsg("max_standby_delay is not a valid number: \"%s\"",
 +tok2)));
 
 Have you ever tested the behaviour with max_standby_delay = -1 ?

 Also, the default max_standby_delay is currently 0 -- ie, kill queries
 immediately upon detecting any conflicts at all -- which I don't really think
 anyone would be happy with. 

Agreed. As explained when I published that patch it is deliberately
severe to allow testing of conflict resolution and feedback on it.

 I still *strongly* feel the default has to be the
 non-destructive conservative -1.

I don't. Primarily, we must support high availability. It is much better
if we get people saying "I get my queries cancelled" and we say RTFM and
change parameter X, than if people say "my failover was 12 hours behind
when I needed it to be 10 seconds behind and I lost $1 million because
of downtime of Postgres" and we say RTFM and change parameter X.

 2) This hard coded constant of 500ms seems arbitrary to me. If your use case
 is a heavily-used reporting engine you might get this message all the time. I
 think this either has to be configurable (warn_standby_delay?) or based on
 max_standby_delay or some other parameter somehow.
 
 +   /*
 +* log that we have been waiting for a while now...
 +*/
 +   if (!logged && standbyWait_ms > 500)

Greg, that's a DEBUG5 message.

 3) These two blocks of code seem unsatisfactory:
 
 !   /*
 !* Keep track of the block number of the lastBlockVacuumed, so
 !* we can scan those blocks as well during WAL replay. This then
 !* provides concurrency protection and allows btrees to be used
 !* while in recovery.
 !*/
 !   if (lastBlockVacuumed > vstate->lastBlockVacuumed)
 !   vstate->lastBlockVacuumed = lastBlockVacuumed;
 ! 
 
 
 +   /*
 +* XXXHS we don't actually need to read the block, we
 +* just need to confirm it is unpinned. If we had a special call
 +* into the buffer manager we could optimise this so that
 +* if the block is not in shared_buffers we confirm it as 
 unpinned.
 +*/
 +   buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno, 
 RBM_NORMAL);
 +   if (BufferIsValid(buffer))
 +   {
 +   LockBufferForCleanup(buffer);   
 +   UnlockReleaseBuffer(buffer);
 +   }
 
 Aside from the XXX comment (I thought we actually had such a call now, but if
 not shouldn't we just add one instead of carping?)

We should add many things. The equivalent optimisation of VACUUM in
normal running has not been done either. Considering we have both HOT
and visibility-map-enhanced VACUUM, it's not a priority.

  I'm not convinced this
 handles all the cases that can arise. Notably, what happens if a previous
 vacuum died in the middle of the scan?

Nothing, because it won't then have removed any heap rows.

 I think we have a vacuum id which we already use with btrees for the same
 issue. It seems to me this would be more robust if you stamped the xlog record
 with that id number and ensured it matched the id of the lastBlockVacuumed?
 Then you could start from 0 if the id changes.

 4) Why is this necessary?
 
 +   if (IsRecoveryProcessingMode() && 
 +   locktag->locktag_type == LOCKTAG_OBJECT && 
 +   lockmode > AccessShareLock)
 +   ereport(ERROR,
 +   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 +errmsg("cannot acquire lockmode %s on database objects while 
 recovery is in progress", 
 +   lockMethodTable->lockModeNames[lockmode]),
 +errhint("Only AccessShareLock can be acquired on database 
 objects during recovery.")));
 + 
 
 Obviously we can't lock records (SELECT FOR UPDATE) since that requires write
 access. But shouldn't we be able to do manual LOCK TABLE calls?

Read the rest of the comments on the locking code section.

 I hope this isn't the only interlock we have against trying to do write i/o or
 DDL against tables...?

No it's not.

 5) The code for killing off conflicting transactions looks odd to me but I
 haven't really traced through it to see what it's doing. It seems to kill off
 just one process? 

No, why do you think that?

 What if there are several holding the lock in 

Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Joshua D. Drake
On Wed, 2009-01-28 at 19:27 +0000, Simon Riggs wrote:
 On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:

 Agreed. As explained when I published that patch it is deliberately
 severe to allow testing of conflict resolution and feedback on it.
 
  I still *strongly* feel the default has to be the
  non-destructive conservative -1.
 
 I don't. Primarily, we must support high availability. It is much better
 if we get people saying "I get my queries cancelled" and we say RTFM and
 change parameter X, than if people say "my failover was 12 hours behind
 when I needed it to be 10 seconds behind and I lost $1 million because
 of downtime of Postgres" and we say RTFM and change parameter X.

If the person was stupid enough to configure it for such a thing they
deserve to lose the money. Not to mention we have already lost them
as a user because they will blame PostgreSQL regardless of reality, as
evidenced by their inability to RTFM (or have a vendor that RTFMs) in
the first place.

I've got to vote with Greg on this one.


Sincerely,

Joshua D. Drake


-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Heikki Linnakangas

Gregory Stark wrote:

6) I still don't understand why you need unobserved_xids. We don't need this
in normal running, an xid we don't know for certain is committed is exactly
the same as a transaction we know is currently running or aborted. So why do
you need it during HS?


In normal operation, any transaction that's in-progress has an entry in 
ProcArray. GetSnapshot() gathers the xids of all those in-progress 
transactions, so that they're seen as not-committed even when 
they're later marked as committed in clog.


In HS, we might see the first WAL record of transaction 10 before we see 
the first WAL record of transaction 9. Without unobserved_xids, if you 
take a snapshot in the standby between those two WAL records, xid 10 is 
included in the snapshot, but 9 is not. If xact 9 later commits, it's 
marked in clog as committed, and it will suddenly appear as visible to 
the snapshot. To avoid that, when we replay the first WAL record of xact 
10, we also add 9 to the unobserved xid array, so that it's included in 
snapshots.


So, you can think of the unobserved xids array as an extension of 
ProcArray. The entries are like light-weight PGPROC entries. In fact I 
proposed earlier to simply create dummy PGPROC entries instead.
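
As a standalone illustration of that gap-filling (the real code works with
TransactionIds, wraparound handling and shared memory; the plain integers and
helper names here are simplifications):

#include <stdio.h>

#define MAX_UNOBSERVED 128

static unsigned int unobserved[MAX_UNOBSERVED];
static int          n_unobserved;
static unsigned int latest_observed_xid = 8;    /* last xid seen in WAL */

/* Called when the first WAL record of 'xid' is replayed. */
static void
record_known_xid(unsigned int xid)
{
    /* any xid between the last one observed and this one must already
     * have been assigned on the primary -- treat it as in progress */
    for (unsigned int gap = latest_observed_xid + 1; gap < xid; gap++)
        if (n_unobserved < MAX_UNOBSERVED)
            unobserved[n_unobserved++] = gap;
    if (xid > latest_observed_xid)
        latest_observed_xid = xid;
}

int
main(void)
{
    record_known_xid(10);               /* xid 10 shows up before xid 9 */
    for (int i = 0; i < n_unobserved; i++)
        printf("unobserved xid: %u\n", unobserved[i]);  /* prints 9 */
    return 0;
}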


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Simon Riggs

On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:
 I still don't understand why you need unobserved_xids. We don't need
 this in normal running, an xid we don't know for certain is committed
 is exactly the same as a transaction we know is currently running or
 aborted. So why do you need it during HS?

All running transactions need to be part of the snapshot. This is true
on both standby and primary.

On primary we allocate new xids from a single counter, so there are no
gaps. Newly assigned xids go straight into the procarray and then are
removed later when they commit/abort.

In standby we only know what is in WAL. Xid assignment is not currently
WAL logged, so either we choose to WAL log each new xid and face the
performance consequences (IMHO, unacceptable), or we choose a different
strategy. UnobservedXids is that different strategy. Florian Pflug had a
different strategy, but he did have a strategy.

 The comment says:
 
 +  * This is very important because if we leave those xids out of the
 +  * snapshot then they will appear to be already complete. Later, when
 +  * they have actually completed, this could lead to confusion as to
 +  * whether those xids are visible or not, blowing a huge hole in MVCC.
 +  * We need 'em.
 
 But that doesn't sound rational to me. I'm not sure what confusion
 this would cause. If they later actually complete then any existing
 snapshots would still not see them. And any later snapshots wouldn't
 be confused by the earlier conclusions.

If a xact is running when we take a snapshot, yet we do not include it,
then bad things will happen, but not till later. If a xact is not in the
snapshot *and* is less than xmax, then we presume it completed prior to
the snapshot. If that xact subsequently commits *after* we took the
snapshot, then its absence from the snapshot will make its data visible,
making it look like it committed *before* the snapshot was taken. So the
confusion results in an MVCC violation, and we must handle this case
correctly, or die.
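
The presumption described above is visible in a simplified, standalone version
of the usual snapshot visibility test (wraparound, subtransactions and clog
are ignored, and the names are illustrative):

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    unsigned int xmin;      /* all xids < xmin are finished */
    unsigned int xmax;      /* all xids >= xmax are invisible */
    unsigned int xip[4];    /* xids in progress at snapshot time */
    int          n_xip;
} Snapshot;

static bool
xid_in_progress(const Snapshot *s, unsigned int xid)
{
    if (xid >= s->xmax)
        return true;                /* started after the snapshot */
    if (xid < s->xmin)
        return false;               /* finished before the snapshot */
    for (int i = 0; i < s->n_xip; i++)
        if (s->xip[i] == xid)
            return true;            /* listed as running */
    return false;   /* xmin <= xid < xmax and NOT listed: presumed done */
}

int
main(void)
{
    /* xid 9 was running at snapshot time but was left out of xip */
    Snapshot s = { .xmin = 5, .xmax = 11, .xip = {10}, .n_xip = 1 };

    printf("xid 9 in progress? %s\n",
           xid_in_progress(&s, 9) ? "yes" : "no (treated as committed!)");
    return 0;
}

If xid 9 then commits, its tuples pass the visibility check against this
snapshot even though it was still running when the snapshot was taken.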

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 On Wed, 2009-01-28 at 19:27 +0000, Simon Riggs wrote:
 On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:
 I still *strongly* feel the default has to be the
 non-destructive conservative -1.
 
 I don't. Primarily, we must support high availability. It is much better
 if we get people saying "I get my queries cancelled" and we say RTFM and
 change parameter X, than if people say "my failover was 12 hours behind
 when I needed it to be 10 seconds behind and I lost $1 million because
 of downtime of Postgres" and we say RTFM and change parameter X.

 If the person was stupid enough to configure it for such a thing they
 deserve to lose the money.

Well, those unexpectedly cancelled queries could have represented
critical functionality too.  I think this argument calls the entire
approach into question.  If there is no safe setting for the parameter
then we need to find a way to not have the parameter.

regards, tom lane



Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Aidan Van Dyk
* Tom Lane t...@sss.pgh.pa.us [090128 15:02]:
 
 Well, those unexpectedly cancelled queries could have represented
 critical functionality too.  I think this argument calls the entire
 approach into question.  If there is no safe setting for the parameter
 then we need to find a way to not have the parameter.

That's what we currently have without HS, but people aren't completely
satisfied with it either ;-)

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Greg Stark

(sorry for top posting -- blame apple)

I don't see anything dangerous with either setting. For use cases  
where the backup is the primary purpose then killing queries is fine.  
For use cases where the machine is a reporting machine then saving  
large amounts of archived logs is fine.


Engineering is about tradeoffs and these two use cases are  
intrinsically in conflict.


The alternative is postponing vacuuming on the master which is imho  
even worse than the disease.


--
Greg


On 28 Jan 2009, at 19:56, Tom Lane t...@sss.pgh.pa.us wrote:


Joshua D. Drake j...@commandprompt.com writes:

On Wed, 2009-01-28 at 19:27 +0000, Simon Riggs wrote:

On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:

I still *strongly* feel the default has to be the
non-destructive conservative -1.


I don't. Primarily, we must support high availability. It is much better
if we get people saying "I get my queries cancelled" and we say RTFM and
change parameter X, than if people say "my failover was 12 hours behind
when I needed it to be 10 seconds behind and I lost $1 million because
of downtime of Postgres" and we say RTFM and change parameter X.

If the person was stupid enough to configure it for such a thing they
deserve to lose the money.


Well, those unexpectedly cancelled queries could have represented
critical functionality too.  I think this argument calls the entire
approach into question.  If there is no safe setting for the parameter
then we need to find a way to not have the parameter.

   regards, tom lane




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Greg Stark
Put another way: your characterization is no more true than claiming  
there's no safe setting for statement_timeout since a large value  
means clog could overflow your disk and your tables could bloat.


(I note we default statement_timeout to off though)

--
Greg


On 28 Jan 2009, at 19:56, Tom Lane t...@sss.pgh.pa.us wrote:


Joshua D. Drake j...@commandprompt.com writes:

On Wed, 2009-01-28 at 19:27 +0000, Simon Riggs wrote:

On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:

I still *strongly* feel the default has to be the
non-destructive conservative -1.


I don't. Primarily, we must support high availability. It is much better
if we get people saying "I get my queries cancelled" and we say RTFM and
change parameter X, than if people say "my failover was 12 hours behind
when I needed it to be 10 seconds behind and I lost $1 million because
of downtime of Postgres" and we say RTFM and change parameter X.

If the person was stupid enough to configure it for such a thing they
deserve to lose the money.


Well, those unexpectedly cancelled queries could have represented
critical functionality too.  I think this argument calls the entire
approach into question.  If there is no safe setting for the parameter
then we need to find a way to not have the parameter.

   regards, tom lane




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Simon Riggs

On Wed, 2009-01-28 at 21:41 +0200, Heikki Linnakangas wrote:

 So, you can think of the unobserved xids array as an extension of 
 ProcArray. The entries are like light-weight PGPROC entries. In fact I
 proposed earlier to simply create dummy PGPROC entries instead.

Which we don't do because we don't know whether we are dealing with
top-level xids or subtransactions of already observed top-level xids.

Either way we have to rearrange things when we move from unobserved to
observed. A major difference is that what we have now works and what we
might have instead may not, which is being illustrated by recent
testing.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Jeff Davis
On Wed, 2009-01-28 at 19:27 +0000, Simon Riggs wrote:
 "my failover was 12 hours behind when I needed it to be 10 seconds
  behind and I lost $1 million because of downtime of Postgres"

The same could be said for warm standby right now. Or Slony-I, for that
matter. I think that we can reasonably expect anyone implementing
asynchronous replication for HA to properly monitor the lag time.

There are many sources of latency in the process, so I don't think
anyone can expect 10 seconds without actually monitoring to verify what
the actual lag time is.

I apologize if my post is based on ignorance, I haven't followed your
patch as closely as others involved in this discussion.

Regards,
Jeff Davis




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Heikki Linnakangas

Tom Lane wrote:

Joshua D. Drake j...@commandprompt.com writes:

On Wed, 2009-01-28 at 19:27 +0000, Simon Riggs wrote:

On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:

I still *strongly* feel the default has to be the
non-destructive conservative -1.

I don't. Primarily, we must support high availability. It is much better
if we get people saying "I get my queries cancelled" and we say RTFM and
change parameter X, than if people say "my failover was 12 hours behind
when I needed it to be 10 seconds behind and I lost $1 million because
of downtime of Postgres" and we say RTFM and change parameter X.



If the person was stupid enough to configure it for such a thing they
deserve to lose the money.


Well, those unexpectedly cancelled queries could have represented
critical functionality too.  I think this argument calls the entire
approach into question.  If there is no safe setting for the parameter
then we need to find a way to not have the parameter.


We've gone through that already. Different ideas were hashed out around 
September. There are four basic feasible approaches to what to do when an 
incoming WAL record conflicts with a running read-only query:


1. Kill the query. (max_standby_delay=0)
2. Wait for the query to finish before continuing (max_standby_delay=-1)
3. Have a feedback loop from standby to master, feeding an OldestXmin to 
the master, preventing it from removing tuples that are still needed in 
the standby.

4. Allow the query to continue, knowing that it will return wrong results.

I don't consider 4 to be an option. Option 3 has its own set of 
drawbacks, as a standby can then cause bloat in the master, and in any 
case we're not going to have it in this release. And then there's some 
middle ground, like "wait a while and then kill the query" 
(max_standby_delay > 0).


I don't see any way around the fact that when a tuple is removed, it's 
gone and can't be accessed by queries. Either you don't remove it, or 
you kill the query.


I think the max_standby_delay setting is fairly easy to explain. It 
shouldn't be too hard for a DBA to set it correctly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Simon Riggs

On Wed, 2009-01-28 at 11:33 -0800, Joshua D. Drake wrote:
 On Wed, 2009-01-28 at 19:27 +0000, Simon Riggs wrote:
  On Wed, 2009-01-28 at 18:55 +0000, Gregory Stark wrote:
 
  Agreed. As explained when I published that patch it is deliberately
  severe to allow testing of conflict resolution and feedback on it.
  
   I still *strongly* feel the default has to be the
   non-destructive conservative -1.
  
  I don't. Primarily, we must support high availability. It is much better
  if we get people saying "I get my queries cancelled" and we say RTFM and
  change parameter X, than if people say "my failover was 12 hours behind
  when I needed it to be 10 seconds behind and I lost $1 million because
  of downtime of Postgres" and we say RTFM and change parameter X.
 
 If the person was stupid enough to configure it for such a thing they
 deserve to lose the money. 

It was never intended to be 0, that was just for testing, as I said. But
a smallish integer number of seconds, say 10, 60, 300 or at most 600 is
reasonable.

Crash barriers can be moved, falling off a cliff is permanent. It's easy
to change the parameter if you don't like it, and too late to change it
if we set the default wrong. 

My name is on the patch and I take responsibility for such failures. I'm
not going to turn round and say "but Josh said", kinda like Stan Laurel,
if it fails. I've never called any user stupid and never will, not while
they use Postgres, at least... If we get the default wrong then it's down
to us.

Never cancelling queries is definitely a wrong default choice for an HA
server.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Simon Riggs

On Wed, 2009-01-28 at 14:56 -0500, Tom Lane wrote:

 Well, those unexpectedly cancelled queries could have represented
 critical functionality too.  I think this argument calls the entire
 approach into question.  If there is no safe setting for the parameter
 then we need to find a way to not have the parameter.

I see the opposite: We don't know what tradeoffs, if any, the user is
willing to put up with, so we need input. It is about resource
prioritisation and not for us to decide, since these reflect business
objectives not internal twangy things like join_collapse_limit.

The essential choice is "What would you like the max failover time to
be?". Some users want one server with max 5 mins behind, some want two
servers, one with 0 seconds behind, one with 12 hours behind.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Heikki Linnakangas

Simon Riggs wrote:

The essential choice is "What would you like the max failover time to
be?". Some users want one server with max 5 mins behind, some want two
servers, one with 0 seconds behind, one with 12 hours behind.


It's not quite that simple. Setting max_standby_delay=5mins means that 
you're willing to wait 5 minutes for each query to die. Which means that 
in worst case you have to stop for 5 minutes at every single vacuum 
record, and fall behind much more than 5 minutes.


You could make it more like that by tracking the timestamps in commit 
records, and/or having some sort of a moving average logic in the 
timeout, where you allow more waiting if you haven't waited for a long 
time, and kill queries more aggressively if you've had to wait a lot 
recently.
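
The moving-average logic isn't spelled out anywhere; one hypothetical shape
for it (none of this is in the patch) is a wait budget that refills while
replay is not blocked:

#include <stdio.h>

static const double budget_cap_ms = 300000.0;  /* like a 5 min max_standby_delay */
static double wait_budget_ms = 300000.0;

/* Replay asks: how long may I wait for conflicting queries right now? */
static double
grant_wait_ms(double idle_ms_since_last_conflict)
{
    wait_budget_ms += idle_ms_since_last_conflict; /* earn credit while idle */
    if (wait_budget_ms > budget_cap_ms)
        wait_budget_ms = budget_cap_ms;
    return wait_budget_ms;
}

/* Replay reports back how long it actually waited. */
static void
charge_wait_ms(double waited_ms)
{
    wait_budget_ms -= waited_ms;
    if (wait_budget_ms < 0)
        wait_budget_ms = 0;
}

int
main(void)
{
    printf("after a quiet hour: may wait %.0f ms\n", grant_wait_ms(3600000));
    charge_wait_ms(250000);         /* one long conflict ate most of it */
    printf("right afterwards: may wait %.0f ms\n", grant_wait_ms(0));
    return 0;
}

Under this scheme a standby that has been applying WAL freely can afford a
long pause, while a standby already struggling to keep up kills queries
quickly, which is roughly the behaviour being asked for.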


It should also be noted that the control functions allow you to connect 
to the database and manually pause/resume the replay. So you can for 
example set max_standby_delay=0 during the day, but pause the replay 
manually before starting a nightly report.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Simon Riggs

On Wed, 2009-01-28 at 22:47 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  The essential choice is "What would you like the max failover time to
  be?". Some users want one server with max 5 mins behind, some want two
  servers, one with 0 seconds behind, one with 12 hours behind.
 
 It's not quite that simple. 

In this case, yes it is.

 Setting max_standby_delay=5mins means that 
 you're willing to wait 5 minutes for each query to die. Which means that 
 in worst case you have to stop for 5 minutes at every single vacuum 
 record, and fall behind much more than 5 minutes.

That's not how this patch works.

 You could make it more like that by tracking the timestamps in commit 
 records

Which is what we do.

 It should also be noted that the control functions allow you to connect 
 to the database and manually pause/resume the replay. So you can for 
 example set max_standby_delay=0 during the day, but pause the replay 
 manually before starting a nightly report.

Yes, thank you for bringing balance to the discussions.

Please everybody read this before commenting further.

http://wiki.postgresql.org/wiki/Hot_Standby#Usage

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Jeff Davis
On Wed, 2009-01-28 at 22:47 +0200, Heikki Linnakangas wrote:
 It's not quite that simple. Setting max_standby_delay=5mins means that 
 you're willing to wait 5 minutes for each query to die. Which means that 
 in worst case you have to stop for 5 minutes at every single vacuum 
 record, and fall behind much more than 5 minutes.

Just trying to follow along: are you talking about the situation where
there are (for example) a continuous stream of select pg_sleep(600) on
the standby, and a series of rapid VACUUMs on the primary?

This situation might be more likely now that we have partial VACUUMs.

 It should also be noted that the control functions allow you to connect 
 to the database and manually pause/resume the replay. So you can for 
 example set max_standby_delay=0 during the day, but pause the replay 
 manually before starting a nightly report.
 

That's a very cool feature.

Regards,
Jeff Davis




Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Robert Haas
 I don't. Primarily, we must support high availability. It is much better
 if we get people saying "I get my queries cancelled" and we say RTFM and
 change parameter X, than if people say "my failover was 12 hours behind
 when I needed it to be 10 seconds behind and I lost $1 million because
 of downtime of Postgres" and we say RTFM and change parameter X.

 If the person was stupid enough to configure it for such a thing they
 deserve to lose the money. Not to mention we have already lost them
 as a user because they will blame PostgreSQL regardless of reality, as
 evidenced by their inability to RTFM (or have a vendor that RTFMs) in
 the first place.

 I've got to vote with Greg on this one.

I vote with Simon.  The thing is that if you get some queries
cancelled, you'll realize you have a problem.  Now you have several
options for what to do to fix it.  Having your failover be 12 hours
behind (or 12 months behind) is something that it would be much easier
not to realize.

...Robert



Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Gregory Stark
Simon Riggs si...@2ndquadrant.com writes:

 On Wed, 2009-01-28 at 14:56 -0500, Tom Lane wrote:

 Well, those unexpectedly cancelled queries could have represented
 critical functionality too.  I think this argument calls the entire
 approach into question.  If there is no safe setting for the parameter
 then we need to find a way to not have the parameter.

 I see the opposite: We don't know what tradeoffs, if any, the user is
 willing to put up with, so we need input. 

Well, if you see it that way then it seems to me you should be arguing for
making max_standby_delay a mandatory parameter: without it, don't start to
allow connections. I hadn't considered that and am not exactly sure where I
would stand on it.

 The essential choice is What would you like the max failover time to
 be?. Some users want one server with max 5 mins behind, some want two
 servers, one with 0 seconds behind, one with 12 hours behind

Sure. But if they don't configure one then we shouldn't impose one. You're
thinking of precisely one use case and taking positive action to interrupt the
user's requests on the basis of it. But there are plenty of other use cases. I
claim the default has to be to do as the user instructed without intervention.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] Hot Standby (v9d)

2009-01-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I vote with Simon.  The thing is that if you get some queries
 cancelled, you'll realize you have a problem.

... or if you don't, they couldn't have been all that critical.

 Having your failover be 12 hours
 behind (or 12 months behind) is something that would be much easier
 not to realize.

Okay, I'm sold, positive max_standby_delay should be the default.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-27 Thread Simon Riggs

On Sat, 2009-01-24 at 17:24 +1300, Mark Kirkwood wrote:

 I'm doing some test runs with this now. I notice an old flatfiles 
 related bug has reappeared:

Bug fix patch against git repo.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
***
*** 1381,1387  RecordTransactionAbort(bool isSubXact)
  	 * main xacts, the equivalent happens just after this function returns.
  	 */
  	if (isSubXact)
! 		XidCacheRemoveRunningXids(xid, nchildren, children, latestXid);
  
  	/* Reset XactLastRecEnd until the next transaction writes something */
  	if (!isSubXact)
--- 1381,1387 
  	 * main xacts, the equivalent happens just after this function returns.
  	 */
  	if (isSubXact)
! 		XidCacheRemoveRunningXids(MyProc, xid, nchildren, children, latestXid);
  
  	/* Reset XactLastRecEnd until the next transaction writes something */
  	if (!isSubXact)
***
*** 4536,4541  RecordKnownAssignedTransactionIds(XLogRecPtr lsn, TransactionId top_xid, Transac
--- 4536,4548 
  		{
  			int			nxids = myproc->subxids.nxids;
  
+ 			/*
+ 			 * It's possible for us to overflow the subxid cache and then
+ 			 * for a subtransaction abort to reduce the number of subxids
+ 			 * in the cache below the cache threshold again. If that happens
+ 			 * then it's still OK for us to use the subxid cache again, since
+ 			 * once it's in the cache it lives there till abort or commit.
+ 			 */
  			if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
  			{
  /* 
***
*** 4621,4629  RecordKnownAssignedTransactionIds(XLogRecPtr lsn, TransactionId top_xid, Transac
  	LWLockRelease(ProcArrayLock);
  
  	elog(trace_recovery(DEBUG4), 
! 	"record known xact top_xid %u child_xid %u %slatestObservedXid %u",
  	top_xid, child_xid,
  	(unobserved ? "unobserved " : ""),
  	latestObservedXid);
  
  	/* 
--- 4628,4637 
  	LWLockRelease(ProcArrayLock);
  
  	elog(trace_recovery(DEBUG4), 
! 	"record known xact top_xid %u child_xid %u %s%slatestObservedXid %u",
  	top_xid, child_xid,
  	(unobserved ? "unobserved " : ""),
+ 	(mark_subtrans ? "mark subtrans " : ""),
  	latestObservedXid);
  
  	/* 
***
*** 4690,4707  xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, bool preparedXact)
  	PGPROC	   *proc;
  	int			i;
  
- 	/* Make sure nextXid is beyond any XID mentioned in the record */
- 	max_xid = xid;
	sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
  
! 	/*
! 	 * Find the highest xid and remove unobserved xids if required.
! 	 */
! 	for (i = 0; i < xlrec->nsubxacts; i++)
! 	{
! 		if (TransactionIdPrecedes(max_xid, sub_xids[i]))
! 			max_xid = sub_xids[i];
! 	}
  
  	/* Mark the transaction committed in pg_clog */
  	TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
--- 4698,4706 
  	PGPROC	   *proc;
  	int			i;
  
  	sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
  
! 	max_xid = TransactionIdLatest(xid, xlrec->nsubxacts, sub_xids);
  
  	/* Mark the transaction committed in pg_clog */
  	TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
***
*** 4720,4726  xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, bool preparedXact)
  		 */
  		if (IsRunningXactDataValid() && !preparedXact)
  		{
! 			ProcArrayRemove(proc, InvalidTransactionId, xlrec->nsubxacts, sub_xids);
  			FreeRecoveryProcess(proc);
  		}
  
--- 4719,4725 
  		 */
  		if (IsRunningXactDataValid() && !preparedXact)
  		{
! 			ProcArrayRemove(proc, max_xid, xlrec->nsubxacts, sub_xids);
  			FreeRecoveryProcess(proc);
  		}
  
***
*** 4790,4821  xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, bool preparedXact)
  /*
   * Be careful with the order of execution, as with xact_redo_commit().
   * The two functions are similar but differ in key places.
   */
  static void
! xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid, bool preparedXact)
  {
  	PGPROC		*proc = NULL;
  	TransactionId *sub_xids;
  	TransactionId max_xid;
  	int			i;
  
- 	/* Make sure nextXid is beyond any XID mentioned in the record */
- 	max_xid = xid;
	sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! 
! 	/*
! 	 * Find the highest xid and remove unobserved xids if required.
! 	 */
! 	for (i = 0; i < xlrec->nsubxacts; i++)
! 	{
! 		if (TransactionIdPrecedes(max_xid, sub_xids[i]))
! 			max_xid = sub_xids[i];
! 	}
  
  	/* Mark the transaction aborted in pg_clog */
  	TransactionIdAbortTree(xid, xlrec->nsubxacts, sub_xids);
  
! 	if (InArchiveRecovery && (proc = BackendXidGetProc(xid)) != NULL)
  	{
  		/*
  		 * We must mark clog before we update the ProcArray. Only update
--- 4789,4815 
  /*
   * Be careful with the order of execution, as with xact_redo_commit().
   * The two functions are similar but differ in key places.
+  *
+  * Note also that an abort can be for a subtransaction and its children,
+  * not just for a 

Re: [HACKERS] Hot Standby (v9d)

2009-01-24 Thread Simon Riggs

On Sat, 2009-01-24 at 17:24 +1300, Mark Kirkwood wrote:

  version 9g - please use this for testing now

 I'm doing some test runs with this now. I notice an old flatfiles 
 related bug has reappeared:

I'm seeing an off-by-one error on xmax, in some cases. That then causes
the flat file update to not pick up correct info, even though it
executed in other ways as intended. If you run two create databases and
then test only the first, it appears to have worked as intended.

These bugs are a result of recent refactoring and it will take a few days
to shake some of them out. We've had more than 20 already, so we're
beating them back, but we're not done yet. 

Thanks for the report, will publish this and other fixes on Monday.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-24 Thread Simon Riggs

On Sat, 2009-01-24 at 11:20 +, Simon Riggs wrote:
 On Sat, 2009-01-24 at 17:24 +1300, Mark Kirkwood wrote:
 
   version 9g - please use this for testing now
 
  I'm doing some test runs with this now. I notice an old flatfiles 
  related bug has reappeared:
 
 I'm seeing an off-by-one error on xmax, in some cases. That then causes
 the flat file update to not pick up correct info, even though it
 executed in other ways as intended. If you run two create databases and
 then test only the first, it appears to have worked as intended.
 
 These bugs are a result of recent refactoring and it will take a few days
 to shake some of them out. We've had more than 20 already, so we're
 beating them back, but we're not done yet. 

I was at a loss to explain how this could have slipped through our
tests. It appears that the error was corrected following each checkpoint
as a result of ProcArrayUpdateRunningXacts(). Our tests were performed
after a short delay, which typically would be longer than the
deliberately short setting of checkpoint_timeout/archive_timeout, so by
the time we looked the error was gone, which masked the problem. We're
setting checkpoint_timeout to 30 mins now so the error stays visible...

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-24 Thread Mark Kirkwood

Simon Riggs wrote:

On Sat, 2009-01-24 at 11:20 +, Simon Riggs wrote:
  

On Sat, 2009-01-24 at 17:24 +1300, Mark Kirkwood wrote:



version 9g - please use this for testing now

I'm doing some test runs with this now. I notice an old flatfiles 
related bug has reappeared:
  

I'm seeing an off-by-one error on xmax, in some cases. That then causes
the flat file update to not pick up correct info, even though it
executed in other ways as intended. If you run two create databases and
then test only the first, it appears to have worked as intended.

These bugs are a result of recent refactoring and it will take a few days
to shake some of them out. We've had more than 20 already, so we're
beating them back, but we're not done yet. 



I was at a loss to explain how this could have slipped through our
tests. It appears that the error was corrected following each checkpoint
as a result of ProcArrayUpdateRunningXacts(). Our tests were performed
after a short delay, which typically would be longer than the
deliberately short setting of checkpoint_timeout/archive_timeout, so by
the time we looked the error was gone, which masked the problem. We're
setting checkpoint_timeout to 30 mins now so the error stays visible...

  
That makes sense - I had archive_timeout set at 5 minutes, and I would 
have checked before 5 minutes were up.


Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas

Simon Riggs wrote:

* Put corrected version into GIT
* Produce outstanding items as patch-on-patch via GIT


I've applied the hot standby patch and recovery infra v9 patch to 
branches in my git repository, and pushed those to:


git://git.postgresql.org/git/~hlinnaka/pgsql.git

You can clone that to get started.

I've set those branches up so that the hot standby branch is branched 
off from the recovery infra branch. I'd like to keep the two separate, 
as the recovery infra patch is useful on its own, and can be reviewed 
separately.


As a teaser, I made a couple of minor changes after importing your 
patches. For the sake of the archives, I've also included those changes 
as patches here.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
From 6d583356063c9d3c8d0e69233a40065bc5d7bde1 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas hei...@enterprisedb.com
Date: Fri, 23 Jan 2009 14:37:05 +0200
Subject: [PATCH] Remove padding in XLogCtl; might be a good idea, but let's keep it simple
 for now.

Also remove the XLogCtl->mode_lck spinlock, which is pretty pointless for
a single boolean that's only written by one process.
---
 src/backend/access/transam/xlog.c |   24 +---
 1 files changed, 1 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fcf5657..42c4552 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -340,18 +340,11 @@ typedef struct XLogCtlWrite
 
 /*
  * Total shared-memory state for XLOG.
- *
- * This small structure is accessed by many backends, so we take care to
- * pad out the parts of the structure so they can be accessed by separate
- * CPUs without causing false sharing cache flushes. Padding is generous
- * to allow for a wide variety of CPU architectures.
  */
-#define	XLOGCTL_BUFFER_SPACING	128
 typedef struct XLogCtlData
 {
 	/* Protected by WALInsertLock: */
 	XLogCtlInsert Insert;
-	char	InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
 
 	/* Protected by info_lck: */
 	XLogwrtRqst LogwrtRqst;
@@ -359,16 +352,9 @@ typedef struct XLogCtlData
 	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
 	TransactionId ckptXid;
 	XLogRecPtr	asyncCommitLSN; /* LSN of newest async commit */
-	/* add data structure padding for above info_lck declarations */
-	char	InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst) 
-			- sizeof(XLogwrtResult)
-			- sizeof(uint32)
-			- sizeof(TransactionId)
-			- sizeof(XLogRecPtr)];
 
 	/* Protected by WALWriteLock: */
 	XLogCtlWrite Write;
-	char	WritePadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlWrite)];
 
 	/*
 	 * These values do not change after startup, although the pointed-to pages
@@ -390,11 +376,8 @@ typedef struct XLogCtlData
 	 * always during Recovery Processing Mode. This allows us to identify
 	 * code executed *during* Recovery Processing Mode but not necessarily
 	 * by Startup process itself.
-	 *
-	 * Protected by mode_lck
 	 */
 	bool		SharedRecoveryProcessingMode;
-	slock_t		mode_lck;
 
 	/*
 	 * recovery target control information
@@ -410,8 +393,6 @@ typedef struct XLogCtlData
 	TransactionId 	recoveryLastXid;
 	XLogRecPtr		recoveryLastRecPtr;
 
-	char		InfoLockPadding[XLOGCTL_BUFFER_SPACING];
-
 	slock_t		info_lck;		/* locks shared variables shown above */
 } XLogCtlData;
 
@@ -4399,7 +4380,6 @@ XLOGShmemInit(void)
 	XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 	XLogCtl->Insert.currpage = (XLogPageHeader) (XLogCtl->pages);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->mode_lck);
 
 	/*
 	 * If we are not in bootstrap mode, pg_control should already exist. Read
@@ -6183,9 +6163,7 @@ IsRecoveryProcessingMode(void)
 		if (xlogctl == NULL)
 			return false;
 
-		SpinLockAcquire(&xlogctl->mode_lck);
-		LocalRecoveryProcessingMode = XLogCtl->SharedRecoveryProcessingMode;
-		SpinLockRelease(&xlogctl->mode_lck);
+		LocalRecoveryProcessingMode = xlogctl->SharedRecoveryProcessingMode;
 	}
 
 	knownProcessingMode = true;
-- 
1.5.6.5

From 4061ac8f84cc699bf0f417689f853791089ed472 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas hei...@enterprisedb.com
Date: Fri, 23 Jan 2009 15:55:33 +0200
Subject: [PATCH] Remove knownProcessingMode variable.

---
 src/backend/access/transam/xlog.c |   19 ---
 1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7e480e2..e64fb48 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -126,9 +126,11 @@ bool		InRecovery = false;
 /* Are we recovering using offline XLOG archives? */
 static bool InArchiveRecovery = false;
 
-/* Local copy of shared RecoveryProcessingMode state */
+/*
+ * Local copy of shared RecoveryProcessingMode state. True actually
+ * means "not known, need to check the shared state".
+ */
 static bool LocalRecoveryProcessingMode = true;
-static bool 
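
(The patch text is truncated here in the archive. For reference, a minimal
sketch of the pattern the two patches leave behind, pieced together from
the hunks above; treat it as an approximation rather than the committed
code. XLogCtlData and XLogCtl are as declared earlier in xlog.c.)

/*
 * Local copy of the shared flag. It starts out true, which really means
 * "not known yet, check shared memory". Once the shared flag has been
 * seen false, recovery cannot start again in this postmaster's lifetime,
 * so the local copy can be trusted from then on.
 */
static bool LocalRecoveryProcessingMode = true;

bool
IsRecoveryProcessingMode(void)
{
	if (!LocalRecoveryProcessingMode)
		return false;
	else
	{
		/* use volatile pointer to prevent code rearrangement */
		volatile XLogCtlData *xlogctl = XLogCtl;

		/* can be called very early, before shared memory is attached */
		if (xlogctl == NULL)
			return false;

		/* no spinlock: a single boolean written by only one process */
		LocalRecoveryProcessingMode = xlogctl->SharedRecoveryProcessingMode;
		return LocalRecoveryProcessingMode;
	}
}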

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Simon Riggs

On Fri, 2009-01-23 at 16:14 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  * Put corrected version into GIT
  * Produce outstanding items as patch-on-patch via GIT
 
 I've applied the hot standby patch and recovery infra v9 patch to 
 branches in my git repository, and pushed those to:
 
 git://git.postgresql.org/git/~hlinnaka/pgsql.git
 
 You can clone that to get started.
 
 I've set those branches up so that the hot standby branch is branched 
 off from the recovery infra branch. I'd like to keep the two separate, 
 as the recovery infra patch is useful on its own, and can be reviewed 
 separately.
 
 As a teaser, I made a couple of minor changes after importing your 
 patches. For the sake of the archives, I've also included those changes 
 as patches here.

OK, I'll look at those. I've fixed 6 bugs today (!), so I'd rather go
for the new version coming out in an hour. That's too many to unpick,
unfortunately.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas

Simon Riggs wrote:

On Fri, 2009-01-23 at 16:14 +0200, Heikki Linnakangas wrote:

I made a couple of minor changes after importing your 
patches. 


I've applied them both to v9g, attached here.

If you wouldn't mind redoing the initial step, I will promise not to do
anything else to the code, except via patch on GIT.


No problem. I don't need to do it from scratch, I'll just apply the 
changes from that patch as an incremental commit. Done, you can see it 
in my git repo now too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas

@@ -1601,6 +1602,24 @@ BufferProcessRecoveryConflictsIfAny(volatile BufferDesc *bufHdr)
{
	XLogRecPtr  bufLSN = BufferGetLSN(bufHdr);

+	/*
+	 * If the buffer is recent we may need to cancel ourselves
+	 * rather than risk returning a wrong answer. This test is
+	 * too conservative, but it is correct.
+	 *
+	 * We only need to cancel the current subtransaction.
+	 * Once we've handled the error then other subtransactions can
+	 * continue processing. Note that we do *not* reset the
+	 * BufferRecoveryConflictLSN at subcommit/abort, but we do
+	 * reset it if we release our last remaining snapshot.
+	 * See SnapshotResetXmin().
+	 *

Is it really enough to cancel just the current subtransaction? What if 
it's a serializable transaction?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Simon Riggs

On Fri, 2009-01-23 at 18:22 +0200, Heikki Linnakangas wrote:
  @@ -1601,6 +1602,24 @@ BufferProcessRecoveryConflictsIfAny(volatile BufferDesc *bufHdr)
  {
  	XLogRecPtr  bufLSN = BufferGetLSN(bufHdr);
  
  +	/*
  +	 * If the buffer is recent we may need to cancel ourselves
  +	 * rather than risk returning a wrong answer. This test is
  +	 * too conservative, but it is correct.
  +	 *
  +	 * We only need to cancel the current subtransaction.
  +	 * Once we've handled the error then other subtransactions can
  +	 * continue processing. Note that we do *not* reset the
  +	 * BufferRecoveryConflictLSN at subcommit/abort, but we do
  +	 * reset it if we release our last remaining snapshot.
  +	 * See SnapshotResetXmin().
  +	 *
 
 Is it really enough to cancel just the current subtransaction? What if 
 it's a serializable transaction?

I did originally think that when I first looked at the problem. I'm
sorry if I say that a lot. 

If you have a serializable transaction with subtransactions that suffers
a serializability error, it only cancels the current subtransaction. That
means its snapshot is still valid and can be used again. By analogy, as
long as a transaction does not see any data that is inconsistent with
its snapshot, it seems OK for it to continue. So I think it is correct.

(Bizarrely, this might mean that if we did this programmatically in a
loop we might keep the system busy for some time while it continually
re-reads data and fails. But that's another story).
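
To make that concrete, here is a rough libpq sketch of such a loop. The
connection string and query are illustrative only; the point is that a
serialization failure (SQLSTATE 40001) rolls back just to the savepoint,
and the transaction retries under its original snapshot, which is also why
a blind retry can spin, since the snapshot that caused the conflict never
changes.

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("host=standby dbname=test");
	PGresult   *res;
	const char *sqlstate;
	int			attempts = 0;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	PQclear(PQexec(conn, "BEGIN ISOLATION LEVEL SERIALIZABLE"));

	for (;;)
	{
		PQclear(PQexec(conn, "SAVEPOINT s"));

		res = PQexec(conn, "SELECT count(*) FROM pg_class");
		if (PQresultStatus(res) == PGRES_TUPLES_OK)
		{
			printf("count = %s\n", PQgetvalue(res, 0, 0));
			PQclear(res);
			PQclear(PQexec(conn, "RELEASE SAVEPOINT s"));
			break;
		}

		/*
		 * Only the subtransaction was cancelled; the outer transaction's
		 * snapshot is still valid, so roll back to the savepoint and retry.
		 */
		sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
		if (sqlstate && strcmp(sqlstate, "40001") == 0 && ++attempts < 100)
		{
			PQclear(res);
			PQclear(PQexec(conn, "ROLLBACK TO SAVEPOINT s"));
			continue;
		}

		PQclear(res);
		break;
	}

	PQclear(PQexec(conn, "COMMIT"));
	PQfinish(conn);
	return 0;
}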

You remind me that we can now do what Kevin has requested and throw an
errcode(ERRCODE_T_R_SERIALIZATION_FAILURE) at this point, which I agree
is the most easily understood way of describing this error.

(I was sorely tempted to make it "snapshot too old", as a joke).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas

Simon Riggs wrote:

If you have a serializable transaction with subtransactions that suffers
a serializability error, it only cancels the current subtransaction. That
means its snapshot is still valid and can be used again. By analogy, as
long as a transaction does not see any data that is inconsistent with
its snapshot, it seems OK for it to continue. So I think it is correct.


Yeah, you're right. How bizarre.


(I was sorely tempted to make it "snapshot too old", as a joke).


Yeah, that is a very descriptive message, actually. If there wasn't any 
precedent for it, I believe we might have come up with exactly that 
message ourselves.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 Simon Riggs wrote:
 If you have a serializable transaction with subtransactions that suffers
 a serializability error, it only cancels the current subtransaction. That
 means its snapshot is still valid and can be used again. By analogy, as
 long as a transaction does not see any data that is inconsistent with
 its snapshot, it seems OK for it to continue. So I think it is correct.

 Yeah, you're right. How bizarre.

It was argued this way to me way back when subtransactions were written.
Originally I had written it in such a way as to abort the whole
transaction, on the rationale that if you're blindly restarting the
subtransaction after a serialization error, it would result in a (maybe
infinite) loop.  I think the reason it only aborts the subxact is that
that's what all other errors do, so why would this one act differently.

Hmm, now that I think about it, I think it was deadlock errors not
serialization errors ...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Greg Stark

 If you have a serializable transaction with subtransactions that suffers
 a serializability error, it only cancels the current subtransaction.

This is a complete tangent to your work, but I wonder if this is really
right. I mean it's not as if we could have serialized the transaction as
a whole with respect to whatever other transaction we failed against.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Jeff Davis
On Fri, 2009-01-23 at 20:13 +, Greg Stark wrote:
  If you have a serializable transaction with subtransactions that suffers
  a serializability error, it only cancels the current subtransaction.
 
 This is a complete tangent to your work, but I wonder if this is really
 right. I mean it's not as if we could have serialized the transaction as
 a whole with respect to whatever other transaction we failed against.

Isn't this back to the discussion about PostgreSQL serializability
versus true serializability?

I agree that's not true serializability.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Mark Kirkwood

Simon Riggs wrote:

On Thu, 2009-01-22 at 22:35 +, Simon Riggs wrote:

  

* Bug fix v9 over next few days



version 9g - please use this for testing now

  


I'm doing some test runs with this now. I notice an old flatfiles 
related bug has reappeared:


master:

=# create database test;

slave:

=# select datname from pg_database where datname like 'test';
 datname 
---------
 test
(1 row)

postgres=# \c test
FATAL:  database "test" does not exist
Previous connection kept


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers