Re: [HACKERS] Simplifying "standby mode"

2006-09-13 Thread Simon Riggs
On Tue, 2006-09-12 at 16:23 -0400, Tom Lane wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> >>> My memory is lousy at the best of times, but when have we had a minor 
> >>> release that would have broken this due to changed format?
> 
> >> Not often, which is why I mention the possibility of having
> >> interoperating minor release levels at all. If it was common, I'd just
> >> put a blanket warning on doing that.
> 
> > I don't know that it's happened in the past but I wouldn't be surprised.
> > Consider that the bug being fixed in the point release may well be a bug in
> > WAL log formatting. 
> 
> This would be the exception, not the rule, and should not be documented
> as if it were the rule.  It's not really different from telling people
> to expect a forced initdb at a minor release: you are simply
> misrepresenting the project's policy.

OK, that's clear. I'll word it the other way around.

SGML'd version will go straight to -patches.

--

Other questions and changes: please shout them in now.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> The project policy has always been that we don't change on-disk formats
> in minor releases.  I'm not entirely clear why you are so keen on
> carving out an exception for WAL data.

I had always thought of the policy as "initdb is not required", not "no on-disk
format changes". In that light, you're suggesting extending the policy, which I
guess I just thought should be done explicitly rather than making policy by
accident.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Well it's never been a factor before so I'm not sure there is a
> policy. Is there now a policy that WAL files, like database formats, are
> as far as possible not going to be changed in minor versions?

> This means if there's a bug fix that affects WAL records the new point
> release will generally have to be patched to recognise the broken WAL
> records and process them correctly rather than simply generate
> corrected records. That could be quite a burden.

Let's see, so if we needed a bug fix that forced a tuple header layout
change or datatype representation change or page header change, your
position would be what exactly?

The project policy has always been that we don't change on-disk formats
in minor releases.  I'm not entirely clear why you are so keen on
carving out an exception for WAL data.

While I can imagine bugs severe enough to make us violate that policy,
our track record of not having to is pretty good.  And I don't see any
reason at all to suppose that such a bug would be more likely to affect
WAL (and only WAL) than any other part of our on-disk structures.

But having said all that, I'm not sure why we are arguing about it in
this context.  There was an upthread mention that we ought to recommend
using identical executables on master and slave PITR systems, and I
think that's a pretty good recommendation in any case, because of the
variety of ways in which you could screw yourself through configuration
differences.

regards, tom lane



Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> This would be the exception, not the rule, and should not be documented
> as if it were the rule.  It's not really different from telling people
> to expect a forced initdb at a minor release: you are simply
> misrepresenting the project's policy.

Well it's never been a factor before so I'm not sure there is a policy. Is
there now a policy that WAL files, like database formats, are as far as possible
not going to be changed in minor versions?

This means if there's a bug fix that affects WAL records the new point release
will generally have to be patched to recognise the broken WAL records and
process them correctly rather than simply generate corrected records. That
could be quite a burden.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>>> My memory is lousy at the best of times, but when have we had a minor 
>>> release that would have broken this due to changed format?

>> Not often, which is why I mention the possibility of having
>> interoperating minor release levels at all. If it was common, I'd just
>> put a blanket warning on doing that.

> I don't know that it's happened in the past but I wouldn't be surprised.
> Consider that the bug being fixed in the point release may well be a bug in
> WAL log formatting. 

This would be the exception, not the rule, and should not be documented
as if it were the rule.  It's not really different from telling people
to expect a forced initdb at a minor release: you are simply
misrepresenting the project's policy.

regards, tom lane



Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Gregory Stark
Simon Riggs <[EMAIL PROTECTED]> writes:

>> My memory is lousy at the best of times, but when have we had a minor 
>> release that would have broken this due to changed format? OTOH, the 
>> Primary and Backup servers need the same config settings (e.g. 
>> --enable-integer-datetimes), architecture, compiler, etc, do they not? 
>> Probably working from an identical set of binaries would be ideal.
>
> Not often, which is why I mention the possibility of having
> interoperating minor release levels at all. If it was common, I'd just
> put a blanket warning on doing that.

I don't know that it's happened in the past but I wouldn't be surprised.

Consider that the bug being fixed in the point release may well be a bug in
WAL log formatting. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Simon Riggs
On Tue, 2006-09-12 at 13:25 -0400, Andrew Dunstan wrote:
> Simon Riggs wrote:
> >
> > In general, log shipping between servers running different release
> > levels will not be possible. However, it may be possible for servers
> > running different minor release levels e.g. 8.2.1 and 8.2.2 to
> > inter-operate successfully. No formal support for that is offered and
> > there may be minor releases where that is not possible, so it is unwise
> > to rely on that capability.

> My memory is lousy at the best of times, but when have we had a minor 
> release that would have broken this due to changed format? OTOH, the 
> Primary and Backup servers need the same config settings (e.g. 
> --enable-integer-datetimes), architecture, compiler, etc, do they not? 
> Probably working from an identical set of binaries would be ideal.

Not often, which is why I mention the possibility of having
interoperating minor release levels at all. If it was common, I'd just
put a blanket warning on doing that.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Andrew Dunstan

Simon Riggs wrote:


In general, log shipping between servers running different release
levels will not be possible. However, it may be possible for servers
running different minor release levels e.g. 8.2.1 and 8.2.2 to
inter-operate successfully. No formal support for that is offered and
there may be minor releases where that is not possible, so it is unwise
to rely on that capability.

  


My memory is lousy at the best of times, but when have we had a minor 
release that would have broken this due to changed format? OTOH, the 
Primary and Backup servers need the same config settings (e.g. 
--enable-integer-datetimes), architecture, compiler, etc, do they not? 
Probably working from an identical set of binaries would be ideal.
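Andrew's point could be turned into a pre-flight check run before a Standby is brought into service. A minimal sketch, assuming the relevant properties of each server have already been collected into a dictionary by hand or from each installation's build information; the property names are hypothetical labels chosen for illustration, not the output of any particular tool.

```python
# Hypothetical property names: gather the real values however your site prefers.
REQUIRED_MATCH = ("major_version", "integer_datetimes", "architecture", "compiler")


def config_mismatches(primary: dict, standby: dict) -> list:
    """Names of required properties that differ between the two servers.
    A property recorded on only one side counts as a difference."""
    return [key for key in REQUIRED_MATCH if primary.get(key) != standby.get(key)]


if __name__ == "__main__":
    primary = {"major_version": "8.2", "integer_datetimes": True,
               "architecture": "x86_64", "compiler": "gcc"}
    standby = dict(primary, integer_datetimes=False)
    print(config_mismatches(primary, standby))  # ['integer_datetimes']
```

An empty list only means the listed properties agree; working from an identical set of binaries remains the simpler guarantee.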


cheers

andrew




Re: [HACKERS] Simplifying "standby mode"

2006-09-12 Thread Simon Riggs
On Wed, 2006-09-06 at 12:01 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > 1. Notes on restartable recovery

Previously submitted

> > 2. Notes on standby functionality
> > 3. discussion on rolling your own record-level polling using
> > pg_xlogfile_name_offset()

Given below, but not in SGML yet. Looking for general pointers/feedback
before I drop those angle-brackets in place.


Warm Standby Servers for High Availability 
==

Overview


Continuous Archiving can also be used to create a High Availability (HA)
cluster configuration with one or more Standby Servers ready to take
over operations in the case that the Primary Server fails. This
capability is more widely known as Warm Standby Log Shipping.

The Primary and Standby Servers work together to provide this capability,
though the servers are only loosely coupled. The Primary Server operates
in Continuous Archiving mode, while the Standby Server operates in a
continuous Recovery mode, reading the WAL files from the Primary. No
changes to the database tables are required to enable this capability,
so it offers a low administration overhead in comparison with other
replication approaches. This configuration also has a very low
performance impact on the Primary server.

Directly moving WAL or "log" records from one database server to another
is typically described as Log Shipping. PostgreSQL implements file-based
Log Shipping, meaning WAL records are batched one file at a time. WAL
files can be shipped easily and cheaply over any distance, whether it be
to an adjacent system, another system on the same site or another system
on the far side of the globe. The bandwidth required for this technique
varies according to the transaction rate of the Primary Server.
Record-based Log Shipping is also possible with custom-developed
procedures, discussed in a later section. Future developments are likely
to include options for synchronous and/or integrated record-based log
shipping.
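To make "one file at a time" concrete, here is a sketch of how successive WAL files are named. It assumes the 8.x-era conventions: a file name is three 8-digit upper-case hex fields (timeline, log id, segment), segments are the default 16 MB, and segment FF of each log id is never used, so segment FE is followed by segment 00 of the next log id. It is an illustration, not code taken from the server.

```python
SEG_SIZE = 16 * 1024 * 1024              # default WAL segment size (assumed)
SEGS_PER_LOG = 0xFFFFFFFF // SEG_SIZE     # 255 in pre-9.3 servers: segments 00..FE


def wal_file_name(timeline: int, log_id: int, seg: int) -> str:
    """Format a WAL segment file name as TTTTTTTTLLLLLLLLSSSSSSSS (hex)."""
    return "%08X%08X%08X" % (timeline, log_id, seg)


def next_wal_file(name: str) -> str:
    """Name of the segment that follows `name` on the same timeline."""
    timeline, log_id, seg = (int(name[i:i + 8], 16) for i in (0, 8, 16))
    seg += 1
    if seg >= SEGS_PER_LOG:      # past the last usable segment: next log id
        log_id, seg = log_id + 1, 0
    return wal_file_name(timeline, log_id, seg)


if __name__ == "__main__":
    print(next_wal_file("000000010000000000000003"))  # 000000010000000000000004
    print(next_wal_file("0000000100000000000000FE"))  # 000000010000000100000000
```

In practice the server passes the name of the file it wants to restore_command (as %f), so a script rarely needs to compute it; the sketch only shows how the files are sequenced.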

It should be noted that the log shipping is asynchronous, i.e. the WAL
records are shipped after transaction commit. As a result there can be a
small window of data loss, should the Primary Server suffer a
catastrophic failure. The window of data loss is minimised by the use of
the archive_timeout parameter, which can be set as low as a few seconds
if required. A very low setting can increase the bandwidth requirements
for file shipping.

The Standby server is not available for access, since it is continually
performing recovery processing. Recovery performance is sufficiently
good that the Standby will typically be only minutes away from full
availability once it has been activated. As a result, we refer to this
capability as a Warm Standby configuration that offers High
Availability. Restoring a server from an archived base backup and
rollforward can take considerably longer and so that technique only
really offers a solution for Disaster Recovery, not HA.

Other mechanisms for High Availability replication are available, both
commercially and as open-source software.  

In general, log shipping between servers running different release
levels will not be possible. However, it may be possible for servers
running different minor release levels e.g. 8.2.1 and 8.2.2 to
inter-operate successfully. No formal support for that is offered and
there may be minor releases where that is not possible, so it is unwise
to rely on that capability.
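A deployment script might enforce the conservative reading of this paragraph. A minimal sketch, assuming 8.x-style version strings in which the first two components form the major release; the function names are hypothetical.

```python
def major_release(version: str) -> tuple:
    """Major release of an 8.x-style version string, e.g. '8.2.1' -> (8, 2)."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("expected a version such as 8.2 or 8.2.1, got %r" % version)
    return int(parts[0]), int(parts[1])


def may_interoperate(primary: str, standby: str) -> bool:
    """Necessary, not sufficient: log shipping can only work within one major
    release, and even then no formal support is offered across minor releases."""
    return major_release(primary) == major_release(standby)


if __name__ == "__main__":
    print(may_interoperate("8.2.1", "8.2.2"))   # True, but unsupported
    print(may_interoperate("8.1.4", "8.2.1"))   # False
```

Passing the check only rules out combinations known not to work; running identical versions on both servers remains the safe choice.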

Planning


On the Standby server all tablespaces and paths will refer to similarly
named mount points, so it is important to create the Primary and Standby
servers so that they are as similar as possible, at least from the
perspective of the database server. Furthermore, any CREATE TABLESPACE
commands will be passed across as-is, so any new mount points must be
created on both servers before they are used on the Primary. Hardware
need not be the same, but experience shows that maintaining two
identical systems is easier than maintaining two dissimilar ones over
the whole lifetime of the application and system.
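Because CREATE TABLESPACE is passed across as-is, the directory it names has to exist on the Standby as well. A minimal sketch of a check to run on the Standby before a new tablespace is created on the Primary; the paths in the example are hypothetical.

```python
import os


def missing_mount_points(locations):
    """Tablespace locations (as used on the Primary) that do not yet exist as
    directories on this machine. Run on the Standby before the matching
    CREATE TABLESPACE is issued on the Primary."""
    return [path for path in locations if not os.path.isdir(path)]


if __name__ == "__main__":
    # Hypothetical paths; list the locations your Primary actually uses.
    wanted = ["/mnt/pg_ts/fast", "/mnt/pg_ts/bulk"]
    for path in missing_mount_points(wanted):
        print("missing on this server:", path)
```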

There is no special mode required to enable a Standby server. The
operations that occur on both Primary and Standby servers are entirely
normal continuous archiving and recovery tasks. The primary point of
contact between the two database servers is the archive of WAL files
that both share: Primary writing to the archive, Standby reading from
the archive. Care must be taken to ensure that WAL archives for separate
servers do not become mixed together or confused.
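One way to keep archives from becoming mixed together is to give each Primary its own archive directory and to have the archiving step refuse to overwrite an existing file, so that a name collision fails loudly instead of silently replacing another server's WAL. A minimal sketch of such an archive_command helper; the script name and directory layout are hypothetical. It copies to a temporary name and then renames, so a Standby polling the same directory never sees a half-written file.

```python
import os
import shutil
import sys


def archive_wal(wal_path: str, archive_dir: str) -> int:
    """Copy one WAL file into archive_dir, refusing to overwrite an existing
    file of the same name. Returns a shell-style exit status (0 = success)."""
    dest = os.path.join(archive_dir, os.path.basename(wal_path))
    if os.path.exists(dest):
        # Same name already archived: possibly another server's WAL, or a file
        # left by an earlier attempt. Refuse; the server keeps the file and retries.
        return 1
    tmp = dest + ".tmp"
    shutil.copyfile(wal_path, tmp)
    os.rename(tmp, dest)   # readers only ever see complete files
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical postgresql.conf entry:
    #   archive_command = 'python archive_wal.py %p /mnt/archive/primary1'
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```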

The magic that makes the two loosely coupled servers work together is
simply a restore_command that waits for the next WAL file to be archived
from the Primary. The restore_command is specified in the recovery.conf
file on the Standby Server. Normal recovery processing would request a
file from the WAL archive, causing an error if the file was unavailable.
For Standby processing it is normal for the next file to be unavailable,

Re: [HACKERS] Simplifying "standby mode"

2006-09-06 Thread Simon Riggs
On Sat, 2006-09-02 at 09:14 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > 
> > OK, I'll submit a C program called pg_standby so that we have an
> > approved and portable version of the script, allowing it to be
> > documented more easily.
> 
> I think we are still waiting for this.  I am also waiting for more PITR
> documentation to go with the recent patches.

Yup.

Likely to be completed by end of next week now, submitted in chunks:

1. Notes on restartable recovery
2. Notes on standby functionality
3. discussion on rolling your own record-level polling using
pg_xlogfile_name_offset()
4. pg_standby.c sample code
5. Reworking Marko Kreen's test harness as an example for contrib
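Item 3 is only described in this thread, so here is a rough sketch of the arithmetic such a tool needs. A poller would periodically read the Primary's current WAL write location (in 8.2, pg_current_xlog_location() returns it as 'X/Y' text, and pg_xlogfile_name_offset() maps a location to a file name and offset) and copy only the bytes written since the previous poll. The helper below assumes 16 MB segments, pre-9.3 naming and timeline 1; it is a simplified stand-in for the server function and does not reproduce its exact treatment of locations that fall on a segment boundary.

```python
SEG_SIZE = 16 * 1024 * 1024               # assumed default segment size
SEGS_PER_LOG = 0xFFFFFFFF // SEG_SIZE      # 255 before 9.3 (segment FF unused)


def split_location(loc: str):
    """'X/Y' hex location -> (linear segment number, byte offset in segment)."""
    xlogid, xrecoff = (int(part, 16) for part in loc.split("/"))
    return xlogid * SEGS_PER_LOG + xrecoff // SEG_SIZE, xrecoff % SEG_SIZE


def segment_file(timeline: int, linear_seg: int) -> str:
    """File name of a linear segment number on the given timeline."""
    log_id, seg = divmod(linear_seg, SEGS_PER_LOG)
    return "%08X%08X%08X" % (timeline, log_id, seg)


def ranges_to_copy(prev_loc: str, cur_loc: str, timeline: int = 1):
    """(file name, start, end) byte ranges written between two polls."""
    seg, off = split_location(prev_loc)
    end_seg, end_off = split_location(cur_loc)
    if (end_seg, end_off) < (seg, off):
        raise ValueError("WAL location went backwards")
    ranges = []
    while seg < end_seg:                  # finish every segment we have left
        ranges.append((segment_file(timeline, seg), off, SEG_SIZE))
        seg, off = seg + 1, 0
    if end_off > off:                     # partial tail of the current segment
        ranges.append((segment_file(timeline, seg), off, end_off))
    return ranges


if __name__ == "__main__":
    # Two successive (made-up) readings of pg_current_xlog_location().
    for name, start, end in ranges_to_copy("0/FEFFFF00", "1/100"):
        print(name, start, end)
```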

Any other requests?

Timescale acceptable?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Simplifying "standby mode"

2006-09-06 Thread Bruce Momjian
Simon Riggs wrote:
> On Sat, 2006-09-02 at 09:14 -0400, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > 
> > > OK, I'll submit a C program called pg_standby so that we have an
> > > approved and portable version of the script, allowing it to be
> > > documented more easily.
> > 
> > I think we are still waiting for this.  I am also waiting for more PITR
> > documentation to go with the recent patches.
> 
> Yup.
> 
> Likely to be completed by end of next week now, submitted in chunks:
> 
> 1. Notes on restartable recovery
> 2. Notes on standby functionality
> 3. discussion on rolling your own record-level polling using
> pg_xlogfile_name_offset()

> 4. pg_standby.c sample code

I need #4 long before the end of _this_ week, or it is going to be
rejected for 8.2.  The documentation can be added even during beta,
though the earlier the better so it can be tested.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Simplifying "standby mode"

2006-09-02 Thread Bruce Momjian
Simon Riggs wrote:
> On Mon, 2006-08-07 at 11:37 -0400, Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > If we are in standby mode, then rather than ending recovery we go into a
> > > wait loop. We poll for the next file, then sleep for 1000 ms, then poll
> > > again. When a file arrives we mark a restartpoint at each checkpoint.
> > 
> > > We need the standby_mode to signify the difference in behaviour at
> > > end-of-logs, but we may not need a parameter of that exact name.
> > 
> > > The piece I have been puzzling over is how to initiate a failover when
> > > in standby_mode. I've not come up with a better solution than checking
> > > for the existence of a trigger file each time round the next-file wait
> > > loop. This would use a naming convention to indicate the port number,
> > > allowing us to uniquely identify a cluster on any single server. That's
> > > about as portable and generic as you'll get.
> > 
> > The original intention was that all this sort of logic was to be
> > external in the recovery_command script.  I'm pretty dubious about
> > freezing it in the C code when there's not yet an established
> > convention for how it should work.  I'd kinda like to see a widely
> > accepted recovery_command script before we move the logic inside
> > the server.
> 
> OK, I'll submit a C program called pg_standby so that we have an
> approved and portable version of the script, allowing it to be
> documented more easily.

I think we are still waiting for this.  I am also waiting for more PITR
documentation to go with the recent patches.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Simplifying "standby mode"

2006-08-07 Thread Simon Riggs
On Mon, 2006-08-07 at 11:37 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > If we are in standby mode, then rather than ending recovery we go into a
> > wait loop. We poll for the next file, then sleep for 1000 ms, then poll
> > again. When a file arrives we mark a restartpoint at each checkpoint.
> 
> > We need the standby_mode to signify the difference in behaviour at
> > end-of-logs, but we may not need a parameter of that exact name.
> 
> > The piece I have been puzzling over is how to initiate a failover when
> > in standby_mode. I've not come up with a better solution than checking
> > for the existence of a trigger file each time round the next-file wait
> > loop. This would use a naming convention to indicate the port number,
> > allowing us to uniquely identify a cluster on any single server. That's
> > about as portable and generic as you'll get.
> 
> The original intention was that all this sort of logic was to be
> external in the recovery_command script.  I'm pretty dubious about
> freezing it in the C code when there's not yet an established
> convention for how it should work.  I'd kinda like to see a widely
> accepted recovery_command script before we move the logic inside
> the server.

OK, I'll submit a C program called pg_standby so that we have an
approved and portable version of the script, allowing it to be
documented more easily.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Simplifying "standby mode"

2006-08-07 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> If we are in standby mode, then rather than ending recovery we go into a
> wait loop. We poll for the next file, then sleep for 1000 ms, then poll
> again. When a file arrives we mark a restartpoint at each checkpoint.

> We need the standby_mode to signify the difference in behaviour at
> end-of-logs, but we may not need a parameter of that exact name.

> The piece I have been puzzling over is how to initiate a failover when
> in standby_mode. I've not come up with a better solution than checking
> for the existence of a trigger file each time round the next-file wait
> loop. This would use a naming convention to indicate the port number,
> allowing us to uniquely identify a cluster on any single server. That's
> about as portable and generic as you'll get.

The original intention was that all this sort of logic was to be
external in the recovery_command script.  I'm pretty dubious about
freezing it in the C code when there's not yet an established
convention for how it should work.  I'd kinda like to see a widely
accepted recovery_command script before we move the logic inside
the server.

regards, tom lane



Re: [HACKERS] Simplifying "standby mode"

2006-08-07 Thread Simon Riggs
On Mon, 2006-08-07 at 09:48 -0400, Tom Lane wrote:
> I'm in process of reviewing the restartable-recovery patch,
> http://archives.postgresql.org/pgsql-patches/2006-07/msg00356.php
> and I'm wondering if we really need to invent a "standby mode" boolean
> to get the right behavior.  The problem I see with that flag is that
> it'd be static over a run, whereas the behavior we want is dynamic.
> It seems entirely likely that a slave will be started from a base backup
> that isn't quite current, and will need to run through some archived WAL
> segments quickly before it catches up to the master.  So during the
> catchup period we'd prefer that it not do restartpoints one-for-one
> with the logged checkpoints, whereas after it's caught up, that's what
> we want.

That's a great observation. It also ties in neatly with the last piece
of function I've been trying to add.

Let's have it run at full speed, i.e. restartpoint every 100 checkpoints
up until we hit end-of-logs, then if we are not in standby_mode the
recovery will just end. [Also: Currently, we do not retry a request for
an archive file during recovery, though for balance with archive we
should retry 3 times.]
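Retrying a failed restore, as suggested in the bracketed note, could look something like this if done in the restore script rather than in the server. A minimal sketch; fetch stands for whatever copy operation the script performs and is hypothetical.

```python
import time


def restore_with_retry(fetch, attempts=3, delay_s=1.0):
    """Call fetch() up to `attempts` times, pausing delay_s seconds after each
    failure. fetch is any callable that raises OSError when the copy fails.
    Returns True as soon as one attempt succeeds, False if all of them fail."""
    for attempt in range(1, attempts + 1):
        try:
            fetch()
            return True
        except OSError:
            if attempt < attempts:
                time.sleep(delay_s)
    return False
```

A real restore_command would also need to tell a transient failure apart from a file that simply has not been archived yet, since the latter is normal at end-of-logs.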

If we are in standby mode, then rather than ending recovery we go into a
wait loop. We poll for the next file, then sleep for 1000 ms, then poll
again. When a file arrives we mark a restartpoint at each checkpoint.
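The catch-up versus standby behaviour in the two paragraphs above can be modelled as a small state machine. A toy sketch, not server code; the class and method names are hypothetical, and 100 is simply the figure from the proposal.

```python
class CountingRestartpointPolicy:
    """During catch-up, make a restartpoint only every `catchup_interval`
    checkpoint records; once end-of-logs has been reached and the Standby is
    waiting for new files, make one at every checkpoint record."""

    def __init__(self, catchup_interval=100):
        self.catchup_interval = catchup_interval
        self.caught_up = False
        self.since_last = 0

    def reached_end_of_logs(self):
        self.caught_up = True

    def on_checkpoint_record(self) -> bool:
        """Return True if a restartpoint should be made at this checkpoint."""
        self.since_last += 1
        if self.caught_up or self.since_last >= self.catchup_interval:
            self.since_last = 0
            return True
        return False
```

The switch is driven by reaching end-of-logs for the first time, the point at which a Standby enters its wait loop.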

We need the standby_mode to signify the difference in behaviour at
end-of-logs, but we may not need a parameter of that exact name.

The piece I have been puzzling over is how to initiate a failover when
in standby_mode. I've not come up with a better solution than checking
for the existence of a trigger file each time round the next-file wait
loop. This would use a naming convention to indicate the port number,
allowing us to uniquely identify a cluster on any single server. That's
about as portable and generic as you'll get.

We could replace the standby_mode with a single parameter to indicate
where the trigger file should be located.
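The wait loop and trigger file can also live outside the server, in the restore_command script, which is the approach Tom argues for in his reply. A minimal sketch, with a hypothetical script name, trigger-file naming convention and archive path: it polls once a second for the requested file and reports failure (so recovery ends) only when a trigger file appears. It deliberately does not wait for files that are not WAL segments, such as timeline history files, because the server may ask for those even when they legitimately do not exist.

```python
import os
import shutil
import sys
import time


def trigger_path(port: int, directory: str = "/tmp") -> str:
    """Hypothetical naming convention: one trigger file per cluster, keyed by port."""
    return os.path.join(directory, "pgsql.trigger.%d" % port)


def is_wal_segment(name: str) -> bool:
    """True for 24-hex-digit segment names; other files are never waited for."""
    return len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)


def wait_for_wal(archive_dir, wal_name, dest_path, trigger, poll_s=1.0, max_polls=None):
    """Copy wal_name from archive_dir to dest_path, waiting for it if necessary.
    Returns 0 when restored, or 1 ("not available") when the file is missing and
    is not a WAL segment, when the trigger file exists, or when max_polls (meant
    only for testing) runs out. A non-zero status makes the server end recovery."""
    src = os.path.join(archive_dir, wal_name)
    polls = 0
    while True:
        # Look for the file first, so WAL already archived is replayed before a
        # requested failover takes effect. Assumes files appear atomically.
        if os.path.exists(src):
            shutil.copyfile(src, dest_path)
            return 0
        if not is_wal_segment(wal_name) or os.path.exists(trigger):
            return 1
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return 1
        time.sleep(poll_s)   # 1000 ms between polls, per the proposal


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical recovery.conf entry:
    #   restore_command = 'python standby_wait.py /mnt/archive/primary1 %f %p 5432'
    archive, fname, path, port = sys.argv[1:]
    sys.exit(wait_for_wal(archive, fname, path, trigger_path(int(port))))
```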

This is then the last piece in the standby server puzzle.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




[HACKERS] Simplifying "standby mode"

2006-08-07 Thread Tom Lane
I'm in process of reviewing the restartable-recovery patch,
http://archives.postgresql.org/pgsql-patches/2006-07/msg00356.php
and I'm wondering if we really need to invent a "standby mode" boolean
to get the right behavior.  The problem I see with that flag is that
it'd be static over a run, whereas the behavior we want is dynamic.
It seems entirely likely that a slave will be started from a base backup
that isn't quite current, and will need to run through some archived WAL
segments quickly before it catches up to the master.  So during the
catchup period we'd prefer that it not do restartpoints one-for-one
with the logged checkpoints, whereas after it's caught up, that's what
we want.

I'm thinking that we could instead track the actual elapsed time since
the last restartpoint, and do a restartpoint when we encounter a
checkpoint WAL record and the time since the last restartpoint is
at least X.  I'd be inclined to just use checkpoint_timeout for X,
although perhaps there's an argument to be made for making it
separately settable.
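The time-based rule above replaces a static flag with a test made at each checkpoint record. A toy model of it, not server code; the class name is hypothetical, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class ElapsedTimeRestartpointPolicy:
    """At each checkpoint WAL record replayed, make a restartpoint only if at
    least `min_interval_s` seconds (X, e.g. checkpoint_timeout) have passed
    since the previous restartpoint."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last = clock()

    def on_checkpoint_record(self) -> bool:
        """Return True if a restartpoint should be made at this checkpoint."""
        now = self.clock()
        if now - self.last >= self.min_interval_s:
            self.last = now
            return True
        return False
```

During catch-up many checkpoint records are replayed within seconds, so few restartpoints are made; once the slave replays WAL at the master's pace, checkpoint records arrive roughly every checkpoint_timeout and most of them get a restartpoint. With X equal to checkpoint_timeout, timing jitter could occasionally cause one to be skipped.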

Thoughts?

regards, tom lane
