Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-12 Thread Simon Riggs
Hannu Krosing
 Josh Berkus wrote on Tue, 09.03.2004 at 19:46:
  In my personal experience, the *primary* use of PITR is recovery from
  User Error. For example, with one SQL Server 7.0 installation for a
  law firm, I've made use of PITR 4 times over the last 4 years: once
  was because an HDD failed; the other three were all because of IT
  dept. staff running unconstrained UPDATE queries against the back end.
  For recovery with minimal loss of data, there are existing solutions,
  such as replication servers, in addition to PITR; for recovery from
  User Error, only PITR will suffice.
 
 Actually PostgreSQL used to have very good support for this until some
 time in Postgres95 development, by supporting additional temporal
 qualifiers for queries. I.e. one could ask for 'SELECT SALARY FROM
 EMPLOYEES AS IT WAS AT YESTERDAY NOON' (the syntax was not exactly this
 :)

Shame we can't run cc "-as was 4 years ago".
Or even better, cc "-as will be in 3 weeks"; that would save me loads :) !!

 It was a very simple and logical result of PostgreSQL's MVCC storage
 and was supported by VACUUM allowing dropping only deleted tuples older
 than some specified time.
 
 Oracle has recently added something similar (using their WALs) to ver.
 9.0 or 10.x of their DBMS, exactly for recovery from user errors.
 
 The support for this was dropped from PostgreSQL citing performance
 reasons at that time, but I still hope that it can be restored some
 time.

Flashback query is a new feature in Oracle 9i. I believe it is regarded
with some horror by the DBA community...

I get your idea, though I think it is a different thing.

PITR is about the avoidance of risk, not really about fixing any
particular classes of problem. If you have PITR you can recover from
all problems, frequent or not, depending upon how carefully and for
how long you protect your backups.

Reading old MVCC copies won't take that away, though it is a feature
that would be useful within a particular time window. It's also hard to
tell whether VACUUM has been run, and if so on which tables, since that
will also change the answer you get from those MVCC-usage type queries.

Oracle recognises this too. Flashback hasn't replaced backup/restore.
Neither has it prevented them from enhancing LogMiner.

Temporal support is a different issue anyway. It is up to you to come up
with a database design that supports being able to ask that question, if
that is a business requirement.

Best regards, Simon Riggs





Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-12 Thread Robert Treat
On Tuesday 09 March 2004 17:38, Simon Riggs wrote:
 Richard Huxton wrote:
 
  On Monday 08 March 2004 23:28, Simon Riggs wrote:
   PITR Functional Design v2 for 7.5
   Review of current Crash Recovery
 
  Is there any value in putting this section on techdocs or similar? We
  do get a small but steady trickle of people asking for details on
  internals, and I think this covers things in a different way to the
  WAL section of the manuals.
 
 Certainly, though I would like to do all of that after it actually
 works!

Just getting caught up on this thread and had similar thoughts to
Richard's. If there are no objections, I'd like to put the first part of
this email up on techdocs as an explanation of our current crash
recovery system.

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Andreas Pflug
Tom Lane wrote:
 Andreas Pflug [EMAIL PROTECTED] writes:
  When I'm doing a file level hot backup, I can't be sure about the
  backup order. To be sure the cluster is in a consistent state
  regarding checkpoints, pg_clog must be the first directory backed up.
 
 You are going off in the wrong direction entirely.
 
 Any hot-backup design that thinks safety can be ensured by "back up
 file A before file B" considerations is wrong.  That's because Postgres
 doesn't necessarily dump dirty blocks into the data area (or clog area)
 at any particular time.  Therefore, a filesystem-level backup taken
 while the postmaster is running is basically certain to be
 inconsistent.  You can *not* avoid this by being careful about the
 order you dump files in.  Heck, you can't even be certain that a file
 you dump is internally consistent.

Maybe my wording was misleading; it seems Simon understood me as it was
meant. With "consistent state regarding checkpoints" I meant that all
transactions that are marked as committed with the checkpoint are really
present in the data files. Of course, there might be even more
transactions which haven't been checkpointed so far; they'll need WAL
replay.
To clarify:
I'd expect a cluster to be workable, if I
- disable VACUUM until backup completed
- issue CHECKPOINT
- backup clog (CHECKPOINT and backup clog are the backup checkpoint)
- backup all datafiles (which include at least all completed transaction 
data at checkpoint time)
and then
- restore datafiles and clog
- bring up pgsql.
Certainly, all transactions after the backup checkpoint are lost. There
might be fragments of newer transactions in data files, but they were
never committed according to clog and thus are rolled back.
WAL replay would add more completed transactions, making the cluster
more up-to-date, but omitting this would be sufficient in many disaster
recovery scenarios.
Did I miss something? If so, an ordered-retrieval API is needed not only
for WAL data out of pgsql, but for the whole cluster.

Regards,
Andreas




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Tom Lane
Andreas Pflug [EMAIL PROTECTED] writes:
 To clarify:
 I'd expect a cluster to be workable, if I
 - disable VACUUM until backup completed
 - issue CHECKPOINT
 - backup clog (CHECKPOINT and backup clog are the backup checkpoint)
 - backup all datafiles (which include at least all completed transaction 
 data at checkpoint time)
 and then
 - restore datafiles and clog
 - bring up pgsql.

Why is that a useful approach?  You might as well shut down the
postmaster and do a cold filesystem backup, because you are depending on
the data files (including clog) not to change after the checkpoint.  You
cannot make such an assumption in a running database.

Preventing VACUUM does not help btw.

regards, tom lane



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Simon Riggs
Josh Berkus [mailto:[EMAIL PROTECTED]] wrote:
  wal_archive_policy and enable/disable archiving accordingly. This
  parameter can only be changed at server start. (This is required
  because the initial step of archiving each xlog is performed by the
  backend; if this were changeable after boot, then it might be possible
  for an individual backend to override the wal_archive_policy and
  choose not to archive - which would then affect the whole system and
  all users, not just the user making that choice). It is considered
  less desirable to
 
 Let me voice a real-world exception to this policy.  Imagine that you
 are running an OLAP or decision-support database that analyzes data
 coming from an external source.  Once a day you load 250MB of data via
 COPY and then do transformations on that data.  While doing the load,
 you do *not* want the archiver running, as it would quickly fill up the
 WAL partition and backlog the archive tape.
 Under the proposed PITR spec, the only way to handle this would be to:
 1) Full back up
 2) Shut down PG
 3) Restart PG without archiving
 4) Load the data
 5) Shut down PG again
 6) Restart PG with archiving
 7) Full back-up again.
 DBAs would like it much more if starting/stopping the archiver was
 possible via a superuser (not regular user) GUC.  This would allow a
 much faster cycle:
 1) Full back up
 2) Stop archiving
 3) Load the data
 4) Restart archiving
 5) Full back-up

The scenario you mention is what I'd like to support, but I don't just
yet see how.

I'd welcome input on this point, since I don't fully understand GUCs:

Thinking about this:
1. Since the backends run XLogArchiveNotify(), they must all do so
identically. One slip invalidates all the work of all the others. 
GUC Options are:
INTERNAL - not appropriate
POSTMASTER - what I originally envisaged, but not what you want
SIGHUP - seems to allow different parameter settings in each backend
BACKEND - not appropriate
SUSET - maybe what you're looking for???
USERLIMIT - no
USERSET - absolutely no
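
For concreteness, here is roughly how a SUSET boolean would be wired
into guc.c. This is a sketch of the mechanism only: the name
wal_archive, the description and the exact entry layout are my
invention, not actual code from any patch.

  /* hypothetical flag, declared somewhere like xlog.c */
  bool        wal_archive_enabled = false;

  /* candidate entry for the ConfigureNamesBool[] table in guc.c;
   * PGC_SUSET = changeable at runtime, but only by a superuser */
  {
      {"wal_archive", PGC_SUSET, WAL_SETTINGS,
       gettext_noop("Enables archiving of completed xlog segments."),
       NULL},
      &wal_archive_enabled,
      false, NULL, NULL
  },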
 
2. Maybe have Postmaster run something every so often that looks for
full xlogs and then executes XLogArchiveNotify() for them?

Thoughts anyone?

Best Regards, Simon Riggs




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Simon Riggs
From: Josh Berkus [mailto:[EMAIL PROTECTED]]
  SIGHUP - seems to allow different parameter settings in each backend
 
 Nope.  SIGHUP means that you need to send a HUP to the postmaster, such
 as you would with changes to pg_hba.conf.
 
  SUSET - maybe what you're looking for???
 
 Yes.  This means that it can be changed, at runtime, but by the
 Superuser only.  This is used for several settings which are possible
 to change at runtime but take effect system-wide.

SUSET it is then. I'll update the design doc.

Thanks, Simon




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Zeugswetter Andreas SB SD

 The only way we can support file-level hot backup is in conjunction with
 PITR-style WAL log archiving.  It is okay for the data area dump to be
 inconsistent, so long as your recovery process includes replay of WAL
 starting at some checkpoint before the filesystem dump started, and
 extending to some point after the filesystem dump finished.  Replaying
 WAL will correct the inconsistencies.

And the last checkpoint info resides in pg_control, and not in pg_clog,
no? So basically a PITR restore would need to adjust the pg_control file
after filesystem restore and before starting recovery. Maybe it can take
that info from the oldest available WAL? The OP would only need to
ensure that only those logs that need to be rolled forward are visible
(in the correct directory) to the recovery.

Andreas



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Tom Lane
Andreas Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Why is that a useful approach?  You might as well shut down the
 postmaster and do a cold filesystem backup, 
 
 We're talking about *hot* backup, aren't we?

Exactly.  The approach you're sketching can't work for hot backup,
because it effectively assumes that the database won't be changing.

regards, tom lane



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Andreas Pflug
Tom Lane wrote:
 Andreas Pflug [EMAIL PROTECTED] writes:
  Tom Lane wrote:
   Why is that a useful approach?  You might as well shut down the
   postmaster and do a cold filesystem backup,
 
  We're talking about *hot* backup, aren't we?
 
 Exactly.  The approach you're sketching can't work for hot backup,
 because it effectively assumes that the database won't be changing.

Well, in the first place my posting was to express my suspicion that WAL
replay relies on clog/pg_control being accurate, i.e. transactions
marked as flushed must be on disk. AFAICS this is a consequence of the
WAL replay implementation. In case of hot backup, this means that data
files must not be older than clog. Do you agree? So PITR needs a
mechanism to ensure this at backup time.

Next question would be: If the point in time I'd like to recover is that 
very backup checkpoint time, do I need xlog at all?

Regards,
Andreas




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Andreas Pflug
Tom Lane wrote:
 Andreas Pflug [EMAIL PROTECTED] writes:
  To clarify:
  I'd expect a cluster to be workable, if I
  - disable VACUUM until backup completed
  - issue CHECKPOINT
  - backup clog (CHECKPOINT and backup clog are the backup checkpoint)
  - backup all datafiles (which include at least all completed
    transaction data at checkpoint time)
  and then
  - restore datafiles and clog
  - bring up pgsql.
 
 Why is that a useful approach?  You might as well shut down the
 postmaster and do a cold filesystem backup,

We're talking about *hot* backup, aren't we?

Regards,
Andreas




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Zeugswetter Andreas SB SD

  To clarify:
  I'd expect a cluster to be workable, if I
  - disable VACUUM until backup completed
  - issue CHECKPOINT
  - backup clog (CHECKPOINT and backup clog are the backup checkpoint)
  - backup all datafiles (which include at least all completed transaction 
  data at checkpoint time)
  and then
  - restore datafiles and clog
  - bring up pgsql.
 
 Why is that a useful approach?  You might as well shut down the
 postmaster and do a cold filesystem backup, because you are depending on
 the data files (including clog) not to change after the checkpoint.  You
 cannot make such an assumption in a running database.

I think there is a misunderstanding here. 

What I think is possible is the following (continuous backup of WAL assumed):
- disable VACUUM
- issue CHECKPOINT C1
- backup all files
- reenable VACUUM

- restore files
- adapt pg_control (checkpoint C1)
- recover WAL until at least end of backup

The db is inconsistent until you have recovered all WAL (PITR) that
accumulated during the file backup.

I am not sure about clog - isn't clog logged in xlog?

Andreas



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Hannu Krosing
Josh Berkus wrote on Tue, 09.03.2004 at 19:46:

 In my personal experience, the *primary* use of PITR is recovery from User 
 Error.   For example, with one SQL Server 7.0 installation for a law firm, 
 I've made use of PITR 4 times over the last 4 years: once was because an HDD 
 failed; the other three were all because of IT dept. staff running 
 unconstrained UPDATE queries against the back end.   For recovery with 
 minimal loss of data, there are existing solutions, such as replication 
 servers, in addition to PITR; for recovery from User Error, only PITR will 
 suffice.

Actually PostgreSQL used to have very good support for this until some
time in Postgres95 development, by supporting additional temporal
qualifiers for queries. I.e. one could ask for 'SELECT SALARY FROM
EMPLOYEES AS IT WAS AT YESTERDAY NOON' (the syntax was not exactly this
:)

It was a very simple and logical result of PostgreSQL's MVCC storage and
was supported by VACUUM allowing dropping only deleted tuples older than
some specified time.

Oracle has recently added something similar (using their WALs) to ver.
9.0 or 10.x of their DBMS, exactly for recovery from user errors.

The support for this was dropped from PostgreSQL citing performance
reasons at that time, but I still hope that it can be restored some time.

--
Hannu



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-10 Thread Tom Lane
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
 What I think is possible is the following (continuous backup of WAL assumed):
 - disable VACUUM
 - issue CHECKPOINT C1
 - backup all files
 - reenable VACUUM

 - restore files
 - adapt pg_control (checkpoint C1)
 - recover WAL until at least end of backup

I do not understand this fixation on disable VACUUM.  What are you
thinking that will buy you?  AFAICS it would make no difference.

 I am not sure about clog, isn't clog logged in xlog ?

Right.  For the purposes of PITR, clog behaves the same as regular
data files.  You gotta back it up, but a filesystem copy will be
inconsistent until fixed by WAL replay.

regards, tom lane



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-09 Thread Richard Huxton
On Monday 08 March 2004 23:28, Simon Riggs wrote:
 PITR Functional Design v2 for 7.5

Blimey - that's a long post :-) Thanks for explaining things simply enough 
that a non-hacker like me can understand. Well - almost understand ;-)

 Review of current Crash Recovery

Is there any value in putting this section on techdocs or similar? We do get a 
small but steady trickle of people asking for details on internals, and I 
think this covers things in a different way to the WAL section of the 
manuals.

 PITR Proposed Solution

 To allow this to occur, the full backup *must* occur while the database
 is open or hot. This backup must include all data and clogs (and any
 tablespaces or logical links utilised). A continuous sequence of xlogs
 must also be available, stretching from the last checkpoint prior to the
 start of the backup through to whatever time is specified for the
 recovery point or until the end of the xlogs.

So this is a standard cp/tar etc while the cluster is actually in use?


 XLogArchiveXlogs() returns a single XLOG filename, or NULL

 If an xlog file is waiting to be archived, then the archiver will
 discover
 the name of the xlog by using this API call. If more than one file is
 available to be archived, then it will be ignored. If the archiver is
 multi-threaded, it need not wait until it has executed
 XLogArchiveComplete
 before it executes XLogArchiveXlogs again.

So this means:
1. The archiver is responsible for noticing that it is already archiving the 
filename returned (if it repeats the call too quickly).
2. The archiver can only ever archive one XLOG file at a time.


 The initial proposal is a simple scheme that uses file existence & file
 extension to pass information between PostgreSQL and the archiver. This
 would take place in a peer directory of pg_xlog and pg_clog which has
 been named the pg_rlog directory. ("r" as in the strong first syllable
 "ar" in English pronunciation of "archive")

Any reason why not pg_pitr or pg_pitr_log?

 1.2 pg_arch: simple xlog archiving tool
 
 Src/tools/ will add:
 pg_arch.c
   a single-threaded program that uses libpgarch.c to use the API, but
 offers a simple copy facility from pg_xlog to another directory. The
 program will continue to wait and watch for archived files: it is not a
 file-filter type of program. It may be run as a foreground process (for
 testing etc), though is also designed to be run as a background process,
 typically executed at the same time as postmaster startup (through a
 mechanism such as service autostart mechanisms following system boot).
   pg_arch has two parameters:
   -D data-file root for particular instance of PostgreSQL
   -A archive directory

Does the specification of these parameters (and any others) need to be part of 
the API? I'm thinking about the ability to drop in different archivers with 
each using the same pre-defined settings.

 2. Recovery to Point-in-Time (RPIT)

 Recovery to will offer these options:

 2.1 Recovery to end of logs (last time)
 2.2 Recovery of all available on-line logs
 2.3 Point in time recovery to the checkpoint at, or the last checkpoint
 before, the time specified.

 The administrator is expected to be responsible for placing archived
 xlogs back into the pg_xlog directory. This may be a facility provided
 by the external archiver, a manual or other automated process. If any
 mistakes are made at this point then the administrator can then reselect
 appropriate xlogs and try again. There is no enforced limit to the
 number of recovery attempts possible.

Just to clarify:
1. I can identify which XLOG files I need based on their timestamp?
2. Can I force a checkpoint using standard PG client APIs? So I can do "close 
weekly payroll, force checkpoint".
3. We're restoring an entire cluster here, not just one database? How 
difficult would it be to strip out information for a single db - I'm thinking 
about the case where you may have limited backup storage and want to save an 
orders db but not a catalogue db. Or perhaps a hosting company with 
"platinum" customers getting PITR.

-- 
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-09 Thread Josh Berkus
Simon,

First off, let me compliment you on such a thorough proposal.  I'm feeling 
very enthusiastic about 7.5 PITR based on the amount of thought you've given 
the project.

In the spirit of thoroughness, I wanted to make a few comments and suggestions.   
All of the below are in the category of extras it would be nice to have but 
are not essential to implementing PITR.   Possibly they are all features to 
consider for 7.6/8.0/whatever; the next version.   These comments are based 
on my personal experience as a professional contract DBA for PostgreSQL, MS 
SQL Server and SQL Anywhere.

 PITR features are designed to extend the existing Crash Recovery
 features so
 that a recovery can take place in situations where a crash recovery
 would
 not have been possible. These situations are:

In my personal experience, the *primary* use of PITR is recovery from User 
Error.   For example, with one SQL Server 7.0 installation for a law firm, 
I've made use of PITR 4 times over the last 4 years: once was because an HDD 
failed; the other three were all because of IT dept. staff running 
unconstrained UPDATE queries against the back end.   For recovery with 
minimal loss of data, there are existing solutions, such as replication 
servers, in addition to PITR; for recovery from User Error, only PITR will 
suffice.

 There are a wide range of Backup and Recovery (BAR) products on the
 market, both open source and commercially licensed programs that provide
 facilities to perform full physical backups and individual file
 archives. The best way to foster wide adoption of PostgreSQL is to allow
 it to work in conjunction with any of these products. To this end, a

Very perceptive of you.   Good idea!

 wal_archive_policy and enable/disable archiving accordingly. This
 parameter can only be changed at server start. (This is required because
 the initial step of archiving each xlog is performed by the backend; if
 this were changeable after boot, then it might be possible for an
 individual backend to override the wal_archive_policy and choose not to
 archive - which would then affect the whole system and all users, not
 just the user making that choice). It is considered less desirable to

Let me voice a real-world exception to this policy.   Imagine that you are 
running an OLAP or decision-support database that analyzes data coming from 
an external source.   Once a day you load 250MB of data via COPY and then 
do transformations on that data.   While doing the load, you do *not* want 
the archiver running, as it would quickly fill up the WAL partition and 
backlog the archive tape.
Under the proposed PITR spec, the only way to handle this would be to:
1) Full back up
2) Shut down PG
3) Restart PG without archiving
4) Load the data
5) Shut down PG again
6) Restart PG with archiving
7) Full back-up again.
DBAs would like it much more if starting/stopping the archiver was possible 
via a superuser (not regular user) GUC.This would allow a much faster 
cycle:
1) Full back up
2) Stop archiving
3) Load the data
4) Restart archiving
5) Full back-up

Related to the above, what I don't see in your paper or the proposed API is a 
way to coordinate full back-ups and WAL archiving.   Obviously, the PITR 
Archive is only useful in reference to an existing full backup, so it is 
important to be able to associate a set of PITR archives with a particular 
full backup, or with some kind of "backup checkpoint".   I'm sure that you 
have a solution for this, I just didn't see it explained in your proposal, or 
didn't understand it.

FWIW, I find the MSSQL PITR system awkward in the extreme and harrowing in its 
unreliability.   So it's not a good model to copy.

 There is no requirement for the archiver to halt when PostgreSQL shuts
 down, though it may choose to do so or not, e.g. it may be desirable to
 have one archiver operate for multiple postmasters simultaneously. The

I see that you've chosen the "one archiver, many databases/clusters" 
architecture.   I can also see how this strategy will be easier than the 
"many archivers" strategy.   Be prepared that, based on the needs of DBAs, 
you will get the following requests:
A) Will it be possible to have the archiver process run on a separate machine 
from PostgreSQL and access it over the network, via NFS or some other means?
B) Will it be possible to define multiple output streams, so that database X 
can be archived to device Y and database N to device M?

 The out of space condition could therefore occur in two ways:
 1. there is a single delay during which xlog filesystem fills
 2. there could be a systematic delay which builds slowly until the xlog
 filesystem fills

Given how PITR, and Tablespaces, both substantially increase the risk of 
running out of space on the xlog partition(s), it would be very nice to be 
able to arrange a WARNING whenever any PostgreSQL disk resource drops below a 
pre-defined percentage of availability.   This could be done through a 
simple 

Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-09 Thread Andreas Pflug
Josh Berkus wrote:

5) Full back-up

Related to the above, what I don't see in your paper or the proposed API is a 
way to coordinate full back-ups and WAL archiving.   Obviously, the PITR 
Archive is only useful in reference to an existing full backup, so it is 
important to be able to associate a set of PITR archives with a particular 
full backup, or with some kind of backup checkpoint.   I'm sure that you 
have a solution for this, I just didn't see it explained in your proposal, or 
didn't understand it.

As far as I understand, "full backup" in the sense of pgsql means all 
data files including clog, where all transactions before the checkpoint 
are completely written to the data files. AFAICS there is a small detail 
missing so far.

When I'm doing a file level hot backup, I can't be sure about the backup 
order. To be sure the cluster is in a consistent state regarding 
checkpoints, pg_clog must be the first directory backed up. If this 
isn't ensured, the situation could arise that the backed up clog 
version contains a checkpoint which marks as completed a transaction 
whose data was written to a file only after that file had already been 
backed up.

This could be ensured by doing the backup in two steps: first backing up 
pg_clog, and then the rest, restore being performed in the opposite 
order. But this seems not too fail-safe - what if the admin doesn't 
know this or forgot about it? So IMHO a mechanism ensuring this would be 
better. I could think of a solution where a second pg_clog directory is 
used, with a pgsql API for it which is called right before performing 
the file backup. Josh calls this second pg_clog the "backup checkpoint".

At the moment, a restart is done from clog + WAL, where clog might be 
too new in a hot backup situation as mentioned above. There should be a 
second pgsql restart mode, where checkpoints are not taken from the 
current clog, but from the backup checkpoint clog which was created 
explicitly at backup time. This is somewhat similar to MSSQL's backup 
behaviour, where the transaction log (=WAL) grows until a full backup 
has been performed successfully.

Regards,
Andreas




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-09 Thread Simon Riggs
Andreas Pflug wrote:
 Josh Berkus wrote:
  Related to the above, what I don't see in your paper or the proposed
  API is a way to coordinate full back-ups and WAL archiving. Obviously,
  the PITR Archive is only useful in reference to an existing full
  backup, so it is important to be able to associate a set of PITR
  archives with a particular full backup, or with some kind of "backup
  checkpoint". I'm sure that you have a solution for this, I just didn't
  see it explained in your proposal, or didn't understand it.
 
 AFAICS there is a small detail missing so far.
 
 When I'm doing a file level hot backup, I can't be sure about the
 backup order. To be sure the cluster is in a consistent state regarding
 checkpoints, pg_clog must be the first directory backed up. If this
 isn't ensured, the situation could arise that the backed up clog
 version contains a checkpoint which marks a transaction completed that
 has been written to a file which was backed up earlier than the data
 write took place.
 
 This could be ensured by doing the backup in two steps: first backing
 up pg_clog, and then the rest, restore being performed in the opposite
 order.

Good spot. I'll add this to the design.

Will think more on the "backup checkpoint". Don't let me off the hook...

Best Regards, Simon 




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-09 Thread Josh Berkus
Simon,

 SIGHUP - seems to allow different parameter settings in each backend

Nope.   SIGHUP means that you need to send a HUP to the postmaster, such as 
you would with changes to pg_hba.conf.

 SUSET - maybe what you're looking for???

Yes.   This means that it can be changed, at runtime, but by the Superuser 
only.  This is used for several settings which are possible to change at 
runtime but take effect system-wide.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-09 Thread Simon Riggs
Richard Huxton wrote:
 On Monday 08 March 2004 23:28, Simon Riggs wrote:
  PITR Functional Design v2 for 7.5
  Review of current Crash Recovery
 
 Is there any value in putting this section on techdocs or similar? We
 do get a small but steady trickle of people asking for details on
 internals, and I think this covers things in a different way to the WAL
 section of the manuals.

Certainly, though I would like to do all of that after it actually
works!

  PITR Proposed Solution
 
  To allow this to occur, the full backup *must* occur while the
  database is open or "hot". This backup must include all data and clogs
  (and any tablespaces or logical links utilised). A continuous sequence
  of xlogs must also be available, stretching from the last checkpoint
  prior to the start of the backup through to whatever time is specified
  for the recovery point or until the end of the xlogs.
 
 So this is a standard cp/tar etc while the cluster is actually in use?

Yes. I will add a line in to clarify that.

  XLogArchiveXlogs() returns a single XLOG filename, or NULL
 
  If an xlog file is waiting to be archived, then the archiver will
  discover the name of the xlog by using this API call. If more than one
  file is available to be archived, then it will be ignored. If the
  archiver is multi-threaded, it need not wait until it has executed
  XLogArchiveComplete before it executes XLogArchiveXlogs again.
 
 So this means:
 1. The archiver is responsible for noticing that it is already
 archiving the filename returned (if it repeats the call too quickly).
 2. The archiver can only ever archive one XLOG file at a time.

1. No: I notice I missed a line saying
 XLogArchiveXlogs() in section 1.1.3 (corrected).
Clarification: The archiver will not need to keep track of whether it is
already archiving the same file (though it sounds reasonable programming
to do so anyway). The API call will never return the same log file twice
to this call (by definition). That is implemented in my proposal by
renaming the rlog entry to .busy, so it won't show up on subsequent
calls.
2.
a) There is no restriction on threading in the archiver; it can if it
wishes archive many files simultaneously. Since PostgreSQL produces them
one at a time, this implies a build up of xlogs, which is specifically
not encouraged. An archiver would be encouraged to multi-thread to avoid
peaks of demand where the archive process was occurring slower than
xlogs were being written.
b) The reference implementation won't be multi-threaded in its first
incarnation (if I write it!!! Be my guest, you have the API
definition).
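
To make the single-threaded case concrete, the consumer side of the API
might look roughly like this. A sketch only: XLogArchiveXlogs and
XLogArchiveComplete are from the proposal but their exact signatures are
my guesses, and copy_to_archive, the header name, the paths and the poll
interval are all invented.

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include "libpgarch.h"      /* proposed API, per section 1.1 */

  /* placeholder: copy pg_xlog/<name> into the archive, 0 on success */
  static int
  copy_to_archive(const char *name)
  {
      char cmd[1024];

      snprintf(cmd, sizeof(cmd),
               "cp pg_xlog/%s /mnt/archive/%s", name, name);
      return system(cmd);
  }

  int
  main(void)
  {
      for (;;)
      {
          char   *xlog = XLogArchiveXlogs();  /* one filename, or NULL */

          if (xlog == NULL)
          {
              sleep(30);                      /* nothing waiting; poll again */
              continue;
          }
          if (copy_to_archive(xlog) == 0)
              XLogArchiveComplete(xlog);      /* xlog is now safely archived */
          else
              fprintf(stderr, "archive of %s failed, will retry\n", xlog);
      }
      return 0;
  }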

You have also made me realise another failure condition, which I have
also added, to do with a failure of the copy process after this API
call.

  The initial proposal is a simple scheme that uses file existence &
  file extension to pass information between PostgreSQL and the
  archiver. This would take place in a peer directory of pg_xlog and
  pg_clog which has been named the pg_rlog directory. ("r" as in the
  strong first syllable "ar" in English pronunciation of "archive")
 
 Any reason why not pg_pitr or pg_pitr_log?

None. I like pg_pitr...
Let's wait for other feedback to come in...

  1.2 pg_arch: simple xlog archiving tool
 
 Does the specification of these parameters (and any others) need to be
 part of the API? I'm thinking about the ability to drop in different
 archivers with each using the same pre-defined settings.

Those parameters ARE NOT part of the API. The parameters mentioned are
command line switches on the simple external archiving program pg_arch.

pg_arch is intended to be a simple archiver-side testing tool. It makes
sense to make it available also. Basically, you can do whatever you like
on the archiver side of the API...contrib beckons
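
For what it's worth, an invocation might eventually look like this
(paths invented for illustration):

  pg_arch -D /usr/local/pgsql/data -A /mnt/archive/xlogs

with the archive directory gradually filling with copies of completed
xlog segment files.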

  2. Recovery to Point-in-Time (RPIT)
 
 Just to clarify:
 1. I can identify which XLOG files I need based on their timestamp?
 2. Can I force a checkpoint using standard PG client APIs? So I can do
 "close weekly payroll, force checkpoint".
 3. We're restoring an entire cluster here, not just one database? How
 difficult would it be to strip out information for a single db - I'm
 thinking about the case where you may have limited backup storage and
 want to save an orders db but not a catalogue db. Or perhaps a hosting
 company with "platinum" customers getting PITR.

1. Yes, the external timestamp gives that I think. Checking detail...
2. CHECKPOINT is a PostgreSQL SQL command which can be executed from any
client (see the sketch below). Yes, your scenario fits.
3. I tried to avoid that issue, but it rears its head. You seem to be
specifying what you want though, so I'll have a think.

More response required on 1 & 3...later!
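
Since it is plain SQL, forcing a checkpoint from a client program is a
one-liner. A minimal libpq sketch (the connection string and database
name are invented; CHECKPOINT requires superuser rights):

  #include <stdio.h>
  #include <libpq-fe.h>

  int
  main(void)
  {
      PGconn   *conn = PQconnectdb("dbname=payroll");   /* hypothetical DSN */
      PGresult *res;

      if (PQstatus(conn) != CONNECTION_OK)
      {
          fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
          return 1;
      }
      res = PQexec(conn, "CHECKPOINT");                 /* force one now */
      if (PQresultStatus(res) != PGRES_COMMAND_OK)
          fprintf(stderr, "CHECKPOINT failed: %s", PQerrorMessage(conn));
      PQclear(res);
      PQfinish(conn);
      return 0;
  }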

Best Regards, Simon Riggs




Re: [HACKERS] PITR Functional Design v2 for 7.5

2004-03-09 Thread Simon Riggs
Josh Berkus [mailto:[EMAIL PROTECTED]
 First off, let me compliment you on such a thorough proposal.  I'm
 feeling very enthusiastic about 7.5 PITR 

Thank you, though please realise that I am in many ways summarising a
wide range of suggestions and earlier work into a coherent whole.

Me too! I'm trying to pace myself through release and into subsequent
maintenance of the new features.

Best Regards, Simon





[HACKERS] PITR Functional Design v2 for 7.5

2004-03-08 Thread Simon Riggs
PITR Functional Design v2 for 7.5
[EMAIL PROTECTED]

Currently, PostgreSQL provides Crash Recovery but not yet full Point In
Time Recovery (PITR). The following document provides a design which
enhances the existing robustness features to include full PITR. Since
one of the primary objectives for PITR is robustness, this design is
provided in advance of patches to allow those features and behaviours to
be subjected to the rigours of [HACKERS] before final coding is
attempted. We're really not that far now from making this work, hence
the attention on up-front planning.

Thanks for your comments, Best Regards, Simon Riggs, 2nd Quadrant

 
Review of current Crash Recovery

Crash recovery is catered for by the use of WAL logging, or xlogs. Xlogs
are written to disk immediately before a transaction is acknowledged as
committed. Xlogs contain REDO information sufficient to rollforward any
changes from a known starting position. The known starting position is
also recorded by keeping track of which transactions have completed in a
file structure known as the clog. Clogs are also written to disk as
transactions commit.

The changed data pages are not written immediately back to disk. They do
not need to be, because the entries in the xlog and clog taken together
are sufficient to recover from a crash. Every so often a full checkpoint
process is created that will perform a full synchronisation of changed
(or dirty) data pages back to disk. When a checkpoint is complete, it
will write the last transaction id to the xlog as a marker, and it will
trim the clog files to the last transaction id. The frequency of
checkpoints is controllable. Changed data pages are written back to disk
by a background process called the bg_writer (or lazy writer), reducing
the effect of checkpoints on busy workloads.
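
In outline, that checkpoint sequence amounts to something like the
following pseudo-C. The function names are invented for illustration;
they are not the actual internal routines.

  void
  checkpoint(void)
  {
      FlushDirtyDataPages();          /* sync changed data pages to disk */
      WriteCheckpointMarkerToXlog();  /* record last transaction id in xlog */
      TrimClogToLastXid();            /* older clog no longer needed for
                                       * crash recovery */
      RecycleOrRemoveOldXlogs();      /* per the recycling policy below */
  }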

In crash recovery, the database files are presumed to be intact, but not
necessarily up to date. When the postmaster comes up again, it checks
clog to discover what the last checkpointed transaction id was. Using
this, it then scans through the available xlog files to the marker
written by the checkpoint at that time. The REDO entries that follow are
then reapplied to the data pages as far as possible, until the system is
brought to the best available point.
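
Again in outline, and again with invented names rather than the real
internal routines, the replay just described is:

  void
  crash_recovery(void)
  {
      XlogRecPtr  pos = LastCheckpointMarker();   /* known starting position */
      XlogRecord *rec;

      while ((rec = ReadNextXlogRecord(&pos)) != NULL)
          ApplyRedo(rec);     /* reapply the change to the data pages */
      /* stops when no further valid xlog records are available */
  }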

If the appropriate xlogs are not available, no recovery is possible.

Following initdb, there will be at least one xlog. As new data is
written to xlog, new files will be allocated as required. As a result of
checkpointing, there will be a time when xlogs are no longer required
for crash recovery. At each checkpoint, if there is an xlog that is no
longer required, the last one will be recycled or removed. Xlogs will be
recycled back to the front of the queue, so that we do not need to
delete and create files constantly. A certain maximum number of files
will be kept as preallocated logs; this limit is controllable. When the
limit is reached, xlogs will be removed rather than being recycled. As a
result, the number of xlogs may vary considerably over time, but mostly
they will cycle around a roughly steady-state number of xlogs, and
therefore predictably constant space utilisation.

If an xlog cannot be written because the available space is full, then
the transaction that depended upon the xlog write will not be able to
commit, nor will any subsequent transactions until the space situation
alleviates. Currently, this imposes a limit on the size of any
transaction based upon the available diskspace in the pg_xlog directory.

Xlogs are relatively high volume, clogs are relatively low. An out of
space condition on clog is typically unlikely.

Failure analysis:
- If a transaction fails, no changes will be committed to xlog and the
clog entry will show the transaction aborted.
- If a transaction succeeds, its changes are committed to xlog and the
clog entry shows the transaction succeeded.
- If the xlog directory fills or is otherwise unwritable, a PANIC is
raised
- If the clog directory fills or is otherwise unwritable, a PANIC is
raised

Point in Time Recovery (PITR)

PITR features are designed to extend the existing Crash Recovery
features so that a recovery can take place in situations where a crash
recovery would not have been possible. These situations are:
- database objects have been dropped
- xlogs do not go back far enough in time to allow rollforward recovery
- the database files are not intact and need to be completely replaced
before rollforward

To do this, a full physical backup of the system is required for
recovery. When tablespaces are available, it should be possible to
restore and recover individual tablespaces. In addition, xlogs will need
to be moved out of the normal xlog filesystem to an archive destination.

PITR Proposed Solution

The proposed solution is to allow the existing crash recovery detection
and rollforward logic to be utilised directly to perform PITR, which
should allow the minimum number of changes and additional code.

To allow this to occur,