Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
On Fri, Jul 16, 2004 at 01:32:01AM -0400, lists wrote:

> LOG:  ReadRecord: unexpected pageaddr 0/33A4000 in log file 0, segment 5, offset 3817472
> LOG:  redo is not required
> PANIC:  XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
> LOG:  startup process (pid 16068) was terminated by signal 6
> LOG:  aborting startup due to startup process failure
>
> this is 7.3.3 (PGDG rpm's) on red hat 7.2

Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"In the beginning there was UNIX, and UNIX spoke and said: Hello world\n.
It did not say Hello New Jersey\n, nor Hello USA\n."

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log
thank you for replying.

Alvaro Herrera wrote:
> > PANIC:  XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
> > this is 7.3.3 (PGDG rpm's) on red hat 7.2
>
> Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.

do you know if there are rpm's for RH7.2 available?

to get the data back, is it a matter of just upgrading and starting the db
(ie will it perform recovery)?

thanks
[HACKERS] schema dumping
Hi,

Can I change pg_dump to never use the AUTHORIZATION clause and use OWNER TO
instead?  It would make things a lot simpler, especially in the case where
the public schema being dumped has had its ownership changed.

This is what the comment says:

 * Note that ownership is shown in the AUTHORIZATION clause,
 * while the archive entry is listed with empty owner (causing
 * it to be emitted with SET SESSION AUTHORIZATION DEFAULT).
 * This seems the best way of dealing with schemas owned by
 * users without CREATE SCHEMA privilege.  Further hacking has
 * to be applied for --no-owner mode, though!

We now have a better way, but it's non-standard.  Shall I change it?

Chris
Re: [HACKERS] Converting epoch to timestamp
> > SELECT EXTRACT(EPOCH FROM TIMESTAMP '2003-01-01 11:23:44');
>
> Yeah, but I think Michael's question had to do with going the other way
> (numeric to timestamp).

Sorry:

SELECT EXTRACT(TIMESTAMP FROM EPOCH '123412341234');

Chris
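[Editor's note: the EXTRACT(TIMESTAMP FROM EPOCH ...) syntax above is a proposal, not something PostgreSQL accepts. The conversion itself can be done today with interval arithmetic, e.g. TIMESTAMP WITH TIME ZONE 'epoch' + 1041420224 * INTERVAL '1 second'. As a sanity check of that arithmetic (a sketch in Python, not PostgreSQL code), the same computation is:]

```python
from datetime import datetime, timedelta, timezone

# The SQL idiom TIMESTAMPTZ 'epoch' + n * INTERVAL '1 second' is plain
# offset arithmetic from 1970-01-01 00:00:00 UTC; the same thing in Python:
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_to_timestamp(seconds: float) -> datetime:
    """Convert Unix-epoch seconds to an absolute timestamp."""
    return EPOCH + timedelta(seconds=seconds)

def timestamp_to_epoch(ts: datetime) -> float:
    """The inverse: what EXTRACT(EPOCH FROM ...) computes."""
    return (ts - EPOCH).total_seconds()

# Round trip using the timestamp quoted earlier in the thread:
ts = epoch_to_timestamp(1041420224)   # 2003-01-01 11:23:44 UTC
```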
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log
> > PANIC:  XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
> > this is 7.3.3 (PGDG rpm's) on red hat 7.2
>
> Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.

That won't help him start up his DB though, will it?  It will just prevent
the problem in the future?

Is it the same fix as for me?  Appending zeros to a file?

Chris
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
On Fri, Jul 16, 2004 at 02:09:33PM +0800, Christopher Kings-Lynne wrote:

> > Get 7.3.6 and try again.  IIRC this is a known bug that was repaired in 7.3.4.
>
> That won't help him start up his DB though, will it?

AFAIU it should ...

> It will just prevent the problem in the future?  Is it the same fix as for
> me?  Appending zeros to a file?

As far as I remember, your problem was much worse ...  This is a strange bug
where the Xlog code would refuse to replay a record that starts just at the
start of the segment.  A later version deals with this situation correctly.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"This mail is guaranteed to be delivered 100% free of sarcasm."
Re: [HACKERS] XLogWrite: write request 0/53A4000 is past end of log 0/53A4000
On Fri, Jul 16, 2004 at 01:55:13AM -0400, lists wrote:

> thank you for replying.
>
> do you know if there are rpm's for RH7.2 available?

No idea, sorry.

> to get the data back, is it a matter of just upgrading and starting the db
> (ie will it perform recovery)?

That should do.  Depending on how much you value your data, however, it
might be advisable to make a backup of the data directory just in case ...

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"To think that the spectre we see is illusory does not strip it of dread;
it only adds the new terror of madness" (Perelandra, C.S. Lewis)
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
> -----Original Message-----
> From: Tom Lane
> Sent: Thursday, July 15, 2004 10:03 PM
> To: Dann Corbit
> Cc: Oliver Jowett; Magnus Hagander; Hackers; [EMAIL PROTECTED]
> Subject: Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
>
> Dann Corbit [EMAIL PROTECTED] writes:
> > All translations between UTC time and local time are based on the
> > following formula: UTC = local time + bias
>
> Surely not.  Or has Windows not heard of daylight-savings time?  Or
> perhaps they have, but are not aware that the DST laws have changed often
> in the past?

No problems.  They even handle time zones with arbitrary minute boundaries
(not on the hour) with aplomb.

> Over-simplistic answers are not what we need here.  The data structure you
> quote cannot even tell what DST transition dates Windows thinks are in
> effect this year, let alone what it thinks the dates were in past years.

Yes, there are other calls for that, obviously.  I sent Mr. Momjian a
complete implementation of time zone stuff that uses Windows calls.  It's
also accurate to a fraction of a nanosecond millions of years into the past
and the future.

The call that I showed returns the NAME OF THE TIME ZONE and also what it is
called when you are in Daylight savings time.  I thought the issue under
question was to find out what the time zone was.

This program:

#include <windows.h>
#include <iostream>
using namespace std;

int main(void)
{
    TIME_ZONE_INFORMATION tz;
    DWORD i = GetTimeZoneInformation(&tz);
    for (i = 0; i < 32 && tz.StandardName[i]; i++)
        cout << (TCHAR) tz.StandardName[i];
    cout << endl;
    for (i = 0; i < 32 && tz.DaylightName[i]; i++)
        cout << (TCHAR) tz.DaylightName[i];
    cout << endl;
    return 0;
}

Prints this:

Pacific Standard Time
Pacific Daylight Time

There is also a global variable called _tzname that contains the name of
the time zone.
See:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt__daylight.2c_._timezone.2c_.and__tzname.asp

On my machine:

cout << _tzname[0];  // -> PST
cout << _tzname[1];  // -> PDT

As far as doing the calculations for time values, do whatever you like (it's
not that difficult either, and the code I sent does address all that stuff,
though it is in C++).  Don't forget that things are inverted in the southern
hemisphere.

Have a good one.
Re: [HACKERS] Converting epoch to timestamp
On Jul 16, 2004, at 3:08 PM, Christopher Kings-Lynne wrote:

> SELECT EXTRACT(TIMESTAMP FROM EPOCH '123412341234');

That's a really interesting idea!  Makes for a much more consistent syntax
for our other functions.  ISTM this might require a native EPOCH datatype.
I wouldn't want to encourage people to use EPOCH as a datatype though.

Michael Glaesemann
grzm myrealbox com
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 04:49, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > On Fri, 2004-07-16 at 00:01, Alvaro Herrera wrote:
> > > My manpage for signal(2) says that you shouldn't assign SIG_IGN to
> > > SIGCHLD, according to POSIX.
> >
> > So - I should be setting this to SIG_DFL and that's good for everyone?
>
> Yeah, we learned the same lesson in the backend not too many releases
> back.  SIG_IGN'ing SIGCHLD is bad voodoo; it'll work on some platforms but
> not others.

Many thanks all,

Best Regards, Simon Riggs
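[Editor's note: the portability hazard can be demonstrated directly. Under POSIX, setting SIGCHLD to SIG_IGN asks the kernel to auto-reap children, after which wait() fails with ECHILD on platforms that implement that semantic; with SIG_DFL the normal wait semantics apply everywhere. A hedged, POSIX-only sketch; the SIG_IGN branch is exactly the platform-dependent behaviour the thread warns about:]

```python
import os
import signal

def reap_with_disposition(disposition) -> bool:
    """Fork a child that exits immediately; return True if waitpid()
    could collect it under the given SIGCHLD disposition."""
    signal.signal(signal.SIGCHLD, disposition)
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: exit at once
    try:
        os.waitpid(pid, 0)   # parent: try to reap the child
        return True
    except ChildProcessError:
        return False         # auto-reaped; nothing left to wait for
    finally:
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)

# SIG_DFL: the child stays a zombie until we wait for it -- portable.
# SIG_IGN: on Linux the child is reaped automatically and waitpid()
# fails with ECHILD; other platforms differ, hence "bad voodoo".
```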
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote:
> Andreas Pflug wrote:
> > You do something that splits the value into directory name and file name
> > and removes every letter after %:
> >     /var/log    postgresql.log.%-%-%_%%%
> > Another idea is to allow filename wildcards in the listing so it becomes:
> >     SELECT * FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir
>
> While that is nice, it doesn't match the functionality of opendir so we
> are perhaps better with one that doesn't handle wildcards and we just do
> the wildcard processing in the WHERE clause.
>
> > Uh, this looks ugly.  How about
> >     pg_logfile_list() RETURNS setof timestamp  -- to list available logfiles
> >     pg_logfile_filename(timestamp)             -- to return the filename for that logfile
>
> I don't see the need to return timestamps.  If you select any empty
> directory, you can just return the file names.  The only reason you might
> need a pattern is to distinguish pg log files from other log files.  If
> you want, create a server-side function that returns the file name with
> the strftime() patterns converted to '*'.
>
> > and a generic pg_dir(wildcard_text)
>
> Maybe pg_dir_ls().  OK, it would be nice if we could do a sed operation
> like
>     sed 's/%./*/g'
> but I don't know a way to do that without defining a function or pulling
> in a procedural language, but if we could do it we could do:
>     pg_dir(echo log_destination | sed 's/%./*/g')

Ever used sed on win32?!?

And how should the timestamp be represented in client tools?  Date/time
interpretation is always a source of problems, so *please* let the server
do that.

Rethinking all this, I'd like pg_logfile_list to return a complex type:

    CREATE TYPE pg_logfile_list AS (
        filedate   timestamp,
        filename   text,
        backendpid int,
        inuse      bool);

and

    pg_logfile_list() RETURNS SETOF pg_logfile_list

which would enable

    SELECT filename, pg_file_unlink(filename)
      FROM pg_logfile_list()
     WHERE filedate < current_timestamp - '3 months'::interval
       AND NOT inuse;

The in-use check is easy for the backend, if the syslog process publishes
the current logfile's timestamp in shared mem.
We can use a GUC variable for the log_directory (not log_destination);
anyway, I'd like the filenames to be selected by the server.

Regards, Andreas
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
> It occurs to me that with a check this thorough, we might be able to
> finesse the problem on Windows with the system returning very nonstandard
> timezone abbreviations.
>
> > It does *not* pick up my timezone.  Drat.
>
> I assume from your domain name that Europe/Stockholm would actually be
> the best choice for you?  What Windows timezone setting are you using for
> this test?

Yup, that would be the best.  Either that or CET/CEST.  I'm using the
timezone that in the GUI is called "(GMT +01:00) Amsterdam, Berlin, Bern,
Rome, Stockholm, Vienna".  strftime with %z calls it "W. Europe Daylight
Time".

> > The following possibilities are rejected per:
> > DEBUG:  Reject TZ "Europe/Stockholm": at 16844400 1970-07-15 00:00:00
> > std versus 1970-07-15 01:00:00 dst
>
> If you look in src/timezone/data/europe you will see that the zic
> database thinks Sweden was on strict GMT+1 (no daylight savings) between
> 1916 and 1980, and since 1980 they were on EU daylight-savings rules.
> Does that square with your ideas of reality?  (If it does not then we
> should just punt the problem upstream to the zic people, but I will
> assume here that their research is good.)

Actually, I was sure that was wrong, but started looking it up, and it
seems it's about right.  See
http://www.sp.se/metrology/timefreq/eng/standard_time.htm for an official
take on the summertime rules.  I *think* those are the EU rules.

> What I suspect given the above is that Windows has no clue about
> historical reality and is retroactively applying the current DST rules
> back to 1970, thus deciding that 1970-07-15 was on DST when it was really
> not.

That could definitely be.  Since before 1970 they say there is no DST at
all (remember the old issues), it wouldn't surprise me a bit if they took
this kind of shortcut.
> I thought about restricting the scope of the TZ testing to start in 1990
> or so to avoid this, but that seems certain to fall foul of the other
> problem, which is distinguishing closely-related timezones (cf Chris K-L
> discovering that he lives in Antarctica, a few days back...)  Maybe the
> whole match-on-behavior approach is wrong and we need to do something
> else, but I'm really unsure what.  Ideas?

Well, Windows-specific we could do a translation table, since there is a
finite (and not *huge*) number of timezones.  But we'd have to figure out
for each one which to match.  That won't work for unix, I think; the lookup
table would be just huge considering all possible combinations...

//Magnus
[HACKERS] analyze.c
Hi guys,

During my studies I have to do some practical work: I must extend
PostgreSQL with some sampling algorithms.  I implemented some trigger
functions for this, and they work fine.  But now I have to make some
changes to the parser, particularly analyze.c.  If a user sends a CREATE
TABLE command, PostgreSQL should create several sample tables, which will
be filled with the sample data from the original table.  For a start it
would be enough if two tables were created: the original one, with the
table name, columns etc. selected by the user, and a second one, a copy
with the same columns etc. but another name, e.g. "sample".

My problem is that the system catalogs are created at initdb, and then the
following failure occurs:

loading pg_description...
/usr/local/pqsql/initdb: line 837: 22348 Done (141)
(cat <<EOF
CREATE TEMP TABLE tmp_pg_description (objoid oid, classname name,
objsuboid int4, description text) WITHOUT OIDS;
COPY tmp_pg_description FROM STDIN;
EOF
cat $POSTGRES_DESC;
cat <<EOF
\.
INSERT INTO pg_description SELECT t.objoid, c.oid, t.objsuboid,
t.description FROM tmp_pg_description t, pg_class c
WHERE c.relname = t.classname;
EOF)
22349 Segmentation fault $PGPATH/postgres $PGSQL_OPT template1 >/dev/null

How can I tell postgres that it should only create the sample table when
the original table to be created has the name "origin" or something like
that?  Or that it should only act if some user wants to create the table,
not postgres itself?

Thank you very much!

Greetings, anja
Re: [HACKERS] Point in Time Recovery
> > I'm aiming for the minimum feature set - which means we do need to take
> > care over whether that set is insufficient and also to pull any part
> > that doesn't stand up to close scrutiny over the next few days.  As you
> > can see, we are still chewing on NT.
>
> What PITR features are missing?  I assume because we can't stop file
> system writes during backup that we will need a backup parameter file
> like I described.  Is there anything else that PITR needs?

No, we don't need to stop writes!  Not even to split a mirror; other db's
need that to be able to restore, but we don't.  We only need to tell people
to back up pg_control first.  The rest was only intended to enforce

1. that pg_control is the first file backed up
2. that the dba uses a large enough PIT (or xid) for restore

I think the idea of an extra file with the WAL start position was overly
complicated, since all you need is pg_control (+ WAL end position to
enforce 2.).  If we don't want to tell people to back up pg_control first,
imho the next best plan would be to add a WAL start input (e.g. xlog name)
parameter to recovery.conf, that fixes pg_control.

Andreas
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
> > Over-simplistic answers are not what we need here.  The data structure
> > you quote cannot even tell what DST transition dates Windows thinks are
> > in effect this year, let alone what it thinks the dates were in past
> > years.
>
> Yes, there are other calls for that, obviously.  I sent Mr. Momjian a
> complete implementation of time zone stuff that uses Windows calls.  It's
> also accurate to a fraction of a nanosecond millions of years into the
> past and the future.

You wouldn't happen to know if Windows internally has knowledge of the many
different DST rules in force in different years?  See the other mail about
how it apparently deals with Swedish DST; is that common or a single case?
If the OS doesn't have that knowledge, we can give up trying to get it to
tell us about it :-)

> The call that I showed returns the NAME OF THE TIME ZONE and also what it
> is called when you are in Daylight savings time.  I thought the issue
> under question was to find out what the time zone was.

Nope, we already had that.  The issue is that the names are not the same as
the ones used in zic/unix, so there is nothing to match on.

//Magnus
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
Magnus Hagander [EMAIL PROTECTED] writes:
> > I thought the issue under question was to find out what the time zone
> > was.
>
> Nope, we already had that.  The issue is that the names are not the same
> as the ones used in zic/unix, so there is nothing to match on.

Right.  The problem we are actually faced with is to identify which of the
zic timezones is the best match for the system's timezone setting.  One of
the issues is that it's not clear what "best" means...  At the moment I
like Oliver Jowett's idea of defining "best" as the one that matches
furthest back.

regards, tom lane
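[Editor's note: the "matches furthest back" rule can be sketched outside the backend. This is not the backend code, just an illustration using Python's zoneinfo (assumed available): score each candidate zone by how far back in time its UTC offset keeps agreeing with the reference zone's, then pick the deepest match.]

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo   # assumes Python 3.9+ with tz data installed

def match_depth(candidate, reference, start,
                step=timedelta(days=15), max_steps=1200):
    """Step backwards from `start`, counting how long the candidate's
    UTC offset agrees with the reference's.  Deeper == better match."""
    t = start
    for i in range(max_steps):
        if t.astimezone(candidate).utcoffset() != t.astimezone(reference).utcoffset():
            return i
        t -= step
    return max_steps

now = datetime(2004, 7, 16, tzinfo=timezone.utc)
stockholm = ZoneInfo("Europe/Stockholm")

# A fixed GMT+1 zone agrees with Stockholm in winter but not in July 2004
# (CEST, GMT+2), so it scores far worse than a zone that shares
# Stockholm's full DST history.
fixed_cet = timezone(timedelta(hours=1))
```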
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug [EMAIL PROTECTED] writes:
> The in-use check is easy for the backend, if the syslog process publishes
> the current logfile's timestamp in sharedmem.

You really haven't absorbed any of the objections I've raised, have you?
I don't want the log process connected to shared mem at *all*, and see no
particularly good reason why it should be.

> We can use a GUC variable for the log_directory (not log_destination);
> anyway, I'd like the filenames to be selected by the server.

The directory should definitely be a GUC variable.  The individual
filenames should probably be of the form <prefix><timestamp>, where the
server dictates the format of the timestamp (and we choose it so that the
names sort correctly).  We could let the prefix be user-selectable or make
it hard-wired; I don't have a strong feeling about that either way.

regards, tom lane
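[Editor's note: "chosen so that the names sort correctly" just means a fixed-width, most-significant-field-first timestamp, so that lexicographic order equals chronological order. A quick sketch; the prefix and exact format here are invented for illustration, not what the server uses:]

```python
from datetime import datetime, timedelta

def logfile_name(when: datetime, prefix: str = "postgresql-") -> str:
    # Year-first, zero-padded fields: sorting the names as plain strings
    # lists the logs oldest-first, with no date parsing needed.
    return prefix + when.strftime("%Y-%m-%d_%H%M%S") + ".log"

start = datetime(2004, 7, 16, 9, 30, 0)
names = [logfile_name(start + timedelta(hours=h)) for h in (0, 5, 26)]
```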
Re: [HACKERS] Point in Time Recovery
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
> We only need to tell people to back up pg_control first.  The rest was
> only intended to enforce
> 1. that pg_control is the first file backed up
> 2. that the dba uses a large enough PIT (or xid) for restore

Right, but I think Bruce's point is that it is far too easy to get those
things wrong; especially point 2, for which a straight tar dump will simply
not contain the information you need to determine what is a safe stopping
point.  I agree with Bruce that we should have some mechanism that doesn't
rely on the DBA to get this right.  Exactly what the mechanism should be is
certainly open for discussion...

regards, tom lane
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote:
> > OK, it would be nice if we could do a sed operation like
> >     sed 's/%./*/g'
> > but I don't know a way to do that without defining a function or
> > pulling in a procedural language, but if we could do it we could do:
> >     pg_dir(echo log_destination | sed 's/%./*/g')
>
> Ever used sed on win32?!?  And how should the timestamp be represented in
> client tools?  Date/time interpretation is always a source of problems,
> so *please* let the server do that.

I am thinking of these all being server-side functions.

> Rethinking all this, I'd like pg_logfile_list to return a complex type
> (filedate, filename, backendpid, inuse), and pg_logfile_list() RETURNS
> SETOF pg_logfile_list, which would enable
>     SELECT filename, pg_file_unlink(filename)
>       FROM pg_logfile_list()
>      WHERE filedate < current_timestamp - '3 months'::interval
>        AND NOT inuse;
> The in-use check is easy for the backend, if the syslog process publishes
> the current logfile's timestamp in shared mem.

This seems quite involved.  Can we get the basic functionality I described
first?  Also I am not sure how all this information is going to be passed
from the logging process to the backend requesting the information, and it
seems overly complicated.

-- 
Bruce Momjian | http://candle.pha.pa.us | [EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, Christ can be your backup.
+ 13 Roberts Road, Newtown Square, Pennsylvania 19073
Re: [HACKERS] Point in Time Recovery
Tom Lane wrote:
> Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
> > We only need to tell people to back up pg_control first.
>
> Right, but I think Bruce's point is that it is far too easy to get those
> things wrong; especially point 2, for which a straight tar dump will
> simply not contain the information you need to determine what is a safe
> stopping point.  I agree with Bruce that we should have some mechanism
> that doesn't rely on the DBA to get this right.  Exactly what the
> mechanism should be is certainly open for discussion...

Right.  I am wondering what process people would use to back up pg_control
first?  If they do:

    tar -cf $TAPE ./global/pg_control .

they will get two copies of pg_control: the early one, and one as part of
the directory scan.  On restore, they would restore the early one, but the
directory scan would overwrite it.  I suppose they could do:

    cp global/pg_control global/pg_control.backup
    tar -cf $TAPE .

then on restore, once all the files are restored, move pg_control.backup to
its original name.

That gives us the checkpoint wal/offset, but how do we get the start/stop
information?  Is that not required?  Maybe we should just have start/stop
server-side functions that create a file in the archive directory
describing the start/stop counters and time, and the admin would then have
to find those values.

Why are the start/stop wal/offset values needed anyway?  I know why we need
the checkpoint value.  Do we need a checkpoint after the archiving starts
but before the backup begins?

Also, when you are in recovery mode, how do you get out of recovery mode?
Meaning, if you have a power failure, how do you prevent the system from
doing another recovery?  Do you remove the recovery.conf file?
-- 
Bruce Momjian | http://candle.pha.pa.us
Re: [pgsql-hackers-win32] [HACKERS] Weird new time zone
Magnus Hagander [EMAIL PROTECTED] writes:
> > At the moment I like Oliver Jowett's idea of defining "best" as the one
> > that matches furthest back.
>
> Sounds reasonable to me.  As long as a clear warning is put in the log
> whenever something is picked that is not a perfect match,

Define "perfect match".  I do not think we can really tell if we have an
exact match or not; the libc timezone API is just too limited to be sure.
And on many platforms we can be sure we will never have an exact match,
especially if we look at years before 1970.  If you want something in the
log, I'd be inclined to just always make a log entry when we infer a
timezone setting.

regards, tom lane
Re: [HACKERS] Point in Time Recovery
Bruce Momjian [EMAIL PROTECTED] writes:
> Also, when you are in recovery mode, how do you get out of recovery mode?
> Meaning, if you have a power failure, how do you prevent the system from
> doing another recovery?  Do you remove the recovery.conf file?

I do not care for the idea of a recovery.conf file at all, and have been
intending to look to see what we'd need to do to not have one.  I find it
hard to believe that there is anything one would put in it that is really
persistent state.  The above concern shows why it shouldn't be treated as a
persistent configuration file.

regards, tom lane
Re: [HACKERS] Point in Time Recovery
> then on restore, once all the files are restored, move pg_control.backup
> to its original name.  That gives us the checkpoint wal/offset, but how
> do we get the start/stop information?  Is that not required?

The checkpoint wal/offset is in pg_control; that is sufficient start
information.  The stop info is only necessary as a safeguard.

> Do we need a checkpoint after the archiving starts but before the backup
> begins?

No.

> Also, when you are in recovery mode, how do you get out of recovery mode?
> Meaning, if you have a power failure, how do you prevent the system from
> doing another recovery?  Do you remove the recovery.conf file?

pg_control could be updated during rollforward (only if that actually does
a checkpoint).  So if pg_control is also the recovery start info, then we
can continue from there if we have a power failure.  For the first release
it would imho also be ok to simply start over if you lose power.

I think the filename 'recovery.conf' is misleading, since it is not a
static configuration file, but a command file for one recovery.  How about
'recovery.command', then 'recovery.inprogress', and on recovery completion
it should be renamed to 'recovery.done'?

Andreas
Re: [HACKERS] Point in Time Recovery
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
> > Do we need a checkpoint after the archiving starts but before the
> > backup begins?
>
> No.

Actually yes.  You have to start at a checkpoint record when replaying the
log, so if no checkpoint occurred between starting to archive WAL and
starting the tar backup, you have a useless backup.

It would be reasonable to issue a CHECKPOINT just before starting the
backup as part of the standard operating procedure for taking PITR dumps.
We need not require this, but it would help to avoid this particular sort
of mistake; and of course it might save a little bit of replay effort if
the backup is ever used.

As far as the business about copying pg_control first goes: there is
another way to think about it, which is to copy pg_control to another place
that will be included in your backup.  For example, the standard backup
procedure could be:

1. [somewhat optional] Issue CHECKPOINT and wait till it finishes.
2. cp $PGDATA/global/pg_control $PGDATA/pg_control.dump
3. tar cf /dev/mt $PGDATA
4. do something to record the ending WAL position

If we standardized on this way, then the tar archive would automatically
contain the pre-backup checkpoint position in ./pg_control.dump, and there
is no need for any special assumptions about the order in which tar
processes things.

However, once you decide to do things like that, there is no reason why the
copied file has to be an exact image of pg_control.  I claim it would be
more useful if the copied file were plain text, so that you could just cat
it to find out the starting WAL position; that would let you determine,
without any special tools, what range of WAL archive files you are going to
need to bring back from your archives.

This is pretty much the same chain of reasoning that Bruce and I went
through yesterday to come up with the idea of putting a label file inside
the tar backups.
We concluded that it'd be worth putting both the backup starting time and
the checkpoint WAL position into the label file; the starting time isn't
needed for restore, but might be really helpful as documentation if you
needed to verify which dump file was which.

regards, tom lane
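[Editor's note: the plain-text label described above might look like this sketch. The field names and format here are invented for illustration; the real file's contents were still under discussion. The point is that it is written before running tar, and a plain cat suffices at restore time to see which WAL segments are needed.]

```python
from datetime import datetime, timezone

def write_backup_label(path, checkpoint_wal, start_time):
    """Write a human-readable label: `cat` is enough to read it back,
    no special tools needed (the whole point of using plain text)."""
    with open(path, "w") as f:
        f.write(f"START TIME: {start_time.isoformat()}\n")
        f.write(f"CHECKPOINT LOCATION: {checkpoint_wal}\n")

def read_checkpoint_location(path):
    """Recover the checkpoint WAL position from a label file."""
    with open(path) as f:
        for line in f:
            if line.startswith("CHECKPOINT LOCATION:"):
                return line.split(":", 1)[1].strip()
    return None
```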
Re: [HACKERS] Point in Time Recovery
> > > Do we need a checkpoint after the archiving starts but before the
> > > backup begins?
> >
> > No.
>
> Actually yes.

Sorry, I incorrectly did not connect 'archiving' with the backed-up
xlogs :-(  So yes, you need one checkpoint after archiving starts.  Imho
turning on xlog archiving should issue such a checkpoint just to be sure.

Andreas
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote:
> This seems quite involved.  Can we get the basic functionality I
> described first?

On the way.

> Also I am not sure how all this information is going to be passed from
> the logging process to the backend requesting the information, and it
> seems overly complicated.

There's *no* information passing from the logging process, with the single
exception of the latest logfile timestamp (if allowed).  I'd rather like to
have that information from the logger, to be safe in case the system time
was manipulated and the last logfile is not the current one.  The rest is
just a reworked version of pg_dir_ls, with internal knowledge of how the
timestamp is formatted.

Regards, Andreas
Re: [HACKERS] serverlog rotation/functions
Tom Lane wrote:
> Andreas Pflug [EMAIL PROTECTED] writes:
> > The in-use check is easy for the backend, if the syslog process
> > publishes the current logfile's timestamp in sharedmem.
>
> You really haven't absorbed any of the objections I've raised, have you?
> I don't want the log process connected to shared mem at *all*, and see no
> particularly good reason why it should be.

Why shouldn't the process announce the logfile timestamp and its pid
*write-only*, so other backends know about it?  At least the pid must be
distributed like this, just as bgwriter does.  I understand perfectly that
postmaster and logger are very critical processes, so they should depend on
as few resources as possible.  The logger works without shmem in general,
but how do we reach it if its pid is unknown?

> The directory should definitely be a GUC variable.  The individual
> filenames should probably be of the form <prefix><timestamp>, where the
> server dictates the format of the timestamp (and we choose it so that the
> names sort correctly).

Agreed.

Regards, Andreas
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after archiving starts. Imho turning on xlog archiving should issue such a checkpoint just to be sure. By agreement, archive_mode can only be turned on at postmaster startup, which means you always have a checkpoint - either because you shut it down cleanly, or you didn't and it recovers, then writes one. There is always something to start the rollforward. So, non-issue. Best regards, Simon Riggs ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 15:27, Bruce Momjian wrote: Also, when you are in recovery mode, how do you get out of recovery mode, meaning if you have a power failure, how do you prevent the system from doing another recovery? Do you remove the recovery.conf file? That was the whole point of the recovery.conf file: it prevents you from re-entering recovery accidentally, as would occur if the parameters were set in the normal .conf file. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 16:25, Zeugswetter Andreas SB SD wrote: I think the filename 'recovery.conf' is misleading, since it is not a static configuration file, but a command file for one recovery. How about 'recovery.command' then 'recovery.inprogress', and on recovery completion it should be renamed to 'recovery.done' You understand this and your assessment is correct. recovery.conf isn't an attempt to persist information. It is a means of delivering a set of parameters to the recovery process, as well as signalling overall that archive recovery is required (because the system default remains the same, which is to recover from the logs it has locally available to it). I originally offered a design which used a command, similar to DB2/Oracle...that was overruled as too complex. The (whatever you call it) file is just a very simple way of specifying what's required. There is more to be said here...clearly some explanations are required and I will provide those later... Best regards, Simon Riggs ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
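[Editor's note: as a sketch, the parameter file Simon describes might look like the following. The parameter names follow the design under discussion here but are illustrative only, not a confirmed final syntax.]

```
# recovery.conf -- its mere presence tells the startup process to perform
# an archive recovery rather than a plain crash recovery
restore_command = 'cp /mnt/server/archivedir/%f %p'
# optional stopping point; default is to roll forward to the end of the logs
recovery_target_time = '2004-07-16 12:00:00'
```

On completion the file would be renamed out of the way (e.g. to recovery.done, per Andreas's suggestion) so a later crash cannot re-trigger archive recovery.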
Re: [HACKERS] analyze.c
Anja Klein wrote: How can I tell postgres that it should only create the sample table when the original table to be created has the name origin or something like that? Or that it should only act if some user wants to create the table, not postgres itself? You would presumably do a strcmp() on the name of the table to be created before you jump into your new code that creates the sample table. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: This seems quite involved. Can we get the basic functionality I described first? On the way. Also I am not sure how all this information is going to be passed from the logging process to the backend requesting the information, and it seems overly complicated. There's *no* information passing from the logging process, with the single exception of the latest logfile timestamp (if allowed). I'd rather like to have that information from the logger, to be safe in case the system time was manipulated and the last logfile is not the current one. The rest is just a reworked version of pg_dir_ls, with internal knowledge of how the timestamp is formatted. Oh, so you are hardcoding the logfile name so you can interpret the timestamp from that? It seems cleaner to allow the admin to specify whatever log pattern they want. However, your idea of expiring the log files based on timestamp values is pretty powerful. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: This seems quite involved. Can we get the basic functionality I described first? Current workable patch. Some questions/limitations: - How's the official way to restrict pg_* functions to superuser only Very crudely :-) static int pg_signal_backend(int pid, int sig) { if (!superuser()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("only superuser can signal other backends")))); - I've restricted pg_file_read to 50k max. What's a reasonable limit for a generic function? Uh, that seems fine. You already check to see it is within the limit. I think a bigger question is should we limit it at all? Do we limit pg_largeobject? Is that similar? - pg_file_read and pg_file_write read/write text only; should have binary versions too. I guess we could but no one is asking for that yet so I would leave it for later. Very open question: - How should a backend know the logger's pid if it's not in shmem. Write a magic string to the pipe? I think it has to and in fact the pid is being written by the postmaster, not by the logger process, so that should be OK. The issue is that the logger shouldn't _attach_ to shared memory unless it has to. As far as recording the current log timestamp, I think that will be a problem. I would much rather see us forget about doing timestamp processing with these log files and keep it simple at this point and see what needs we have for 7.6. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Point in Time Recovery
Simon Riggs wrote: On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after archiving starts. Imho turning on xlog archiving should issue such a checkpoint just to be sure. By agreement, archive_mode can only be turned on at postmaster startup, which means you always have a checkpoint - either because you shut it down cleanly, or you didn't and it recovers, then writes one. There is always something to start the rollforward. So, non-issue. I don't think so. I can imagine many cases where you want to do a nightly tar backup without turning archiving on/off or restarting the postmaster. In those cases, a manual checkpoint would have to be issued before the backup begins. Imagine a system that is up for a month, and they don't have enough archive space to keep a month's worth of WAL files. They would probably do nightly or weekend tar backups, and then discard the WAL archives. What procedure would they use? I assume they would copy all their old WAL files to a save directory, issue a checkpoint, do a tar backup, then they can delete the saved WAL files. Is that correct? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Memory management with C aggregate
Tom Lane wrote: Ian Burrell [EMAIL PROTECTED] writes: We have user-defined aggregates written in C running inside the server. We are running into some memory management issues and wondering what is the best way to solve the problem. The state of the aggregates is a structure with a pointer to allocated memory. The structure and memory are allocated in the PortalMemoryContext and freed in the ffunc. We just discovered that the ffunc function can be called multiple times with the same state on certain queries. The double free causes a crash. AFAICS the ffunc should be called only once per aggregated group. Perhaps your code is confused about the handling of groups? If so, the double free is hardly your only worry --- you'll be computing wrong answers anyway till you fix that. The aggregate is in a subquery which might make a difference. The ffunc is only called multiple times when a nested loop is used in the query plan. With other query plans, the ffunc is only called once. The ffunc seems to be called once for every combination. I can't get the following query to use a nested loop, but our query is similar, though with a compound key and a custom aggregate. If x has N rows, y has M, with N groups, then the ffunc can be called N*M times, M times for each N group. SELECT x.key_no, s.agg FROM x INNER JOIN ( SELECT y.key_no, custom_agg(num) AS agg FROM y GROUP BY key_no ) s USING (key_no) ORDER BY key_no I'll try to come up with a test case that illustrates the problem. - Ian ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] case sensitivity in PQExecPrepared
I noticed a curious thing (not sure if by design or not). While using the PQExecPrepared interface, the statement name passed to the function (as a const char*) has to be in lower case to get it to work. I kind of understand why this is, but it is kind of weird that passing the exact same statement name to PQExecPrepared and PREPARE does not work if it contains any upper case characters. Just FYI. Merlin ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: - How's the official way to restrict pg_* functions to superuser only Very crudely :-) Got it. 'nother question: Is reading the logfile a task that may be allowed to superusers only? I don't think so, though acls might apply. Uh, that seems fine. You already check to see it is within the limit. I think a bigger question is should we limit it at all? Do we limit pg_largeobject? Is that similar? Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Very open question: - How should a backend know the logger's pid if it's not in shmem. Write a magic string to the pipe? I think it has to and in fact the pid is being written by the postmaster, not by the logger process, so that should be OK. The issue is that the logger shouldn't _attach_ to shared memory unless it has to. It doesn't. It inherits the unnamed shared mem segment from the postmaster, as all subprocesses. As far as recording the current log timestamp, I think that will be a problem. I would much rather see us forget about doing timestamp processing with these log files and keep it simple at this point and see what needs we have for 7.6. I'm a bit insisting on this point. Remember, this all started from the attempt to display the serverlog on the client side. To do this, I need a way to retrieve the current logfile properties (size, and in case of rotation timestamp too) in a low-overhead way, or at least get to know something has changed. Scanning a whole directory and interpreting the data isn't low overhead any more. There's no locking on the shmem, and the single dependence on shmem is the existence of it at the time of rotation. If the shmem is gone, postmaster is probably dead anyway. Regards, Andreas ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] case sensitivity in PQExecPrepared
Merlin Moncure wrote: While using the PQExecPrepared interface, the statement name passed to the function (as a const char*) has to be in lower case to get it to work. I kind of understand why this is, but it is kind of weird that passing the exact same statement name to PQExecPrepared and PREPARE does not work if it contains any upper case characters. PQExecPrepared is used in C programs, PREPARE is used in SQL programs. I think it's fairly obvious that those use different syntax rules. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] case sensitivity in PQExecPrepared
Peter Eisentraut wrote: Merlin Moncure wrote: While using the PQExecPrepared interface, the statement name passed to the function (as a const char*) has to be in lower case to get it to work. I kind of understand why this is, but it is kind of weird that passing the exact same statement name to PQExecPrepared and PREPARE does not work if it contains any upper case characters. PQExecPrepared is used in C programs, PREPARE is used in SQL programs. I think it's fairly obvious that those use different syntax rules. Well, yes :) Just to be absolutely clear what I mean, the following will fail (pseudocode, but you get the idea): char stmt[] = "prepare X as select 0"; PQexec(c, stmt); PQexec(c, "execute X"); -- works PQexecPrepared(c, "X", [...]); -- fails You are saying this is the correct and expected behavior? Merlin ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] case sensitivity in PQExecPrepared
Merlin Moncure wrote: Well, yes :) Just to be absolutely clear what I mean, the following will fail (pseudocode, but you get the idea): char stmt[] = "prepare X as select 0"; PQexec(c, stmt); PQexec(c, "execute X"); -- works PQexecPrepared(c, "X", [...]); -- fails You are saying this is the correct and expected behavior? Yes, because part of those syntax rules is that in SQL, unquoted identifiers are folded to lower case, but in C they are not. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: - How's the official way to restrict pg_* functions to superuser only Very crudely :-) Got it. 'nother question: Is reading the logfile a task that may be allowed to superusers only? I don't think so, though acls might apply. Yes, the log file might contain SQL queries issued by others. It is a super-user only capability. Uh, that seems fine. You already check to see it is within the limit. I think a bigger question is should we limit it at all? Do we limit pg_largeobject? Is that similar? Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Huh? Why have a default maximum? Very open question: - How should a backend know the logger's pid if it's not in shmem. Write a magic string to the pipe? I think it has to and in fact the pid is being written by the postmaster, not by the logger process, so that should be OK. The issue is that the logger shouldn't _attach_ to shared memory unless it has to. It doesn't. It inherits the unnamed shared mem segment from the postmaster, as all subprocesses. Ah, I think it needs to close that as soon as it starts. Don't other subprocesses do that? That shared memory is very fragile and we don't want an errant pointer poking in there. As far as recording the current log timestamp, I think that will be a problem. I would much rather see us forget about doing timestamp processing with these log files and keep it simple at this point and see what needs we have for 7.6. I'm a bit insisting on this point. Remember, this all started from the attempt to display the serverlog on the client side. To do this, I need a way to retrieve the current logfile properties (size, and in case of rotation timestamp too) in a low-overhead way, or at least get to know something has changed. Scanning a whole directory and interpreting the data isn't low overhead any more. 
This seems clean and fast enough to me: SELECT filename FROM pg_dir_ls('/var/log') ORDER BY 1 DESC LIMIT 1 Considering that any query from a client is going to have to go through the parser and be executed, an 'ls' in a directory just isn't a measurable performance hit. If you want, run a test that does an 'ls' and one that doesn't to see that there is no measurable performance difference. I would not worry about the clock going backward. PostgreSQL would have enough problems with timestamp columns moving backward that the file log times are the least of our problems. There's no locking on the shmem, and the single dependence on shmem is the existence of it at the time of rotation. If the shmem is gone, postmaster is probably dead anyway. You can't know that you aren't reading corrupt data if you read shared memory without a lock. What if the write is happening as you read? The only clean solution I can think of is to write an operating system file that contains the current log filename and read from that. I believe such writes are atomic. But again, this seems like overkill to me. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Nested Savepoints in Db2 Stinger
On Thu, Jul 15, 2004 at 08:41:08PM +0100, Simon Riggs wrote: FYI http://www.databasejournal.com/features/db2/article.php/3361941 Hmm. Basically it says that they implement SQL2003 semantics (or the other way around: SQL2003 was written with DB2 in mind). In the code there is a provision for savepoint levels, but it doesn't do anything (yet). -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) There are two moments in a man's life when he should not speculate: when he can afford it and when he cannot (Mark Twain) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 19:30, Bruce Momjian wrote: Simon Riggs wrote: On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after archiving starts. Imho turning on xlog archiving should issue such a checkpoint just to be sure. By agreement, archive_mode can only be turned on at postmaster startup, which means you always have a checkpoint - either because you shut it down cleanly, or you didn't and it recovers, then writes one. There is always something to start the rollforward. So, non-issue. I was discussing the claim that there might not be a checkpoint to begin the rollforward from. There always is: if you are in archive_mode=true then you will always have a checkpoint that can be used for recovery. It may be a long way in the past, if there has been no write activity, but the rollforward will be very, very quick, since there will be no log records. I don't think so. I can imagine many cases where you want to do a nightly tar backup without turning archiving on/off or restarting the postmaster. This is a misunderstanding. I strongly agree with what you say: the whole system has been designed to avoid any benefit from turning on/off archiving and there is no requirement to restart postmaster to take backups. In those cases, a manual checkpoint would have to be issued before the backup begins. A manual checkpoint doesn't HAVE TO be issued. Presumably most systems will be running checkpoint every few minutes. Wherever the last one was is where the rollforward would start from. But you can, if that's the way you want to do things; just wait long enough for the checkpoint to have completed, otherwise your objective of reducing rollforward time will not be met. 
(please note my earlier reported rollback performance of approximately x10 rate of recovery v elapsed time - will require testing on your own systems). Imagine a system that is up for a month, and they don't have enough archive space to keep a months worth of WAL files. They would probably do nightly or weekend tar backups, and then discard the WAL archives. Yes, that would be normal practice. I would recommend keeping at least the last 3 full backups and all of the WAL logs to cover that period. What procedure would they use? I assume they would copy all their old WAL files to a save directory, issue a checkpoint, do a tar backup, then they can delete the saved WAL files. Is that correct? PITR is designed to interface with a wide range of systems, through the extensible archive/recovery program interface. We shouldn't focus on just tar backups - if you do, then the whole thing seems less feature-rich. The current design allows interfacing with tape, remote backup, internet backup providers, automated standby servers and the dozen major storage/archive vendors' solutions. Writing a procedure to backup, assign filenames, keep track of stuff isn't too difficult if you're a competent DBA with a mild knowledge of shell or perl scripting. But if data is important, people will want to invest the time and trouble to adopt one of the open source or licenced vendors that provide solutions in this area. Systems management is a discipline and procedures should be in place for everything. I fully agree with the automate everything dictum, but just don't want to constrain people too much to a particular way of doing things. -o-o- Overall, for first release, I think the complexity of this design is acceptable. PITR is similar to Oracle7 Backup/Recovery, and easily recognisable to any DBA with current experience of current SQL Server, DB2 (MVS, UDB) or Teradata systems. 
[I can't comment much on Ingres, Informix, Sybase etc] My main areas of concern are: - the formal correctness of the recovery process As a result of this concern, PITR makes ZERO alterations to the recovery code itself. The trick is to feed it the right xlog files and to stop, if required, at the right place and allow normal work to resume. - the robustness and quality of my implementation This requires quality checking of the code and full beta testing -o-o- We've raised a couple of valid points on the lists in the last few days: - its probably a desirable feature (but not essential) to implement a write suspend feature on the bgwriter, if nothing else it will be a confidence building feature...as said previously, for many people, this will not be required, but people will no doubt keep asking - there is a small window of risk around the possibility that a recovery target might be set by the user that doesn't rollforward all the way past the end of the backup. That is real, but in general, people aren't likely to be performing archive recovery within minutes of a backup being taken -
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Huh? Why have a default maximum? Just for convenience. Both start and size are optional parameters, but with start=0 and size=50k. Well, it's a very special function anyway, so we could require the user to supply all parameters. I'll remove it. Ah, I think it needs to close that as soon as it starts. Don't other subprocesses do that? That shared memory is very fragile and we don't want an errant pointer poking in there. The result of an errant pointer writing to that shared mem would be 1) wrong pid for SysLogger, so it can't be signalled to rotate from backends 2) wrong timestamp, so backends don't know the latest logfile. Nothing particularly crash prone really. This seems clean and fast enough to me: SELECT filename FROM pg_dir_ls('/var/log') ORDER BY 1 DESC LIMIT 1 For a logfile listing function, this would look like SELECT MAX(startdate) FROM pg_logfile_ls() Considering that any query from a client is going to have to go through the parser and be executed, an 'ls' in a directory just isn't a measurable performance hit. If you want, run a test that does an 'ls' and one that doesn't to see that there is no measurable performance difference. So while a simple PG_RETURN_TIMESTAMP(logfiletimestamp) compared to a lengthy setof returning function is drastically faster, this appears much less drastic with parser overhead. I would not worry about the clock going backward. PostgreSQL would have enough problems with timestamp columns moving backward that the file log times are the least of our problems. I see, so the admin is in trouble anyway (what about PITR? Data column deviations appear harmless compared to restoration based on timestamps). You can't know that you aren't reading corrupt data if you read shared memory without a lock. What if the write is happening as you read? I thought about this quite a while. 
If the shmem fields aren't written atomically (one is 32bit, one 64 bit, probably on dword boundaries so writing will happen at least processor bus wide, do we support any 16 bit processor?) the corruption consequences as above apply. In the case of the timestamp, the high word will rarely change anyway, only every 2^32 seconds... Concurrent access on the logger pid would mean to call pg_logfile_rotate() while a killed logger is being restarted, which is creating a new logfile then anyway. This would send a SIGINT into outer space, maybe to the bgwriter triggering a checkpoint, or the postmaster shutting it down (gracefully, still unwanted). BTW, the consequences of a trigger flag in shmem would be less because all that could happen was a log rotation (which appends to existing files, just in case syslogger died in the milliseconds after a rotation). The only clean solution I can think of is to write an operating system file that contains the current log filename and read from that. I believe such writes are atomic. But again, this seems like overkill to me. Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... I'll convert to this, *dropping* all shmem. Regards, Andreas ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... Oops, SendPostmasterSignal uses shmem. At least, this enables syslogger.c to be free from shmem stuff, except for PGSharedMemDetach. Regards, Andreas ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: Ok, no limit (but a default maximum of 50k remains). And since it's superuser only, he hopefully knows what he does. Huh? Why have a default maximum? Just for convenience. Both start and size are optional parameters, but with start=0 and size=5. Well, it's a very special function anyway, so we could require the user to supply all parameters. I'll remove it. Agreed, and maybe a zero value gets the entire file. Ah, I think it needs to close that as soon as it starts. Don't other subprocesses do that? That shared memory is very fragile and we don't want an errant pointer poking in there. The result of an errant pointer writing to that shred mem would be 1) wrong pid for SysLogger, so it can't be signalled to rotate from backends 2) wrong timestamp, so backends don't know the latest logfile. Nothing particularly crash prone really. No, I am thinking the program goes crazy and writes everywhere. This seems clean and fast enough to me: SELECT filename FROM pg_dir_ls('/var/log') ORDER BY 1 DESC LIMIT 1 For a logfile listing function, this would look SELECT MAX(startdate) FROM pg_logfile_ls() Considering that any query from a client is going to have to go through the parser and be executed, an 'ls' in a directory just isn't a measurable performance hit. If you want run a test that does an 'ls' and one that doesn't to see that there is no measurable performance difference. So while a simple PG_RETURN_TIMESTAMP(logfiletimestamp) compared to a lengthy setof returning function is drastically faster, this appears much less drastic with parser overhead. I would not worry about the clock going backward. PostgreSQL would have enough problems with timestamp columns moving backward that the file log times are the least of our problems. I see, so the admin is in trouble anyway (what about PITR? Data column deviations appear harmless compared to restoration based on timestamps). 
PITR uses WAL numbering so it would be fine, but the timestamps on the commit records would have problems. You can't know that you aren't reading corrupt data if you read shared memory without a lock. What if the write is happening as you read? I thought about this quite a while. If the shmem fields aren't written atomically (one is 32bit, one 64 bit, probably on dword boundaries so writing will happen at least processor bus wide, do we support any 16 bit processor?) the corruption consequences as above apply. In the case of the timestamp, the high word will rarely change anyway, only every 2^32 seconds... Concurrent access on the logger pid would mean to call pg_logfile_rotate() while a killed logger is being restarted, which is creating a new logfile then anyway. This would send a SIGINT into outer space, maybe to the bgwriter triggering a checkpoint, or the postmaster shutting it down (gracefully, still unwanted). BTW, the consequences of a trigger flag in shmem would be less because all that could happen was a log rotation (which appends to existing files, just in case syslogger died in the milliseconds after a rotation). The only clean solution I can think of is to write an operating system file that contains the current log filename and read from that. I believe such writes are atomic. But again, this seems like overkill to me. Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... I'll convert to this, *dropping* all shmem. Yes, that is the usual method. We signal the postmaster and it then does the signalling to the logger. I thought you had looked at other backend signalling examples so I didn't explain it. Now, one really good efficiency would be to use LISTEN/NOTIFY so clients could know when new data has appeared in the log, or the log file is rotated. Now that's an efficiency! However, let's get this infrastructure completed first. 
One wacky idea would be for the clients to LISTEN on 'pg_new_logfile' and have the logger do system('psql -c NOTIFY pg_new_logfile template1') or something like that. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Andreas Pflug wrote: Ah wait. Digging further behind SIGUSR1 I now *do* see a solution without pid in shmem, using SendPostmasterSignal. Well, a little hint from gurus would have helped... Oops, SendPostmasterSignal uses shmem. At least, this enables syslogger.c to be free from shmem stuff, except for PGSharedMemDetach. Right. We already have to use shared mem for the backends and postmaster. It is the logger we are worried about. Tom brought up the point that if the logger used shared memory, we would have to kill/restart it if we need to reinitialize shared memory, meaning we would lose logging info at a time we really need it --- again a good reason not to use shared memory in the logger. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Point in Time Recovery
On Fri, 2004-07-16 at 16:47, Tom Lane wrote: As far as the business about copying pg_control first goes: there is another way to think about it, which is to copy pg_control to another place that will be included in your backup. For example the standard backup procedure could be
1. [somewhat optional] Issue CHECKPOINT and wait till it finishes.
2. cp $PGDATA/global/pg_control $PGDATA/pg_control.dump
3. tar cf /dev/mt $PGDATA
4. do something to record ending WAL position
If we standardized on this way, then the tar archive would automatically contain the pre-backup checkpoint position in ./pg_control.dump, and there is no need for any special assumptions about the order in which tar processes things. Sounds good. That would be familiar to Oracle DBAs doing BACKUP CONTROLFILE. We can document that and offer it as a suggested procedure. However, once you decide to do things like that, there is no reason why the copied file has to be an exact image of pg_control. I claim it would be more useful if the copied file were plain text so that you could just cat it to find out the starting WAL position; that would let you determine without any special tools what range of WAL archive files you are going to need to bring back from your archives. I wouldn't be in favour of a manual mechanism. If you want an automated mechanism, what's wrong with using the one that's already there? You can use pg_controldata to read the control file; again, what's wrong with that? We agreed some time back that an off-line xlog file inspector would be required to allow us to inspect the logs and make a decision about where to end recovery. You'd still need that. It's scary enough having to specify the end point, let alone having to specify the starting point as well. At your request, and with Bruce's idea, I designed and built the recovery system so that you don't need to know what range of xlogs to bring back. 
You just run it, it brings back the right files from archive and does recovery with them, then cleans up - and it works without running out of disk space on long recoveries. I've built it now and it works... This is pretty much the same chain of reasoning that Bruce and I went through yesterday to come up with the idea of putting a label file inside the tar backups. We concluded that it'd be worth putting both the backup starting time and the checkpoint WAL position into the label file --- the starting time isn't needed for restore but might be really helpful as documentation, if you needed to verify which dump file was which. ...if you are doing tar backups... what will you do if you're not using that mechanism? If you are: it's common practice to make up a backup filename from elements such as system name, database name, date and time, etc. That gives you the start time, and the file's last-modified date gives you the end time. I think it's perfectly fine for everybody to do backups any way they please. There are many licensed variants of PostgreSQL and it might be appropriate in those to specify particular ways of doing things. I'll be trusting the management of backup metadata and storage media to a solution designed for the purpose (open or closed source), just as I'll be trusting my data to a database solution designed for that purpose. That for me is one of the good things about PostgreSQL - we use the filesystem rather than writing our own, we provide language interfaces rather than inventing our own proprietary server language, etc. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
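Steps 2 and 3 of Tom's suggested procedure can be sketched in a few lines; this is only an illustration against a fake data directory (step 1, the CHECKPOINT, and step 4, recording the ending WAL position, need a running server and are elided; `backup_cluster` and all paths are hypothetical):

```python
import os
import shutil
import tarfile
import tempfile

def backup_cluster(pgdata, archive_path):
    """Sketch of steps 2 and 3 of the backup procedure above."""
    # Step 2: copy pg_control to a place the tar archive will include,
    # so the archive carries the pre-backup checkpoint position.
    shutil.copy(os.path.join(pgdata, "global", "pg_control"),
                os.path.join(pgdata, "pg_control.dump"))
    # Step 3: tar up the whole data directory.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(pgdata, arcname=os.path.basename(pgdata))

# Demonstration against a throwaway stand-in for $PGDATA.
tmp = tempfile.mkdtemp()
pgdata = os.path.join(tmp, "data")
os.makedirs(os.path.join(pgdata, "global"))
with open(os.path.join(pgdata, "global", "pg_control"), "wb") as f:
    f.write(b"\x00" * 16)  # stand-in for the real control file

backup_cluster(pgdata, os.path.join(tmp, "backup.tar"))
with tarfile.open(os.path.join(tmp, "backup.tar")) as tar:
    names = tar.getnames()
print(sorted(names))
```

The point of the copy-then-tar ordering is exactly what Tom notes: the archive itself then contains the checkpoint position, with no assumptions about the order in which tar visits files.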
Re: [HACKERS] serverlog rotation/functions
Andreas Pflug wrote: Bruce Momjian wrote: Andreas Pflug wrote: Right. We already have to use shared mem for the backends and postmaster. It is the logger we are worried about. Tom brought up the point that if the logger used shared memory, we would have to kill/restart it if we need to reinitialize shared memory, I don't know why that particular segment should ever be renewed. Anyway, it's gone. As I remember, we have one big shared memory segment. Were you creating a special one just for this timestamp? If you were, I see why your approach was safer, but as you said, it doesn't buy us much anyway. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: Andreas Pflug wrote: Right. We already have to use shared mem for the backends and postmaster. It is the logger we are worried about. Tom brought up the point that if the logger used shared memory, we would have to kill/restart it if we need to reinitialize shared memory, I don't know why that particular segment should ever be renewed. Anyway, it's gone. Regards, Andreas ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] serverlog rotation/functions
Bruce Momjian wrote: Andreas Pflug wrote: Just for convenience. Both start and size are optional parameters, but with start=0 and size=5. Well, it's a very special function anyway, so we could require the user to supply all parameters. I'll remove it. Agreed, and maybe a zero value gets the entire file. Which is a default param back again, maybe on a 100MB file? Better not. Let's leave it to the admin to do sick stuff like pg_read_file('base/5000/5002', 0, 1) ... No, I am thinking of the case where the program goes crazy and writes everywhere. What I described was just that situation. Yes, that is the usual method. We signal the postmaster and it then does the signalling to the logger. I thought you had looked at other backend signalling examples so I didn't explain it. Well, if you knew the places where backends do signal stuff to the postmaster... Still, somebody could have yelled "use the standard way" before I reinvented the wheel. Now, one really good efficiency would be to use LISTEN/NOTIFY so clients could know when new data has appeared in the log, or the log file is rotated. Now that's an efficiency! However, let's get this infrastructure completed first. One wacky idea would be for the clients to LISTEN on 'pg_new_logfile' and have the logger do system('psql -c NOTIFY pg_new_logfile template1') or something like that. No, certainly not. This would mean that every time a log entry is written, psql is fired up. Tom wouldn't accept this as KISS, I believe. And h*ll, that would cause traffic (just imagine a single log message on client startup...) What you saw on LinuxTag was pgAdmin3 polling once a second to see if the logfile length changed, which is the fastest setting possible. Regards, Andreas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
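The byte-range semantics being settled on above (all parameters required, no "zero means the whole file" default) are simple to pin down with a sketch; `read_file_chunk` is a hypothetical stand-in for the pg_read_file function under discussion, not its actual implementation:

```python
import os
import tempfile

def read_file_chunk(path, start, size):
    """Return `size` bytes starting at byte offset `start` of the file,
    mirroring the pg_read_file(filename, start, size) semantics
    discussed above: every parameter must be supplied explicitly."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(size)

# Demonstration against a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"LOG: redo is not required\n")
os.close(fd)
print(read_file_chunk(path, 5, 4))  # the 4 bytes at offset 5
```

Requiring an explicit size avoids the trap Andreas points out: a convenient "rest of the file" default becomes a 100MB result the moment someone forgets the argument.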
Re: [HACKERS] Point in Time Recovery
OK, I think I have some solid ideas and reasons for them. First, I think we need server-side functions to call when we start/stop the backup. The advantage of these server-side functions is that they will do the required work of recording the pg_control values and creating needed files with little chance for user error. It also allows us to change the internal operations in later releases without requiring admins to change their procedures. We are even able to adjust the internal operation in minor releases without forcing a new procedure on users. Second, I think once we start a restore, we should rename recovery.conf to recovery.in_progress, and when complete rename that to recovery.done. If the postmaster starts and sees recovery.in_progress, it will fail to start, knowing its recovery was interrupted. This allows the admin to take appropriate action. (I am not sure what that action would be. Does he bring back the backup files or just keep going?) Third, I think we need to put a file in the archive location once we complete a backup, recording the start/stop xid and wal/offsets. This gives the admin documentation on what archive logs to keep and what xids are available for recovery. Ideally the recover program would read that file and check the recover xid to make sure it is after the stop xid recorded in the file. How would the recover program know the name of that file? We need to create it in /data with start contents before the backup, then complete it with end contents and archive it. What should we name it? Ideally it would be named by the WAL name/offset of the start so it orders in the proper spot in the archive file listing, e.g.:
093a
093b
093b.032b9.start
093c
Are people going to know they need 093b for 093b.032b9.start? I hope so. Another idea is to do:
093a.xlog
093b.032b9.start
093b.xlog
093c.xlog
This would order properly. It might be a very good idea to add extensions to these log files now that we are archiving them in strange places. 
In fact, maybe we should use *.pg_xlog to document the directory they came from. --- Simon Riggs wrote: On Fri, 2004-07-16 at 16:47, Tom Lane wrote: As far as the business about copying pg_control first goes: there is another way to think about it, which is to copy pg_control to another place that will be included in your backup. For example the standard backup procedure could be
1. [somewhat optional] Issue CHECKPOINT and wait till it finishes.
2. cp $PGDATA/global/pg_control $PGDATA/pg_control.dump
3. tar cf /dev/mt $PGDATA
4. do something to record ending WAL position
If we standardized on this way, then the tar archive would automatically contain the pre-backup checkpoint position in ./pg_control.dump, and there is no need for any special assumptions about the order in which tar processes things. Sounds good. That would be familiar to Oracle DBAs doing BACKUP CONTROLFILE. We can document that and offer it as a suggested procedure. However, once you decide to do things like that, there is no reason why the copied file has to be an exact image of pg_control. I claim it would be more useful if the copied file were plain text so that you could just cat it to find out the starting WAL position; that would let you determine without any special tools what range of WAL archive files you are going to need to bring back from your archives. I wouldn't be in favour of a manual mechanism. If you want an automated mechanism, what's wrong with using the one that's already there? You can use pg_controldata to read the control file; again, what's wrong with that? We agreed some time back that an off-line xlog file inspector would be required to allow us to inspect the logs and make a decision about where to end recovery. You'd still need that. It's scary enough having to specify the end point, let alone having to specify the starting point as well. 
At your request, and with Bruce's idea, I designed and built the recovery system so that you don't need to know what range of xlogs to bring back. You just run it, it brings back the right files from archive and does recovery with them, then cleans up - and it works without running out of disk space on long recoveries. I've built it now and it works... This is pretty much the same chain of reasoning that Bruce and I went through yesterday to come up with the idea of putting a label file inside the tar backups. We concluded that it'd be worth putting both the backup starting time and the checkpoint WAL position into the label file --- the starting time isn't needed for restore but might be really helpful as documentation, if you needed to verify which
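Bruce's claim above that the extension-based scheme "would order properly" is easy to check with a plain lexicographic sort, which is what a directory listing shows; a quick sketch using the example segment names from the mail:

```python
# The two naming schemes for the backup "start" label file, using the
# hypothetical segment names from the discussion above.
scheme_one = ["093c", "093a", "093b.032b9.start", "093b"]
scheme_two = ["093c.xlog", "093a.xlog", "093b.032b9.start", "093b.xlog"]

# Plain lexicographic sort, as a directory listing would show it.
print(sorted(scheme_one))
print(sorted(scheme_two))
```

Under the first scheme the label sorts *after* the segment it refers to (093b, then 093b.032b9.start); with the .xlog extensions, "093b.032b9.start" sorts before "093b.xlog" because '0' precedes 'x', so the label sits right where the backup started.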
Re: [HACKERS] Point in Time Recovery
Let me address your concerns about PITR getting into 7.5. I think a few people spoke last week expressing concern about our release process and wanting to take drastic action. However, looking at the release status report I am about to post, you will see we are on track for an August 1 beta. PITR has been neglected only because it has been moving along so well we haven't needed to get deeply involved. Simon has been able to address concerns as we raised them and make adjustments quickly with little guidance. Now, we certainly don't want to miss adding PITR to 7.5 by not giving it our full attention. Once Tom completes the cursor issues with NT in the next day or so, I think that removes the last big NT stumbling block, and we will start to focus on PITR. Unless there is some major thing we are missing, we fully expect to get PITR in 7.5. We don't have a crystal ball to know for sure, but our intent is clear. I know Simon is going away July 26 so we want to get him feedback as soon as possible. If we wait until after July 26, we will have to make all the adjustments without Simon's guidance, which will be difficult. As far as the importance of PITR, it is a _key_ enterprise feature, even more key than NT. PITR is going to be one of the crowning jewels of the 7.5 release, and I don't want to go into beta without it unless we can't help it. So, I know the deadline is looming and everyone is getting nervous, but keep the faith. I can see the light at the end of the tunnel. I know this is a tighter schedule than we would like, but I know we can do it, and I expect we will do it. --- Simon Riggs wrote: On Thu, 2004-07-15 at 15:57, Bruce Momjian wrote: We will get there --- it just seems dark at this time. Thanks for that. My comments were heartfelt, but not useful right now. I'm badly overdrawn already on my time budget, though that is my concern alone. There is more to do than I have time for. 
Pragmatically, if we aren't going to get there then I need to stop now, so I can progress other outstanding issues. All help is appreciated. I'm aiming for the minimum feature set - which means we do need to take care over whether that set is insufficient and also to pull any part that doesn't stand up to close scrutiny over the next few days. Overall, my primary goal is increased robustness and availability for PostgreSQL...and then to have a rest! Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] Release status report
We are two weeks away from the beta freeze on August 1 so I want to give a status report on where we are. Right now we have two large outstanding items, and a few smaller ones. Nested transactions (NT) is in CVS, but it is missing:
o savepoints
o cursor rollback
o performance fixes
Alvaro has completed the first item and a few more and will probably submit a mega-patch shortly that can be applied. Tom is working on cursors and should finish in a day or two. Performance fixes are needed and will have to be addressed, though we might be adjusting these during beta as we find them. PITR has been ready for a while, though Simon continues to improve it and users are continuing to suggest fixes/ideas. I am hopeful we can get the majority of PITR issues/comments addressed next week while Simon is available. The other patches are in progress and will be applied once the community has finished requesting changes to each. So, basically, I think we are on track. It would be nice to get the majority of NT/PITR work into CVS by next weekend so we can have a week of polishing before beta. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html