Re: [HACKERS] Fundamental error in "no WAL log" index/file

2005-08-04 Thread Qingqing Zhou

"Simon Riggs" <[EMAIL PROTECTED]> writes
>
> I have learnt that Tom means: read the code. :-)
>
> CREATE INDEX doesn't produce xlog records *except* when you use PITR, so
> PITR does work correctly.
>

wstate.btws_use_wal = XLogArchivingActive() && !wstate.index->rd_istemp;

Ah-oh, that's true ;-)

Regards,
Qingqing




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fundamental error in "no WAL log" index/file

2005-08-03 Thread Simon Riggs
On Thu, 2005-08-04 at 10:56 +0800, Qingqing Zhou wrote:
> "Tom Lane" <[EMAIL PROTECTED]> writes
> > > Since there is no xlog replay mechanism to CREATE INDEX (bottom-up
> method),
> > > so CREATE INDEX won't get replayed in PITR?
> >
> > On what do you base either part of that statement?

I have learnt that Tom means: read the code. :-)

CREATE INDEX doesn't produce xlog records *except* when you use PITR, so
PITR does work correctly.

Tom's earlier warning about CREATE DATABASE stands...

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-08-03 Thread Qingqing Zhou

"Tom Lane" <[EMAIL PROTECTED]> writes
> > Since there is no xlog replay mechanism to CREATE INDEX (bottom-up
method),
> > so CREATE INDEX won't get replayed in PITR?
>
> On what do you base either part of that statement?
>

I see _bt_load() still relies on smgrimmedsync() to assure a correct disk
image of the btree after a crash.

As you found out, this won't work if we do a CREATE DATABASE + CREATE INDEX
and system crashes. If you forced a checkpoint, then CREATE DATABASE won't
get replayed after the crashed, so the btree disk image is still there, then
the problem is solved in this case.

However, if we take a physical copy of the database, then do CREATE DATABASE
+ CREATE INDEX again, and we want to rollforward, from my understanding of
current _bt_load() code, I don't see anywhere to get the btree disk image
again.

Regards,
Qingqing



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-08-03 Thread Tom Lane
"Qingqing Zhou" <[EMAIL PROTECTED]> writes:
> Since there is no xlog replay mechanism to CREATE INDEX (bottom-up method),
> so CREATE INDEX won't get replayed in PITR?

On what do you base either part of that statement?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-08-03 Thread Qingqing Zhou

"Tom Lane" <[EMAIL PROTECTED]> writes
> > So wouldn't this mean that any CREATE DATABASE won't work properly in
PITR?
>
> It works fine in a rollforward situation.


Since there is no xlog replay mechanism to CREATE INDEX (bottom-up method),
so CREATE INDEX won't get replayed in PITR?

This seems also true for SET TABLESPACE command.

Regards,
Qingqing



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-06-26 Thread Jim C. Nasby
Would it be possible to rollback from WAL? My thought is this:

CREATE DATABASE is logged in WAL
If a replay of that CREATE DATABASE is performed, the template database
used is copied
Any modifications to the template database that occured after the CREATE
DATABASE are rolled back in the new database
WAL replay continues

Would this resolve the PITR issues with CREATE DATABASE?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation

2005-06-26 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Jan Wieck wrote:
>> The other annoyance this redesign would fix is the createdb failures 
>> because the template DB is in use.

> This one should be largely fixed by provision of another default 
> connection target, which is in the pipeline.

Yeah, but that's a workaround not a solution.  It's still one of the
issues I would like to see fixed by a still-hypothetical CREATE DATABASE
redesign.

There are many more-pressing problems of course, so I don't see anything
being done about CREATE DATABASE for a good while.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-06-26 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> (b) force a checkpoint
>> immediately after any CREATE DATABASE, so that we never have to replay
>> one except in a PITR situation

> So wouldn't this mean that any CREATE DATABASE won't work properly in PITR?

It works fine in a rollforward situation.  However see the note I added
to backup.sgml:

 If a CREATE DATABASE command is executed while a base backup is
 being taken, and then the template database that the CREATE
 DATABASE copied is modified while the base backup is still in
 progress, it is possible that recovery will cause those
 modifications to be propagated into the created database as
 well. This is of course undesirable. To avoid this risk, it is best
 not to modify any template databases while taking a base backup.

I don't see anything much we can do about this except add the warning;
we cannot say which state of the template database will be picked up
by the filesystem backup.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation

2005-06-26 Thread Andrew Dunstan



Jan Wieck wrote:



One way of redesigning CREATE DATABASE would be to build the new 
database directory from scratch using bki files. Doing so would lose 
the current template mechanism, but that can easily be redesigned into 
a utility that creates a bki file set from any existing database. The 
other annoyance this redesign would fix is the createdb failures 
because the template DB is in use.



This one should be largely fixed by provision of another default 
connection target, which is in the pipeline.


cheers

andrew

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-06-26 Thread Tom Lane
Jan Wieck <[EMAIL PROTECTED]> writes:
> One way of redesigning CREATE DATABASE would be to build the new 
> database directory from scratch using bki files. Doing so would lose the 
> current template mechanism, but that can easily be redesigned into a 
> utility that creates a bki file set from any existing database.

Not so easily as all that.  The .bki mechanism is pretty darn restricted
as to what it can generate, because of the limitations of bootstrap
mode.  (And when you are starting up without pg_proc or pg_type, it's
not trivial to remove those restrictions either.)

I thought for a bit about
1. Clone newdb from template0 (which we assume doesn't change)
2. pg_dump template1 | pg_restore newdb
but pg_dump isn't terribly fast, and in any case this still has issues
--- eg, if DBA has dropped the public schema in template1, it will fail
to propagate that fact.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation

2005-06-26 Thread Jan Wieck

On 6/25/2005 6:58 PM, Tom Lane wrote:

I wrote:

It seems our choices are (a) somehow fix things so CREATE DATABASE
replay doesn't have to zap the whole directory, (b) force a checkpoint
immediately after any CREATE DATABASE, so that we never have to replay
one except in a PITR situation, or (c) abandon non-WAL-logged index
and table builds.


Having overcome my initial dismay at missing such a fundamental problem,
I've thought harder about it and concluded that (b) is clearly the thing
to do.

I think it would take a wholesale redesign of the CREATE DATABASE
mechanism to do (a).  While there are a number of ugly things about
CREATE DATABASE that could perhaps be fixed with a redesign,
I don't currently have any good idea about how to do it right.
In any case this path wouldn't be an acceptable backpatch for the 8.0
branch.


One way of redesigning CREATE DATABASE would be to build the new 
database directory from scratch using bki files. Doing so would lose the 
current template mechanism, but that can easily be redesigned into a 
utility that creates a bki file set from any existing database. The 
other annoyance this redesign would fix is the createdb failures because 
the template DB is in use.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-06-26 Thread Michael Paesold

Greg Stark wrote:


Tom Lane <[EMAIL PROTECTED]> writes:


> (b) force a checkpoint
> immediately after any CREATE DATABASE, so that we never have to replay
> one except in a PITR situation


So wouldn't this mean that any CREATE DATABASE won't work properly in 
PITR?


As I understand it: no. Because with PITR turned on there are no 
"non-WAL-logged index
and table builds". Therefore the indexes and tables are WAL-logged and will 
be recreated correctly.


I hope this is correct information :-)

Best Regards,
Michael Paesold 



---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-06-25 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> > (b) force a checkpoint
> > immediately after any CREATE DATABASE, so that we never have to replay
> > one except in a PITR situation

So wouldn't this mean that any CREATE DATABASE won't work properly in PITR?

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-06-25 Thread Tom Lane
I wrote:
> It seems our choices are (a) somehow fix things so CREATE DATABASE
> replay doesn't have to zap the whole directory, (b) force a checkpoint
> immediately after any CREATE DATABASE, so that we never have to replay
> one except in a PITR situation, or (c) abandon non-WAL-logged index
> and table builds.

Having overcome my initial dismay at missing such a fundamental problem,
I've thought harder about it and concluded that (b) is clearly the thing
to do.

I think it would take a wholesale redesign of the CREATE DATABASE
mechanism to do (a).  While there are a number of ugly things about
CREATE DATABASE that could perhaps be fixed with a redesign,
I don't currently have any good idea about how to do it right.
In any case this path wouldn't be an acceptable backpatch for the 8.0
branch.

(c) is clearly not the direction we really wish to take, either.

Also, I realized that there is an entirely independent problem that (b)
will protect us against.  Consider this scenario:

create database acopy;
\c template1
create table bozo(f1 int);
-- now system crashes and replays WAL

After this sequence it is possible for "acopy" to contain the table
"bozo", because the replay of the CREATE DATABASE will copy whatever
is on disk in template1.  (You can actually duplicate this in 8.0
and HEAD, if you wait long enough for the bgwriter to write the
catalog changes, but not long enough for a checkpoint to happen.)

Accordingly, I've committed (b) into 8.0 and HEAD.  (Earlier branches
are not at risk because they didn't WAL-log CREATE DATABASE at all.)

This might be sufficient reason to release an 8.0.4 pretty soon.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Fundamental error in "no WAL log" index/file creation stuff

2005-06-25 Thread Tom Lane
I believe I have figured out the problem behind the recent reports we've
been seeing of "index is not a btree" errors.  Here's how to reproduce
it (in 8.0 or HEAD):

-- drop database test if present
checkpoint;
create database test;
\c test
create table t1 ( name varchar(20) primary key );
-- kill -9 either current session or bgwriter, to force XLOG replay
-- now reconnect to test
vacuum t1;
ERROR:  index "t1_pkey" is not a btree

On investigation, the filesystem shows t1_pkey exists but has size 0.

The reason for this is that the only entry in the XLOG concerning
t1_pkey is an "smgr create" record --- we didn't XLOG any of the
data inserted into the index, and particularly not the metapage.

Why is that a problem, if we fsynced the index?  Because *replay of
CREATE DATABASE drops and recreates the entire database directory*.
This is not trivial to avoid, because the only way to generate the
new database is to copy from another database, and it's very hard
to tell what to copy if we want it done selectively.

It seems our choices are (a) somehow fix things so CREATE DATABASE
replay doesn't have to zap the whole directory, (b) force a checkpoint
immediately after any CREATE DATABASE, so that we never have to replay
one except in a PITR situation, or (c) abandon non-WAL-logged index
and table builds.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq