[HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread ow
--- ow [EMAIL PROTECTED] wrote:
> IMHO, not only does data need to be loaded before FK constraints are created,
> but there also has to be a feature to allow creation of an FK constraint
> WITHOUT verifying that all loaded/existing records satisfy the FK constraint.
> The ability to create a FK constraint without verification of existing records
> should exist by itself (maybe only for the superuser) and also as an option in
> pg_restore.
>
> More details:
> http://archives.postgresql.org/pgsql-admin/2003-11/msg00308.php
> http://archives.postgresql.org/pgsql-admin/2003-11/msg00323.php


It appears there's not a lot of interest in discussing the possibility of FK
constraint creation WITHOUT the verification check. How then should one handle
the situation with pg_restore and large dbs where creation of FK constraint(s)
may take hours?
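For illustration, the expensive step is the FK re-creation that pg_restore
issues after loading the data; table and constraint names here are hypothetical:

-- emitted after the data has been COPYed in; scans every row of "child"
ALTER TABLE ONLY child
    ADD CONSTRAINT child_parent_fk FOREIGN KEY (parent_id)
    REFERENCES parent (id);
-- the request is for a variant that records the constraint
-- in the catalogs but skips that verification scan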

Thanks






[HACKERS] statistics about temp tables ...

2003-11-26 Thread Hans-Jürgen Schönig
Recently I came across a simple issue which made me think.
When we create a temp table (SELECT INTO, CREATE TABLE AS), the planner
won't know anything about its content after creating it.
Many people use temp tables heavily when the amount of data for a certain
analysis has to be reduced significantly, and often the same temp table
is queried many times. To speed those scenarios up it can
be useful to vacuum those temp tables so that the planner will find more
clever joins.
Is it possible and does it make sense to generate those statistics on 
the fly (during CREATE TABLE AS)? Maybe we could have a GUC which tells 
the system whether to generate statistics or not.

test=# select * from test;
 id
----
  4
  4
(2 rows)
test=# VACUUM test ;
VACUUM
test=# explain select * from test ;
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on test  (cost=0.00..1.02 rows=2 width=4)
(1 row)


test=# select * into tmp from test;
SELECT
test=# explain select * from tmp;
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..20.00 rows=1000 width=4)
(1 row)
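A sketch of how the proposed knob might be used (the GUC name is made up;
nothing like it exists today):

test=# SET collect_statistics_on_create = on;  -- hypothetical GUC
test=# select * into tmp from test;            -- would gather stats as a side effect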
Best regards,

Hans
--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Andreas Pflug
ow wrote:

> It appears there's not a lot of interest in discussing the possibility of FK
> constraint creation WITHOUT the verification check. How then should one handle
> the situation with pg_restore and large dbs where creation of FK constraint(s)
> may take hours?

I'd prefer a backup/restore method that dumps physical data, so at 
restore time there's no need for recreation of FKs. But I didn't see any 
feedback on this proposal either.

Regards,
Andreas




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Hannu Krosing
Andreas Pflug wrote on Wed, 26.11.2003 at 12:09:
> ow wrote:
>
> > It appears there's not a lot of interest in discussing the possibility of FK
> > constraint creation WITHOUT the verification check. How then should one handle
> > the situation with pg_restore and large dbs where creation of FK constraint(s)
> > may take hours?
>
> I'd prefer a backup/restore method that dumps physical data, so at
> restore time there's no need for recreation of FKs. But I didn't see any
> feedback on this proposal either.

Was this proposal a separate one from using WAL logs for PITR ?

--
Hannu




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Andreas Pflug
Hannu Krosing wrote:

> Andreas Pflug wrote on Wed, 26.11.2003 at 12:09:
>
> > ow wrote:
> >
> > > It appears there's not a lot of interest in discussing the possibility of FK
> > > constraint creation WITHOUT the verification check. How then should one handle
> > > the situation with pg_restore and large dbs where creation of FK constraint(s)
> > > may take hours?
> >
> > I'd prefer a backup/restore method that dumps physical data, so at
> > restore time there's no need for recreation of FKs. But I didn't see any
> > feedback on this proposal either.
>
> Was this proposal a separate one from using WAL logs for PITR ?
 

Yes, I mentioned it just a few days ago when discussing dependencies in pg_dump.
This is somewhat complementary to WAL and PITR. I'm looking for a fast
way to dump and restore a complete database, like physical file copy, 
without shutting down the backend. I was thinking of a BACKUP command 
that streams out the files including any indexes and non-vacuumed 
tuples. A database recreated from that wouldn't be as clean as a 
pg_dump/pg_restored database, but it would be up much faster, and there 
wouldn't be any dependency problem.
This doesn't really replace pg_dump/pg_restore, because it probably 
wouldn't be able to upgrade a cluster. Still, it would be helpful for 
disaster recovery.

Regards,
Andreas




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread ow

--- Andreas Pflug [EMAIL PROTECTED] wrote:
> Yes, I mentioned it just a few days ago when discussing dependencies in pg_dump.
> This is somewhat complementary to WAL and PITR. I'm looking for a fast
> way to dump and restore a complete database, like physical file copy,
> without shutting down the backend. I was thinking of a BACKUP command
> that streams out the files including any indexes and non-vacuumed
> tuples. A database recreated from that wouldn't be as clean as a
> pg_dump/pg_restored database, but it would be up much faster, and there
> wouldn't be any dependency problem.
> This doesn't really replace pg_dump/pg_restore, because it probably
> wouldn't be able to upgrade a cluster. Still, it would be helpful for
> disaster recovery.

I think creating a FK without the verification check is still needed, especially
when:
1) original db is corrupted
2) during cluster upgrade
3) there's a need to BACKUP/RESTORE a *schema* instead of db.

Thanks






Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Andreas Pflug
ow wrote:

> --- Andreas Pflug [EMAIL PROTECTED] wrote:
>
> > Yes, I mentioned it just a few days ago when discussing dependencies in pg_dump.
> > This is somewhat complementary to WAL and PITR. [...]
>
> I think creating a FK without the verification check is still needed, especially
> when:
> 1) original db is corrupted
> 2) during cluster upgrade

Agreed. This might be useful for replication purposes too; in MSSQL, you
can write CREATE TRIGGER ... NOT FOR REPLICATION. I'd like to see a
transaction-safe way (an ENABLE/DISABLE TRIGGER command) for this.

> 3) there's a need to BACKUP/RESTORE a *schema* instead of db.







Re: [HACKERS] 7.4final regression failure on uw713

2003-11-26 Thread ohp
Thanks, Tom,
I had a vague memory about that :)
Better to check, eh?
On Tue, 25 Nov 2003, Tom Lane wrote:

> Date: Tue, 25 Nov 2003 19:35:07 -0500
> From: Tom Lane [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Cc: pgsql-hackers list [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 7.4final regression failure on uw713
>
> [EMAIL PROTECTED] writes:
> > Don't know if it's bad, but make check reports a regression failure on
> > join.
>
> I believe we'd determined that this is an acceptable platform-specific
> behavior.
>
>   regards, tom lane


-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)



[HACKERS] protocol compatibility between 7.2 and 7.4

2003-11-26 Thread Daniel Kalchev
I know this is an attempt to save myself from reading the mailing list, but still
the issue remains:

psql from version 7.4 does not talk to a 7.2.4 database.

The CHANGELOG indicates that both server and libraries keep compatibility
with versions after 6.3, yet there is no switch in psql to specify use of
the pre-7.3 protocol, and the 7.2 server and 7.4 psql apparently do not
negotiate.

Daniel




Re: [HACKERS] protocol compatibility between 7.2 and 7.4

2003-11-26 Thread Peter Eisentraut
Daniel Kalchev writes:

> The CHANGELOG indicates that both server and libraries keep compatibility
> with versions after 6.3, yet there is no switch in psql to specify use of
> the pre-7.3 protocol, and the 7.2 server and 7.4 psql apparently do not
> negotiate.

It's automatic.  If you disagree, please post details.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Oli Sennhauser
Hello

I was asking about this too, one or two weeks ago.

> > > It appears there's not a lot of interest in discussing the
> > > possibility of FK constraint creation WITHOUT the verification
> > > check. How then should one handle the situation with pg_restore
> > > and large dbs where creation of FK constraint(s) may take hours?
> >
> > I'd prefer a backup/restore method that dumps physical data, so at
> > restore time there's no need for recreation of FKs. But I didn't see
> > any feedback on this proposal either.
>
> Was this proposal a separate one from using WAL logs for PITR ?

My question then was:

Q2: New situation: Why is it not a good idea to back up the database
files of a cluster incl. all c_log and x_log (log files last) to get a
physical hot backup?
In principle it is the same situation as a server which is crashing
(not at once but over some period of time). After restoring, it should do a redo
and rollback automatically, like after a crash. This method (physical
hot backup) would improve backup and restore times dramatically.
The answer from Robert Treat was:

> Essentially I think you're right, it should behave much like a crashing
> server.  The main reason why people don't recommend it is that (depending on
> your OS setup) there is the potential to lose data that has been committed but
> not actually written to disk.  Note that you shouldn't get corrupted data
> from this, but in many cases losing data is just as bad, so we don't recommend
> it.  If you really want to do this, you should really either shut down the
> database or get LVM going.
I have not done many tests yet. But in principle I was able to hot-backup a
cluster, or only one database, and restore it. The answer from Robert
makes me a little afraid, though. To me it means they do not fully trust their
recovery mechanism. A definitive answer from Robert is still outstanding.

In my opinion a high-grade professional database system (like PostgreSQL
is, or wants to be) should have some hot backup features. Otherwise you
are NEVER able to handle VLDBs. There was talk of a 32 TB
PostgreSQL database, and I bet my next bonus this year :-) that they
are also not able to back it up and restore it in a reasonable time.

Regards Oli

---

Oli Sennhauser
Database-Engineer (Oracle  PostgreSQL)
Rebenweg 6
CH - 8610 Uster / Switzerland
Phone (+41) 1 940 24 82 or Mobile (+41) 79 450 49 14
e-Mail [EMAIL PROTECTED]
Website http://mypage.bluewin.ch/shinguz/PostgreSQL/


Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Zeugswetter Andreas SB SD
> > Q2: New situation: Why is it not a good idea to back up the database
> > files of a cluster incl. all c_log and x_log (log files last) to get a
> > physical hot backup?
> > In principle it is the same situation as a server which is crashing
> > (not at once but over some period of time). After restoring, it should do a redo
> > and rollback automatically, like after a crash. This method (physical
> > hot backup) would improve backup and restore times dramatically.
>
> The answer from Robert Treat was:
>
> > Essentially I think you're right, it should behave much like a crashing
> > server.  The main reason why people don't recommend it is that (depending on
> > your OS setup) there is the potential to lose data that has been committed but
> > not actually written to disk.  Note that you shouldn't get corrupted data
> > from this, but in many cases losing data is just as bad, so we don't recommend
> > it.  If you really want to do this, you should really either shut down the
> > database or get LVM going.

The key issue here is to have a pg_control file to start from, with a
finished checkpoint from before you started the backup. Then you need to
ensure that you have all logfiles from that checkpoint until the backup finishes.
The last thing to back up must be the last active x_log.
It would probably be a good idea not to have a vacuum running concurrently :-)

And then you need to do a lot of tests, since nobody else does it yet.
I think this is an issue: it is such high risk that nobody will
step up easily and say that it is safe.

Andreas



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Tom Lane
Andreas Pflug [EMAIL PROTECTED] writes:
> This is somewhat complementary to WAL and PITR. I'm looking for a fast
> way to dump and restore a complete database, like physical file copy,
> without shutting down the backend. I was thinking of a BACKUP command
> that streams out the files including any indexes and non-vacuumed
> tuples. A database recreated from that wouldn't be as clean as a
> pg_dump/pg_restored database, but it would be up much faster, and there
> wouldn't be any dependency problem.

It's already intended to support this as part of the PITR work.  The
idea is you force a checkpoint and then make a tar-format dump of the
database tree (tar or whatever floats your boat, but anyway a
filesystem-level backup).  The database need not be stopped while you do
this, and you don't need a filesystem that can do snapshots or anything
fancy like that.  The tar dump itself most likely does not represent a
consistent state of the database by the time you are done making it.
That is okay, because you have also been archiving off to tape (or
someplace) all the WAL data generated since that pre-dump checkpoint.
You can continue archiving the WAL series for however far forward from
the original dump you feel like.  If you need to recover, you reload the
database from the tar dump and then replay the WAL series against it.
This is indistinguishable from a crash recovery situation --- the
inconsistent tar dump looks just like a disk that has received some
but not all of the updates since the last checkpoint.  Replay will fix it.
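
A rough sketch of that procedure, assuming the missing management code
existed (paths and the archive destination are placeholders):

$ psql -c "CHECKPOINT" template1     # force a checkpoint first
$ tar cf /backup/base.tar $PGDATA    # filesystem-level copy, server still running
$ cp $PGDATA/pg_xlog/* /backup/wal/  # and keep archiving WAL segments from the
                                     # checkpoint onward, for as long as you
                                     # want to be able to roll forward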

The cool thing about this is you can actually bring the DB to the state
it was in at any specific point in time covered by your WAL archive ---
just run the WAL replay as far as you want, then stop.  Solves the
"junior DBA deleted all my data Saturday morning" problem, thus PITR.
Now the uncool thing is you need massive amounts of secondary storage to
archive all that WAL data, if your installation has heavy update
activity.  But it seems to me it would address the need you mention
above --- you'd just not bother to continue archiving WAL past the end
of the dump operation.

In principle you could do this today, but we don't have enough
support code in place to make it work smoothly, eg WAL segment files
aren't labeled with enough identifying information to let you manage
an archive full of 'em.  Still it doesn't seem that far away.

> This doesn't really replace pg_dump/pg_restore, because it probably
> wouldn't be able to upgrade a cluster.

Right, any such physical dump would be limited to restoring a whole
cluster as-is: no imports into other clusters, no selectivity, no fancy
games.  The main reason is you'd have to dump and restore pg_clog along
with the data files.

regards, tom lane



Re: [HACKERS] statistics about temp tables ...

2003-11-26 Thread Tom Lane
Hans-Jürgen Schönig [EMAIL PROTECTED] writes:
> Recently I came across a simple issue which made me think.
> When we create a temp table (SELECT INTO, CREATE TABLE AS), the planner
> won't know anything about its content after creating it.

Run ANALYZE on the temp table, if you intend to use it enough to justify
gathering stats about it.  VACUUM is more work than needed.
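
Continuing the example upthread (output approximate):

test=# ANALYZE tmp;
ANALYZE
test=# explain select * from tmp;
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on tmp  (cost=0.00..1.02 rows=2 width=4)
(1 row)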

regards, tom lane



Re: [HACKERS] Providing anonymous mmap as an option of sharing memory

2003-11-26 Thread Shridhar Daithankar
Tom Lane wrote:
> Shridhar Daithankar [EMAIL PROTECTED] writes:
> > I was looking through the source and thought it would be worthwhile to seek
> > opinion on this proposal.
>
> This has been discussed and rejected before.  See the archives.

I went through it for details.

http://developer.postgresql.org/cvsweb.cgi/pgsql-server/doc/TODO.detail/mmap

There seem to be two objections to mmap.

1. If a backend from the last crashed postmaster still exists, then it might have
files etc. open, and that is in general not such a good idea.

2. For replacing stdio for data and WAL files with mmap: mmap does not guarantee
the order of I/O, which defeats WAL.

I covered only the first point in my post. IMO it is not such an unsolvable
problem. If a postmaster crashes hard but leaves a backend running, would it
clean up the pid file etc.? I don't think so. So if a postmaster can start in a
'pid-clean' state, then it is guaranteed that no children are left around.

There were issues with Linux not supporting MAP_SHARED and MAP_ANONYMOUS
simultaneously, but those are quite old messages, from 1998, talking about
Linux 2.0.x. I don't think that is true anymore, but I need to check.

Too bad FreeBSD's MAP_NOSYNC is not standard; otherwise, even for point 2, it
could have been considered.

Did I miss something?

 Shridhar



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread ow

--- Tom Lane [EMAIL PROTECTED] wrote:
> > This doesn't really replace pg_dump/pg_restore, because it probably
> > wouldn't be able to upgrade a cluster.
>
> Right, any such physical dump would be limited to restoring a whole
> cluster as-is: no imports into other clusters, no selectivity, no fancy
> games.  The main reason is you'd have to dump and restore pg_clog along
> with the data files.

But that would not help people who would HAVE to use pg_dump/pg_restore (e.g.
to backup/restore a single schema), would it? Depending on the db size, etc.,
creation of FK constraint(s) may take many hours. How should this be handled
then?

Thanks






[HACKERS] NetBSD Sparc OK

2003-11-26 Thread Peter Eisentraut
FYI, I've successfully tested on NetBSD Sparc and added it to the
supported list for 7.4.1.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Sailesh Krishnamurthy
>>>>> "Neil" == Neil Conway [EMAIL PROTECTED] writes:

Neil> It occurred to me that these kinds of poor planning
Neil> decisions could easily be detected by PostgreSQL itself:
Neil> after we've finished executing a plan, we can trivially
Neil> compare the # of results produced by each node in the query
Neil> tree with the # of results the planner expected that node to
Neil> produce (look at EXPLAIN ANALYZE, for example). If the

Indeed. This is the approach being followed by the LeO project
(Learning Optimizer) at IBM Almaden. 

http://www.almaden.ibm.com/software/dm/SMART/leo.shtml

There is a VLDB paper that describes it.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh





Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
> It occurred to me that these kinds of poor planning decisions could
> easily be detected by PostgreSQL itself: after we've finished
> executing a plan, we can trivially compare the # of results produced
> by each node in the query tree with the # of results the planner
> expected that node to produce (look at EXPLAIN ANALYZE, for
> example). If the estimate is off by a significant margin (say, 300%),
> we could perhaps emit a HINT suggesting that the user re-run ANALYZE

I think such a thing would have such a low signal-to-noise ratio as to
be useless :-(.  As you note, there are many places where the planner's
estimate is routinely off by more than 3x (or any other threshold you
might pick instead).  In some situations that doesn't really matter,
as the same plan would have gotten picked anyway.  Also, since 7.2 came
out what we've seen more and more is cases where the row count estimate
is acceptably good, but the wrong plan was picked anyway because of
deficiencies in the cost equations.

The question you really want to know about is not whether the row count
estimate is close, it's whether another plan could have done better.

regards, tom lane



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-26 Thread Zeugswetter Andreas SB SD

> > There are no such libraries.  I keep hearing ICU, but that is much too
> > bloated.
>
> At least it is kind of a standard and also something that will be
> maintained for the foreseeable future; it also has a compatible license and
> is available on all platforms of interest to postgresql.

And it is used for DB/2 and Informix, so it must be quite feature-complete
for DB-relevant stuff.

Andreas



Re: [HACKERS] Providing anonymous mmap as an option of sharing memory

2003-11-26 Thread Tom Lane
Shridhar Daithankar [EMAIL PROTECTED] writes:
> I covered only the first point in my post. IMO it is not such an
> unsolvable problem.  If a postmaster crashes hard but leaves a backend
> running, would it clean up the pid file etc.? I don't think so. So if a
> postmaster can start in a 'pid-clean' state, then it is guaranteed that
> no children are left around.

And that helps how?  The problem is to detect whether there are any
children left from the old postmaster, when what you have to work from
is the pid-file it left behind.

In any case, you're still handwaving away the very real portability
issues around mmap.  Linux is not the universe, and Linux+BSD isn't
either.

We might still have considered it, despite the negatives, if anyone had
been able to point to any actual *advantages* of mmap.  There are none.
Yes, the SysV shmem API is old and ugly and crufty, but it does what we
need it to do.

regards, tom lane



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Greg Stark

Neil Conway [EMAIL PROTECTED] writes:

> It occurred to me that these kinds of poor planning decisions could easily
> be detected by PostgreSQL itself: after we've finished executing a plan, we
> can trivially compare the # of results produced by each node in the query
> tree with the # of results the planner expected that node to produce

There's a dual to this as well. If the results were very close but the actual
time taken to run the node doesn't match the cost calculated then some
optimizer parameter needs to be adjusted. Either one of the cost_* parameters
or random_page_cost, or effective_cache_size or...

I'm not sure it's as obvious what to put in the HINT though. Ideally these
results would have to be gathered and pumped through linear optimization
algorithms which is a lot more work.

-- 
greg




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Tom Lane
ow [EMAIL PROTECTED] writes:
> --- Tom Lane [EMAIL PROTECTED] wrote:
> > Right, any such physical dump would be limited to restoring a whole
> > cluster as-is: no imports into other clusters, no selectivity, no fancy
> > games.

> But that would not help people who would HAVE to use pg_dump/pg_restore (e.g.
> to backup/restore a single schema), would it? Depending on the db size, etc.,
> creation of FK constraint(s) may take many hours. How should this be handled
> then?

Quite honestly, I think they should check their foreign keys.  In a
partial restore situation there is no guarantee that the referenced
table and the referencing table are being restored at the same time from
the same dump.  An override in that situation looks like a great tool
for shooting yourself in the foot.

People might be more interested in debating this topic with you if we
hadn't discussed it at length just a couple months back.  There wasn't
consensus then that we had to offer an escape hatch, and you've not
offered any argument that wasn't made before.

regards, tom lane



Re: [HACKERS] 7.4final regression failure on uw713

2003-11-26 Thread Larry Rosenman
On Tue, 25 Nov 2003, Tom Lane wrote:

> [EMAIL PROTECTED] writes:
> > Don't know if it's bad, but make check reports a regression failure on
> > join.
>
> I believe we'd determined that this is an acceptable platform-specific
> behavior.

Yep.  Due to the new qsort() in 713UP3.

LER





-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Neil Conway
Tom Lane [EMAIL PROTECTED] writes:
> I think such a thing would have such a low signal-to-noise ratio as
> to be useless :-(.  As you note, there are many places where the
> planner's estimate is routinely off by more than 3x (or any other
> threshold you might pick instead).

I wonder, perhaps we could add a certainty parameter to the
estimated query plan + result sizes + costs produced by the
planner. That way, when we run into a planner deficiency we can
basically mark the relevant portion of the query tree as a WAG, and
not bother with emitting hints for it.

> In some situations that doesn't really matter, as the same plan
> would have gotten picked anyway.

The hint is NOT "the chosen plan was non-optimal"; the hint is "the
query planner did not produce an accurate row count estimate for this
node in the query tree". The chosen query plan may or may not be
optimal -- we're merely pointing out that we chose the plan we did on
shaky grounds. The hint might just as well indicate a problem with
another query that happens to apply a similar predicate to the column
in question.
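
For instance, with the two-row temp table from today's statistics thread,
the mismatch is visible directly in EXPLAIN ANALYZE output (approximate):

test=# explain analyze select * from tmp;
                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.02..0.04 rows=2 loops=1)
 Total runtime: 0.09 msec
(2 rows)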

> The question you really want to know about is not whether the row
> count estimate is close, it's whether another plan could have done
> better.

Perhaps, but is there a reasonable way to answer the second question?

-Neil




[HACKERS] Materialized views proposal

2003-11-26 Thread Jonathan Gardner

I apologize if this post is inappropriate.

Doing some development work, my co-worker and I discussed some
optimization strategies. One of the ideas that came up was materialized
views. Trading disk space to summarize queries, and paying for a trigger
waterfall on certain kinds of updates, seems like a small price to pay right
now in our situation.

I searched through the archives and I couldn't seem to find anything 
relevant.

I've no familiarity with the internals, but I do know the front-end
pretty well; what can I do to help implement materialized views? Is such an
idea feasible? Is there some reason why it just isn't a good idea? (And if
not, what is a better idea?)

If someone is willing to guide me, I can probably contribute a fair amount 
of C code. I do have experience with C.

The bird's eye view of the project would probably be turning a statement (is
there such a statement in SQL92?) into the creation of a table, the initial
population of the table, and the creation of several triggers strategically
placed so that the data in the materialized view table is always in sync.
Direct inserts or updates on the materialized view should be illegal,
except by the triggers. However, perhaps as views work today, we can
allow rules to be added to the table.

Certain restrictions on the materialized views should be enforced: no
mutable functions, in particular.

--
Jonathan Gardner
[EMAIL PROTECTED]
Live Free, Use Linux!




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread ow

--- Tom Lane [EMAIL PROTECTED] wrote:
> Quite honestly, I think they should check their foreign keys.

Generally speaking, I agree. The problem is that verification of FK
constraint(s) may take too long, depending on the size of the db and other
conditions. In my case, on test data, it takes about 1 hour to create tables
and copy the data, then about 40 min to create indexes, then 4.5 hours to
create one (1) FK constraint. In production, we'll have 10-100x more data than
we have for testing.
If we have a problem in production, the time necessary to restore the db is
simply going to kill us.

> People might be more interested in debating this topic with you if we
> hadn't discussed it at length just a couple months back.  There wasn't
> consensus then that we had to offer an escape hatch, and you've not
> offered any argument that wasn't made before.

I'm simply presenting a problem for which I currently do not see any solution
(it's very important for us to be able to restore the db within a reasonable amount
of time). If there's no solution and none is planned, then we cannot use pgsql,
can we?

Thanks








[HACKERS] building outside source tree

2003-11-26 Thread Neil Conway
Building PostgreSQL outside the source tree is slightly broken:

(Using the current CVS HEAD code)

$ mkdir ~/test-pg-build
$ cd ~/test-pg-build
$ ../pgsql/configure
[ output omitted]
$ make
[ output omitted; make succeeds ]
$ touch ../pgsql/configure # i.e. cvs up
$ make

The final make command invokes 'config.status --recheck', which
invokes '../pgsql/configure --no-create --no-recursion'. The re-run of
configure succeeds, but the rest of the make fails abruptly:

make: *** No rule to make target `src/Makefile.global.in', needed by
`src/Makefile.global'.  Stop.

When 'configure' has been updated, we use config.status to re-run
configure with the arguments that were previously specified. The
problem is caused by the '--no-create' flag passed to configure by
config.status: in configure.in we don't set the 'vpath_build' variable
if '--no-create' has been specified. 

(Therefore, when src/Makefile.global is recreated after configure is
re-run, the 'vpath_build' variable isn't set, and a vpath build
(i.e. a build outside the source tree) obviously fails.)
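
One possible fix, as a sketch (the srcdir test is an assumption about how to
detect a vpath build, not an actual patch): compute vpath_build in
configure.in unconditionally, instead of only when configure will create
output files:

# run regardless of --no-create, so 'config.status --recheck'
# still ends up with vpath_build set
if test "$srcdir" != "." ; then
    vpath_build=yes
else
    vpath_build=no
fi
AC_SUBST(vpath_build)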

Any comments on how we should fix this?

-Neil




Re: [HACKERS] Materialized views proposal

2003-11-26 Thread Hannu Krosing
Jonathan Gardner wrote on Wed, 26.11.2003 at 19:03:

> I apologize if this post is inappropriate.
>
> Doing some development work, my co-worker and I discussed some
> optimization strategies. One of the ideas that came up was materialized
> views. Trading disk space to summarize queries, and paying for a trigger
> waterfall on certain kinds of updates, seems like a small price to pay right
> now in our situation.
>
> I searched through the archives and I couldn't seem to find anything
> relevant.
>
> I've no familiarity with the internals, but I do know the front-end
> pretty well; what can I do to help implement materialized views?

First, you could start by implementing materialized views manually,
using tables and triggers, to get a feel for what should be generated.

Next, still working from the frontend, try to make some code (probably in
some RAD/scripting language) which does the transform from CREATE VIEW
(or whatever) syntax (or a manually constructed syntax tree) into CREATE
TABLE + CREATE TRIGGER statements; a hand-rolled example follows below.

If you have the above working well, talk about moving this to the backend.
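
A minimal hand-rolled version of the idea, just for the feel of it (all names
are illustrative, and the trigger does a naive full refresh rather than an
incremental update):

CREATE TABLE order_totals AS
    SELECT customer_id, sum(amount) AS total
      FROM orders GROUP BY customer_id;

CREATE FUNCTION refresh_order_totals() RETURNS trigger AS '
BEGIN
    -- naive: rebuild the whole summary on any change to orders
    DELETE FROM order_totals;
    INSERT INTO order_totals
        SELECT customer_id, sum(amount)
          FROM orders GROUP BY customer_id;
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER order_totals_sync
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT EXECUTE PROCEDURE refresh_order_totals();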

> Is such an
> idea feasible? Is there some reason why it just isn't a good idea? (And if
> not, what is a better idea?)
>
> If someone is willing to guide me, I can probably contribute a fair amount
> of C code. I do have experience with C.

What is needed are good algorithms. Writing C code is secondary to that.

A similar problem has kept us from implementing updatable views for quite
some time.


Hannu




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Greg Stark

ow [EMAIL PROTECTED] writes:

> --- Tom Lane [EMAIL PROTECTED] wrote:
> > Quite honestly, I think they should check their foreign keys.

What should I do if I *know* there will be a FK failure but I want to correct
it manually? Perhaps by creating all the necessary target records, perhaps by
deleting or updating the dead references. Perhaps with a mix of these.

As it stands I have to delete the FK constraint, load the table, and fix the
data. Then recreate the FK constraint -- with the danger that I'll get the
definition wrong -- and wait for the constraint to be verified.

If I could disable and reenable the constraint the danger that I would get the
definition wrong would be eliminated. And if I had already done the work to
ensure there were no broken relationships I would optionally be able to skip
the redundant automatic check. I could even have done the verification myself
while the data wasn't live for example.
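
Something along these lines (hypothetical syntax; nothing like it exists today):

ALTER TABLE child DISABLE CONSTRAINT child_parent_fk;  -- hypothetical
COPY child FROM '/tmp/child.dat';
-- ... fix up the broken references by hand ...
ALTER TABLE child ENABLE CONSTRAINT child_parent_fk;   -- optionally WITHOUT CHECK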

The database is a tool. It's annoying to have a tool that tries to outsmart
the user.

-- 
greg




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Andreas Pflug
Tom Lane wrote:

> Andreas Pflug [EMAIL PROTECTED] writes:
> > This is somewhat complementary to WAL and PITR. I'm looking for a fast
> > way to dump and restore a complete database, like physical file copy,
> > without shutting down the backend. [...]
>
> It's already intended to support this as part of the PITR work.  The
> idea is you force a checkpoint and then make a tar-format dump of the
> database tree [...]
> Now the uncool thing is you need massive amounts of secondary storage to
> archive all that WAL data,

Shouldn't be a problem, since there are few databases out there
worldwide exceeding today's average disk capacity...

> if your installation has heavy update
> activity.  But it seems to me it would address the need you mention
> above --- you'd just not bother to continue archiving WAL past the end
> of the dump operation.

PITR is cool, no question; it's more than I've been requesting. When the
database server burns, I'd be quite happy if I could restore to my
latest tape's point in time, since the WAL log disk probably isn't
functional either. So having a fast backup of the snapshot from when the
backup CHECKPOINT was issued would be enough, no WAL replay needed.

> In principle you could do this today, but we don't have enough
> support code in place to make it work smoothly, eg WAL segment files
> aren't labeled with enough identifying information to let you manage
> an archive full of 'em.  Still it doesn't seem that far away.

So I issue CHECKPOINT, and tar the cluster or database. Still, I have two
questions:
- how to restore a single database?
- while tar is running, CHECKPOINT(n+1) might be recorded in some files,
while others have CHECKPOINT(n). How does the backend know to roll back
to CHECKPOINT(n)?

Regards,
Andreas




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Alvaro Herrera
On Wed, Nov 26, 2003 at 11:59:33AM -0500, Neil Conway wrote:

> > In some situations that doesn't really matter, as the same plan
> > would have gotten picked anyway.
>
> The hint is NOT "the chosen plan was non-optimal"; the hint is "the
> query planner did not produce an accurate row count estimate for this
> node in the query tree".

Maybe it could only be done for SeqScan and IndexScan nodes, which are
probably the most common source of bad estimates related to poor
statistics.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
At least to kernel hackers, who really are human, despite occasional
rumors to the contrary (LWN.net)



[HACKERS] Optimizer cost calculation

2003-11-26 Thread Greg Stark


It seems to me that the root cause of some of the optimizer failures that come
up is the optimizer's attempt to assign a single expected cost value to every
choice. In fact it seems it should also have a "minimum cost" and "maximum
cost" in addition to the expected cost. Often the optimizer is faced with
two possible choices, one of which appears slightly better but in fact has a
much larger failure mode than the alternative.

For example, consider a simple join between two tables where the optimizer
thinks approximately 10% of the second table will be used. It will probably be
just at the threshold of using a full table scan with a merge or hash join
instead of the simple nested loop.

In fact the nested loop has only a small linear penalty (2-4 times slower even
if the *entire* table is read) if it's mistakenly chosen. Whereas if the
selectivity is estimated wrong and only a few records are needed the full
table scan can be thousands of times slower than the nested loop.

If the nested loop calculated the min/max/expected costs based on 1 row, the
full table, and the expected number of records, and found that while its
expected value is slightly higher than the equivalent for the merge join, its
max is only 2x higher than the merge join's but its minimum is thousands of
times smaller, then it should consider choosing the nested loop because of
the greater risk attached to choosing the merge join.


-- 
greg




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Andreas Pflug
Greg Stark wrote:

> If I could disable and reenable the constraint the danger that I would get the
> definition wrong would be eliminated. And if I had already done the work to
> ensure there were no broken relationships I would optionally be able to skip
> the redundant automatic check. I could even have done the verification myself
> while the data wasn't live, for example.

Since FKs are implemented as triggers, you could disable all triggers on
the table right now, no? It could be a bit more comfortable, I agree, and
I hope for an upcoming DISABLE TRIGGER command.
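
That amounts, roughly, to the catalog hack that pg_restore's
--disable-triggers option also relies on (the table name is a placeholder;
superuser only, use with care):

-- turn off all triggers, including FK triggers, on one table
UPDATE pg_catalog.pg_trigger SET tgenabled = false
 WHERE tgrelid = (SELECT oid FROM pg_catalog.pg_class
                   WHERE relname = 'child');
-- ... load or fix the data ...
UPDATE pg_catalog.pg_trigger SET tgenabled = true
 WHERE tgrelid = (SELECT oid FROM pg_catalog.pg_class
                   WHERE relname = 'child');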

While talking about this: I could add ENABLE/DISABLE TRIGGER
functionality to pgadmin3. Unfortunately, on pg_trigger.tgenabled
there's still the comment "not presently checked everywhere it should
be", so disabling a trigger by setting this to false does not work
reliably. I wonder if this is still true for 7.4. I can't imagine that
this would be so hard to fix.

Regards,
Andreas




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Tom Lane
Andreas Pflug [EMAIL PROTECTED] writes:
> > In principle you could do this today, but we don't have enough
> > support code in place to make it work smoothly, eg WAL segment files
> > aren't labeled with enough identifying information to let you manage
> > an archive full of 'em.  Still it doesn't seem that far away.

> So I issue CHECKPOINT, and tar the cluster or database. Still, I have two
> questions:
> - how to restore a single database?

You don't.  As I said, any physical backup is going to be
all-or-nothing.  These techniques are not a replacement for pg_dump.

> - while tar is running, CHECKPOINT(n+1) might be recorded in some files,
> while others have CHECKPOINT(n). How does the backend know to roll back
> to CHECKPOINT(n)?

That's part of the management code that we need to write before this
will really be very useful; you need to be able to associate the
starting time of a tar dump with the most recent previous CHECKPOINT
in the WAL logs.  Not hard in principle, just takes some code we ain't
got yet.

regards, tom lane



Re: [HACKERS] Providing anonymous mmap as an option of sharing memory

2003-11-26 Thread Bruce Momjian
Tom Lane wrote:
> Shridhar Daithankar [EMAIL PROTECTED] writes:
> > I covered only the first point in my post. IMO it is not such an
> > unsolvable problem. [...]
>
> And that helps how?  The problem is to detect whether there are any
> children left from the old postmaster, when what you have to work from
> is the pid-file it left behind.
>
> In any case, you're still handwaving away the very real portability
> issues around mmap.  Linux is not the universe, and Linux+BSD isn't
> either.
>
> We might still have considered it, despite the negatives, if anyone had
> been able to point to any actual *advantages* of mmap.  There are none.
> Yes, the SysV shmem API is old and ugly and crufty, but it does what we
> need it to do.

Plus, many operating systems can lock SysV shmem into RAM to prevent it
from being swapped out.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Andreas Pflug
Tom Lane wrote:

> > - how to restore a single database?
>
> You don't.  As I said, any physical backup is going to be
> all-or-nothing.  These techniques are not a replacement for pg_dump.

That's sad. I've been backing up and restoring single databases from a 
cluster frequently, so I'd really like the database to be selectable.

> That's part of the management code that we need to write before this
> will really be very useful; you need to be able to associate the
> starting time of a tar dump with the most recent previous CHECKPOINT
> in the WAL logs.  Not hard in principle, just takes some code we ain't
> got yet.

So I'll lie back patiently (more or less :-)

Regards,
Andreas




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Stephan Szabo

On Wed, 26 Nov 2003, ow wrote:

> > People might be more interested in debating this topic with you if we
> > hadn't discussed it at length just a couple months back.  There wasn't
> > consensus then that we had to offer an escape hatch, and you've not
> > offered any argument that wasn't made before.
>
> I'm simply presenting a problem for which I currently do not see any solution
> (it's very important for us to be able to restore the db within a reasonable amount
> of time). If there's no solution and none is planned, then we cannot use pgsql,
> can we?

You can make your own solution; that's the nice thing about open source
stuff.  If you wanted to go the SET-variable route to control alter-time
checks of CHECK and FOREIGN KEY constraints, it's almost certainly less
than an hour's worth of work.
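
That is, something like this (the GUC name and behavior are hypothetical,
just sketching the route described above):

SET skip_alter_constraint_checks = on;   -- hypothetical GUC
ALTER TABLE child
    ADD CONSTRAINT child_parent_fk FOREIGN KEY (parent_id)
    REFERENCES parent (id);              -- would skip the verification scan
SET skip_alter_constraint_checks = off;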



Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-26 Thread Andreas Pflug
Hannu Krosing wrote:

> You are just shifting the interface problems to a place needing way more
> changes in the backend. There will be some problems either way.

Not quite. Certainly, basing internal storage on attstoragenum is more
work in the backend, but less (precisely: zero) work on an unknown
number of frontend tools and apps.

> also, tools needing knowledge should start using the information schema as
> much as they can, making internal reshufflings less of a problem.

We had this discussion; information_schema doesn't deliver enough info
needed for admin tools.

> > This way, the user interface doesn't change, and all those
> > SELECT ... FROM pg_attribute ORDER BY attnum continue delivering the
> > expected result.
>
> Depending on what you expect ;)

Usually, nobody should care about the column ordering, but for those
unfortunate guys that rely on a specific SELECT * ordering, the list of
columns displayed in admin tools must show that ordering; this is what
current admin tools expect from attnum. No SQL user would ever care
about internal storage details/pointers/counters, so any admin tool
would need to ORDER BY CASE WHEN version=7.5 THEN attpos ELSE attnum
END (and the unique key to pg_attribute, as seen from the tool, changes
from refoid/attnum to refoid/attindex too).

> If you expect the above to give you all active columns ordered as they
> are stored, then it does not give you what you expect.
> Btw, most of these concerns (and more) were already raised when DROP
> COLUMN was done, causing gaps in attnum. There were a lot of doomsday
> prophecies, but in the end it went quite smoothly.

I don't bother about missing attnum values; even 1,2,3,5,6,8 is nicely
ordered.

> The tools needing
> internal knowledge about storage (meaning any tool doing select .. from
> pg_...) have always needed some upgrades for new versions.

Yes, but changes to pg_... should retain the usual meanings as much as
possible, so older tools continue to work. The discussed change is
problematic because old tools would *seem* to work ok, but their attnum
interpretation would be wrong.

> IMHO, the only behaviour visible to the common user we should worry about is
> SELECT *, and a special column for solving this is _the_ easiest way to
> do it.

Surely this is the easiest way. But it has the biggest impact on clients
too. I'm just imagining what would happen to pgAdmin3. The column number
would have to display attpos (this is what the user is interested in to
see the ordering), while index, FK and so forth will continue to display
attnum. This seems quite unwanted to me.

---
Are there any comments on the proposed lean way to alter columns for
trivial type changes?

Regards,
Andreas


Re: [HACKERS] 4 Clause license?

2003-11-26 Thread Rod Taylor
On Thu, 2003-11-20 at 08:34, Rod Taylor wrote:
> > I think maybe the simplest thing is for me to prepare a patch that rips
> > that code out and replaces it with a (slightly simpler - less umask
> > hacking required, I think) piece of code that I will write.
>
> The FreeBSD folks sorted it out for us.
>
> Everyone's names should be in the copyright for the file. The licence
> portion should be the 3-clause version -- no advertising clause.
>
> I think borrowing should be encouraged, and now that we know what
> license / copyright we need to carry over, this can be done without
> worry.

Ignore that... I hadn't read Terry's statement yet -- it got snagged by
the spam filter for some reason.

Since we have other code which has the same issue, I think this needs to
be solved. Can anyone show what Apple did?

-- 
Rod Taylor rbt [at] rbt [dot] ca

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-26 Thread Kurt Roeckx
On Tue, Nov 25, 2003 at 04:19:05PM -0500, Tom Lane wrote:
 
 UCS-2 is impractical without some *extremely* wide-ranging changes in
 the backend.  To take just the most obvious point, doesn't it require
 allowing embedded zero bytes in text strings?

If you're going to use unicode in the rest of the backend, you'll
have to be able to deal with them anyway.  You can't use normal C
string functions.


Kurt




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Stephan Szabo

On Wed, 26 Nov 2003, Tom Lane wrote:

> Quite honestly, I think they should check their foreign keys.  In a
> partial restore situation there is no guarantee that the referenced
> table and the referencing table are being restored at the same time from
> the same dump.  An override in that situation looks like a great tool
> for shooting yourself in the foot.
>
> People might be more interested in debating this topic with you if we
> hadn't discussed it at length just a couple months back.  There wasn't
> consensus then that we had to offer an escape hatch, and you've not
> offered any argument that wasn't made before.

I actually thought the majority in the past discussion felt that an
escape hatch was a good idea, but that the discussion broke down in trying
to determine what sort of hatch that might be (IIRC, it veered off into the
general discussion of disabling constraints for normal operation as
opposed to at ALTER time).



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Neil Conway
Greg Stark [EMAIL PROTECTED] writes:
> There's a dual to this as well. If the results were very close but
> the actual time taken to run the node doesn't match the cost
> calculated then some optimizer parameter needs to be adjusted.

I was thinking about this, but I couldn't think of how to get it to
work properly:

 (1) The optimizer's cost metric is somewhat bogus to begin with.
 ISTM that translating a cost of X into an expected runtime of
 Y msecs is definitely not trivial to do.

 (2) The size of a node's result relation does not depend upon
 anything outside of PostgreSQL, whereas the exact time it
 takes to produce that result relation depends on a wide
 collection of external factors. For example, if the system is
 under heavy load, queries will take longer than normal to
 run. Or, if the query invokes a function that happens to
 occasionally block waiting on some resource, the execution
 time of the query could be wildly unpredictable.

 (3) ISTM we couldn't produce a really helpful hint message, even
 if we somehow resolved #1 and #2

-Neil




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Robert Treat
On Wed, 2003-11-26 at 12:43, Andreas Pflug wrote:
 Greg Stark wrote:
 
 If I could disable and reenable the constraint the danger that I would get the
 definition wrong would be eliminated. And if I had already done the work to
 ensure there were no broken relationships I would optionally be able to skip
 the redundant automatic check. I could even have done the verification myself
 while the data wasn't live for example.
   
 
 
Since FKs are implemented as triggers, you could disable all triggers on
the table right now, no? It could be a bit more comfortable, I agree, and I
hope for an upcoming DISABLE TRIGGER command.

ISTM I've done this before... from a pg_dump -Fc backup, first do a
pg_restore -s (schema only) and then a pg_restore -a
--disable-triggers to load the data without checking foreign keys.

There's certainly potential for trouble with this method, I suppose, but it
seems like it accomplishes what the original poster requires...
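
Roughly, the sequence would be something like this (database and file names
made up; --disable-triggers needs superuser rights at restore time):

    $ pg_dump -Fc mydb > mydb.dump
    $ createdb newdb
    $ pg_restore -s -d newdb mydb.dump                      # schema, FKs included
    $ pg_restore -a --disable-triggers -d newdb mydb.dump   # data, FK triggers off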


Robert Treat

-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Greg Stark
Neil Conway [EMAIL PROTECTED] writes:

 I was thinking about this, but I couldn't think of how to get it to
 work properly:
 
  (1) The optimizer's cost metric is somewhat bogus to begin with.
  ISTM that translating a cost of X into an expected runtime of
  Y msecs is definitely not trivial to do.

At least for all the possible plans of a given query at a specific point in
time, the intention is that the cost be proportional to the execution time.

 the exact time it takes to produce that result relation depends on a wide
 collection of external factors.

That's a valid point. The ms/cost factor may not be constant over time.
However, I think in the normal case this number will tend towards a fairly
consistent value across queries and over time. It will be influenced somewhat
by things like cache contention with other applications, though.
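
As a made-up illustration: if EXPLAIN ANALYZE shows a plan with total cost
2500 finishing in 500 ms, the factor is 0.2 ms per cost unit; a cost-10000
plan on the same box would then be expected to take about 2 seconds, and a
30-second actual runtime would point at the cost model (or the machine).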

On further thought the real problem is that these numbers are only available
when running with explain on. As shown recently on one of the lists, the
cost of the repeated gettimeofday calls can be substantial. It's not really
feasible to suggest running all queries with that profiling.

-- 
greg




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Neil Conway
Greg Stark [EMAIL PROTECTED] writes:
 At least for all the possible plans of a given query at a specific
 point in time the intention is that the cost be proportional to the
 execution time.

Why is this relevant? 

Given a cost X at a given point in time, the system needs to derive an
expected runtime Y, and compare Y with the actual runtime. I think
that producing Y given an arbitrary X involves so many parameters as
to be practically impossible for us to compute with any degree of
accuracy.

 That's a valid point. The ms/cost factor may not be constant over
 time.  However I think in the normal case this number will tend
 towards a fairly consistent value across queries and over time.

It might be true in the normal case, but that doesn't seem very
helpful to me: in general, the mapping of plan costs to execution time
can vary wildly over time. Spewing hints to the log whenever the
system's workload changes, a checkpoint occurs, or the system's RAID
array hiccups doesn't sound like a useful feature to me.

 On further thought the real problem is that these numbers are only
 available when running with explain on. As shown recently on one
 of the lists, the cost of the repeated gettimeofday calls can be
 substantial.

That sounds more like an implementation detail than the real problem
to me -- I think this proposed feature has more fundamental issues.

-Neil




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Tom Lane
ow [EMAIL PROTECTED] writes:
 --- Tom Lane [EMAIL PROTECTED] wrote:
 Quite honestly, I think they should check their foreign keys.

 Generally speaking, I agree. The problem is that verification of FK
 constraint(s) may take too long, depending on the size of the db and other
 conditions. In my case, on test data, it takes about 1 hour to create tables
 and copy the data, then about 40 min to create indexes, then 4.5 hours to
 create one (1) FK constraint.

If you're seeing this on 7.4, I'd like to see the details of the exact
commands being issued.  If it's not 7.4, it's not a relevant
measurement.

regards, tom lane



Re: [HACKERS] Limiting factors of pg_largeobject

2003-11-26 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 pg_largeobject uses a loid identifier for each large object. What do we
 think it would take to move that identifier to something like bigint?

Breaking all the client-visible LO APIs, for one thing ...

 I don't really
 know the underlying internals of pg_largeobject, but it does seem that
 if we made it have:

1. A larger identifier
2. An identifier that is not typed to the underlying system (oid)
3. The ability to be indexed

We may benefit. Am I on crack?

I don't see what you're getting at with #2 and #3 at all.  OID is
perfectly indexable.

As for #1, it'd theoretically be useful, but I'm not sure about the
real-world usefulness.  If your large objects are even moderately
large (for whatever value of large applies this week), you're not
likely to be expecting to cram 4 billion of them into your database.
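
(OID is a 32-bit unsigned integer, so the hard ceiling is 2^32, about 4.3
billion distinct loids.)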

If we were doing LOs over for some reason it might be interesting to
consider this, but I think they're largely a legacy feature at this
point, and not worth that kind of effort.  It would be better to put the
development effort on creating serial access capability to toasted
fields, whereupon LOs would really be obsolete.

regards, tom lane



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 That's a valid point. The ms/cost factor may not be constant over time.
 However I think in the normal case this number will tend towards a fairly
 consistent value across queries and over time. It will be influenced somewhat
 by things like cache contention with other applications though.

I think it would be interesting to collect the numbers over a long
period of time and try to learn something from the averages.  The real
hole in Neil's original suggestion was that it assumed that comparisons
based on just a single query would be meaningful enough to pester the
user about.

 On further thought the real problem is that these numbers are only available
 when running with explain on. As shown recently on one of the lists, the
 cost of the repeated gettimeofday calls can be substantial. It's not really
 feasible to suggest running all queries with that profiling.

Yeah.  You could imagine a simplified-stats mode that only collects the
total runtime (two gettimeofday's per query is nothing) and the row
counts (shouldn't be impossibly expensive either, especially if we
merged the needed fields into PlanState instead of requiring a
separately allocated node).  Not sure if that's as useful though.

regards, tom lane



Re: [HACKERS] Limiting factors of pg_largeobject

2003-11-26 Thread Joshua D. Drake

Breaking all the client-visible LO APIs, for one thing ...

Erck.

  1. A larger identifier
  2. An identifier that is not typed to the underlying system (oid)
  3. The ability to be indexed

  We may benefit. Am I on crack?

 I don't see what you're getting at with #2 and #3 at all.  OID is
 perfectly indexable.

Well, number 2 is that we have a limit on the total number of OIDs, yes?
Which means we could theoretically run out of OIDs because of
pg_largeobject.

The ability to be indexed is obviously there, but one problem we have is
that you can't create an index on a system table, at least not a user-level
index. Are there system-level indexes that I am unaware of?
 

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL




Re: [HACKERS] Limiting factors of pg_largeobject

2003-11-26 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 The ability to be indexed is obviously there but one problem we have is
 that you can't create an index on a system table at least not a user
 level index. Is there system level indexes that I am unaware of?

pg_largeobject already has an index (which is used by all the LO
operations).  Again, don't see what the width of the object ID column
has to do with it.

regards, tom lane



Re: [HACKERS] Limiting factors of pg_largeobject

2003-11-26 Thread Joshua D. Drake

pg_largeobject already has an index (which is used by all the LO
operations).  Again, don't see what the width of the object ID column
has to do with it.
 

I was more after not having an OID than the width of the ID column.



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Stephan Szabo

On Wed, 26 Nov 2003, Tom Lane wrote:

 ow [EMAIL PROTECTED] writes:
  --- Tom Lane [EMAIL PROTECTED] wrote:
  Quite honestly, I think they should check their foreign keys.

  Generally speaking, I agree. The problem is that verification of FK
  constraint(s) may take too long, depending on the size of the db and other
  conditions. In my case, on test data, it takes about 1 hour to create tables
  and copy the data, then about 40 min to create indexes, then 4.5 hours to
  create one (1) FK constraint.

 If you're seeing this on 7.4, I'd like to see the details of the exact
 commands being issued.  If it's not 7.4, it's not a relevant measurement.

IIRC, he was. I think the thing causing the difference between his times
and the ones we saw typically when doing the tests was that he didn't have
an index on the fktable's referencing column.
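
For reference, the usual fix is a plain btree index on the referencing
column (table and column names hypothetical):

    CREATE INDEX fktable_fkcol_idx ON fktable (fkcol);

Without it, every referential check or cascaded action against fktable has
to scan the whole table.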



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Andreas Pflug
Stephan Szabo wrote:



IIRC, he was. I think the thing causing the difference between his times
and the ones we saw typically when doing the tests was that he didn't have
an index on the fktable's referencing column.
 

A common mistake; I can't count how often I've made this one... And it's not
easy to find, because EXPLAIN won't explain triggers.
I'm planning to create some kind of FK index wizard in pgAdmin3, which finds
out about FKs using columns that aren't covered by an appropriate index.
Maybe this check could be performed (as a NOTICE) when the FK is created?
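
A rough sketch of the catalog query such a wizard could start from (against
the 7.4 catalogs; it handles only single-column FKs and only looks at the
leading index column, so a real tool would need to cover multicolumn keys):

    -- FK constraints whose referencing table has no index led by the FK column
    SELECT con.conname, con.conrelid::regclass AS fktable
      FROM pg_constraint con
     WHERE con.contype = 'f'
       AND NOT EXISTS (SELECT 1
                         FROM pg_index i
                        WHERE i.indrelid = con.conrelid
                          AND i.indkey[0] = con.conkey[1]);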

Regards,
Andreas




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Gavin Sherry
  On further thought the real problem is that these numbers are only available
  when running with explain on. As shown recently on one of the lists, the
  cost of the repeated gettimeofday calls can be substantial. It's not really
  feasible to suggest running all queries with that profiling.

 Yeah.  You could imagine a simplified-stats mode that only collects the
 total runtime (two gettimeofday's per query is nothing) and the row
 counts (shouldn't be impossibly expensive either, especially if we
 merged the needed fields into PlanState instead of requiring a
 separately allocated node).  Not sure if that's as useful though.

How about a PGC_POSTMASTER GUC variable which tells postgres to collect
details on the planner's estimates and compare them to actual run times.
Optionally, we could also have the executor run some/all of the possible
plans (presumably only useful for SELECTs) and keep details on the
performance of each. At postmaster shutdown (some other time?) a report
could be produced profiling all queries.

The reason I suggest this is it would have zero impact on production
databases but would allow DBAs to profile their databases with real usage
patterns in development environments.
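
Purely to illustrate the idea, something like this in postgresql.conf (the
names are hypothetical, not existing settings):

    profile_planner = true              # record estimated vs. actual runtimes
    profile_report_on_shutdown = true   # write the aggregate report at shutdown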

Gavin



Re: [HACKERS] Limiting factors of pg_largeobject

2003-11-26 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 I was more after the not having an OID than the width of the ID column.

We're still at cross-purposes then.  pg_largeobject doesn't have OIDs
(in the sense of per-row OIDs).  What I thought you were complaining
about was the chosen datatype of the LO identifier column (loid), which
happens to be OID.

regards, tom lane



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Tom Lane
Stephan Szabo [EMAIL PROTECTED] writes:
 On Wed, 26 Nov 2003, Tom Lane wrote:
 If you're seeing this on 7.4, I'd like to see the details of the exact
 commands being issued.  If it's not 7.4, it's not a relevant measurement.

 IIRC, he was. I think the thing causing the difference between his times
 and the ones we saw typically when doing the tests was that he didn't have
 an index on the fktable's referencing column.

I'm not convinced it should matter all that much, for the single-query
test method that 7.4 uses.  That's why I wanted to see details.
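
(For reference, the 7.4 bulk check is roughly of this shape, with table and
column names hypothetical:

    SELECT fk.fkcol FROM ONLY fktable fk
      LEFT OUTER JOIN ONLY pktable pk ON pk.pkcol = fk.fkcol
     WHERE pk.pkcol IS NULL AND fk.fkcol IS NOT NULL;

a single join the planner can pick a plan for, rather than one trigger
firing per row.)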

regards, tom lane



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread ow

--- Andreas Pflug [EMAIL PROTECTED] wrote:
 Stephan Szabo wrote:
 
 
 
 IIRC, he was. I think the thing causing the difference between his times
 and the ones we saw typically when doing the tests was that he didn't have
 an index on the fktable's referencing column.
   
 
 
 A common mistake; I can't count how often I've made this one...

Wrong. It's a mistake to think that you always HAVE to have an index on the
FK column. See the links below for more details:

http://archives.postgresql.org/pgsql-admin/2003-11/msg00317.php
http://archives.postgresql.org/pgsql-admin/2003-11/msg00319.php






Re: [HACKERS] Limiting factors of pg_largeobject

2003-11-26 Thread Joshua D. Drake

We're still at cross-purposes then.  pg_largeobject doesn't have OIDs
(in the sense of per-row OIDs).  What I thought you were complaining
about was the chosen datatype of the LO identifier column (loid), which
happens to be OID.
 

O.k. that was my main concern, which per your statement is unfounded ;).
If I don't have to worry about it, then that solves the issue :)

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL




Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Christopher Kings-Lynne
A common mistake; I can't count how often I've made this one... And it's not
easy to find, because EXPLAIN won't explain triggers.
I'm planning to create some kind of FK index wizard in pgAdmin3, which finds
out about FKs using columns that aren't covered by an appropriate index.
Maybe this check could be performed (as a NOTICE) when the FK is created?
Weird - I'm planning the exact same thing for phpPgAdmin!

Great minds think alike :P

Chris





[HACKERS] 7.5 Plans

2003-11-26 Thread Christopher Kings-Lynne
Hi everyone,

I'm just interested in what everyone's personal plans for 7.5
development are.

Shridhar, Gavin and I are trying to get the tablespaces stuff off
the ground.  Hopefully we'll have a CVS set up for us to work in at some
point (we didn't think getting a branch and commit privs was likely).
Changing all our $Id$ tags to $Postgres$ would make a separate CVS a lot
easier, hint hint :)

What's everyone else wanting to work on?

Chris





Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Kevin Brown
Tom Lane wrote:
 Andreas Pflug [EMAIL PROTECTED] writes:
  In principle you could do this today, but we don't have enough
  support code in place to make it work smoothly, eg WAL segment files
  aren't labeled with enough identifying information to let you manage
  an archive full of 'em.  Still it doesn't seem that far away.
 
  So I issue CHECKPOINT, and tar the cluster or database. Still, I got two 
  questions:
  - how to restore a single database
 
 You don't.  As I said, any physical backup is going to be
 all-or-nothing.  These techniques are not a replacement for pg_dump.

But this is just an artifact of the fact that the WAL is a single
instance-wide entity, rather than a per-database entity.  But since
databases are completely separate entities that cannot be simultaneously
accessed by any query (corrections welcome), there isn't any reason in
principle that the WAL files cannot also be created on a per-database
basis.

I'm sure, of course, that doing so would bring with it a new set of
problems and tradeoffs, so it might not be worth it...


-- 
Kevin Brown   [EMAIL PROTECTED]



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Bruce Momjian
Gavin Sherry wrote:
   On further thought the real problem is that these numbers are only available
   when running with explain on. As shown recently on one of the lists, the
   cost of the repeated gettimeofday calls can be substantial. It's not really
   feasible to suggest running all queries with that profiling.
 
  Yeah.  You could imagine a simplified-stats mode that only collects the
  total runtime (two gettimeofday's per query is nothing) and the row
  counts (shouldn't be impossibly expensive either, especially if we
  merged the needed fields into PlanState instead of requiring a
  separately allocated node).  Not sure if that's as useful though.
 
 How about a PGC_POSTMASTER GUC variable which tells postgres to collect
 details on the planner's estimates and compare them to actual run times.
 Optionally, we could also have the executor run some/all of the possible
 plans (presumably only useful for SELECTs) and keep details on the
 performance of each. At postmaster shutdown (some other time?) a report
 could be produced profiling all queries.
 
 The reason I suggest this is it would have zero impact on production
 databases but would allow DBAs to profile their databases with real usage
 patterns in development environments.

Great idea.  Under ideal situations, it shouldn't be needed, but things
are often less than ideal.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Alvaro Herrera
On Thu, Nov 27, 2003 at 12:40:28AM +0100, Andreas Pflug wrote:

 A common mistake, can't count how often I created this one... And not 
 easy to find, because EXPLAIN won't explain triggers.

That's a pity.  And the lack of EXPLAINing function execution, too.
Maybe it's not that hard to do?

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Just treat us the way you want to be treated + some extra allowance
for ignorance.(Michael Brusser)



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Alvaro Herrera
On Thu, Nov 27, 2003 at 10:27:22AM +0800, Christopher Kings-Lynne wrote:

 What's everyone else wanting to work on?

I want to get the nested transaction patch thingie sorted out.
I feel it's not that far away.  After that, maybe

- try using a pg_shareddepend shared catalog to check user dependencies

- get VACUUM FULL to REINDEX instead of munging individual index tuples

- try to come out with something to enable aggregates optimization by
  using an aggregate-specific function to check the plan tree

Anyway, if I get nested xacts in 7.5 I'll be more than happy ...


(I will probably be doing lots of translation work too, or maybe enable
someone else to do it ...)

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Dios hizo a Adán, pero fue Eva quien lo hizo hombre. (God made Adam, but it
was Eve who made him a man.)



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Tom Lane
Kevin Brown [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 You don't.  As I said, any physical backup is going to be
 all-or-nothing.  These techniques are not a replacement for pg_dump.

 But this is just an artifact of the fact that the WAL is a single
 instance-wide entity, rather than a per-database entity.  But since
 databases are completely separate entities that cannot be simultaneously
 accessed by any query (corrections welcome), there isn't any reason in
 principle that the WAL files cannot also be created on a per-database
 basis.

WAL is not the bottleneck ... as I already mentioned today, pg_clog (and
more specifically the meaning of transaction IDs) is what really makes a
cluster an indivisible whole at the physical level.

If you want to do separate physical dumps/restores, the answer is to set
up separate clusters (separate postmasters).  Not so hard, is it?
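
Concretely, that is just (paths and port number are made up):

    $ initdb -D /srv/pg/cluster2
    $ postmaster -D /srv/pg/cluster2 -p 5433 &

after which the second cluster can be backed up and restored at the
physical level independently of the first.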

regards, tom lane



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 That's a pity.  And the lack of EXPLAINing function execution, too.
 Maybe it's not that hard to do?

Not sure if it's hard or not, but it'd sure be a nice thing to have.

regards, tom lane



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread Kevin Brown
Tom Lane wrote:
 Kevin Brown [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  You don't.  As I said, any physical backup is going to be
  all-or-nothing.  These techniques are not a replacement for pg_dump.
 
  But this is just an artifact of the fact that the WAL is a single
  instance-wide entity, rather than a per-database entity.  But since
  databases are completely separate entities that cannot be simultaneously
  accessed by any query (corrections welcome), there isn't any reason in
  principle that the WAL files cannot also be created on a per-database
  basis.
 
 WAL is not the bottleneck ... as I already mentioned today, pg_clog (and
 more specifically the meaning of transaction IDs) is what really makes a
 cluster an indivisible whole at the physical level.
 
 If you want to do separate physical dumps/restores, the answer is to set
 up separate clusters (separate postmasters).  Not so hard, is it?

Well, aside from the fact that separate clusters have completely separate
user databases, listen on different ports, will compete with other
clusters on the same system for resources that would be better managed
by a single cluster, and generally have to be maintained as completely
separate entities from start to finish, no it's not that hard.  ;-)


The ability to restore a single large database quickly is, I think,
a reasonable request; it's just that right now it's difficult (perhaps
impossible) to satisfy that request.


It's probably something that we'll have to deal with if we want PG to be
useful to people managing really large databases on really, really big
iron, though.


-- 
Kevin Brown   [EMAIL PROTECTED]



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Marc G. Fournier
On Thu, 27 Nov 2003, Christopher Kings-Lynne wrote:

 Changing all our $Id$ tags to $Postgres$ would make a separate CVS a lot
 easier, hint hint :)

Why?  From the info pages:

`$Header$'
 A standard header containing the full pathname of the RCS file,
 the revision number, the date (UTC), the author, the state, and
 the locker (if locked).  Files will normally never be locked when
 you use CVS.

`$Id$'
 Same as `$Header$', except that the RCS filename is without a path.

So, what does changing it to $PostgreSQL$ do?  Or am I reading the wrong
part of the manual?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Rod Taylor
 So, what does changing it to $PostgreSQL$ do?  Or am I reading the wrong
 part of the manual?

After applying the patch in -patches to CVSROOT and running the update
script...

It will allow Chris and others to import the PostgreSQL source into their
own CVS tree without having to do a ton of diff editing to reimport /
export changes between the primary tree and their development tree.

Forking the main PostgreSQL tree 30 times for the various groups to do
development isn't appropriate nor necessary when we can just change the
ID that PostgreSQL uses to something different.

The BSDs went through similar measures to ensure they could maintain
multiple CVS sources without diff / patch going nuts.




Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Marc G. Fournier
On Wed, 26 Nov 2003, Rod Taylor wrote:

  So, what does changing it to $PostgreSQL$ do?  Or am I reading the wrong
  part of the manual?

 After applying the patch in -patches to CVSROOT and running the update
 script...

 It will allow Chris and others to import the PostgreSQL source into their
 own CVS tree without having to do a ton of diff editing to reimport /
 export changes between the primary tree and their development tree.

'k, but why can't that be accomplished with $Id$?

Are there any caveats to the change?  Ie. if Tom has a checkout and this
change is made, will he have problems committing?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 So, what does changing it to $PostgreSQL$ do?  Or am I reading the wrong
 part of the manual?

 The BSDs went through similar measures to ensure they could maintain
 multiple CVS sources without diff / patch going nuts.

Yeah, I have gotten similar requests from Red Hat's internal development
group.  $PostgreSQL$ doesn't do anything for *us* ... but it makes life
easier for other people trying to import PG sources into their own CVS
trees.  In light of the Grand Scheme for World Domination, that seems
like a worthwhile improvement to me ;-)

regards, tom lane



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Tom Lane
Marc G. Fournier [EMAIL PROTECTED] writes:
 'k, but why can't that be accomplished with $Id$?

$Id$ isn't much better than $Header$ --- the point is to avoid keywords
that downstream people's CVS repositories will want to replace.

 Are there any caveats to the change?  Ie. if Tom has a checkout and this
 change is made, will he have problems committing?

Making the change will mean a big cvs update pass for everyone, but
no worse than what happens every time Bruce hits the copyright dates,
for example.  It's a one-line change in every file AFAICS.

I'd recommend giving committers a day or two's notice of the edit, same
as Bruce typically does for pgindent runs, but there's no reason to
consider it any worse than that.

regards, tom lane



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Rod Taylor
On Wed, 2003-11-26 at 23:32, Marc G. Fournier wrote:
 On Wed, 26 Nov 2003, Rod Taylor wrote:
 
   So, what does changing it to $PostgreSQL$ do?  Or am I reading the wrong
   part of the manual?
 
  After applying the patch in -patches to CVSROOT and running the update
  script...
 
  It will allow Chris and others to import the PostgreSQL source into their
  own CVS tree without having to do a ton of diff editing to reimport /
  export changes between the primary tree and their development tree.
 
 'k, but why can't that be accomplished with $Id$?

When you import the files into the other CVS system, the version and file
information $Id$ represents will be replaced by the other system. So, when
you diff against the original (primary repository), EVERY file will show
$Id$ as changed.
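
For instance (revision numbers, dates, and names here are illustrative), the
upstream checkout might expand the keyword as

    $Id: guc.c,v 1.170 2003/11/26 14:32:10 tgl Exp $

while a downstream CVS that imported the file onto a vendor branch will
re-expand it on checkout as something like

    $Id: guc.c,v 1.1.1.2 2003/11/27 02:15:44 import Exp $

so every file carries a spurious one-line diff. A keyword the downstream
server does not recognize, such as $PostgreSQL$, is left untouched.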

 Are there any caveats to the change?  Ie. if Tom has a checkout and this
 change is made, will he have problems committing?

$VARIABLE$ is used for informational purposes only AFAIK. There would be
no change that I know of.

At work I use $InQuent$; FreeBSD of course uses $FreeBSD$, and the other
BSDs follow suit with $NetBSD$ and $OpenBSD$. I've not heard of any issues
with 3rd party CVS clients and any of those repositories in this regard.




[HACKERS] $Id$ - $PostgreSQL$ Change

2003-11-26 Thread Marc G. Fournier

Based on discussions on -hackers, and barring any objections between now and
then, I'm going to go through all files in CVS and change:

$Id$ -> $PostgreSQL$

I will do this the evening of Friday, November 29th ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [HACKERS] pg_restore and create FK without verification check

2003-11-26 Thread ow
--- Alvaro Herrera [EMAIL PROTECTED] wrote:
 On Thu, Nov 27, 2003 at 12:40:28AM +0100, Andreas Pflug wrote:
 
  A common mistake, can't count how often I created this one... And not 
  easy to find, because EXPLAIN won't explain triggers.
 
 That's a pity.  And the lack of EXPLAINing function execution, too.
 Maybe it's not that hard to do?
 

I'd like to emphasize again that NOT having an index on the FK column is a
perfectly valid approach, despite some opinions to the contrary. In fact,
requiring an index on FK column(s) when it is not required by the
application's logic IS a mistake, since it slows down
inserts/deletes/updates/vacuum/reindex/etc on the respective table and wastes
disk space (which can be a considerable amount on large tables).

Also, an FK column index DOES NOT, in general, solve performance issues with
the FK verification check. Someone may (and, I'm sure, will) simply have
more data or more constraints.

The only solution here appears to be the --disable-triggers option as
suggested by Robert Treat. If it works then I'm fine; somehow I did not see
that option in the beginning.

Thanks








Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Shridhar Daithankar
Christopher Kings-Lynne wrote:

Hi everyone,

I'm just interested in what everyone's personal plans for 7.5 
development are?

Shridhar, Gavin and I are trying to get the tablespaces stuff off 
the ground.  Hopefully we'll have a CVS set up for us to work in at some 
point (we didn't think getting a branch and commit privs was likely). 
Changing all our $Id$ tags to $Postgres$ would make a separate CVS a lot 
easier, hint hint :)

What's everyone else wanting to work on?
By the time the tablespaces CVS goes live, I plan to study PostgreSQL buffer
management and push mmap (see the other thread). Hopefully my home internet
connection will be up soon so that I can work on weekends.

 Shridhar



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 On Wed, 2003-11-26 at 23:32, Marc G. Fournier wrote:
 'k, but why can't that be accomplished with $Id$?

 When you import the files into the other CVS system the version and file
 information $Id$ represents will be replaced by the other system. So,
 when you diff the original (primary repository) EVERY file will show
 $Id$ has changed.

Right.  I can write a long sob story about the pain this has caused
within Red Hat ... likely other people can say the same ... but the
executive summary is:

1. You can't easily generate a clean diff of your local version against
the original imported from postgresql.org.  The changes you actually
made get buried in a mass of useless $Foo$ diff lines.  Stripping those
out is possible in theory but painful.

2. You can't easily see which postgresql.org original version your local
copy was actually derived from, because guess what, your local CVS
server is going out of its way to hide the evidence.

These effects not only cause pain for the downstream users, but
interfere with them contributing their changes back upstream (by making
it many orders of magnitude harder to generate a clean diff to send us).
So we should fix it.

regards, tom lane



Re: [HACKERS] Providing anonymous mmap as an option of sharing memory

2003-11-26 Thread Shridhar Daithankar
Tom Lane wrote:

Shridhar Daithankar [EMAIL PROTECTED] writes:

I covered only the first point in my post. IMO it is not such an unsolvable
problem.  If a postmaster crashes hard but leaves a backend running,
would it clean up the pid file etc.? I don't think so. So if a postmaster
can start in a 'pid-clean' state, then it is guaranteed that no children
are left around.


And that helps how?  The problem is to detect whether there are any
children left from the old postmaster, when what you have to work from
is the pid-file it left behind.
Fine. We need shared memory for that. How about using one 8K page just for
detecting that? We don't need to base the shared memory model on that, right?

Maybe we can put clog in a shared memory segment, which would serve as the
process counter, and move the shared buffers to mmap?

In any case, you're still handwaving away the very real portability
issues around mmap.  Linux is not the universe, and Linux+BSD isn't
either.
From the machines I can access here, the following have anonymous and shared mmap:

[host] ~ uname -a
SunOS host 5.8 Generic_108528-21 sun4u sparc SUNW,Sun-Fire-880 Solaris
[host] ~ uname -a
AIX host 1 5 0001A5CA4C00
[/home/user] uname -a
HP-UX host B.11.00 A 9000/785 2005950738 two-user license

Is that enough support?
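
(The call in question, for what it's worth; the flag is spelled MAP_ANON on
the BSDs and MAP_ANONYMOUS elsewhere, and the oldest platforms map /dev/zero
instead:

    region = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANON, -1, 0);

done in the postmaster before fork(), the region is inherited by every
backend with no SysV key or SHMMAX setting involved.)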

We might still have considered it, despite the negatives, if anyone had
been able to point to any actual *advantages* of mmap.  There are none.
Yes, the SysV shmem API is old and ugly and crufty, but it does what we
need it to do.


1) Per-database buffers

PostgreSQL does not perform well with a large number of buffers. Say an
installation is configured for 100K buffers and has 5 databases. Now what
would happen if each of these databases got its own 100K buffers?

mmap cannot expand shared memory without a server restart. The current
implementation of shared memory behaves the same way.

Rather than moving it to use shared memory as and when required, we could
push per-database buffers to improve scalability.

Here is what I have in mind:

1. Introduce parameter columns in pg_database, for shared memory size (to
start with) and the number of live connections to that database. Maybe a
callback to the daemon postmaster to reread the configuration if possible
(in shared memory, maybe?).

2. Provide start and stop server commands which essentially either let a
connection happen or not.

Now if somebody modifies the buffer parameters for a database (say, via
ALTER DATABASE), all it has to do is disconnect and reconnect. If this is
the first connection to the database, the parent postmaster should reread
the per-database parameters and apply them. The same can happen with the
start/stop commands.

2) No more kernel mucking required

Recent Linux installations provide a sane enough default for SHMMAX, but I
am sure plenty of folks would be glad to see that dependency go.

I also want to talk about mmap for file IO but not in this thread.

 Shridhar



Re: [HACKERS] 7.5 Plans

2003-11-26 Thread Shridhar Daithankar
Tom Lane wrote:
1. You can't easily generate a clean diff of your local version against
the original imported from postgresql.org.  The changes you actually
made get buried in a mass of useless $Foo$ diff lines.  Stripping those
out is possible in theory but painful.
Is that the reason Linux does not use CVS? I thought so, at least.

 Shridhar
