Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL

2007-02-19 Thread Dawid Kuroczko

On 2/17/07, Tom Lane [EMAIL PROTECTED] wrote:

Hannu Krosing [EMAIL PROTECTED] writes:
 How easy/hard would it be to create unique indexes on tinterval (unique
 here meaning non-overlapping)?

Overlapping is not an equality relation (it fails the transitive law),
so I'm not entirely sure what unique means in this context ... but I
can promise you you can't make it work with btree.
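
For a concrete sense of why overlaps fails the transitive law, here is a
minimal sketch using the standard SQL OVERLAPS predicate (dates chosen
arbitrarily): A overlaps B and B overlaps C, yet A does not overlap C.

  SELECT (date '2007-01-01', date '2007-01-04')
         OVERLAPS (date '2007-01-03', date '2007-01-06');  -- true
  SELECT (date '2007-01-03', date '2007-01-06')
         OVERLAPS (date '2007-01-05', date '2007-01-08');  -- true
  SELECT (date '2007-01-01', date '2007-01-04')
         OVERLAPS (date '2007-01-05', date '2007-01-08');  -- false

So overlaps cannot serve as btree equality, which demands transitivity.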


Hmm, let's assume two time intervals:

A (with a0 as start and a1 as end times)
B (with b0 as start and b1 as end times)

Now, we'd define operators as:

A is left of B when a0 < b0 AND a1 < b0
A is right of B when a0 > b1 AND a1 > b1

A is equal to B if (a0 >= b0 AND a0 <= b1) OR (a1 >= b0 AND a1 <= b1)
OR (a0 < b0 AND a1 > b1)
Actually, equal doesn't mean equal here; rather, it means overlaps.

Now, assuming a UNIQUE INDEX on such a table, the order would be preserved
since no two intervals can overlap.  And no overlapping data could be inserted
without breaking the non-overlap property.  And of course a non-unique index would
produce garbage (since left of/right of wouldn't make any sense anymore).
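
The equal (overlaps) test above can also be written as one SQL expression.
A sketch, assuming a hypothetical table pairs(a0, a1, b0, b1) holding one
interval pair per row:

  SELECT (a0 BETWEEN b0 AND b1)
      OR (a1 BETWEEN b0 AND b1)
      OR (a0 < b0 AND a1 > b1) AS a_overlaps_b
  FROM pairs;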

Interestingly, such non-overlapping datatypes could also make sense for
network addresses (with netmasks).



[HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Jakub Ouhrabka

Hi,

we've made a mistake and ran initdb for the database cluster with the wrong locale :-(

Now it's full of data. I've read in the docs that it's not possible to 
change the locale.


But I guess something like this would work:

a)
1) drop all indexes on text/varchar columns
2) change cluster locale
3) create all indexes on text/varchar columns

or even

b)
1) change cluster locale
2) reindex all indexes on text/varchar columns [I'm aware that until the 
reindex, queries using these indexes could return wrong answers]


Is it possible/safe to do a) or b)? How would I do the "change cluster 
locale" step? Where is this information stored?


Or the only way is to rebuild the database cluster from scratch?
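
For reference, the locale that initdb recorded is visible from SQL as
read-only settings, which makes for a quick sanity check:

  SHOW lc_collate;
  SHOW lc_ctype;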

Thanks,

Kuba




Re: [HACKERS] pg_restore fails with a custom backup file

2007-02-19 Thread Magnus Hagander
On Sat, Feb 17, 2007 at 08:40:54PM +0100, Magnus Hagander wrote:
 
 IIRC, there was a warning from pg_dump. I don't recall exactly what, and
 don't have the space to re-run the test on my laptop here, but I think
 it was from:
 write_msg(modulename, "WARNING: ftell mismatch with expected position -- ftell used\n");

Ok, I've confirmed that the output is this:
D:\prog\pgsql\inst\bin> pg_dump -Fc -Z0 test > out
pg_dump: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used
pg_dump: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used
pg_dump: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used


Three warnings for that one dump - my guess would be one for each table
past the 2GB limit.

//Magnus



[HACKERS] pg_proc without oid?

2007-02-19 Thread Magnus Hagander
I notice that this patch: 
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_proc.h.diff?r1=1.443&r2=1.444

inserts a bunch of XML related rows in pg_proc without specifying oid.
This breaks the fmgrtab generator on msvc. Most likely because I didn't
think of that case. But since all other rows in pg_proc.h contain the
oid, I just wanted to check if they're actually supposed to be without
oid, or if that was a mistake?

//Magnus



Re: [HACKERS] anyelement2 pseudotype

2007-02-19 Thread Tom Dunstan

Tom Lane wrote:

I realized that I can probably fix ATAddForeignKeyConstraint to do the
right thing by having it pass the two actual column types to
can_coerce_type, thus allowing check_generic_type_consistency to kick
in and detect the problem.


Yeah, I came to the same conclusion. No amount of refactoring in 
parse_coerce.c is going to get the original concrete types back to 
compare. That should fix the problem with arrays, enums and any 
potential future generic types without mentioning them explicitly in 
there, à la the hack that's there currently, thankfully.


Cheers

Tom




Re: [HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Martijn van Oosterhout
On Mon, Feb 19, 2007 at 09:27:06AM +0100, Jakub Ouhrabka wrote:
 But I guess something like this would work:
 
 a)
 1) drop all indexes on text/varchar columns
 2) change cluster locale
 3) create all indexes on text/varchar columns

You're going to miss the name columns, ie. every string index in
pg_catalog. Also, there are shared tables which are used in every DB.
You need to log into every DB in the cluster (don't forget template[01])
and reindex everything.

So, REINDEX DATABASE; seems to be a safer bet. In general this doesn't
actually work, since changing the locale may make two strings equal that
weren't before, thus possibly breaking a unique index, but it may be
possible.
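
A minimal sketch of that route, assuming the stored locale has already been
switched and with mydb standing in for each database in turn:

  -- connect to each database of the cluster and run:
  REINDEX DATABASE mydb;  -- the name must match the current database
  -- note template0 normally disallows connections;
  -- pg_database.datallowconn controls that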

I'd suggest single user mode at least, and make backups!

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Andreas 'ads' Scherbaum

Hello,

On Sat, 17 Feb 2007 06:49:42 -0800 (PST)
RPK [EMAIL PROTECTED] wrote:

 PostgreSQL, already a mature database, needs to have more options for
 recovery as compared to proprietary databases. I just worked with Oracle's
 FlashBack query feature in Oracle 9i and FlashBack Table feature in 10g.
 
 Future versions of PostgreSQL must have similar features which enable users
 to bring Table(s) and/or Database(s) to a desired Time Stamp.

There is a pgfoundry project which tries to achieve this:

http://pgfoundry.org/projects/tablelog/


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL Usergroup: http://www.pgug.de



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
 This breaks the fmgrtab generator on msvc. Most likely because I didn't
 think of that case. But since all other rows in pg_proc.h contain the
 oid, I just wanted to check if they're actually supposed to be without
 oid, or if that was a mistake?

It's intentional.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] n-gram search function

2007-02-19 Thread Guillaume Smet

On 2/19/07, Oleg Bartunov oleg@sai.msu.su wrote:

pg_trgm was developed for spelling correction and there is a threshold of
similarity, which is 0.3 by default. The README explains what it means.


Yes, I read it.


Similarity could be very low, since you didn't make a separate column, and the
length of the full string is used to normalize similarity.


Yep, that's probably my problem. Ignored records are a bit longer than
the others.

I tried the tip in README.pg_trgm to generate a table with all the words.

It can do the work in conjunction with tsearch2 and a bit of AJAX to
suggest the full words to the users. The reason why I was not using
tsearch2 is that it's sometimes hard to spell location names
correctly.

The only problem is that it is still quite slow on a 50k-row words
table, but I'll make further tests on a decent server this afternoon.
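
For anyone following along, the threshold Oleg mentions is tunable from SQL.
A small sketch using pg_trgm's functions (the words table name is made up):

  SELECT show_limit();             -- 0.3 by default
  SELECT set_limit(0.2);           -- loosen the match threshold
  SELECT word, similarity(word, 'marseile') AS sml
    FROM words
   WHERE word % 'marseile'         -- % filters by the current threshold
   ORDER BY sml DESC;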

--
Guillaume



[HACKERS] Short varlena headers and arrays

2007-02-19 Thread Gregory Stark

I had intended to make varlenas alignment 'c' and have heaptuple.c force
them to alignment 'i' if they required it. However I've noticed a problem that
makes me think I should do this the other way around.

The problem is that other places in the codebase use the alignment. In
particular arrays do. Also toasting.c expects to get a worst-case size from
att_align rather than a best-case. Also there's indextuple.c but probably I
should get to that in this round anyways.

So now I'm thinking it's best to leave them as alignment 'i' unless
heaptuple.c thinks it can get away without aligning them. This means we don't
have a convenient way for data types to opt out of this header compression.
But the more I think about it the less convinced I am that we need that. The
alignment inside the data type doesn't matter since you'll only be working
with detoasted versions of them unless you specifically go out of your way to
do otherwise.


Once this is done it may be worth having arrays convert to short varlenas as
well. Arrays of short strings hurt pretty badly currently:

postgres=# select pg_column_size(array['a','b','c','d']);
 pg_column_size 
----------------
             56
(1 row)
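
For comparison, a bare short string shows where the cost comes from: each
element carries its own 4-byte varlena header and is padded to 4-byte
alignment inside the array, so the 56 above would break down as roughly a
24-byte array header plus four 8-byte elements (a sketch; exact numbers can
vary by build):

  select pg_column_size('a'::text);  -- expect 5: 4-byte header + 1 data byte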

The only problem with this is if it's more likely for someone to stuff things
in an array and then read them back out without detoasting than it is for
someone to stuff them in a tuple. Probably the risk is the same. There is some
code that assumes it understands how arrays are laid out in execQual.c and
varlena.c.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Dimitri Fontaine
Hi list,

Here's a proposal of this idea which stole a good part of my night.
I'll present the idea first, then two use cases with some rationale and a few 
details. Please note I won't be able to participate in any development 
effort associated with this idea, may such a thing happen!

The bare idea is to provide a way to 'attach' multiple storage facilities (say 
volumes) to a given tablespace. Each volume may be attached in READ ONLY, 
READ WRITE or WRITE ONLY mode.
You can mix RW and WO volumes into the same tablespace, but can't have RO with 
any W form, or so I think.

It would be pretty handy to be able to add and remove volumes on a live 
cluster, and this could be a way to implement moving/extending tablespaces.
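
To make the idea concrete, purely invented syntax (nothing like this exists):

  CREATE TABLESPACE fast LOCATION '/mnt/disk0';
  ALTER TABLESPACE fast ADD VOLUME '/dev/shm/pgfast' MODE READ WRITE;
  ALTER TABLESPACE fast ADD VOLUME '/mnt/slow' MODE WRITE ONLY;
  ALTER TABLESPACE fast DROP VOLUME '/mnt/slow';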


Use Case A: better read performances while keeping data write reliability

The first application of this multiple volumes per tablespace idea is to keep 
a tablespace both into RAM (tmpfs or ramfs) and on disk (both RW).

Then PG should be able to read from both volumes when dealing with read 
queries, and would have to fwrite()/fsync() both volumes for each write.
Of course, write speed will be constrained by the slowest volume, but the 
quicker one would then be able to take away some amount of read queries 
meanwhile.

It would be neat if PG was able to account for each volume's relative write 
speed in order to assign weights to tablespace volumes, and have the planner or 
executor spread read queries among volumes depending on that.
For example if a single query has a plan containing several full scans (of 
indexes and/or tables) in the same tablespace, those could be done on 
different volumes. 

Use Case B: Synchronous Master Slave(s) Replication

By using a Distributed File System capable of being mounted from several nodes 
at the same time, we could have a configuration where a master node has 
('exports') a WO tablespace volume, and one or more slaves (depending on FS 
capability) configures a RO tablespace volume.

PG then has to be able to cope with a RO volume: the data are not written by 
PG itself (from the local node's point of view), so some limitations would 
certainly apply.
Will it be possible, for example, to add indexes to data on slaves?
I'd use the solution even without this, thus...

When the master/slave link is broken, the master can no longer write to the 
tablespace, as if it were a local disk failure of some sort, so this should 
prevent nasty desync problems: data is written on all W volumes or data is 
not written at all.


I realize this proposal is the first draft of a work to be done, and that I 
won't be able to do a lot more than draft this idea. This mail is sent 
to the hackers list in the hope someone there will find it worth 
considering and polishing...

Regards, and thanks for the good work ;)
-- 
Dimitri Fontaine




Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL

2007-02-19 Thread Ian Caulfield

On 17/02/07, Warren Turkal [EMAIL PROTECTED] wrote:

PERIOD(INT) is actually listed in Dr. Snodgrass's book. However, I am not
really sure about the semantics of the type. When would you use a
PERIOD(INT)?


It wouldn't be directly useful for temporal SQL, but I have a number
of tables in a database application where a range of integers is
mapped onto a 0-100 range (e.g. 1-5 might get mapped onto 1, 6-15 to
2, etc), which I'd like to store using a (preferably non-overlapping)
period type.


Also, please bring this discussion back on list so that it gets recorded. I
didn't want to post your private reply to the list without your permission.


Oops, meant to reply to the list originally.

Ian



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Magnus Hagander
On Mon, Feb 19, 2007 at 11:25:02AM +0100, Peter Eisentraut wrote:
 On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
  This breaks the fmgrtab generator on msvc. Most likely because I didn't
  think of that case. But since all other rows in pg_proc.h contain the
  oid, I just wanted to check if they're actually supposed to be without
  oid, or if that was a mistake?
 
 It's intentional.

Could you explain why, and what the expected result is? Since I can't
find any other examples of people doing it :-)

(will fix the vc stuff, of course, but still interested in knowing)

//Magnus



Re: [HACKERS] autovacuum next steps

2007-02-19 Thread Csaba Nagy
 One option that I've heard before is to have vacuum after a single iteration
 (ie, after it fills maintenance_work_mem and does the index cleanup and the
 second heap pass), remember where it was and pick up from that point next
 time.

From my experience this is not acceptable... I have tables for which the
index cleanup takes hours, so no matter how low I set
maintenance_work_mem (in fact I set it high enough so there's only one
iteration), it would take too much time and the queue tables would get overly
bloated (which isn't happening now, as they get special cluster
treatment).

Cheers,
Csaba.





Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Martijn van Oosterhout
On Mon, Feb 19, 2007 at 11:25:41AM +0100, Dimitri Fontaine wrote:
 Hi list,
 
 Here's a proposal of this idea which stole a good part of my night.
 I'll present the idea first, then two use cases with some rationale and a few 
 details. Please note I won't be able to participate in any development 
 effort associated with this idea, may such a thing happen!
 
 The bare idea is to provide a way to 'attach' multiple storage facilities (say 
 volumes) to a given tablespace. Each volume may be attached in READ ONLY, 
 READ WRITE or WRITE ONLY mode.
 You can mix RW and WO volumes into the same tablespace, but can't have RO with 
 any W form, or so I think.

Somehow this seems like implementing RAID within postgres, which seems
a bit outside of the scope of a DB.

 Use Case A: better read performances while keeping data write reliability
 
 The first application of this multiple volumes per tablespace idea is to keep 
 a tablespace both into RAM (tmpfs or ramfs) and on disk (both RW).

For example, I don't believe there is a restriction against having one
member of a RAID array being a RAM disk.

 Use Case B: Synchronous Master Slave(s) Replication
 
 By using a Distributed File System capable of being mounted from several 
 nodes 
 at the same time, we could have a configuration where a master node has 
 ('exports') a WO tablespace volume, and one or more slaves (depending on FS 
 capability) configures a RO tablespace volume.

Here you have the problem of row visibility. The data in the table isn't
very useful without the clog, and that's not stored in a tablespace...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Alvaro Herrera
Bruce Momjian wrote:
 Tom Lane wrote:
  Jim C. Nasby [EMAIL PROTECTED] writes:
   On Thu, Feb 08, 2007 at 01:54:13PM -0500, Tom Lane wrote:
   I would be satisfied if the returned command tag were something else,
   maybe NO OPERATION.
  
   TABLE blah DID NOT EXIST might be less confusing...
  
  You're confusing a command tag with a notice.  In the first place,
  we shouldn't assume that applications are ready to deal with
  indefinitely long command tags (the backend itself doesn't think they
  can be longer than 64 bytes); in the second place, they should be
  constant strings for the most part so that simple strcmp()s suffice
  to see what happened.  Command tags are meant for programs to deal
  with, more than humans.
 
 Yep.  Because IF EXISTS is in a lot of object destruction commands,
 adding a modified tag seems very confusing, because in fact the DROP
 TABLE did succeed, so to give any other tag seems incorrect.

I don't understand -- what problem you got with NO OPERATION?  It
seemed a sound idea to me.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] wishlist items ..

2007-02-19 Thread Hannu Krosing
On a fine day, Sat 2007-02-17 at 13:35, Lukas Kahwe Smith wrote:
 Lukas Kahwe Smith wrote:
 
  I just wanted to bring up the wishlist todo items:
  http://developer.postgresql.org/index.php/Todo:WishlistFor83

What does/did the row Clustered/replication solutions refer to ?


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] wishlist items ..

2007-02-19 Thread Lukas Kahwe Smith

Hannu Krosing wrote:

On a fine day, Sat 2007-02-17 at 13:35, Lukas Kahwe Smith wrote:

Lukas Kahwe Smith wrote:


I just wanted to bring up the wishlist todo items:
http://developer.postgresql.org/index.php/Todo:WishlistFor83


What does/did the row Clustered/replication solutions refer to ?


there was some discussion early on in 8.3 scoping to add some out of the 
box solutions for clustering and replication in order to reduce the 
barrier to entry for people who require such features.


regards,
Lukas



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Hannu Krosing
On a fine day, Sun 2007-02-18 at 14:27, Joshua D. Drake wrote:
 Hannu Krosing wrote:
  On a fine day, Sat 2007-02-17 at 22:49, Chad Wagner wrote:
  
 
  However, they don't have vacuum, we do.
 
  Right, and I think that is more or less because Oracle doesn't need
  it.  Vacuum's main purpose (correct me if I am wrong) is to
  recover/mark rows that are no longer used, and Oracle essentially
  reuses the space immediately. 
 
  Obviously with Oracle if you bloat out a table and delete a ton of
  rows then you have to rebuild the table, but that is more or less the
  same problem that PostgreSQL has and where vacuum full comes into
  play.
 
  The only benefit with the Oracle model is that you can achieve
  flashback, which is a very rarely used feature in my book.
  
  We can have flashbacks up to the last vacuum. It is just not exposed.
  Don't vacuum, and you have the whole history. (Actually you can't go for
  more than 2G transactions, or you get trx id rollover).
  
  To get a flashback query, you just have to construct a snapshot from
  that time and you are done. We don't store transaction times anywhere, so
  the flashback has to be by transaction id, but there is very little
  extra work involved. We just don't have syntax for saying SELECT ... AS
  SEEN BY TRANSACTION XXX
 
 Well this is certainly interesting. What do we think it would take to
 enable the functionality?

First we must run the query in serializable mode and replace the
snapshot with a synthetic one, which defines visibility at the start of
the desired transaction.

Probably it is a good idea to take a lock on all tables involved, to
avoid a vacuum being started on them while the query is running.

Also, we can't trust the DELETED flags in index pages, so we should
forbid index scans, or just always re-check the visibility in the heap.

Otherwise it would probably be enough to just scan tuples as usual, and
check if they were visible to the desired transaction, that is, they were
inserted before that transaction and they are not deleted before that
trx.

Of course this will not be true, once we have HOT/WIP with in-page
vacuuming.
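
In SQL terms the framing would look like this; only a sketch, since the
synthetic-snapshot step has no user-visible interface and the table name is
invented:

  BEGIN;
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- here the backend would install a synthetic snapshot
  -- reflecting the chosen past transaction id
  SELECT * FROM accounts;  -- would see data as of that transaction
  COMMIT;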

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
 I don't understand -- what problem you got with NO OPERATION?  It
 seemed a sound idea to me.

It seems nonorthogonal.  What if only some of the tables you mentioned did not 
exist?  Do you get SOME OPERATION?

There are also other cases where commands don't have an effect but we don't 
explicitly point that out.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Florian G. Pflug

Hannu Krosing wrote:

On a fine day, Sun 2007-02-18 at 14:27, Joshua D. Drake wrote:

Hannu Krosing wrote:

On a fine day, Sat 2007-02-17 at 22:49, Chad Wagner wrote:
To get a flashback query, you just have to construct a snapshot from
that time and you are done. We don't store transaction times anywhere, so
the flashback has to be by transaction id, but there is very little
extra work involved. We just don't have syntax for saying SELECT ... AS
SEEN BY TRANSACTION XXX

Well this is certainly interesting. What do we think it would take to
enable the functionality?


First we must run the query in serializable mode and replace the
snapshot with a synthetic one, which defines visibility at the start of
the desired transaction

probably it is a good idea to take a lock on all tables involved to
avoid a vacuum to be started on them when the query is running.

Would the xmin exported by that transaction prevent vacuum from removing
any tuples still needed for the flashback snapshot?

greetings, Florian Pflug



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Alvaro Herrera
Florian G. Pflug wrote:
 Hannu Krosing wrote:
 On a fine day, Sun 2007-02-18 at 14:27, Joshua D. Drake wrote:
 Hannu Krosing wrote:
 On a fine day, Sat 2007-02-17 at 22:49, Chad Wagner wrote:
 To get a flashback query, you just have to construct a snapshot from
 that time and you are done. We don't store transaction times anywhere, so
 the flashback has to be by transaction id, but there is very little
 extra work involved. We just don't have syntax for saying SELECT ... AS
 SEEN BY TRANSACTION XXX
 Well this is certainly interesting. What do we think it would take to
 enable the functionality?
 
 First we must run the query in serializable mode and replace the
 snapshot with a synthetic one, which defines visibility at the start of
 the desired transaction
 
 probably it is a good idea to take a lock on all tables involved to
 avoid a vacuum to be started on them when the query is running.
 Would the xmin exported by that transaction prevent vacuum from removing
 any tuples still needed for the flashback snapshot?

Sure, and that makes the mentioned lock unnecessary.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Zeugswetter Andreas ADI SD

  Well this is certainly interesting. What do we think it would take to 
  enable the functionality?
 
 First we must run the query in serializable mode and replace 
 the snapshot with a synthetic one, which defines visibility 
 at the start of the desired transaction

We could use something that controls global xmin.
It would ensure that global xmin does not advance beyond
what still needs to be visible. This would probably be a 
sliding time window, or a fixed point in time that is
released by the dba/user.

Then all below is not really different from a situation where
you had a long running tx.  

 probably it is a good idea to take a lock on all tables 
 involved to avoid a vacuum to be started on them when the 
 query is running.
 
 also, we can't trust the DELETED flags in index pages, so we 
 should forbid index scans, or just always re-check the 
 visibility in heap.
 
 Otherways it would probably be enough to just scan tuples as 
 usual, and check if they were visible to desired transaction, 
 that is they were inserted before that transaction and they 
 are not deleted before that trx.
 
 Of course this will not be true, once we have HOT/WIP with 
 in-page vacuuming.

Currently I think HOT does honor global xmin. There is no 
lookup for relevant xids, so parts of an update chain where
only a previous tuple or a later tuple can be visible are reused.
Else HOT would need to be told not to, in a scenario where
a backend can choose a snapshot at will.

Andreas



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Zeugswetter Andreas ADI SD

  First we must run the query in serializable mode and replace the 
  snapshot with a synthetic one, which defines visibility at the start 
  of the desired transaction
  
  probably it is a good idea to take a lock on all tables involved to 
  avoid a vacuum to be started on them when the query is running.
  Would the xmin exported by that transaction prevent vacuum from 
  removing any tuples still needed for the flashback snapshot?
 
 Sure, and that makes the mentioned lock unnecessary.

The problem is that that transaction sets a historic snapshot at a later
time, so it is not yet running when vacuum looks at global xmin.
So something else needs to hold up global xmin (see prev post).

Andreas



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Florian G. Pflug

Zeugswetter Andreas ADI SD wrote:
First we must run the query in serializable mode and replace the 
snapshot with a synthetic one, which defines visibility at the start 
of the desired transaction

probably it is a good idea to take a lock on all tables involved to 
avoid a vacuum to be started on them when the query is running.

Would the xmin exported by that transaction prevent vacuum from 
removing any tuples still needed for the flashback snapshot?

Sure, and that makes the mentioned lock unnecessary.


The problem is that that transaction sets a historic snapshot at a later
time, so it is not yet running when vacuum looks at global xmin.
So something else needs to hold up global xmin (see prev post).


I think to make this flashback stuff fly, you'd need to know the 
earliest xmin that you can still flashback to. Vacuum would advance
that xmin as soon as it starts working. So the case you'd need to
protect against would be a race condition when you start a vacuum
and a flashback transaction at the same time. But for that, some simple
semaphore should suffice, and a well-thought-out ordering of the actions
taken.

In the long run, you'd probably want to store the commit times of 
transactions somewhere, and add some guc that makes a vacuum assume
that recently committed transactions (say, in the last hour) are still
considered active. That would allow the dba to guarantee that he can always
flash back at least an hour.

greetings, Florian Pflug



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
 I don't understand -- what problem you got with NO OPERATION?  It
 seemed a sound idea to me.

 It seems nonorthogonal.  What if only some of the tables you mentioned did 
 not exist?  Do you get SOME OPERATION?

I'd say you get DROP TABLE as long as at least one table was dropped.

 There are also other cases where commands don't have an effect but we don't 
 explicitly point that out.

The precedent that I'm thinking about is that the command tag for COMMIT
varies depending on what it actually did.

regression=# begin;
BEGIN
regression=# select 1/0;
ERROR:  division by zero
regression=# commit;
ROLLBACK
regression=#

regards, tom lane



Re: [HACKERS] n-gram search function

2007-02-19 Thread Oleg Bartunov

On Mon, 19 Feb 2007, Guillaume Smet wrote:


On 2/19/07, Oleg Bartunov oleg@sai.msu.su wrote:

pg_trgm was developed for spelling correction and there is a threshold of
similarity, which is 0.3 by default. The README explains what it means.


Yes, I read it.

Similarity could be very low, since you didn't make a separate column, and the 
length of the full string is used to normalize similarity.


Yep, that's probably my problem. Ignored records are a bit longer than
the others.

I tried the tip in README.pg_trgm to generate a table with all the words.

It can do the work in conjunction with tsearch2 and a bit of AJAX to
suggest the full words to the users. The reason why I was not using
tsearch2 is that it's sometimes hard to spell location names
correctly.

The only problem is that it is still quite slow on a 50k-row words
table, but I'll make further tests on a decent server this afternoon.


You need to wait for GiN support.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] pg_restore fails with a custom backup file

2007-02-19 Thread Magnus Hagander
On Sat, Feb 17, 2007 at 01:28:22PM -0500, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  I'd also like a comment from at least one other patch reviewer that
  the methods used are good.
 
 It looks reasonable as far as it goes.  One thought is that pg_dump
 really should have noticed that it was writing a broken archive.
 On machines where off_t is 32 bits, can't we detect the overflow
 situation?

Tested on MSVC as well, works. Also tested and doesn't break the build
on Linux (which shouldn't be affected at all).

So, patch applied to HEAD and 8.2.

//Magnus



Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL

2007-02-19 Thread Tom Lane
Dawid Kuroczko [EMAIL PROTECTED] writes:
 ... Now, assuming a UNIQUE INDEX on such a table, the order would be preserved
 since no two intervals can overlap.  And no overlapping data could be inserted
 without breaking the non-overlap property.  And of course a non-unique index would
 produce garbage (since left of/right of wouldn't make any sense anymore).

I think actually it doesn't work for unique indexes either :-( because
of dead tuples.  Consider that we have in the index

...
(1,2)
(6,8)   DEAD
(4,10)
(12,14)
...

Since under the given operators (6,8) and (4,10) are equal, btree will
not guarantee that those index entries appear in any particular relative
order.  Thus the above is a legal index configuration.  Now insert (3,5).
This should surely be rejected because it overlaps (4,10).  But what
may well happen is that it gets compared to (1,2) --- OK, it's greater
--- and to (6,8) --- OK, it's less --- and then the uniqueness check stops,
because if it's less than (6,8) then there is no need to search further.
Ooops.

*This* is why the transitive law is essential.

regards, tom lane



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Pavan Deolasee

On 2/19/07, Tom Lane [EMAIL PROTECTED] wrote:


Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
 I don't understand -- what problem you got with NO OPERATION?  It
 seemed a sound idea to me.

 It seems nonorthogonal.  What if only some of the tables you mentioned
 did not exist?  Do you get SOME OPERATION?

I'd say you get DROP TABLE as long as at least one table was dropped.



How about DROP TABLE cnt, where 'cnt' is the number of tables dropped?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Gregory Stark

Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

 First we must run the query in serializable mode and replace 
 the snapshot with a synthetic one, which defines visibility 
 at the start of the desired transaction

 We could use something that controls global xmin.
 It would ensure that global xmin does not advance beyond
 what still needs to be visible. This would probably be a 
 sliding time window, or a fixed point in time that is
 released by the dba/user.

Well there's another detail you have to cover aside from rolling back your
xmin. You have to find the rest of the snapshot including knowing what other
transactions were in-progress at the time you want to flash back to.

If you just roll back xmin and set xmax to the same value you'll get a
consistent view of the database but it may not match a view that was ever
current. That is, some of the transactions after the target xmin may have
committed before that xmin. So there was never a time in the database when
they were invisible but your new xmin was visible.

I think to do this you'll need to periodically record a snapshot and then
later restore one of those saved snapshots. Not sure where would be a good
place to record them. The WAL seems like a handy place but digging through the
WAL would be annoying.

Incidentally this is one of the things that would be useful for read-only
access to PITR warm standby machines.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 But I guess something like this would work:
 1) drop all indexes on text/varchar columns
 2) change cluster locale
 3) create all indexes on text/varchar columns

 You're going to miss the name columns, ie. every string index in
 pg_catalog.

But name is not locale-aware --- it just uses strcmp().  AFAIR there
aren't any locale-dependent indexes in the system catalogs.  So in
principle you could hack pg_control, restart the postmaster, and then
reindex every locale-dependent index.  Hacking pg_control would be the
hard part; you'll never get the CRC right if you do it manually.  Possibly
pg_resetxlog could be adapted to the purpose.

 I'd suggest single user mode at least, and make backups!

Yup, a filesystem backup would be a *real* good idea.  Not to mention
testing the procedure on a toy installation.

regards, tom lane



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
 This breaks the fmgrtab generator on msvc. Most likely because I didn't
 think of that case. But since all other rows in pg_proc.h contain the
 oid, I just wanted to check if they're actually supposed to be without
 oid, or if that was a mistake?

 It's intentional.

Kindly change that intention.

regards, tom lane



Re: [HACKERS] Short varlena headers and arrays

2007-02-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Once this is done it may be worth having arrays convert to short varlenas as
 well.

Elements of arrays are not subject to being toasted by themselves, so
I don't think you can make that work.  At least not without breaking
wide swaths of code that works fine today.

regards, tom lane



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,

RAID and LVM too.  I can't get excited about re-inventing those wheels
when perfectly good implementations already exist for us to sit on top of.

regards, tom lane



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 16:26, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
  This breaks the fmgrtab generator on msvc. Most likely because I didn't
  think of that case. But since all other rows in pg_proc.h contain the
  oid, I just wanted to check if they're actually supposed to be without
  oid, or if that was a mistake?
 
  It's intentional.

 Kindly change that intention.

What is wrong?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 15:57, Tom Lane wrote:
 The precedent that I'm thinking about is that the command tag for COMMIT
 varies depending on what it actually did.

Some have also argued against that in the past, so I guess we just have 
different ideas of how it should work.  Not a problem.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 16:26, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
 This breaks the fmgrtab generator on msvc.

 It's intentional.
 
 Kindly change that intention.

 What is wrong?

Well, in the first place Gen_fmgrtab.sh is producing garbage:

#define F_CURSOR_TO_XML DATAINSERT
#define F_CURSOR_TO_XMLSCHEMA DATAINSERT
#define F_QUERY_TO_XML DATAINSERT
#define F_QUERY_TO_XML_AND_XMLSCHEMA DATAINSERT
#define F_QUERY_TO_XMLSCHEMA DATAINSERT
#define F_TABLE_TO_XML DATAINSERT
#define F_TABLE_TO_XML_AND_XMLSCHEMA DATAINSERT
#define F_TABLE_TO_XMLSCHEMA DATAINSERT
#define F_BYTEAOUT 31
#define F_CHAROUT 33

const FmgrBuiltin fmgr_builtins[] = {
  { 0, "cursor_to_xml", 5, true, false, cursor_to_xml },
  { 0, "cursor_to_xmlschema", 4, true, false, cursor_to_xmlschema },
  { 0, "query_to_xml", 4, true, false, query_to_xml },
  { 0, "query_to_xml_and_xmlschema", 4, true, false, query_to_xml_and_xmlschema },
  { 0, "query_to_xmlschema", 4, true, false, query_to_xmlschema },
  { 0, "table_to_xml", 4, true, false, table_to_xml },
  { 0, "table_to_xml_and_xmlschema", 4, true, false, table_to_xml_and_xmlschema },
  { 0, "table_to_xmlschema", 4, true, false, table_to_xmlschema },
  { 31, "byteaout", 1, true, false, byteaout },

The fact that that table is broken means you're incurring expensive
linear searches to invoke these functions.  It's only by chance that
it works at all...

In the second place, if you don't want to predetermine OIDs for your
functions then they shouldn't be in hardwired pg_proc.h rows at all.

regards, tom lane



Re: [HACKERS] n-gram search function

2007-02-19 Thread Guillaume Smet

On 2/19/07, Oleg Bartunov oleg@sai.msu.su wrote:

You need to wait for GiN support.


OK. Thanks.

If you need testers for this one, feel free to contact me. I'm very
interested in testing pg_trgm in conjunction with tsearch2.

--
Guillaume



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 In the second place, if you don't want to predetermine OIDs for your
 functions then they shouldn't be in hardwired pg_proc.h rows at all.

Is there any place to hook in to create things like procedures or other SQL
objects that don't really need hard coded OIDs?

It seems like we could make the catalogs much easier to maintain by ripping
out everything that isn't needed by the system tables themselves and having
initdb create them by running a plain SQL script.

In particular I'm looking towards having all the operators and associated
hardware except for the basic btree operators for types used by the system
tables be created in plain SQL.

It might also reduce the pain OID conflicts cause.
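
To illustrate the shape such a script could take, one operator class written
as plain SQL (a sketch; the basic btree classes needed by the system tables
would stay hardwired, per the above):

  CREATE OPERATOR CLASS interval_ops
      DEFAULT FOR TYPE interval USING btree AS
          OPERATOR 1 < ,
          OPERATOR 2 <= ,
          OPERATOR 3 = ,
          OPERATOR 4 >= ,
          OPERATOR 5 > ,
          FUNCTION 1 interval_cmp(interval, interval);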

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Dimitri Fontaine
On Monday 19 February 2007 at 16:33, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,

 RAID and LVM too.  I can't get excited about re-inventing those wheels
 when perfectly good implementations already exist for us to sit on top of.

I thought moving some knowledge about data availability into PostgreSQL code 
could provide some valuable performance benefit, allowing it to organize reads 
(for example, parallel table/index scans against different volumes) and to 
obtain data from the volume known to be quicker (or least used/loaded).

You're both saying RAID/LVM implementations provide good enough performance 
that PG doesn't have to go this way, if I understand correctly.
And distributed file systems are enough to get the replication stuff, without 
PG having to deal explicitly with the work involved.

Maybe I should have slept after all ;)

Thanks for your time and comments, regards,
-- 
Dimitri Fontaine



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 16:50, Tom Lane wrote:
 Well, in the first place Gen_fmgrtab.sh is producing garbage:

Uh, ok, that needs fixing.

 In the second place, if you don't want to predetermine OIDs for your
 functions then they shouldn't be in hardwired pg_proc.h rows at all.

Where else would you put them?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 16:50, Tom Lane wrote:
 In the second place, if you don't want to predetermine OIDs for your
 functions then they shouldn't be in hardwired pg_proc.h rows at all.

 Where else would you put them?

SQL script maybe, much along the lines Greg was just mentioning.
(I'd been thinking myself earlier that pg_amop/amproc/etc would be a
whole lot easier to maintain if we could feed CREATE OPERATOR CLASS
commands to the bootstrap process.)  But getting there will take
nontrivial work; you can't just decide to leave out a few OIDs on the
spur of the moment.

Magnus, I'd suggest reverting whatever you did to your MSVC script,
so we'll find out the next time someone makes this mistake...

regards, tom lane



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread tomas

On Mon, Feb 19, 2007 at 04:00:09PM +0100, Florian G. Pflug wrote:
[...]
 In the long run, you'd probably want to store the commit times of 
 transactions somewhere, and add some guc that makes a vacuum assume
 that recently committed transactions (say, in the last hour) are still
 considered active [...]

Funny how some things recur:

  http://archives.postgresql.org/pgsql-hackers/2007-01/msg01301.php

(says I and seeks shelter beneath a big rock ;-)

Regards
-- tomás




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Tom Lane
Dimitri Fontaine [EMAIL PROTECTED] writes:
 You're both saying RAID/LVM implementations provide good enough performances 
 for PG not having to go this way, if I understand correctly.

There's certainly no evidence to suggest that reimplementing them
ourselves would be a productive use of our time.

regards, tom lane



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Martijn van Oosterhout
On Mon, Feb 19, 2007 at 05:10:36PM +0100, Dimitri Fontaine wrote:
  RAID and LVM too.  I can't get excited about re-inventing those wheels
  when perfectly good implementations already exist for us to sit on top of.
 
 I thought moving some knowledge about data availability into PostgreSQL code 
 could provide some valuable performance benefit, allowing it to organize reads 
 (for example, parallel table/index scans against different volumes) and to 
 obtain data from the volume known to be quicker (or least used/loaded).

Well, organising requests to be handled quickly is a function of
LVM/RAID, so we don't go there. However, speeding up scans by having
multiple requests is an interesting approach, as would perhaps a
different random_page_cost for different tablespaces.

My point is, don't try to implement the mechanics of LVM/RAID into
postgres, instead, work on providing ways for users to take advantage
of these mechanisms if they have them. Look at it as if you already had
the LVM/RAID setup for your ideas: how do you get postgres to take
advantage of it?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Alvaro Herrera
Peter Eisentraut wrote:
 On Monday, 19 February 2007 at 16:50, Tom Lane wrote:
  Well, in the first place Gen_fmgrtab.sh is producing garbage:
 
 Uh, ok, that needs fixing.
 
  In the second place, if you don't want to predetermine OIDs for your
  functions then they shouldn't be in hardwired pg_proc.h rows at all.
 
 Where else would you put them?

But _why_ wouldn't you want to have fixed OIDs for the functions?  I'm
not seeing the benefit.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Bruce Momjian
Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
  I don't understand -- what problem you got with NO OPERATION?  It
  seemed a sound idea to me.
 
  It seems nonorthogonal.  What if only some of the tables you mentioned did 
  not exist?  Do you get SOME OPERATION?
 
 I'd say you get DROP TABLE as long as at least one table was dropped.

If we went with DROP TABLE if any table was dropped, and NO OPERATION
for none, I am fine with that.  What I didn't want was a different NO
OPERATION-type of message for every object type.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Confusing message on startup after a crash while recovering

2007-02-19 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

I'd suggest that the text is changed to something along the line of:
database system was interrupted while in recovery at ...
If this has occurred more than once some data may be corrupted and
you may need to restore from the last backup.


It seems the real problem is that it's not specifying *which* data is
probably corrupted.  Maybe:

HINT: If recovery fails repeatedly, it probably means that the recovery log
data is corrupted; you may have to restore from your last full backup.


IMHO that wording would be fine too - the important point for me is to 
clearly state that corrupted data is maybe the _cause_ of the crash, and
not the _effect_ of the crash. And for the sake of consistency, the 
messages for abort-during-recovery and abort-during-archivelog-replay 
should be similar.



Also, do we want to suggest use of pg_resetxlog in the message?

I'd rather add some documentation on how to use pg_resetxlog to the
manual if it's not already there, and maybe reference that chapter in
a HINT message. In that manual chapter you can warn about the dangers
of pg_resetxlog, and put in advice to back up the database before 
using it. I think such a warning is important, because any documentation 
of pg_resetxlog is targeted at users who are not familiar with postgres
internals, and those users are likely to shoot themselves in the foot
if you point them to pg_resetxlog.

greetings, Florian Pflug



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Guillaume Smet

On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

Guillaume Smet wrote:
 On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 So add the session ID (%c) to log_line_prefix.

 It could work if log_line_prefix was added before every line but it's
 definitely not the case:
 myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : SELECT *
FROM lieu
LIMIT 10;
 if you execute:
 SELECT *
 FROM lieu
 LIMIT 10;

Interesting.  I wonder why you didn't report this as a bug before?
Maybe we could have discussed it and fixed it.


Perhaps because I thought it was not really a bug but the intended behaviour.
Syslog has the same behaviour, and it's quite logical when you consider
how queries are logged (I've spent a few hours in the logging code);
syslog simply adds the necessary context information.
I'm pretty sure I have explained the problem a few times on the lists,
though perhaps it was just on IRC.

From the feedback I have on pgFouine, very few people think it's a
real problem, probably because they don't use query logging as we do:
our production servers have it enabled all the time and we have a high
load on them so this particular case is a common case for us.

(Second try to move this discussion to -hackers)

--
Guillaume



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Alvaro Herrera
Guillaume Smet wrote:
 On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  Guillaume Smet wrote:
  On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  So add the session ID (%c) to log_line_prefix.
 
  It could work if log_line_prefix was added before every line but it's
  definitely not the case:
  myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : 
 SELECT *
 FROM lieu
 LIMIT 10;
  if you execute:
  SELECT *
  FROM lieu
  LIMIT 10;
 
 Interesting.  I wonder why didn't you report this as a bug before?
 Maybe we could have discussed it and fixed it.
 
 Perhaps because I thought it was not really a bug but the intended 
 behaviour.
 Syslog has the same behaviour and it's quite logical when you consider
 how queries are logged (I've spent a few hours in the logging code).
 Syslog has exactly the same behaviour but adds the necessary context
 information.

If it adds necessary context then it clearly does not have the same
behavior, because the problem is precisely that the context is missing.
I'd propose adding a log_entry_prefix separate from log_line_prefix; the
entry prefix would contain most of the stuff, and log_line_prefix would
be a minimal thing intended to be put in front of each _line_, so the
example you show above could be

myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : SELECT *
45d9d615.4abe FROM lieu
45d9d615.4abe LIMIT 10;

where you have

log_entry_prefix=%d %u 
log_line_prefix=%c 

Really, prefixing with a tab does not strike me as a great idea
precisely because it's ambiguous.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] How can I merge a TargetEntry with a tuple?

2007-02-19 Thread Zoltan Boszormenyi

Hi,

I started working again on my
IDENTITY/GENERATED patch.
My question is $SUBJECT.
This code is in rewriteTargetlist():

new_expr = build_column_default(...);
new_tle = makeTargetEntry((Expr *) new_expr, ...);

Now, in ExecInsert() I have to compute
the default for IDENTITY/GENERATED
between ExecConstraints() and heap_insert().

How can I create a Datum out of either an Expr
or a TargetEntry (that contains the computed
constant out of the default expression) so I can
use it after I did a heap_deform_tuple()?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] wishlist items ..

2007-02-19 Thread David Fetter
On Mon, Feb 19, 2007 at 01:28:46PM +0100, Lukas Kahwe Smith wrote:
 Hannu Krosing wrote:
 On one fine day, Sat, 2007-02-17 at 13:35, Lukas Kahwe Smith wrote:
 Lukas Kahwe Smith wrote:
 
 I just wanted to bring up the wishlist todo items:
 http://developer.postgresql.org/index.php/Todo:WishlistFor83
 
 What does/did the row "Clustered/replication solutions" refer to?
 
 there was some discussion early on in 8.3 scoping to add some out of
 the box solutions for clustering and replication in order to reduce
 the barrier to entry for people who require such features.

I believe this has been started as a replication hooks project,
although I'm unsure of its current status.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Guillaume Smet

On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

 If it adds the necessary context then it clearly does not have the same
 behavior,


I mean that the log_line_prefix behaviour is the same; the other
information is syslog-specific.


I'd propose adding a log_entry_prefix separate from log_line_prefix; the
entry prefix would contain most of the stuff, and log_line_prefix would
be a minimal thing intended to be put in front of each _line_, so the
example you show above could be


It could be a good idea.
It won't make me use stderr output but it will allow other people to
do so without any disadvantage :).


Really, prefixing with a tab does not strike me as a great idea
precisely because it's ambiguous.


Sure.

--
Guillaume



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
Tom Lane wrote:
 SQL script maybe, much along the lines Greg was just mentioning.

I would welcome that, although a similar suggestion was rejected a few 
years ago, which is why I didn't pursue it here.

 you can't just decide to leave out a few OIDs on the
 spur of the moment.

I still don't understand why that would be a problem, aside from the 
fmgrtab problem that is specific to pg_proc.  Other system catalogs 
also have mixed entries with and without explicit OIDs.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Peter Eisentraut
Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,

 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.

I expect that someone will point out that Windows doesn't support RAID 
or LVM, and we'll have to reimplement it anyway.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Magnus Hagander
Peter Eisentraut wrote:
 Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.
 
 I expect that someone will point out that Windows doesn't support RAID 
 or LVM, and we'll have to reimplement it anyway. 

Windows supports both RAID and LVM.

//Magnus




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Stefan Kaltenbrunner
Peter Eisentraut wrote:
 Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.
 
 I expect that someone will point out that Windows doesn't support RAID 
 or LVM, and we'll have to reimplement it anyway.

windows supports software raid just fine since Windows 2000 or so ...


Stefan



Re: [HACKERS] Short varlena headers and arrays

2007-02-19 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Once this is done it may be worth having arrays convert to short varlenas as
 well.

 Elements of arrays are not subject to being toasted by themselves, so
 I don't think you can make that work.  At least not without breaking
 wide swaths of code that works fine today.

You think it's more likely there are places that build arrays and then read
the items back without passing through detoast than there are places that
build tuples and do so?



Btw, I ran into some problems with system tables. Since many of them are read
using the GETSTRUCT method, and in that method the first varlena field should
be safely accessible, I would have to not skip the alignment for the first
varlena field in system tables. Instead I just punt on all system tables. The
only one where that seems like a loss is pg_statistic, and there the
biggest problem is the space wasted inside the arrays, not before the varlena
fields.

Also, int2vector and oidvector don't expect to be toasted, so I've skipped them
as well. If we want an escape hatch they would have to be marked as such.
For now I just hard-coded them.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread August Zajonc
Gregory Stark wrote:
 Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
 
 First we must run the query in serializable mode and replace 
 the snapshot with a synthetic one, which defines visibility 
 at the start of the desired transaction
 We could use something that controls global xmin.
 It would ensure that global xmin does not advance beyond
 what still needs to be visible. This would probably be a 
 sliding time window, or a fixed point in time that is
 released by the dba/user.
 
 Well there's another detail you have to cover aside from rolling back your
 xmin. You have to find the rest of the snapshot including knowing what other
 transactions were in-progress at the time you want to flash back to.
 
 If you just roll back xmin and set xmax to the same value you'll get a
 consistent view of the database but it may not match a view that was ever
 current. That is, some of the transactions after the target xmin may have
 committed before that xmin. So there was never a time in the database when
 they were invisible but your new xmin was visible.
 
[...]
 Incidentally this is one of the things that would be useful for read-only
 access to PITR warm standby machines.
 

Couldn't you define things simply so that you get a consistent view
including all transactions started before transaction x? This is time
travel lite, but low-overhead, which I think is a key benefit of this
approach.

A huge value for this would be in the "oops, I deleted my data" category.
PostgreSQL rarely loses data, but clients seem to have a habit of doing
so, and then going "oops". This seems to happen most often when facing
something like a reporting deadline, where they are moving lots of stuff
around and making copies, and sometimes delete the wrong company
recordset or equivalent, even with confirmation dialogs at the app level.

This would give the client a quick and easy "oops" procedure. The DBA sets
the guc to 1hr and tells the client: if you make a big mistake, stop the
database server as follows and call. Frankly, it would bail a few DBAs out as well.

The key is how lightweight the setup could be, which matters because
clients are not always willing to pay for a PITR setup. The low overhead
would mean you'd feel fine about setting the guc to 1hr or so.

As a percentage of total installed instances, I suspect the share with PITR
is small. I've got stuff I snapshot nightly, but that's it, so I don't have
an easy out from the "oops" query either.

- August








Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Guillaume Smet wrote:

Why not simply put something like %log_table% in the sql file and let 
the admin replace it with sed or whatever he likes?


This is a reasonable approach.  I would suggest that no special characters 
be used though, so that the SQL could be used as-is by a DBA who doesn't 
even know about or want to use tools like sed.  I also think the default 
should be a pg_ name like the pg_log I suggested, partly because I'd like 
this to be completely internal one day--just push the logs into the 
database directly without even passing through an external file first. 
Also, something like pg_log is unlikely to cause a conflict with existing 
tables.  I would bet there's already databases out there with tables 
called log_table, for example, but everyone already avoids naming 
application tables starting with pg_.


A workable syntax might be

INSERT INTO "pg_log" ...

The redundant quotation marks will make it easier to do a search/replace 
to change the table name without worrying about accidentally impacting the 
text of the message, so that even people who aren't aware how to build a 
regular expression that only modifies the first match will probably be OK.


I consider using the same name as the default log directory helpful, but 
would understand that others might consider it confusing to overload the 
name like that.
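
For concreteness, here is a sketch of the kind of target table such INSERT
statements would populate; the column list is purely illustrative, not a
proposed format:

   CREATE TABLE "pg_log" (
       log_time   timestamptz,   -- when the entry was written
       username   text,
       database   text,
       message    text
   );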


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Peter Eisentraut
Magnus Hagander wrote:
 Windows supports both RAID and LVM.

Oh good, so we've got that on record. :)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Florian G. Pflug

August Zajonc wrote:

Gregory Stark wrote:

Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

First we must run the query in serializable mode and replace 
the snapshot with a synthetic one, which defines visibility 
at the start of the desired transaction

We could use something that controls global xmin.
It would ensure, that global xmin does not advance bejond
what still needs to be visible. This would probably be a 
sliding time window, or a fixed point in time that is

released by the dba/user.

Well there's another detail you have to cover aside from rolling back your
xmin. You have to find the rest of the snapshot including knowing what other
transactions were in-progress at the time you want to flash back to.

If you just roll back xmin and set xmax to the same value you'll get a
consistent view of the database but it may not match a view that was ever
current. That is, some of the transactions after the target xmin may have
committed before that xmin. So there was never a time in the database when
they were invisible but your new xmin was visible.

[...]
Incidentally this is one of the things that would be useful for read-only
access to PITR warm standby machines.



Couldn't you define things simply to be that you get a consistent view
including all transactions started before x transaction? This is time
travel lite, but low overhead which I think is a key benefit of this
approach.


I was thinking along the same line. Flashback is probably only really
useful on databases that are mostly read-only, but with a few users
who update data. You'd use flashback to undo catastrophic changes made
by accident, and would probably gladly accept undoing a little
more work than strictly necessary.

On the contrary, if you're running an online shop where people buy stuff
24/7, and, say, somebody accidentally deletes some products, then you
won't want to lose the orders placed during that last hour, but will
rather try to regenerate those products from your last backup.

So I don't think that it's too important exactly what snapshot you get,
which makes the xmin=xmax idea feasible.

The same holds true for PITR warm standby (read-only queries on PITR
slaves). This would be used for reporting, or load-balancing of searches
in fairly static data - all of which won't depend on the exact snapshot
you get.


greetings, Florian Pflug




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Joshua D. Drake
Stefan Kaltenbrunner wrote:
 Peter Eisentraut wrote:
 Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.
 I expect that someone will point out that Windows doesn't support RAID 
 or LVM, and we'll have to reimplement it anyway.
 
 windows supports software raid just fine since Windows 2000 or so ...

Longer than that... it supported mirroring and RAID 5 in NT4, and
possibly even NT 3.51, IIRC.


Joshua D. Drake


 
 
 Stefan
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Deadlock with pg_dump?

2007-02-19 Thread Simon Riggs
On Tue, 2007-02-13 at 22:19 -0500, Bruce Momjian wrote:
 Simon Riggs wrote:
  On Thu, 2006-10-26 at 18:45 -0400, Tom Lane wrote:
   Chris Campbell [EMAIL PROTECTED] writes:
Is there additional logging information I can turn on to get more  
details? I guess I need to see exactly what locks both processes  
hold, and what queries they were running when the deadlock occurred?  
Is that easily done, without turning on logging for *all* statements?
   
   log_min_error_statement = error would at least get you the statements
   reporting the deadlocks, though not what they're conflicting against.
  
  Yeh, we need a much better locking logger for performance analysis.
  
  We really need to dump the whole wait-for graph for deadlocks, since
  this might be more complex than just two statements involved. Deadlocks
  ought to be so infrequent that we can afford the log space to do this -
  plus if we did this it would likely lead to fewer deadlocks.
  
  For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter
  that would allow you to dump the wait-for graph for any data-level locks
  that wait too long, rather than just those that deadlock. Many
  applications experience heavy locking because of lack of holistic
  design. That will also show up the need for other utilities to act
  CONCURRENTLY, if possible.
 
 Old email, but I don't see how our current output is not good enough?
 
  test=> lock a;
   ERROR:  deadlock detected
   DETAIL:  Process 6855 waits for AccessExclusiveLock on relation 16394 of
   database 16384; blocked by process 6795.
   Process 6795 waits for AccessExclusiveLock on relation 16396 of database
   16384; blocked by process 6855.
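
For reference, a report like the one above comes from a two-session
sequence such as this (table names a and b assumed):

   -- session 1:
   BEGIN;
   LOCK TABLE a;
   -- session 2:
   BEGIN;
   LOCK TABLE b;
   LOCK TABLE a;   -- session 2 now waits for session 1
   -- session 1:
   LOCK TABLE b;   -- after deadlock_timeout: ERROR:  deadlock detected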

This detects deadlocks, but it doesn't detect lock waits. 

When I wrote that, it was previous experience driving me. Recent client
experience has highlighted the clear need for this. We had a lock wait
of 50 hours because of an RI check; that's the kind of thing I'd like to
see show up in the logs somewhere.

Lock wait detection can be used to show up synchronisation points that
have been inadvertently designed into an application, so it's a useful
tool in investigating performance issues.

I have a patch implementing the logging as agreed with Tom, will post to
patches later tonight.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Andrew Sullivan
On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,
 
 RAID and LVM too.  I can't get excited about re-inventing those wheels
 when perfectly good implementations already exist for us to sit on top of.

Ok, warning, this is a "you know what would be sweet" moment.

What would be nice is to be able to detach one of the volumes, and
know the span of the data in there without being able to access the
data.

The problem that a lot of warehouse operators have is something like
this: "We know we have all this data, but we don't know what we will
want to do with it later.  So keep it all.  I'll get back to you when
I want to know something."

It'd be nice to be able to load up all that data once, and then shunt
it off into (say) read-only media.  If one could then run a query
that would tell one which spans of data are candidates for the
search, you could bring back online (onto reasonably fast storage,
for instance) just the volumes you need to read.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Users never remark, Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath.
--Damien Katz



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Joshua D. Drake
Andrew Sullivan wrote:
 On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those wheels
 when perfectly good implementations already exist for us to sit on top of.
 
 Ok, warning, this is a "you know what would be sweet" moment.

The dreaded words from a developer's mouth to every manager in the world.

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Joshua D. Drake wrote:


Longer than that... it supported mirroring and raid 5 in NT4 and
possibly even NT3.51 IIRC.


Mirroring and RAID 5 go back to Windows NT 3.1 Advanced Server in 1993:

http://support.microsoft.com/kb/114779
http://www.byte.com/art/9404/sec8/art7.htm

The main source of confusion about current support for this feature is
that the desktop/workstation versions of Windows don't have it.  For
Windows XP, you need the Professional edition to get dynamic disk
support; it's not in the Home edition.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] proposed todo: use insert/update returning anywhere a table is allowed

2007-02-19 Thread Bruce Momjian

Added to TODO:

* Allow INSERT/UPDATE ... RETURNING inside a SELECT 'FROM' clause

  http://archives.postgresql.org/pgsql-general/2006-09/msg00803.php
  http://archives.postgresql.org/pgsql-hackers/2006-10/msg00693.php

---

Jeff Davis wrote:
 On Tue, 2007-02-06 at 16:54 -0500, Merlin Moncure wrote:
  It would be great to be able to join to update, returning, etc.  It
  looks like the code was deliberately tied off as-is and I was
  surprised not to see a todo for this.
  
  the basic idea is to be able to do things like:
  select * from (update foo set id = 1 returning *) q;
  
  which currently syntax errors out on 'set' which is a bit odd.
  
  here are tom's brief notes on it and also some context on how select
  combined with insert/update might influence triggers:
  
  http://archives.postgresql.org/pgsql-hackers/2006-10/msg00693.php
  
 
 Also a relevent thread:
 
 http://archives.postgresql.org/pgsql-general/2006-09/msg00803.php
 
 Regards,
   Jeff Davis
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Bruce Momjian
Joshua D. Drake wrote:
 Andrew Sullivan wrote:
  On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,
  RAID and LVM too.  I can't get excited about re-inventing those wheels
  when perfectly good implementations already exist for us to sit on top of.
  
  Ok, warning, this is a you know what would be sweet moment.
 
 The dreaded words from a developers mouth to every manager in the world.

Yea, I just instinctively hit delete when I saw that phrase.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread August Zajonc

On Mon, 19 Feb 2007 20:30:59 +0100, Florian G. Pflug [EMAIL PROTECTED]
said:
 August Zajonc wrote:
  Gregory Stark wrote:
  
  Couldn't you define things simply to be that you get a consistent view
  including all transactions started before x transaction? This is time
  travel lite, but low overhead which I think is a key benefit of this
  approach.
 
 I was thinking along the same line. Flashback is probably ony really
 usefull on databases that are mostly read-only, but with a few users
 who update data. You'd use flashback to undo catastrophic changes done
 by accident, and probably will gladly accept that you undo a little
 more work than strictly necessary.
 
 On the contrary, if you're running a online shop were people buy stuff
 24/7, and, say, somebody accidentally deletes some producs, than you
 won't want to loose the orders happened during that last hour, but will
 rather try to regenerate that products from your last backup.

Hopefully people doing order systems are using PITR or similar :) 

For the time travel lite case, it's just a matter of clear definition.
You get all transactions that were *started* before and up to transaction x.
If a transaction rolled back you still won't see it, so you're still
getting a consistent view. But if it committed after your marker you
will see it. That seems ok to me. In fact, I suspect folks think of
transactions as happening more or less when they get sent to the DB, so
this may map more directly to what people expect.

The one caveat would be if you started a long-running transaction, then
did the "oops" trx 5 minutes later, and then started time travel
*before* the long-running trx committed. In that case you wouldn't see
that long-running trx, so the definition would need to be modified to
something like "all trx started before x that were no longer running
when you time travel". Don't know if it is worth a NOTICE in the logs if
you time travel back but there are IDs of transactions from before
your xmin that are still running (and that, if you waited a bit, might
become visible in your time travel view).

If Jan gets his way with a timestamp on trx commit, then you can do
"started before time x", which may be more user-friendly.

For PITR I'd imagine you might actually be able to get the visibility
right, no? I haven't looked deeply enough into the WAL logs to understand
how the partial playback scenario works. If the WAL logs are ordered on
trx commit time, then you'd get proper visibility.

- August



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Alvaro Herrera wrote:

We already have a combined GUC option that is used to change two 
different things (DateStyle) and I regularly see people confused about 
how to use it.


You already have a combined GUC option called log_destination that's 
sitting in the appropriate area of the configuration file, doing something 
similar to what's needed for the new feature.  People confused by that are 
already confused.



Also, sql is not really a destination -- it is a format.


A log file with a different name is another destination.  eventlog is 
certainly a different format and it's sitting happily as an option there. 
I haven't heard anyone make a useful argument yet as to how insert/sql 
logs are any different than the current way that stderr, syslog, and 
eventlog are all possibilities now for log_destination, each with their 
own little quirks (and in the case of syslog, their own additional GUC 
parameters).


That way you can choose to have one or the other, or both if you're 
really dumb.


The fact that you're characterizing people who might want both as "really 
dumb" tells me you're not familiar with enterprise logging requirements. I 
already commented on situations where wanting both types of output going 
at once is going to absolutely be a requirement in some environments for 
this feature addition to be useful; there are a lot of large operations 
that rely heavily on features like syslog to help manage their systems. 
Most of the places I've worked at, the syslog server where the analysis is 
running wasn't necessarily even in the same state as the machine 
generating the log entries.


I know I can't deploy this feature unless it operates in parallel with the 
existing text-based info going to syslog, both because of that and because 
of transition issues--I can't disrupt the existing logs to test a new log 
mechanism until that new mechanism has proven itself.  I'll probably 
deploy it with both turned on forever once it's available.


As for your comments on syslog vs. stderr, I completely agree with 
Guillaume's response to you on that subject.  The stderr output is 
difficult to use for the reasons he describes, but the kind of 
environments that use complicated logging aren't relying on that anyway. 
I wouldn't get distracted by fixing that implementation when it's 
functional enough for most who are satisfied with stderr output.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Alvaro Herrera
Andrew Sullivan wrote:
 On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
   Somehow this seems like implementing RAID within postgres,
  
  RAID and LVM too.  I can't get excited about re-inventing those wheels
  when perfectly good implementations already exist for us to sit on top of.
 
 Ok, warning, this is a you know what would be sweet moment.
 
 What would be nice is to be able to detach one of the volumes, and
 know the span of the data in there without being able to access the
 data.
 
 The problem that a lot of warehouse operators have is something like
 this: We know we have all this data, but we don't know what we will
 want to do with it later.  So keep it all.  I'll get back to you when
 I want to know something.

You should be able to do that with tablespaces and VACUUM FREEZE, the
point of the latter being that you can take the disk containing the
read-only data offline, and still have the data readable after
plugging it back in, no matter how far along the transaction ID counter
is.
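
For illustration, a minimal sketch of that approach; the tablespace name,
path, and table name are assumed:

   CREATE TABLESPACE archive_ts LOCATION '/mnt/archive';
   ALTER TABLE sales_2006 SET TABLESPACE archive_ts;  -- park the static data
   VACUUM FREEZE sales_2006;  -- freeze tuples so they remain visible no
                              -- matter how far the XID counter advances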

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] referential Integrity and SHARE locks

2007-02-19 Thread Bruce Momjian

Added to TODO:

* Allow UPDATEs on only non-referential-integrity columns not to conflict
  with referential integrity locks

  http://archives.postgresql.org/pgsql-hackers/2007-02/msg00073.php
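
  For illustration, the kind of conflict this TODO is about (a minimal
  sketch; table names assumed):

    CREATE TABLE parent (id int PRIMARY KEY, note text);
    CREATE TABLE child  (pid int REFERENCES parent);
    INSERT INTO parent VALUES (1, 'x');
    -- session 1, inside a transaction: the RI trigger takes a SHARE
    -- lock on the referenced parent row
    INSERT INTO child VALUES (1);
    -- session 2: blocks on that lock, even though it never touches
    -- the key column
    UPDATE parent SET note = 'y' WHERE id = 1;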


---

Jan Wieck wrote:
 On 2/8/2007 2:46 PM, Marc Munro wrote:
  On Thu, 2007-08-02 at 14:33 -0500, Tom Lane wrote:
  Marc Munro [EMAIL PROTECTED] writes:
   Yes in this case, T1 must abort because the record it was going to
   update has disappeared from underneath it.  I don't see how this is
   significantly different from the same race for the record if the table
   had no RI constraints.  The only difference that I can see, is that T1
   now has some locks that it must relinquish as the transaction aborts.
  
  No, the difference is there would have been no error at all before;
  if the record were deleted before T1 got to it then it wouldn't have
  attempted to update it.  I really don't think you can make it work
  to perform updates or deletes on a record you have not yet locked.
  
  The record would be locked before the update or delete is attempted,
  however it would not be locked until the referential integrity
  constraints have succeeded in acquiring their locks.
  
  It is becoming clear to me that I am missing something but I still don't
  know what it is.  If anyone can see it and explain it I'd really
  appreciate it.
 
 I think you are missing the fact that the exclusive row lock on UPDATE 
 is taken before any triggers are fired at all, even BEFORE ROW triggers. 
 This is necessary in order to prevent the row being updated or removed 
 concurrently while the triggers are executing. Since BEFORE ROW triggers 
 can modify the content of the row (including the foreign key), the RI 
 check and lock of the referenced row cannot happen before other BR 
 triggers are completed.
 
 In order to make your idea fly, the RI check trigger on INSERT or UPDATE 
 would have to be fired before taking the row lock considering the NEW 
 values for referencing columns as they are thus far. Since the row isn't 
 locked at this time, it can change or disappear while the RI trigger is 
 executing, so the check and lock has to be redone later with the actual 
 row that got locked and after all BR triggers are done with it.
 
 
 Jan
 
 -- 
 #==#
 # It's easier to get forgiveness for being wrong than for being right. #
 # Let's break this rule - forgive me.  #
 #== [EMAIL PROTECTED] #
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Proposed adjustments in MaxTupleSize andtoastthresholds

2007-02-19 Thread Bruce Momjian

Added to TODO:

* Consider allowing configuration of TOAST thresholds

  http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php
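
  For context, the per-column storage knob that already exists; the TODO is
  about making the size thresholds configurable as well (table and column
  names assumed):

    ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL; -- out-of-line, uncompressed
    ALTER TABLE docs ALTER COLUMN body SET STORAGE MAIN;     -- compressed, kept inline if possible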

---

Simon Riggs wrote:
 On Mon, 2007-02-05 at 19:18 -0500, Jan Wieck wrote:
  On 2/5/2007 11:52 AM, Tom Lane wrote:
   Simon Riggs [EMAIL PROTECTED] writes:
   Sounds like a good time to suggest making these values configurable,
   within certain reasonable bounds to avoid bad behaviour.
   
   Actually, given what we've just learned --- namely that choosing these
   values at random is a bad idea --- I'd want to see a whole lot of
   positive evidence before adding such a configuration knob.
  
  Some of the evidence is TOAST itself. Every time you do not SET a column 
  that has been toasted into external storage during an UPDATE, you win 
  because the column's data isn't read during the scan for the row to 
  update, it isn't read during heap_update(), it isn't actually updated at 
  all (the toast reference is copied as-is and the external value reused), 
  and not a single byte of the external data is bloating WAL. If someone 
  knows that 99% of their updates will not hit certain text columns in 
  their tables, actually forcing them to be compressed no matter what and 
  to be stored external if they exceed 100 bytes will be a win.
 
 Yes, thats the main use case.
 
  Of course, this is a bit different from Simon's approach. What I 
  describe here is a per pg_attribute configuration to enforce a certain 
  new toaster behavior. Since we already have something that gives the 
  toaster a per column cluestick (like not to bother trying to compress), 
  it might be much easier to implement than Simon's proposal. It would 
  require that the toaster goes over the initial heap tuple for those 
  specially configured columns even if the tuple is below the toast 
  threshold, which suggests that a pg_class.relhasspecialtoastneeds could 
  be useful. But I think as far as fine-tuning capabilities go, a
  column-insensitive maximum tuple size is insufficient anyway.
 
 Well, sounds like we both want the same thing. The only discussion seems
 to be about user interface.
 
 Setting it per column is much better for very fine tuning, but setting
 them in isolation doesn't help decide what to do when you have lots of
 medium length strings where the sum exceeds the toast target.
 
 IMHO it would be better to have a column-level storage priority (default
 0) and then an table-level settable toast target. So we start applying
 the storage handling mechanisms on the highest priority columns and keep
 going in descending order until we are under the limit for the table.
 
 ALTER TABLE foo
   ALTER COLUMN foocol 
   SET STORAGE EXTERNAL PRIORITY 5
   WITH 
   (toast_target = 400);   /* must be MAXALIGNed value */
 
 Equal priorities are allowed, in which case lowest attribute id wins,
 i.e. current behaviour remains the default.
 
 -- 
   Simon Riggs 
   EnterpriseDB   http://www.enterprisedb.com
 
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] libpq docs about PQfreemem

2007-02-19 Thread Bruce Momjian

I have updated the PQfreemem documentation; patch attached.  Backpatched
to 8.2.X.

---

Zeugswetter Andreas ADI SD wrote:
 
   future for some reason. (the docs for the other functions say you have to
   use PQfreemem without mentioning any exceptions)
   
   Thoughts? Rip out or update?
  
  Are you saying that almost all Win32 binaries and libraries now can
 free
  across DLLs?
 
 You can under very narrow conditions. You need to force code generation 
 for "Multithreaded DLL" run-time libraries (e.g. in VC6 msvcrt.dll) 
 for all exe's and dll's.
 This is bad for debugging, memory checkers and probably impossible
 when using different compilers.
 So you really need PQfreemem.
 
 Andreas

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/libpq.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/libpq.sgml,v
retrieving revision 1.231
diff -c -c -r1.231 libpq.sgml
*** doc/src/sgml/libpq.sgml	16 Feb 2007 16:37:29 -0000	1.231
--- doc/src/sgml/libpq.sgml	19 Feb 2007 22:04:28 -0000
***************
*** 2617,2625 ****
      <function>PQescapeBytea</function>,
      <function>PQunescapeBytea</function>,
      and <function>PQnotifies</function>.
!    It is needed by Microsoft Windows, which cannot free memory across
!    DLLs, unless multithreaded DLLs (<option>/MD</option> in VC6) are used.
!    On other platforms, this function is the same as the standard library function <function>free()</>.
     </para>
    </listitem>
   </varlistentry>
--- 2617,2629 ----
      <function>PQescapeBytea</function>,
      <function>PQunescapeBytea</function>,
      and <function>PQnotifies</function>.
!    It is particularly important that this function, rather than
!    <function>free()</>, be used on Microsoft Windows.  This is because
!    allocating memory in a DLL and releasing it in the application works
!    only if multithreaded/single-threaded, release/debug, and static/dynamic
!    flags are the same for the DLL and the application.  On non-Microsoft
!    Windows platforms, this function is the same as the standard library
!    function <function>free()</>.
     </para>
    </listitem>
   </varlistentry>



Re: [HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Jakub Ouhrabka

Hi Tom,

 Hacking pg_control would be the hard part; you'll never get the CRC
 right if you do it manually.  Possibly pg_resetxlog could be adapted
 to the purpose.

thanks for your valuable answer! I looked at pg_resetxlog.c, but I'm no 
pg-internals expert - would something like this work?


1) normally shut down database
2) hack pg_resetxlog to set locale to wanted value
3) run pg_resetxlog -f (rewrite pg_control - everything would be guessed 
with the exception of overloaded locale)

4) start database

We won't miss any transactions and there won't be any inconsistency in 
the data because the server was shut down cleanly, right?


Thanks,

Kuba

Tom Lane wrote:

Martijn van Oosterhout kleptog@svana.org writes:

But I guess something like this would work:
1) drop all indexes on text/varchar columns
2) change cluster locale
3) create all indexes on text/varchar columns



You're going to miss the name columns, ie. every string index in
pg_catalog.


But "name" is not locale-aware --- it just uses strcmp().  AFAIR there
aren't any locale-dependent indexes in the system catalogs.  So in
principle you could hack pg_control, restart the postmaster, and then
reindex every locale-dependent index.  Hacking pg_control would be the
hard part; you'll never get the CRC right if you do it manually.  Possibly
pg_resetxlog could be adapted to the purpose.


I'd suggest single user mode at least, and make backups!


Yup, a filesystem backup would be a *real* good idea.  Not to mention
testing the procedure on a toy installation.

regards, tom lane




Re: [HACKERS] Deadlock with pg_dump?

2007-02-19 Thread Simon Riggs
On Mon, 2007-02-19 at 19:38 +, Simon Riggs wrote:
 On Tue, 2007-02-13 at 22:19 -0500, Bruce Momjian wrote:
  Simon Riggs wrote:
   On Thu, 2006-10-26 at 18:45 -0400, Tom Lane wrote:
Chris Campbell [EMAIL PROTECTED] writes:
 Is there additional logging information I can turn on to get more  
 details? I guess I need to see exactly what locks both processes  
 hold, and what queries they were running when the deadlock occurred?  
 Is that easily done, without turning on logging for *all* statements?

log_min_error_statement = error would at least get you the statements
reporting the deadlocks, though not what they're conflicting against.
   
   Yeh, we need a much better locking logger for performance analysis.
   
   We really need to dump the whole wait-for graph for deadlocks, since
   this might be more complex than just two statements involved. Deadlocks
   ought to be so infrequent that we can afford the log space to do this -
   plus if we did this it would likely lead to fewer deadlocks.
   
   For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter
   that would allow you to dump the wait-for graph for any data-level locks
   that wait too long, rather than just those that deadlock. Many
   applications experience heavy locking because of lack of holistic
   design. That will also show up the need for other utilities to act
   CONCURRENTLY, if possible.
  
  Old email, but I don't see how our current output is not good enough?
  
  test=> lock a;
  ERROR:  deadlock detected
  DETAIL:  Process 6855 waits for AccessExclusiveLock on relation 16394 of
  database 16384; blocked by process 6795.
  Process 6795 waits for AccessExclusiveLock on relation 16396 of database
  16384; blocked by process 6855.
 
 This detects deadlocks, but it doesn't detect lock waits. 
 
 When I wrote that it was previous experience driving me. Recent client
 experience has highlighted the clear need for this. We had a lock wait
 of 50 hours because of an RI check; thats the kind of thing I'd like to
 show up in the logs somewhere.
 
 Lock wait detection can be used to show up synchronisation points that
 have been inadvertently designed into an application, so its a useful
 tool in investigating performance issues.
 
 I have a patch implementing the logging as agreed with Tom, will post to
 patches later tonight.

Patch for discussion; it includes doc entries at the top of the patch, so
it's fairly clear how it works.

Output is an INFO message, so that it can trigger log_min_error_statement
when it generates a message, allowing us to see the SQL statement that is
waiting. This lets a message be generated before the statement completes,
which is important because in some cases the statement may never complete,
so simply logging a list of pids won't always tell you what the SQL was
that was waiting.

Other approaches are possible...

Comments?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

Index: doc/src/sgml/config.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.108
diff -c -r1.108 config.sgml
*** doc/src/sgml/config.sgml	1 Feb 2007 00:28:16 -0000	1.108
--- doc/src/sgml/config.sgml	6 Feb 2007 12:31:49 -0000
***************
*** 2936,2941 ****
--- 2936,2965 ----
       </listitem>
      </varlistentry>
  
+      <varlistentry id="guc-log-lock-waits" xreflabel="log_lock_waits">
+       <term><varname>log_lock_waits</varname> (<type>boolean</type>)</term>
+       <indexterm>
+        <primary><varname>log_lock_waits</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+ 		Controls whether log messages are produced when a statement is forced
+ 		to wait when trying to acquire locks on database objects. The threshold
+ 		time is the value of the <xref linkend="guc-deadlock-timeout"> parameter.
+ 		The log messages generated are intended for use during specific
+ 		investigations into application performance issues and subsequent tuning.
+ 		It is designed for use in conjunction with <varname>log_min_error_statement</>.
+ 		Log messages indicating long lock waits might indicate problems with
+ 		applications accessing the database or possibly disconnection issues.
+ 		If no such problem exists, it might indicate that <varname>deadlock_timeout</>
+ 		could be set higher. Log messages might also indicate that certain
+ 		deadlocks have been avoided. In those cases, decreasing the value of
+ 		<varname>deadlock_timeout</> might resolve lock wait situations faster,
+ 		thereby reducing contention. By default, this form of logging is <literal>off</>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       </variablelist>
      </sect2>
     </sect1>
Index: src/backend/storage/lmgr/deadlock.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 

Re: [HACKERS] --enable-debug does not work with gcc

2007-02-19 Thread Bruce Momjian

Added to developer's FAQ:

<P><I>src/Makefile.custom</I> can be used to set environment variables,
like <I>CUSTOM_COPT</I>, that are used for every compile.



---

Magnus Hagander wrote:
  If we did what you suggest, then --enable-debug would cause performance
  degradation, which would cause people to not use it, which would result
  in most binaries being completely undebuggable rather than only partially.
  Doesn't sound like a good tradeoff to me.
 
  Personally, in my development tree I use a Makefile.custom containing
 
  # back off optimization unless profiling
  ifeq ($(PROFILE),)
 CFLAGS:= $(patsubst -O2,-O1,$(CFLAGS))
  endif
 
  -O1 still generates uninitialized variable warnings but the code is a
  lot saner to step through ... not perfect, but saner.  It's been a
  workable compromise for a long time.  I don't recommend developing with
  -O0, exactly because it disables some mighty valuable warnings.
  
  Agreed.  I use -O1 by default myself, unless I am doing performance testing.
  
 Something for the developers FAQ perhaps? I confess I did not know of
 Makefile.custom :-D
 
 //Magnus
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] TopPlan, again

2007-02-19 Thread Simon Riggs
On Sun, 2007-02-18 at 18:19 -0500, Tom Lane wrote:
 While thinking about having a centralized plan cache for managing plan
 invalidation, I got annoyed again about the fact that the executor needs
 access to the Query tree.  This means that we'll be storing *three*
 representations of any cached query: raw parsetree for possible
 regeneration, plus parsed Query tree and Plan tree.
...
 After looking over the code it seems that the executor needs a limited
 subset of the Query fields, namely
...
   into
   intoOptions
   intoOnCommit (why is this separate from intoOptions?)
   intoTableSpaceName
...

 which I think we should put into a new TopPlan node type.

All else sounds good, but why would we be caching a plan that used these
fields? Anybody re-executing a CREATE TABLE AS SELECT on the same table
isn't somebody we should be helping. ISTM that we'd be able to exclude
them from the TopPlan on that basis, possibly creating an Into node to
reduce the clutter.



Couple of incidental points on plan invalidation:
- We need to consider how the planner uses parameter values. Currently
the unnamed query utilises the first bind parameters to plan the query.
Doing that when we have a central plan cache will definitely cause
problems in some applications that repeatedly re-specify the same
parameter within their own session, but differ across sessions. Sounds
bizarre, but assuming that all users of the same query want it optimised
the same way is not a good assumption in all cases. I'm completely in
favour of a centralized plan cache in all other ways...

- I'd like to make it impossible to re-plan the output columns of
queries with unspecified output columns, e.g. * or foo.*,
since re-planning makes it possible for the results of the query to
change during re-execution. I've never seen an application using dynamic
queries that allowed for the possibility that the result metadata might
change as we re-execute, and allowing it would seem likely to break more
applications than we'd really want. It will also allow us to remove the
Metadata call from the v3 protocol at Exec time, as David Strong
suggested last year on pgsql-jdbc.

- It would be good to allow for exec-time constraint exclusion, which
would allow caching plans that rely on CE with stable functions (e.g. col >
CURRENT_DATE). That may change the design, even though that's not an
8.3 thing at all.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] TopPlan, again

2007-02-19 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 After looking over the code it seems that the executor needs a limited
 subset of the Query fields, namely
 ...
 which I think we should put into a new TopPlan node type.

 All else sounds good, but why would we be caching a plan that used these
 fields?

Um, what's your point?  I certainly have no desire to support two
different Executor APIs depending on whether we think the command might
be worth caching or not.

regards, tom lane



Re: [HACKERS] ToDo: add documentation for operator IS OF

2007-02-19 Thread Bruce Momjian
Pavel Stehule wrote:
 Hello,
 
 I miss doc for this operator

Strange that IS [NOT] OF wasn't documented, especially seeing it was added in
PostgreSQL 7.3.  Anyway, it's now documented and backpatched to 8.2.X.
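
For example, a quick sketch of the construct as now documented:

   SELECT 1 IS OF (integer);                -- true
   SELECT 'abc'::text IS NOT OF (integer);  -- true
   SELECT now() IS OF (timestamptz, date);  -- true: the type is in the list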

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/func.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.365
diff -c -c -r1.365 func.sgml
*** doc/src/sgml/func.sgml	19 Feb 2007 22:51:44 -0000	1.365
--- doc/src/sgml/func.sgml	20 Feb 2007 00:24:15 -0000
***************
*** 411,416 ****
--- 411,432 ----
    <literal>IS NOT NULL</literal>, respectively, except that the input
    expression must be of Boolean type.
   </para>
+  
+  <para>
+   <indexterm>
+    <primary>IS OF</primary>
+   </indexterm>
+   <indexterm>
+    <primary>IS NOT OF</primary>
+   </indexterm>
+   It is possible to check the data type of an expression using the
+   constructs
+ <synopsis>
+ <replaceable>expression</replaceable> IS OF (typename, ...)
+ <replaceable>expression</replaceable> IS NOT OF (typename, ...)
+ </synopsis>
+   They return a boolean value based on whether the expression's data
+   type is one of the listed data types.
+  </para>
  </sect1>
 
 <sect1 id="functions-math">
Index: doc/src/sgml/syntax.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.115
diff -c -c -r1.115 syntax.sgml
*** doc/src/sgml/syntax.sgml	6 Feb 2007 09:16:07 -0000	1.115
--- doc/src/sgml/syntax.sgml	20 Feb 2007 00:24:15 -0000
***************
*** 847,853 ****
     <row>
      <entry><token>IS</token></entry>
      <entry></entry>
!     <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS UNKNOWN</>, <literal>IS NULL</></entry>
     </row>
  
     <row>
--- 847,854 ----
     <row>
      <entry><token>IS</token></entry>
      <entry></entry>
!     <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS
!     UNKNOWN</>, <literal>IS NULL</>, <literal>IS OF</></entry>
     </row>
  
     <row>



Re: [HACKERS] ToDo: add documentation for operator IS OF

2007-02-19 Thread Joe Conway

Bruce Momjian wrote:

Pavel Stehule wrote:

Hello,

I miss doc for this operator


Strange that IS [NOT] OF wasn't documented, especially seeing it was added in
PostgreSQL 7.3.  Anyway, it's now documented and backpatched to 8.2.X.


Here's the reason -- see this thread:
http://archives.postgresql.org/pgsql-patches/2003-08/msg00062.php

Joe




Re: [HACKERS] ToDo: add documentation for operator IS OF

2007-02-19 Thread Bruce Momjian
Joe Conway wrote:
 Bruce Momjian wrote:
  Pavel Stehule wrote:
  Hello,
 
  I miss doc for this operator
  
  Strange that IS [NOT] OF wasn't documented, especially seeing it was added in
  PostgreSQL 7.3.  Anyway, it's now documented and backpatched to 8.2.X.
 
 Here's the reason -- see this thread:
 http://archives.postgresql.org/pgsql-patches/2003-08/msg00062.php

Wow, interesting.  I do remember that now.  Should I revert the
documentation addition and add a comment to gram.y?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] autovacuum next steps

2007-02-19 Thread Galy Lee


Gregory Stark wrote:

If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle
of vacuums on smaller tables, then resume, that problem would go away. That
sounds too difficult though, but perhaps we could do something nearly as good.


I think giving vacuum this interrupt-and-resume capability would be quite 
useful for large tables.


It would give autovacuum more flexibility to create a good scheduling 
scheme. Sometimes it takes a whole day to vacuum a large table 
(tables of hundreds of GB may qualify); setting cost_delay can make it 
last even longer, up to several days. If the system has a maintenance 
window, the vacuum task for the large table can be split to fit into 
that window by the interrupt-and-resume feature.



One option that I've heard before is to have vacuum after a single iteration
(ie, after it fills maintenance_work_mem and does the index cleanup and the
second heap pass), remember where it was and pick up from that point next
time.



Even a single iteration may take a long time, so a break only at the 
boundary of an iteration is not very useful. I think it is not so 
difficult to get vacuum to remember where it left off and restart 
from that point next time. The following is the basic idea.


A typical vacuum process mainly have the following phases:
 Phase 1. scan heap
 Phase 2. scan and sweep index
 Phase 3. sweep heap
 Phase 4. update FSM
 Phase 5. truncate CLOG

When vacuum is interrupted, we can save the collected information to 
disk and restore it later when vacuum restarts: we can remember the 
dead tuple list and the block number it has scanned in phase 1; the 
indexes it has cleaned up in phase 2; and the tuples it has swept in 
phase 3. Before exiting from vacuum, we can also merge the free space 
information into the FSM.


We are working on this feature now.  I will propose it later for 
discussion with you.


Best Regards
Galy Lee
--
NTT Open Source Software Center



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread David Fetter
On Mon, Feb 19, 2007 at 02:50:34PM -0500, Andrew Sullivan wrote:
 On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
   Somehow this seems like implementing RAID within postgres,
  
  RAID and LVM too.  I can't get excited about re-inventing those
  wheels when perfectly good implementations already exist for us to
  sit on top of.
 
 Ok, warning, this is a "you know what would be sweet" moment.
 
 What would be nice is to be able to detach one of the volumes, and
 know the span of the data in there without being able to access the
 data.
 
 The problem that a lot of warehouse operators have is something like
 this: We know we have all this data, but we don't know what we will
 want to do with it later.  So keep it all.  I'll get back to you
 when I want to know something.
 
 It'd be nice to be able to load up all that data once, and then
 shunt it off into (say) read-only media.  If one could then run a
 query that would tell one which spans of data are candidates for the
 search, you could bring back online (onto reasonably fast storage,
 for instance) just the volumes you need to read.

Isn't this one of the big use cases for table partitioning?
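
For the archives, here is a sketch of how that maps onto 8.2-style
partitioning (all names hypothetical): old partitions can live in a
tablespace sitting on cheap or read-only storage, and constraint exclusion
keeps queries off the spans that can't match:

CREATE TABLE measurements (logdate date NOT NULL, reading int);

-- one child per span, placed in an archival tablespace:
CREATE TABLE measurements_2006 (
    CHECK (logdate >= '2006-01-01' AND logdate < '2007-01-01')
) INHERITS (measurements) TABLESPACE archive_space;

-- with constraint_exclusion = on, this scans only the children whose
-- CHECK constraints can satisfy the predicate:
SET constraint_exclusion = on;
SELECT count(*) FROM measurements
 WHERE logdate >= '2006-06-01' AND logdate < '2006-07-01';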

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 A workable syntax might be
 INSERT INTO pg_log ...

Why is this still under discussion?  I thought we'd agreed that COPY
format was the way to go.

regards, tom lane



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Tom Lane
August Zajonc [EMAIL PROTECTED] writes:
 The key is how lightweight the setup could be, which matters because
 clients are not always willing to pay for a PITR setup. The low overhead
 would mean you'd feel fine about setting guc to 1hr or so.

This would have exactly the same performance consequences as always
having an hour-old open transaction.  I'm afraid that describing it
as low overhead is mere wishful thinking: it would cripple vacuuming
of high-update tables and greatly increase the typical load on pg_clog
and pg_subtrans.  We already know that pg_subtrans contention can be a
source of context-swap storms, with the size of the window back to
GlobalXmin being the controlling factor for how bad it gets.
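
To see why, compare it to the familiar long-running-transaction case
(a sketch; the table name is made up):

-- session 1: holds back GlobalXmin, exactly what a one-hour
-- flashback window would do continuously
BEGIN;
SELECT count(*) FROM busy_table;
-- ...sits idle for an hour...

-- session 2: meanwhile, VACUUM cannot reclaim any row version
-- deleted or updated after session 1's snapshot was taken
VACUUM VERBOSE busy_table;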

It's possible that this last could be addressed by separating the
concept of old enough to be vacuumed from GlobalXmin, but it's
certainly not a trivial thing.

regards, tom lane



Re: [HACKERS] pgsql: Fix for plpython functions; return true/false for boolean,

2007-02-19 Thread Bruce Momjian

I am still waiting for a plpython patch that has Python version
checking.

---

Guido Goldstein wrote:
 Peter Eisentraut wrote:
  Guido Goldstein wrote:
  Is it possible to tell me which python versions you want to
  support?
  
  The issue isn't so much which versions we want to support.  There is 
  certainly some flexibility with that.  But when a patch breaks the 
  buildfarm a) unannounced and b) without any apparent feature gain, then 
  people get annoyed.
 
 If this breaks the buildfarm, that's not my fault -- unless you can
 tell me what I'm supposed to do about the buildfarm.
 
 If you mean that plpython didn't compile, fine; simply tell
 the people what version they should consider when sending
 in patches.
 
 I've checked the patch with postgres 8.1.3 and 8.2.1
 with python 2.4 and 2.5 on intel 32 bit and amd 64 bit
 systems; all systems running linux.
 
 *And* it's not a feature patch but a bug-fixing one!
 Python is a language with strong typing, so silently
 converting a datatype is a bug -- not a feature.
 Btw, you'll lose the type information of boolean columns in
 trigger functions (NEW and OLD dicts, no explicit parameters),
 which does cause problems.
 
  That said, we certainly try to support a few more versions of Python 
 [...]
 
 If you want to support python 2.3 use the attached patch, which also
 works for the newer python versions.
 The Python 2.3 branch is the oldest _officially_ supported python version.
 
 Anyway, to circumvent the above-mentioned point a), I herewith announce
 that the included patch might break the buildfarm.
 
 Cheers
Guido
 


 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[HACKERS] Log levels for checkpoint/bgwriter monitoring

2007-02-19 Thread Greg Smith
I have a WIP patch that adds the main detail I have found I need to 
properly tune checkpoint and background writer activity.  I think it's 
almost ready to submit (you can see the current patch against 8.2 at 
http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt ) 
after making it a bit more human-readable.  But I've realized that along 
with that, I need some guidance in regards to what log level is 
appropriate for this information.


An example works better than explaining what the patch does:

2007-02-19 21:53:24.602 EST - DEBUG:  checkpoint required (wrote 
checkpoint_segments)

2007-02-19 21:53:24.685 EST - DEBUG:  checkpoint starting
2007-02-19 21:53:24.705 EST - DEBUG:  checkpoint flushing buffer pool
2007-02-19 21:53:24.985 EST - DEBUG:  checkpoint database fsync starting
2007-02-19 21:53:42.725 EST - DEBUG:  checkpoint database fsync complete
2007-02-19 21:53:42.726 EST - DEBUG:  checkpoint buffer flush dirty=8034 
write=279956 us sync=17739974 us


Remember that Load distributed checkpoint discussion back in December? 
You can see exactly how bad the problem is on your system with this log 
style (this is from a pgbench run where it's positively awful--it really 
does take over 17 seconds for the fsync to execute, and there are clients 
that are hung the whole time waiting for it).


I also instrumented the background writer.  You get messages like this:

2007-02-19 21:58:54.328 EST - DEBUG:  BGWriter Scan All - Written = 5/5 
Unscanned = 23/54


This shows that we wrote the maximum number of pages we were allowed to 
write (5 of 5), while failing to scan almost half (23) of the 54 buffers we 
meant to look at.  By watching this output while the system is under load, 
I found I was able to do bgwriter optimization that used to take me days 
of frustrating testing in hours.  I've been waiting for a good guide to 
bgwriter tuning since 8.1 came out.  Once you have this, combined with 
knowing how many buffers were dirty at checkpoint time because the 
bgwriter didn't get to them in time (the number you want to minimize), the 
tuning guide practically writes itself.
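
If you want to follow along on your own system, the knobs this tuning
revolves around can be listed with something like (8.2-era names):

SELECT name, setting
  FROM pg_settings
 WHERE name LIKE 'bgwriter%'
    OR name LIKE 'checkpoint%';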


So my question is...what log level should all this go at?  Right now, I 
have the background writer stuff adjusting its level dynamically based on 
what happened; it logs at DEBUG2 if it hits the write limit (which should 
be a rare event once you're tuned properly), DEBUG3 for writes that 
scanned everything they were supposed to, and DEBUG4 if it scanned but 
didn't find anything to write.  The source of checkpoint information logs 
at DEBUG1, the fsync/write info at DEBUG2.


I'd like to move some of these up.  On my system, I even have many of the 
checkpoint messages logged at INFO (the source of the checkpoint and the 
total write time line).  It's a bit chatty, but when you get some weird 
system pause issue it makes it easy to figure out if checkpoints were to 
blame.  Given how useful I feel some of these messages are to system 
tuning, and to explaining what currently appears as inexplicable pauses, I 
don't want them to be buried at DEBUG levels where people are unlikely to 
ever see them (I think some people may be concerned about turning on 
things labeled DEBUG at all).  I am aware that I am too deep into this to 
have an unbiased opinion at this point though, which is why I ask for 
feedback on how to proceed here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] \copy (query) delimiter syntax error

2007-02-19 Thread Andrew Dunstan


I think I'll go with Tom's Plan B for HEAD, but not do anything more for 
8.2 than has already been done.


cheers

andrew


Bruce Momjian wrote:
 Did we come to a conclusion on this?

 ---

 Andrew Dunstan wrote:
  Andrew Dunstan wrote:
   Tom Lane wrote:
    Andrew Dunstan [EMAIL PROTECTED] writes:
     The consequence will be, though, that psql will accept a syntax for
     \copy (query) ... that the corresponding backend command would
     reject were we not transforming it. That strikes me as potentially
     confusing.

    Perhaps.  What about plan B: remove the legacy syntax support in \copy?
    IIRC it has not been documented since 7.2, so maybe we can finally throw
    it overboard.  Thoughts?

   I like it for 8.3 - but maybe my present patch would be better for
   8.2, as it involves less behaviour change.

  While we decide this issue, which can be worked around in any case, I am
  going to commit the part of the patch that nobody has objected to (and
  which will fix Michael's original complaint), on HEAD and 8.2 stable, so
  we can get some testing going.

  cheers

  andrew




Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 On Mon, 19 Feb 2007, Alvaro Herrera wrote:
 Also, sql is not really a destination -- it is a format.

 A log file with a different name is another destination.  eventlog is 
 certainly a different format and it's sitting happily as an option there. 
 I haven't heard anyone make a useful argument yet as to how insert/sql 
 logs are any different than the current way that stderr, syslog, and 
 eventlog are all possibilities now for log_destination, each with their 
 own little quirks (and in the case of syslog, their own additional GUC 
 parameters).

Since the sql format doesn't make any sense for syslog or eventlog
output, I tend to agree that treating it as a destination is a
reasonable answer.  It's going to be a bit non-orthogonal no matter
which way we jump, but this seems like the most natural and useful
extension from where we are.  To me anyway ... YMMV ...

regards, tom lane



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Tom Lane wrote:


Why is this still under discussion?  I thought we'd agreed that COPY
format was the way to go.


Joshua Drake said "COPY would be a good option, but INSERT is probably 
what I would use as the default. The most use I see for this is something 
where I am tailing out the log and inserting live into a log db..." and I 
completely agreed with him--that's also how all the applications I'd like 
to build around this feature are expected to operate.  No one said 
anything else on this topic to defend COPY as the right choice until you 
just brought it back up here.


The arguments for COPY are performance and that you don't need to specify 
the table name.  INSERT is slower and you need a name, but it's easier to 
build a UNIX tool style pipeline to import it in real-time.
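
To make the pipeline case concrete, here is the sort of thing I have in
mind -- the pg_log table and its columns are hypothetical, but the point is
that every emitted line stands alone:

-- an importer can pipe the log straight into psql, e.g.
--   tail -f postgresql-insert.log | psql logdb
INSERT INTO pg_log (log_time, username, message)
  VALUES ('2007-02-19 21:53:24 EST', 'postgres', 'checkpoint starting');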


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Jonah H. Harris

On 2/17/07, Joshua D. Drake [EMAIL PROTECTED] wrote:

My understanding is that the main difference is that rollbacks are
inexpensive for us, but expensive for Oracle.


Yes, Oracle is optimized for COMMIT, we're optimized for ROLLBACK :)

In all seriousness, last time I checked Oracle's MVCC was covered by
two patents.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [HACKERS] Short varlena headers and arrays

2007-02-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Elements of arrays are not subject to being toasted by themselves, so
 I don't think you can make that work.  At least not without breaking
 wide swaths of code that works fine today.

 You think it's more likely there are places that build arrays and then read
 the items back without passing through detoast than there are places that
 build tuples and do so?

The former is valid per the coding rules, the latter is not, so...

regards, tom lane



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 The arguments for COPY are performance and that you don't need to specify 
 the table name.  INSERT is slower and you need a name, but it's easier to 
 build a UNIX tool style pipeline to import it in real-time.

I can't believe that any production situation could tolerate the
overhead of one-commit-per-log-line.  So a realistic tool for this
is going to have to be able to wrap blocks of maybe 100 or 1000 or so
log lines with BEGIN/COMMIT, and that is exactly as difficult as
wrapping them with a COPY command.  Thus, I disbelieve your argument.
We should not be designing this around an assumed use-case that will
only work for toy installations.
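
To spell that out, framing a batch of the hypothetical pg_log lines from
upthread either way is the same amount of work for the tool:

-- INSERT framing, one transaction per batch:
BEGIN;
INSERT INTO pg_log (log_time, message)
  VALUES ('2007-02-19 21:53:24', 'checkpoint starting');
INSERT INTO pg_log (log_time, message)
  VALUES ('2007-02-19 21:53:42', 'checkpoint complete');
COMMIT;

-- COPY framing of the same batch (tab-separated data):
COPY pg_log (log_time, message) FROM stdin;
2007-02-19 21:53:24	checkpoint starting
2007-02-19 21:53:42	checkpoint complete
\.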

regards, tom lane



Re: [HACKERS] Plan invalidation design

2007-02-19 Thread Simon Riggs
On Sat, 2007-02-17 at 12:48 -0500, Tom Lane wrote:

 Relcache inval casts a fairly wide net; for example, adding or dropping an
 index will invalidate all plans using the index's table whether or not
 they used that particular index, and I believe that VACUUM will also
 result in a relcache inval due to updating the table's pg_class row.
 I think this is a good thing though --- for instance, after adding an
 index it seems a good idea to replan to see if the new index is useful,
 and replanning after a VACUUM is useful if the table has changed size
 enough to warrant a different plan.  OTOH this might mean that plans on a
 high-update-traffic table never survive very long because of autovacuum's
 efforts.  If that proves to be a problem in practice we can look at ways
 to dial down the number of replans, but for the moment I think it's more
 important to be sure we *can* replan at need than to find ways to avoid
 replans.

Just some info on that: In an update-intensive scenario, I'm seeing
VACUUMs every 2 minutes on the heaviest hit tables on CVS HEAD on a
medium-powered 4-CPU server. Re-planning multiple queries on 100+
sessions every few minutes would not be good.

It seems a reasonable working assumption that HOT will reduce that
requirement considerably, but it's something to watch. Thanks for drawing
attention to it.

Presumably ANALYZE would have the same effect?
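
For context, the sort of cached plan at stake looks like this (a sketch
with a pgbench-style table):

PREPARE acct_lookup (int) AS
    SELECT abalance FROM accounts WHERE aid = $1;  -- planned here
EXECUTE acct_lookup(42);
VACUUM accounts;          -- touches accounts' pg_class row
EXECUTE acct_lookup(42);  -- would use a fresh plan under the proposal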

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Robert Treat
On Monday 19 February 2007 15:08, Bruce Momjian wrote:
 Joshua D. Drake wrote:
  Andrew Sullivan wrote:
   On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
   Martijn van Oosterhout kleptog@svana.org writes:
   Somehow this seems like implementing RAID within postgres,
  
   RAID and LVM too.  I can't get excited about re-inventing those wheels
   when perfectly good implementations already exist for us to sit on top
   of.
  
   Ok, warning, this is a "you know what would be sweet" moment.
  
  The dreaded words from a developer's mouth to every manager in the world.

 Yea, I just instinctively hit delete when I saw that phrase.

Too bad... I know Oracle can do what he wants... possibly other db systems as 
well. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



[HACKERS] HOT WIP Patch - version 2

2007-02-19 Thread Pavan Deolasee

Reposting - looks like the message did not get through in the first
attempt. My apologies if multiple copies are received.


This is the next version of the HOT WIP patch. Since the last patch that
I sent out, I have implemented the HOT-update chain pruning mechanism.

When following a HOT-update chain from an index fetch, if we notice that
the root tuple is dead and HOT-updated, we try to prune the chain to the
smallest possible length. To do that, the share lock is upgraded to an
exclusive lock and the tuple chain is followed until we find a live or
recently-dead tuple. At that point, the root t_ctid is made to point to
that tuple. In order to preserve the xmax/xmin chain, the xmax of the root
tuple is also updated to the xmin of the found tuple. Since this xmax is
also < RecentGlobalXmin and belongs to a committed transaction, the
visibility of the root tuple remains the same.

The intermediate heap-only tuples are removed from the HOT-update chain.
The HOT-updated status of these tuples is cleared and their respective
t_ctids are made to point to themselves. These tuples are now unreachable
and ready for vacuuming. This entire action is logged in a single
WAL record.

During vacuuming, we keep track of the number of root tuples vacuumed.
If this count is zero, the index cleanup step is skipped. This avoids
unnecessary index scans whenever possible.

This patch should apply cleanly on current CVS HEAD and pass all regression
tests. I am still looking for review comments on the first WIP patch. If
anyone has already looked through it and is interested in the incremental
changes, please let me know and I can post them.

What's next?
------------

ISTM that the basic HOT-updates and the ability to prune the HOT-update
chain should help us reduce index bloat, limit the overhead of ctid
following in index fetches, and efficiently vacuum heap-only tuples. IMO
the next important, but less troublesome, thing to tackle is reusing space
within a block without a complete vacuum of the table. This would let us do
many more HOT-updates and thus further reduce index/heap bloat.
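
The workload I have in mind is the common one where updates never touch an
indexed column (hypothetical schema):

CREATE TABLE counters (id int PRIMARY KEY, hits bigint NOT NULL DEFAULT 0);

-- 'hits' is not indexed, so each such update can be a HOT-update: the new
-- version goes into the same block with no new index entry, and the old
-- version later becomes prunable as described above
UPDATE counters SET hits = hits + 1 WHERE id = 1;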

I am thinking of reusing the DEAD heap-only tuples that get removed from
the HOT-update chain as part of the pruning operation. Since these tuples,
once removed from the chain, are neither reachable nor referenced by any
index, they could readily be used for storing newer versions of the same or
other rows in the block. How about setting LP_DELETE on these tuples as
part of the prune operation? LP_DELETE is unused for heap tuples, if I am
not mistaken. Other information, such as length and offset, is maintained
as-is. When we run out of space for an update-within-the-block, we traverse
all the line pointers looking for LP_DELETEd items. If any of these items
has enough space to store the new tuple, that item is reused. Does anyone
see any issue with doing this? Also, any suggestions for doing it better?

If the page gets really fragmented, we can try to grab a VACUUM-strength
lock on the page and de-fragment it. The lock is taken conditionally to
avoid deadlocks. This is done in the heap_update() code path, so it would
add some overhead, but it may still prove better than putting the tuple in
a different block and doing the corresponding index insert(s). Also, since
we are more concerned about large tables, the chances of being able to
upgrade the exclusive lock to a vacuum-strength lock are high. Comments?

If there are no objections, I am planning to work on the first part while
Nikhil takes up the second task of block-level retail vacuum.
Your comments on these issues and the patch are really appreciated.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


NewHOT-v2.0.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Robert Treat
On Monday 19 February 2007 11:27, Martijn van Oosterhout wrote:
 On Mon, Feb 19, 2007 at 05:10:36PM +0100, Dimitri Fontaine wrote:
   RAID and LVM too.  I can't get excited about re-inventing those wheels
   when perfectly good implementations already exist for us to sit on top
   of.
 
  I thought moving some knowledge about data availability into PostgreSQL
  code could provide some valuable performance benefit, allowing it to
  organize reads (for example, parallel table/index scans against
  different volumes) and to obtain data from the volume known to be
  'quicker' (or least used/loaded).

 Well, organising requests to be handled quickly is a function of
 LVM/RAID, so we don't go there. However, speeding up scans by having
 multiple requests is an interesting approach, as would perhaps a
 different random_page_cost for different tablespaces.


On one of my systems I have one tablespace for read-only data (99-1 
read/write), one for read-mostly data (90-10), and one for write-mostly data 
(40-60).  The breakdown is based on a combination of the underlying hardware 
and usage patterns of the tables involved. I suspect that isn't that uncommon 
really.  I've often thought that being able to set GUC variables on a 
specific tablespace (like you can do for users) would allow for a lot of 
flexibility in tuning queries that go across different tablespaces. 
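
For comparison, the per-user form that exists today, next to the
per-tablespace form I am wishing for (which does not exist):

-- existing: bind a planner GUC to a role
ALTER ROLE reporting SET random_page_cost = 2.0;

-- wished-for analogue, something like:
--   ALTER TABLESPACE fast_disks SET random_page_cost = 1.5;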

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


