Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Pavel Stehule

2007/7/3, Gregory Stark <[EMAIL PROTECTED]>:


"Jaime Casanova" <[EMAIL PROTECTED]> writes:

> why not just a new relkind indicating this is a template and not an
> actual table, and use that template for creating the actual tables?

For precisely the reason stated upthread. That would mean creating and
deleting catalog entries for every transaction. Imagine a busy OLTP system
running hundreds of transactions per second trying to use a temporary table
for intermediate results. Mixing DDL and DML is just as bad an idea behind the
scenes as it is for users.



A global temp table would be created from its template only when it is
used. It has no negative effect on applications that don't use it. The
benefit of global temp tables is the simplification of stored procedures.

regards
Pavel Stehule

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Pavel Stehule


> How about a new relkind which causes the table to be located in
> PGDATA/base//pg_temp_/
> So each backend can have its own copy of the table with the same
> relfilenode; there's no need for extra catalog entries.

Uh-huh.  And what do you do with relpages, reltuples, relfrozenxid, and
pg_statistic entries?  What if one backend wants to TRUNCATE or CLUSTER
its copy (requiring a new relfilenode)?  Where does ALTER TABLE fit into
this?



These entries could theoretically be virtual (in memory). If we had
some kind of in-memory storage, we could use it for this.

Have a nice day
Pavel Stehule



Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Joshua D. Drake

Alvaro Herrera wrote:

Joshua D. Drake wrote:

Tom Lane wrote:

"Jim C. Nasby" <[EMAIL PROTECTED]> writes:

http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.

Is there a reason to say anything beyond "use autovac"?
Did we change the default autovac parameters for 8.3 (beyond turning it
on)? Because on any reasonably used database, they are way too conservative.


We're still on time to change them ...  Any concrete proposals?


I could provide numbers from high-use production databases. We could 
probably back those down a little to arrive at more reasonable values.


Joshua D. Drake




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Alvaro Herrera
Joshua D. Drake wrote:
> Tom Lane wrote:
> >"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> >>http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
> >
> >Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
> >text now.
> >
> >Is there a reason to say anything beyond "use autovac"?
> 
> Did we change the default autovac parameters for 8.3 (beyond turning it
> on)? Because on any reasonably used database, they are way too conservative.

We're still on time to change them ...  Any concrete proposals?

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
"When the proper man does nothing (wu-wei),
his thought is felt ten thousand miles." (Lao Tse)



Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Joshua D. Drake

Tom Lane wrote:

"Jim C. Nasby" <[EMAIL PROTECTED]> writes:

http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :


Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.

Is there a reason to say anything beyond "use autovac"?


Did we change the default autovac parameters for 8.3 (beyond turning it
on)? Because on any reasonably used database, they are way too conservative.


Joshua D. Drake



regards, tom lane





--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.

Is there a reason to say anything beyond "use autovac"?

regards, tom lane



Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Robert Treat
On Monday 02 July 2007 17:52, Jim C. Nasby wrote:
> From
> http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
>
> "Recommended practice for most sites is to schedule a database-wide
> VACUUM once a day at a low-usage time of day, supplemented by more
> frequent vacuuming of heavily-updated tables if necessary. (Some
> installations with extremely high update rates vacuum their busiest
> tables as often as once every few minutes.) If you have multiple
> databases in a cluster, don't forget to VACUUM each one; the program
> vacuumdb  might be helpful."
>
> Do we still want that to be our formal recommendation? ISTM it would be
> more logical to recommend a combination of autovac, daily vacuumdb -a if
> you can afford it and have a quiet period, and frequent manual vacuuming
> of things like web session tables.
>
> I'm happy to come up with a patch, but I figure there should be
> consensus first...

I generally recommend trying autovacuum first, augmented by 
vacuum/analyze/reindex if you find trouble.  I won't say there aren't 
workloads that autovacuum won't handle, but in most cases it does fine, and I 
expect that to increase with 8.3. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane escribió:
>> I rather doubt that.  The most likely implementation would involve
>> cloning a "template" entry into pg_class.

> How about a new relkind which causes the table to be located in
> PGDATA/base//pg_temp_/
> So each backend can have its own copy of the table with the same
> relfilenode; there's no need for extra catalog entries.

Uh-huh.  And what do you do with relpages, reltuples, relfrozenxid, and
pg_statistic entries?  What if one backend wants to TRUNCATE or CLUSTER
its copy (requiring a new relfilenode)?  Where does ALTER TABLE fit into
this?

regards, tom lane



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Gregory Stark

"Jaime Casanova" <[EMAIL PROTECTED]> writes:

> why not just a new relkind indicating this is a template and not an
> actual table, and use that template for creating the actual tables?

For precisely the reason stated upthread. That would mean creating and
deleting catalog entries for every transaction. Imagine a busy OLTP system
running hundreds of transactions per second trying to use a temporary table
for intermediate results. Mixing DDL and DML is just as bad an idea behind the
scenes as it is for users.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Jaime Casanova

On 7/3/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:

Tom Lane escribió:
> Jim Nasby <[EMAIL PROTECTED]> writes:
> > I've often thought that having global temp tables would be a really
> > good idea, since it would drastically reduce the need to vacuum
> > catalog tables,
>
> I rather doubt that.  The most likely implementation would involve
> cloning a "template" entry into pg_class.

How about a new relkind which causes the table to be located in
PGDATA/base//pg_temp_/

So each backend can have its own copy of the table with the same
relfilenode; there's no need for extra catalog entries.



we recently made the path for temp files just base/pgsql_tmp or
pg_tblspc//pgsql_tmp. Do we want to complicate things
again?

Why not just a new relkind indicating this is a template and not an
actual table, and use that template for creating the actual tables?

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
  Richard Cook



Re: [HACKERS] Updated tsearch documentation

2007-07-02 Thread Bruce Momjian
Oleg Bartunov wrote:
> On Wed, 20 Jun 2007, Bruce Momjian wrote:
> >>
> >> We need to decide if we need oids as user-visible argument. I don't see
> >> any value, probably Teodor think other way.
> >
> > This is a good time to clean up the API because there are going to be
> > user-visible changes anyway.
> 
> Bruce, just remove oid argument specification from documentation.

Done.  I am attaching the current function prototypes.  If they don't
match the C code, please let me know.

I have also updated with some minor corrections I received from Erik.  I
will be adding more to the documentation hopefully this week:

http://momjian.us/expire/fulltext/HTML/

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
*** /pgsgml/fulltext-opfunc.sgml	Sat Jun 16 23:30:11 2007
--- fulltext-opfunc.sgml	Mon Jul  2 21:17:15 2007
***
*** 141,147 
  
  
  
! to_tsvector(configuration,  document TEXT) returns TSVECTOR
  
  
  
--- 141,147 
  
  
  
! to_tsvector(conf_name,  document TEXT) returns TSVECTOR
  
  
  
***
*** 285,306 
  
  
  
! tsearch(vector_column_name, (my_filter_name | text_column_name1) ... , text_column_nameN)
  
  
  
***
*** 323,329 
  
  
  
! stat(sqlquery text , weight text ) returns SETOF statinfo
  
  
  
--- 322,328 
  
  
  
! stat(sqlquery text , weight text ) returns SETOF statinfo
  
  
  
***
*** 403,409 
  
  
  
! to_tsquery(configuration, querytext text) returns TSQUERY
  
  
  
--- 402,408 
  
  
  
! to_tsquery(conf_name, querytext text) returns TSQUERY
  
  
  
***
*** 446,452 
  
  
  
! plainto_tsquery(configuration,  querytext text) returns TSQUERY
  
  
  
--- 445,451 
  
  
  
! plainto_tsquery(conf_name,  querytext text) returns TSQUERY
  
  
  
***
*** 989,995 
  
  
  
! rank( weights float4[],  vector TSVECTOR, query TSQUERY,  normalization int4 ) returns float4
  
  
  
--- 988,994 
  
  
  
! rank( weights float4[], vector TSVECTOR, query TSQUERY,  normalization int4 ) returns float4
  
  
  
***
*** 1084,1090 
  
  
  
! headline( id int4, | ts_name text,  document text, query TSQUERY,  options text ) returns text
  
  
  
--- 1083,1089 
  
  
  
! headline( ts_name text, document text, query TSQUERY,  options text ) returns text
  
  
  
***
*** 1351,1357 
  
  
  
! lexize( oid, | dict_name text, lexeme text) returns text[]
  
  
  
--- 1350,1356 
  
  
  
! lexize( dict_name text, lexeme text) returns text[]
  
  
  
***
*** 1858,1878 
  Debugging
  
  
! Function ts_debug allows easy testing of your full text indexing
  configuration.
  
  
  
! ts_debug(cfgname | oid ,document TEXT) returns SETOF tsdebug
  
  
  
--- 1852,1870 
  Debugging
  
  
! Function ts_debug allows easy testing of your full text searching
  configuration.
  
  
  
! ts_debug(conf_name, document TEXT) returns SETOF tsdebug
  
  



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Alvaro Herrera
Tom Lane escribió:
> Jim Nasby <[EMAIL PROTECTED]> writes:
> > I've often thought that having global temp tables would be a really  
> > good idea, since it would drastically reduce the need to vacuum  
> > catalog tables,
> 
> I rather doubt that.  The most likely implementation would involve
> cloning a "template" entry into pg_class.

How about a new relkind which causes the table to be located in
PGDATA/base//pg_temp_/

So each backend can have its own copy of the table with the same
relfilenode; there's no need for extra catalog entries.

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"La victoria es para quien se atreve a estar solo"



[HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Jim C. Nasby
From
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

"Recommended practice for most sites is to schedule a database-wide
VACUUM once a day at a low-usage time of day, supplemented by more
frequent vacuuming of heavily-updated tables if necessary. (Some
installations with extremely high update rates vacuum their busiest
tables as often as once every few minutes.) If you have multiple
databases in a cluster, don't forget to VACUUM each one; the program
vacuumdb  might be helpful."

Do we still want that to be our formal recommendation? ISTM it would be
more logical to recommend a combination of autovac, daily vacuumdb -a if
you can afford it and have a quiet period, and frequent manual vacuuming
of things like web session tables.

I'm happy to come up with a patch, but I figure there should be
consensus first...
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgp6Wph0n8LxU.pgp
Description: PGP signature


Re: [HACKERS] [COMMITTERS] pgsql: Fix PGXS conventions so that extensions can be built against

2007-07-02 Thread Robert Treat
On Tuesday 26 June 2007 18:05, Tom Lane wrote:
> Log Message:
> ---
> Fix PGXS conventions so that extensions can be built against Postgres
> installations whose pg_config program does not appear first in the PATH.
> Per gripe from Eddie Stanley and subsequent discussions with Fabien Coelho
> and others.
>

Is there any chance of this being backpatched?  I just spent a few hours 
tracking down a problem with compiling a third-party module against an 8.2 
installation installed separately from my system's packaged install. (I.e., I 
didn't move it, but there was an additional pg_config on the system pointing 
to the wrong/other place.)  I'm not exactly sure how pgxs ever worked on 
systems with multiple Postgres installations, but I didn't see much discussion 
of backpatching this fix.  I'm wondering if we're going to start seeing more 
reports of this, as three people (or four?) have hit it in the last week, all 
doing separate things. Coincidence, or sign of impending doom? :-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] GiST consistent function, expected arguments; multi-dimensional indexes

2007-07-02 Thread Martijn van Oosterhout
On Mon, Jul 02, 2007 at 10:44:55AM -0700, Eric wrote:
> I guess you can also get this before writing code from
> 
> select typbyval from pg_type where typname='mytype'

Note that the flag might not be constant. For example int8 is not byval
currently whereas it could be on a 64-bit architecture. However,
variable-length values are always byref.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Postgresql.conf cleanup

2007-07-02 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes:
> On Mon, 2 Jul 2007, Tom Lane wrote:
>>> # wal_buffers = 1MB
>> Is there really evidence in favor of such a high setting for this,
>> either?

> I noticed consistent improvements in throughput on pgbench results with 
> lots of clients going from the default to 256KB, flatlining above that; it 
> seemed sufficiently large for any system I've used.  I've taken to using 
> 1MB anyway nowadays because others suggested that number, and it seemed to 
> be well beyond the useful range and thus never likely to throttle 
> anything.  Is there any downside to it being larger than necessary beyond 
> what seems like a trivial amount of additional RAM?

There might be some value in keeping wal_buffers small enough to fit in
L2 cache; not sure.

But pgbench is not really the poster child for large wal_buffers,
because it consists exclusively of short transactions.  The gain from
enlarging wal_buffers stops the moment it passes your largest
time-between-commits, since a commit has to flush out whatever's in
there.

There's probably not much point in arguing this now, though; once the
async commit patch is in there we will have to re-measure all the
behavior and develop new recommendations (and, quite possibly, a new
default value).  The existence of the walwriter will reduce the useful
size of wal_buffers, but the existence of async commit might increase it.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix failure to restart Postgres when Linux kernel returns EIDRM

2007-07-02 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> Per reports from Michael Fuhr and Jon Lapham --- it's a bit surprising
> we have not seen more reports, actually.

Oh, fwiw I've seen this. I stop and start postmasters so often I just assumed
something wasn't getting cleaned up perfectly. The last time was a few days
ago and did puzzle me because it was the first time I was starting up the
postmaster after a reboot.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] SOLVED: unexpected EIDRM on Linux

2007-07-02 Thread Tom Lane
I wrote:
> I'm going to generate a smaller test program showing this and file
> a bug report at Red Hat.

Filed as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=246509
in case anyone wants to track it.  (I suspect the Red Hat kernel guys
will just bounce it upstream, but that's their call not mine.)

> In the mean time, it looks like we should assume EIDRM means EINVAL
> on Linux, because AFAICS there is not actually anyplace in that code
> that should return EIDRM; their data structure doesn't really have
> any state that would justify returning such a code.

Patch for this committed in all active branches.

regards, tom lane



Re: [HACKERS] GiST consistent function, expected arguments; multi-dimensional indexes

2007-07-02 Thread Eric
I guess you can also get this before writing code from

select typbyval from pg_type where typname='mytype'

...thanks again.




Re: [HACKERS] Postgresql.conf cleanup

2007-07-02 Thread Greg Smith

On Mon, 2 Jul 2007, Tom Lane wrote:


# wal_buffers = 1MB

Is there really evidence in favor of such a high setting for this,
either?


I noticed consistent improvements in throughput on pgbench results with 
lots of clients going from the default to 256KB, flatlining above that; it 
seemed sufficiently large for any system I've used.  I've taken to using 
1MB anyway nowadays because others suggested that number, and it seemed to 
be well beyond the useful range and thus never likely to throttle 
anything.  Is there any downside to it being larger than necessary beyond 
what seems like a trivial amount of additional RAM?



# checkpoint_segments = 8 to 16 if you have the disk space (0.3 to 0.6 GB)

This seems definitely too small --- for write-intensive databases I like
to set it to 30 or so, which should eat about a GB if I did the
arithmetic right.


You did--I approximate larger values in my head by saying 1GB at 30 
segments and scaling up from there.  But don't forget this is impacted by 
the LDC change, with the number of segments expected to be active now being


(2 + checkpoint_completion_target) * checkpoint_segments + 1

so with a default install setting the segments to 30 will creep that up to 
closer to a 1.2GB footprint.
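That arithmetic can be sketched directly from the formula above (a back-of-envelope sketch, not PostgreSQL code; it assumes 16 MB per WAL segment and a checkpoint_completion_target of 0.5):

```python
# Sketch of the WAL disk footprint under load-distributed checkpoints,
# using the formula quoted above. Assumes 16 MB per WAL segment and
# checkpoint_completion_target = 0.5.
SEGMENT_MB = 16

def wal_footprint_mb(checkpoint_segments, completion_target=0.5):
    active_segments = (2 + completion_target) * checkpoint_segments + 1
    return active_segments * SEGMENT_MB

print(wal_footprint_mb(30))   # close to the ~1.2 GB figure above
```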


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[HACKERS] SOLVED: unexpected EIDRM on Linux

2007-07-02 Thread Tom Lane
It's a plain old Linux kernel bug: it returns EIDRM when it really ought
to say EINVAL, and apparently always has.  The surprising part is really
that we've not seen it many times before.

Kudos to Michael Fuhr for thinking to write a test program investigating
whether randomly-chosen IDs would yield EIDRM --- that was what led me
to study the kernel source code closely enough to realize it was just
wrong.

regards, tom lane

--- Forwarded Messages

Date:Mon, 2 Jul 2007 10:59:43 -0600
From:Michael Fuhr <[EMAIL PROTECTED]>
To:  Tom Lane <[EMAIL PROTECTED]>
Subject: Re: [GENERAL] shmctl EIDRM preventing startup

I don't know if this is relevant but on both the box that rebooted
and on another box that's been up for several weeks I see a pattern
of shmid's for which shmctl() returns EIDRM (the EACCES errors are
for segments that are in use by another user; I'm not running as root):

$ ./shmctl-test 0 1048576
shmctl(0 / 0): ERROR: Identifier removed
shmctl(1 / 0x1): ERROR: Identifier removed
shmctl(2 / 0x2): ERROR: Identifier removed
shmctl(32768 / 0x8000): ERROR: Identifier removed
shmctl(32769 / 0x8001): ERROR: Identifier removed
shmctl(32770 / 0x8002): ERROR: Identifier removed
shmctl(65536 / 0x1): ERROR: Permission denied
shmctl(65537 / 0x10001): ERROR: Identifier removed
shmctl(65538 / 0x10002): ERROR: Identifier removed
shmctl(98304 / 0x18000): ERROR: Identifier removed
shmctl(98305 / 0x18001): ERROR: Permission denied
shmctl(98306 / 0x18002): ERROR: Identifier removed
shmctl(131072 / 0x2): ERROR: Identifier removed
shmctl(131073 / 0x20001): ERROR: Identifier removed
shmctl(131074 / 0x20002): ERROR: Identifier removed
shmctl(163840 / 0x28000): ERROR: Identifier removed
shmctl(163841 / 0x28001): ERROR: Identifier removed
shmctl(163842 / 0x28002): ERROR: Permission denied
[...]
shmctl(983040 / 0xf): ERROR: Identifier removed
shmctl(983041 / 0xf0001): ERROR: Identifier removed
shmctl(983042 / 0xf0002): ERROR: Identifier removed
shmctl(1015808 / 0xf8000): ERROR: Identifier removed
shmctl(1015809 / 0xf8001): ERROR: Identifier removed
shmctl(1015810 / 0xf8002): ERROR: Identifier removed
shmctl(1048576 / 0x10): ERROR: Identifier removed

-- 
Michael Fuhr


#include <stdio.h>
#include <stdlib.h>

#include <string.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int
main(int argc, char *argv[])
{
int  shmid, min_shmid, max_shmid, tmp_shmid;
struct shmid_ds  buf;

if (argc != 3) {
fprintf(stderr, "Usage: %s min_shmid max_shmid\n", argv[0]);
return EXIT_FAILURE;
}

min_shmid = atoi(argv[1]);
max_shmid = atoi(argv[2]);

if (min_shmid > max_shmid) {
tmp_shmid = min_shmid;
min_shmid = max_shmid;
max_shmid = tmp_shmid;
}

for (shmid = min_shmid; shmid <= max_shmid; shmid++) {
if (shmctl(shmid, IPC_STAT, &buf) == -1 && errno != EINVAL) {
printf("shmctl(%d / %#x): ERROR: %s\n", shmid, shmid, 
strerror(errno));
}
}

return EXIT_SUCCESS;
}

--- Message 2

Date:Mon, 02 Jul 2007 14:17:05 -0400
From:Tom Lane <[EMAIL PROTECTED]>
To:  Michael Fuhr <[EMAIL PROTECTED]>
Subject: Re: [GENERAL] shmctl EIDRM preventing startup 

Michael Fuhr <[EMAIL PROTECTED]> writes:
> On Mon, Jul 02, 2007 at 01:14:01PM -0400, Tom Lane wrote:
>> Oh, that's pretty durn interesting.  I get the same type of pattern on
>> my FC6 box, but not on HPUX.

> I don't get this pattern on FreeBSD 6.2 or Solaris 9 either.

Well, I've just traced through the Linux code, and I find:

1. The low-order 15 bits of the shmid are simply an index into an array
of valid shmem entries.  I'm not sure what is in index 0, but there's
apparently a live entry of some sort there.  Index 1 is the first actual
shmem segment allocated, and thereafter the first free slot is chosen
whenever you make a new shmem segment.

2. When you try to stat a segment, it takes the low-order 15 bits of the
supplied ID and indexes into this array.  If no such entry (out of
range, or NULL entry) you get EINVAL as expected.  If there's an entry
but its high-order ID bits don't match the supplied ID, you get EIDRM.

This is why the set of EIDRM IDs moves around as you create and delete
valid segments.

As near as I can tell, this is flat out a case of the kernel returning
the wrong error code.  It should say EINVAL when there's a mismatch.

It's a bit surprising that we have not seen a lot more reports of this
problem, because AFAICS the probability of a collision is extremely high
if there's more than one creator of shmem segments on a system.
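A tiny sketch of the ID scheme just described (the 15-bit mask is inferred from the description above, not taken from kernel headers), applied to some of the IDs from Michael's listing:

```python
def decode_shmid(shmid):
    """Split a Linux SysV shmid into (slot index, generation) per the
    description above: low-order 15 bits index the segment array, the
    remaining high-order bits act as a generation counter."""
    return shmid & 0x7FFF, shmid >> 15

# The IDs that drew EIDRM in the listing all land on the few occupied
# slots (0-2) but carry a non-matching generation:
for shmid in (0x0, 0x8000, 0x10001, 0x100000):
    slot, gen = decode_shmid(shmid)
    print(f"shmid {shmid:#x} -> slot {slot}, generation {gen}")
```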

I can reproduce the bug as follows:

1. Start postmaster 1.
2. Start postmaster 2 (different data directory and port).
3. Manually kill -9 both postmasters.
4. Manually ipcrm both shmem segments.
5. Start postmaster 2.
6. (Try to) start postmaster 1 --- it will fail because of EIDRM,
   because its saved shmem id points at slot 1 which is now in use
   by postmaster 2.

I'm going to generate a smaller test program showing this and file
a bug report at Red Hat.

Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Pavel Stehule


2007/7/2, Tom Lane <[EMAIL PROTECTED]>:

Jim Nasby <[EMAIL PROTECTED]> writes:
> I've often thought that having global temp tables would be a really
> good idea, since it would drastically reduce the need to vacuum
> catalog tables,

I rather doubt that.  The most likely implementation would involve
cloning a "template" entry into pg_class.



I am working on a prototype, and cloning a template entry is probably
one possible solution. Every session's clone needs its own statistics
and therefore its own table oid.

Have a nice day
Pavel Stehule



Re: [HACKERS] SetBufferCommitInfoNeedsSave and race conditions

2007-07-02 Thread Simon Riggs
On Fri, 2007-06-29 at 11:13 -0400, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > On Thu, 2007-06-28 at 20:23 -0400, Tom Lane wrote:
> >> The methodology I suggested earlier (involving tracking LSN only at the
> >> level of pg_clog pages) isn't going to make that work, unless you
> >> somehow force the XID counter forward to the next page boundary.
> 
> > If you completely flush WAL after the AccessExclusiveLock has been taken
> > by VF, then you are guaranteed to have flushed all asynchronous commits
> > that touch the table.
> 
> I don't believe this is correct (see system catalogs) and in any case
> it's far too fragile for my taste.  I think it'd be a lot better to just
> stop referencing the hint bits directly in VACUUM FULL.

Well, according to the comments in repair_frag(), lines 1799-1815,
specifically line 1810 says "we wouldn't be in repair_frag() at all" if
the tuple was "in a system catalog, inserted or deleted by a not yet
committed transaction". If we have flushed all committed and/or aborted
transactions then we're good.

Maybe the comment is wrong? Wouldn't be the first time. Either way,
please explain your concern in more detail.

I'd rather do this the easy way since VF seems soon to die (payback for
all the torture it's caused us down the years).

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Configurable Additional Stats

2007-07-02 Thread Simon Riggs
On Mon, 2007-07-02 at 17:41 +0100, Gregory Stark wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> 
> > 2) Charge-back accounting. Keep track by userid, user group, time of
> > access etc of all accesses to the system, so we can provide chargeback
> > facilities to users. You can put your charging rules into the plugin and
> > have it spit out appropriate chargeback log records, when/if required.
> > e.g. log a chargeback record every time a transaction touches > 100
> > blocks, to keep track of heavy queries but ignore OLTP workloads.
> 
> Sure, but I think Tom's question is how do you get from the plugin to wherever
> you want this data to be? There's not much you can do with the data at that
> point. You would end up having to reconstruct the entire stats collector
> infrastructure to ship the data you want out via some communication channel
> and then aggregate it somewhere else.

I just want to LOG a few extra pieces of information in the simplest
possible way.

There are no more steps in that process than there are for using
log_min_duration_statement and a performance analysis tool.
Outside-the-dbms processing is already required to use PostgreSQL
effectively, so this can't be an argument against the logging of
additional stats. Logging to the dbms means we have to change table
definitions etc, which will ultimately not work as well.

> Perhaps your plugin entry point is most useful *alongside* my stats-domain
> idea. If you wanted to you could write a plugin which set the stats domain
> based on whatever criteria you want whether that's time-of-day, userid, load
> on the system, etc.

Your stats domain idea is great, but it doesn't solve my problem (1). I
don't just want this solved; I *need* it solved, since there's no other
way to get it done accurately with a large and complex application.

We could just go back to having
log_tables_in_transaction = on | off
which would produce output like this:

LOG:  transaction-id: 3456 table list {32456, 37456, 85345, 19436}

I don't expect everybody to like that, but it's what I want, so I'm
proposing it in a way that is more acceptable. If somebody has a better
way of doing this, please say. The plugin looks pretty darn simple to
me... and hurts nobody.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Postgresql.conf cleanup

2007-07-02 Thread Joshua D. Drake

Tom Lane wrote:

>> (change requires restart): this phrase appears over 20 times in the
>> notes.  This is enough times to be really repetitive and take up a lot
>> of scrolling space, while not actually covering all startup-time
>> parameters.  We should either (a) remove all such notes and rely on
>> docs, or (b) make an annotation symbol (e.g. *R) and mark 100% of them.
>
> That was put in deliberately not long ago, so I doubt (a) will pass.
> (b) seems fine to me.

+1 on (b), -1 on (a)

>> # work_mem = ( RAM * 0.5 ) / max_connections, or less
>
> That seems guaranteed to drive people into swap hell, unless they
> execute only trivial queries.

Maybe he meant .05, which would be semi-reasonable?

>> # checkpoint_segments = 8 to 16 if you have the disk space (0.3 to 0.6 GB)
>
> This seems definitely too small --- for write-intensive databases I like
> to set it to 30 or so, which should eat about a GB if I did the
> arithmetic right.

Hmpf, I set it to 30 just to get it out of the way. I would agree that
8-16 is too small.


Sincerely,

Joshua D. Drake


--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Configurable Additional Stats

2007-07-02 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Sure, but I think Tom's question is how do you get from the plugin to wherever
> you want this data to be? There's not much you can do with the data at that
> point. You would end up having to reconstruct the entire stats collector
> infrastructure to ship the data you want out via some communication channel
> and then aggregate it somewhere else.

Right, and I don't see any reasonable way for a plug-in to establish
such an infrastructure --- how's it going to cause the postmaster to
shepherd a second stats collector process, for instance?

The proposal seems to be in the very early handwaving stage, because
issues like this obviously haven't been thought about.  I would suggest
building a working prototype plugin, and then you'll really know what
hooks you need.  (Comparison point: we'd never have invented the correct
hooks for the index advisor if we'd tried to define them in advance of
having rough working code to look at.)

> Perhaps your plugin entry point is most useful *alongside* my stats-domain
> idea. If you wanted to you could write a plugin which set the stats domain
> based on whatever criteria you want whether that's time-of-day, userid, load
> on the system, etc.

+1.  I'm also thinking that hooks inside the stats collector process
itself might be needed, though I have no idea exactly what.

regards, tom lane



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Tom Lane
Jim Nasby <[EMAIL PROTECTED]> writes:
> I've often thought that having global temp tables would be a really  
> good idea, since it would drastically reduce the need to vacuum  
> catalog tables,

I rather doubt that.  The most likely implementation would involve
cloning a "template" entry into pg_class.

regards, tom lane



Re: [HACKERS] Configurable Additional Stats

2007-07-02 Thread Gregory Stark

"Simon Riggs" <[EMAIL PROTECTED]> writes:

> 2) Charge-back accounting. Keep track by userid, user group, time of
> access etc of all accesses to the system, so we can provide chargeback
> facilities to users. You can put your charging rules into the plugin and
> have it spit out appropriate chargeback log records, when/if required.
> e.g. log a chargeback record every time a transaction touches > 100
> blocks, to keep track of heavy queries but ignore OLTP workloads.

Sure, but I think Tom's question is how do you get from the plugin to wherever
you want this data to be? There's not much you can do with the data at that
point. You would end up having to reconstruct the entire stats collector
infrastructure to ship the data you want out via some communication channel
and then aggregate it somewhere else.

Perhaps your plugin entry point is most useful *alongside* my stats-domain
idea. If you wanted to you could write a plugin which set the stats domain
based on whatever criteria you want whether that's time-of-day, userid, load
on the system, etc.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 22:17, Gregory Stark wrote:

> The way you described it there were records being inserted and later
> deleted.
> Why wouldn't you need vacuums?
>
> Or are all the records eventually deleted and then the table truncated or
> dropped before the next batch of inserts?

In a nutshell, yes.  The problem is I can't delete them all at once; it
happens in batches, and that means that stats degrade in the meantime.


Jeroen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Postgresql.conf cleanup

2007-07-02 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes:
>> I'm working on cleaning up postgresql.conf and pg_settings for the 
>> ...

>> seq_scan_cost: this is independent of all of the other _costs.

So?  All the other costs are independent of it, too.  I don't understand
what problem you have with it.

>> (change requires restart): this phrase appears over 20 times in the 
>> notes.  This is enough times to be really repetitive and take up a lot 
>> of scrolling space, while not actually covering all startup-time 
>> parameters.  We should either (a) remove all such notes and rely on 
>> docs, or (b) make an annotation symbol (e.g. *R) and mark 100% of them. 

That was put in deliberately not long ago, so I doubt (a) will pass.
(b) seems fine to me.

>> transaction_isolation and transaction_read_only appear more than once in 
>> the pg_settings pseudo_table.

Not for me.

> # work_mem = ( RAM * 0.5 ) / max_connections, or less

That seems guaranteed to drive people into swap hell, unless they
execute only trivial queries.

> # wal_buffers = 1MB

Is there really evidence in favor of such a high setting for this,
either?  (I expect the walwriter in the async-commit patch will change
the landscape here, btw.)

> # max_fsm_pages = expected database size * 0.1

This might be too small.

> # checkpoint_segments = 8 to 16 if you have the disk space (0.3 to 0.6 GB)

This seems definitely too small --- for write-intensive databases I like
to set it to 30 or so, which should eat about a GB if I did the
arithmetic right.

> #explain_pretty_print = on

Putting this under "planner options" is wrong and illogical.

The file seems to be missing the effects of some recently committed
patches, eg, bgwriter_all_percent shouldn't be there anymore.

regards, tom lane



Re: [HACKERS] Postgresql.conf cleanup

2007-07-02 Thread Jim Nasby

On Jul 2, 2007, at 6:03 AM, Josh Berkus wrote:
> (change requires restart): this phrase appears over 20 times in the
> notes.  This is enough times to be really repetitive and take up a
> lot of scrolling space, while not actually covering all startup-time
> parameters.  We should either (a) remove all such notes and rely on
> docs, or (b) make an annotation symbol (e.g. *R) and mark 100% of
> them.  Votes?


Probably the #1 question I'm asked is "does this mean I need to  
restart?". +1 for marking everything.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-02 Thread Jim Nasby

On Jul 1, 2007, at 4:46 PM, Tom Lane wrote:
>> I have question. Is correct implementation of global temp in Oracle or
>> Firebird, where content of glob.temp table is session visible and
>> metadata of g.t.t is persistent?
>
> It's correct per spec.  Whether it's more useful than what we do is
> highly debatable --- it forces all sessions to use the same definition
> of any given temp table name, which is a bit silly for something that's
> supposed to support session-local data.


Would it be possible to support both global and local?

I've often thought that having global temp tables would be a really  
good idea, since it would drastically reduce the need to vacuum  
catalog tables, but I've never looked into what would be required to  
do so.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Configurable Additional Stats

2007-07-02 Thread Simon Riggs
On Fri, 2007-06-29 at 14:43 -0400, Tom Lane wrote:
> Dave Page <[EMAIL PROTECTED]> writes:
> > Yes, it's not intended to insert more stats, but to get the raw data out
> > for external analysis during development and testing of applications and
> > systems etc.
> 
> Mph --- the proposal was very poorly titled then.  In any case, it still
> sounds like a one-off hack that would be equally well served by a local
> patch.

Well, I want it to a) be configurable b) provide additional stats, so
the title was fine, but we can call this whatever you like; I don't have
a fancy name for it.

The purpose is to get access to the stats data while we still know the
username, transactionId and other information. Once it is sent to the
stats collector it is anonymised and summarised.

Examples of the potential uses of such plug-ins would be:

1) Which tables have been touched by this transaction? The purpose of
this is to profile table interactions to allow:
i) an accurate assessment of the replication sets for use with Slony. If
you define the replication set incorrectly then you may not be able to
recover all of your data. 
ii) determining whether it is possible to split a database that serves
two applications into two distinct databases (or not), allowing you to
scale out the Data Tier in a Service Oriented Application.

2) Charge-back accounting. Keep track by userid, user group, time of
access etc of all accesses to the system, so we can provide chargeback
facilities to users. You can put your charging rules into the plugin and
have it spit out appropriate chargeback log records, when/if required.
e.g. log a chargeback record every time a transaction touches > 100
blocks, to keep track of heavy queries but ignore OLTP workloads.

3) Tracing individual transaction types, as Greg suggests.

4) Anything else you might dream up...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Gregory Stark
"Jeroen T. Vermeulen" <[EMAIL PROTECTED]> writes:

> Actually, come to think of it, I don't think I'd want any vacuums at all
> on this particular table.  Just the analyze on the primary key, no
> vacuums, no statistics on anything else.  Unfortunately it's not just one
> table, but a set of tables that can be created dynamically.  I could
> change that, but in this particular case I don't think I should.

The way you described it there were records being inserted and later deleted.
Why wouldn't you need vacuums?

Or are all the records eventually deleted and then the table truncated or
dropped before the next batch of inserts?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 18:15, Gregory Stark wrote:

>> So I suppose the planner has a good reason to ignore the index at that
>> point.  I'm assuming that this is something to do with the correlation
>> between the index and the column's statistics degrading in some way.
>
> Best to post "explain analyze " for when the performance is good
> and
> bad. Perhaps also an explain analyze for the query with enable_seqscan off
> when it's bad.

Can't easily do that anymore...  AFAIR the plans were all identical
anyway, except in the "enable_seqscan bad" case which used a sequential
scan instead of using the index.  The queries are very simple, along the
lines of "select * from foo where id >= x and id < y".


> Also, which version of Postgres is this?

It was an 8.2 version.


> It's possible you just need vacuum to run more frequently on this table
> and
> autovacuum isn't doing it often enough. In which case you might have a
> cron
> job run vacuum (or vacuum analyze) on this table more frequently.

Actually, come to think of it, I don't think I'd want any vacuums at all
on this particular table.  Just the analyze on the primary key, no
vacuums, no statistics on anything else.  Unfortunately it's not just one
table, but a set of tables that can be created dynamically.  I could
change that, but in this particular case I don't think I should.


Jeroen





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Gregory Stark

"Jeroen T. Vermeulen" <[EMAIL PROTECTED]> writes:

> So I suppose the planner has a good reason to ignore the index at that
> point.  I'm assuming that this is something to do with the correlation
> between the index and the column's statistics degrading in some way.

Best to post "explain analyze " for when the performance is good and
bad. Perhaps also an explain analyze for the query with enable_seqscan off
when it's bad.

Also, which version of Postgres is this? 

It's possible you just need vacuum to run more frequently on this table and
autovacuum isn't doing it often enough. In which case you might have a cron
job run vacuum (or vacuum analyze) on this table more frequently. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Postgresql.conf cleanup

2007-07-02 Thread Josh Berkus


This time *with* the attachment.



All,

I'm working on cleaning up postgresql.conf and pg_settings for the 
release.  Attached is a sample WIP.  It's not in patch form because I'm 
not done yet; I've just been editing postgresql.conf and need to fix the 
docs and pg_settings to match.


Issues encountered and changes made:

PostgreSQL.conf


suggestions: added section with the 7 most important obvious settings at 
the top and suggestions on how to calculate them.  If people like this, 
I'll add it to the Tutorial in the docs as well.


seq_scan_cost: this is independent of all of the other _costs.  I can't 
think of any way in which that doesn't make the whole set of costs 
unmanageable.  For example, if you want to change seq_scan_cost in order 
to make query cost more-or-less match up with ms execution time, you 
have to modify all 6 settings.   If we do implement per-tablespace 
costs, then we'll need per-tablespace random_page_cost as well.  Or am I 
missing something?


(change requires restart): this phrase appears over 20 times in the 
notes.  This is enough times to be really repetitive and take up a lot 
of scrolling space, while not actually covering all startup-time 
parameters.  We should either (a) remove all such notes and rely on 
docs, or (b) make an annotation symbol (e.g. *R) and mark 100% of them. 
 Votes?


Vacuum: all vacuum & autovacuum parameters put under their own section.

Client Cost Defaults: this section became a "catch-all" for all userset 
parameters which people weren't sure what to do with.  I've divided it 
into logical subsections, and moved some parameters to other sections 
where they logically belong (for example, explain_pretty_print belongs 
in Query Tuning).


pg_settings issues


transaction_isolation and transaction_read_only appear more than once in 
the pg_settings pseudo_table.   The setting column is supposed to be 
unique.



Given the amount of cleanup/improvement which I'm seeing as necessary 
for the GUCs, I'm wondering if I put this off too long for 8.3.


--Josh






# -
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.)  White space may be used.  Comments are introduced
# with '#' anywhere on a line.  The complete list of option names and
# allowed values can be found in the PostgreSQL documentation.  The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the server.
#
# Any option can also be given as a command line switch to the server,
# e.g., 'postgres -c log_connections=on'.  Some options can be changed at
# run-time with the 'SET' SQL command.
#
# This file is read on server startup and when the server receives a
# SIGHUP.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pg_ctl reload". Some
# settings, which are marked below, require a server shutdown and restart
# to take effect.
#
# Memory units:  kB = kilobytes   MB = megabytes   GB = gigabytes
# Time units:    ms = milliseconds   s = seconds   min = minutes   h = hours   d = days

#---
# IMPORTANT PERFORMANCE VALUES TO SET
#---

# What follows are some rough recommendations of values which most users
# should set immediately after installation for good performance.  The 
# calculations below are "rules of thumb" and are not intended to replace
# knowledgeable tuning. Please see the full documentation and the 
# pgsql-performance mailing list for more information and suggestions.
#
# Note that RAM below refers to RAM which is available to PostgreSQL, 
# so on shared servers the RAM in the calculation should be reduced from
# total system RAM accordingly.  Settings are listed in the order they
# appear below.
#
# max_connections = no. of concurrent sessions you need to support
# shared_buffers = RAM * 0.2 (this may require system configuration)
# work_mem = ( RAM * 0.5 ) / max_connections, or less
# maintenance_work_mem = RAM/8, up to 256MB
# wal_buffers = 1MB
# max_fsm_pages = expected database size * 0.1
# checkpoint_segments = 8 to 16 if you have the disk space (0.3 to 0.6 GB)
# effective_cache_size = RAM * 0.7


#---
# FILE LOCATIONS
#---

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'   # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/

[HACKERS] Postgresql.conf cleanup

2007-07-02 Thread Josh Berkus

All,

I'm working on cleaning up postgresql.conf and pg_settings for the 
release.  Attached is a sample WIP.  It's not in patch form because I'm 
not done yet; I've just been editing postgresql.conf and need to fix the 
docs and pg_settings to match.


Issues encountered and changes made:

PostgreSQL.conf


suggestions: added section with the 7 most important obvious settings at 
the top and suggestions on how to calculate them.  If people like this, 
I'll add it to the Tutorial in the docs as well.


seq_scan_cost: this is independent of all of the other _costs.  I can't 
think of any way in which that doesn't make the whole set of costs 
unmanageable.  For example, if you want to change seq_scan_cost in order 
to make query cost more-or-less match up with ms execution time, you 
have to modify all 6 settings.   If we do implement per-tablespace 
costs, then we'll need per-tablespace random_page_cost as well.  Or am I 
missing something?


(change requires restart): this phrase appears over 20 times in the 
notes.  This is enough times to be really repetitive and take up a lot 
of scrolling space, while not actually covering all startup-time 
parameters.  We should either (a) remove all such notes and rely on 
docs, or (b) make an annotation symbol (e.g. *R) and mark 100% of them. 
 Votes?


Vacuum: all vacuum & autovacuum parameters put under their own section.

Client Cost Defaults: this section became a "catch-all" for all userset 
parameters which people weren't sure what to do with.  I've divided it 
into logical subsections, and moved some parameters to other sections 
where they logically belong (for example, explain_pretty_print belongs 
in Query Tuning).


pg_settings issues


transaction_isolation and transaction_read_only appear more than once in 
the pg_settings pseudo_table.   The setting column is supposed to be unique.



Given the amount of cleanup/improvement which I'm seeing as necessary 
for the GUCs, I'm wondering if I put this off too long for 8.3.


--Josh






[HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
Hi all,

I've run into a case where I get bad performance that doesn't sound too
hard to solve.  Question is: is it worth solving?

The situation is this: I have a table that can grow to a large number of
rows, then shrink to zero over a large number of quick, consecutive
transactions.  The primary key index for the table is getting a lot of use
in the process.

But whenever perhaps one-third or so of the rows have been deleted, the
planner stops using that index and resorts to sequential scans.  I tried
suppressing that by toggling enable_seqscan: works as advertised, but
performance is still terrible until (as far as I can make out) the next
analyze run has completed!

So I suppose the planner has a good reason to ignore the index at that
point.  I'm assuming that this is something to do with the correlation
between the index and the column's statistics degrading in some way.

I also tried doing my own analyze runs on just the primary key index. 
That will complete very quickly, and performance is restored for a while. 
But as far as I can tell, a regular automatic analyze run will block my
own, more limited one on the same table.  So performance is still bad, and
now it's irregular to boot.

This makes me wonder: when the planner finds that an index is no longer
worth using because its corresponding statistics are out of date, and it's
cheap to update those same stats, maybe it should do so?  Even if there's
also going to be a full analyze on the table, it could be worthwhile to do
this quick limited run first.  (Though not if one is already underway, of
course).

All this is based largely on guesswork, so if I've got it all wrong,
please enlighten me!


Jeroen





Re: [HACKERS] GiST consistent function, expected arguments; multi-dimensional indexes

2007-07-02 Thread Martijn van Oosterhout
On Sun, Jul 01, 2007 at 07:20:08PM -0700, Eric wrote:
> 
> >
> > Everything is always passed as a Datum, so yes, it's is determined by
> > the storage clause in CREATE TYPE.
> 
> Still not sure what to do in some scenarios.  One example is the gist
> example code for btree (btree_gist).  If you look at the int4 example
> consistent function, it gets an int32 value (param 1).  For other
> data  types, it would get a pointer to a value.  Is the rule anything
> <= 4 bytes it's a value, above that it's a pointer?  See the code
> below...

Why guess? You know the type ID of what you're manipulating, right?
Then use the function:

get_typlenbyval(Oid typid, int16 *typlen, bool *typbyval);

Returns the byval flag (true if passed by value, false if passed by
reference) and the typlen field will be either a positive integer,
representing the number of bytes, or negative for variable length.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




R: R: [postgresql-it] [HACKERS] no cascade triggers?

2007-07-02 Thread Manera, Villiam
> To be honest..No trigger necessary...

I have  1300 triggers and 345 functions (53500 lines of code), most of
them written 15 years ago for ALLBASE/SQL (the old Hewlett Packard
relational dbms)  and converted to pl/pgsql last year.

I admit that some of them may look eccentric, but some of them are very
old...

Having short deadlines for the database migration, we didn't have the
time to study, understand, and rewrite each one, so we decided to just
convert them, since the languages are very similar.

OK, now we have more time, and our boss is very happy that Postgres
performance is 15-20 times better than ALLBASE :) so now we can afford
to rewrite the functions and triggers :(

If there's no other trick or suggestion, I think I'll build my own
home-made "pg_trigger" relation containing the reference to all my
triggers/functions and a flag for each of them that tells me if it's
enabled or not.

It could be useful IMHO to at least specify in the documentation the
exact behaviour of Postgres ALTER TABLE ENABLE/DISABLE TRIGGER when
applied from a trigger on the same table the trigger belongs to.

Here a more complex example:

relation A   : list of components of the fashion model (without colours)
relation B   : list of components of the fashion model for colours and
sizes

1)  One model inserted on relation A:
  * Then fires function 1, which inserts colours on relation B
  * Every record inserted on B fires function 2
2)  Function 2:
  * Does some checks and calculations, and finally updates the record
just inserted with the right amount for every size of the model
3)  ... does other things not relevant to the problem.


Function 2 disables the trigger before updating relation B, because
another trigger fires on updates of the per-size quantity in
relation B.

Function 2 is invoked not only by a wrapper trigger on relation B, but
sometimes directly by user programs.



Sorry for the disturbance; this is the last one.

Villiam 




Re: [HACKERS] todo: Hash index creation

2007-07-02 Thread Hannu Krosing
On Mon, 2007-07-02 at 04:27, Naz Gassiep wrote:

> I've been warned away from hash indexes before, however I had no idea
> that its performance was so abysmal that BTREE beat it, and I was
> definitely not aware that they were not included in WAL logs. I was told
> it wasn't as good as it could be, but I wasn't told it was pretty much
> an alpha piece of code.
> 
> As a result, when creating tables containing large blocks of text I wish
> to index, I've been using HASH as an index method. 

If you just wish to have smaller indexes, then you can use functional
btree indexes over text hash, like this:

CREATE INDEX largetextindex on mytable(hashtext(largetext));

and use

SELECT * FROM mytable 
 where hashtext(largetext) = hashtext('searchvalue') 
   and largetext = 'searchvalue'
;

btw, if the real hash indexes don't get fixed soon, maybe we could
redefine the hash index to actually mean usage like this and do the
rewrites in the parser?

> Please can we state
> in the manual that HASH index types are in a beta stage of development
> or something similar, or perhaps remove the manual entry altogether
> until HASH is at a point where it is usable in production.
> 
> Regards,
> A very surprised n00b.
> 




Re: [HACKERS] GiST consistent function, expected arguments; multi-dimensional indexes

2007-07-02 Thread Eric

>
> Everything is always passed as a Datum, so yes, it's is determined by
> the storage clause in CREATE TYPE.

Still not sure what to do in some scenarios.  One example is the gist
example code for btree (btree_gist).  If you look at the int4 example
consistent function, it gets an int32 value (param 1).  For other
data  types, it would get a pointer to a value.  Is the rule anything
<= 4 bytes it's a value, above that it's a pointer?  See the code
below...

Datum
gbt_int4_consistent(PG_FUNCTION_ARGS)
{

GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
int32   query = PG_GETARG_INT32(1);
int32KEY   *kkk = (int32KEY *) DatumGetPointer(entry->key);

>
> The usual approach to this is to define the index on a composite of
> the values. For example, if you have a table with two points that you
> want to index together, you do:
>
> CREATE INDEX foo ON bar((box(point1,point2)));
>
> i.e. a functional index on the result of combining the points. It does
> mean you need to use the same syntax when doing the queries, but it
> works with modifying any internal code at all...
>
> Given you can use rowtypes more easily these days, it's quite possible
> you use build an operator class on a row type...
>
> Have a nice day,
> --
> Martijn van Oosterhout   <[EMAIL PROTECTED]>  http://svana.org/kleptog/
>
> > From each according to his ability. To each according to his ability to 
> > litigate.
>
>

Thanks Martijn.  I will consider that approach.


