Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread david

On Tue, 8 May 2007, 李彦 Ian Li wrote:

In #postgresql on freenode, somebody once mentioned that ZFS on Solaris
helps PostgreSQL's performance a lot; does anyone have information
about that?


the filesystem you use will affect the performance of postgres 
significantly. I've heard a lot of claims for ZFS, unfortunately many of 
them from people who had proven, by the end of their first or second 
e-mails, that they didn't know what they were talking about.


much of the hype for ZFS is its volume management capabilities and admin 
tools. Linux has most (if not all) of the volume management capabilities; 
it just separates them from the filesystems so that any filesystem can use 
them, and as a result you use one tool to set up your RAID, one to set up 
snapshots, and a third to format your filesystems, where ZFS does all this 
in one userspace tool.


once you separate the volume management piece out, the actual performance 
question is a lot harder to answer. there are a lot of people who say that 
it's far faster than the alternative filesystems on Solaris, but I haven't 
seen any good comparisons between it and Linux filesystems.


On Linux you have the choice of several filesystems, and the performance 
will vary wildly depending on your workload. I personally tend to favor 
ext2 (for small filesystems where the application is ensuring data 
integrity) or XFS (for large filesystems).


I personally don't trust reiserfs, jfs seems to be a tool for 
transitioning from AIX more than anything else, and ext3 seems to have all 
the scaling issues of ext2 plus the overhead (and bottleneck) of 
journaling.


one issue with journaling filesystems: if you journal the data as well as 
the metadata you end up with a very reliable setup, but it means that 
all your data needs to be written twice, once to the journal and once to 
the final location. the write to the journal can be slightly faster than a 
normal write to the final location (the journal is a sequential write to 
an existing file), but the need to write twice can effectively cut your 
disk I/O bandwidth in half when doing heavy writes. worse, when you end up 
writing more than will fit in the journal (128M is the max for ext3) the 
entire system then needs to stall while the journal gets cleared to make 
space for the additional writes.


if you don't journal your data then you avoid the problems above, but in a 
crash you may find that you lost data, even though the filesystem is 
'intact' according to fsck.


David Lang


Steve Atkins wrote:


 On May 7, 2007, at 2:55 PM, David Levy wrote:

  Hi,
 
  I am about to order a new server for my Postgres cluster. I will
  probably get a Dual Xeon Quad Core instead of my current Dual Xeon.
  Which OS would you recommend to optimize Postgres behaviour (i/o
  access, multithreading, etc)?
 
  I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone
  help with this?

 Well, all three you mention are much the same, just with a different
 badge on the box, as far as performance is concerned. They're all
 going to be a moderately recent Linux kernel, with your choice
 of filesystems, so any choice between them is going to be driven
 more by available staff and support or personal preference.

 I'd probably go CentOS 5 over Fedora just because Fedora doesn't
 get supported for very long - more of an issue with a dedicated
 database box with a long lifespan than your typical desktop or
 interchangeable webserver.

 I might also look at Solaris 10, though. I've yet to play with it much,
 but it
 seems nice, and I suspect it might manage 8 cores better than current
 Linux setups.

 Cheers,
   Steve



 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings



Regards

Ian

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

 http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Claus Guttesen

I am about to order a new server for my Postgres cluster. I will
probably get a Dual Xeon Quad Core instead of my current Dual Xeon.
Which OS would you recommend to optimize Postgres behaviour (i/o
access, multithreading, etc)?

I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone
help with this?


My only experience is with FreeBSD. My installation is running 6.2 and
pg 7.4 on a four-way Woodcrest and besides being very stable it's also
performing very well. But then FreeBSD 6.x might not scale as well
beyond four cores atm; there you would probably need FreeBSD 7, which
is the development branch and would require extensive testing.

How big will the db be in size?

--
regards
Claus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Heikki Linnakangas

[EMAIL PROTECTED] wrote:
if you don't journal your data then you avoid the problems above, but in 
a crash you may find that you lost data, even though the filesystem is 
'intact' according to fsck.


PostgreSQL itself journals its data to the WAL, so that shouldn't happen.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Claus Guttesen

 In #postgresql on freenode, somebody once mentioned that ZFS on Solaris
 helps PostgreSQL's performance a lot; does anyone have information
 about that?

the filesystem you use will affect the performance of postgres
significantly. I've heard a lot of claims for ZFS, unfortunately many of
them from people who had proven, by the end of their first or second
e-mails, that they didn't know what they were talking about.

much of the hype for ZFS is its volume management capabilities and admin
tools. Linux has most (if not all) of the volume management capabilities;
it just separates them from the filesystems so that any filesystem can use
them, and as a result you use one tool to set up your RAID, one to set up
snapshots, and a third to format your filesystems, where ZFS does all this
in one userspace tool.


Even though those posters may have proven themselves wrong, zfs is
still a very handy fs and it should not be judged by those
statements.


once you separate the volume management piece out, the actual performance
question is a lot harder to answer. there are a lot of people who say that
it's far faster than the alternative filesystems on Solaris, but I haven't
seen any good comparisons between it and Linux filesystems.


One could install pg on solaris 10, format the data area first as ufs and
then as zfs, and compare import and query times and other benchmarks,
but comparing ufs/zfs to Linux filesystems would also be a comparison
of those two OSes.

--
regards
Claus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread david

On Tue, 8 May 2007, Claus Guttesen wrote:

  In #postgresql on freenode, somebody once mentioned that ZFS on Solaris
  helps PostgreSQL's performance a lot; does anyone have information
  about that?

 the filesystem you use will affect the performance of postgres
 significantly. I've heard a lot of claims for ZFS, unfortunately many of
 them from people who had proven, by the end of their first or second
 e-mails, that they didn't know what they were talking about.

 much of the hype for ZFS is its volume management capabilities and admin
 tools. Linux has most (if not all) of the volume management capabilities;
 it just separates them from the filesystems so that any filesystem can use
 them, and as a result you use one tool to set up your RAID, one to set up
 snapshots, and a third to format your filesystems, where ZFS does all this
 in one userspace tool.


Even though those posters may have proven themselves wrong, zfs is
still a very handy fs and it should not be judged by those
statements.


I don't disagree with you, I'm just noting that too many of the 'ZFS is 
great' posts need to be discounted as a result (the same goes for 
the 'reiserfs4 is great' posts)



 once you separate the volume management piece out, the actual performance
 question is a lot harder to answer. there are a lot of people who say that
 it's far faster than the alternative filesystems on Solaris, but I haven't
 seen any good comparisons between it and Linux filesystems.


One could install pg on solaris 10, format the data area first as ufs and
then as zfs, and compare import and query times and other benchmarks,
but comparing ufs/zfs to Linux filesystems would also be a comparison
of those two OSes.


however, such a comparison is very legitimate; it doesn't really matter 
which filesystem is better if the OS it's tied to limits it so much 
that the other one wins out with an inferior filesystem


currently ZFS is only available on Solaris. parts of it have been released 
under GPLv2, but it doesn't look like enough of it to be ported to Linux 
(enough was released for grub to be able to access it read-only, but not 
the full filesystem). there are also patent concerns that are preventing 
any porting to Linux.


on the other hand, its integrated userspace tools are pushing people to 
create similar tools for Linux (without needing to combine the various 
pieces in the kernel)


David Lang

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Trygve Laugstøl

[EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Claus Guttesen wrote:

  In #postgresql on freenode, somebody once mentioned that ZFS on Solaris
  helps PostgreSQL's performance a lot; does anyone have information
  about that?

 the filesystem you use will affect the performance of postgres
 significantly. I've heard a lot of claims for ZFS, unfortunately many of
 them from people who had proven, by the end of their first or second
 e-mails, that they didn't know what they were talking about.

 much of the hype for ZFS is its volume management capabilities and admin
 tools. Linux has most (if not all) of the volume management capabilities;
 it just separates them from the filesystems so that any filesystem can
 use them, and as a result you use one tool to set up your RAID, one to
 set up snapshots, and a third to format your filesystems, where ZFS does
 all this in one userspace tool.


Even though those posters may have proven themselves wrong, zfs is
still a very handy fs and it should not be judged by those
statements.


I don't disagree with you, I'm just noting that too many of the 'ZFS is 
great' posts need to be discounted as a result (the same goes for 
the 'reiserfs4 is great' posts)


 once you separate the volume management piece out, the actual performance
 question is a lot harder to answer. there are a lot of people who say
 that it's far faster than the alternative filesystems on Solaris, but I
 haven't seen any good comparisons between it and Linux filesystems.


One could install pg on solaris 10, format the data area first as ufs and
then as zfs, and compare import and query times and other benchmarks,
but comparing ufs/zfs to Linux filesystems would also be a comparison
of those two OSes.


however, such a comparison is very legitimate; it doesn't really matter 
which filesystem is better if the OS it's tied to limits it so much 
that the other one wins out with an inferior filesystem


currently ZFS is only available on Solaris. parts of it have been 
released under GPLv2, but it doesn't look like enough of it to be ported 
to Linux (enough was released for grub to be able to access it 
read-only, but not the full filesystem). there are also patent concerns 
preventing any porting to Linux.


This is not entirely correct. ZFS is only under the CDDL license and it 
has been ported to FreeBSD.


http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/026922.html

--
Trygve

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Steinar H. Gunderson
On Mon, May 07, 2007 at 11:56:14PM -0400, Greg Smith wrote:
 Debian packages PostgreSQL in a fashion unique to it; it's arguable 
 whether it's better or not (I don't like it), but going with that will 
 assure your installation is a bit non-standard compared with most Linux 
 installs.  The main reasons you'd pick Debian are either that you like 
 that scheme (which tries to provide some structure to running multiple 
 clusters on one box), or that you plan to rely heavily on community 
 packages that don't come with the Redhat distributions and therefore would 
 appreciate how easy it is to use apt-get against the large Debian software 
 repository.

Just to add to this: As far as I understand it, this scheme was originally
mainly put in place to allow multiple _versions_ of Postgres to be installed
alongside each other, for smoother upgrades. (There's a command that does all
the details of running first pg_dumpall for the users and groups, then the
new pg_dump with -Fc to get all data and LOBs over, then some hand-fixing to
change explicit paths to $libdir, etc...)

Of course, you lose all that if you need a newer Postgres version than the OS
provides. (Martin Pitt, the Debian/Ubuntu maintainer of Postgres -- the
packaging in Debian and Ubuntu is the same, sans version differences -- makes
his own backported packages of the newest Postgres to Debian stable; it's up
to you if you'd trust that or not.)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Pomarede Nicolas



Hello to all,

I have a table that is used as a spool for various events. Some processes 
write data into it, and another process reads the resulting rows, does some 
work, and deletes the rows that were just processed.


As you can see, with hundreds of thousands of events a day, this table will 
need to be vacuumed regularly to avoid taking too much space (data and 
index).


Note that processing rows is quite fast in fact, so at any time a 
count(*) on this table rarely exceeds 10-20 rows.



For the indexes, a good way to bring them to a size corresponding to the 
actual count(*) is to run 'reindex'.


But for the data (dead rows), even running a vacuum analyze every day is 
not enough, and doesn't truncate some empty pages at the end, so the data 
size remains in the order of 200-300 MB, when only a few effective rows 
are there.


I see in the 8.3 list of coming changes that the FSM will try to re-use 
pages in a better way to help truncate empty pages. Is this correct?


Running a vacuum full is a solution for now, but it locks the table for 
too long (10 minutes or so), which is not acceptable in that case, since 
events should be processed in less than 10 seconds.


So, I would like to truncate the table when the number of rows reaches 0 
(just after the table was processed, and just before some new rows are 
added).


Is there an easy way to do this under psql? For example: lock the table, 
do a count(*), and if the result is 0 rows then truncate the table, then 
unlock the table (a kind of atomic 'truncate table if count(*) == 0').


Would this work and what would be the steps?
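
Something like this minimal sketch should express that sequence (the
spool table name event_spool is made up for illustration; the exclusive
lock makes the check-then-truncate atomic with respect to concurrent
writers):

  BEGIN;
  LOCK TABLE event_spool IN ACCESS EXCLUSIVE MODE;
  -- writers now block; check that the spool really is empty
  SELECT count(*) FROM event_spool;
  -- only if the count above returned 0:
  TRUNCATE TABLE event_spool;
  COMMIT;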

Thanks

Nicolas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Alexander Staubo

On 5/8/07, [EMAIL PROTECTED] wrote:
[snip]

I personally don't trust reiserfs, jfs seems to be a tool for
transitioning from AIX more than anything else [...]


What makes you say this? I have run JFS for years with complete
satisfaction, and I have never logged into an AIX box.

JFS has traditionally been seen as an underdog, but undeservedly so,
in my opinion; one cause might be the instability of the very early
releases, which seems to have tainted its reputation, or the alienness
of its AIX heritage. However, every benchmark I have come across puts
it on par with, and often surpassing, the more popular file systems
in performance. In particular, JFS seems to shine with respect to CPU
overhead.

Alexander.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Steinar H. Gunderson
On Mon, May 07, 2007 at 03:14:08PM -0700, Joshua D. Drake wrote:
 It is my understanding (and I certainly could be wrong) that FreeBSD
 doesn't handle SMP nearly as well as Linux (and Linux not as well as
 Solaris).

I'm not actually sure about the last part. There are installations as big as
1024 CPUs that run Linux -- most people won't need that, but it's probably an
indicator that eight cores should run OK :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Guillaume Cottenceau
Pomarede Nicolas npomarede 'at' corp.free.fr writes:

 Hello to all,
 
 I have a table that is used as a spool for various events. Some
 processes write data into it, and another process reads the resulting
 rows, does some work, and deletes the rows that were just processed.
 
 As you can see, with hundreds of thousands of events a day, this table
 will need to be vacuumed regularly to avoid taking too much space
 (data and index).
 
 Note that processing rows is quite fast in fact, so at any time a
 count(*) on this table rarely exceeds 10-20 rows.
 
 
 For the indexes, a good way to bring them to a size corresponding to
 the actual count(*) is to run 'reindex'.
 
 But for the data (dead rows), even running a vacuum analyze every day
 is not enough, and doesn't truncate some empty pages at the end, so
 the data size remains in the order of 200-300 MB, when only a few
 effective rows are there.

As far as I know, you probably need to increase your
max_fsm_pages, because your pg is probably not able to properly
track unused pages between subsequent VACUUMs.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM

Have you investigated this? It seems that you already know about
the FSM stuff, according to your question about FSM and 8.3.

You can also run VACUUM ANALYZE more frequently (after all, it
doesn't lock the table).
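
As a quick sanity check, the configured value can be compared against
the totals that a database-wide VACUUM VERBOSE prints at the end of its
output (a sketch; the wording of the comments follows the 8.1/8.2 output
format):

  -- the final lines of a database-wide VACUUM VERBOSE report how many
  -- page slots are in use and how many are required to track all free
  -- space; max_fsm_pages should comfortably exceed the latter
  SHOW max_fsm_pages;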

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Heikki Linnakangas

Pomarede Nicolas wrote:
But for the data (dead rows), even running a vacuum analyze every day is 
not enough, and doesn't truncate some empty pages at the end, so the 
data size remains in the order of 200-300 MB, when only a few effective 
rows are there.


For a table like that you should run VACUUM much more often than once a 
day. Turn on autovacuum, or set up a cron script etc. to run it every 15 
minutes or so.


Running a vacuum full is a solution for now, but it locks the table for 
too long (10 minutes or so), which is not acceptable in that case, since 
events should be processed in less than 10 seconds.


So, I would like to truncate the table when the number of rows reaches 0 
(just after the table was processed, and just before some new rows are 
added).


Is there an easy way to do this under psql? For example: lock the table, 
do a count(*), and if the result is 0 rows then truncate the table, then 
unlock the table (a kind of atomic 'truncate table if count(*) == 0').


Would this work and what would be the steps?


It should work, just like you describe it, with the caveat that TRUNCATE 
will remove any old row versions that might still be visible to an older 
transaction running in serializable mode. It sounds like it's not a 
problem in your scenario, but it's hard to say for sure without seeing 
the application. Running vacuum more often is probably a simpler and 
better solution, anyway.


Which version of PostgreSQL is this?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Pomarede Nicolas

On Tue, 8 May 2007, [EMAIL PROTECTED] wrote:




On Tue, 8 May 2007, Pomarede Nicolas wrote:


As you can see, with hundreds of thousands of events a day, this table will
need to be vacuumed regularly to avoid taking too much space (data and index).

Note that processing rows is quite fast in fact, so at any time a count(*) on
this table rarely exceeds 10-20 rows.

For the indexes, a good way to bring them to a size corresponding to the
actual count(*) is to run 'reindex'.


why do you have indexes on a table with only 10-20 rows?

are those indexes there to prevent duplicate rows?


I need these indexes to sort rows to process in chronological order. I'm 
also using an index on 'oid' to delete a row after it was processed (I 
could use a unique sequence too, but I think it would be the same).


Also, I sometime have peaks that insert lots of data in a short time, so 
an index on the event's date is useful.


And as the ratio of effective rows to dead rows is only 1%, doing a 
count(*) for example takes many seconds, even if the result of count(*) is 
10 rows (because pg will sequentially scan all the data pages of the 
table). Without an index on the date, I would need a sequential scan to 
fetch the rows to process, and this would be slower due to the high 
number of dead rows.




I have some tables just to store unprocessed data, and because there are
only a few rows and I always process all rows there is no need for
indexes. there is just a column named id, and when I insert a row I take
nextval('id_seq'):

insert into some_tmp_table (id, message, ...)
values (nextval('id_seq'), 'do something', ...);

I know that deleting is slower than with indexes, but it's still fast
enough, because all rows are in memory.

and that id column is just for delete; it's unique and I can always delete
using only it.

Ismo


Nicolas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Pomarede Nicolas

On Tue, 8 May 2007, Guillaume Cottenceau wrote:


Pomarede Nicolas npomarede 'at' corp.free.fr writes:


Hello to all,

I have a table that is used as a spool for various events. Some
processes write data into it, and another process reads the resulting
rows, does some work, and deletes the rows that were just processed.

As you can see, with hundreds of thousands of events a day, this table
will need to be vacuumed regularly to avoid taking too much space
(data and index).

Note that processing rows is quite fast in fact, so at any time a
count(*) on this table rarely exceeds 10-20 rows.


For the indexes, a good way to bring them to a size corresponding to
the actual count(*) is to run 'reindex'.

But for the data (dead rows), even running a vacuum analyze every day
is not enough, and doesn't truncate some empty pages at the end, so
the data size remains in the order of 200-300 MB, when only a few
effective rows are there.


As far as I know, you probably need to increase your
max_fsm_pages, because your pg is probably not able to properly
track unused pages between subsequent VACUUMs.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM

Have you investigated this? It seems that you already know about
the FSM stuff, according to your question about FSM and 8.3.

You can also run VACUUM ANALYZE more frequently (after all, it
doesn't lock the table).


thanks, but max FSM is already set to a large enough value (I'm running a 
vacuum analyze every day on the whole database, and set max fsm according 
to the last lines of vacuum, so all pages are stored in the FSM).



Nicolas


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread david

On Tue, 8 May 2007, Trygve Laugstøl wrote:


 currently ZFS is only available on Solaris. parts of it have been released
 under GPLv2, but it doesn't look like enough of it to be ported to Linux
 (enough was released for grub to be able to access it read-only, but not
 the full filesystem). there are also patent concerns that are preventing
 any porting to Linux.


This is not entirely correct. ZFS is only under the CDDL license and it has 
been ported to FreeBSD.


http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/026922.html

I wonder how they handled the license issues? I thought that if you 
combined BSD-licensed code with code under a more restrictive license, 
the result fell under the more restrictive license. Thanks for the 
info.


here's a link about the GPLv2 stuff for zfs

http://blogs.sun.com/darren/entry/zfs_under_gplv2_already_exists
---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Gregory Stark

Pomarede Nicolas [EMAIL PROTECTED] writes:

 But for the data (dead rows), even running a vacuum analyze every day is not
 enough, and doesn't truncate some empty pages at the end, so the data size
 remains in the order of 200-300 MB, when only a few effective rows are there.

Try running vacuum more frequently. Once per day isn't very frequent for
vacuum; every 60 or 30 minutes isn't uncommon. For your situation you might
even consider running it continuously in a loop.

 I see in the 8.3 list of coming changes that the FSM will try to re-use pages
 in a better way to help truncating empty pages. Is this correct ?

There are several people working on improvements to vacuum but it's not clear
right now exactly what we'll end up with. I think most of the directly
vacuum-related changes wouldn't actually help you either.

The one that would help you is named HOT. If you're interested in trying an
experimental patch, you could consider taking CVS and applying HOT and
seeing how it affects you. Or, if you see an announcement that it's been
committed, taking a beta and experimenting with it before the 8.3 release
could be interesting. Experiments with real-world databases can be very
helpful for developers, since it's hard to construct truly realistic
benchmarks.

 So, I would like to truncate the table when the number of rows reaches 0 (just
 after the table was processed, and just before some new rows are added).

 Is there an easy way to do this under psql? For example: lock the table,
 do a count(*), and if the result is 0 rows then truncate the table, then
 unlock the table (a kind of atomic 'truncate table if count(*) == 0').

 Would this work and what would be the steps?

It would work, but you may end up keeping the lock for longer than you're
happy with. Another option to consider would be to use CLUSTER instead of
VACUUM FULL, though the 8.2 CLUSTER wasn't entirely MVCC-safe, and I think
in your situation that might actually be a problem: it would cause
transactions that started before the cluster (but didn't access the table
before the cluster) to not see any records after the cluster.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Pomarede Nicolas

On Tue, 8 May 2007, Heikki Linnakangas wrote:


Pomarede Nicolas wrote:
But for the data (dead rows), even running a vacuum analyze every day is 
not enough, and doesn't truncate some empty pages at the end, so the data 
size remains in the order of 200-300 MB, when only a few effective rows are 
there.


For a table like that you should run VACUUM much more often than once a day. 
Turn on autovacuum, or set up a cron script etc. to run it every 15 minutes 
or so.


Yes, I already do this on another spool table; I run a vacuum after 
processing it, but I wondered if there was another way to keep the disk 
size low for this table.


As for autovacuum, the threshold values to analyze/vacuum are not adapted 
to my situation, because I have some big tables that I prefer to keep 
vacuumed frequently to prevent them from growing in disk size, even if the 
number of inserts/updates is not big enough, so in my case autovacuum 
would not run often enough. Instead of configuring autovacuum on a 
per-table basis, I prefer running a vacuum on the whole database every day.
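
For reference, per-table thresholds on 8.1 live in the pg_autovacuum
system catalog; a sketch, with a made-up table name and threshold values
(-1 means 'use the default'):

  -- vacuum the spool table after ~50 dead rows, regardless of table size
  INSERT INTO pg_autovacuum
    (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
  VALUES ('event_spool'::regclass, true, 50, 0.0, 50, 0.0, -1, -1);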






Running a vacuum full is a solution for now, but it locks the table for too 
long (10 minutes or so), which is not acceptable in that case, since events 
should be processed in less than 10 seconds.


So, I would like to truncate the table when the number of rows reaches 0 
(just after the table was processed, and just before some new rows are 
added).


Is there an easy way to do this under psql? For example: lock the table, 
do a count(*), and if the result is 0 rows then truncate the table, then 
unlock the table (a kind of atomic 'truncate table if count(*) == 0').


Would this work and what would be the steps?


It should work, just like you describe it, with the caveat that TRUNCATE will 
remove any old row versions that might still be visible to an older 
transaction running in serializable mode. It sounds like it's not a problem 
in your scenario, but it's hard to say for sure without seeing the 
application. Running vacuum more often is probably a simpler and better 
solution, anyway.


Which version of PostgreSQL is this?


Shouldn't locking the table prevent this? I mean, if I try to get an 
exclusive lock on the table, shouldn't I get one only when there's no 
older transaction, and in that case can't I truncate the table safely, 
knowing that no one is accessing it due to the lock?


the pg version is 8.1.2 (not the latest, I know, but migrating this base is 
quite complicated since it needs to be up 24 hours a day)


thanks

Nicolas


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread ismo . tuononen


On Tue, 8 May 2007, Pomarede Nicolas wrote:

 As you can see, with hundreds of thousands of events a day, this table will
 need to be vacuumed regularly to avoid taking too much space (data and index).
 
 Note that processing rows is quite fast in fact, so at any time a count(*) on
 this table rarely exceeds 10-20 rows.
 
 For the indexes, a good way to bring them to a size corresponding to the
 actual count(*) is to run 'reindex'.

why do you have indexes on a table with only 10-20 rows?

are those indexes there to prevent duplicate rows?

I have some tables just to store unprocessed data, and because there are 
only a few rows and I always process all rows there is no need for 
indexes. there is just a column named id, and when I insert a row I take 
nextval('id_seq'):

insert into some_tmp_table (id, message, ...)
values (nextval('id_seq'), 'do something', ...);

I know that deleting is slower than with indexes, but it's still fast 
enough, because all rows are in memory.

and that id column is just for delete; it's unique and I can always delete 
using only it.

Ismo

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Heikki Linnakangas

Pomarede Nicolas wrote:

On Tue, 8 May 2007, Heikki Linnakangas wrote:

Pomarede Nicolas wrote:
But for the data (dead rows), even running a vacuum analyze every day 
is not enough, and doesn't truncate some empty pages at the end, so 
the data size remains in the order of 200-300 MB, when only a few 
effective rows are there.


For a table like that you should run VACUUM much more often than once 
a day. Turn on autovacuum, or set up a cron script etc. to run it 
every 15 minutes or so.


Yes, I already do this on another spool table; I run a vacuum after 
processing it, but I wondered if there was another way to keep the disk 
size low for this table.


How much concurrent activity is there in the database? Running a vacuum 
right after processing the table would not remove the deleted tuples if 
there's another transaction running at the same time. Running the vacuum a 
few minutes later might help with that. You should run VACUUM VERBOSE to 
see how many non-removable dead tuples there are.
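
For example (the spool table name is made up):

  VACUUM VERBOSE event_spool;
  -- the 'N dead row versions cannot be removed yet' DETAIL line shows
  -- how many dead tuples are still pinned by older transactions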


Is there an easy way to do this under psql? For example: lock the 
table, do a count(*), and if the result is 0 rows then truncate the table, 
then unlock the table (a kind of atomic 'truncate table if count(*) == 0').


Would this work and what would be the steps?


It should work, just like you describe it, with the caveat that 
TRUNCATE will remove any old row versions that might still be visible 
to an older transaction running in serializable mode. It sounds like 
it's not a problem in your scenario, but it's hard to say for sure 
without seeing the application. Running vacuum more often is probably 
a simpler and better solution, anyway.


Shouldn't locking the table prevent this? I mean, if I try to get an 
exclusive lock on the table, shouldn't I get one only when there's no 
older transaction, and in that case can't I truncate the table safely, 
knowing that no one is accessing it due to the lock?


Serializable transactions that started before the transaction that takes 
the lock would need to see the old row versions:


Xact 1: BEGIN ISOLATION LEVEL SERIALIZABLE;
Xact 1: SELECT 1; -- To take a snapshot, perform any query
Xact 2: DELETE FROM foo;
Xact 3: BEGIN;
Xact 3: LOCK TABLE foo;
Xact 3: SELECT COUNT(*) FROM foo; -- Sees the delete by xact 2, returns 0
Xact 3: TRUNCATE foo;
Xact 3: COMMIT;
Xact 1: SELECT COUNT(*) FROM foo; -- Returns 0, but because the 
transaction is in serializable mode, it should've still seen the rows 
deleted by xact 2.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Trygve Laugstøl

[EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Trygve Laugstøl wrote:

 currently ZFS is only available on Solaris. parts of it have been
 released under GPLv2, but it doesn't look like enough of it to be ported
 to Linux (enough was released for grub to be able to access it read-only,
 but not the full filesystem). there are also patent concerns that are
 preventing any porting to Linux.


This is not entirely correct. ZFS is only under the CDDL license and 
it has been ported to FreeBSD.


http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/026922.html

I wonder how they handled the license issues? I thought that if you 
combined BSD-licensed code with code under a more restrictive license, 
the result fell under the more restrictive license. Thanks for 
the info.


The CDDL is not a restrictive license like the GPL; it is based on the 
Mozilla Public License, so it can be used with BSD stuff without problems. 
There are lots of discussions going on (read: flamewars) on the 
opensolaris lists about how it can/should/will be integrated into linux.



here's a link about the GPLv2 stuff for zfs

http://blogs.sun.com/darren/entry/zfs_under_gplv2_already_exists


That title is fairly misleading as it's only some read-only bits to be 
able to boot off ZFS with grub.


--
Trygve

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Gregory Stark

Pomarede Nicolas [EMAIL PROTECTED] writes:

 Yes, I already do this on another spool table; I run a vacuum after
 processing it, but I wondered if there was another way to keep the disk
 size low for this table.

"after processing it" might be too soon if there are still transactions
around that are a few minutes old and predate the commit of your
processing run.

But any table that receives as many deletes or updates as these tables do will
need to be vacuumed on the order of minutes, not days.

 It should work, just like you describe it, with the caveat that TRUNCATE will
 remove any old row versions that might still be visible to an older
 transaction running in serializable mode. 

 Shouldn't locking the table prevent this? I mean, if I try to get an
 exclusive lock on the table, shouldn't I get one only when there's no
 older transaction, and in that case can't I truncate the table safely,
 knowing that no one is accessing it due to the lock?

It would arise if the transaction starts before you take the lock but hasn't
looked at the table yet. Then the lock table succeeds, you truncate it and
commit, then the old transaction gets around to looking at the table.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Pomarede Nicolas

On Tue, 8 May 2007, Heikki Linnakangas wrote:


Pomarede Nicolas wrote:

On Tue, 8 May 2007, Heikki Linnakangas wrote:

Pomarede Nicolas wrote:
But for the data (dead rows), even running a vacuum analyze every day is 
not enough, and doesn't truncate some empty pages at the end, so the data 
size remains in the order of 200-300 MB, when only a few effective rows 
are there.


For a table like that you should run VACUUM much more often than once a 
day. Turn on autovacuum, or set up a cron script etc. to run it every 15 
minutes or so.


Yes, I already do this on another spool table; I run a vacuum after 
processing it, but I wondered if there was another way to keep the disk 
size low for this table.


How much concurrent activity is there in the database? Running a vacuum 
right after processing the table would not remove the deleted tuples if 
there's another transaction running at the same time. Running the vacuum a 
few minutes later might help with that. You should run VACUUM VERBOSE to 
see how many non-removable dead tuples there are.




There's not too much simultaneous transaction activity on the database; 
most of the time it shouldn't exceed one minute (worst case). Except that, 
as I need to run a vacuum analyze on the whole database every day, the 
vacuum now takes 8 hours (I changed the vacuum cost values to make it a 
little slower instead of taking too much i/o and making the base unusable, 
because with the default vacuum values it takes 3-4 hours of high i/o 
usage (the total base is 20 GB)).


So, at this time, the complete vacuum is running, and vacuuming only the 
spool table reports that all dead rows are currently not removable (which 
is normal).


I will run it again later when the complete vacuum is over, to see how 
pages are affected.



Nicolas



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Heikki Linnakangas

Pomarede Nicolas wrote:
There's not too much simultaneous transaction activity on the database; 
most of the time it shouldn't exceed one minute (worst case). Except that, 
as I need to run a vacuum analyze on the whole database every day, the 
vacuum now takes 8 hours (I changed the vacuum cost values to make it a 
little slower instead of taking too much i/o and making the base unusable, 
because with the default vacuum values it takes 3-4 hours of high i/o 
usage (the total base is 20 GB)).


So, at this time, the complete vacuum is running, and vacuuming only the 
spool table reports that all dead rows are currently not removable (which 
is normal).


Oh, I see. I know you don't want to upgrade, but that was changed in 
8.2. Vacuum now ignores concurrent vacuums in the oldest xid 
calculation, so the long-running vacuum won't stop the vacuum on the 
spool table from removing dead rows.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [OT] Best OS for Postgres 8.2

2007-05-08 Thread C. Bergström
I'm really not a senior member around here, and while all this licensing
stuff and underlying-fs-between-OSes discussion is very interesting, can
we please think twice before continuing it?

Thanks for the minute,

./C

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Guillaume Cottenceau
Heikki Linnakangas heikki 'at' enterprisedb.com writes:

 Pomarede Nicolas wrote:
  But for the data (dead rows), even running a vacuum analyze every
  day is not enough, and doesn't truncate some empty pages at the end,
  so the data size remains in the order of 200-300 MB, when only a few
  effective rows are there.
 
 For a table like that you should run VACUUM much more often than once
 a day. Turn on autovacuum, or set up a cron script etc. to run it
 every 15 minutes or so.

Heikki, is there a theoretical need for frequent VACUUM when
max_fsm_pages is large enough to hold references to dead rows?

VACUUM documentation says: tuples that are deleted or obsoleted
by an update are not physically removed from their table; they
remain present until a VACUUM is done.

Free Space Map documentation says: the shared free space map
tracks the locations of unused space in the database. An
undersized free space map may cause the database to consume
increasing amounts of disk space over time, because free space
that is not in the map cannot be re-used.

I am not sure of the relationship between these two statements.
Are the deleted/obsoleted tuples stored in the FSM, with the occupied
space actually reused before a VACUUM is performed, or is something
else happening? Maybe the FSM only stores references to disk pages
containing only dead rows, and that's the difference I've been
missing?

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0

2007-05-08 Thread Heikki Linnakangas

Guillaume Cottenceau wrote:

Heikki, is there a theoretical need for frequent VACUUM when
max_fsm_pages is large enough to hold references to dead rows?


Not really, if you don't mind that your table with 10 rows takes 
hundreds of megabytes on disk. If max_fsm_pages is large enough, the 
table size will reach a steady state size and won't grow further. It 
depends on your scenario, it might be totally acceptable.



VACUUM documentation says: tuples that are deleted or obsoleted
by an update are not physically removed from their table; they
remain present until a VACUUM is done.

Free Space Map documentation says: the shared free space map
tracks the locations of unused space in the database. An
undersized free space map may cause the database to consume
increasing amounts of disk space over time, because free space
that is not in the map cannot be re-used.

I am not sure of the relationship between these two statements.
Are the deleted/obsoleted tuples stored in the FSM, with the occupied
space actually reused before a VACUUM is performed, or is something
else happening? Maybe the FSM only stores references to disk pages
containing only dead rows, and that's the difference I've been
missing?


FSM stores information on how much free space there is on each page. 
Deleted but not yet vacuumed tuples don't count as free space. If a page 
is full of dead tuples, it's not usable for inserting new tuples, and 
it's not recorded in the FSM.


When vacuum runs, it physically removes tuples from the table and frees 
the space occupied by them. At the end it updates the FSM.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Luke Lonergan
WRT ZFS on Linux, if someone were to port it, the license issue would get 
worked out IMO (with some discussion to back me up).  From discussions with the 
developers, the biggest issue is a technical one: the Linux VFS layer makes the 
port difficult.

I don't hold out any hope that the FUSE port will be a happy thing; the
performance won't be there.

Any volunteers to port ZFS to Linux?

- Luke


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] estimating the need for VACUUM FULL and REINDEX

2007-05-08 Thread Guillaume Cottenceau
I'm trying to come up with a way to estimate the need for a
VACUUM FULL and/or a REINDEX on some tables.


According to documentation[1], VACUUM FULL's only benefit is
returning unused disk space to the operating system; am I correct
in assuming there's also the benefit of optimizing the
performance of scans, because rows are physically compacted on
the disk?

With that in mind, I've tried to estimate how much benefit would
be brought by running VACUUM FULL, using the output of VACUUM
VERBOSE. However, it seems that, for example, the removable rows
reported by each VACUUM VERBOSE run are actually reused by VACUUM,
so that is not what I'm looking for.


Then according to documentation[2], REINDEX has some benefit when
all but a few index keys on a page have been deleted, because the
page remains allocated (thus, I assume it improves index scan
performance, am I correct?). However, again I'm unable to
estimate the expected benefit. With a slightly modified version
of a query found in documentation[3] to see the pages used by a
relation[4], I'm able to see that the index data from a given
table...

        relname         | relpages | reltuples
------------------------+----------+-----------
 idx_sessions_owner_key |       38 |      2166
 pk_sessions            |       25 |      2166

...is duly optimized after a REINDEX:

        relname         | relpages | reltuples
------------------------+----------+-----------
 idx_sessions_owner_key |       13 |      2166
 pk_sessions            |        7 |      2166

but what I'd need is really these 38->13 and 25->7 figures (or
estimates) prior to running REINDEX.


Thanks for any insight.


Ref: 
[1] http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html

[2] http://www.postgresql.org/docs/8.2/interactive/routine-reindex.html

[3] http://www.postgresql.org/docs/8.2/interactive/disk-usage.html

[4] SELECT c2.relname, c2.relpages, c2.reltuples
  FROM pg_class c, pg_class c2, pg_index i
 WHERE c.relname = 'sessions'
   AND c.oid = i.indrelid
   AND c2.oid = i.indexrelid
 ORDER BY c2.relname;

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [OT] Best OS for Postgres 8.2

2007-05-08 Thread Adam Tauno Williams
 I'm really not a senior member around here, and while all this licensing
 stuff and underlying-fs-between-OSes discussion is very interesting, can
 we please think twice before continuing it?

Agree, there are other lists for this stuff;  and back to what one of
the original posters said: it doesn't matter much.

[Also not a regular poster, but I always gain something from reading
this list.]

Most people who really go into OS / FS selection are looking
for a cheap silver bullet for performance.  No such thing exists.  The
difference made by any modern OS/FS is almost immaterial.  You need to
do the slow, slogging work of site/application-specific optimization and
tuning;  that is where you will find significant performance
improvements.

- 
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] estimating the need for VACUUM FULL and REINDEX

2007-05-08 Thread Heikki Linnakangas

Guillaume Cottenceau wrote:

According to documentation[1], VACUUM FULL's only benefit is
returning unused disk space to the operating system; am I correct
in assuming there's also the benefit of optimizing the
performance of scans, because rows are physically compacted on
the disk?


That's right.


With that in mind, I've tried to estimate how much benefit would
be brought by running VACUUM FULL, using the output of VACUUM
VERBOSE. However, it seems that, for example, the removable rows
reported by each VACUUM VERBOSE run are actually reused by VACUUM,
so that is not what I'm looking for.


Take a look at contrib/pgstattuple. If a table has a high percentage of 
free space, VACUUM FULL will compact that out.
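
For example, a sketch (assuming the contrib module is installed; the
table name is borrowed from the original post):

  -- returns tuple counts plus free_percent and dead_tuple_percent
  SELECT * FROM pgstattuple('sessions');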



Then according to documentation[2], REINDEX has some benefit when
all but a few index keys on a page have been deleted, because the
page remains allocated (thus, I assume it improves index scan
performance, am I correct?). However, again I'm unable to
estimate the expected benefit. With a slightly modified version
of a query found in documentation[3] to see the pages used by a
relation[4], I'm able to see that the index data from a given
table...


See pgstatindex, in the same contrib module. The number you're looking 
for is avg_leaf_density. REINDEX will bring that to 90% (with the default 
fill factor), so if it's much lower than that, REINDEX will help.
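
A sketch against one of the indexes from the original post:

  SELECT avg_leaf_density FROM pgstatindex('idx_sessions_owner_key');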


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] estimating the need for VACUUM FULL and REINDEX

2007-05-08 Thread Bill Moran
In response to Guillaume Cottenceau [EMAIL PROTECTED]:

 I'm trying to come up with a way to estimate the need for a
 VACUUM FULL and/or a REINDEX on some tables.

You shouldn't vacuum full unless you have a good reason.  Vacuum full
causes index bloat.

 According to documentation[1], VACUUM FULL's only benefit is
 returning unused disk space to the operating system; am I correct
 in assuming there's also the benefit of optimizing the
 performance of scans, because rows are physically compacted on
 the disk?

In my experience, the smaller the overall database size, the less shared
memory it requires.  Keeping it vacuumed will reduce the amount of space
taken up in memory, which means it's more likely that the data you need
at any particular time is in memory.

Look up a thread with my name on it related to reindexing.  I did
some experiments with indexes and reindexing, and the only advantage I
found was that the space requirement for the indexes is reduced by
reindexing.  I was not able to find any performance difference in newly
created indexes vs. indexes that were starting to bloat.

 With that in mind, I've tried to estimate how much benefit would
 be brought by running VACUUM FULL, using the output of VACUUM
 VERBOSE. However, it seems that, for example, the removable rows
 reported by each VACUUM VERBOSE run are actually reused by VACUUM,
 so that is not what I'm looking for.

I'm not sure what you mean by that last sentence.

There are only two circumstances (I can think of) for running vacuum
full:
1) You've just made some major change to the database (such as adding
   an obscene # of records, making massive changes to a large
   percentage of the existing data, or issuing a lot of alter table)
   and want to get the FSM back down to a manageable size.
2) You are desperately hurting for disk space, and need a holdover
   until you can get bigger drives.

Reindexing pretty much falls into the same 2 scenarios.  I do recommend
that you reindex after any vacuum full.

However, a much better approach is to either schedule frequent vacuums
(without the full) or configure/enable autovacuum appropriately for your
setup.

 Then according to documentation[2], REINDEX has some benefit when
 all but a few index keys on a page have been deleted, because the
 page remains allocated (thus, I assume it improves index scan
 performance, am I correct?). However, again I'm unable to
 estimate the expected benefit. With a slightly modified version
 of a query found in documentation[3] to see the pages used by a
 relation[4], I'm able to see that the index data from a given
 table...
 
         relname         | relpages | reltuples
 ------------------------+----------+-----------
  idx_sessions_owner_key |       38 |      2166
  pk_sessions            |       25 |      2166
 
 ...is duly optimized after a REINDEX:
 
         relname         | relpages | reltuples
 ------------------------+----------+-----------
  idx_sessions_owner_key |       13 |      2166
  pk_sessions            |        7 |      2166
 
 but what I'd need is really these 38->13 and 25->7 figures (or
 estimates) prior to running REINDEX.

Again, my experience shows that reindexing is only worthwhile if you're
really hurting for disk space/memory.

I don't know of any way to tell what size an index would be if it were
completely packed, but it doesn't seem as if this is the best approach
anyway.  Newer versions of PG have the option to create indexes with
empty space already there at creation time (I believe this is called
fill factor) to allow for future growth.
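
On 8.2 that looks something like this sketch (the column name is a guess
from the index name in the example above):

  CREATE INDEX idx_sessions_owner_key ON sessions (owner)
      WITH (fillfactor = 70);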

The only other reason I can see for vacuum full/reindex is if you _can_.
For example, if there is a period you know the database will be unused
that is sufficiently long that you know these operations can complete.
Keep in mind that both reindex and vacuum full create performance
problems while they are running.  If you knew, however, that the system
was _never_ being used between 6:00 PM and 8:00 AM, you could run them
over night.  In that case, I would recommend replacing vacuum full with
cluster.
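
An overnight job along those lines might look like this sketch, using
the pre-8.3 CLUSTER syntax and the example table from this thread
(CLUSTER rewrites the table in index order and, like VACUUM FULL, takes
an exclusive lock while it runs):

  CLUSTER pk_sessions ON sessions;
  ANALYZE sessions;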

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Susan Russo

Hi,

Despite numerous efforts, we're unable to solve a severe performance limitation 
between Pg 7.3.2
and Pg 8.1.4.

The query and 'explain analyze' plans below run in 
26.20 msec on Pg 7.3.2, and 
2463.968 ms on Pg 8.1.4, 

and the Pg7.3.2 is on older hardware and OS.

Multiply this time difference by 82K, and a 10-minute procedure (which
includes this query) now runs in 10 *hours*... not good.

In general, however, we're pleased with the performance of this very same
Pg8.1.4 server as compared to the Pg7.3.2 server (loading/dumping and
other queries are much faster).

QUERY:

SELECT dx.db_id, dx.accession, f.uniquename, f.name, cvt.name as ntype,
fd.is_current 
from feature f, feature_dbxref fd, dbxref dx, cvterm cvt 
where fd.dbxref_id = dx.dbxref_id
and fd.feature_id = f.feature_id 
and f.type_id = cvt.cvterm_id 
and accession like 'AY851043%' 
and cvt.name not in 
('gene','protein','natural_transposable_element','chromosome_structure_variation','chromosome_arm','repeat_region')
;


explain analyze output on Pg7.3.2:

---------------------------------------------------------------------------
 Nested Loop  (cost=0.00..23.45 rows=1 width=120) (actual time=25.59..25.59 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..17.49 rows=1 width=82) (actual time=25.58..25.58 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..11.93 rows=1 width=30) (actual time=25.58..25.58 rows=0 loops=1)
               ->  Index Scan using dbxref_idx2 on dbxref dx  (cost=0.00..5.83 rows=1 width=21) (actual time=25.58..25.58 rows=0 loops=1)
                     Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character varying))
                     Filter: (accession ~~ 'AY851043%'::text)
               ->  Index Scan using feature_dbxref_idx2 on feature_dbxref fd  (cost=0.00..6.05 rows=5 width=9) (never executed)
                     Index Cond: (fd.dbxref_id = outer.dbxref_id)
         ->  Index Scan using feature_pkey on feature f  (cost=0.00..5.54 rows=1 width=52) (never executed)
               Index Cond: (outer.feature_id = f.feature_id)
   ->  Index Scan using cvterm_pkey on cvterm cvt  (cost=0.00..5.94 rows=1 width=38) (never executed)
         Index Cond: (outer.type_id = cvt.cvterm_id)
         Filter: ((name <> 'gene'::character varying) AND (name <> 'protein'::character varying) AND (name <> 'natural_transposable_element'::character varying) AND (name <> 'chromosome_structure_variation'::character varying) AND (name <> 'chromosome_arm'::character varying) AND (name <> 'repeat_region'::character varying))
 Total runtime: 26.20 msec
(14 rows)




explain analyze output on Pg8.1.4:

------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..47939.87 rows=1 width=108) (actual time=2463.654..2463.654 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..47933.92 rows=1 width=73) (actual time=2463.651..2463.651 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..47929.86 rows=1 width=22) (actual time=2463.649..2463.649 rows=0 loops=1)
               ->  Seq Scan on dbxref dx  (cost=0.00..47923.91 rows=1 width=21) (actual time=2463.646..2463.646 rows=0 loops=1)
                     Filter: ((accession)::text ~~ 'AY851043%'::text)
               ->  Index Scan using feature_dbxref_idx2 on feature_dbxref fd  (cost=0.00..5.90 rows=4 width=9) (never executed)
                     Index Cond: (fd.dbxref_id = "outer".dbxref_id)
         ->  Index Scan using feature_pkey on feature f  (cost=0.00..4.05 rows=1 width=59) (never executed)
               Index Cond: ("outer".feature_id = f.feature_id)
   ->  Index Scan using cvterm_pkey on cvterm cvt  (cost=0.00..5.94 rows=1 width=43) (never executed)
         Index Cond: ("outer".type_id = cvt.cvterm_id)
         Filter: (((name)::text <> 'gene'::text) AND ((name)::text <> 'protein'::text) AND ((name)::text <> 'natural_transposable_element'::text) AND ((name)::text <> 'chromosome_structure_variation'::text) AND ((name)::text <> 'chromosome_arm'::text) AND ((name)::text <> 'repeat_region'::text))
 Total runtime: 2463.968 ms
(13 rows)


===

I tried tuning configs, including shutting off enable_seqscan and forcing use
of the index (shared_buffers set high with random_page_cost set low).  A
colleague who gets 1697 ms on Pg 8.1.4 with this query provided his
postgresql.conf -- it didn't help.
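
(Roughly, the knobs tried amounted to something like the following -- a
sketch, not the exact values:)

SET enable_seqscan = off;    -- discourage sequential scans entirely
SET random_page_cost = 2;    -- lowered from the default of 4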

We use standard dump/load commands between these servers:
pg_dump -O fb_2007_01_05 | compress > fb_2007_01_05.Z
uncompress -c fb_2007_01_05.Z | psql fb_2007_01_05

Hardware/OS specs:
- Pg7.3.2:  SunFire 280R, 900MHz SPARC processor, 3GB total RAM, 10Krpm
SCSI internal disks, Solaris 2.8
- Pg8.1.4:  v240 - dual UltraSPARC IIIi 1500MHz SPARC processors, 8GB
total RAM, Solaris 2.10
  (used both Sun-supplied postgres binaries, and postgres compiled from
source)


Thanks for your help,
Susan Russo



Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Tom Lane
Susan Russo [EMAIL PROTECTED] writes:
 Despite numerous efforts, we're unable to solve a severe performance 
 limitation between Pg 7.3.2
 and Pg 8.1.4.

 The query and 'explain analyze' plan below, runs in 
   26.20 msec on Pg 7.3.2, and 
   2463.968 ms on Pg 8.1.4, 

You're not getting the indexscan optimization of the LIKE clause, which
is most likely due to having initdb'd the 8.1 installation in something
other than C locale.  You can either redo the initdb in C locale (which
might be a good move to fix other inconsistencies from the 7.3 behavior
you're used to) or create a varchar_pattern_ops index on the column(s)
you're using LIKE with.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Steinar H. Gunderson
On Tue, May 08, 2007 at 10:18:34AM -0400, Susan Russo wrote:
 explain analyze output on Pg7.3.2:
 
 ->  Index Scan using dbxref_idx2 on dbxref dx  (cost=0.00..5.83 rows=1 width=21) (actual time=25.58..25.58 rows=0 loops=1)
       Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character varying))
       Filter: (accession ~~ 'AY851043%'::text)
 
 explain analyze output on Pg8.1.4:
 
 ->  Seq Scan on dbxref dx  (cost=0.00..47923.91 rows=1 width=21) (actual time=2463.646..2463.646 rows=0 loops=1)
       Filter: ((accession)::text ~~ 'AY851043%'::text)

This is almost all of your cost. Did you perchance initdb the 8.1.4 cluster
in a non-C locale? You could always try

  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);

which would create an index that might be more useful for your LIKE query,
even in a non-C locale.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Richard Broersma Jr

--- Susan Russo [EMAIL PROTECTED] wrote:
 and accession like 'AY851043%' 

I don't know if you've tried refactoring your query, but you could try:

   AND accession BETWEEN 'AY8510430' AND 'AY8510439'
   -- where the appended last digit ('0' / '9') is the lowest and highest
   -- expected value

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Alexander Staubo

On 5/8/07, Tom Lane [EMAIL PROTECTED] wrote:

You're not getting the indexscan optimization of the LIKE clause, which
is most likely due to having initdb'd the 8.1 installation in something
other than C locale.  You can either redo the initdb in C locale (which
might be a good move to fix other inconsistencies from the 7.3 behavior
you're used to) or create a varchar_pattern_ops index on the column(s)
you're using LIKE with.


Given the performance implications of setting the wrong locale, and
the high probability of accidentally doing this (I run my shells with
LANG=en_US.UTF-8, so all my databases have inherited this locale), why
is there no support for changing the database locale after the fact?

# alter database test set lc_collate = 'C';
ERROR:  parameter lc_collate cannot be changed
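
(For what it's worth, checking what locale a cluster was initialized with is
at least possible:)

SHOW lc_collate;   -- reports the collation locale fixed at initdb time
SHOW lc_ctype;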

Alexander.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Bill Moran
In response to Alexander Staubo [EMAIL PROTECTED]:

 On 5/8/07, Tom Lane [EMAIL PROTECTED] wrote:
  You're not getting the indexscan optimization of the LIKE clause, which
  is most likely due to having initdb'd the 8.1 installation in something
  other than C locale.  You can either redo the initdb in C locale (which
  might be a good move to fix other inconsistencies from the 7.3 behavior
  you're used to) or create a varchar_pattern_ops index on the column(s)
  you're using LIKE with.
 
 Given the performance implications of setting the wrong locale, and
 the high probability of accidentally doing this (I run my shells with
 LANG=en_US.UTF-8, so all my databases have inherited this locale), why
 is there no support for changing the database locale after the fact?
 
 # alter database test set lc_collate = 'C';
 ERROR:  parameter lc_collate cannot be changed

How would that command handle UTF data that could not be converted to C?

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Tom Lane
Alexander Staubo [EMAIL PROTECTED] writes:
 why is there no support for changing the database locale after the fact?

It'd corrupt all your indexes (or all the ones on textual columns anyway).

There are some TODO entries related to this, but don't hold your breath
waiting for a fix ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Nested loops overpriced

2007-05-08 Thread Peter Eisentraut
This query does some sort of analysis on an email archive:

SELECT
eh_subj.header_body AS subject,
count(distinct eh_from.header_body)
FROM
email JOIN mime_part USING (email_id)
JOIN email_header eh_subj USING (email_id, mime_part_id)
JOIN email_header eh_from USING (email_id, mime_part_id)
WHERE
eh_subj.header_name = 'subject'
AND eh_from.header_name = 'from'
AND mime_part_id = 0
AND (time >= timestamp '2007-05-05 17:01:59' AND time <
timestamp '2007-05-05 17:01:59' + interval '60 min')
GROUP BY
eh_subj.header_body;

The planner chooses this plan:


                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=87142.18..87366.58 rows=11220 width=184) (actual time=7883.541..8120.647 rows=35000 loops=1)
   ->  Sort  (cost=87142.18..87170.23 rows=11220 width=184) (actual time=7883.471..7926.031 rows=35000 loops=1)
         Sort Key: eh_subj.header_body
         ->  Hash Join  (cost=46283.30..86387.42 rows=11220 width=184) (actual time=5140.182..7635.615 rows=35000 loops=1)
               Hash Cond: (eh_subj.email_id = email.email_id)
               ->  Bitmap Heap Scan on email_header eh_subj  (cost=11853.68..50142.87 rows=272434 width=104) (actual time=367.956..1719.736 rows=280989 loops=1)
                     Recheck Cond: ((mime_part_id = 0) AND (header_name = 'subject'::text))
                     ->  BitmapAnd  (cost=11853.68..11853.68 rows=27607 width=0) (actual time=326.507..326.507 rows=0 loops=1)
                           ->  Bitmap Index Scan on idx__email_header__header_body_subject  (cost=0.00..5836.24 rows=272434 width=0) (actual time=178.041..178.041 rows=280989 loops=1)
                           ->  Bitmap Index Scan on idx__email_header__header_name  (cost=0.00..5880.97 rows=281247 width=0) (actual time=114.574..114.574 rows=280989 loops=1)
                                 Index Cond: (header_name = 'subject'::text)
               ->  Hash  (cost=34291.87..34291.87 rows=11020 width=120) (actual time=4772.148..4772.148 rows=35000 loops=1)
                     ->  Hash Join  (cost=24164.59..34291.87 rows=11020 width=120) (actual time=3131.067..4706.997 rows=35000 loops=1)
                           Hash Cond: (mime_part.email_id = email.email_id)
                           ->  Seq Scan on mime_part  (cost=0.00..8355.81 rows=265804 width=12) (actual time=0.038..514.291 rows=267890 loops=1)
                                 Filter: (mime_part_id = 0)
                           ->  Hash  (cost=24025.94..24025.94 rows=11092 width=112) (actual time=3130.982..3130.982 rows=35000 loops=1)
                                 ->  Hash Join  (cost=22244.54..24025.94 rows=11092 width=112) (actual time=996.556..3069.280 rows=35000 loops=1)
                                       Hash Cond: (eh_from.email_id = email.email_id)
                                       ->  Bitmap Heap Scan on email_header eh_from  (cost=15576.58..16041.55 rows=107156 width=104) (actual time=569.762..1932.017 rows=280990 loops=1)
                                             Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
                                             ->  BitmapAnd  (cost=15576.58..15576.58 rows=160 width=0) (actual time=532.217..532.217 rows=0 loops=1)
                                                   ->  Bitmap Index Scan on dummy_index  (cost=0.00..3724.22 rows=107156 width=0) (actual time=116.386..116.386 rows=280990 loops=1)
                                                   ->  Bitmap Index Scan on idx__email_header__from_local  (cost=0.00..5779.24 rows=107156 width=0) (actual time=174.883..174.883 rows=280990 loops=1)
                                                   ->  Bitmap Index Scan on dummy2_index  (cost=0.00..5992.25 rows=107156 width=0) (actual time=173.575..173.575 rows=280990 loops=1)
                                       ->  Hash  (cost=6321.79..6321.79 rows=27694 width=8) (actual time=426.739..426.739 rows=35000 loops=1)
                                             ->  Index Scan using idx__email__time on email  (cost=0.00..6321.79 rows=27694 width=8) (actual time=50.000..375.021 rows=35000 loops=1)
                                                   Index Cond: ((time >= '2007-05-05 17:01:59'::timestamp without time zone) AND (time < '2007-05-05 18:01:59'::timestamp without time zone))
 Total runtime: 8160.442 ms

The estimates all look pretty good and reasonable.

A faster plan, however, is this:

  

Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Susan Russo
Hi,

You could always try

  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);

WOW!  we're now at runtime 0.367ms on Pg8

Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).

Thanks again - will report back soon.

Susan


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Alvaro Herrera
Susan Russo wrote:
 Hi,
 
 You could always try
 
   CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
 
 WOW!  we're now at runtime 0.367ms on Pg8
 
 Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).

That's an alternative to the pattern_ops index; it won't help you obtain a
plan faster than this one.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Nested loops overpriced

2007-05-08 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Note how spectacularly overpriced this plan is.

Hmm, I'd have expected it to discount the repeated indexscans a lot more
than it seems to be doing for you.  As an example in the regression
database, note what happens to the inner indexscan cost estimate when
the number of outer tuples grows:

regression=# set enable_hashjoin TO 0;
SET
regression=# set enable_mergejoin TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# explain select * from tenk1 a join tenk1 b using (thousand) where 
a.unique1 = 1;
  QUERY PLAN
--
 Nested Loop  (cost=0.00..52.82 rows=10 width=484)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..8.27 rows=1 width=244)
         Index Cond: (unique1 = 1)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 b  (cost=0.00..44.42 rows=10 width=244)
         Index Cond: (a.thousand = b.thousand)
(5 rows)

regression=# explain select * from tenk1 a join tenk1 b using (thousand) where 
a.ten = 1;
 QUERY PLAN
-
 Nested Loop  (cost=0.00..2531.08 rows=9171 width=484)
   ->  Seq Scan on tenk1 a  (cost=0.00..483.00 rows=900 width=244)
         Filter: (ten = 1)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 b  (cost=0.00..2.15 rows=10 width=244)
         Index Cond: (a.thousand = b.thousand)
(5 rows)

This is with 8.2.4 but AFAICS from the CVS logs, 8.2's cost estimation
code didn't change since 8.2.1.  What do you get for a comparably
simple case?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Ron
I've seen the FUSE port of ZFS, and it does run sslloowwllyy.  It 
appears that a native linux port is going to be required if we want 
ZFS to be reasonably performant.


WRT which FS to use for pg; the biggest issue is what kind of DB you 
will be building.  The best pg FS for OLTP and OLAP are not the same 
IME.  Ditto a dependence on how large your records and the amount of 
IO in your typical transactions are.


For lots of big, more-reads-than-writes transactions, SGI's XFS
seems to be best.

XFS is not the best for OLTP.  Especially for OLTP involving lots of small IOs.

jfs seems to be best for that.

Caveat: I have not yet experimented with any version of reiserfs in 
production.


Cheers,
Ron Peacetree


At 08:01 AM 5/8/2007, Luke Lonergan wrote:
WRT ZFS on Linux, if someone were to port it, the license issue 
would get worked out IMO (with some discussion to back me up).  From 
discussions with the developers, the biggest issue is a technical 
one: the Linux VFS layer makes the port difficult.


I don't hold any hope that the FUSE port will be a happy thing, the 
performance won't be there.


Any volunteers to port ZFS to Linux?

- Luke


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-08 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Susan Russo wrote:
 Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).

 That's alternative to the pattern_ops index; it won't help you obtain a
 plan faster than this one.

No, but since their old DB was evidently running in C locale, this seems
like a prudent thing to do to avoid other surprising changes in behavior.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread 李彦 Ian Li
I am back with the chatlog, and it seems it's the transparent compression
that helps a lot, very interesting...


here is the log of #postgresql on Apr. 21st around 13:20 GMT (snipped):
Solatis  why is that, when hard disk i/o is my bottleneck ?
Solatis  well i have 10 disks in a raid1+0 config
Solatis  it's sata2 yes
Solatis  i run solaris express, whose kernel says SunOS
Solatis  running 'SunOS solatis2 5.11 snv_61 i86pc i386 i86pc
Solatis  well, the thing is, i'm using zfs
Solatis  yeah, it was the reason for me to install solaris in 
the first place
Solatis  and a benchmark for my system comparing debian linux 
with solaris express showed a +- 18% performance gain when switching 
to solaris

Solatis  so i'm happy
Solatis  (note: the benchmarking was not scientifically 
grounded at all, it was just around 50 million stored procedure 
calls which do select/update/inserts on my database which would 
simulate my specific case)

Solatis  but the killer thing was to enable compression on zfs
Solatis  that reduced the hard disk i/o with a factor 3, which 
was the probable cause of the performance increase

Solatis  oh, at the moment it's factor 2.23
Solatis  still, it's funny to see that postgresql says that my 
database is using around 41GB's, while only taking up 18GB on the 
hard disk

=== end of log ===

[EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, 李彦 Ian Li wrote:

In #postgresql on freenode, somebody ever mentioned that ZFS from 
Solaris helps a lot to the performance of pgsql, so dose anyone have 
information about that?


the filesystem you use will affect the performance of postgres
significantly. I've heard a lot of claims for ZFS, unfortunately many of
them from people who have proven that they didn't know what they were
talking about by the end of their first or second e-mails.


much of the hype for ZFS is its volume management capabilities and
admin tools. Linux has most (if not all) of the volume management
capabilities, it just separates them from the filesystems so that any
filesystem can use them, and as a result you use one tool to set up your
RAID, one to set up snapshots, and a third to format your filesystems,
where ZFS does this in one userspace tool.


once you separate the volume management piece out, the actual
performance question is a lot harder to answer. there are a lot of
people who say that it's far faster than the alternative filesystems on
Solaris, but I haven't seen any good comparisons between it and Linux
filesystems.


On Linux you have the choice of several filesystems, and the performance
will vary wildly depending on your workload. I personally tend to favor
ext2 (for small filesystems where the application is ensuring data
integrity) or XFS (for large filesystems)


I personally don't trust reiserfs, jfs seems to be a tool for
transitioning from AIX more than anything else, and ext3 seems to have
all the scaling issues of ext2 plus the overhead (and bottleneck) of
journaling.


one issue with journaling filesystems: if you journal the data as well
as the metadata you end up with a very reliable setup, however it means
that all your data needs to be written twice, once to the journal, and
once to the final location. the write to the journal can be slightly
faster than a normal write to the final location (the journal is a
sequential write to an existing file), however the need to write twice
can effectively cut your disk I/O bandwidth in half when doing heavy
writes. worse, when you end up writing more than will fit in the journal
(128M is the max for ext3) the entire system then needs to stall while
the journal gets cleared to make space for the additional writes.


if you don't journal your data then you avoid the problems above, but in 
a crash you may find that you lost data, even though the filesystem is 
'intact' according to fsck.


David Lang


Regards
Ian

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0

2007-05-08 Thread Pomarede Nicolas

On Tue, 8 May 2007, Heikki Linnakangas wrote:


Pomarede Nicolas wrote:
There's not too much simultaneous transaction on the database, most of the 
time it shouldn't exceed one minute (worst case). Except, as I need to run 
a vacuum analyze on the whole database every day, it now takes 8 hours to 
do the vacuum (I changed vacuum values to be a little slower instead of 
taking too much i/o and making the base unusable, because with default 
vacuum values it takes 3-4 hours of high i/o usage (total base is 20 GB) ).


So, at this time, the complete vacuum is running, and vacuuming only the 
spool table gives all dead rows are currently not removable (which is 
normal).


Oh, I see. I know you don't want to upgrade, but that was changed in 8.2. 
Vacuum now ignores concurrent vacuums in the oldest xid calculation, so the 
long-running vacuum won't stop the vacuum on the spool table from removing 
dead rows.


Well, this concurrent vacuum is very interesting, I didn't notice this in 
8.2, but it would really help here to vacuum frequently this spool table 
and have dead rows removed while the 'big' vacuum is running.

Seems, I will have to consider migrating to 8.2 then :)


Anyway, now my vacuum is over, I can vacuum the spool table and see the 
results :


before : 6422 pages for the data and 1700 pages for the indexes.

after vacuum analyze : 6422 data pages / 1700 index pages


here's the log for vacuum :

fbxtv=# vacuum analyze verbose mysql_spool ;
INFO:  vacuuming public.mysql_spool
INFO:  index pk_mysql_spool now contains 21 row versions in 1700 pages
DETAIL:  7759 index row versions were removed.
1696 index pages have been deleted, 1667 are currently reusable.
CPU 0.01s/0.00u sec elapsed 1.78 sec.
INFO:  mysql_spool: removed 7759 row versions in 1521 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 4.88 sec.
INFO:  mysql_spool: found 7759 removable, 21 nonremovable row versions 
in 6422 pages

DETAIL:  20 dead row versions cannot be removed yet.
There were 261028 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.01u sec elapsed 25.90 sec.
INFO:  vacuuming pg_toast.pg_toast_386146338
INFO:  index pg_toast_386146338_index now contains 0 row versions in 1 
pages

DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  pg_toast_386146338: found 0 removable, 0 nonremovable row 
versions in 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing public.mysql_spool
INFO:  mysql_spool: scanned 3000 of 6422 pages, containing 0 live rows 
and 14 dead rows; 0 rows in sample, 0 estimated total rows

VACUUM


So far, so good, nearly all rows are marked as dead and removable. But 
then, if I do 'select ctid,* from mysql_spool', I can see ctid values in 
the range 5934, 5935, 6062, ...


Isn't it possible for postgres to start using pages 0, 1, 2, ... after the
vacuum, which would mean that after a few minutes all the high page numbers
would now be completely free and could be truncated when the next vacuum is
run?


Actually, if I run another vacuum, some more dead rows are added to the 
list of removable rows, but I can never reach the point where data is 
stored in the low pages number (in my case a few pages would be enough) 
and all other pages get truncated at the end.
Well at least, the number of pages doesn't increase past 6422 in this 
case, but I'd like to reclaim space sometimes.
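
A small sketch of how to watch whether the table ever shrinks (relpages is
refreshed by vacuum/analyze):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'mysql_spool';
-- relpages only drops when vacuum manages to truncate empty pages
-- from the physical end of the table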


Is this one of the features planned for 8.3: reusing low page numbers in
priority after a vacuum, to help subsequent vacuums truncate the end of the
table once data is located at the beginning of the table?



Thanks to all for all your very interesting answers.

Nicolas



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Query performance problems with partitioned tables

2007-05-08 Thread Fei Liu

Scott Marlowe wrote:

On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote:
  

On 5/3/07, Fei Liu [EMAIL PROTECTED] wrote:


Hello, Andreas, I too am having exactly the same issue as you do.
Comparing my partitioned and plain table performance, I've found that
the plain tables perform about 25% faster than partitioned table. Using
'explain select ...', I see that constraints are being used so in
partitioned tables fewer rows are examined. But still partitioned tables
are 25% slower, what a let down.
  

That's a little bit harsh.  The main use of partitioning is not to
make the table faster but to make the maintenance easier.  When
constraint exclusion works well for a particular query you can get a
small boost but many queries will break down in a really negative way.
 So, you are sacrificing flexibility for easier maintenance.  You have
to really be careful how you use it.

The best case for partitioning is when you can logically divide up
your data so that you really only have to deal with one sliver of it
at a time...for joins and such.  If the OP could force the constraint
exclusion (maybe by hashing the timestamp down to a period and using
that for where clause), his query would be fine.  The problem is it's
not always easy to do that.



Agree++

I've been testing partitioning for a zip code lookup thing that was
posted here earlier, and I partitioned a 10,000,000 row set into about
400 partitions.  I found that selecting a range of areas defined by x/y
coordinates was faster without any indexes.  The same selection with one
big table and one big (x,y) index took 3 to 10 seconds typically, same
select against the partitions with no indexes took 0.2 to 0.5 seconds.

For that particular application, the only way to scale it was with
partitioning.
  
In my particular case, I have 2 million records uniformly split across 40
partitions. It's ranged data varying with time; each partition has one
month of data. Do you think this is a good candidate to seek a performance
boost with partitioned tables?


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Query performance problems with partitioned tables

2007-05-08 Thread Scott Marlowe
On Tue, 2007-05-08 at 13:41, Fei Liu wrote:
 Scott Marlowe wrote:
  On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote:

  On 5/3/07, Fei Liu [EMAIL PROTECTED] wrote:
  
  Hello, Andreas, I too am having exactly the same issue as you do.
  Comparing my partitioned and plain table performance, I've found that
  the plain tables perform about 25% faster than partitioned table. Using
  'explain select ...', I see that constraints are being used so in
  partitioned tables fewer rows are examined. But still partitioned tables
  are 25% slower, what a let down.

  That's a little bit harsh.  The main use of partitioning is not to
  make the table faster but to make the maintenance easier.  When
  constraint exclusion works well for a particular query you can get a
  small boost but many queries will break down in a really negative way.
   So, you are sacrificing flexibility for easier maintenance.  You have
  to really be careful how you use it.
 
  The best case for partitioning is when you can logically divide up
  your data so that you really only have to deal with one sliver of it
  at a time...for joins and such.  If the OP could force the constraint
  exclusion (maybe by hashing the timestamp down to a period and using
  that for where clause), his query would be fine.  The problem is it's
  not always easy to do that.
  
 
  Agree++
 
  I've been testing partitioning for a zip code lookup thing that was
  posted here earlier, and I partitioned a 10,000,000 row set into about
  400 partitions.  I found that selecting a range of areas defined by x/y
  coordinates was faster without any indexes.  The same selection with one
  big table and one big (x,y) index took 3 to 10 seconds typically, same
  select against the partitions with no indexes took 0.2 to 0.5 seconds.
 
  For that particular application, the only way to scale it was with
  partitioning.

 In my particular case, I have 2 million records uniformly split up in 40 
 partitions. It's ranged data varying with time, each partition has one 
 month of data. Do you think this is a good candidate to seek performance 
 boost with partitioned tables?

That really really really depends on your access patterns.  IF you
typically access them by certain date ranges, then partitioning is
almost always a win.  If you have enough requests that don't select a
range of dates, it might wind up being slow.

There are other advantages to partitioning though, such as ease of
maintenance, being able to do partial backups easily, archiving old
partitions, placing the more active partitions on faster storage.
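
To make the constraint exclusion point concrete, the inheritance-style
partitioning being discussed looks roughly like this (a minimal sketch with
hypothetical names; the constraint_exclusion setting exists from 8.1 on):

CREATE TABLE readings (
    ts  timestamp NOT NULL,
    val integer
);

-- one child table per month, each carrying a CHECK constraint that
-- describes exactly which rows it may hold
CREATE TABLE readings_2007_04 (
    CHECK (ts >= '2007-04-01' AND ts < '2007-05-01')
) INHERITS (readings);

CREATE TABLE readings_2007_05 (
    CHECK (ts >= '2007-05-01' AND ts < '2007-06-01')
) INHERITS (readings);

SET constraint_exclusion = on;

-- with a literal range in the WHERE clause, the planner can prove from
-- the CHECK constraints that only readings_2007_05 needs scanning:
SELECT count(*) FROM readings
WHERE ts >= '2007-05-01' AND ts < '2007-06-01';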

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] DISTINCT Question

2007-05-08 Thread Y Sidhu

Does using DISTINCT in a query force PG to abandon any index search it might
have embarked upon?

--
Yudhvir Singh Sidhu
408 375 3134 cell


Re: [PERFORM] DISTINCT Question

2007-05-08 Thread Steinar H. Gunderson
On Tue, May 08, 2007 at 12:52:35PM -0700, Y Sidhu wrote:
 Does using DISTINCT in a query force PG to abandon any index search it might
 have embarked upon?

No.

If you need help with a specific query, please post it, along with your table
definitions and EXPLAIN ANALYZE output.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] DISTINCT Question

2007-05-08 Thread Joshua D. Drake

Y Sidhu wrote:
Does using DISTINCT in a query force PG to abandon any index search it 
might have embarked upon?


Depends on the where clause.



--
Yudhvir Singh Sidhu
408 375 3134 cell



--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] DISTINCT Question

2007-05-08 Thread Scott Marlowe
On Tue, 2007-05-08 at 14:52, Y Sidhu wrote:
 Does using DISTINCT in a query force PG to abandon any index search it
 might have embarked upon?

 explain analyze select distinct request from businessrequestsummary
where lastflushtime between now() - interval '30 minutes' and now();

                                  QUERY PLAN
------------------------------------------------------------------------------
 Unique  (cost=3.04..3.04 rows=1 width=17) (actual time=110.565..162.630 rows=75 loops=1)
   ->  Sort  (cost=3.04..3.04 rows=1 width=17) (actual time=110.555..135.252 rows=6803 loops=1)
         Sort Key: request
         ->  Index Scan using businessrequestsummary_lastflushtime_dx on businessrequestsummary  (cost=0.01..3.03 rows=1 width=17) (actual time=0.063..59.674 rows=6803 loops=1)
               Index Cond: ((lastflushtime >= (now() - '00:30:00'::interval)) AND (lastflushtime <= now()))
 Total runtime: 163.925 ms
(6 rows)

I'd say no. 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Daniel Griscom
I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL 
via Apache/PHP. The 3D display is supposed to show smooth motion from 
location to location, with PostGIS giving dynamically updated 
information on the locations. Everything runs on the same machine, 
and it all works, but when I start a query the 3D display stutters 
horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't 
let go until it's completed the query.


I don't need the PostgreSQL query to return quickly, but I must 
retain smooth animation while the query is being processed. In other 
words, I need PostgreSQL to spread out its CPU usage so that it 
doesn't monopolize the CPU for any significant time (more than 50ms 
or so).


Possible solutions:

1: Set the PostgreSQL task priority lower than the 3D renderer task, 
and to make sure that the 3D renderer sleep()s enough to let 
PostgreSQL get its work done. The obvious objection to this obvious 
solution is Priority inversion!, but I see that as an additional 
challenge to be surmounted rather than an absolute prohibition. So, 
any thoughts on setting the PostgreSQL task priority (including by 
the much-maligned tool shown at 
http://weblog.bignerdranch.com/?p=11)?


2: Some variation of the Cost-Based Vacuum Delay. Hypothetically, 
this would have the PostgreSQL task sleep() periodically while 
processing the query, allowing the 3D renderer to continue working at 
a reduced frame rate. My understanding, however, is that this only 
works during VACUUM and ANALYZE commands, so it won't help during my 
SELECT commands. So, any thoughts on using Cost-Based Vacuum Delay as 
a Cost-Based Select Delay?


3: ... some other solution I haven't thought of.


Any thoughts, suggestions, ideas?


Thanks,
Dan

--
Daniel T. Griscom [EMAIL PROTECTED]
Suitable Systems  http://www.suitable.com/
1 Centre Street, Suite 204(781) 665-0053
Wakefield, MA  01880-2400

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Charles Sprickman

On Tue, 8 May 2007, [EMAIL PROTECTED] wrote:

one issue with journaling filesystems: if you journal the data as well as the
metadata you end up with a very reliable setup, however it means that all
your data needs to be written twice, once to the journal, and once to the
final location. the write to the journal can be slightly faster than a normal
write to the final location (the journal is a sequential write to an existing
file), however the need to write twice can effectively cut your disk I/O
bandwidth in half when doing heavy writes. worse, when you end up writing
more than will fit in the journal (128M is the max for ext3) the entire
system then needs to stall while the journal gets cleared to make space for
the additional writes.


if you don't journal your data then you avoid the problems above, but in a
crash you may find that you lost data, even though the filesystem is 'intact'
according to fsck.


That sounds like an ad for FreeBSD and UFS2+Softupdates. :)

Metadata is as safe as it is in a journaling filesystem, but none of the 
overhead of journaling.


Charles


David Lang


Steve Atkins wrote:


 On May 7, 2007, at 2:55 PM, David Levy wrote:

  Hi,
   I am about to order a new server for my Postgres cluster. I will
  probably get a Dual Xeon Quad Core instead of my current Dual Xeon.
  Which OS would you recommend to optimize Postgres behaviour (i/o
  access, multithreading, etc) ?
   I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone
  help with this ?

 Well, all three you mention are much the same, just with a different
 badge on the box, as far as performance is concerned. They're all
 going to be a moderately recent Linux kernel, with your choice
 of filesystems, so any choice between them is going to be driven
 more by available staff and support or personal preference.

 I'd probably go CentOS 5 over Fedora  just because Fedora doesn't
 get supported for very long - more of an issue with a dedicated
 database box with a long lifespan than your typical desktop or
 interchangeable webserver.

 I might also look at Solaris 10, though. I've yet to play with it much,
 but it
 seems nice, and I suspect it might manage 8 cores better than current
 Linux setups.

 Cheers,
   Steve






Regards

Ian






---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Steinar H. Gunderson
On Tue, May 08, 2007 at 04:27:10PM -0400, Daniel Griscom wrote:
 3: ... some other solution I haven't thought of.

On a wild guess, could you try setting the CPU costs higher, to make the
planner choose a less CPU-intensive plan?
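
For instance (a guess at plausible values; the defaults are shown in the
comments):

SET cpu_tuple_cost = 0.05;          -- default 0.01
SET cpu_index_tuple_cost = 0.025;   -- default 0.005
SET cpu_operator_cost = 0.0125;     -- default 0.0025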

Other (weird) suggestions would include calling a user-defined function that
sleep()ed for you between every row. Or use a dual-core system. :-)
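
The sleep-per-row hack could look something like this (purely a sketch:
pg_sleep() requires 8.2, plpgsql must be installed, and the function and
table names are made up):

CREATE FUNCTION throttle() RETURNS integer AS $$
BEGIN
    PERFORM pg_sleep(0.01);   -- yield the CPU for ~10 ms
    RETURN 0;
END;
$$ LANGUAGE plpgsql;

-- throttle() gets evaluated once per output row, spreading the
-- query's CPU usage out over time:
SELECT some_column, throttle() FROM big_table;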

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Bill Moran
In response to Daniel Griscom [EMAIL PROTECTED]:

 I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL 
 via Apache/PHP. The 3D display is supposed to show smooth motion from 
 location to location, with PostGIS giving dynamically updated 
 information on the locations. Everything runs on the same machine, 
 and it all works, but when I start a query the 3D display stutters 
 horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't 
 let go until it's completed the query.
 
 I don't need the PostgreSQL query to return quickly, but I must 
 retain smooth animation while the query is being processed. In other 
 words, I need PostgreSQL to spread out its CPU usage so that it 
 doesn't monopolize the CPU for any significant time (more than 50ms 
 or so).
 
 Possible solutions:
 
 1: Set the PostgreSQL task priority lower than the 3D renderer task, 
 and to make sure that the 3D renderer sleep()s enough to let 
 PostgreSQL get its work done. The obvious objection to this obvious 
 solution is Priority inversion!, but I see that as an additional 
 challenge to be surmounted rather than an absolute prohibition. So, 
 any thoughts on setting the PostgreSQL task priority (including by 
 the much-maligned tool shown at 
 http://weblog.bignerdranch.com/?p=11)?

If it's all PostgreSQL processes that you want take a backseat to your
rendering process, why not just nice the initial PostgreSQL daemon?  All
children will inherit the nice value, and there's no chance of priority
inversion because all the PostgreSQL backends are running at the same
priority.

Just a thought.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread david

On Tue, 8 May 2007, Daniel Griscom wrote:

I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL via 
Apache/PHP. The 3D display is supposed to show smooth motion from location to 
location, with PostGIS giving dynamically updated information on the 
locations. Everything runs on the same machine, and it all works, but when I 
start a query the 3D display stutters horribly. It looks like PostgreSQL 
grabs hold of the CPU and doesn't let go until it's completed the query.


I don't need the PostgreSQL query to return quickly, but I must retain smooth 
animation while the query is being processed. In other words, I need 
PostgreSQL to spread out its CPU usage so that it doesn't monopolize the CPU 
for any significant time (more than 50ms or so).


Possible solutions:

1: Set the PostgreSQL task priority lower than the 3D renderer task, and to 
make sure that the 3D renderer sleep()s enough to let PostgreSQL get its work 
done. The obvious objection to this obvious solution is Priority 
inversion!, but I see that as an additional challenge to be surmounted 
rather than an absolute prohibition. So, any thoughts on setting the 
PostgreSQL task priority (including by the much-maligned tool shown at 
http://weblog.bignerdranch.com/?p=11)?


this may or may not help


3: ... some other solution I haven't thought of.


take a look at the scheduler discussion that has been taking place on the
linux-kernel list. there are a number of things being discussed specifically
to address the type of problems that you are running into (a CPU hog causing
latencies for graphics processes).


it looks like nothing will go into the 2.6.22 kernel officially, but if
you are willing to test the bejeezus out of it before you depend on it, I
suspect that either the SD or CFS schedulers will clean things up for you.


David Lang

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Mark Lewis
1. If you go the route of using nice, you might want to run the 3D
front-end at a higher priority instead of running PG at a lower
priority.  That way apache, php and the other parts all run at the same
priority as PG and just the one task that you want to run smoothly is
elevated.

2. You may not even need separate priorities if you're running on Linux
with a recent kernel and you enable the sleep() calls that you would
need anyway for solution #1 to work.  This is because Linux kernels are
getting pretty good nowadays about rewarding tasks with a lot of sleeps,
although there are some further kernel changes still under development
that look even more promising.

-- Mark

On Tue, 2007-05-08 at 16:27 -0400, Daniel Griscom wrote:
 I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL 
 via Apache/PHP. The 3D display is supposed to show smooth motion from 
 location to location, with PostGIS giving dynamically updated 
 information on the locations. Everything runs on the same machine, 
 and it all works, but when I start a query the 3D display stutters 
 horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't 
 let go until it's completed the query.
 
 I don't need the PostgreSQL query to return quickly, but I must 
 retain smooth animation while the query is being processed. In other 
 words, I need PostgreSQL to spread out its CPU usage so that it 
 doesn't monopolize the CPU for any significant time (more than 50ms 
 or so).
 
 Possible solutions:
 
 1: Set the PostgreSQL task priority lower than the 3D renderer task, 
 and to make sure that the 3D renderer sleep()s enough to let 
 PostgreSQL get its work done. The obvious objection to this obvious 
 solution is Priority inversion!, but I see that as an additional 
 challenge to be surmounted rather than an absolute prohibition. So, 
 any thoughts on setting the PostgreSQL task priority (including by 
 the much-maligned tool shown at 
 http://weblog.bignerdranch.com/?p=11)?
 
 2: Some variation of the Cost-Based Vacuum Delay. Hypothetically, 
 this would have the PostgreSQL task sleep() periodically while 
 processing the query, allowing the 3D renderer to continue working at 
 a reduced frame rate. My understanding, however, is that this only 
 works during VACUUM and ANALYZE commands, so it won't help during my 
 SELECT commands. So, any thoughts on using Cost-Based Vacuum Delay as 
 a Cost-Based Select Delay?
 
 3: ... some other solution I haven't thought of.
 
 
 Any thoughts, suggestions, ideas?
 
 
 Thanks,
 Dan
 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?

2007-05-08 Thread Y Sidhu

I am trying to follow a message thread. One guy says we should be running
vacuum analyze daily and the other says we should be running vacuum multiple
times a day. I have tried looking for what a vacuum analyze is to help me
understand but no luck.

--
Yudhvir Singh Sidhu
408 375 3134 cell


Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?

2007-05-08 Thread david

On Tue, 8 May 2007, Y Sidhu wrote:


I am trying to follow a message thread. One guy says we should be running
vacuum analyze daily and the other says we should be running vacuum multiple
times a day. I have tried looking for what a vacuum analyze is to help me
understand but no luck.


vacuum frees tuples that are no longer referenced.
vacuum analyze does the same thing, but then does some additional
information gathering about what data is in the tables. Postgres uses this
data to adjust its estimates of how long various things will take
(sequential scan, etc). if these estimates are off by a huge amount
(especially if you have never done a vacuum analyze after loading your
table) then it's very likely that postgres will be slow because it's doing
expensive operations that it thinks are cheap.


David Lang

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?

2007-05-08 Thread Alvaro Herrera
Y Sidhu wrote:
 I am trying to follow a message thread. One guy says we should be running
 vacuum analyze daily and the other says we should be running vacuum multiple
 times a day. I have tried looking for what a vacuum analyze is to help me
 understand but no luck.

VACUUM ANALYZE is like VACUUM, except that it also runs an ANALYZE
afterwards.
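
That is (hypothetical table name):

VACUUM ANALYZE mytable;
-- behaves like:
VACUUM mytable;      -- reclaim dead tuples
ANALYZE mytable;     -- refresh the planner's statistics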

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?

2007-05-08 Thread Steinar H. Gunderson
On Tue, May 08, 2007 at 05:52:13PM -0400, Alvaro Herrera wrote:
 I am trying to follow a message thread. One guy says we should be running
 vacuum analyze daily and the other says we should be running vacuum multiple
 times a day. I have tried looking for what a vacuum analyze is to help me
 understand but no luck.
 VACUUM ANALYZE is like VACUUM, except that it also runs an ANALYZE
 afterwards.

Shoot me if I'm wrong here, but doesn't VACUUM ANALYZE check _all_ tuples,
as compared to the random selection employed by ANALYZE?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?

2007-05-08 Thread Alvaro Herrera
Steinar H. Gunderson wrote:
 On Tue, May 08, 2007 at 05:52:13PM -0400, Alvaro Herrera wrote:
  I am trying to follow a message thread. One guy says we should be running
  vacuum analyze daily and the other says we should be running vacuum 
  multiple
  times a day. I have tried looking for what a vacuum analyze is to help me
  understand but no luck.
  VACUUM ANALYZE is like VACUUM, except that it also runs an ANALYZE
  afterwards.
 
 Shoot me if I'm wrong here, but doesn't VACUUM ANALYZE check _all_ tuples,
 as compared to the random selection employed by ANALYZE?

You are wrong, but it won't be me the one to shoot you.

There have been noises towards making the ANALYZE portion use the same
scan that VACUUM already does, but nobody has written the code (it would
be useful for some kinds of stats).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Carlos Moreno

Steinar H. Gunderson wrote:

Or use a dual-core system. :-)


Am I missing something??  There is just *one* instance of this idea in, what,
four replies??  I find it so obvious, and so obviously the only solution that
has any hope to work, that it makes me think I'm missing something ...

Is it that multiple PostgreSQL processes will end up monopolizing as many
CPU cores as you give it?  (ok, that would suck, for sure  :-))

If there is a way to guarantee (or at least to encourage) that PG will not
use more than one, or even two cores, then a quad-core machine looks like a
promising solution...  One thing feels kind of certain to me:  the kind of
system that the OP describes has a most-demanding need for *extremely
high* CPU power --- multi-core, or multi-CPU, would seem the better
solution anyway, since it promotes responsiveness more than raw CPU
power.

Carlos
--


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Joshua D. Drake

Carlos Moreno wrote:

Steinar H. Gunderson wrote:

Or use a dual-core system. :-)


Am I missing something??  There is just *one* instance of this idea in, 
what,
four replies??  I find it so obvious, and so obviously the only solution 
that

has any hope to work, that it makes me think I'm missing something ...

Is it that multiple PostgreSQL processes will end up monopolizing as many
CPU cores as you give it?  (ok, that would suck, for sure  :-))



PostgreSQL is process based, so if you have one query that is eating a 
lot of cpu, it is only one cpu... you would have another for your render 
to run on.


Joshua D. Drake



If there is a way to guarantee (or at least to encourage) that PG will 
not use

more than one, or even two cores, then a quad-core machine looks like a
promising solution...  One thing feels kind of certain to me:  the kind of
system that the OP describes has a most-demanding need for *extremely
high* CPU power --- multi-core, or multi-CPU, would seem the better
solution anyway, since it promotes responsiveness more than raw CPU
power.

Carlos
--


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Steinar H. Gunderson
On Tue, May 08, 2007 at 06:32:14PM -0400, Carlos Moreno wrote:
 Or use a dual-core system. :-)
 Am I missing something??  There is just *one* instance of this idea in,
 what, four replies??  I find it so obvious, and so obviously the only
 solution that has any hope to work, that it makes me think I'm missing
 something ...

Actually, it should be added that this suggestion was only partially
tongue-in-cheek. I wrote a 3D application as part of an internship a couple
of years ago, and it had a problem that worked vaguely like the given
scenario: Adding a background task (in this case the task that loaded in new
pieces of terrain) would kill the framerate for the user, but nicing down
(actually, down-prioritizing, as this was on Windows) the back-end would
starve it completely of cycles. The solution was to just define that this
would only be run on multiprocessor systems, where both tasks would chug
along nicely :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Carlos Moreno

Joshua D. Drake wrote:
Am I missing something??  There is just *one* instance of this idea in,
what, four replies??  I find it so obvious, and so obviously the only
solution that has any hope to work, that it makes me think I'm missing
something ...

Is it that multiple PostgreSQL processes will end up monopolizing as many
CPU cores as you give it?  (ok, that would suck, for sure  :-))


PostgreSQL is process based, so if you have one query that is eating a 
lot of cpu, it is only one cpu... you would have another for your 
render to run on.


There is still the issue that there could be several (many?) queries running
concurrently --- but that's much easier to control at the application level;
so maybe simply using multi-CPU/multi-core hardware would be the simplest
solution?

Carlos
--


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Daniel Griscom
Thanks for all the feedback. Unfortunately I didn't specify that this 
is running on a WinXP machine (the 3D renderer is an ActiveX plugin), 
and I don't even think nice is available. I've tried using the 
Windows Task Manager to set every postgres.exe process to a low 
priority, but that didn't make a difference.


Several people have mentioned having multiple processors; my current 
machine is a uni-processor machine, but I believe we could spec the 
actual runtime machine to have multiple processors/cores. I'm only 
running one query at a time; would that query be guaranteed to 
confine itself to a single processor/core?


In terms of performance, I don't think simply more power will do the 
trick; I've got an AMD 3200+, and even doubling the power/halving the 
stutter time won't be good enough.


Someone suggested setting the CPU costs higher; where would I learn 
about that?
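(For reference, the settings in question are the planner's CPU cost
parameters; a minimal sketch follows, with made-up values and the real
defaults in comments. Note that raising them steers the planner toward
plans that do less CPU work, but does not throttle a query once it is
already running:

    SET cpu_tuple_cost       = 0.05;     -- default 0.01
    SET cpu_index_tuple_cost = 0.025;    -- default 0.005
    SET cpu_operator_cost    = 0.0125;   -- default 0.0025

The same lines can go in postgresql.conf to apply to every session.)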


Someone else mentioned having a custom function that sleep()ed on 
every row access; where would I learn more about that?
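(One sketch of that idea, with a made-up table name: pg_sleep() exists as
of 8.2, and a volatile function in the target list is evaluated once per
output row, so the query yields the CPU as it runs:

    SELECT t.*, pg_sleep(0.001)   -- ~1 ms pause per row, tune to taste
    FROM   some_big_table t;      -- hypothetical table

The total runtime gets longer, but the CPU is freed up in small slices,
which is the tradeoff a stuttering renderer needs.)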


I've also been reading up on VACUUM. I haven't explicitly run it in 
the several days since I've installed the database (by loading a 
humongous data.sql file); might this be part of the performance 
problem?
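(Worth checking. After a one-time bulk load, the most important step is
usually ANALYZE, so the planner has statistics to work with; one command
does both:

    VACUUM ANALYZE;   -- reclaim dead space and refresh planner statistics

Without statistics, the planner can pick needlessly CPU-hungry plans.)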



Thanks again,
Dan

--
Daniel T. Griscom [EMAIL PROTECTED]
Suitable Systems  http://www.suitable.com/
1 Centre Street, Suite 204(781) 665-0053
Wakefield, MA  01880-2400

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread david

On Tue, 8 May 2007, Carlos Moreno wrote:


Daniel Griscom wrote:


 Several people have mentioned having multiple processors; my current
 machine is a uni-processor machine, but I believe we could spec the actual
 runtime machine to have multiple processors/cores. 


My estimate is that yes, you should definitely consider that.


 I'm only running one query at a time; would that query be guaranteed to
 confine itself to a single processor/core?


From what Joshua mentions, it looks like you do have that guarantee.


isn't there a way to limit how many processes postgres will create?

if this is limited to 1, what happens when a vacuum run hits (or
autovacuum)?
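(The closest real knob is max_connections, which caps client backends but
not helper processes; a postgresql.conf sketch:

    max_connections = 1    # caps client backends only
    autovacuum = off       # the 8.2 default; with it off, VACUUM runs
                           # only when issued, inside the session backend

A manual VACUUM issued over the single connection runs in that same
backend process, so no extra process appears; enabling autovacuum adds a
separate worker process of its own.)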


David Lang


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM]

2007-05-08 Thread Orhan Aglagul
Hi Everybody,

I was trying to see how many inserts per second my application could
handle on various machines.

Those are the machines I used to run my app:

1)   Pentium M 1.7GHz

2)   Pentium 4 2.4GHz

3)   Dual Xeon 3GHz

 

Sure, I was expecting the dual Xeon to outperform the Pentium M and the
Pentium 4, but the data showed the opposite.

So I wrote a simple program in C using libpq.so.5 that opens a connection
to the database (DB on localhost), creates a prepared statement for the
insert, and does 10,000 inserts. The result did not change.
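(Roughly, the program described would look like this; a sketch only, with
made-up table and column names and error handling trimmed:

    /* bench.c: 10,000 single-transaction inserts via one prepared
     * statement.  Build with: cc bench.c -lpq */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=test host=localhost");
        PGresult   *res;
        const char *params[1];
        char        buf[32];
        int         i;

        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            return 1;
        }

        res = PQprepare(conn, "ins",
                        "INSERT INTO bench (val) VALUES ($1)", 1, NULL);
        PQclear(res);

        for (i = 0; i < 10000; i++) {
            snprintf(buf, sizeof(buf), "%d", i);
            params[0] = buf;
            /* no BEGIN/COMMIT: each insert commits by itself, so each
             * one waits for its own WAL fsync when fsync=on */
            res = PQexecPrepared(conn, "ins", 1, params, NULL, NULL, 0);
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }

With one transaction per insert and fsync=on, the commit rate is bounded
by how fast the WAL can be flushed to disk, which is what the table below
actually measures.)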

 

Only after setting fsync to off in the config file was the time to insert
10,000 records acceptable.

 

Here is the data:



Time for 10,000 inserts:

                 Fsync=on    Fsync=off
Pentium M 1.7    ~17 sec     ~6 sec
Pentium 4 2.4    ~13 sec     ~11 sec
Dual Xeon        ~65 sec     ~1.9 sec
 

I read that postgres does have issues with MP Xeon (costly context
switching), but I still think that with fsync=on, 65 seconds is
ridiculous.

 

Can anybody point me toward improved/acceptable performance with
fsync=on?

 

Thx,

 

Orhan a.



Re: [PERFORM]

2007-05-08 Thread Dan Harris

Orhan Aglagul wrote:

Hi Everybody,

I was trying to see how many inserts per second my application could
handle on various machines.



I read that postgres does have issues with MP Xeon (costly context 
switching). But I still think that with fsync=on 65 seconds is ridiculous.


CPU is unlikely to be your bottleneck.  You failed to mention anything
about your I/O setup; more details in this regard will net you better
responses.  However, an archive search for "insert performance" will
probably be worthwhile, since this type of question comes up about once
a month.





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM]

2007-05-08 Thread Carlos Moreno

Joshua D. Drake wrote:


CPU is unlikely to be your bottleneck.  You failed to mention anything
about your I/O setup.  [...]

He also fails to mention if he is doing the inserts one at a time or as
a batch.


Would this really be important?  I mean, would it affect a *comparison*??
As long as he does it the same way for all the hardware setups, it seems
ok to me.

Carlos
--


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM]

2007-05-08 Thread Scott Marlowe
On Tue, 2007-05-08 at 17:59, Orhan Aglagul wrote:
 Hi Everybody,
 
 I was trying to see how many inserts per second my application could
 handle on various machines.

 
 Here is the data:

 Time for 10,000 inserts:

                   Fsync=on    Fsync=off
  Pentium M 1.7    ~17 sec     ~6 sec
  Pentium 4 2.4    ~13 sec     ~11 sec
  Dual Xeon        ~65 sec     ~1.9 sec


In addition to my previous post, if you see that big a change between
fsync on and off, you likely have a drive subsystem that is actually
honoring fsync properly.

The other two machines are lying.  Or they have a battery-backed caching
RAID controller.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM]

2007-05-08 Thread Orhan Aglagul
Forgot to reply to the mailing list, sorry (new here).
Here are responses to the previous questions:

-Original Message-
From: Orhan Aglagul 
Sent: Tuesday, May 08, 2007 5:30 PM
To: 'Joshua D. Drake'
Subject: RE: [PERFORM]

I am using a prepared statement and inserting 10,000 records in a loop.
I need the data in real time, so I am not using batch inserts; I have to
run each insert as a separate transaction.
I am running the app on RHEL 4 (kernel 2.6.20).
In fact, my CPU usage is too low when running the app with fsync=off.

Here is the output of vmstat during the test:
First 10 lines:


r  b  swpd   free    buff   cache  si  so  bi    bo   in    cs  us  sy  id  wa
0  1     0  1634144  21828  234752  0   0  32   408  210   404   0   0  90   9
0  1     0  1634020  21828  234816  0   0   0  1404  538  1879   0   0  50  50
0  1     0  1633896  21828  234940  0   0   0  1400  525  1849   0   0  50  49
0  1     0  1633772  21828  235048  0   0   0  1412  537  1878   0   0  50  50
0  1     0  1633648  21832  235168  0   0   0  1420  531  1879   0   0  50  50
0  1     0  1633524  21840  235280  0   0   0  1420  535  1884   0   0  50  50
0  1     0  1633524  21844  235400  0   0   0  1396  535  1718   0   0  50  50
0  1     0  1633524  21848  235524  0   0   0  1536  561  1127   0   0  50  50
0  1     0  1633524  21852  235644  0   0   0  1412  557  1390   0   0  50  50
0  1     0  1633268  21860  235728  0   0   0  1408  582  1393   0   0  50  50
0  1     0  1633268  21868  235844  0   0   0  1424  548  1377   1   4  50  45
1  0     0  1633144  21876  235968  0   0   0  1404  548  1394  14   4  48  34
0  1     0  1633020  21884  236084  0   0   0  1420  540  1374   5   0  50  46
...

The logical volume is an ext3 file system. That's where all the database
files reside. (No hardware optimization done). 

Sorry for the delay,
Thanks..


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joshua D.
Drake
Sent: Tuesday, May 08, 2007 5:05 PM
To: Dan Harris
Cc: PostgreSQL Performance
Subject: Re: [PERFORM]

Dan Harris wrote:
 Orhan Aglagul wrote:
 Hi Everybody,

 I was trying to see how many inserts per seconds my application could

 handle on various machines.


 I read that postgres does have issues with MP Xeon (costly context 
 switching). But I still think that with fsync=on 65 seconds is 
 ridiculous.
 
 CPU is unlikely your bottleneck..  You failed to mention anything
about 
 your I/O setup.  More details in this regard will net you better 
 responses.  However, an archive search for insert performance will 
 probably be worthwhile, since this type of question is repeated about 
 once a month.

He also fails to mention if he is doing the inserts one at a time or as 
batch.


Joshua D. Drake

 
 
 
 
 ---(end of
broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 


-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
  http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



---(end of broadcast)---
TIP 6: explain analyze is your friend


FW: [PERFORM]

2007-05-08 Thread Orhan Aglagul


-Original Message-
From: Orhan Aglagul 
Sent: Tuesday, May 08, 2007 5:37 PM
To: 'Scott Marlowe'
Subject: RE: [PERFORM]

But 10,000 records in 65 sec comes to ~153 records per second, on a dual
3.06GHz Xeon.
What range is acceptable?

-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 08, 2007 5:31 PM
To: Orhan Aglagul
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM]

On Tue, 2007-05-08 at 17:59, Orhan Aglagul wrote:
 Hi Everybody,
 
 I was trying to see how many inserts per seconds my application could
 handle on various machines.

 
 Here is the data:

 Time for 10,000 inserts:

                   Fsync=on    Fsync=off
  Pentium M 1.7    ~17 sec     ~6 sec
  Pentium 4 2.4    ~13 sec     ~11 sec
  Dual Xeon        ~65 sec     ~1.9 sec

  

In addition to my previous post, if you see that big a change between
fsync on and off, you likely have a drive subsystem that is actually
honoring fsync properly.

The other two machines are lying.  Or they have a battery-backed caching
RAID controller.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


FW: [PERFORM]

2007-05-08 Thread Orhan Aglagul

No, it is one transaction per insert. 

-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 08, 2007 5:38 PM
To: Orhan Aglagul
Subject: RE: [PERFORM]

On Tue, 2007-05-08 at 19:36, Orhan Aglagul wrote:
 But 10,000 records in 65 sec comes to ~153 records per second on a dual
 3.06GHz Xeon.
 What range is acceptable?

If you're doing that in one big transaction, that's horrible.  Because
it shouldn't be waiting for each insert to fsync, but the whole
transaction.
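(For illustration, a sketch with a made-up table name: batching trades the
per-insert fsync for one fsync at commit:

    BEGIN;
    INSERT INTO bench (val) VALUES (1);
    INSERT INTO bench (val) VALUES (2);
    -- ... thousands more ...
    COMMIT;   -- the WAL is flushed once here, not once per insert

With one transaction per insert, each round trip waits on the disk
instead.)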

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: FW: [PERFORM]

2007-05-08 Thread david

On Tue, 8 May 2007, Orhan Aglagul wrote:


No, it is one transaction per insert.

-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 08, 2007 5:38 PM
To: Orhan Aglagul
Subject: RE: [PERFORM]

On Tue, 2007-05-08 at 19:36, Orhan Aglagul wrote:

But 10,000 records in 65 sec comes to ~153 records per second on a dual
3.06GHz Xeon. What range is acceptable?


If you're doing that in one big transaction, that's horrible.  Because
it shouldn't be waiting for each insert to fsync, but the whole
transaction.


with a standard 7200 rpm drive, ~150 transactions/sec sounds about right.

to really speed things up you want to get a disk controller with a
battery-backed cache so that the writes don't need to hit the disk to be
safe.

that should get your speeds up to (and possibly above) what you got by
turning fsync off.


David Lang

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Greg Smith

On Tue, 8 May 2007, Tom Lane wrote:


What Debian has done is set up an arrangement that lets you run two (or
more) different PG versions in parallel.  Since that's amazingly helpful
during a major-PG-version upgrade, most of the other packagers are
scheming how to do something similar.


I alluded to that but it is worth going into more detail on for those not 
familiar with this whole topic.  I normally maintain multiple different PG 
versions in parallel already, mostly using environment variables to switch 
between them with some shell code.  Debian has taken an approach where 
commands like pg_ctl are wrapped in multi-version/cluster aware scripts, 
so you can do things like restarting multiple installations more easily 
than that.


My issue wasn't with the idea, it was with the implementation.  When I 
have my newbie hat on, it adds a layer of complexity that isn't needed for 
simple installs.  And when I have my developer hat on, I found that the need 
to conform to the requirements of that system, on top of Debian's already 
unique install locations and packaging issues, just made it painful to 
build and work with customized versions of Postgres, compared to 
distributions that use a relatively simple packaging scheme (like the 
RPM-based RedHat or SuSE).


I hope anyone else working this problem is thinking about issues like 
this.  Debian's approach strikes me as being a good one for a seasoned 
systems administrator or DBA, which is typical for them.  I'd hate to see 
a change in this area make it more difficult for new users though, as 
that's already perceived as a PG weakness.  I think you can build a layer 
that adds the capability for the people who need it without complicating 
things for people who don't.


and if someday you want commercial support for your OS, a Centos-RHEL 
update will get you there easily.


For those that like to live dangerously, it's also worth mentioning that 
it's possible to hack this conversion in either direction without actually 
doing an OS re-install/upgrade just by playing with the packages that are 
different between the two.  So someone who installs CentOS now could swap 
to RHEL very quickly in a pinch if they have enough cojones to do the 
required package substitutions.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Greg Smith

On Tue, 8 May 2007, Luke Lonergan wrote:

From discussions with the developers, the biggest issue is a technical 
one: the Linux VFS layer makes the [ZFS] port difficult.


Difficult on two levels.  First you'd have to figure out how to make it 
work at all; then you'd have to reshape it into a form that it would be 
acceptable to the Linux kernel developers, who haven't seemed real keen on 
the idea so far.


The standard article on this topic, which I'm sure you've already seen 
this week, is Jeff Bonwick's at 
http://blogs.sun.com/bonwick/entry/rampant_layering_violation


What really bugged me was his earlier article linked to there where he 
talks about how ZFS eliminates the need for hardware RAID controllers:

http://blogs.sun.com/bonwick/entry/raid_z

While there may be merit to that idea for some applications, like 
situations where you have a pig of a RAID5 volume, that's just hype for 
database writes.  ZFS issues the SYNCHRONIZE CACHE command to the disks 
after pushing all data in a transaction group -- and that is exactly the 
part a hardware controller is needed to accelerate.  If you really care 
about whether your data hit disk, there is no way to break the RPM barrier 
without hardware support.  The fact that he misunderstands such a 
fundamental point makes me wonder what other gigantic mistakes might be 
buried in his analysis.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Luke Lonergan
You can use the workload management feature that we've contributed to
Bizgres.  That allows you to control the level of statement concurrency by
establishing queues and associating them with roles.

That would provide the control you are seeking.

- Luke


On 5/8/07 4:24 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 On Tue, 8 May 2007, Carlos Moreno wrote:
 
 Daniel Griscom wrote:
 
  Several people have mentioned having multiple processors; my current
  machine is a uni-processor machine, but I believe we could spec the actual
  runtime machine to have multiple processors/cores.
 
 My estimate is that yes, you should definitely consider that.
 
  I'm only running one query at a time; would that query be guaranteed to
  confine itself to a single processor/core?
 
 From what Joshua mentions, looks like you do have that guarantee.
 
 isn't there a way to limit how many processes postgres will create?
 
 if this is limited to 1, what happens when a vacuum run hits (or
 autovacuum)?
 
 David Lang
 
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend
 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM]

2007-05-08 Thread Greg Smith

On Tue, 8 May 2007, Orhan Aglagul wrote:


Time for 10,000 inserts
Pentium M 1.7
~17 sec fsync=on
~6 sec fsync=off


This is 588 inserts/second with fsync on.  It's impossible to achieve that 
without write caching at either the controller or the hard drive.  My bet 
would be that the hard drive in this system is a regular IDE/SATA drive 
that has write caching enabled, which is the normal case.  That means this 
system doesn't really do an fsync when you tell it to.
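(One way to check this on Linux with an IDE drive, sketched here with an
assumed device name, is to turn the on-drive write cache off and rerun
the test:

    hdparm -W0 /dev/hda    # disable the drive's write cache

If the fsync=on numbers then collapse to roughly the disk's rotation
rate, the drive was silently caching the flushes.)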



Pentium 4 2.4
~13 sec fsync=on
~11 sec fsync=off


Same response here.  Odds are good the fsync=on numbers here are a 
fantasy; unless you have some serious disk hardware in this server, it 
can't really be doing an fsync and giving this performance level.



Dual Xeon
~65 sec fsync=on
~1.9 sec fsync=off


Now this looks reasonable.  5263/second with fsync off, 154/second with it 
on.  This system appears to have hard drives in it that correctly write 
data out when asked to via the fsync mechanism.  I would bet this one is a 
server that has some number of 10,000 RPM SCSI drives in it.  Such a drive 
gives a theoretical maximum of 166.7 inserts/second if the inserts are 
done one at a time.
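(That ceiling is just the rotation rate: a commit that truly reaches the
platter can complete at most once per revolution, so

    10,000 RPM / 60 = 166.7 revs/sec  ->  at most ~166 single-row commits/sec
     7,200 RPM / 60 = 120   revs/sec  ->  at most ~120 commits/sec

which is why the ~154/second measured above looks like a real, uncached
fsync.)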


If this all is confusing to you, I have written a long primer on this 
subject that explains how the interaction between the PostgreSQL, fsync, 
and the underlying drives work.  If you have the patience to work your way 
through it and follow the references along the way, I think you'll find 
the results you've been seeing will make more sense, and you'll be in a 
better position to figure out what you should do next:


http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Magnus Hagander
 Thanks for all the feedback. Unfortunately I didn't specify that this 
 is running on a WinXP machine (the 3D renderer is an ActiveX plugin), 
 and I don't even think nice is available. I've tried using the 
 Windows Task Manager to set every postgres.exe process to a low 
 priority, but that didn't make a difference.

Are you sure you're actually CPU limited? The Windows scheduler is 
actually pretty good at downshifting like that. It sounds like you might 
be I/O bound instead, especially if this machine has IDE disks.

 Several people have mentioned having multiple processors; my current 
 machine is a uni-processor machine, but I believe we could spec the 
 actual runtime machine to have multiple processors/cores. I'm only 
 running one query at a time; would that query be guaranteed to 
 confine itself to a single processor/core?

Yes. Background processes, like the background writer, can run on the 
other core. They normally don't use a lot of CPU. You can avoid even that 
by setting the CPU affinity of pg_ctl or the postmaster.
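(A minimal Win32 sketch of pinning a process by hand; the PID would come
from Task Manager, error handling is trimmed, and the single-CPU mask is
just an example:

    /* pin.c: pin a process (e.g. postgres.exe) to CPU 0 */
    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        DWORD  pid;
        HANDLE h;

        if (argc < 2) { fprintf(stderr, "usage: pin <pid>\n"); return 1; }
        pid = (DWORD) atoi(argv[1]);

        h = OpenProcess(PROCESS_SET_INFORMATION, FALSE, pid);
        if (h == NULL) { fprintf(stderr, "OpenProcess failed\n"); return 1; }

        /* mask 0x1 = CPU 0 only; the renderer keeps the other core */
        if (!SetProcessAffinityMask(h, 0x1))
            fprintf(stderr, "SetProcessAffinityMask failed\n");

        CloseHandle(h);
        return 0;
    }

Task Manager's Set Affinity menu does the same thing interactively.)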


 In terms of performance, I don't think simply more power will do the 
 trick; I've got an AMD 3200+, and even doubling the power/halving the 
 stutter time won't be good enough.

Again, make sure cpu really is the problem.

/Magnus


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match