Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-16 Thread Ow Mun Heng

On Fri, 2007-11-09 at 16:41 +0100, Sebastian Hennebrueder wrote:

 If the queries are complex, this is understandable. I had a performance
 review of a Hibernate project (Java object-relational mapping) using
 MySQL. ORM easily produces complex queries with joins and subqueries.
 MySQL uses nested loops for subqueries, which leads to performance issues
 as the database grows.

Even for PostgreSQL, nested loops are still evil and hamper
performance.
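To make the original complaint concrete: the classic workaround for MySQL's dependent nested-loop subquery plans is rewriting an IN-subquery as a join. This is a generic sketch (the table and column names are made up, not from the thread):

```sql
-- IN-subquery form: older MySQL versions re-run the subquery as a
-- dependent nested loop, once per row of orders.
SELECT o.id
FROM orders o
WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'EU');

-- Equivalent join form, which the planner can execute as a hash or
-- merge join instead of a per-row loop.
SELECT o.id
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';
```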




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Curious about dead rows.

2007-11-16 Thread Brad Nicholson
On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
 Russell Smith [EMAIL PROTECTED] writes:
  Is it possible that ANALYZE is not getting the number of dead rows right?
 
 Hah, I think you are on to something.  ANALYZE is telling the truth
 about how many dead rows it saw, but its notion of dead is not good
 according to SnapshotNow.  Thus, rows inserted by a not-yet-committed
 transaction would be counted as dead.  So if these are background
 auto-analyzes being done in parallel with inserting transactions that
 run for awhile, seeing a few not-yet-committed rows would be
 unsurprising.
 
 I wonder if that is worth fixing?  I'm not especially concerned about
 the cosmetic aspect of it, but if we mistakenly launch an autovacuum
 on the strength of an inflated estimate of dead rows, that could be
 costly.

Sounds to me like that could result in autovacuum kicking off while
doing large data loads.  This sounds suspiciously like a problem someone
on -novice was having - tripping over a Windows autovac bug while doing
a data load:

http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php
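For reference, later releases (8.3+) expose the stats collector's dead-tuple estimate directly, which makes the inflation Tom describes easy to watch during a load. A sketch, not from the thread (the `n_dead_tup` column does not exist in 8.2):

```sql
-- Dead-row estimates as autovacuum sees them (8.3+ catalogs)
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```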

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.





[PERFORM] autovacuum: recommended?

2007-11-16 Thread Gábor Farkas

hi,

we are moving one database from postgresql-7.4 to postgresql-8.2.4.

we have some cronjobs set up that vacuum the database (some tables more 
often, some tables less often). now, in pg82, there is the possibility 
of using the autovacuum.


my question is: is it recommended to use it? or in other words, should i 
only use autovacuum? or it's better to use manual-vacuuming? which one 
is the way of the future :) ? or should i use both auto-vacuum and 
manual-vacuum?


in other words, i'd like to find out, if we should simply stay with the 
vacuuming-cronjobs, or should we move to using auto-vacuum? and if we 
should move, should we try to set it up the way that no manual-vacuuming 
is used anymore?


thanks,
gabor



Re: [PERFORM] autovacuum: recommended?

2007-11-16 Thread hubert depesz lubaczewski
On Fri, Nov 16, 2007 at 10:40:43AM +0100, Gábor Farkas wrote:
 we are moving one database from postgresql-7.4 to postgresql-8.2.4.

any particular reason why not 8.2.5?
 
 my question is: is it recommended to use it? or in other words, should i 
 only use autovacuum? or it's better to use manual-vacuuming? which one 
 is the way of the future :) ? or should i use both auto-vacuum and 
 manual-vacuum?

autovacuum is definitely preferred (in most cases).

you might want to set vacuum delays though.
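The cost-based vacuum delay depesz mentions is set in postgresql.conf (or per session). The values below are illustrative starting points only, not recommendations from the thread:

```ini
# postgresql.conf -- throttle (auto)vacuum I/O
vacuum_cost_delay = 20              # ms to sleep between cost-limited chunks
vacuum_cost_limit = 200             # work credits accumulated before sleeping
autovacuum_vacuum_cost_delay = 20   # same knobs for the autovacuum daemon
autovacuum_vacuum_cost_limit = -1   # -1 = fall back to vacuum_cost_limit
```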

depesz

-- 
quicksil1er: postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV! :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)



Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-16 Thread Dave Dutcher
 -Original Message-
 From: Ow Mun Heng
 Subject: Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
 
 Even for PostgreSQL, nested loops are still evil and hamper 
 performance.


I don't know about that.  There are times when it is the right plan:
 

explain analyze select * from table1 t1 inner join table2 t2 on t1.f_id =
t2.id where t1.id = 'xyzzy';

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.65 rows=1 width=344) (actual time=0.080..0.096 rows=1 loops=1)
   ->  Index Scan using table1_pkey on table1 t  (cost=0.00..9.18 rows=1 width=238) (actual time=0.044..0.048 rows=1 loops=1)
         Index Cond: ((id)::text = 'xyzzy'::text)
   ->  Index Scan using table2_pkey on table2 i  (cost=0.00..8.46 rows=1 width=106) (actual time=0.019..0.023 rows=1 loops=1)
         Index Cond: (t.f_id = i.id)
 Total runtime: 0.224 ms


set enable_nestloop=off;
SET


explain analyze select * from table1 t1 inner join table2 t2 on t1.f_id =
t2.id where t1.id = 'xyzzy';

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=9.18..72250.79 rows=1 width=344) (actual time=13493.572..15583.049 rows=1 loops=1)
   Hash Cond: (i.id = t.f_id)
   ->  Seq Scan on table2 i  (cost=0.00..61297.40 rows=2188840 width=106) (actual time=0.015..8278.347 rows=2188840 loops=1)
   ->  Hash  (cost=9.18..9.18 rows=1 width=238) (actual time=0.056..0.056 rows=1 loops=1)
         ->  Index Scan using table1_pkey on table1 t  (cost=0.00..9.18 rows=1 width=238) (actual time=0.040..0.045 rows=1 loops=1)
               Index Cond: ((id)::text = 'xyzzy'::text)
 Total runtime: 15583.212 ms

(I changed the table names, but everything else is real.)





[PERFORM] Clustered/covering indexes (or lack thereof :-)

2007-11-16 Thread adrobj

This is probably a FAQ, but I can't find a good answer...

So - are there common techniques to compensate for the lack of
clustered/covering indexes in PostgreSQL? To be more specific - here is my
table (simplified):

topic_id int
post_id int
post_text varchar(1024)

The most used query is: SELECT post_id, post_text FROM Posts WHERE
topic_id=XXX. Normally I would have created a clustered index on topic_id,
and the whole query would take ~1 disk seek.

What would be the common way to handle this in PostgreSQL, provided that I
can't afford 1 disk seek per record returned?

-- 
View this message in context: 
http://www.nabble.com/Clustered-covering-indexes-%28or-lack-thereof-%3A-%29-tf4789321.html#a13700848
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.




Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)

2007-11-16 Thread Heikki Linnakangas

adrobj wrote:

This is probably a FAQ, but I can't find a good answer...

So - are there common techniques to compensate for the lack of
clustered/covering indexes in PostgreSQL? To be more specific - here is my
table (simplified):

topic_id int
post_id int
post_text varchar(1024)

The most used query is: SELECT post_id, post_text FROM Posts WHERE
topic_id=XXX. Normally I would have created a clustered index on topic_id,
and the whole query would take ~1 disk seek.

What would be the common way to handle this in PostgreSQL, provided that I
can't afford 1 disk seek per record returned?


You can cluster the table, see 
http://www.postgresql.org/docs/8.2/interactive/sql-cluster.html.
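For the posts table from the question, that might look like the following. The index name is made up, and note that 8.2 uses the `CLUSTER index ON table` spelling (newer releases use `CLUSTER table USING index`):

```sql
-- Build an index on the clustering key (name assumed)
CREATE INDEX posts_topic_id_idx ON posts (topic_id);

-- Rewrite the table in index order; takes an exclusive lock (8.2 syntax)
CLUSTER posts_topic_id_idx ON posts;

-- CLUSTER is one-shot: re-run it periodically as new rows arrive,
-- since ordering degrades as inserts accumulate.
```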


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] autovacuum: recommended?

2007-11-16 Thread tv
 That being said, we have some huge tables in our database and quite a
 lot of traffic, and we got some performance problems when the
 autovacuum kicked in and started vacuuming those huge tables, so we're
 currently running without it.  Autovacuum can be tuned to not touch those
 tables, but we've chosen to leave it off.

We had some performance problems with the autovacuum on large and
frequently modified tables too - but after a little bit of playing with
the parameters the overall performance is much better than it was before
the autovacuuming.

The table was quite huge (say 20k products along with detailed
descriptions etc.) and was completely updated about 12x each day, i.e.
it grew to about 12x the original size (and 11/12 of the rows were dead).
This caused a serious slowdown of the application each day, as the
database had to scan 12x more data.

We set up autovacuuming with the default parameters, but it interfered
with the usual traffic - we had to play a little with the parameters
(increase the delays, decrease the duration or something like that) and
now it runs much better than before. No nightly vacuuming, no serious
performance degradation during the day, etc.

So yes - autovacuuming is recommended, but in some cases the default
parameters have to be tuned a little bit.

tomas




Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-16 Thread Josh Trutwin
On Fri, 16 Nov 2007 11:06:11 -0500
Jonah H. Harris [EMAIL PROTECTED] wrote:

 On Nov 16, 2007 10:56 AM, Dave Dutcher [EMAIL PROTECTED] wrote:
  I don't know about that.  There are times when it is the right
  plan:
 
 Agreed.  IMHO, there's nothing wrong with nested-loop join as long
 as it's being used properly.

Can you explain further please?  (I'm not disagreeing with you, just
want to know when nested loops are not used properly - does the
planner make mistakes that you have to watch out for?)

Thx,

Josh



Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)

2007-11-16 Thread Bill Moran
In response to Jeff Davis [EMAIL PROTECTED]:

 On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
  This is probably a FAQ, but I can't find a good answer...
  
  So - are there common techniques to compensate for the lack of
  clustered/covering indexes in PostgreSQL? To be more specific - here is my
  table (simplified):
  
  topic_id int
  post_id int
  post_text varchar(1024)
  
  The most used query is: SELECT post_id, post_text FROM Posts WHERE
  topic_id=XXX. Normally I would have created a clustered index on topic_id,
  and the whole query would take ~1 disk seek.
  
  What would be the common way to handle this in PostgreSQL, provided that I
  can't afford 1 disk seek per record returned?
  
 
 Periodically CLUSTER the table on the topic_id index. The table will not
 be perfectly clustered at all times, but it will be close enough that it
 won't make much difference.
 
 There's still the hit of performing a CLUSTER, however.
 
 Another option, if you have a relatively small number of topic_ids, is
 to break it into separate tables, one for each topic_id.

Or materialize the data, if performance is the utmost requirement.

Create second table:
CREATE TABLE materialized_topics (
  topic_id   int,
  post_ids   int[],
  post_texts text[]
);

Now add a trigger to your original table that updates materialized_topics
any time the first table is altered.  Thus you always have fast lookups.

Of course, this may be non-optimal if that table sees a lot of updates.
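A minimal sketch of such a trigger for the insert-only case (function and trigger names are hypothetical; delete/update handling is omitted):

```sql
CREATE OR REPLACE FUNCTION refresh_materialized_topic() RETURNS trigger AS $$
BEGIN
  -- Append the new post to the topic's arrays, if the topic row exists
  UPDATE materialized_topics
     SET post_ids   = array_append(post_ids, NEW.post_id),
         post_texts = array_append(post_texts, NEW.post_text)
   WHERE topic_id = NEW.topic_id;

  -- Otherwise start a new materialized row for this topic
  IF NOT FOUND THEN
    INSERT INTO materialized_topics (topic_id, post_ids, post_texts)
    VALUES (NEW.topic_id, ARRAY[NEW.post_id], ARRAY[NEW.post_text]);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_materialize
AFTER INSERT ON posts
FOR EACH ROW EXECUTE PROCEDURE refresh_materialized_topic();
```

The lookup then becomes a single-row fetch from materialized_topics, at the cost of rewriting one (potentially large) row per insert.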

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-16 Thread Jonah H. Harris
On Nov 16, 2007 10:56 AM, Dave Dutcher [EMAIL PROTECTED] wrote:
 I don't know about that.  There are times when it is the right plan:

Agreed.  IMHO, there's nothing wrong with nested-loop join as long as
it's being used properly.



-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/



Re: [PERFORM] Curious about dead rows.

2007-11-16 Thread Jean-David Beyer
Craig James wrote:
 Alvaro Herrera wrote:
 To recap:

 - your app only does inserts
 - there has been no rollback lately
 - there are no updates
 - there are no deletes

 The only other source of dead rows I can think is triggers ... do you
 have any?  (Not necessarily on this table -- perhaps triggers on other
 tables can cause updates on this one).

 Oh, rolled back COPY can cause dead rows too.
 
 
 What about an unreliable network that causes a lot of disconnects? 
 Wouldn't the server process do a rollback?
 
Perhaps in theory, but in practice my client and the PostgreSQL servers are
on the same machine, and 127.0.0.1 is pretty reliable:

lo        Link encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:30097919 errors:0 dropped:0 overruns:0 frame:0
  TX packets:30097919 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:931924602 (888.7 MiB)  TX bytes:931924602 (888.7 MiB)



-- 
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jerseyhttp://counter.li.org
 ^^-^^ 22:10:01 up 22 days, 15:28, 0 users, load average: 4.25, 4.21, 4.12



Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-16 Thread Ivan Voras
Dimitri wrote:

 Reading this article I'm just happy for them to see progress done on FreeBSD 
 :-)
 Also, to demonstrate OS parallelism it's not so impressive to see
 4-CPU server results rather than 8-CPU or a 32-threaded Niagara... I don't
 know why they did not present similar performance graphs for those
 platforms - strange, no?...

Well, most of the results in the document
(http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf) are for
8-CPU machines, which is about the most you can get with off the shelf
hardware (2x4-core CPU, the document has both Xeon and Opteron results).
Niagara support is unfinished, so there's nothing to report there. On
the other hand, the document does compare between several versions of
Linux, FreeBSD, NetBSD and DragonflyBSD, with both MySQL and PostgreSQL,
so you can draw your conclusions (if any) from there.





Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)

2007-11-16 Thread Jeff Davis
On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
 This is probably a FAQ, but I can't find a good answer...
 
 So - are there common techniques to compensate for the lack of
 clustered/covering indexes in PostgreSQL? To be more specific - here is my
 table (simplified):
 
 topic_id int
 post_id int
 post_text varchar(1024)
 
 The most used query is: SELECT post_id, post_text FROM Posts WHERE
 topic_id=XXX. Normally I would have created a clustered index on topic_id,
 and the whole query would take ~1 disk seek.
 
 What would be the common way to handle this in PostgreSQL, provided that I
 can't afford 1 disk seek per record returned?
 

Periodically CLUSTER the table on the topic_id index. The table will not
be perfectly clustered at all times, but it will be close enough that it
won't make much difference.

There's still the hit of performing a CLUSTER, however.

Another option, if you have a relatively small number of topic_ids, is
to break it into separate tables, one for each topic_id.

Regards,
Jeff Davis




Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-16 Thread Jonah H. Harris
On Nov 16, 2007 3:36 PM, Josh Trutwin [EMAIL PROTECTED] wrote:
  Agreed.  IMHO, there's nothing wrong with nested-loop join as long
  as it's being used properly.

 Can you explain further please?  (I'm not disagreeing with you, just
 want to know when nested loops are not used properly - does the
 planner make mistakes that you have to watch out for?)

As long as statistics are updated properly, it's generally not an
issue.  You just don't want the system using a nested-loop join
incorrectly (like when table sizes are equal, the outer table is
larger than the inner table, or the inner table itself is overly
large).
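Keeping those statistics current is mostly a matter of letting autoanalyze run or doing it by hand; for a skewed join column, raising the per-column sample size can also steer the planner away from a bad nested loop. A generic sketch (table and column names assumed; the default statistics target in 8.2 is 10):

```sql
-- Refresh planner statistics for one table
ANALYZE orders;

-- Sample a skewed join column more finely, then re-analyze it
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders (customer_id);
```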

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/



Re: [PERFORM] Curious about dead rows.

2007-11-16 Thread Merlin Moncure
On Nov 16, 2007 10:56 AM, Brad Nicholson [EMAIL PROTECTED] wrote:
 On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
  Russell Smith [EMAIL PROTECTED] writes:
   Is it possible that ANALYZE is not getting the number of dead rows right?
 
  Hah, I think you are on to something.  ANALYZE is telling the truth
  about how many dead rows it saw, but its notion of dead is not good
  according to SnapshotNow.  Thus, rows inserted by a not-yet-committed
  transaction would be counted as dead.  So if these are background
  auto-analyzes being done in parallel with inserting transactions that
  run for awhile, seeing a few not-yet-committed rows would be
  unsurprising.
 
  I wonder if that is worth fixing?  I'm not especially concerned about
  the cosmetic aspect of it, but if we mistakenly launch an autovacuum
  on the strength of an inflated estimate of dead rows, that could be
  costly.

 Sounds to me like that could result in autovacuum kicking off while
 doing large data loads.  This sounds suspiciously like a problem someone
 on -novice was having - tripping over a Windows autovac bug while doing
 a data load:

 http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php

I am almost 100% sure I've seen this behavior in the field...

merlin



Re: [PERFORM] Curious about dead rows.

2007-11-16 Thread Kevin Grittner
 On Fri, Nov 16, 2007 at  4:01 PM, in message
[EMAIL PROTECTED], Merlin Moncure
[EMAIL PROTECTED] wrote: 
 On Nov 16, 2007 10:56 AM, Brad Nicholson [EMAIL PROTECTED] wrote:
 On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
  Russell Smith [EMAIL PROTECTED] writes:
    Is it possible that ANALYZE is not getting the number of dead rows right?
 
  Hah, I think you are on to something.  ANALYZE is telling the truth
  about how many dead rows it saw, but its notion of dead is not good
  according to SnapshotNow.  Thus, rows inserted by a not-yet-committed
  transaction would be counted as dead.  So if these are background
  auto-analyzes being done in parallel with inserting transactions that
  run for awhile, seeing a few not-yet-committed rows would be
  unsurprising.
 
  I wonder if that is worth fixing?  I'm not especially concerned about
  the cosmetic aspect of it, but if we mistakenly launch an autovacuum
  on the strength of an inflated estimate of dead rows, that could be
  costly.

 Sounds to me like that could result in autovacuum kicking off while
 doing large data loads.  This sounds suspiciously like problem someone
 on -novice was having - tripping over a windows autovac bug while doing
 a data load

 http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php
 
 I am almost 100% I've seen this behavior in the field...
 
I know I've seen bulk loads go significantly faster with autovacuum
turned off.  It always seemed like a bigger difference than what the
ANALYZE would cause.  I bet this explains it.
 
-Kevin
 




Re: [PERFORM] autovacuum: recommended?

2007-11-16 Thread Tobias Brox
[Gábor Farkas - Fri at 10:40:43AM +0100]
 my question is: is it recommended to use it? or in other words, should i 
 only use autovacuum? or it's better to use manual-vacuuming? which one 
 is the way of the future :) ? or should i use both auto-vacuum and 
 manual-vacuum?

Nightly vacuums are great if the activity on the database is very low
at night.  A combination is also good; the autovacuum will benefit
from the nightly vacuum.  My gut feeling says it's a good idea to leave
autovacuum on, regardless of whether the nightly vacuums have been
turned on or not.

That being said, we have some huge tables in our database and quite a
lot of traffic, and we got some performance problems when the
autovacuum kicked in and started vacuuming those huge tables, so we're
currently running without it.  Autovacuum can be tuned to not touch those
tables, but we've chosen to leave it off.




Re: [PERFORM] autovacuum: recommended?

2007-11-16 Thread Csaba Nagy
On Fri, 2007-11-16 at 12:13 +0100, Tobias Brox wrote:
 [snip] should i use both auto-vacuum and 
  manual-vacuum?

I would say for 8.2 that's the best strategy (which might change with
8.3 and its multiple vacuum workers).

 That being said, we have some huge tables in our database and pretty
 much traffic, and got quite some performance problems when the
 autovacuum kicked in and started vacuuming those huge tables, so we're
 currently running without.  Autovacuum can be tuned to not touch those
 tables, but we've chosen to leave it off.

We are doing that here, i.e. set up autovacuum not to touch big tables,
and cover those with nightly vacuums if there is still some activity on
them, and one weekly complete vacuum of the whole DB (vacuum without
other params, preferably as the postgres user to cover system tables
too).

In fact we also have a few very frequently updated small tables; those
are covered by very frequent crontab vacuums, because in 8.2
autovacuum can spend quite some time vacuuming medium-sized tables,
and in that interval the small but frequently updated ones get bloated.
This should be better with 8.3 and multiple autovacuum workers.

To disable autovacuum for particular tables, search for pg_autovacuum in the
docs.
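In 8.2 that means inserting a row into the pg_autovacuum system catalog with enabled = false; -1 in the other columns means "use the default". A sketch (the table name is a placeholder; check the column list against the 8.2 docs):

```sql
-- Tell autovacuum to skip one big table (8.2's pg_autovacuum catalog)
INSERT INTO pg_autovacuum
  (vacrelid, enabled,
   vac_base_thresh, vac_scale_factor, anl_base_thresh, anl_scale_factor,
   vac_cost_delay, vac_cost_limit, freeze_min_age, freeze_max_age)
VALUES
  ('big_table'::regclass, false, -1, -1, -1, -1, -1, -1, -1, -1);
```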

I would say that autovacuum + disabling autovacuum on big tables + nightly
vacuums + a weekly vacuumdb + frequent crontab vacuums of heavily updated
small tables works well in 8.2. One thing which could be needed is to also
schedule continuous vacuuming of big tables which are frequently updated,
with big delay settings to throttle the resources used by the vacuum. We
don't need that here because we don't update our big tables frequently...
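That scheme maps onto crontab roughly like this (database, table names, and times are placeholders; `vacuumdb -t` and `--all` are standard options):

```ini
# every 5 minutes: small, hot tables that bloat quickly
*/5 * * * *  vacuumdb --quiet -t hot_small_table mydb
# nightly: the big tables autovacuum is told to skip
15 2 * * *   vacuumdb --quiet -t big_table1 -t big_table2 mydb
# weekly, run as the postgres user: every DB, system catalogs included
0 4 * * 0    vacuumdb --all --quiet
```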

Cheers,
Csaba.


