Re: [PERFORM] Serious issues with CPU usage

2003-09-08 Thread Shridhar Daithankar
On 8 Sep 2003 at 13:50, Andri Saar wrote:
 If this is the best you can get with postgres right now, then I'll just have 
 to increase the frequency of VACUUMing, but that feels like a hackish 
 solution :(

Use a autovacuum daemon. There is one in postgresql contrib module. It was 
introduced during 7.4 development and it works with 7.3.x. as well.

Current 7.4CVS head has some problems with stats collector but soon it should 
be fine.

Check it out..

Bye
 Shridhar

--
Punishment becomes ineffective after a certain point.  Men become insensitive. 
 
-- Eneg, Patterns of Force, stardate 2534.7


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Serious issues with CPU usage

2003-09-08 Thread Manfred Koizar
On Mon, 8 Sep 2003 13:50:23 +0300, Andri Saar [EMAIL PROTECTED]
wrote:
Basically I do this:
1) select about ~700 ID's I have to poll
2) poll them
3) update those 700 rows in that table I used (~2700 rows total).

And I do this cycle once per minute, so yes, I've got a zillion updates. 700 
of 2700 is roughly 25%, so I'd have to vacuum once per minute?

With such a small table VACUUM should be a matter of less than one
second:

fred=# vacuum verbose t;
INFO:  --Relation public.t--
INFO:  Index t_pkey: Pages 65; Tuples 16384: Deleted 4096.
CPU 0.01s/0.10u sec elapsed 0.21 sec.
INFO:  Removed 4096 tuples in 154 pages.
CPU 0.04s/0.02u sec elapsed 0.07 sec.
INFO:  Pages 192: Changed 192, Empty 0; Tup 16384: Vac 4096, Keep 0,
UnUsed 0.
Total CPU 0.08s/0.16u sec elapsed 0.36 sec.
VACUUM
Time: 415.00 ms

And this is on a 400 MHz machine under cygwin, so don't worry if you
have a real computer.

Servus
 Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Serious issues with CPU usage

2003-09-08 Thread Tom Lane
Andri Saar [EMAIL PROTECTED] writes:
 If this is the best you can get with postgres right now, then I'll just have 
 to increase the frequency of VACUUMing, but that feels like a hackish 
 solution :(

Not at all.  The overhead represented by VACUUM would have to be paid
somewhere, somehow, in any database.  Postgres allows you to control
exactly when it gets paid.

It looks to me like throwing a plain VACUUM into your poller cycle
(or possibly VACUUM ANALYZE depending on how fast the table's stats
change) would solve your problems nicely.

Note that once you have that policy in place, you will want to do one
VACUUM FULL, and possibly a REINDEX, to get the table's physical size
back down to something commensurate with 2700 useful rows.  I shudder
to think of where it had gotten to before.  Routine VACUUMing should
hold it to a reasonable size after that.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Serious issues with CPU usage

2003-09-08 Thread Andri Saar
On Monday 08 September 2003 17:04, Tom Lane wrote:

 It looks to me like throwing a plain VACUUM into your poller cycle
 (or possibly VACUUM ANALYZE depending on how fast the table's stats
 change) would solve your problems nicely.


I compled the pg_autovacuum daemon from 7.4beta sources as Shridhar Daithankar 
recommended, and it seems to work fine. At first glance I thought VACUUM is a 
thing you do maybe once per week during routine administration tasks like 
making a full backup, but I was wrong.

Thanks to all for your help, we can consider this problem solved.

Note to future generations: default postgres configuration settings are very 
conservative and don't be afraid to VACUUM very often.


andri


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread scott.marlowe
On Sun, 7 Sep 2003, Pailloncy Jean-Gérard wrote:

Asking a question about why max(id) is so much slower than select id order 
by id desc limit 1, Pailloncy said:

 I ask for the same thing.
 That's better !

This is a Frequently asked question about something that isn't likely to 
change any time soon.

Basically, Postgresql uses an MVCC locking system that makes massively 
parallel operation possible, but costs in certain areas, and one of those 
areas is aggregate performance over large sets.  MVCC makes it very hard 
to optimize all but the simplest of aggregates, and even those 
optimzations which are possible would wind up being quite ugly at the 
parser level.

You might want to search the archives in the last couple years for this 
subject, as it's come up quite often.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread Neil Conway
On Mon, 2003-09-08 at 11:56, scott.marlowe wrote:
 Basically, Postgresql uses an MVCC locking system that makes massively 
 parallel operation possible, but costs in certain areas, and one of those 
 areas is aggregate performance over large sets.  MVCC makes it very hard 
 to optimize all but the simplest of aggregates, and even those 
 optimzations which are possible would wind up being quite ugly at the 
 parser level.

As was pointed out in a thread a couple days ago, MIN/MAX() optimization
has absolutely nothing to do with MVCC. It does, however, make
optimizing COUNT() more difficult.

-Neil



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread Christopher Browne
[EMAIL PROTECTED] (scott.marlowe) writes:
 On Sun, 7 Sep 2003, Pailloncy Jean-Gérard wrote:

 Asking a question about why max(id) is so much slower than select id order 
 by id desc limit 1, Pailloncy said:

 I ask for the same thing.  That's better !

 This is a Frequently asked question about something that isn't
 likely to change any time soon.

 Basically, Postgresql uses an MVCC locking system that makes
 massively parallel operation possible, but costs in certain areas,
 and one of those areas is aggregate performance over large sets.
 MVCC makes it very hard to optimize all but the simplest of
 aggregates, and even those optimzations which are possible would
 wind up being quite ugly at the parser level.

MVCC makes it difficult to optimize aggregates resembling COUNT(*) or
SUM(*), at least vis-a-vis having this available for a whole table
(e.g. - you have to be doing 'SELECT COUNT(*), SUM(SOMEFIELD) FROM
THIS_TABLE' with NO WHERE clause).

But there is nothing about MVCC that makes it particularly difficult
to handle the transformation:

 select max(field) from some_table where another_field 
still_another_field;

   (which isn't particularly efficient) into

 select field from some_table where another_field 
still_another_field order by field desc limit 1;

The problems observed are thus:

 1.  If the query asks for other data, it might be necessary to scan
 the table to get the other data, making the optimization
 irrelevant;

 2.  If there's a good index to key on, the transformed version might
 be a bunch quicker, but it is nontrivial to determine that, a
 priori;

 3.  It would be a fairly hairy optimization to throw into the query
 optimizer, so people are reluctant to try to do so.

Note that MVCC has _nothing_ to do with any of those three problems.

The MVCC-related point is that there is reluctance to create some
special case that will be troublesome to maintain instead of having
some comprehensive handling of _all_ aggregates.  It seems a better
idea to fix them all rather than to kludge things up by fixing one
after another.
-- 
let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;;
http://cbbrowne.com/info/lisp.html
Signs of a Klingon Programmer -  10. A TRUE  Klingon Warrior does not
comment his code!

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread scott.marlowe
On Mon, 8 Sep 2003, Neil Conway wrote:

 On Mon, 2003-09-08 at 11:56, scott.marlowe wrote:
  Basically, Postgresql uses an MVCC locking system that makes massively 
  parallel operation possible, but costs in certain areas, and one of those 
  areas is aggregate performance over large sets.  MVCC makes it very hard 
  to optimize all but the simplest of aggregates, and even those 
  optimzations which are possible would wind up being quite ugly at the 
  parser level.
 
 As was pointed out in a thread a couple days ago, MIN/MAX() optimization
 has absolutely nothing to do with MVCC. It does, however, make
 optimizing COUNT() more difficult.

Not exactly.  While max(id) is easily optimized by query replacement, 
more complex aggregates will still have perfomance issues that would not 
be present in a row locking database.  i.e. max((field1/field2)*field3) is 
still going to cost more to process, isn't it?



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread Tom Lane
scott.marlowe [EMAIL PROTECTED] writes:
 On Mon, 8 Sep 2003, Neil Conway wrote:
 As was pointed out in a thread a couple days ago, MIN/MAX() optimization
 has absolutely nothing to do with MVCC. It does, however, make
 optimizing COUNT() more difficult.

 Not exactly.  While max(id) is easily optimized by query replacement, 
 more complex aggregates will still have perfomance issues that would not 
 be present in a row locking database.  i.e. max((field1/field2)*field3) is 
 still going to cost more to process, isn't it?

Er, what makes you think that would be cheap in any database?

Postgres would actually have an advantage given its support for
expressional indexes (nee functional indexes).  If we had an optimizer
transform to convert MAX() into an index scan, I would expect it to be
able to match up max((field1/field2)*field3) with an index on
((field1/field2)*field3).

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread Greg Stark

scott.marlowe [EMAIL PROTECTED] writes:

 Basically, Postgresql uses an MVCC locking system that makes massively 

As discussed, uh, a few days ago, this particular problem is not caused by
MVCC but by postgres having a general purpose aggregate system and not having
special code for handling min/max. Aggregates normally require access to every
record they're operating on, not just the first or last in some particular
order. You'll note the LIMIT 1/DISTINCT ON work-around works fine with MVCC...

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread Matt Clark
 Actually, referring down to later parts of this thread, why can't this
 optimisation be performed internally for built-in types?  I understand the
 issue with aggregates over user-defined types, but surely optimising max()
 for int4, text, etc is safe and easy?

Sorry, missed the bit about user-defined functions.  So I should have said
built-in functions operating over built-in types.  Which does sound more
complicated, but anyone redefining max() is surely not in a position to seek
sympathy if they lose performance?


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM]

2003-09-08 Thread Odiel León



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] slow plan for min/max

2003-09-08 Thread Matt Clark
 Matt Clark [EMAIL PROTECTED] writes:
  Actually, referring down to later parts of this thread, why can't this
  optimisation be performed internally for built-in types?  I
 understand the
  issue with aggregates over user-defined types, but surely
 optimising max()
  for int4, text, etc is safe and easy?

 I can't see that the datatype involved has anything to do with it.
 None of the issues that come up in making the planner do this are
 datatype-specific.  You could possibly avoid adding some columns
 to pg_aggregate if you instead hard-wired the equivalent knowledge
 (for builtin types only) into some code somewhere, but a patch that
 approached it that way would be rejected as unmaintainable.

I don't pretend to have any useful knowledge of the internals of this, so
much of what I write may seem like noise to you guys.  The naive question is
'I have an index on X, so finding max(X) should be trivial, so why can't the
planner exploit that triviality?'.  AFAICS the short sophisticated answer is
that it just isn't trivial in the general case.

Upon rereading the docs on aggregates I see that it really isn't trivial at
all.  Not even knowing things like 'this index uses the same function as
this aggregate' gets you very far, because of the very general nature of the
implementation of aggs.

So it should be flagged very prominently in the docs that max() and min()
are almost always not what 90% of people want to use 90% of the time,
because indexes do the same job much better for anything other than tiny
tables.

Know what we (OK, I) need?  An explicitly non-aggregate max() and min(),
implemented differently, so they can be optimised.  let's call them
idx_max() and idx_min(), which completely bypass the standard aggregate
code.  Because let's face it, in most cases where you regularly want a max
or a min you have an index defined, and you want the DB to use it.

And I would volunteer to do it, I would, but you really don't want my C in
your project ;-)  I do volunteer to do some doc tweaking though - who do I
talk to?

M


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Quick question

2003-09-08 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Back in the 7.0 days, 
 WHERE EXISTS (SELECT * FROM a WHERE condition)
 was significantly slower on broad tables than
 WHERE EXISTS (SELECT small_col FROM a WHERE condition)
 Is this still true, or something that's been fixed in the last 3 versions?  

It's still true that all the sub-select's output columns will be
evaluated.  Given that this happens for at most one row, I'm not sure
how significant the hit really is.  But it's annoying, seeing that the
outer EXISTS doesn't care what the column values are.

 Joe Celko is making fun of me because Oracle doesn't have this performance 
 issue.

Perhaps Joe can tell us exactly which part of SQL92 says it's okay not
to evaluate side-effect-producing functions in the targetlist of an
EXISTS subselect.

I would like to make the system change the targetlist to just SELECT 1
in an EXISTS subquery.  But I'm slightly concerned about changing the
semantics of existing queries.  If someone can produce proof that this
is allowed (or even better, required) by the SQL spec, it'd be easier...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Explain Doc

2003-09-08 Thread Alberto Caso
On Mon, 08-sep-2003 at 16:29, Rhaoni Chiu Pereira wrote:

 Could anyone tell me a documentation that explains the  explain  result
 and how to analyze it ?
 

http://archives.postgresql.org/pgsql-performance/2003-09/msg0.php

Regards,

-- 
Alberto Caso Palomino
Adaptia Soluciones Integrales
http://www.adaptia.net
[EMAIL PROTECTED]




signature.asc
Description: Esta parte del mensaje =?ISO-8859-1?Q?est=E1?= firmada	digitalmente