date:20051215

Re: [PERFORM] Simple Join

2005-12-15 Thread Mark Kirkwood


Kevin Brown wrote:
> On Wednesday 14 December 2005 18:36, you wrote:
> > Well - that had no effect at all :-) You don't have an index on
> > to_ship.ordered_product_id do you? - try adding one (ANALYZE again), and
> > let us know what happens (you may want to play with SET
> > enable_seqscan=off as well).
>
> I _DO_ have an index on to_ship.ordered_product_id.  It's a btree.

Sorry - read right past it!

Did you try out enable_seqscan=off? I'm interested to see if we can get
8.1 to bitmap-AND the three possibly useful columns together on
ordered_products and *then* do the join to to_ship.
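
A minimal sketch of that experiment, for anyone following along. The query shape is an assumption (the real query is not shown in this message), as are the boolean comparisons on paid and suspended_sub; only the table names, column names and the enable_seqscan setting come from the thread.

```sql
-- Session-local experiment; this does not touch postgresql.conf.
SET enable_seqscan = off;

-- Hypothetical query shape (assumed, not the original poster's query):
-- join to_ship to ordered_products and filter on the columns under discussion.
EXPLAIN ANALYZE
SELECT op.id
FROM to_ship ts
JOIN ordered_products op ON op.id = ts.ordered_product_id
WHERE op.paid = true
  AND op.suspended_sub = false;

-- Put the setting back to its default once the plan has been captured.
RESET enable_seqscan;
```

Comparing this EXPLAIN ANALYZE output with the one taken under default settings shows whether the planner had a cheaper non-seqscan plan available and what it actually cost.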


Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] Simple Join

2005-12-15 Thread Mitch Skinner

On Thu, 2005-12-15 at 01:48 -0600, Kevin Brown wrote:
> > Well, I'm no expert either, but if there was an index on
> > ordered_products (paid, suspended_sub, id) it should be mergejoinable
> > with the index on to_ship.ordered_product_id, right?  Given the
> > conditions on paid and suspended_sub.
>
> The following is already there:
>
> CREATE INDEX ordered_product_id_index
>   ON to_ship
>   USING btree
>   (ordered_product_id);
>
> That's why I emailed this list.

I saw that; what I'm suggesting is that you try creating a 3-column
index on ordered_products using the paid, suspended_sub, and id columns.
In that order, I think, although you could also try the reverse.  It may
or may not help, but it's worth a shot--the fact that all of those
columns are used together in the query suggests that you might do better
with a three-column index on those.
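
Something along these lines would do it. This is only a sketch: the index name is made up for illustration, and the column order is the first of the two orders suggested above.

```sql
-- Hypothetical index name; columns in the order paid, suspended_sub, id.
CREATE INDEX ordered_products_paid_susp_id_idx
  ON ordered_products
  USING btree
  (paid, suspended_sub, id);

-- Refresh planner statistics before re-running EXPLAIN ANALYZE on the query.
ANALYZE ordered_products;
```

To try the reverse order, drop this index and create one on (id, suspended_sub, paid) instead, then compare the plans.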

With all three columns indexed individually, you're apparently not
getting the bitmap plan that Mark is hoping for.  I imagine this has to
do with the lack of multi-column statistics in postgres, though you
could also try raising the statistics target on the columns of interest.
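
Raising the statistics target could look roughly like the following. The value 100 is an arbitrary assumption chosen for illustration, and which columns count as "of interest" is also a guess based on the conditions mentioned in this thread.

```sql
-- Collect more detailed statistics for the filtered columns (100 is an assumed value).
ALTER TABLE ordered_products ALTER COLUMN paid SET STATISTICS 100;
ALTER TABLE ordered_products ALTER COLUMN suspended_sub SET STATISTICS 100;

-- The new targets only take effect at the next ANALYZE.
ANALYZE ordered_products;
```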

Setting enable_seqscan to off, as others have suggested, is also a
worthwhile experiment, just to see what you get.

Mitch



[PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

I asked a while back if there were any plans to allow developers to override the
optimizer's plan and force certain plans, and received a fairly resounding
No. The general feeling I get is that a lot of work has gone into the
optimizer, and by God we're going to use it!

I think this is just wrong, and I'm curious whether I'm alone in this opinion.

Over and over, I see questions posted to this mailing list about execution
plans that don't work out well. Many times there are good answers - add an
index, refactor the design, etc. - that yield good results. But, all too often
the answer comes down to something like this recent one:

> Right on. Some of these coerced plans may perform
> much better. If so, we can look at tweaking your runtime
> config: e.g.
>
> effective_cache_size
> random_page_cost
> default_statistics_target
>
> to see if said plans can be chosen naturally.

I see this over and over. Tweak the parameters to force a certain plan, because
there's no formal way for a developer to say, "I know the best plan."

There isn't a database in the world that is as smart as a developer, or that
can have insight into things that only a developer can possibly know. Here's a
real-life example that caused me major headaches. It's a trivial query, but
Postgres totally blows it:

select * from my_table
where row_num >= 5 and row_num < 10
and myfunc(foo, bar);

How can Postgres possibly know what myfunc() does? In this example, my_table
is about 10 million rows and row_num is indexed. When the row_num range is less than
about 30,000, Postgres (correctly) uses a row_num index scan, then filters by myfunc().
But beyond that, it chooses a sequential scan, filtering by myfunc(). This is just
wrong. Postgres can't possibly know that myfunc() is VERY expensive. The correct plan
would be to switch from the index scan to a sequential scan that filters on row_num
first. Even if 99% of the database is selected by row_num, it should STILL at least
filter by row_num first, and only filter by myfunc() as the very last step.

How can a database with no ability to override a plan possibly cope with this?

Without the explicit ability to override the plan Postgres generates, these
problems dominate our development efforts. Postgres does an excellent job
optimizing on 90% of the SQL we write, but the last 10% is nearly impossible to
get right. We spend huge amounts of time on trial-and-error queries,
second-guessing Postgres, creating unnecessary temporary tables, sticking in the
occasional OFFSET in a subquery to prevent merging layers, and so forth.
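
For readers who haven't seen the OFFSET workaround, here is a rough sketch applied to the my_table example. The row_num bounds are placeholder values and the alias "candidates" is made up; the point is only that OFFSET 0 keeps the planner from merging the subquery into the outer query, so myfunc() runs only on rows that already passed the row_num filter.

```sql
SELECT *
FROM (
    -- Cheap filter first; OFFSET 0 stops the subquery from being flattened
    -- and stops the outer condition from being pushed down into it.
    SELECT *
    FROM my_table
    WHERE row_num >= 5 AND row_num < 10
    OFFSET 0
) AS candidates
-- Expensive function applied last, to the surviving rows only.
WHERE myfunc(candidates.foo, candidates.bar);
```

It works, but it is exactly the kind of indirect trick I'm complaining about: it controls evaluation order as a side effect rather than by stating the intended plan.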

This same application also runs on Oracle, and although I've cursed Oracle's
stupid planner many times, at least I can force it to do it right if I need to.

The danger of forced plans is that inexperienced developers tend to abuse them. So it goes -- the documentation should be clear that forced plans are always a last resort.

But there's no getting around the fact that Postgres needs a way for a
developer to specify the execution plan.

Craig


Re: [PERFORM] How much expensive are row level statistics?

2005-12-15 Thread Michael Fuhr

On Mon, Dec 12, 2005 at 10:20:45PM -0500, Tom Lane wrote:
> Given the rather lackadaisical way in which the stats collector makes
> the data available, it seems like the backends are being much too
> enthusiastic about posting their stats_command_string status
> immediately.  Might be worth thinking about how to cut back the
> overhead by suppressing some of these messages.

Would a GUC setting akin to log_min_duration_statement be feasible?
Does the backend support, or could it be easily modified to support,
a mechanism that would post the command string after a configurable
amount of time had expired, and then continue processing the query?
That way admins could avoid the overhead of posting messages for
short-lived queries that nobody's likely to see in pg_stat_activity
anyway.

-- 
Michael Fuhr


Re: [PERFORM] How much expensive are row level statistics?

2005-12-15 Thread Tom Lane

Michael Fuhr [EMAIL PROTECTED] writes:
> Does the backend support, or could it be easily modified to support,
> a mechanism that would post the command string after a configurable
> amount of time had expired, and then continue processing the query?

Not really, unless you want to add the overhead of setting a timer
interrupt for every query.  Which is sort of counterproductive when
the motivation is to reduce overhead ...

(It might be more or less free if you have statement_timeout set, since
there would be a setitimer call anyway.  But I don't think that's the
norm.)

regards, tom lane

