Re: [PERFORM] Simple Join

2005-12-15 Thread Mark Kirkwood

Kevin Brown wrote:

On Wednesday 14 December 2005 18:36, you wrote:


Well - that had no effect at all :-) You don't have and index on
to_ship.ordered_product_id do you? - try adding one (ANALYZE again), and
let use know what happens (you may want to play with SET
enable_seqscan=off as well).



I _DO_ have an index on to_ship.ordered_product_id.  It's a btree.



Sorry - read right past it!

Did you try out enable_seqscan=off? I'm interested to see if we can get 
8.1 bitmap anding the three possibly useful columns together on 
ordered_products and *then* doing the join to to_ship.


Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple Join

2005-12-15 Thread Mitch Skinner
On Thu, 2005-12-15 at 01:48 -0600, Kevin Brown wrote:
  Well, I'm no expert either, but if there was an index on
  ordered_products (paid, suspended_sub, id) it should be mergejoinable
  with the index on to_ship.ordered_product_id, right?  Given the
  conditions on paid and suspended_sub.
 
 The following is already there:
 
 CREATE INDEX ordered_product_id_index
   ON to_ship
   USING btree
   (ordered_product_id);
 
 That's why I emailed this list.

I saw that; what I'm suggesting is that that you try creating a 3-column
index on ordered_products using the paid, suspended_sub, and id columns.
In that order, I think, although you could also try the reverse.  It may
or may not help, but it's worth a shot--the fact that all of those
columns are used together in the query suggests that you might do better
with a three-column index on those. 

With all three columns indexed individually, you're apparently not
getting the bitmap plan that Mark is hoping for.  I imagine this has to
do with the lack of multi-column statistics in postgres, though you
could also try raising the statistics target on the columns of interest.

Setting enable_seqscan to off, as others have suggested, is also a
worthwhile experiment, just to see what you get.

Mitch


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

I asked a while back if there were any plans to allow developers to override the 
optimizer's plan and force certain plans, and received a fairly resounding 
No.  The general feeling I get is that a lot of work has gone into the 
optimizer, and by God we're going to use it!

I think this is just wrong, and I'm curious whether I'm alone in this opinion.

Over and over, I see questions posted to this mailing list about execution 
plans that don't work out well.  Many times there are good answers - add an 
index, refactor the design, etc. - that yield good results.  But, all too often 
the answer comes down to something like this recent one:

   Right on. Some of these coerced plans may perform 
   much better. If so, we can look at tweaking your runtime

   config: e.g.
  
   effective_cache_size
   random_page_cost
   default_statistics_target
  
   to see if said plans can be chosen naturally.

I see this over and over.  Tweak the parameters to force a certain plan, because 
there's no formal way for a developer to say, I know the best plan.

There isn't a database in the world that is as smart as a developer, or that 
can have insight into things that only a developer can possibly know.  Here's a 
real-life example that caused me major headaches.  It's a trivial query, but 
Postgres totally blows it:

   select * from my_table 
where row_num = 5 and row_num  10

and myfunc(foo, bar);

How can Postgres possibly know what myfunc() does?  In this example, my_table 
is about 10 million rows and row_num is indexed.  When the row_num range is less than 
about 30,000, Postgres (correctly) uses an row_num index scan, then filters by myfunc().  
But beyond that, it chooses a sequential scan, filtering by myfunc().  This is just 
wrong.  Postgres can't possibly know that myfunc() is VERY expensive.  The correct plan 
would be to switch from index to filtering on row_num.  Even if 99% of the database is 
selected by row_num, it should STILL at least filter by row_num first, and only filter by 
myfunc() as the very last step.

How can a database with no ability to override a plan possibly cope with this?

Without the explicit ability to override the plan Postgres generates, these 
problems dominate our development efforts.  Postgres does an excellent job 
optimizing on 90% of the SQL we write, but the last 10% is nearly impossible to 
get right.  We spend huge amounts of time on trial-and-error queries, second 
guessing Postgress, creating unnecessary temporary tables, sticking in the 
occasional OFFSET in a subquery to prevent merging layers, and so forth.

This same application also runs on Oracle, and although I've cursed Oracle's 
stupid planner many times, at least I can force it to do it right if I need to.

The danger of forced plans is that inexperienced developers tend to abuse them.  So it goes -- the documentation should be clear that forced plans are always a last resort.  


But there's no getting around the fact that Postgres needs a way for a 
developer to specify the execution plan.

Craig


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How much expensive are row level statistics?

2005-12-15 Thread Michael Fuhr
On Mon, Dec 12, 2005 at 10:20:45PM -0500, Tom Lane wrote:
 Given the rather lackadaisical way in which the stats collector makes
 the data available, it seems like the backends are being much too
 enthusiastic about posting their stats_command_string status
 immediately.  Might be worth thinking about how to cut back the
 overhead by suppressing some of these messages.

Would a GUC setting akin to log_min_duration_statement be feasible?
Does the backend support, or could it be easily modified to support,
a mechanism that would post the command string after a configurable
amount of time had expired, and then continue processing the query?
That way admins could avoid the overhead of posting messages for
short-lived queries that nobody's likely to see in pg_stat_activity
anyway.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] How much expensive are row level statistics?

2005-12-15 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 Does the backend support, or could it be easily modified to support,
 a mechanism that would post the command string after a configurable
 amount of time had expired, and then continue processing the query?

Not really, unless you want to add the overhead of setting a timer
interrupt for every query.  Which is sort of counterproductive when
the motivation is to reduce overhead ...

(It might be more or less free if you have statement_timeout set, since
there would be a setitimer call anyway.  But I don't think that's the
norm.)

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Tom Lane
Craig A. James [EMAIL PROTECTED] writes:
 I see this over and over.  Tweak the parameters to force a certain
 plan, because there's no formal way for a developer to say, I know
 the best plan.

I think you've misunderstood those conversations entirely.  The point
is not to force the planner into a certain plan, it is to explore what's
going on with a view to understanding why the planner isn't making a
good choice, and thence hopefully improve the planner in future.  (Now,
that's not necessarily what the user with an immediate problem is
thinking, but that's definitely what the developers are thinking.)

 There isn't a database in the world that is as smart as a developer,

People who are convinced they are smarter than the machine are often
wrong ;-).  If we did put in the nontrivial amount of work needed to
have such a facility, it would probably get abused more often than it
was used correctly.  I'd rather spend the work on making the planner
better.

This discussion has been had before (many times) ... see the -hackers
archives for detailed arguments.  The one that carries the most weight
in my mind is that planner hints embedded in applications will not adapt
to changing circumstances --- the plan that was best when you designed
the code might not be best today.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Christopher Kings-Lynne

   select * from my_table where row_num = 5 and row_num  10
and myfunc(foo, bar);


You just create an index on myfunc(foo, bar)

Chris


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Jaime Casanova
On 12/15/05, Christopher Kings-Lynne [EMAIL PROTECTED] wrote:
 select * from my_table where row_num = 5 and row_num  10
  and myfunc(foo, bar);

 You just create an index on myfunc(foo, bar)

 Chris


only if myfunc(foo, bar) is immutable...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Christopher Kings-Lynne

  select * from my_table where row_num = 5 and row_num  10
   and myfunc(foo, bar);


You just create an index on myfunc(foo, bar)


only if myfunc(foo, bar) is immutable...


And if it's not then the best any database can do is to index scan 
row_num - so still you have no problem.


Chris


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Tom,


I see this over and over.  Tweak the parameters to force a certain
plan, because there's no formal way for a developer to say, I know
the best plan.


I think you've misunderstood those conversations entirely.  The point
is not to force the planner into a certain plan, it is to explore what's
going on with a view to understanding why the planner isn't making a
good choice, and thence hopefully improve the planner in future.


No, I understood the conversations very clearly.  But no matter how clever the 
optimizer, it simply can't compete with a developer who has knowledge that 
Postgres *can't* have.  The example of a user-written function is obvious.


There isn't a database in the world that is as smart as a developer,


People who are convinced they are smarter than the machine are often
wrong ;-). 


Often, but not always -- as I noted in my original posting.  And when the 
developer is smarter than Postgres, and Postgres makes the wrong choice, what 
is the developer supposed to do?  This isn't academic -- the wrong plans 
Postgres makes can be *catastrophic*, e.g. turning a 3-second query into a 
three-hour query.

How about this: Instead of arguing in the abstract, tell me in concrete terms 
how you would address the very specific example I gave, where myfunc() is a 
user-written function.  To make it a little more challenging, try this: 
myfunc() can behave very differently depending on the parameters, and sometimes 
(but not always), the application knows how it will behave and could suggest a 
good execution plan.

(And before anyone suggests that I rewrite myfunc(), I should explain that it's 
in the class of NP-complete problems.  The function is inherently hard and 
can't be made faster or more predictable.)

The example I raised in a previous thread, of irregular usage, is the same: I have a particular 
query that I *always* want to be fast even if it's only used rarely, but the system swaps its 
tables out of the file-system cache, based on low usage, even though the high 
usage queries are low priority.  How can Postgres know such things when there's no way for me 
to tell it?

The answers from the Postgres community were essentially, Postgres is smarter than 
you, let it do its job.  Unfortunately, this response completely ignores the 
reality: Postgres is NOT doing its job, and can't, because it doesn't have enough 
information.

Craig


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Christopher Kings-Lynne wrote:

   select * from my_table where row_num = 5 and row_num  10
and myfunc(foo, bar);



You just create an index on myfunc(foo, bar)


Thanks, but myfunc() takes parameters (shown here as foo, bar), one of which 
is not a column, it's external and changes with every query.  A function index won't work.

Craig

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Christopher Kings-Lynne
   Right on. Some of these coerced plans may performmuch better. 
If so, we can look at tweaking your runtime

   config: e.g.
  
   effective_cache_size
   random_page_cost
   default_statistics_target
  
   to see if said plans can be chosen naturally.

I see this over and over.  Tweak the parameters to force a certain 
plan, because there's no formal way for a developer to say, I know the 
best plan.


No, this is fixing your wrongn, inaccurate parameters so that 
postgresql can choose a better plan.


I don't necessarily disagree with your assertion that we need planner 
hints, but unless you or someone else is willing to submit a patch with 
the feature it's unlikely to ever be implemented...


Chris


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James



Christopher Kings-Lynne wrote:
  select * from my_table where row_num = 5 and row_num  
10

   and myfunc(foo, bar);



You just create an index on myfunc(foo, bar)



only if myfunc(foo, bar) is immutable...



And if it's not then the best any database can do is to index scan 
row_num - so still you have no problem.


Boy, you picked a *really* bad example ;-)

The problem is that Postgres decided to filter on myfunc() *first*, and then 
filter on row_num, resulting in a query time that jumped from seconds to hours. 
 And there's no way for me to tell Postgres not to do that!

So, you still have no problem is exactly wrong, because Postgres picked the 
wrong plan.  Postgres decided that applying myfunc() to 10,000,000 rows was a better plan 
than an index scan of 50,000 row_nums.  So I'm screwed.

Craig

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Mark Kirkwood

Craig A. James wrote:
I asked a while back if there were any plans to allow developers to 
override the optimizer's plan and force certain plans, and received a 
fairly resounding No.  The general feeling I get is that a lot of work 
has gone into the optimizer, and by God we're going to use it!


I think this is just wrong, and I'm curious whether I'm alone in this 
opinion.


Over and over, I see questions posted to this mailing list about 
execution plans that don't work out well.  Many times there are good 
answers - add an index, refactor the design, etc. - that yield good 
results.  But, all too often the answer comes down to something like 
this recent one:


   Right on. Some of these coerced plans may performmuch better. 
If so, we can look at tweaking your runtime

   config: e.g.
  
   effective_cache_size
   random_page_cost
   default_statistics_target
  
   to see if said plans can be chosen naturally.

I see this over and over.  Tweak the parameters to force a certain 
plan, because there's no formal way for a developer to say, I know the 
best plan.




I hear what you are saying, but to use this fine example - I don't know 
what the best plan is - these experiments part of an investigation to 
find *if* there is a better plan, and if so, why Postgres is not finding it.


There isn't a database in the world that is as smart as a developer, or 
that can have insight into things that only a developer can possibly 
know.


That is often true - but the aim is to get Postgres's optimizer closer 
to developer smartness.


After years of using several other database products (some supporting 
hint type constructs and some not), I have come to believe that hinting 
(or similar) actually *hinders* the development of a great optimizer.



Best wishes

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Christopher Kings-Lynne wrote:
I don't necessarily disagree with your assertion that we need planner 
hints, but unless you or someone else is willing to submit a patch with 
the feature it's unlikely to ever be implemented...


Now that's an answer I understand and appreciate.  Open-source development 
relies on many volunteers, and I've benefitted from it since the early 1980's 
when emacs and Common Lisp first came to my attention.  I've even written a 
widely-circulated article about open-source development, which some of you may 
have read:

http://www.moonviewscientific.com/essays/software_lifecycle.htm

I hope nobody here thinks I'm critical of all the hard work that's been put into 
Postgres.  My hope is to raise the awareness of this issue in the hope that it's at least 
put on the list for serious consideration.

Craig


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Mark Kirkwood wrote:
I hear what you are saying, but to use this fine example - I don't know 
what the best plan is - these experiments part of an investigation to 
find *if* there is a better plan, and if so, why Postgres is not finding 
it.


There isn't a database in the world that is as smart as a developer, 
or that can have insight into things that only a developer can 
possibly know.


That is often true - but the aim is to get Postgres's optimizer closer 
to developer smartness.


What would be cool would be some way the developer could alter the plan, but they way of 
doing so would strongly encourage the developer to send the information to this mailing 
list.  Postgres would essentially say, Ok, you can do that, but we want to know 
why!

After years of using several other database products (some supporting 
hint type constructs and some not), I have come to believe that hinting 
(or similar) actually *hinders* the development of a great optimizer.


I agree.  It takes the pressure off the optimizer gurus.  If the users can just 
work around every problem, then the optimizer can suck and the system is still 
usable.

Lest anyone think I'm an all-out advocate of overriding the optimizer, I know from 
first-hand experience what a catastrophe it can be.  An Oracle hint I used worked fine on 
my test schema, but the customer's table turned out to be a view, and 
Oracle's optimizer worked well on the view whereas my hint was horrible.  Unfortunately, 
without the hint, Oracle sucked when working on an ordinary table.  Hints are dangerous, 
and I consider them a last resort.

Craig

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Tom Lane wrote:

This discussion has been had before (many times) ... see the -hackers
archives for detailed arguments.  The one that carries the most weight
in my mind is that planner hints embedded in applications will not adapt
to changing circumstances --- the plan that was best when you designed
the code might not be best today.


Absolutely right.  But what am I supposed to do *today* if the planner makes a 
mistake?  Shut down my web site?

Ropes are useful, but you can hang yourself with them.  Knives are useful, but 
you can cut yourself with them.  Should we ban useful tools because they cause 
harm to the careless?

Craig

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Christopher Kings-Lynne wrote:

Can you paste explain analyze and your effective_cache_size, etc. settings.
... 
This seems like a case where PostgreSQL's current optimiser should 
easily know what to do if your config settings are correct and you've 
been running ANALYZE, so I'd like to see your settings and the explain 
analyze plan...


I could, but it would divert us from the main topic of this discussion.  It's 
not about that query, which was just an example.  It's the larger issue.

Tom's earlier response tells the story better than I can:

This discussion has been had before (many times) ... see
the -hackers archives for detailed arguments. 


If it's been had before (many times), and now I'm bringing it up again, then 
it's clearly an ongoing problem that hasn't been resolved.

Craig

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Mark Kirkwood

Craig A. James wrote:



What would be cool would be some way the developer could alter the plan, 
but they way of doing so would strongly encourage the developer to send 
the information to this mailing list.  Postgres would essentially say, 
Ok, you can do that, but we want to know why!




Yeah it would - an implementation I have seen that I like is where the 
developer can supply the *entire* execution plan with a query. This is 
complex enough to make casual use unlikely :-), but provides the ability 
to try out other plans, and also fix that vital query that must run 
today.


cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Christopher Kings-Lynne
... This seems like a case where PostgreSQL's current optimiser should 
easily know what to do if your config settings are correct and you've 
been running ANALYZE, so I'd like to see your settings and the explain 
analyze plan...


I could, but it would divert us from the main topic of this discussion.  
It's not about that query, which was just an example.  It's the larger 
issue.


So your main example bad query is possibly just a case of lack of 
analyze stats and wrong postgresql.conf config?  And that's what causes 
you to shut down your database?  Don't you want your problem FIXED?


But like I said - no developer is interested in doing planner hints. 
Possibly you could get a company to sponsor it.  Maybe what you want is 
a statement of If someone submits a good, working, fully implemented 
patch that does planner hints, then we'll accept it.


Chris


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Christopher Kings-Lynne wrote:
So your main example bad query is possibly just a case of lack of 
analyze stats and wrong postgresql.conf config?  And that's what causes 
you to shut down your database?  Don't you want your problem FIXED?


I'm trying to help by raising a question that I think is important, and have an 
honest, perhaps vigorous, but respectful, discussion about it.  I respect 
everyone's opinion, and I hope you respect mine.  I've been in this business a 
long time, and I don't raise issues lightly.

Yes, I want my query fixed.  And I may post it, in a thread with a new title.  
In fact, I posted a different query with essentially the same problem a while 
back and got nothing that helped:

   http://archives.postgresql.org/pgsql-performance/2005-11/msg00133.php

(I can't help but point out that Tom's response was to suggest a way to fool the 
optimizer so as to prevent it from optimizing the query.  In other words, he 
told me a trick that would force a particular plan on the optimizer.  Which is exactly 
the point of this discussion.)

The point is that the particular query is not relevant -- it's the fact that 
this topic (according to Tom) has been and continues to be raised.  This should 
tell us all something, that it's not going to go away, and that it's a real 
issue.

Regards,
Craig

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple Join

2005-12-15 Thread David Lang

On Fri, 16 Dec 2005, Mark Kirkwood wrote:



Right on. Some of these coerced plans may perform much better. If so, we 
can look at tweaking your runtime config: e.g.


effective_cache_size
random_page_cost
default_statistics_target

to see if said plans can be chosen naturally.


Mark, I've seen these config options listed as tweaking targets fairly 
frequently, has anyone put any thought or effort into creating a test 
program that could analyse the actual system and set the defaults based on 
the measured performance?


David Lang

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Jaime Casanova
On 12/15/05, Craig A. James [EMAIL PROTECTED] wrote:
  Yeah it would - an implementation I have seen that I like is where the
  developer can supply the *entire* execution plan with a query. This is
  complex enough to make casual use unlikely :-), but provides the ability
  to try out other plans, and also fix that vital query that must run
  today.

 So, to move on to the concrete...

 I'm not familiar with the innards of Postgres except in a theoretical way.
 Maybe this is a totally naive or dumb question, but I have to ask:   How
 hard would it be to essentially turn off the optimizer?

 1. Evaluate WHERE clauses left-to-right.

 select ... from FOO where A and B and C;

 This would just apply the criteria left-to-right, first A, then B, then C.
 If an index was available it would use it, but only in left-to-right order,
 i.e. if A had no index but B did, then too bad, you should have written B
 and A and C.


pg  8.1 when you use multi-column indexes do exactly this... but i
don't know why everyone wants this...


 2. Evaluate joins left-to-right.

 select ... from FOO join BAR on (...) join BAZ on (...) where ...

 This would join FOO to BAR, then join the result to BAZ.  The only
 optimization would be to apply relevant where conditions to each join
 before processing the next join.


using explicit INNER JOIN syntax and parenthesis


 3. Don't flatten sub-selects

 select ... from (select ... from FOO where ...) as X where ...;


select ... from (select ... from FOO where ... offset 0) as X where ...;

 This would do the inner select then use the result in the outer select, and
 wouldn't attempt to flatten the query.

 Thanks,
 Craig


what else?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread David Lang

On Thu, 15 Dec 2005, Craig A. James wrote:

The example I raised in a previous thread, of irregular usage, is the same: I 
have a particular query that I *always* want to be fast even if it's only 
used rarely, but the system swaps its tables out of the file-system cache, 
based on low usage, even though the high usage queries are low priority. 
How can Postgres know such things when there's no way for me to tell it?


actually, postgres doesn't manage the file-system cache, it deliberatly 
leaves that up to the OS it is running on to do that job.


one (extremely ugly) method that you could use would be to have a program 
that looks up what files are used to store your high priority tables and 
then write a trivial program to keep those files in memory (it may be as 
simple as mmaping the files and then going to sleep, or you may have to 
read various points through the file to keep them current in the cache, it 
WILL vary depending on your OS and filesystem in use)


oracle goes to extremes with this sort of control, I'm actually mildly 
surprised that they still run on a host OS and haven't completely taken 
over the machine (I guess they don't want to have to write device drivers, 
that's about the only OS code they really want to use, they do their own 
memory management, filesystem, and user systems), by avoiding areas like 
this postgres sacrafices a bit of performance, but gains a much broader 
set of platforms (hardware and OS) that it can run on. and this by itself 
can result in significant wins (does oracle support Opteron CPU's in 64 
bit mode yet? as of this summer it just wasn't an option)


David Lang

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Kevin Brown
Craig A. James wrote:
 
 
 Christopher Kings-Lynne wrote:
   select * from my_table where row_num = 5 and row_num  
 10
and myfunc(foo, bar);
 
 
 You just create an index on myfunc(foo, bar)
 
 
 only if myfunc(foo, bar) is immutable...
 
 
 And if it's not then the best any database can do is to index scan 
 row_num - so still you have no problem.
 
 Boy, you picked a *really* bad example ;-)
 
 The problem is that Postgres decided to filter on myfunc() *first*, and 
 then filter on row_num, resulting in a query time that jumped from seconds 
 to hours.  And there's no way for me to tell Postgres not to do that!

Apologies in advance if all of this has been said, or if any of it is
wrong.


What kind of plan do you get if you eliminate the myfunc(foo, bar)
from the query entirely?  An index scan or a full table scan?  If the
latter then (assuming that the statistics are accurate) the reason you
want inclusion of myfunc() to change the plan must be the expense of
the function, not the expense of the scan (index versus sequential).
While the expense of the function isn't, as far as I know, known or
used by the planner, that obviously needn't be the case.

On the other hand, if the inclusion of the function call changes the
plan that is selected from an index scan to a sequential scan, then
that, I think, is clearly a bug, since even a zero-cost function
cannot make the sequential scan more efficient than an index scan
which is already more efficient than the base sequential scan.


 So, you still have no problem is exactly wrong, because Postgres picked 
 the wrong plan.  Postgres decided that applying myfunc() to 10,000,000 
 rows was a better plan than an index scan of 50,000 row_nums.  So I'm 
 screwed.

If PostgreSQL is indeed applying myfunc() to 10,000,000 rows, then
that is a bug if the function is declared VOLATILE (which is the
default if no volatility is specified), because it implies that it's
applying the function to rows that don't match the selection
condition.  From your prior description, it sounds like your function
is declared STABLE.


For your specific situation, my opinion is that the proper
modification to PostgreSQL would be to give it (if it isn't already
there) the ability to include the cost of functions in the plan.  The
cost needn't be something that it automatically measures -- it could
be specified at function creation time.



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Kevin Brown
Craig A. James wrote:
 Hints are dangerous, and I consider them a last resort.

If you consider them a last resort, then why do you consider them to
be a better alternative than a workaround such as turning off
enable_seqscan, when all the other tradeoffs are considered?

If your argument is that planner hints would give you finer grained
control, then the question is whether you'd rather the developers
spend their time implementing planner hints or improving the planner.
I'd rather they did the latter, as long as workarounds are available
when needed.  A workaround will probably give the user greater
incentive to report the problem than use of planner hints.


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James

Kevin Brown wrote:

Hints are dangerous, and I consider them a last resort.


If you consider them a last resort, then why do you consider them to
be a better alternative than a workaround such as turning off
enable_seqscan, when all the other tradeoffs are considered?


If I understand enable_seqscan, it's an all-or-nothing affair.  Turning it off 
turns it off for the whole database, right?  The same is true of all of the 
planner-tuning parameters in the postgres conf file.  Since the optimizer does 
a good job most of the time, I'd hate to change a global setting like this -- 
what else would be affected?  I could try this, but it would make me nervous to 
alter the whole system to fix one particular query.


If your argument is that planner hints would give you finer grained
control, then the question is whether you'd rather the developers
spend their time implementing planner hints or improving the planner.


I agree 100% -- I'd much prefer a better planner.  But when it comes down to a 
do-or-die situation, you need a hack, some sort of workaround, to get you 
working *today*.

Regards,
Craig

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How much expensive are row level statistics?

2005-12-15 Thread Kevin Brown
Tom Lane wrote:
 Michael Fuhr [EMAIL PROTECTED] writes:
  Does the backend support, or could it be easily modified to support,
  a mechanism that would post the command string after a configurable
  amount of time had expired, and then continue processing the query?
 
 Not really, unless you want to add the overhead of setting a timer
 interrupt for every query.  Which is sort of counterproductive when
 the motivation is to reduce overhead ...
 
 (It might be more or less free if you have statement_timeout set, since
 there would be a setitimer call anyway.  But I don't think that's the
 norm.)

Actually, it's probably not necessary to set the timer at the
beginning of every query.  It's probably sufficient to just have it go
off periodically, e.g. once every second, and thus set it when the
timer goes off.  And the running command wouldn't need to be re-posted
if it's the same as last time around.  Turn off the timer if the
connection is idle now and was idle last time around (or not, if
there's no harm in having the timer running all the time), turn it on
again at the start of the next transaction.

In essence, the backend would be polling itself every second or so
and recording its state at that time, rather than on every
transaction.

Assuming that doing all that wouldn't screw something else up...



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Kevin Brown
Craig A. James wrote:
 Kevin Brown wrote:
 Hints are dangerous, and I consider them a last resort.
 
 If you consider them a last resort, then why do you consider them to
 be a better alternative than a workaround such as turning off
 enable_seqscan, when all the other tradeoffs are considered?
 
 If I understand enable_seqscan, it's an all-or-nothing affair.  Turning it 
 off turns it off for the whole database, right?  The same is true of all 
 of the planner-tuning parameters in the postgres conf file.

Nope.  What's in the conf file are the defaults.  You can change them
on a per-connection basis, via the SET command.  Thus, before doing
your problematic query:

SET enable_seqscan = off;

and then, after your query is done, 

SET enable_seqscan = on;

 If your argument is that planner hints would give you finer grained
 control, then the question is whether you'd rather the developers
 spend their time implementing planner hints or improving the planner.
 
 I agree 100% -- I'd much prefer a better planner.  But when it comes down 
 to a do-or-die situation, you need a hack, some sort of workaround, to get 
 you working *today*.

And that's why I was asking about workarounds versus planner hints.  I
expect that the situations in which the planner gets things wrong
*and* where there's no workaround are very rare indeed.


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Bruno Wolff III
On Thu, Dec 15, 2005 at 21:41:06 -0800,
  Craig A. James [EMAIL PROTECTED] wrote:
 
 If I understand enable_seqscan, it's an all-or-nothing affair.  Turning it 
 off turns it off for the whole database, right?  The same is true of all of 

You can turn it off just for specific queries. However, it will apply to
all joins within a query.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org