Re: [HACKERS] Function execution costs 'n all that

2007-01-28 Thread Mark Dilger

Tom Lane wrote:

Would a simple constant value be workable, or do we need some more
complex model (and if so what)?


Consider:

ANALYZE myfunc(integer) ON (SELECT myfunc(7)) WITH RATIO 0.03;

ANALYZE myfunc(text,text) ON (SELECT myfunc(mt.a,mt.b) FROM mytable mt)
  WITH RATIO 1.071;


ANALYZE myfunc(text,text) ON (
  SELECT myfunc(mt.a,mt.b) FROM mytable mt
  UNION
  SELECT myfunc(ot.a,ot.b) FROM othertable ot
) WITH RATIO 0.5;

These commands could turn on function timing for the duration of the query,
gathering statistics about the given function's runtimes.  The WITH RATIO
clause would translate those runtimes (in milliseconds) into units of
cpu_operator_cost.  The WITH RATIO clause could be optional, with a default
ratio taken from postgresql.conf if one is set there, and finally falling back
to a reasonable hardcoded value.  Users would be well advised to adopt a
consistent policy regarding system load at the times these ANALYZE commands
are run.
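
To make the intent concrete, one way the ratio could be read is as
cpu_operator_cost units per millisecond of measured runtime; a small worked
example under that reading:

-- measured average runtime of myfunc: 2.0 ms, declared WITH RATIO 0.5
-- stored cost estimate = 2.0 ms * 0.5 = 1.0 unit of cpu_operator_cost
-- (with no RATIO clause, the postgresql.conf default, or failing that the
--  hardcoded default, would supply the conversion factor)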


If the function has side effects, it would be the user's responsibility not to
analyze it unless those side effects are acceptable.  Since the user can only
analyze queries that the user has permission to run, this shouldn't grant any
additional ability to generate side effects beyond what the user could already
do.


The syntax might need some adjusting to make the parser happy and to avoid new 
reserved words.  The syntax used above is just an example.


It seems to me that the above system would work perfectly well for collecting 
the number of rows returned from a set-returning function, not just the runtimes.


mark



Re: [HACKERS] Function execution costs 'n all that

2007-01-28 Thread Tom Lane
Mark Dilger [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Would a simple constant value be workable, or do we need some more
 complex model (and if so what)?

 Consider:
 ANALYZE myfunc(integer) ON (SELECT myfunc(7)) WITH RATIO 0.03;
 ...
 It seems to me that the above system would work perfectly well for
 collecting the number of rows returned from a set returning function,
 not just the run times.

I don't really think that data collection is the bottleneck.  If a
constant estimate isn't good enough for you, then you need some kind of
model of how the runtime or number of rows varies with the function's
inputs ... and I hardly see how something like the above is likely to
figure out how to fit a good model.  Or at least, if you think it can,
then you skipped all the interesting bits.

One other point is that we already know that sampling overhead and
measurement error are significant problems when trying to measure
intervals on the order of one Plan-node execution.  I'm afraid that
would get a great deal worse if we try to use a similar approach to
timing individual function calls.

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-28 Thread Mark Dilger

Tom Lane wrote:

Mark Dilger [EMAIL PROTECTED] writes:

Tom Lane wrote:

Would a simple constant value be workable, or do we need some more
complex model (and if so what)?



Consider:
ANALYZE myfunc(integer) ON (SELECT myfunc(7)) WITH RATIO 0.03;
...
It seems to me that the above system would work perfectly well for
collecting the number of rows returned from a set returning function,
not just the run times.


I don't really think that data collection is the bottleneck.


Ahh, I'm not just thinking about data collection.  I'm thinking about usability 
for non-hackers who know enough plpgsql to write a function and then want to 
train the system to plan for it appropriately.  It's a much easier task for a 
novice user to say "go away and figure out how expensive this thing is" than to 
reason about things like statistical variance, etc.  We don't demand that users 
have that kind of knowledge to write queries or run analyze on tables, so why 
would they need it to write a function?



If a
constant estimate isn't good enough for you, then you need some kind of
model of how the runtime or number of rows varies with the function's
inputs ... and I hardly see how something like the above is likely to
figure out how to fit a good model.  Or at least, if you think it can,
then you skipped all the interesting bits.


I am (perhaps naively) imagining that the user will train the database over the 
same query as the one that will actually get used most often in production.  In 
the case that the query modifies the table, the user could train the database 
over a copy of that table.  The data collected by the analyze phase would just 
be constant stuff like average and stddev.  That would make the job of the 
planner / cost estimator easier, right?  It could treat the function as a 
constant cost function.



One other point is that we already know that sampling overhead and
measurement error are significant problems when trying to measure
intervals on the order of one Plan-node execution.  I'm afraid that
would get a great deal worse if we try to use a similar approach to
timing individual function calls.


The query could be run with the arguments passed to myfunc being recorded to a 
temporary table.  After the query is complete (and the temporary table 
populated), data from the temp table could be pulled into memory in batches, 
with myfunc run on them again in a tight loop.  The loop itself could be timed, 
rather than each iteration.  The sum of the timings for the various loops would 
then be the final runtime, which would be divided by the total number of rows 
to get the average runtime per call.  The downside is that I don't see how you 
retrieve the standard deviation.  (I also don't know whether the planner can 
use standard deviation information, so perhaps this is a non-issue.)
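
A minimal sketch of that batch-timing idea, assuming a reasonably modern
PL/pgSQL and purely hypothetical names (myfunc, mytable, and the temp table
of captured arguments):

CREATE TEMP TABLE myfunc_args AS
    SELECT mt.a, mt.b FROM mytable mt;       -- arguments captured during the training query

DO $$
DECLARE
    n  bigint;
    t0 timestamptz;
    t1 timestamptz;
BEGIN
    SELECT count(*) INTO n FROM myfunc_args;
    IF n = 0 THEN
        RAISE NOTICE 'no captured arguments to replay';
        RETURN;
    END IF;
    t0 := clock_timestamp();
    PERFORM myfunc(a, b) FROM myfunc_args;   -- time one pass over the whole batch
    t1 := clock_timestamp();
    RAISE NOTICE 'average runtime per call: % ms',
        extract(epoch FROM (t1 - t0)) * 1000 / n;
END;
$$;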


A further refinement would be to batch the inputs based on properties of the 
input data.  For text, you could run a batch of short text first, a batch of 
medium-length text second, and a batch of long text last, and use a best-fit 
line to determine runtime cost as a function of input text length.  I'm not 
sure how such a refinement would be done for fixed-size datatypes.  And for 
some text functions the runtime won't vary with length but with some other 
property anyway.
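
For that best-fit step, the built-in linear regression aggregates would do; a
sketch, assuming a hypothetical table of per-call measurements
(input_length, runtime_ms):

SELECT regr_slope(runtime_ms, input_length)     AS ms_per_character,
       regr_intercept(runtime_ms, input_length) AS fixed_overhead_ms
FROM myfunc_timings;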


mark



Re: [HACKERS] Function execution costs 'n all that

2007-01-21 Thread Tom Lane
I complained about how:
 The query is

 SELECT p1.opcname, p1.opcfamily
 FROM pg_opclass AS p1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS p2
                  WHERE p2.amopfamily = p1.opcfamily
                    AND binary_coercible(p1.opcintype, p2.amoplefttype));

 and investigation showed that the plan changed from (8.2 and before)

  Seq Scan on pg_opclass p1  (cost=0.00..393.94 rows=51 width=68)
    Filter: (NOT (subplan))
    SubPlan
      ->  Seq Scan on pg_amop p2  (cost=0.00..7.66 rows=2 width=0)
            Filter: ((amopfamily = $0) AND binary_coercible($1, amoplefttype))

 to

  Seq Scan on pg_opclass p1  (cost=0.00..393.94 rows=51 width=68)
    Filter: (NOT (subplan))
    SubPlan
      ->  Seq Scan on pg_amop p2  (cost=0.00..7.66 rows=2 width=0)
            Filter: (binary_coercible($1, amoplefttype) AND (amopfamily = $0))

Now that some function-cost smarts are in there, I expected to see the
plan go back to the first case, but what I actually see in CVS HEAD is

 Seq Scan on pg_opclass p1  (cost=0.00..660.35 rows=51 width=68)
   Filter: (NOT (subplan))
   SubPlan
     ->  Bitmap Heap Scan on pg_amop p2  (cost=4.29..8.60 rows=2 width=0)
           Recheck Cond: (amopfamily = $0)
           Filter: binary_coercible($1, amoplefttype)
           ->  Bitmap Index Scan on pg_amop_fam_strat_index  (cost=0.00..4.29 rows=5 width=0)
                 Index Cond: (amopfamily = $0)

The reason this happens is that cost_qual_eval charges the entire cost of
evaluating all the arms of an AND, even though we'll drop out as soon as
something returns FALSE; and so the planner is led to avoid the seqscan
because it now appears to have a high filter-condition evaluation cost,
in favor of a plan that will evaluate the filter condition many fewer
times.  In reality those two plans will call binary_coercible() exactly
the same number of times, and so this is a bogus reason to switch.

I'm kind of inclined to leave it alone though, because the second plan
seems a bit more failsafe.  To do anything differently, we'd have to
order the qual conditions the way we expect to execute them before
any use of cost_qual_eval, which sounds expensive; and as noted in
an upthread discussion with Greg, relying on the correctness of *both*
cost and selectivity estimates seems a tad fragile.

Comments?

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-20 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote:
 maybe we should just do the constant for starters and see how many
 people really want to write C-code estimators ...

 +1

It seemed like that was the list's consensus, so I'll go off and do the
simple-constant case.  We can always extend it later.

For reference, the plan is to add these options to CREATE/ALTER
FUNCTION:

COST float-constant (defaults to 1)
ROWS float-constant (defaults to 1000)

feeding into two float4 columns added to pg_proc.  ROWS is only
allowed/meaningful for set-returning functions.  COST is implicitly
scaled by cpu_operator_cost.
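
For illustration, usage under that plan would look something like this (the
function names and bodies below are placeholders, not anything from this
thread):

CREATE FUNCTION slow_classify(text) RETURNS text AS $$
    SELECT lower($1);                  -- stand-in body; imagine something expensive here
$$ LANGUAGE sql
COST 1000;                             -- per call, in units of cpu_operator_cost

CREATE FUNCTION count_to(integer) RETURNS SETOF integer AS $$
    SELECT * FROM generate_series(1, $1);
$$ LANGUAGE sql
COST 100
ROWS 25;                               -- expected rows per call (set-returning only)

On a release carrying this feature, the stored values show up as ordinary
pg_proc columns (procost and prorows):

SELECT proname, procost, prorows
FROM pg_proc
WHERE proname IN ('slow_classify', 'count_to');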

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-17 Thread Ron Mayer
Tom Lane wrote:
 
 BTW, I'm thinking that a cost constant probably ought to be measured
 in units of cpu_operator_cost.  The default for built-in functions would
 thus be 1, at least till such time as someone wants to refine the
 estimates.  We'd probably want the default for PL and SQL functions to
 be 10 or 100 or so.

Any chance that costs could eventually change to real-world units?

It's hard for me to guess how many cpu_operator_cost units
something might take, but relatively easy for me to measure
or estimate how long something takes in fractions of a second.

I could imagine having the other planner costs be measured in seconds
too - perhaps with the goal of eventually writing some auto-config
code that tries to measure values like cpu_tuple_cost on a given
piece of hardware.




Re: [HACKERS] Function execution costs 'n all that

2007-01-17 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 BTW, I'm thinking that a cost constant probably ought to be measured
 in units of cpu_operator_cost.

 Any chance that costs could eventually change to real-world units?

Define "real-world units".

If you like you can try to adjust things so that cpu_operator_cost bears
some relation to elapsed-msec-on-your-own-machine, and scale everything
else accordingly; indeed that's why we added seq_page_cost in 8.2.
But it's awfully hard to see the numbers being very portable.
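
A sketch of what that rescaling might look like in practice; every value
below is invented rather than measured, and only the relative ratios matter:

-- hypothetical settings only; the idea is to scale all the constants by the
-- same factor so one cost unit roughly maps onto elapsed msec on this box
SET cpu_operator_cost = 0.0005;
SET cpu_tuple_cost    = 0.002;
SET seq_page_cost     = 0.2;
SET random_page_cost  = 0.8;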

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-17 Thread Jeff Davis
On Mon, 2007-01-15 at 13:54 -0500, Neil Conway wrote:
 On Mon, 2007-01-15 at 10:51 -0800, Richard Troy wrote:
  I therefore propose that the engine evaluate -
  benchmark, if you will - all functions as they are ingested, or
  vacuum-like at some later date (when valid data for testing may exist),
  and assign a cost relative to what it already knows - the built-ins, for
  example.
 
 That seems pretty unworkable. It is unsafe, for one: evaluating a
 function may have side effects (inside or outside the database), so the

Would any form of cost estimate have meaning if the function has side
effects? If it's a volatile function, doesn't that mean that the planner
can't avoid or favor executing it?

Regards,
Jeff Davis




Re: [HACKERS] Function execution costs 'n all that

2007-01-17 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 Would any form of cost estimate have meaning if the function has side
 effects? If it's a volatile function, doesn't that mean that the planner
 can't avoid or favor executing it?

No, not really.  If the function is down inside a sub-select or
something like that, the number of executions could depend on any number
of factors (like whether we put it on the inside or outside of a join)
--- and if it's expensive then we will and should try to arrange the
query to minimize the number of executions.  We're certainly not going
to drop back to all-plain-nestloops just because the query contains one
volatile function.

(Now mind you, a smart user will probably avoid using a volatile
function like that, but I don't think it's an adequate argument for
saying that we don't need cost information.)

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-16 Thread Mark Cave-Ayland
On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote:
 Brian Hurt [EMAIL PROTECTED] writes:
  Non-developer here, but we use a lot of plpgsql functions at work.  And 
  the functions we use fall into two broad, ill-defined categories: 
  expensive functions and cheap functions.  What I'd like as a user is 
  some way to tell the planner "this function is expensive - prefer plans 
  which call this function less even if they're otherwise more expensive" 
  or "this function is cheap - prefer plans that are otherwise less 
  expensive even if they call this function more often."  Precise cost 
  estimates aren't that important, IMHO.
 
 Right, so a plain constant cost would be plenty for your situation.
 
 I suspect there's an 80/20 rule at work here --- the estimator-function
 side of this will take most of the effort to design/implement, but not
 get used nearly as much as the plain-constant form ... maybe we should
 just do the constant for starters and see how many people really want to
 write C-code estimators ...
 
   regards, tom lane


Hi Tom et al,

Having worked with stored procedures on large datasets for reporting, I
would say that it would be useful to have a non-constant estimator for
the number of rows, whereas a single CPU cost constant should be fine.
Where I have struggled with this has been joining onto slightly more
exotic queries when doing large-scale data processing as part of a
custom report or an application upgrade.

Using PL/pgSQL I would find it useful to have access to the constants
passed into a function in order to help provide a row count estimate
(typically useful for passing in table/column names), e.g.

SELECT * FROM my_func('my_table1') AS t1, my_table2 AS t2 WHERE t1.id =
t2.id;


CREATE FUNCTION my_func(text) AS $$
...
$$ LANGUAGE 'plpgsql' COST 1.0 ROWS my_func_row_cost;


In my cost function, I could then estimate the number of rows using
something like below, where all constants are passed into the cost
function as parameters, e.g.:


CREATE FUNCTION my_func_row_cost(text) RETURNS bigint AS $$
DECLARE
    foo bigint;
BEGIN
    -- count the rows of the table whose name was passed in as a constant
    EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident($1) INTO foo;
    RETURN foo;
END;
$$ LANGUAGE 'plpgsql';


In the case where a parameter is not a constant but a column name, it
would be reasonable in my mind to simply replace that parameter with
NULL when passing it to the row cost function, e.g.


SELECT * FROM my_table1 WHERE my_table1.junk = (SELECT
my_func(my_table1.name));


In this case, the text parameter passed to my_func_row_cost would be
replaced by NULL to indicate that it was non-constant.

Of course, even with constants passed in, it still may not be possible
to calculate the number of rows that will be returned - it could be the
case that the only parameter passed to the cost function has been
converted to NULL because it is a column name. Perhaps in this case it
would be useful to specify that returning NULL from my_func_row_cost
means "I can't return anything meaningful, so use the fallback values".


Kind regards,

Mark.





Re: [HACKERS] Function execution costs 'n all that

2007-01-16 Thread Gregory Stark

 Tom Lane wrote:
  Instead, I'm thinking it might be time to re-introduce some notion of
  function execution cost into the system, and make use of that info to
  sort WHERE clauses into a reasonable execution order.

I imagine you've thought of this already but just in case, the cost of the
function call has to be combined with the selectivity to get this right. If
you can do an expensive but very selective clause first and save 100 cheap
calls that almost always return true it may still be worthwhile.
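
A toy calculation of that trade-off (numbers invented), for two independent
quals ANDed together and evaluated per row until one fails:

--   A: per-call cost  5, selectivity 0.90   (cheap, filters out little)
--   B: per-call cost 20, selectivity 0.05   (expensive, very selective)
-- expected per-row cost, A first:  5 + 0.90 * 20 = 23.0
-- expected per-row cost, B first: 20 + 0.05 *  5 = 20.25
-- so evaluating the expensive-but-selective clause first wins here, which a
-- cost-only ordering would get backwards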

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Function execution costs 'n all that

2007-01-16 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I imagine you've thought of this already but just in case, the cost of the
 function call has to be combined with the selectivity to get this right. If
 you can do an expensive but very selective clause first and save 100 cheap
 calls that almost always return true it may still be worthwhile.

I've thought of it, but I haven't figured out a reasonable algorithm for
ordering the clauses in view of that.  Have you?

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-16 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 I imagine you've thought of this already but just in case, the cost of the
 function call has to be combined with the selectivity to get this right. If
 you can do an expensive but very selective clause first and save 100 cheap
 calls that almost always return true it may still be worthwhile.

 I've thought of it, but I haven't figured out a reasonable algorithm for
 ordering the clauses in view of that.  Have you?

Hum, I hadn't tried. Now that I think about it, it's certainly not obvious.

And picturing the possible failure modes, I would rather it execute cheap
expressions more often than necessary than call some user-defined Perl
function that could be doing I/O, or waiting on other resources, any more
than absolutely necessary. So I guess what you originally described is
safest.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Heikki Linnakangas

Tom Lane wrote:

Instead, I'm thinking it might be time to re-introduce some notion of
function execution cost into the system, and make use of that info to
sort WHERE clauses into a reasonable execution order.  


That sounds like a straightforward idea.


This example
would be fixed with even a very stupid rule-of-thumb about SQL functions
being more expensive than C functions, but if we're going to go to the
trouble it seems like it'd be a good idea to provide a way to label
user-defined functions with execution costs.


Agreed.


Would a simple constant value be workable, or do we need some more
complex model (and if so what)?


A simple constant would probably be enough. If we want anything fancier 
than that, it should be up to the author of the function to define the 
cost model as well. I'm envisioning that you could attach a custom cost 
function to a user-defined function which would return the estimated CPU 
cost. And # of rows returned for a set-returning function.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Would a simple constant value be workable, or do we need some more
 complex model (and if so what)?

 A simple constant would probably be enough. If we want anything fancier 
 than that, it should be up to the author of the function to define the 
 cost model as well. I'm envisioning that you could attach a custom cost 
 function to a user-defined function which would return the estimated CPU 
 cost. And # of rows returned for a set-returning function.

But what will such an estimation function work on?  In general the
planner does not have the value(s) that will be passed to the actual
function at runtime.  It might have expressions or estimates but
those data structures are certainly not something we could pass to
non-C-coded functions.  Are we willing to restrict these functions
to being coded in C, as selectivity estimation functions are?

If we go this route it seems like we'll need four more columns in
pg_proc (cost estimation function OID, rowcount estimation function OID,
fallback cost constant, fallback rowcount constant).

BTW, I'm thinking that a cost constant probably ought to be measured
in units of cpu_operator_cost.  The default for built-in functions would
thus be 1, at least till such time as someone wants to refine the
estimates.  We'd probably want the default for PL and SQL functions to
be 10 or 100 or so.

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
A simple constant would probably be enough. If we want anything fancier 
than that, it should be up to the author of the function to define the 
cost model as well. I'm envisioning that you could attach a custom cost 
function to a user-defined function which would return the estimated CPU 
cost. And # of rows returned for a set-returning function.


But what will such an estimation function work on?  In general the
planner does not have the value(s) that will be passed to the actual
function at runtime.  It might have expressions or estimates but
those data structures are certainly not something we could pass to
non-C-coded functions.   Are we willing to restrict these functions
to being coded in C, as selectivity estimation functions are?


Yeah, I don't know. If the planner knows the actual value, that would 
certainly be the easiest for the UDF writer to work with. Anything more 
than that gets really complicated.



If we go this route it seems like we'll need four more columns in
pg_proc (cost estimation function OID, rowcount estimation function OID,
fallback cost constant, fallback rowcount constant).


What would the fallbacks be for?


BTW, I'm thinking that a cost constant probably ought to be measured
in units of cpu_operator_cost.  The default for built-in functions would
thus be 1, at least till such time as someone wants to refine the
estimates.  We'd probably want the default for PL and SQL functions to
be 10 or 100 or so.


Agreed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 If we go this route it seems like we'll need four more columns in
 pg_proc (cost estimation function OID, rowcount estimation function OID,
 fallback cost constant, fallback rowcount constant).

 What would the fallbacks be for?

By "fallback" I meant: this is what to use if no estimation function is
provided.

I'm envisioning that the CREATE FUNCTION syntax would add optional
clauses

COST function-name-or-numeric-constant
ROWS function-name-or-numeric-constant

that would be used to fill these columns.

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Richard Troy
On Mon, 15 Jan 2007, Tom Lane wrote:

 So I've been working on the scheme I suggested a few days ago of
 representing equivalence classes of variables explicitly, and avoiding
 the current ad-hocery of generating and then removing redundant clauses
 in favor of generating only the ones we want in the first place.  Any
 clause that looks like an equijoin gets sent to the EquivalenceClass
 machinery by distribute_qual_to_rels, and not put into the
 restrictlist/joinlist data structure at all.  Then we make passes over
 the EquivalenceClass lists at appropriate times to generate the clauses
 we want.  This is turning over well enough now to pass the regression
 tests,

That was quick...

 In short, this approach results in a whole lot less stability in the
 order in which WHERE clauses are evaluated.  That might be a killer
 objection to the whole thing, but on the other hand we've never made
 any strong promises about WHERE evaluation order.

Showing my ignorance here, but I've never been a fan of syntax-based
optimization, though it is better than no optimization. If people are
counting on clause order for optimization, then, hmmm... If you can
provide a way to at least _try_ to do better, then don't worry about it.
It will improve with time.

 Instead, I'm thinking it might be time to re-introduce some notion of
 function execution cost into the system, and make use of that info to
 sort WHERE clauses into a reasonable execution order.

Ingres did/does it that way, IIRC. It's a solid strategy.

  This example
 would be fixed with even a very stupid rule-of-thumb about SQL functions
 being more expensive than C functions, but if we're going to go to the
 trouble it seems like it'd be a good idea to provide a way to label
 user-defined functions with execution costs.

 Would a simple constant value be workable, or do we need some more
 complex model (and if so what)?

Ingres would, if I'm not mistaken, gain that knowledge through historical
use, via histograms. Short of that, you've got classes of functions -
aggregations, for example - and there's sure to be missing information for
making a great decision at planning time. However, I take it that the cost
here is primarily CPU and not I/O. I therefore propose that the engine
evaluate - benchmark, if you will - all functions as they are ingested, or
vacuum-like at some later date (when valid data for testing may exist),
and assign a cost relative to what it already knows - the built-ins, for
example. Doing so could allow this strategy to be functional in short
order and be improved with time, so all the work doesn't have to be
implemented on day 1. And DBA/sys-admin tweaking can always be done by
updating the catalogues.

HTH,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/




Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Neil Conway
On Mon, 2007-01-15 at 10:51 -0800, Richard Troy wrote:
 I therefore propose that the engine evaluate -
 benchmark, if you will - all functions as they are ingested, or
 vacuum-like at some later date (when valid data for testing may exist),
 and assign a cost relative to what it already knows - the built-ins, for
 example.

That seems pretty unworkable. It is unsafe, for one: evaluating a
function may have side effects (inside or outside the database), so the
DBMS cannot just invoke user-defined functions at whim. Also, the
relationship between a function's arguments and its performance will
often be highly complex -- it would be very difficult, not to mention
computationally infeasible, to reconstruct that relationship
automatically, especially without any real knowledge about the
function's behavior.

-Neil





Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Richard Troy

On Mon, 15 Jan 2007, Neil Conway wrote:
 On Mon, 2007-01-15 at 10:51 -0800, Richard Troy wrote:
  I therefore propose that the engine evaluate -
  benchmark, if you will - all functions as they are ingested, or
  vacuum-like at some later date (when valid data for testing may exist),
  and assign a cost relative to what it already knows - the built-ins, for
  example.

 That seems pretty unworkable. It is unsafe, for one: evaluating a
 function may have side effects (inside or outside the database), so the
 DBMS cannot just invoke user-defined functions at whim. Also, the
 relationship between a function's arguments and its performance will
 often be highly complex -- it would be very difficult, not to mention
 computationally infeasible, to reconstruct that relationship
 automatically, especially without any real knowledge about the
 function's behavior.

 -Neil

Hi Neil,

Tom had already proposed:

 I'm envisioning that the CREATE FUNCTION syntax would add optional
 clauses

COST function-name-or-numeric-constant
ROWS function-name-or-numeric-constant

 that would be used to fill these columns.

I was considering these ideas in the mix; let the user provide either a
numeric or a function, the distinction here being that instead of running
that function at planning-time, it could be run off-line, so to speak.

Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/




Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Brian Hurt

Neil Conway wrote:

 On Mon, 2007-01-15 at 10:51 -0800, Richard Troy wrote:

  I therefore propose that the engine evaluate -
  benchmark, if you will - all functions as they are ingested, or
  vacuum-like at some later date (when valid data for testing may exist),
  and assign a cost relative to what it already knows - the built-ins, for
  example.

 That seems pretty unworkable. It is unsafe, for one: evaluating a
 function may have side effects (inside or outside the database), so the
 DBMS cannot just invoke user-defined functions at whim. Also, the
 relationship between a function's arguments and its performance will
 often be highly complex -- it would be very difficult, not to mention
 computationally infeasible, to reconstruct that relationship
 automatically, especially without any real knowledge about the
 function's behavior.

Non-developer here, but we use a lot of plpgsql functions at work.  And 
the functions we use fall into two broad, ill-defined categories: 
expensive functions and cheap functions.  What I'd like as a user is 
some way to tell the planner "this function is expensive - prefer plans 
which call this function less even if they're otherwise more expensive" 
or "this function is cheap - prefer plans that are otherwise less 
expensive even if they call this function more often."  Precise cost 
estimates aren't that important, IMHO.


Brian



Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Tom Lane
Brian Hurt [EMAIL PROTECTED] writes:
 Non-developer here, but we use a lot of plpgsql functions at work.  And
 the functions we use fall into two broad, ill-defined categories:
 expensive functions and cheap functions.  What I'd like as a user is
 some way to tell the planner "this function is expensive - prefer plans
 which call this function less even if they're otherwise more expensive"
 or "this function is cheap - prefer plans that are otherwise less
 expensive even if they call this function more often."  Precise cost
 estimates aren't that important, IMHO.

Right, so a plain constant cost would be plenty for your situation.

I suspect there's an 80/20 rule at work here --- the estimator-function
side of this will take most of the effort to design/implement, but not
get used nearly as much as the plain-constant form ... maybe we should
just do the constant for starters and see how many people really want to
write C-code estimators ...

regards, tom lane



Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Neil Conway
On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote:
 maybe we should just do the constant for starters and see how many
 people really want to write C-code estimators ...

+1

BTW, your proposal would still push down all qualifiers, right?
Hellerstein's xfunc work discusses situations in which it makes sense to
pull up expensive qualifiers above joins, for example, in order to reduce
the number of tuples the qualifier is applied to. Unfortunately, this
would probably increase the optimizer's search space by a fairly
significant degree, so it might need to be controlled by a GUC variable,
or only applied when the estimated cost of applying a qualifier is
particularly large relative to the total estimated cost of the plan.

-Neil





Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 BTW, your proposal would still pushdown all qualifiers, right?

Yeah, I have no intention of readopting xfunc in the near future ...
especially seeing that it's possible for the user to force that
sort of thing if he really has to.

SELECT * FROM (SELECT ... OFFSET 0) ss
WHERE expensive_function(...);
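
Spelled out with hypothetical table and function names, that escape hatch
looks like:

SELECT *
FROM (SELECT id, payload
      FROM big_table
      WHERE cheap_filter(payload)
      OFFSET 0) ss                    -- OFFSET 0 keeps the subquery from being flattened,
WHERE expensive_function(ss.payload); -- so the expensive qual runs only on surviving rows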

regards, tom lane
