People: (HACKERS: please read this entire thread at http://archives.postgresql.org/pgsql-performance/2005-04/msg00179.php -- sorry for cross-posting.)

> > The larger point is that writing an estimator for an SRF is frequently
> > a task about as difficult as writing the SRF itself.
>
> True, although I think this doesn't necessarily kill the idea. If
> writing an estimator for a given SRF is too difficult, the user is no
> worse off than they are today. Hopefully there would be a fairly large
> class of SRFs for which writing an estimator would be relatively simple,
> and result in improved planner behavior.

For that matter, even supplying a constant estimate would be a vast
improvement over current functionality. I would suggest, in fact, that we
allow either a constant number or an estimator function in that column.
Among other things, this would allow implementing the constant number
right now and the estimating function later, in case we can do the one
but not the other for 8.1.

To be more sophisticated, the estimator function could take a subset of
the main function's arguments, selected by $1-style numbering. For
example:

    CREATE FUNCTION some_func ( INT, TEXT, TEXT, INT, INT ) ...

    ALTER FUNCTION some_func WITH ESTIMATOR some_func_est( $4, $5 )

This would make it easier to write estimators that work for several
functions. Estimators would be a special type of function that takes any
parameters and RETURNS ESTIMATOR, a type implicitly castable from some
general numeric type (like INT or FLOAT).

> > I don't foresee a whole lot of use of an estimator hook designed as
> > proposed here. In particular, if the API is such that we can only
> > use the estimator when all the function arguments are plan-time
> > constants, it's not going to be very helpful.

Actually, 95% of the time I use SRFs they are accepting constants and
not row references. And I use a lot of SRFs.

> Yes :( One approach might be to break the function's domain into pieces
> and have the estimator function calculate the estimated result set size
> for each piece.
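As a rough illustration of the constant-or-estimator idea above (this is a hypothetical sketch in Python, not PostgreSQL internals; `make_estimator` and the argument-index mapping are invented names mirroring the $4, $5 proposal):

```python
# Hypothetical sketch: a planner-side helper that resolves a per-function
# row estimate from either a flat constant or a callable estimator bound
# to a subset of the SRF's arguments by 1-based position.

def make_estimator(est, arg_indexes=None):
    """est: either a number (flat estimate) or a callable estimator.
    arg_indexes: which of the SRF's arguments (1-based, like $4, $5)
    to pass to the estimator; None means pass all of them."""
    if callable(est):
        def estimate(args):
            indexes = arg_indexes or range(1, len(args) + 1)
            picked = [args[i - 1] for i in indexes]
            return est(*picked)
        return estimate
    # Constant estimate: ignore the arguments entirely.
    return lambda args: est

# some_func(INT, TEXT, TEXT, INT, INT) with an estimator over $4, $5:
some_func_est = lambda a, b: a * b
est = make_estimator(some_func_est, arg_indexes=[4, 5])
print(est([1, 'x', 'y', 10, 20]))   # 200

# The fallback: a flat per-function constant, usable today.
flat = make_estimator(5000)
print(flat([1, 'x', 'y', 10, 20]))  # 5000
```

The point of the indirection is that the planner code consuming the estimate does not need to care which form was supplied.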
> So, given a trivial function like:
>
>     foo(int):
>         if $1 < 10 then produce 100 rows
>         else produce 10000 rows
>
> If the planner has encoded the distribution of input tuples to the
> function as a histogram, it could invoke the SRF's estimator function
> for the boundary values of each histogram bucket, and use that to get
> an idea of the function's likely result set size at runtime.
>
> And yes, the idea as sketched is totally unworkable :) For one thing,
> the difficulty of doing this grows rapidly as the number of arguments
> to the function increases. But perhaps there is some variant of this
> idea that might work...
>
> Another thought is that the estimator could provide information on the
> cost of evaluating the function, the number of tuples produced by the
> function, and even the distribution of those tuples.

Another possibility would be to support default values for all estimator
functions, and have functions called in row context passed DEFAULT, thus
leaving it up to the estimator writer to supply median values for those
cases. Or to simply take the "first" values and use those.

While none of these possibilities is ideal, each is an improvement over
the current flat 1000-row estimate. As I said, even the ability to set a
per-function flat constant estimate would be an improvement.

> BTW, why is this on -performance? It should be on -hackers.

'Cause I spend more time reading -performance, and I started the thread.
Crossed over now.

--
Josh Berkus
Aglio Database Solutions
San Francisco
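[Editorial aside: the histogram-bucket idea quoted above can be sketched as follows. This is a hypothetical illustration in Python, not planner code; `estimate_rows`, the bucket-midpoint choice, and the frequencies are all invented for the example.]

```python
# Hypothetical sketch: estimate an SRF's output size by evaluating its
# estimator across a histogram of the input column and weighting each
# bucket's estimate by the fraction of input rows falling in that bucket.

def foo_estimator(x):
    # Mirrors the example SRF: inputs < 10 produce 100 rows,
    # everything else produces 10000 rows.
    return 100 if x < 10 else 10000

def estimate_rows(estimator, boundaries, frequencies):
    """boundaries: bucket edges [b0, b1, ..., bn].
    frequencies: fraction of input rows per bucket (sums to 1.0).
    Uses each bucket's midpoint as a stand-in for the values in it;
    the quoted proposal used boundary values, which would work
    similarly."""
    total = 0.0
    for i, freq in enumerate(frequencies):
        midpoint = (boundaries[i] + boundaries[i + 1]) / 2.0
        total += freq * estimator(midpoint)
    return total

# If 50% of inputs fall in [0, 10) and 50% in [10, 20):
print(estimate_rows(foo_estimator, [0, 10, 20], [0.5, 0.5]))  # 5050.0
```

The combinatorial objection in the thread shows up directly here: with k arguments you would need a k-dimensional histogram, and the number of buckets to evaluate grows multiplicatively with each added argument.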