Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

PFC Wed, 10 May 2006 07:38:00 -0700

    The problem is that you need a set-returning function to retrieve
the values. SRFs don't have rowcount estimates, so the plans suck.


What about adding some way of rowcount estimation to SRFs, in the way of:

CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
$$ ... function code ... $$ LANGUAGE plpgsql
ROWCOUNT_ESTIMATOR $$ ... estimation code ... $$ ;

Internally, this could create two functions, foo (para, meters) and
estimate_foo(para, meters), that are in the same language and coupled
together (just like a SERIAL column and its sequence). The estimator
functions have an implicit return parameter of int8. Parameters may be
NULL when they are not known at query planning time.

What do you think about this idea?


        It would be very useful.
        A few thoughts...

        You need to do some processing to know how many rows the function
        would return. Often, this processing will be repeated in the function
        itself.

        Sometimes it's very simple (i.e. the function will RETURN NEXT each
        element in an array, and you know the array length...).
        Sometimes, for functions returning few rows, it might be faster to
        compute the entire result set in the cost estimator.

        
        So, it might be a bit hairy to find a good compromise.
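
As a rough sketch of the simple case above (not a proposal for actual syntax): a plpgsql function that does RETURN NEXT on each array element, followed by a query that works out the same rowcount from the array bounds alone. The name foo is borrowed from the example earlier in the thread; the parameter name vals and the sample array are made up for illustration.

```sql
-- Illustrative only: foo() emits one row per element of its array argument.
CREATE FUNCTION foo(vals integer[]) RETURNS SETOF integer AS $$
BEGIN
    -- array_lower/array_upper return NULL for an empty or NULL array,
    -- so fall back to bounds 1..0, which makes the loop run zero times.
    FOR i IN coalesce(array_lower(vals, 1), 1) .. coalesce(array_upper(vals, 1), 0) LOOP
        RETURN NEXT vals[i];
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- The rowcount follows from the array bounds alone, without running foo().
SELECT coalesce(array_upper(v, 1) - array_lower(v, 1) + 1, 0) AS expected_rows
FROM (SELECT ARRAY[10, 20, 30] AS v) AS s;

-- Running the function for comparison.
SELECT count(*) AS actual_rows FROM foo(ARRAY[10, 20, 30]);
```

Here the estimate costs almost nothing, but it duplicates knowledge that already lives in the function body, which is the repetition mentioned above.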

        Ideas on how to do this (clueless hand-waving mode) :

        1- Add new attributes to set-returning functions ; basically a list of
        functions, each returning an estimation parameter (rowcount, cpu tuple
        cost, etc).

        This is just like you said.

        2- Add an "estimator" to a function, which would just be another
        function, returning one row, a record, containing the estimations in
        several columns (rowcount, cpu tuple cost, etc).
        Pros : only one function call to estimate, easier and faster; the
        estimator just leaves the unknown columns NULL.
        The estimator need not be in the same language as the function itself.
        It's just another function.

        3- The estimator could be a set-returning function itself, which would
        return rows mimicking pg_statistic.
        Pros : planner-friendly; the planner would SELECT from the SRF instead of
        looking in pg_statistic, and the estimator could tell the planner that,
        for instance, the function will return unique values.

        Cons : complex, maybe slow

        4- Add simple flags to a function, like :
        - returns unique values
        - returns sorted values (no need to sort my results)

        - please execute me and store my results in temporary storage, count
          the rows returned, and plan the outer query accordingly

        - etc.
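
To make idea 2 a bit more concrete, here is a hand-waving sketch of what such an estimator could look like if it were written as an ordinary function. Nothing in it is wired into the planner. The name estimate_foo follows the naming suggested earlier in the thread; the parameter vals and the column names rowcount and cpu_tuple_cost are assumptions for illustration, and it assumes a foo(vals integer[]) that returns one row per array element.

```sql
-- Hypothetical estimator: returns a single record of estimates.
-- Columns the estimator has no opinion about are simply left NULL.
CREATE FUNCTION estimate_foo(
    vals integer[],
    OUT rowcount bigint,
    OUT cpu_tuple_cost float8
) AS $$
    SELECT (CASE
                -- Argument not known at planning time: no estimate.
                WHEN $1 IS NULL THEN NULL
                -- Known array: one row per element (0 for an empty array).
                ELSE coalesce(array_upper($1, 1) - array_lower($1, 1) + 1, 0)
            END)::bigint,
           NULL::float8;  -- per-tuple cost left unknown
$$ LANGUAGE sql;

-- Argument known: rowcount is 3, cpu_tuple_cost is NULL.
SELECT * FROM estimate_foo(ARRAY[10, 20, 30]);

-- Argument unknown: both columns are NULL.
SELECT * FROM estimate_foo(NULL::integer[]);
```

The estimator is written in SQL here although the function it describes might well be plpgsql, which is the point made in 2: the two need not share a language.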
        
