[HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Julius Stroffek

Hi All,

Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
the ideas about other possible optimizer algorithms to be used
in PostgreSQL.

We are quite new to PostgreSQL project so it took us some
time to go through the sources end explore the possibilities
how things could be implemented.

There is a proposal attached to this mail about the interface
we would like to implement for switching between different
optimizers. Please review it and provide a feedback to us.
Thank You.

Regards

Julius Stroffek

Proposal for Pluggable Optimizer Interface
==

Overview


We have presented at PGCon 2007 in Ottawa couple of other
approaches and algorithms that can be used for query optimization,
see 
http://www.pgcon.org/2008/papers/Execution_Plan_Optimization_Techniques_Julius_Stroffek.pdf
We have focused on algorithms for searching the space of
all possible orders of joins including bushy trees.
The possible algorithms include

 * Dynamic Programming (already implemented
   in PostgreSQL)
 * Genetic Algorithm (already implemented
   in PostgreSQL)
 * Dijkstra's Algorithm
 * A* Search Algorithm
 * Greedy “Nearest Neighbor” Algorithm
 * Hill-Climbing
 * Simulated Annealing
 * Iterative Improvement
 * Two Phase Optimization
 * Toured Simulated Annealing

Choosing the best algorithm from the above list is
difficult. We have to consider the length of the optimizer
computation vs. the quality of the solution which we would
like to achieve. We may want to do some hybrid optimization
– run couple of the algorithms from the above and choose
the best solution found. We might run some very fast
algorithm at the beginning and depending on the solution
cost we may decide whether it is worthwhile to try
to optimize the plan even more (using other algorithm
with longer running time but producing better solutions).
Therefore we would like to propose an interface which
can be used to switch between different optimizer algorithms
and/or allow a user to write and use his own implementation.
It would allow the community to ship more optimizer
algorithms as contrib modules and users may then decide
which of those algorithms should be used for their queries.

Creating an optimizer
=

We would propose to create a catalog holding the available
optimizers in the system called pg_optimizer. Users could
than use a SET command to switch between different
optimizers.

postgres=# select * from pg_optimizer;
 optname |  optproc
-+---
 geqo| geqo_optimizer
 dynamic | dynamic_optimizer
 greedy  | greedy_optimizer
(4 rows)

postgres=# set optimizer=greedy;
SET


Optimizer Invocation Point
==
There is a code in function make_rel_from_joinlist which
decides whether to invoke dynamic programming or genetic
algorithm for query optimization. We would propose to place
the invocation of the plugged optimizer to the same place
and with the same parameters as function geqo and
make_one_rel_by_joins are currently invoked.

Creating and dropping an optimizer
==
The optimizer function have to be implemented as a C-Language
Function using “Version 1 Calling Conventions”. The return
type of the function is RelOptInfo * and the arguments
passed to the function are

 1.PlannerInfo *root
 2.int levels_needed
 3.List * initial_rels

The proper “CREATE FUNCTION” statement have to be used
to create the optimizer function.

 CREATE FUNCTION greedyoptimizer(internal, int, internal)
 RETURNS internal
 AS 'mylib', 'greedy_optimizer'
 LANGUAGE C
 ;

Once, the optimizer function is created user may create
an optimizer using the function with the statement

 CREATE OPTIMIZER greedy (
 function = greedyoptimizer
 comment = 'Greedy Nearest Neighbor Optimizer'
 );

If the user decides not to use the optimizer anymore
he can invoke

 DROP OPTIMIZER greedy;

User have to also drop the optimizer function with

 DROP FUNCTION greedyoptimizer;

Project TODO List
=
1.Create a pg_optimizer catalog to hold available
  optimizers.
2.Create wrappers above the current dynamic
  programming and genetic algorithm optimizers
  to be used to call those implementations.
3.Modify the parser and add the functions to handle
  and execute the CREATE/DROP OPTIMIZER statements.
4.Modify GUC that it would be possible to switch
  between optimizers.
5.Change the code at the optimizer invocation
  point that the appropriate optimizer function
  would be called.
6.Handle object dependencies – make an entry
  in pg_depend that optimizer depends on its
  function.
7.Implement '\dO' command that will list the
  available optimizers.
8.Create a contrib module and ship some other
  optimizer algorithms.
9.Any other suggestion, comments and changes
  that will come out from the review of this
  proposal.

Things to Decide

1.Rights. Who can create/drop optimizers?
  Who can use 

Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Stefan Kaltenbrunner
Julius Stroffek wrote:
 Hi All,
 
 Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
 the ideas about other possible optimizer algorithms to be used
 in PostgreSQL.
 
 We are quite new to PostgreSQL project so it took us some
 time to go through the sources end explore the possibilities
 how things could be implemented.
 
 There is a proposal attached to this mail about the interface
 we would like to implement for switching between different
 optimizers. Please review it and provide a feedback to us.
 Thank You.

hmm - how does is that proposal different from what got implemented with:

http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php


Stefan

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Julius Stroffek wrote:
 There is a proposal attached to this mail about the interface
 we would like to implement for switching between different
 optimizers. Please review it and provide a feedback to us.

 hmm - how does is that proposal different from what got implemented with:
 http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php

Well, it's a very different level of abstraction.  The planner_hook
would allow you to replace the *entire* planner, but if you only want to
replace GEQO (that is, only substitute some other heuristics for partial
search of a large join-order space), doing it from planner_hook will
probably require duplicating a great deal of code.  A hook right at the
place where we currently choose geqo or regular would be a lot easier
to experiment with.

Replacing GEQO sounds like a fine area for investigation to me; I've
always been dubious about whether it's doing a good job.  But I'd prefer
a simple hook function pointer designed in the same style as
planner_hook (ie, intended to be overridden by a loadable module).
The proposed addition of a system catalog and SQL-level management
commands sounds like a great way to waste a lot of effort on mere
decoration, before ever getting to the point of being able to
demonstrate that there's any value in it.  Also, while we might accept
a small hook-function patch for 8.3, there's zero chance of any of that
other stuff making it into this release cycle.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Julius Stroffek

Stefan,

thanks for pointing this out. I missed this change.

We would like to place the hooks to a different place in the planner and 
we would like to just replace the non-deterministic algorithm searching 
for the best order of joins and keep the rest of the planner untouched.


I am not quite sure about the usage from the user point of view of what 
got implemented. I read just the code of the patch. Are there more 
explanations somewhere else?


I understood that if the user creates his own implementation of the 
planner which can be stored in some external library, he have to provide 
some C language function as a hook activator which will assign the 
desired value to the planner_hook variable. Both, the activator function 
and the new planner implementation have to be located in the same 
dynamic library which will be loaded when CREATE FUNCTION statement 
would be used on hook activator function.


Am I correct? Have I missed something?

If the above is the case than it is exactly what we wanted except we 
would like to have the hook also in the different place.


There are more things in the proposal as a new pg_optimizer catalog and 
different way of configuring the hooks. However, this thinks are not 
mandatory for the functionality but are more user friendly.


Thanks

Julo

Stefan Kaltenbrunner wrote:

Julius Stroffek wrote:
  

Hi All,

Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
the ideas about other possible optimizer algorithms to be used
in PostgreSQL.

We are quite new to PostgreSQL project so it took us some
time to go through the sources end explore the possibilities
how things could be implemented.

There is a proposal attached to this mail about the interface
we would like to implement for switching between different
optimizers. Please review it and provide a feedback to us.
Thank You.



hmm - how does is that proposal different from what got implemented with:

http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php


Stefan
  


Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Tom Lane
Julius Stroffek [EMAIL PROTECTED] writes:
 I understood that if the user creates his own implementation of the 
 planner which can be stored in some external library, he have to provide 
 some C language function as a hook activator which will assign the 
 desired value to the planner_hook variable. Both, the activator function 
 and the new planner implementation have to be located in the same 
 dynamic library which will be loaded when CREATE FUNCTION statement 
 would be used on hook activator function.

You could do it that way if you wanted, but a minimalistic solution is
just to install the hook from the _PG_init function of a loadable
library, and then LOAD is sufficient for a user to execute the thing.
There's a small example at
http://archives.postgresql.org/pgsql-patches/2007-05/msg00421.php

Also, having the loadable module add a custom GUC variable would likely
be a preferable solution for control purposes than making specialized
functions.  I attach another small hack I made recently, which simply
scales all the planner's relation size estimates by a scale_factor GUC;
this is handy for investigating how a plan will change with relation
size, without having to actually create gigabytes of test data.

 There are more things in the proposal as a new pg_optimizer catalog and 
 different way of configuring the hooks. However, this thinks are not 
 mandatory for the functionality but are more user friendly.

Granted, but at this point we are talking about infrastructure for
planner-hackers to play with, not something that's intended to be a
long-term API for end users.  It may or may not happen that we ever
need a user API for this at all.  I think a planner that just does the
right thing is far preferable to one with a lot of knobs that users
have to know how to twiddle, so I see this more as scaffolding on which
someone can build and test the replacement for GEQO; which ultimately
would go in without any user-visible API additions.

regards, tom lane

#include postgres.h

#include fmgr.h
#include commands/explain.h
#include optimizer/plancat.h
#include optimizer/planner.h
#include utils/guc.h


PG_MODULE_MAGIC;

void_PG_init(void);
void_PG_fini(void);

static double scale_factor = 1.0;

static void my_get_relation_info(PlannerInfo *root, Oid relationObjectId,
 bool 
inhparent, RelOptInfo *rel);


/*
 * Get control during planner's get_relation_info() function, which sets up
 * a RelOptInfo struct based on the system catalog contents.  We can modify
 * the struct contents to cause the planner to work with a hypothetical
 * situation rather than what's actually in the catalogs.
 *
 * This simplistic example just scales all relation size estimates by a
 * user-settable factor.
 */
static void
my_get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 RelOptInfo *rel)
{
ListCell   *ilist;

/* Do nothing for an inheritance parent RelOptInfo */
if (inhparent)
return;

rel-pages = (BlockNumber) ceil(rel-pages * scale_factor);
rel-tuples = ceil(rel-tuples * scale_factor);

foreach(ilist, rel-indexlist)
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ilist);

ind-pages = (BlockNumber) ceil(ind-pages * scale_factor);
ind-tuples = ceil(ind-tuples * scale_factor);
}
}


/*
 * _pg_init()   - library load-time initialization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_init(void)
{
/* Get into the hooks we need to be in all the time */
get_relation_info_hook = my_get_relation_info;
/* Make scale_factor accessible through GUC */
DefineCustomRealVariable(scale_factor,
 ,
 ,
 scale_factor,
 0.0001,
 1e9,
 PGC_USERSET,
 NULL,
 NULL);
}


/*
 * _PG_fini()   - library unload-time finalization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_fini(void)
{
/* Get out of all the hooks (just to be sure) */
get_relation_info_hook = NULL;
}

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Josh Berkus
Tom,

 Also, while we might accept
 a small hook-function patch for 8.3, there's zero chance of any of that
 other stuff making it into this release cycle.

I don't think anyone was thinking about 8.3.  This is pretty much 8.4 
stuff; Julius is just raising it now becuase they don't want to go down 
the wrong path and waste everyone's time.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Tom,
 Also, while we might accept
 a small hook-function patch for 8.3, there's zero chance of any of that
 other stuff making it into this release cycle.

 I don't think anyone was thinking about 8.3.  This is pretty much 8.4 
 stuff; Julius is just raising it now becuase they don't want to go down 
 the wrong path and waste everyone's time.

Well, if they get the hook in now, then in six months or so when they
have something to play with, people would be able to play with it.
If not, there'll be zero uptake till after 8.4 is released...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster