I will follow it. Thank you.

2009/12/23 Jan Urbański <wulc...@wulczer.org>
> Hi,
>
> I've been playing with using a Simulated Annealing-type algorithm for
> determining join ordering for relations. To get into context see
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00098.php
> (there's also a TODO in the wiki). There's a nice paper on that in
> http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf
> (also linked from that thread) and someone even wrote a patch:
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00736.php
>
> This generally aims at being able to replace GEQO.
>
> I have some rough prototype code, but I'm not even asking people to look
> at it. It's stuffed with silly things and writes three Graphviz-style
> .dot files in /tmp for each algorithm step. But I'd like to at least
> present the idea.
>
> There are three main problems that have to be solved to get a good
> Simulated Annealing implementation:
>  o) choosing the starting point for the algorithm
>  o) generating the next point
>  o) computing the cost at the current point
>
> The code I have now creates the initial plan by doing something similar
> to what gimme_tree does in GEQO, but without any kind of heuristic to
> try to favour joins with join restrictions (the idea is that it doesn't
> matter, since we only want to get *any* plan and we only do it once).
> Ideally, though, the starting point would be chosen randomly from the
> space of all possible join orderings.
>
> I keep a binary tree of relations in memory, where the leaves are
> RelOptInfos with one relid and the root is a RelOptInfo with all relids.
> Each iteration of the algorithm picks two nodes at random (with some
> restrictions, but that's not important) and tries to swap them around,
> meaning that a tree like (use a monospace font for best results):
>
>        (1 2 3 4)
>     *(1 2)   (3 4)
>    (1) (2) *(3) (4)
>
> where the parenthesised things are the two chosen nodes would get
> transformed into
>
>        (1 2 3 4)
>     (3)     (1 2 4)
>           (1 2)  (4)
>          (1) (2)
>
> that is, the (1 2) node and its subtree get swapped with the (3) node,
> and the upper-level nodes get changed accordingly. Sometimes a swap like
> that will produce an invalid join ordering - the swap is then reverted.
> If the join order given by the tree after the swap is legal, the paths
> are recomputed, much like in geqo_eval, and if the root node's cheapest
> path is not cheaper than before the swap, the swap gets reverted.
> Simulated Annealing algorithms permit uphill moves in terms of cost,
> with some probability that decreases as time passes, but that's not
> implemented yet. After a fixed number of moves, the algorithm stops and
> returns the root node of the tree as the single RelOptInfo.
>
> The issues with the approach are:
>
>  o) the initial state is not really a random plan, it's usually a
> left-deep tree (and is very inefficient), and this might skew results.
>
>  o) is swapping random nodes like that a good way of exploring the
> solution space? The SA paper suggests something much simpler, but some
> of the moves proposed there don't really make sense when using the
> make_join_rel machinery:
>   *) changing the inner and outer relation of a join doesn't make
> sense, because make_join_rel is symmetrical
>   *) changing the join execution strategy (nested loop, merge join,
> etc.) doesn't make sense, because make_join_rel considers all possible
> paths for a join
>
>  o) each swap needs a full recalculation of the whole solution
> (geqo_eval does that, for instance); maybe it's possible to leave the
> subtrees of the swapped nodes intact and only recalculate the nodes
> above them?
>
>  o) because make_join_rel scribbles on the PlannerInfo, the same hack as
> in geqo_eval is necessary: truncating join_rel_list after building all
> joinrels and nulling out the join_rel_hash
>
>  o) I use make_join_rel to determine whether a join is legal, but it
> does lots of other things. That looks like wasted effort, although in
> the end I need to build the final rel each time to assess the resulting
> path's cost. To follow the spirit of the SA algorithm more closely it
> would be necessary to only consider one random path for each relation,
> and to make using different paths a move that the algorithm can choose
> while exploring the solution space. A cheaper way of determining the
> current solution's cost would be nice, too - fully rebuilding the final
> RelOptInfo after each move is annoying.
>
> Lastly, I'm lacking good test cases or even a testing approach: I'm
> generating silly queries and looking at how they get optimised, but if
> someone has a real dataset and examples of joins that cannot be planned
> with the standard planner, I would be interested to compare the results
> my prototype gets with those produced by GEQO.
>
> The code, a module that you can LOAD into a backend, is here (if you
> really want to see it):
> http://git.wulczer.org/?p=saio.git
> Set the saio GUC to true and saio_cutoff to the number of iterations you
> want.
>
> Cheers,
> Jan
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers