On Sun, September 3, 2006 21:52, Gregory Stark wrote:

> I read that but apparently I misunderstood it since it would not have been
> doable the way I understood it. I thought you wanted the predictor bits to
> correspond to particular plans. If a plan was "wrong" then you marked it
> as a
> bad guess. I don't think that can be made to work though for the reasons I
> stated then.

Oh, sorry--I guess I haven't been too systematic about it.  In the
algorithm's current incarnation, the confidence counters don't mark
anything as bad ideas, as such.  Instead, whenever a new invocation has a
set of parameters that are (1) correctly and (2) confidently predicted by
the counters, the predictor decides that it's probably a good idea to plan
for calls with that particular set of parameters built in as constants.
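
To make that concrete: in rough C, with made-up names and plenty of
simplification (real parameter values aren't plain integers, nulls are
ignored, and the 2-bit counter width is just my illustration), one
per-parameter predictor boils down to a last-seen value plus a small
saturating confidence counter:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uintptr_t Value;        /* stand-in for a parameter datum */

    typedef struct ParamPredictor
    {
        Value   last_value;         /* most recently seen value */
        int     confidence;         /* saturating counter, 0..3 */
    } ParamPredictor;

    #define CONF_MAX        3
    #define CONF_THRESHOLD  2       /* "confident" from this level up */

    /* Feed one observed parameter value; returns true iff the value
     * was both correctly and confidently predicted. */
    static bool
    predictor_observe(ParamPredictor *p, Value value)
    {
        if (p->last_value == value)
        {
            bool hit = (p->confidence >= CONF_THRESHOLD);

            if (p->confidence < CONF_MAX)
                p->confidence++;
            return hit;
        }
        p->last_value = value;      /* mispredict: retrain on new value */
        p->confidence = 0;
        return false;
    }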

Assuming we didn't already have a plan for the particular combination of
values at hand, the algorithm generates a new one.  The new plan is
optimized on the assumption that those predicted parameters are constants.

We keep a small cache of recently-used plans, possibly including the
original plan where all parameters are truly variable.  Every plan also
remembers the list of parameter values it assumed, so on any subsequent
call we can check whether a particular cached plan actually applies.  If
it doesn't (because of a mismatch between the incoming
parameters and the plan's assumed pseudo-constants), we just pick another
plan.
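
Continuing the same simplified sketch, each cache entry carries that
checklist, and the applicability test is a straight walk over it:

    typedef struct CachedPlan
    {
        int       nparams;      /* parameters in the statement */
        bool     *is_const;     /* per parameter: baked in as constant? */
        Value    *const_value;  /* assumed value where is_const is true */
        uint64_t  created;      /* when planned, for the age tiebreak */
        uint64_t  last_used;    /* for LRU eviction, see below */
    } CachedPlan;

    /* A cached plan applies to a call iff every pseudo-constant it
     * assumed matches the corresponding incoming parameter. */
    static bool
    plan_applies(const CachedPlan *plan, const Value *args)
    {
        for (int i = 0; i < plan->nparams; i++)
            if (plan->is_const[i] && plan->const_value[i] != args[i])
                return false;
        return true;
    }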

If multiple cached plans can be applied to a given call, we prefer the one
that optimizes away the most parameters.  Age is used as a tiebreaker, on
the assumption that more recent planning information is likely to be more
accurate.
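
In sketch form the preference rule is a simple scan, with "created"
standing in for whatever age stamp the real code would keep:

    static int
    count_consts(const CachedPlan *plan)
    {
        int n = 0;

        for (int i = 0; i < plan->nparams; i++)
            n += plan->is_const[i] ? 1 : 0;
        return n;
    }

    /* Among applicable plans, prefer the one with the most
     * pseudo-constants; break ties in favour of the newer plan. */
    static CachedPlan *
    choose_plan(CachedPlan **cache, int ncached, const Value *args)
    {
        CachedPlan *best = NULL;

        for (int i = 0; i < ncached; i++)
        {
            CachedPlan *p = cache[i];

            if (!plan_applies(p, args))
                continue;
            if (best == NULL ||
                count_consts(p) > count_consts(best) ||
                (count_consts(p) == count_consts(best) &&
                 p->created > best->created))
                best = p;
        }
        return best;                /* NULL: nothing in the cache fits */
    }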

The tricky part is deciding when to generate a new, more specialized plan
when we already have a matching one that may not be optimal.  Without this
step we'd never get beyond that first, completely generic plan--it applies
to every call.  The way I've approached it is this: when the predictor's
current state correctly and confidently predicts more of the invocation's
parameter values than any of the cached plans did, then it's time to
generate a new plan.
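
In the sketch, that trigger is just a comparison of two counts.  Note
that observing the arguments also trains the predictors, so this would
run exactly once per call:

    /* True if the predictor currently gets more parameters right, with
     * confidence, than the best matching plan already optimizes away. */
    static bool
    want_new_plan(ParamPredictor *preds, const Value *args, int nparams,
                  const CachedPlan *best)
    {
        int predicted = 0;

        for (int i = 0; i < nparams; i++)
            if (predictor_observe(&preds[i], args[i]))
                predicted++;
        /* having no applicable plan at all also forces planning */
        return predicted > (best ? count_consts(best) : -1);
    }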

So let's say your statement has a parameter x that's always the same
value, say x==0, and another parameter y that's a randomly alternating
Boolean value, and another one z that varies randomly between lots of
values.  What order they're in doesn't matter, and they needn't be the
only parameters.  You're probably going to see four plans generated for
this example (a toy trace follows the list):

1. Either on the first call or during definition, you get the generic
plan.  This is the same plan you'd get in the existing backend, with
placeholders for the variables x, y, and z.

2. Pretty soon, the algorithm is going to detect that x is always zero. 
It will generate a new plan, substituting the constant value 0 for x, and
hopefully getting better optimization because of it.

3. Sooner or later (probably fairly soon) you'll see a run of consecutive
calls where y happens to be "true."  A new plan is generated with the
assumption that y==true.  The new plan will also still assume that x==0.

4. The same is going to happen for y==false.  Yet another specialized plan
is generated.  If we keep up to 3 plans per statement, say, then this new
plan overflows the cache.  The least recently used plan is flushed to make
room for the new one--in this case, the generic one because we haven't
seen any cases where x!=0 recently.
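
Here's a toy driver for that call pattern, wired to the predictor
sketch from earlier in this mail.  It only prints when a parameter is
confidently predicted, which is exactly the moment where the real
algorithm would consider cutting a new plan; rand() stands in for the
workload, so the details vary with the seed:

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        ParamPredictor preds[3] = {{0, 0}, {0, 0}, {0, 0}};
        const char *name[3] = {"x", "y", "z"};

        srand(1);
        for (int call = 1; call <= 24; call++)
        {
            Value args[3];

            args[0] = 0;                     /* x: always the same */
            args[1] = (Value) (rand() & 1);  /* y: random boolean */
            args[2] = (Value) rand();        /* z: highly variable */

            for (int i = 0; i < 3; i++)
                if (predictor_observe(&preds[i], args[i]))
                    printf("call %2d: %s confidently predicted (%lu)\n",
                           call, name[i], (unsigned long) args[i]);
        }
        return 0;
    }

Expect x to show up from the third call on, y during sufficiently long
runs of equal values, and z essentially never--mirroring plans 2
through 4 above.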

More complex scenarios will also happen, of course, such as "if y==true
then x will usually be 0, but otherwise x will be highly variable" or "if
y==true then x is pseudo-constant and z is highly variable, but if
y==false then it's the other way around" or "if y==false then z is usually
the empty string," and so on.  The predictor as I've simulated it is "too
dumb to be intimidated" by the complexity.  It should work reasonably well
for all those scenarios, assuming of course that its cache is large enough
to remember the most frequent patterns.

Right now I'm using the plans' time since last use as the only eviction
criterion when the cache overflows.  There may be a better policy; the one
Achilles heel of LRU is the "big loop" where every cache entry is used
once, then evicted shortly before it is needed again.  (If the loop is so
big that entries are flushed long before they're needed again, well, then
it's just a big job and you stop blaming LRU :-)
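
For completeness, the eviction in the sketch is the textbook linear
scan over the (tiny) cache:

    /* On overflow, drop the entry whose last_used stamp is oldest. */
    static int
    lru_victim(CachedPlan **cache, int ncached)
    {
        int victim = 0;

        for (int i = 1; i < ncached; i++)
            if (cache[i]->last_used < cache[victim]->last_used)
                victim = i;
        return victim;
    }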


> But if you have something working clearly that's not what you're doing. So
> what are you doing? Storing up a list of arguments seen for each parameter
> when executed and use the predictor bits to determine if any of those
> arguments are constants? Storing up a list of selectivity estimates?

The former.  I'm keeping a single predictor with a single "more or less
last-seen value" per parameter; plus a checklist of pseudoconstants for
every cached plan.  It's pretty simple, really, with no cost functions or
spanning trees or other intelligent logic--and certainly nothing original.
That's what makes me cautiously optimistic: it's not so hard to come up
with good or original ideas, but ones that are good *and* original are
exceedingly rare.  :)

This particular algorithm is based entirely on standard processor
architecture tricks.  One (to me) surprising lesson I learned in that
field was that simple, dynamic schemes with obvious flaws are often more
effective than what the combination of a smart programmer, an expert user,
and an aggressive compiler can come up with beforehand.  Another one was
that performance breakthroughs often come from applying these standard
tricks to problems you'd think they'd already been applied to.


Jeroen


