On 5/15/2013 6:25 PM, Michael Kay wrote:
[about the optimizer that it's]
making pure guesses based on observed behaviour rather than hard data - and by doing so, is reinforcing that behaviour. It's a black art.)

This is very insightful. We tend to think of the optimizer as "go-faster sauce," and often underestimate the impact that optimizers have, or should have, on program design, when performance is critical.

A familiar (to me) example of this is which indexes get built in persistent data stores. MarkLogic, for example, builds automatic indexes on all element names plus their words/values, and on all element/attribute name pairs plus their words/values, and these enable all kinds of optimizations. But they aren't always the best choices. One thing we've had to grapple with is customers who use a particular attribute (id comes to mind) that can appear on any element and be the target of cross-references. In that case, we'd really want an index on all attributes named "id", regardless of the element name they're attached to. The ML indexes really do enforce a particular style of markup (if you want good performance easily). As another example, we tend to advise ML customers against using the same element name in different contexts, since such elements aren't as easily indexed. I don't mean to beat up on ML here (which now offers XPath-based indexes, just like eXist!); this is more in the way of illustrating a broader point:
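To make the "index on all attributes named id, regardless of element" idea concrete, here is a minimal sketch in Python (names and the toy document are mine, not MarkLogic's API): the index maps attribute values to elements independent of element name, so resolving a cross-reference becomes a dictionary lookup instead of a document scan.

```python
# Toy attribute-value index over an XML document: every attribute named
# "id" is indexed regardless of which element carries it.
from collections import defaultdict
import xml.etree.ElementTree as ET

DOC = """
<book>
  <chapter id="c1"><title id="t1">Intro</title></chapter>
  <section id="s1"><ref target="c1"/></section>
</book>
"""

def build_attr_index(root, attr_name):
    """Map each value of attr_name to the elements carrying it,
    independent of element name."""
    index = defaultdict(list)
    for elem in root.iter():
        value = elem.get(attr_name)
        if value is not None:
            index[value].append(elem)
    return index

root = ET.fromstring(DOC)
id_index = build_attr_index(root, "id")

# Resolving a cross-reference is now a single lookup rather than a
# full-document scan; note the hit is a chapter, not known in advance.
print([e.tag for e in id_index["c1"]])   # ['chapter']
```

An element-name-keyed index, by contrast, would need the querier to know (or enumerate) every element type that might carry an id.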

I wonder how much schema design has been / will be influenced by the availability of various optimizations (and indexing options) in such systems, and to what extent these schemas will be more or less tuned to the indexing options available on the platform where they were first used. Has there ever been any sort of attempt to study which kinds of indexes are most effective across some wide swath of use cases? I can't imagine how one would gather enough meaningful cases for that, so perhaps it's a mere pipe dream. By the same token, has there been any attempt to standardize the specification of XML indexes, as we have for SQL indexes? I guess we have the example of xsl:key -- that's really the only standard I know of.
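For readers who haven't used it: xsl:key standardizes exactly a named index declared by a match pattern and a "use" expression (e.g. <xsl:key name="by-code" match="item" use="@code"/>, queried with key('by-code', 'b')). A rough Python analogue of the key table an XSLT processor builds (the function name here is my invention, not part of any standard):

```python
# Sketch of xsl:key semantics: a named index built from a match pattern
# (which elements to index) and a "use" expression (what to index them by).
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = "<doc><item code='a'>one</item><item code='b'>two</item></doc>"

def declare_key(root, match_tag, use_attr):
    # Analogous to <xsl:key name="..." match="item" use="@code"/>
    table = defaultdict(list)
    for elem in root.iter(match_tag):
        table[elem.get(use_attr)].append(elem)
    return table

root = ET.fromstring(DOC)
by_code = declare_key(root, "item", "code")

# Analogous to key('by-code', 'b') in a stylesheet:
print(by_code["b"][0].text)   # two
```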

To echo what Daniela said in an earlier message in this thread, I think the key to helping users work with optimizers is twofold: make it apparent to the user (if they ask) what optimizations are being performed, so they can tell whether the optimizer is working for or against them; and provide tools for the user to specify particular optimizations, or to constrain the optimizer, at least in critical decisions. There are probably too many details to expose everything, but in the case of indexing optimizations in particular, the correct (or incorrect) choice can have such an overwhelming effect on performance that it is really important to give the user the ability to understand and control the execution plan.

Query plans can often be opaque and difficult for all but the most expert users to understand, though. This has historically been true for SQL query plans, as well, although I think visualization tools can sometimes help. I like the approach of expressing all query optimizations as built-in functions. In this way, an optimized query is just another query in the same language the user is familiar with, albeit with some special-purpose functions they have to learn in order to understand what the optimizations are.
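A minimal sketch of that "optimizations as functions" approach (all names here, like index_lookup, are invented for illustration, not any product's API): the optimizer rewrites an ordinary predicate into a call to an index-lookup built-in, so the optimized plan is still an expression the user can read in the query language itself.

```python
# Toy illustration of expressing an optimization as a built-in function:
# a predicate scan is rewritten into an explicit index lookup, and the
# rewritten form is still just an expression in the host language.
from collections import defaultdict

DATA = [{"id": "c1", "tag": "chapter"}, {"id": "s1", "tag": "section"}]

# The naive plan: scan every node and test the predicate.
def scan(pred):
    return [n for n in DATA if pred(n)]

# A precomputed index, and the "built-in" the optimizer may substitute.
ID_INDEX = defaultdict(list)
for n in DATA:
    ID_INDEX[n["id"]].append(n)

def index_lookup(value):
    return ID_INDEX[value]

# Unoptimized query:    scan(lambda n: n["id"] == "c1")
# Optimizer's rewrite:  index_lookup("c1")
# Same result; the rewrite simply makes the chosen index visible to the
# user in familiar notation, rather than in a separate plan formalism.
assert scan(lambda n: n["id"] == "c1") == index_lookup("c1")
print(index_lookup("c1")[0]["tag"])   # chapter
```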

-Mike
_______________________________________________
[email protected]
http://x-query.com/mailman/listinfo/talk