On Mon, Mar 01, 2010 at 11:44:51PM +0100, Jean-Christophe Deschamps scratched on the wall:

>> So indexes are not used for NOT conditions, as NOT conditions
>> generally require a full scan, regardless.  Yes, it is a simple
>> reverse of a binary test, but the reverse of a specific indexed
>> lookup of a known value is a table scan to gather all the unknown
>> values.
>
> I understand how your argument goes, but I still have difficulty buying
> it.  Why not negate the condition itself and then only proceed?
That seems to be mixing concepts a bit. Yes, you can use algebra to negate the base of a "NOT..." condition and transform it into an equivalent condition that lacks the negation, like transforming "NOT A > B" into "A <= B". That new condition can then be applied to the query, and the query engine may then choose to use an index, because the condition now defines the set of rows it is looking for, rather than the set of rows it is not looking for. But I'd argue that's not really the point, for several reasons.

First and foremost, the original conversation was about why indexes aren't used with "NOT..." conditions. If you go through this process, by the time you actually pass the expression off to the database engine, you no longer have a "NOT..." condition. You may have an expression that can be transformed (back) into a "NOT..." condition, but that's not the same thing at the database level.

Fine, you say, but shouldn't the query optimizer be able to produce that transformation, so the transform is made between the input level and the database level? Well, I suppose it is possible for simple expressions, but I'd argue that really isn't the business of the query optimizer -- at least not the one in SQLite. For one thing, as we've already seen, 3VL and those pesky NULLs make this kind of transformation a lot more complex than it first looks. But more to the point, it is only likely to work for fairly simple expressions. For a large number of expressions an equivalent transformed expression simply does not exist.

Take, for example, one of the previous expressions you gave: "NOT col1 LIKE 'abc%'". The LIKE operator is defined by the SQL function like(), which has no inverse function. The optimizations that use (or don't use) an index are deep inside the code that implements the like() function. You can't simply flip them; you need to start over.
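To make the scan-vs-search point (and the NULL wrinkle) concrete, here's a small sqlite3 session I sketched; the table and column names are made up for the demo:

```python
# Sketch (not from the original post): illustrating that the "base" form of a
# condition can use an index while the "NOT..." form falls back to a scan, and
# that neither form returns NULL rows under three-valued logic.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 INTEGER)")
con.execute("CREATE INDEX t_col1 ON t (col1)")
con.executemany("INSERT INTO t (col1) VALUES (?)", [(1,), (5,), (None,)])

# 1. The base condition is indexable; the NOT form is an opaque filter.
plan_base = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE col1 <= 3").fetchall()
plan_not = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE NOT col1 > 3").fetchall()
print(plan_base)  # a SEARCH ... USING ... INDEX line
print(plan_not)   # a SCAN line

# 2. 3VL: both forms agree row-for-row (the NULL row fails either test), but
# neither is "every row except those where col1 > 3" -- the NULL row belongs
# to no result set unless you ask for it explicitly.
a = con.execute("SELECT count(*) FROM t WHERE NOT col1 > 3").fetchone()[0]
b = con.execute("SELECT count(*) FROM t WHERE col1 <= 3").fetchone()[0]
c = con.execute(
    "SELECT count(*) FROM t WHERE col1 <= 3 OR col1 IS NULL").fetchone()[0]
print(a, b, c)  # 1 1 2
```

The last query is the part people forget when hand-rewriting a "NOT..." condition: if they actually wanted the complement of the "col1 > 3" result set, the "OR col1 IS NULL" term has to be added by hand.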
And yes, you might be able to devise a new function based on the default implementation of like(), maybe even one that pays attention to the case_sensitive_like pragma. You might even be able to implement it so that it uses all the same index-aware optimizations the default like() does. But what you end up with is an unlike() function that you feed into the database engine as a non-"NOT..." expression, putting us back at the argument that by the time you go through all this, you're not using an index for a "NOT..." expression, you're using it for an equivalent, but different, expression. Perhaps more importantly, the transform isn't systematic; it works by swapping one function for another that has been predefined to have the inverse meaning.

And, of course, it can't be a simple inverse at a lower layer, either. For an unlike() function to have similar efficiency and use an index when possible, the whole function needs to be rewritten at a much deeper level, so that similar index-aware optimizations can be used while still answering the question of like() or unlike(). That's far from simple, even for one specific and isolated case. Then there's the nasty bit about user-defined overrides of like(), or any expression that happens to use a function. You can't exactly expect everyone that writes a function to also write an inverse function. (Of course, I'll admit that the majority of such functions can't use an index anyway...)

So the only question left is why the optimizer doesn't do it for the simple cases. And to that I'd say, "Why don't *you* do it?" It hardly seems worth the very complex code (and code size, and the runtime cost of checking for all this, not to mention development time, plus the huge amount of testing to verify it works correctly) just to optimize extremely simple cases on the order of "NOT col1 > 3".
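For what it's worth, you can register such an unlike() today as an application-defined function -- but only as the slow, scan-everything kind, which rather proves the point. A rough sketch (my own toy implementation, not anything in SQLite; it ignores ESCAPE clauses and assumes the default case-insensitive LIKE rules):

```python
# Toy unlike(): an application-defined function that inverts a simplified
# LIKE. Because it's an opaque function to the engine, it can never use the
# index, even though an index on col1 exists.
import re
import sqlite3

def unlike(pattern, value):
    if value is None:
        return None  # stay NULL-correct: NULL in, NULL out
    # Crude wildcard translation: % -> .*, _ -> ., everything else literal.
    rx = "".join(".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
                 for ch in pattern)
    return int(re.fullmatch(rx, value, re.IGNORECASE) is None)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 TEXT)")
con.execute("CREATE INDEX t_col1 ON t (col1)")
con.executemany("INSERT INTO t VALUES (?)", [("abcdef",), ("xyz",), (None,)])
con.create_function("unlike", 2, unlike)

rows = [r[0] for r in
        con.execute("SELECT col1 FROM t WHERE unlike('abc%', col1)")]
print(rows)  # ['xyz'] -- the NULL row is filtered out, as with NOT ... LIKE

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT col1 FROM t WHERE unlike('abc%', col1)"
).fetchall()
print(plan)  # always a SCAN: the engine can't see inside the function
```

Making that use an index for patterns like 'abc%' is exactly the "rewrite the whole function at a much deeper level" problem described above.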
If performance is a big concern, the developer should just write different queries (and, very likely, come to appreciate that when NULL handling is important, this isn't as easy as it sounds). It might be nice if the optimizer could do it for us, but I'd guess the optimizer would only be able to bite off very simple cases that most of us could do in our heads in a few seconds.

Then the only issue is the knowledge and understanding that a "NOT..." condition may be significantly more expensive than an equivalent "base" condition. And that's a point worth raising. I had never thought about it before, so the fact that you brought it up, asked some good questions, and made some of us think about it a bit deeper and a bit harder is a Good Thing. Hopefully the next time one of us sees a "NOT..." condition, we'll take a moment or two to see if we can jiggle it around the other way.

<http://google.com/images?q=the%20more%20you%20know>

  -j

-- 
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We
 have a protractor."  "I'll go home and see if I can scrounge up a
 ruler and a piece of string."  --from Anathem by Neal Stephenson

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users