On Mon, Mar 01, 2010 at 11:44:51PM +0100, Jean-Christophe Deschamps scratched on the wall:

>> So indexes are not used for NOT conditions, as NOT conditions
>> generally require a full scan, regardless.  Yes, it is a simple
>> reverse of a binary test, but the reverse of a specific indexed
>> lookup of a known value is a table scan to gather all the unknown
>> values.
>
> I understand how your argument goes, but I still have difficulty buying
> it.  Why not negate the condition itself and then only proceed?
That seems to be mixing concepts a bit. Yes, you can use algebra to negate the base of a "NOT..." condition and transform it into an equivalent condition that lacks the negation, like transforming "NOT A > B" into "A <= B". That new condition can then be applied to the query, and the query engine may then choose to use an index, because the condition now defines the set of rows it is looking for, rather than the set of rows it is not looking for. But I'd argue that's not really the point, for several reasons.

First and foremost, the original conversation was about why indexes aren't used with "NOT..." conditions. If you go through this process, by the time you actually pass the expression off to the database engine, you no longer have a "NOT..." condition. You may have an expression that can be transformed (back) into a "NOT..." condition, but that's not the same thing at the database level.

Fine, you say, but shouldn't the query optimizer be able to produce that transformation, so the transform is made between the input level and the database level? Well, I suppose it is possible for simple expressions, but I'd argue that really isn't the business of the query optimizer -- at least not the one in SQLite. For one thing, as we've already seen, 3VL and those pesky NULLs make this kind of transformation a lot more complex than it first looks. But more to the point, it is only likely to work for fairly simple expressions. For a large number of expressions an equivalent transformed expression simply does not exist.

Take, for example, one of the previous expressions you gave: "NOT col1 LIKE 'abc%'". The LIKE operator is defined by the SQL function like(), which has no inverse function. The optimizations that use (or don't use) an index are deep inside the code that implements the like() function. You can't simply flip them; you need to start over.
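To make the scan-vs-search point (and the NULL wrinkle) concrete, here's a small sqlite3 session I sketched; the table and column names are made up for the demo:

```python
# Sketch (not from the original post): illustrating that the "base" form of a
# condition can use an index while the "NOT..." form falls back to a scan, and
# that neither form returns NULL rows under three-valued logic.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 INTEGER)")
con.execute("CREATE INDEX t_col1 ON t (col1)")
con.executemany("INSERT INTO t (col1) VALUES (?)", [(1,), (5,), (None,)])

# 1. The base condition is indexable; the NOT form is an opaque filter.
plan_base = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE col1 <= 3").fetchall()
plan_not = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE NOT col1 > 3").fetchall()
print(plan_base)  # a SEARCH ... USING ... INDEX line
print(plan_not)   # a SCAN line

# 2. 3VL: both forms agree row-for-row (the NULL row fails either test), but
# neither is "every row except those where col1 > 3" -- the NULL row belongs
# to no result set unless you ask for it explicitly.
a = con.execute("SELECT count(*) FROM t WHERE NOT col1 > 3").fetchone()[0]
b = con.execute("SELECT count(*) FROM t WHERE col1 <= 3").fetchone()[0]
c = con.execute(
    "SELECT count(*) FROM t WHERE col1 <= 3 OR col1 IS NULL").fetchone()[0]
print(a, b, c)  # 1 1 2
```

The last query is the part people forget when hand-rewriting a "NOT..." condition: if they actually wanted the complement of the "col1 > 3" result set, the "OR col1 IS NULL" term has to be added by hand.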
And yes, you might be able to devise a new function based on the default implementation of like(), maybe even one that pays attention to the case_sensitive_like pragma. You might even be able to implement it so that it uses all the same index-aware optimizations the default like() does. But what you end up with is an unlike() function that you feed into the database engine as a non-"NOT..." expression, putting us back at the argument that by the time you go through all this, you're not using an index for a "NOT..." expression, you're using it for an equivalent, but different, expression. Perhaps more importantly, the transform isn't systematic; it works by swapping one function for another that has been predefined to have the inverse meaning.

And, of course, it can't be a simple inverse at a lower layer, either. For an unlike() function to have similar efficiency and use an index when possible, the whole function needs to be rewritten at a much deeper level, so that similar index-aware optimizations can be used while still answering the question of like() or unlike(). That's far from simple, even for one specific and isolated case. Then there's the nasty bit about user-defined overrides of like(), or any expression that happens to use a function. You can't exactly expect everyone that writes a function to also write an inverse function. (Of course, I'll admit that the majority of such functions can't use an index anyway...)

So the only question left is why the optimizer doesn't do it for the simple cases. And to that I'd say, "Why don't *you* do it?" It hardly seems worth the very complex code (and code size, and the runtime cost of checking for all this, not to mention development time, plus the huge amount of testing to verify it works correctly) just to optimize extremely simple cases on the order of "NOT col1 > 3".
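For what it's worth, you can register such an unlike() today as an application-defined function -- but only as the slow, scan-everything kind, which rather proves the point. A rough sketch (my own toy implementation, not anything in SQLite; it ignores ESCAPE clauses and assumes the default case-insensitive LIKE rules):

```python
# Toy unlike(): an application-defined function that inverts a simplified
# LIKE. Because it's an opaque function to the engine, it can never use the
# index, even though an index on col1 exists.
import re
import sqlite3

def unlike(pattern, value):
    if value is None:
        return None  # stay NULL-correct: NULL in, NULL out
    # Crude wildcard translation: % -> .*, _ -> ., everything else literal.
    rx = "".join(".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
                 for ch in pattern)
    return int(re.fullmatch(rx, value, re.IGNORECASE) is None)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 TEXT)")
con.execute("CREATE INDEX t_col1 ON t (col1)")
con.executemany("INSERT INTO t VALUES (?)", [("abcdef",), ("xyz",), (None,)])
con.create_function("unlike", 2, unlike)

rows = [r[0] for r in
        con.execute("SELECT col1 FROM t WHERE unlike('abc%', col1)")]
print(rows)  # ['xyz'] -- the NULL row is filtered out, as with NOT ... LIKE

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT col1 FROM t WHERE unlike('abc%', col1)"
).fetchall()
print(plan)  # always a SCAN: the engine can't see inside the function
```

Making that use an index for patterns like 'abc%' is exactly the "rewrite the whole function at a much deeper level" problem described above.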
If performance is a big concern, the developer should just write different queries (and, very likely, come to appreciate that when NULL handling is important, this isn't as easy as it sounds). It might be nice if the optimizer could do it for us, but I'd guess the optimizer would only be able to bite off very simple cases that most of us could do in our heads in a few seconds.

Then the only issue is the knowledge and understanding that a "NOT..." condition may be significantly more expensive than an equivalent "base" condition. And that's a point worth raising. I had never thought about it before, so the fact that you brought it up, asked some good questions, and made some of us think about it a bit deeper and a bit harder is a Good Thing. Hopefully the next time one of us sees a "NOT..." condition, we'll take a moment or two to see if we can jiggle it around the other way.

<http://google.com/images?q=the%20more%20you%20know>

  -j

-- 
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We
 have a protractor."  "I'll go home and see if I can scrounge up a
 ruler and a piece of string."  --from Anathem by Neal Stephenson

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users