Re: [sqlite] range enclosing a number

Pavel Ivanov Tue, 14 Jul 2009 04:32:52 -0700

Not to continue argument with Jay but just to express my opinion in comparison:


>  The ORDER/LIMIT approach is much more resilient to changes, however,
>  and should more or less behave the same no matter what you do to the
>  rest of the query.

Seriously, I don't believe this. There's no way to change the query so
that min/max will scan all table and order/limit will take just value
from index. They both will take the same approach in all cases (except
of course old SQLite which I fortunately didn't work with :) ).

>  The ORDER/LIMIT approach is, arguable, less graceful, but
>  it is also (IMHO) a lot easier to break down into logical blocks that
>  a newbie can follow and understand both the design and intent-- even
>  if the query is a bit strung out.

This is where my opinion is exactly opposite: query with min/max is
more readable by newbie just getting introduced to the code - it
clearly states what requester is trying to do: "get minimum among
records with this condition". Order/limit approach requires more
thinking before you clearly understand what was the intention behind
the query - I've struggled myself trying to understand which sign (<
or >) should go with which order by (desc or asc).

But... Here are opinions and they have already become an off-topic
because the OP's case looks like more complicated and couldn't be
solved by simple min/max. And in response to Bogdan's letter:

> Adding the second column as in:select max(ticks),time from time_pair where 
> ticks <= ?;
> seems to work, although I don't understand the GROUP BY comments.

The essence of "GROUP BY comments" is that if you write query this way
then what is returned in time column is implementation dependent (not
all sql engines even support this syntax) and you better think that in
this case value in time column is completely arbitrary and unrelated
to actual data in the table. So if you want to return both columns you
should use either order/limit approach or already mentioned "select *
... where ticks = (select max(ticks) ...)" approach.

Pavel

On Tue, Jul 14, 2009 at 12:35 AM, Jay A. Kreibich<j...@kreibi.ch> wrote:
> On Mon, Jul 13, 2009 at 10:33:00PM -0400, Pavel Ivanov scratched on the wall:
>> Jay, you're pretty much mistaken:
>>
>> >  I'm pretty sure you don't want to do it this way.  What this does is
>> >  gather every row that meets the WHERE condition and then runs a max()
>> >  or min() aggregation function across all of those rows.  That means
>> >  that even if the column "number" has an index on it, between these
>> >  two statements you're going to end up scanning the whole table.
>>
>> All database engines optimize queries which ask for min/max on indexed
>> column with condition including only <, > or = on this very column.
>> And SQLite is among these too:
>
>
>  Not "all."  This type of optimization is actually a fairly new
>  addition to SQLite (considering the product lifetime) and first
>  appeared in 3.5.5, which was released in early 2008.
>
>
>  And I'd still go with ORDER/LIMIT.  Here's why:
>
>
>  For my tests I just used the default build under the current version
>  of Mac OS X, which is a bit old (3.4).  Under that build, the
>  ORDER/LIMIT is clearly faster, as this is before the optimization
>  existed:
>
>  (using the same test set you did)
>
>  Full scan, 3.4:
>  -------------
>  real  0m5.99s
>  user  0m4.73s
>  sys   0m0.84s
>
>  Using ORDER/LIMIT, 3.4:
>  -------------
>  real  0m0.00s
>  user  0m0.01s
>  sys   0m0.00s
>
>  Using min/max, 3.4:
>  -------------
>  real  0m5.97s
>  user  0m2.94s
>  sys   0m0.38s
>
>  In this case, it is clear that min/max are NOT integrated into the
>  optimizer, and requires half a table scan, just as I stated.
>
>  I also have a build of the current 3.6.16 around, and in that case,
>  the numbers are better:
>
>  Using ORDER/LIMIT, 3.6.16
>  -------------
>  real  0m0.12s
>  user  0m0.01s
>  sys   0m0.03s
>
>  Using min/max, 3.6.16
>  -------------
>  real  0m0.04s
>  user  0m0.01s
>  sys   0m0.03s
>
>  This clearly shows that the optimization does exist, and that for
>  this very basic case my assumptions were incorrect.
>
>  With the current 3.6.16 build, using min/max seems a tad faster-- but
>  only in "real" time.  In terms of user/sys times, the results shown
>  here (and you're own numbers, which were 0.043/0.001/0.005 and
>  0.005/0.001/0.001) were pretty typical (i.e. very very close).
>  That might just be an I/O fluke.  We're getting small enough that
>  to really say anything definite requires better profiling.  So
>  there does appear to be a difference, but it is pretty small and
>  unclear where it is coming from.
>
>  However, I'd point out that using ORDER/LIMIT under 3.4 is the
>  fastest of all.  This isn't just a quirk of one run, either.
>  I ran these several times and the 3.4 ORDER/LIMIT was always fastest.
>  We're still playing with number to small to really trust, but it
>  seems that if the 3.6.16 ORDER/LIMIT was as fast as the one in 3.4,
>  it would likely be the best choice of all.
>
>
>  So you've sold me that the current version of SQLite clearly does
>  have the min/max optimization and doesn't require a table scan.  It
>  also appears to be slightly faster, but not by a big enough gap to
>  clearly consider it a better choice on that alone.
>
>
>
>  Personally, I'd still go with ORDER/LIMIT.  With the current version
>  of SQLite the runtimes of both approaches are extremely similar,
>  but the min/max approach depends on the query optimizer being able to
>  take the min/max notation and basically turn it into an internal
>  ORDER BY (look at the code).
>
>  There are a lot of limits on when the optimizer can do this.  If the
>  real-world query is a bit more complex than this example, or if (down
>  the road) the query conditions get changed, or if the query gets more
>  complex in just about anyway, the optimization is going to break with
>  the min/max approach and you'll be stuck with a table scan--
>  something that I think we can all agree is MUCH slower.  If that comes
>  up due to a change in the SQL six months down the road, you're going
>  to be spending a lot of time wondering why things just got so slow.
>  The ORDER/LIMIT approach is much more resilient to changes, however,
>  and should more or less behave the same no matter what you do to the
>  rest of the query.
>
>  The ORDER/LIMIT approach is also the clear winner if there is any
>  possible chance at all you'll be running on older code.  "Older," in
>  this case, is code that was released less than two years ago.  In
>  many production environments, that's not all that long.
>
>  There is also this...
>
>> But of course your other point is true - if you want some other data
>> from table along with min/max value, you need to make additional
>> select in case of using min/max.
>
>  ... which also negates any possible advantage of using the min/max
>  method.  The ORDER/LIMIT approach lets you pull out all the data you
>  might need or want in a single query.
>
>  For the simple case on newer code, both approaches give similar
>  returns.  But the min/max approach will default to a very expensive
>  behavior if it is changed in just about any way, while the
>  ORDER/LIMIT approach is valid for most variations on this example.
>
>
>  I could also go off on a more idealistic tangent, about how I think
>  the ORDER/LIMIT approach is more in the spirit of how SELECT is meant
>  to be used... the min/max approach clearly depends on the query
>  optimizer to do its work, while the ORDER/LIMIT works "by design."
>  I'm never a fan of trusting the query optimizer.  It changes from time
>  to time and sometimes things break.  The min/max approach also
>  depends on things like an implied GROUP BY.  That might be standard
>  SQL, but for someone less versed in SQL it looks like a query that
>  shouldn't even work (and, in fact, doesn't once you add another
>  column).  The ORDER/LIMIT approach is, arguable, less graceful, but
>  it is also (IMHO) a lot easier to break down into logical blocks that
>  a newbie can follow and understand both the design and intent-- even
>  if the query is a bit strung out.  I would say that the min/max
>  approach is more in the spirit of the Relational Model, but that's
>  not what SQL is about and, at least where I work, the mention of
>  "Relational Model" will get you nothing but blank looks from the
>  majority of my co-workers.  It might be a more elegant approach,
>  but six months down the road when some random team member needs to
>  update it, that elegance is going to be totally lost on them.
>  I'd prefer the practical, more bullet proof solution that isn't
>  dependent on the optimizer retaining the exact same behavor.
>  And speaking of changes...  if the current version of SQLite can be
>  made to run the ORDER/LIMIT as fast as the older ones, then
>  ORDER/LIMIT could win in time as well!
>
>  I can go on, but I'll stop there, since at this point we're well into
>  opinions and I need to be doing other things.
>
>   -j
>
> --
> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>
> "Our opponent is an alien starship packed with atomic bombs.  We have
>  a protractor."   "I'll go home and see if I can scrounge up a ruler
>  and a piece of string."  --from Anathem by Neal Stephenson
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] range enclosing a number

Reply via email to